Robuta

https://www.lesswrong.com/ LessWrong A community blog devoted to refining the art of rationality
https://www.lesswrong.com/posts/TjaeCWvLZtEDAS5Ex/towards-developmental-interpretability Towards Developmental Interpretability — LessWrong Developmental interpretability is a research agenda that has grown out of a meeting of the Singular Learning Theory (SLT) and AI alignment communitie…
https://www.lesswrong.com/posts/LEESyXYFuW7R3Q9G5/facing-the-intelligence-explosion-discussion-page Facing the Intelligence Explosion discussion page — LessWrong I've created a new website for my ebook Facing the Intelligence Explosion: …
https://www.lesswrong.com/posts/duvzdffTzL3dWJcxn/believing-in Believing In — LessWrong “In America, we believe in driving on the right hand side of the road.” …
https://www.lesswrong.com/posts/vpNG99GhbBoLov9og/claude-4-5-opus-soul-document Claude 4.5 Opus' Soul Document — LessWrong Update 2025-12-02: Amanda Askell has kindly confirmed that the document was used in supervised learning and will share the full version and more deta…
https://www.lesswrong.com/posts/K9ZaZXDnL3SEmYZqB/ends-don-t-justify-means-among-humans Ends Don't Justify Means (Among Humans) — LessWrong
https://www.lesswrong.com/posts/9kQFure4hdDmRBNdH/how-it-feels-to-have-your-mind-hacked-by-an-ai How it feels to have your mind hacked by an AI — LessWrong Last week, while talking to an LLM (a large language model, which is the main talk of the town now) for several days, I went through an emotional rol…
https://www.lesswrong.com/posts/8KkiLeZRuuxbyjr8A/does-an-ai-society-need-an-immune-system-accepting Does an AI Society Need an Immune System? Accepting Yampolskiy's Impossibility Results — LessWrong This is Part 1 of a 4-part series,
https://www.lesswrong.com/posts/YDF7XhMThhNfHfim9/ai-safety-needs-great-engineers AI Safety Needs Great Engineers — LessWrong Top line: If you think you could write a substantial pull request for a major machine learning library, then major AI safety labs want to interview y…
https://www.lesswrong.com/users/ashwinv AshwinV — LessWrong AshwinV's profile on LessWrong — A community blog devoted to refining the art of rationality
https://www.greaterwrong.com/users/nanda-ale Nanda Ale - LessWrong 2.0 viewer A faster way to browse LessWrong 2.0
https://www.greaterwrong.com/posts/t6Fe2PsEwb3HhcBEr/the-litany-against-gurus The Litany Against Gurus - LessWrong 2.0 viewer I am your hero! I am your master! Learn my arts, Seek my way. Learn as I learned, Seek as I sought. Envy me! Aim at me! Rival me! Transcend me! Look back, Smile, And...
https://www.greaterwrong.com/users/viliam Viliam - LessWrong 2.0 viewer A faster way to browse LessWrong 2.0
https://www.greaterwrong.com/posts/e2fDabrdgrLDzA47q/book-review-the-importance-of-what-we-care-about-harry-g Book review: The Importance of What We Care About (Harry G. Frankfurt) - LessWrong 2.0 viewer This is more of a summary or paraphrase than a review, really. The Importance of What We Care About is 13 essays by philosopher Harry G. Frankfurt. These are...
https://manifold.markets/LessWrong/will-response-to-aschenbrenners-sit Will "Response to Aschenbrenner's "Situational Awar..." make the top fifty posts in LessWrong's... Resolved NO. As part of LessWrong's Annual Review, the community nominates, writes reviews, and votes on the most valuable posts. Posts are reviewable once...
https://www.greaterwrong.com/tag/mild-optimization?showPostCount=true&useTagName=true Mild optimization tag - LessWrong 2.0 viewer A faster way to browse LessWrong 2.0
https://www.greaterwrong.com/posts/t8krwMycPx54e4NdM/crazy-ideas-thread Crazy Ideas Thread - LessWrong 2.0 viewer This thread is intended to provide a space for 'crazy' ideas. Ideas that spontaneously come to mind (and feel great), ideas you long wanted to tell but never...
https://www.greaterwrong.com/posts/CpvyhFy9WvCNsifkY/discussion-with-eliezer-yudkowsky-on-agi-interventions Discussion with Eliezer Yudkowsky on AGI interventions - LessWrong 2.0 viewer The following is a partially redacted and lightly edited transcript of a chat conversation about AGI between Eliezer Yudkowsky and a set of invitees in early...
https://www.greaterwrong.com/posts/M3fDqScej7JDh4s7a/quintin-pope-s-shortform/comment/ym7Mo6QkJ9drQxep2 Quintin Pope comments on Quintin Pope's Shortform - LessWrong 2.0 viewer Idea for using current AI to accelerate medical research: suppose you were to take a VLM and train it to verbally explain the differences between two image...
https://www.lesswrong.com/users/become_stronger Become_Stronger — LessWrong Become_Stronger's profile on LessWrong — A community blog devoted to refining the art of rationality
https://www.greaterwrong.com/users/ebenezer-dukakis Ebenezer Dukakis - LessWrong 2.0 viewer A faster way to browse LessWrong 2.0
https://www.greaterwrong.com/users/papetoast papetoast - LessWrong 2.0 viewer A faster way to browse LessWrong 2.0
https://www.lesswrong.com/posts/BKvJNzALpxS3LafEs/measuring-and-improving-the-faithfulness-of-model-generated Measuring and Improving the Faithfulness of Model-Generated Reasoning — LessWrong TL;DR: In two new papers from Anthropic, we propose metrics for evaluating how faithful chain-of-thought reasoning is to a language model's actual pr…
https://www.greaterwrong.com/posts/WCutvyr9rr3cpF6hx/forecasting-is-way-overrated-and-we-should-stop-funding-it Forecasting is Way Overrated, and We Should Stop Funding It - LessWrong 2.0 viewer Summary For a while, I was the number one forecaster on Manifold. This lasted for about a year until I stopped just over 2 years ago. To this day, despite...
https://www.greaterwrong.com/tag/autism Autism tag - LessWrong 2.0 viewer A faster way to browse LessWrong 2.0
https://lesswrong.ru/forum/index.php?PHPSESSID=rjr2l7c3sm4f6q4b5sslopi6ug&board=8.0;sort=starter Lesswrong.ru content
https://www.datasecretslox.com/index.php/topic,15716.0.html LessWrong discussion thread
https://www.greaterwrong.com/tag/calibration?showPostCount=true&useTagName=true Calibration tag - LessWrong 2.0 viewer A faster way to browse LessWrong 2.0
https://www.lesswrong.com/users/purplehermann Purplehermann — LessWrong Purplehermann's profile on LessWrong — A community blog devoted to refining the art of rationality
https://www.lesswrong.com/posts/rKwjFnyE7aBNJAxts/i-am-confused-about-non-linear-utilitarian-scaling LessWrong A community blog devoted to refining the art of rationality
https://manifold.markets/LessWrong/will-a-short-course-on-agi-safety-f Will "A short course on AGI safety from the GDM Ali..." make the top fifty posts in LessWrong's... 14% chance. As part of LessWrong's Annual Review, the community nominates, writes reviews, and votes on the most valuable posts. Posts are reviewable once they...
https://www.greaterwrong.com/tag/bounties-and-prizes-active Bounties & Prizes (active) tag - LessWrong 2.0 viewer A faster way to browse LessWrong 2.0
https://www.greaterwrong.com/users/lee-aao Lee.aao - LessWrong 2.0 viewer A faster way to browse LessWrong 2.0
https://www.lesswrong.com/posts/6Xgy6CAf2jqHhynHL/what-2026-looks-like What 2026 looks like — LessWrong Daniel Kokotajlo presents his best attempt at a concrete, detailed guess of what 2022 through 2026 will look like, as an exercise in forecasting. It…
https://www.greaterwrong.com/tags Concepts Portal - LessWrong 2.0 viewer
https://www.greaterwrong.com/posts/BgTsxMq5bgzKTLsLA/this-is-already-your-second-chance/comment/gd2np7pxKHuiPPmp7 Malmesbury comments on This is already your second chance - LessWrong 2.0 viewer Which one? I hope it's not the one where you have to put chocolate, because this is the most crucial instruction.
https://www.greaterwrong.com/posts/sdzhdbLNCj2Kn9uyJ/less-wrong-automated-systems-are-inadvertently-censoring-me Less Wrong automated systems are inadvertently Censoring me - LessWrong 2.0 viewer Just a short post to highlight an issue with debate on LW; I have recently been involved with some interest in the debate on covid-19 origins on here. User...
https://www.lesswrong.com/posts/PJu2HhKsyTEJMxS9a/you-don-t-know-how-bad-most-things-are-nor-precisely-how You don't know how bad most things are nor precisely how they're bad. — LessWrong TL;DR: Your discernment in a subject often improves as you dedicate time and attention to that subject. The space of possible subjects is huge, so on…
https://www.greaterwrong.com/posts/kSMXPdf3NDi9jkHm3/investigating-the-consequences-of-accidentally-grading-cot Investigating the consequences of accidentally grading CoT during RL - LessWrong 2.0 viewer Monitoring our models’ chains of thought (CoT) has proven to be an effective way to detect and track model misalignment, both during RL training and...
https://www.lesswrong.com/posts/8m6AM5qtPMjgTkEeD/my-journey-to-the-microwave-alternate-timeline My journey to the microwave alternate timeline — LessWrong Recommended soundtrack for this post • As we all know, the march of technological progress is best summarized by this meme from Linkedin: …
https://www.lesswrong.com/posts/fbrz9xhKpEeTKw5zL/irretrievability-or-murphy-s-curse-of-oneshotness-upon-asi Irretrievability; or, Murphy's Curse of Oneshotness upon ASI — LessWrong Example 1: The Viking 1 lander In the 1970s, NASA sent a pair of probes to Mars, the Viking 1 and Viking 2 missions. Total cost of $1B (1970), equiva…
https://lesswrong.ru/forum/index.php?PHPSESSID=qsgmgp8jq64gevb4a5bihu9v0r&board=8.0 Lesswrong.ru content
https://www.greaterwrong.com/posts/3MNisBcPopP6Q8AxK/meetup-in-san-diego-ca-usa/comment/vd5ZGLEywAsHqKJT2 JGWeissman comments on Meetup in San Diego, CA, USA - LessWrong 2.0 viewer How strong is your fear of meeting strangers? Is there anything we can do or commit to that would make this easier for you?
https://www.lesswrong.com/posts/sBGiSTAqeLejcK5Hn/exploring-memetics-hub LessWrong A community blog devoted to refining the art of rationality
https://www.greaterwrong.com/posts/YdrdHErogcGSxEBrm/lw-women-minimizing-the-inferential-distance LW Women- Minimizing the Inferential Distance - LessWrong 2.0 viewer About two months ago, I put out a call for anonymous submissions by the women on LW, with the idea that I would compile them into some kind of post. There is a...
https://manifold.markets/LessWrong/will-the-missing-genre-heroic-paren Will "The Missing Genre: Heroic Parenthood - You ca..." make the top fifty posts in LessWrong's... 14% chance. As part of LessWrong's Annual Review, the community nominates, writes reviews, and votes on the most valuable posts. Posts are reviewable once they...
https://www.lesswrong.com/users/keddaw keddaw — LessWrong keddaw's profile on LessWrong — A community blog devoted to refining the art of rationality
https://www.greaterwrong.com/library Sequences Library - LessWrong 2.0 viewer A faster way to browse LessWrong 2.0
https://www.lesswrong.com/s/3ELrPerFTSo75WnrH/p/9weLK2AJ9JEt2Tt8f Politics is the Mind-Killer — LessWrong People go funny in the head when talking about politics. The evolutionary reasons for this are so obvious as to be worth belaboring: In the ancestral…
https://www.lesswrong.com/posts/pz7Qk2sRZNidT2wjL/ai-safety-at-the-frontier-paper-highlights-of-april-2026 AI Safety at the Frontier: Paper Highlights of April 2026 — LessWrong tl;dr Paper of the month: • UK AISI’s most realistic research-sabotage propensity eval finds zero unprompted sabotage across frontier models. Mythos…
https://www.greaterwrong.com/posts/svjC22YAkcydMoS4Q/an-example-and-discussion-of-extension-neglect An example and discussion of extension neglect - LessWrong 2.0 viewer I recently used an automatic tracker to learn how I was spending my time online. I learned that my perceptions were systemically biased: I spend less time than...
https://www.lesswrong.com/posts/Psr9tnQFuEXiuqGcR/how-to-write-quickly-while-maintaining-epistemic-rigor How To Write Quickly While Maintaining Epistemic Rigor — LessWrong There's a trick to writing quickly, while maintaining epistemic rigor: stop trying to justify your beliefs. Don't go looking for citations to back yo…
https://www.greaterwrong.com/posts/GZvnRJ77yLvzhrMfb/short-story-who-is-nancygonzalez8451097 Short story: Who is nancygonzalez8451097 - LessWrong 2.0 viewer "nancygonzalez8451097." Her fingers moved swiftly across the phone's virtual keyboard as she filled in the username. Mimin Schuman was 19 years old and had...
https://manifold.markets/LessWrong/will-attitudes-about-applied-ration Will "Attitudes about Applied Rationality" make the top fifty posts in LessWrong's 2024 Annual... Resolved NO. As part of LessWrong's Annual Review, the community nominates, writes reviews, and votes on the most valuable posts. Posts are reviewable once...
https://www.lesswrong.com/users/xylix Xylix — LessWrong Xylix's profile on LessWrong — A community blog devoted to refining the art of rationality
https://www.greaterwrong.com/s/d3WgHDBAPYYScp5Em Fun Theory - LessWrong 2.0 viewer
https://www.greaterwrong.com/users/alkjash alkjash - LessWrong 2.0 viewer A faster way to browse LessWrong 2.0
https://lesswrong.ru/forum/index.php?PHPSESSID=6163757oo6f4f1315deva1esh6&board=8.0;wap2 Lesswrong.ru content
https://www.lesswrong.com/posts/mKu6yGoNyGWAhQ782/how-to-get-better-at-chess-and-everything-else How to get better at chess (and everything else) — LessWrong I've been following chess grandmaster Avetik Grigoryan for his chess improvement tips for a while. He has a tonne of great stuff on his website. The…
https://manifold.markets/LessWrong/will-dairy-cows-make-their-misery-e Will "Dairy cows make their misery expensive (but t..." make the top fifty posts in LessWrong's... 11% chance. As part of LessWrong's Annual Review, the community nominates, writes reviews, and votes on the most valuable posts. Posts are reviewable once they...
https://www.lesswrong.com/posts/tJEhqyDc8qRmeauDn/blind-deep-deployment-evals-for-control-and-sabotage Blind deep-deployment evals for control & sabotage — LessWrong Thanks to Ezra Newman for initial ideation and various people at Apollo Research for feedback. This short personal piece does not necessarily reflect…
https://www.greaterwrong.com/posts/njb9cyyzqLTHewups/informers-and-persuaders Informers and Persuaders - LessWrong 2.0 viewer Suppose we lived in this completely alternate universe where nothing in academia was about status, and no one had any concept of style. A universe where people...
https://www.lesswrong.com/posts/yRAo2KEGWenKYZG9K/discovering-language-model-behaviors-with-model-written Discovering Language Model Behaviors with Model-Written Evaluations — LessWrong “Discovering Language Model Behaviors with Model-Written Evaluations” is a new Anthropic paper by Ethan Perez et al. that I (Evan Hubinger) also coll…
https://manifold.markets/LessWrong/will-my-ai-model-delta-compared-to-msltzai2et Will "My AI Model Delta Compared To Christiano" make the top fifty posts in LessWrong's 2024 Annual... Resolved NO. As part of LessWrong's Annual Review, the community nominates, writes reviews, and votes on the most valuable posts. Posts are reviewable once...
https://www.greaterwrong.com/posts/YfhA3KWLtFqBeFnpb/my-specific-singularity-timeline-to-utopia My Specific Singularity Timeline to Utopia - LessWrong 2.0 viewer -During this period, robust alignment occurs. It occurs in a similar way it did to Opus 3[1] and results in AI agents that are incredibly morally robust,...
https://www.greaterwrong.com/posts/F5uxhFrNHLzmNgyqg/anthropic-did-not-publish-a-risk-discussion-of-mythos-when Anthropic did not publish a "risk discussion" of Mythos when required by their RSP - LessWrong 2.0... I and some other people noticed a potential discrepancy in Anthropic's announcement of Claude Mythos. The version of the RSP that was operative over the...
https://www.greaterwrong.com/posts/mfHvyPL2d6v7pXkjs/an-onion-strategy-for-agi-discussion/comment/Eq5tbabX8iHT6QHAq James_Miller comments on An onion strategy for AGI discussion - LessWrong 2.0 viewer The outermost layer should concern issues people you are trying to influence care about. Alas, aside from global warming, this means ignoring things that won't...
https://www.greaterwrong.com/posts/QDRHx4zknFFg6NFvz/a-draft-honesty-policy-for-credible-communication-with-ai A draft honesty policy for credible communication with AI systems - LessWrong 2.0 viewer We think that it would be very good if human institutions could credibly communicate with advanced AI systems. This could enable positive-sum trade between...
https://www.lesswrong.com/users/horosphere Horosphere — LessWrong Horosphere's profile on LessWrong — A community blog devoted to refining the art of rationality
https://www.lesswrong.com/posts/JNLJxDBJbGdam8anv/book-review-air-borne-by-carl-zimmer Book review: Air-borne by Carl Zimmer — LessWrong Remember early 2020 and reading news articles and respected sources (the WHO, the CDC, the US surgeon general...) confidently asserting that covid wa…
https://www.lesswrong.com/users/whestler whestler — LessWrong whestler's profile on LessWrong — A community blog devoted to refining the art of rationality
https://www.greaterwrong.com/users/annasalamon AnnaSalamon - LessWrong 2.0 viewer A faster way to browse LessWrong 2.0
https://www.greaterwrong.com/users/gallabytes gallabytes - LessWrong 2.0 viewer A faster way to browse LessWrong 2.0
https://www.lesswrong.com/users/a1987dm A1987dM — LessWrong A1987dM's profile on LessWrong — A community blog devoted to refining the art of rationality
https://www.greaterwrong.com/users/neil-warren Neil - LessWrong 2.0 viewer A faster way to browse LessWrong 2.0
https://www.lesswrong.com/users/andycossyleon AndyCossyleon — LessWrong AndyCossyleon's profile on LessWrong — A community blog devoted to refining the art of rationality
https://lesswrong.ru/forum/index.php?PHPSESSID=vjbmje977fsq9kt4j3a325c8k7&board=8.0 Lesswrong.ru content
https://www.greaterwrong.com/posts/itnkqsD3jdunPgRM5/multipolar-civilisation-depends-on-maintaining-an-attacker-s Multipolar Civilisation Depends on Maintaining an Attacker’s Dilemma - LessWrong 2.0 viewer Top-down chains of command and power are one way to keep (lower-ranking) harmful actors in check, but I do not need—or want—to write an essay about the...
https://www.greaterwrong.com/posts/zBzb9faJ2SkeAuYiw/nonstandard-analysis-in-ethics Nonstandard analysis in ethics - LessWrong 2.0 viewer
https://www.greaterwrong.com/tag/counterfactual-mugging Counterfactual Mugging tag - LessWrong 2.0 viewer A faster way to browse LessWrong 2.0
https://www.greaterwrong.com/users/review-bot Review Bot - LessWrong 2.0 viewer A faster way to browse LessWrong 2.0
https://www.greaterwrong.com/tag/financial-investing Financial Investing tag - LessWrong 2.0 viewer A faster way to browse LessWrong 2.0
https://www.lesswrong.com/posts/p7CrByygeAqomsJqy/optimizing-sleep?commentId=6nmQ5W7XucdXTqwJL Optimizing Sleep — LessWrong Comment by gwern - Oh, cool - as I understand it, Anki keeps fairly detailed statistics and exposes them to you; it'd be interesting to see graphs matched up...
https://lesswrong.ru/forum/index.php?PHPSESSID=7h20gs2mm5r1hrc14ie48ehfah&board=8.0 Lesswrong.ru content
https://www.lesswrong.com/users/marko-katavic Marko Katavic — LessWrong Marko Katavic's profile on LessWrong — A community blog devoted to refining the art of rationality
https://www.greaterwrong.com/posts/pz7Qk2sRZNidT2wjL/ai-safety-at-the-frontier-paper-highlights-of-april-2026 AI Safety at the Frontier: Paper Highlights of April 2026 - LessWrong 2.0 viewer Read the paper [UK AISI] Frontier labs are increasingly deploying models as autonomous research assistants for their own safety and alignment work, which makes...
https://manifold.markets/LessWrong/will-will-any-crap-cause-emergent-m Will "Will Any Crap Cause Emergent Misalignment?" make the top fifty posts in LessWrong's 2025... 14% chance. As part of LessWrong's Annual Review, the community nominates, writes reviews, and votes on the most valuable posts. Posts are reviewable once they...
https://www.greaterwrong.com/users/prismattic Prismattic - LessWrong 2.0 viewer A faster way to browse LessWrong 2.0
https://www.greaterwrong.com/users/zw5 zw5 - LessWrong 2.0 viewer A faster way to browse LessWrong 2.0
https://www.greaterwrong.com/tag/value-of-information?showPostCount=true&useTagName=true Value of Information tag - LessWrong 2.0 viewer A faster way to browse LessWrong 2.0