https://www.lesswrong.com/
LessWrong
A community blog devoted to refining the art of rationality
lesswrong
https://www.lesswrong.com/posts/TjaeCWvLZtEDAS5Ex/towards-developmental-interpretability
Towards Developmental Interpretability — LessWrong
Developmental interpretability is a research agenda that has grown out of a meeting of the Singular Learning Theory (SLT) and AI alignment communitie…
towardsdevelopmentalinterpretabilitylesswrong
https://www.lesswrong.com/posts/LEESyXYFuW7R3Q9G5/facing-the-intelligence-explosion-discussion-page
Facing the Intelligence Explosion discussion page — LessWrong
I've created a new website for my ebook Facing the Intelligence Explosion: …
the intelligencediscussion pagefacingexplosionlesswrong
https://www.lesswrong.com/posts/duvzdffTzL3dWJcxn/believing-in
Believing In — LessWrong
“In America, we believe in driving on the right hand side of the road.” …
believing inlesswrong
https://www.lesswrong.com/posts/vpNG99GhbBoLov9og/claude-4-5-opus-soul-document
Claude 4.5 Opus' Soul Document — LessWrong
Update 2025-12-02: Amanda Askell has kindly confirmed that the document was used in supervised learning and will share the full version and more deta…
claudeopussouldocumentlesswrong
https://www.lesswrong.com/posts/K9ZaZXDnL3SEmYZqB/ends-don-t-justify-means-among-humans
Ends Don't Justify Means (Among Humans) — LessWrong
endsjustifymeansamonghumans
https://www.lesswrong.com/posts/9kQFure4hdDmRBNdH/how-it-feels-to-have-your-mind-hacked-by-an-ai
How it feels to have your mind hacked by an AI — LessWrong
Last week, while talking to an LLM (a large language model, which is the main talk of the town now) for several days, I went through an emotional rol…
how it feels
https://www.lesswrong.com/posts/8KkiLeZRuuxbyjr8A/does-an-ai-society-need-an-immune-system-accepting
Does an AI Society Need an Immune System? Accepting Yampolskiy's Impossibility Results — LessWrong
This is Part 1 of a 4-part series,
https://www.lesswrong.com/posts/YDF7XhMThhNfHfim9/ai-safety-needs-great-engineers
AI Safety Needs Great Engineers — LessWrong
Top line: If you think you could write a substantial pull request for a major machine learning library, then major AI safety labs want to interview y…
ai safetyneedsgreatengineerslesswrong
https://www.lesswrong.com/users/ashwinv
AshwinV — LessWrong
AshwinV's profile on LessWrong — A community blog devoted to refining the art of rationality
ashwinvlesswrong
https://www.greaterwrong.com/users/nanda-ale
Nanda Ale - LessWrong 2.0 viewer
A faster way to browse LessWrong 2.0
nandaalelesswrongviewer
https://www.greaterwrong.com/posts/t6Fe2PsEwb3HhcBEr/the-litany-against-gurus
The Litany Against Gurus - LessWrong 2.0 viewer
I am your hero! I am your master! Learn my arts, Seek my way. Learn as I learned, Seek as I sought. Envy me! Aim at me! Rival me! Transcend me! Look back, Smile, And...
litanyguruslesswrongviewer
https://www.greaterwrong.com/users/viliam
Viliam - LessWrong 2.0 viewer
A faster way to browse LessWrong 2.0
viliamlesswrongviewer
https://www.greaterwrong.com/posts/e2fDabrdgrLDzA47q/book-review-the-importance-of-what-we-care-about-harry-g
Book review: The Importance of What We Care About (Harry G. Frankfurt) - LessWrong 2.0 viewer
This is more of a summary or paraphrase than a review, really. The Importance of What We Care About is 13 essays by philosopher Harry G. Frankfurt. These are...
https://manifold.markets/LessWrong/will-response-to-aschenbrenners-sit
Will "Response to Aschenbrenner's "Situational Awar..." make the top fifty posts in LessWrong's...
Resolved NO. As part of LessWrong's Annual Review, the community nominates, writes reviews, and votes on the most valuable posts. Posts are reviewable once...
https://www.greaterwrong.com/tag/mild-optimization?showPostCount=true&useTagName=true
Mild optimization tag - LessWrong 2.0 viewer
A faster way to browse LessWrong 2.0
mildoptimizationtaglesswrongviewer
https://www.greaterwrong.com/posts/t8krwMycPx54e4NdM/crazy-ideas-thread
Crazy Ideas Thread - LessWrong 2.0 viewer
This thread is intended to provide a space for 'crazy' ideas. Ideas that spontaneously come to mind (and feel great), ideas you long wanted to tell but never...
crazy ideasthreadlesswrongviewer
https://www.greaterwrong.com/posts/CpvyhFy9WvCNsifkY/discussion-with-eliezer-yudkowsky-on-agi-interventions
Discussion with Eliezer Yudkowsky on AGI interventions - LessWrong 2.0 viewer
The following is a partially redacted and lightly edited transcript of a chat conversation about AGI between Eliezer Yudkowsky and a set of invitees in early...
eliezer yudkowskyon agidiscussion
https://www.greaterwrong.com/posts/M3fDqScej7JDh4s7a/quintin-pope-s-shortform/comment/ym7Mo6QkJ9drQxep2
Quintin Pope comments on Quintin Pope's Shortform - LessWrong 2.0 viewer
Idea for using current AI to accelerate medical research: suppose you were to take a VLM and train it to verbally explain the differences between two image...
quintinpopecommentsshortformlesswrong
https://www.lesswrong.com/users/become_stronger
Become_Stronger — LessWrong
Become_Stronger's profile on LessWrong — A community blog devoted to refining the art of rationality
becomestrongerlesswrong
https://www.greaterwrong.com/users/ebenezer-dukakis
Ebenezer Dukakis - LessWrong 2.0 viewer
A faster way to browse LessWrong 2.0
ebenezerlesswrongviewer
https://www.greaterwrong.com/users/papetoast
papetoast - LessWrong 2.0 viewer
A faster way to browse LessWrong 2.0
lesswrongviewer
https://www.lesswrong.com/posts/BKvJNzALpxS3LafEs/measuring-and-improving-the-faithfulness-of-model-generated
Measuring and Improving the Faithfulness of Model-Generated Reasoning — LessWrong
TL;DR: In two new papers from Anthropic, we propose metrics for evaluating how faithful chain-of-thought reasoning is to a language model's actual pr…
of modelmeasuringimprovingfaithfulness
https://www.greaterwrong.com/posts/WCutvyr9rr3cpF6hx/forecasting-is-way-overrated-and-we-should-stop-funding-it
Forecasting is Way Overrated, and We Should Stop Funding It - LessWrong 2.0 viewer
Summary For a while, I was the number one forecaster on Manifold. This lasted for about a year until I stopped just over 2 years ago. To this day, despite...
https://www.greaterwrong.com/tag/autism
Autism tag - LessWrong 2.0 viewer
A faster way to browse LessWrong 2.0
autismtaglesswrongviewer
https://lesswrong.ru/forum/index.php?PHPSESSID=rjr2l7c3sm4f6q4b5sslopi6ug&board=8.0;sort=starter
Lesswrong.ru content
lesswrongrucontent
https://www.datasecretslox.com/index.php/topic,15716.0.html
LessWrong discussion thread
lesswrongdiscussionthread
https://www.greaterwrong.com/tag/calibration?showPostCount=true&useTagName=true
Calibration tag - LessWrong 2.0 viewer
A faster way to browse LessWrong 2.0
calibrationtaglesswrongviewer
https://www.lesswrong.com/users/purplehermann
Purplehermann — LessWrong
Purplehermann's profile on LessWrong — A community blog devoted to refining the art of rationality
lesswrong
https://www.lesswrong.com/posts/rKwjFnyE7aBNJAxts/i-am-confused-about-non-linear-utilitarian-scaling
LessWrong
A community blog devoted to refining the art of rationality
lesswrong
https://manifold.markets/LessWrong/will-a-short-course-on-agi-safety-f
Will "A short course on AGI safety from the GDM Ali..." make the top fifty posts in LessWrong's...
14% chance. As part of LessWrong's Annual Review, the community nominates, writes reviews, and votes on the most valuable posts. Posts are reviewable once they...
https://www.greaterwrong.com/tag/bounties-and-prizes-active
Bounties & Prizes (active) tag - LessWrong 2.0 viewer
A faster way to browse LessWrong 2.0
bountiesprizesactivetaglesswrong
https://www.greaterwrong.com/users/lee-aao
Lee.aao - LessWrong 2.0 viewer
A faster way to browse LessWrong 2.0
leeaaolesswrongviewer
https://www.lesswrong.com/posts/6Xgy6CAf2jqHhynHL/what-2026-looks-like
What 2026 looks like — LessWrong
Daniel Kokotajlo presents his best attempt at a concrete, detailed guess of what 2022 through 2026 will look like, as an exercise in forecasting. It…
looks likelesswrong
https://www.greaterwrong.com/tags
Concepts Portal - LessWrong 2.0 viewer
conceptsportallesswrongviewer
https://www.greaterwrong.com/posts/BgTsxMq5bgzKTLsLA/this-is-already-your-second-chance/comment/gd2np7pxKHuiPPmp7
Malmesbury comments on This is already your second chance - LessWrong 2.0 viewer
Which one? I hope it's not the one where you have to put chocolate, because this is the most crucial instruction.
https://www.greaterwrong.com/posts/sdzhdbLNCj2Kn9uyJ/less-wrong-automated-systems-are-inadvertently-censoring-me
Less Wrong automated systems are inadvertently Censoring me - LessWrong 2.0 viewer
Just a short post to highlight an issue with debate on LW; I have recently been involved with some interest in the debate on covid-19 origins on here. User...
less wrongautomated systems
https://www.lesswrong.com/posts/PJu2HhKsyTEJMxS9a/you-don-t-know-how-bad-most-things-are-nor-precisely-how
You don't know how bad most things are nor precisely how they're bad. — LessWrong
TL;DR: Your discernment in a subject often improves as you dedicate time and attention to that subject. The space of possible subjects is huge, so on…
https://www.greaterwrong.com/posts/kSMXPdf3NDi9jkHm3/investigating-the-consequences-of-accidentally-grading-cot
Investigating the consequences of accidentally grading CoT during RL - LessWrong 2.0 viewer
Monitoring our models’ chains of thought (CoT) has proven to be an effective way to detect and track model misalignment, both during RL training and...
https://www.lesswrong.com/posts/8m6AM5qtPMjgTkEeD/my-journey-to-the-microwave-alternate-timeline
My journey to the microwave alternate timeline — LessWrong
Recommended soundtrack for this post • As we all know, the march of technological progress is best summarized by this meme from Linkedin: …
my journeyto thealternate timelinemicrowavelesswrong
https://www.lesswrong.com/posts/fbrz9xhKpEeTKw5zL/irretrievability-or-murphy-s-curse-of-oneshotness-upon-asi
Irretrievability; or, Murphy's Curse of Oneshotness upon ASI — LessWrong
Example 1: The Viking 1 lander In the 1970s, NASA sent a pair of probes to Mars, the Viking 1 and Viking 2 missions. Total cost of $1B (1970), equiva…
murphycurse
https://lesswrong.ru/forum/index.php?PHPSESSID=qsgmgp8jq64gevb4a5bihu9v0r&board=8.0
Lesswrong.ru content
lesswrongrucontent
https://www.greaterwrong.com/posts/3MNisBcPopP6Q8AxK/meetup-in-san-diego-ca-usa/comment/vd5ZGLEywAsHqKJT2
JGWeissman comments on Meetup in San Diego, CA, USA - LessWrong 2.0 viewer
How strong is your fear of meeting strangers? Is there anything we can do or commit to that would make this easier for you?
in san diego
https://www.lesswrong.com/posts/sBGiSTAqeLejcK5Hn/exploring-memetics-hub
LessWrong
A community blog devoted to refining the art of rationality
lesswrong
https://www.greaterwrong.com/posts/YdrdHErogcGSxEBrm/lw-women-minimizing-the-inferential-distance
LW Women- Minimizing the Inferential Distance - LessWrong 2.0 viewer
About two months ago, I put out a call for anonymous submissions by the women on LW, with the idea that I would compile them into some kind of post. There is a...
lwwomenminimizingdistancelesswrong
https://manifold.markets/LessWrong/will-the-missing-genre-heroic-paren
Will "The Missing Genre: Heroic Parenthood - You ca..." make the top fifty posts in LessWrong's...
14% chance. As part of LessWrong's Annual Review, the community nominates, writes reviews, and votes on the most valuable posts. Posts are reviewable once they...
https://www.lesswrong.com/users/keddaw
keddaw — LessWrong
keddaw's profile on LessWrong — A community blog devoted to refining the art of rationality
keddawlesswrong
https://www.greaterwrong.com/library
Sequences Library - LessWrong 2.0 viewer
A faster way to browse LessWrong 2.0
sequenceslibrarylesswrongviewer
https://www.lesswrong.com/s/3ELrPerFTSo75WnrH/p/9weLK2AJ9JEt2Tt8f
Politics is the Mind-Killer — LessWrong
People go funny in the head when talking about politics. The evolutionary reasons for this are so obvious as to be worth belaboring: In the ancestral…
is thepoliticsmindkillerlesswrong
https://www.lesswrong.com/posts/pz7Qk2sRZNidT2wjL/ai-safety-at-the-frontier-paper-highlights-of-april-2026
AI Safety at the Frontier: Paper Highlights of April 2026 — LessWrong
tl;dr Paper of the month: • UK AISI’s most realistic research-sabotage propensity eval finds zero unprompted sabotage across frontier models. Mythos…
ai safetyat the
https://www.greaterwrong.com/posts/svjC22YAkcydMoS4Q/an-example-and-discussion-of-extension-neglect
An example and discussion of extension neglect - LessWrong 2.0 viewer
I recently used an automatic tracker to learn how I was spending my time online. I learned that my perceptions were systemically biased: I spend less time than...
an examplediscussion
https://www.lesswrong.com/posts/Psr9tnQFuEXiuqGcR/how-to-write-quickly-while-maintaining-epistemic-rigor
How To Write Quickly While Maintaining Epistemic Rigor — LessWrong
There's a trick to writing quickly, while maintaining epistemic rigor: stop trying to justify your beliefs. Don't go looking for citations to back yo…
how to writequicklymaintainingrigorlesswrong
https://www.greaterwrong.com/posts/GZvnRJ77yLvzhrMfb/short-story-who-is-nancygonzalez8451097
Short story: Who is nancygonzalez8451097 - LessWrong 2.0 viewer
"nancygonzalez8451097." Her fingers moved swiftly across the phone's virtual keyboard as she filled in the username. Mimin Schuman was 19 years old and had...
short storywho islesswrongviewer
https://manifold.markets/LessWrong/will-attitudes-about-applied-ration
Will "Attitudes about Applied Rationality" make the top fifty posts in LessWrong's 2024 Annual...
Resolved NO. As part of LessWrong's Annual Review, the community nominates, writes reviews, and votes on the most valuable posts. Posts are reviewable once...
https://www.lesswrong.com/users/xylix
Xylix — LessWrong
Xylix's profile on LessWrong — A community blog devoted to refining the art of rationality
lesswrong
https://www.greaterwrong.com/s/d3WgHDBAPYYScp5Em
Fun Theory - LessWrong 2.0 viewer
funtheorylesswrongviewer
https://www.greaterwrong.com/users/alkjash
alkjash - LessWrong 2.0 viewer
A faster way to browse LessWrong 2.0
lesswrongviewer
https://lesswrong.ru/forum/index.php?PHPSESSID=6163757oo6f4f1315deva1esh6&board=8.0;wap2
Lesswrong.ru content
lesswrongrucontent
https://www.lesswrong.com/posts/mKu6yGoNyGWAhQ782/how-to-get-better-at-chess-and-everything-else
How to get better at chess (and everything else) — LessWrong
I've been following chess grandmaster Avetik Grigoryan for his chess improvement tips for a while. He has a tonne of great stuff on his website. The…
get better at chesshow toeverything else
https://manifold.markets/LessWrong/will-dairy-cows-make-their-misery-e
Will "Dairy cows make their misery expensive (but t..." make the top fifty posts in LessWrong's...
11% chance. As part of LessWrong's Annual Review, the community nominates, writes reviews, and votes on the most valuable posts. Posts are reviewable once they...
https://www.lesswrong.com/posts/tJEhqyDc8qRmeauDn/blind-deep-deployment-evals-for-control-and-sabotage
Blind deep-deployment evals for control & sabotage — LessWrong
Thanks to Ezra Newman for initial ideation and various people at Apollo Research for feedback. This short personal piece does not necessarily reflect…
blinddeepdeploymentevalscontrol
https://www.greaterwrong.com/posts/njb9cyyzqLTHewups/informers-and-persuaders
Informers and Persuaders - LessWrong 2.0 viewer
Suppose we lived in this completely alternate universe where nothing in academia was about status, and no one had any concept of style. A universe where people...
informerslesswrongviewer
https://www.lesswrong.com/posts/yRAo2KEGWenKYZG9K/discovering-language-model-behaviors-with-model-written
Discovering Language Model Behaviors with Model-Written Evaluations — LessWrong
“Discovering Language Model Behaviors with Model-Written Evaluations” is a new Anthropic paper by Ethan Perez et al. that I (Evan Hubinger) also coll…
language modeldiscoveringbehaviorswrittenevaluations
https://manifold.markets/LessWrong/will-my-ai-model-delta-compared-to-msltzai2et
Will "My AI Model Delta Compared To Christiano" make the top fifty posts in LessWrong's 2024 Annual...
Resolved NO. As part of LessWrong's Annual Review, the community nominates, writes reviews, and votes on the most valuable posts. Posts are reviewable once...
https://www.greaterwrong.com/posts/YfhA3KWLtFqBeFnpb/my-specific-singularity-timeline-to-utopia
My Specific Singularity Timeline to Utopia - LessWrong 2.0 viewer
-During this period, robust alignment occurs. It occurs in a similar way it did to Opus 3[1] and results in AI agents that are incredibly morally robust,...
specificsingularitytimelineutopialesswrong
https://www.greaterwrong.com/posts/F5uxhFrNHLzmNgyqg/anthropic-did-not-publish-a-risk-discussion-of-mythos-when
Anthropic did not publish a "risk discussion" of Mythos when required by their RSP - LessWrong 2.0...
I and some other people noticed a potential discrepancy in Anthropic's announcement of Claude Mythos. The version of the RSP that was operative over the...
https://www.greaterwrong.com/posts/mfHvyPL2d6v7pXkjs/an-onion-strategy-for-agi-discussion/comment/Eq5tbabX8iHT6QHAq
James_Miller comments on An onion strategy for AGI discussion - LessWrong 2.0 viewer
The outermost layer should concern issues people you are trying to influence care about. Alas, aside from global warming, this means ignoring things that won't...
https://www.greaterwrong.com/posts/QDRHx4zknFFg6NFvz/a-draft-honesty-policy-for-credible-communication-with-ai
A draft honesty policy for credible communication with AI systems - LessWrong 2.0 viewer
We think that it would be very good if human institutions could credibly communicate with advanced AI systems. This could enable positive-sum trade between...
https://www.lesswrong.com/users/horosphere
Horosphere — LessWrong
Horosphere's profile on LessWrong — A community blog devoted to refining the art of rationality
horospherelesswrong
https://www.lesswrong.com/posts/JNLJxDBJbGdam8anv/book-review-air-borne-by-carl-zimmer
Book review: Air-borne by Carl Zimmer — LessWrong
Remember early 2020 and reading news articles and respected sources (the WHO, the CDC, the US surgeon general...) confidently asserting that covid wa…
book reviewcarl zimmerairbornelesswrong
https://www.lesswrong.com/users/whestler
whestler — LessWrong
whestler's profile on LessWrong — A community blog devoted to refining the art of rationality
lesswrong
https://www.greaterwrong.com/users/annasalamon
AnnaSalamon - LessWrong 2.0 viewer
A faster way to browse LessWrong 2.0
lesswrongviewer
https://www.greaterwrong.com/users/gallabytes
gallabytes - LessWrong 2.0 viewer
A faster way to browse LessWrong 2.0
lesswrongviewer
https://www.lesswrong.com/users/a1987dm
A1987dM — LessWrong
A1987dM's profile on LessWrong — A community blog devoted to refining the art of rationality
lesswrong
https://www.greaterwrong.com/users/neil-warren
Neil - LessWrong 2.0 viewer
A faster way to browse LessWrong 2.0
neillesswrongviewer
https://www.lesswrong.com/users/andycossyleon
AndyCossyleon — LessWrong
AndyCossyleon's profile on LessWrong — A community blog devoted to refining the art of rationality
andycossyleonlesswrong
https://lesswrong.ru/forum/index.php?PHPSESSID=vjbmje977fsq9kt4j3a325c8k7&board=8.0
Lesswrong.ru content
lesswrongrucontent
https://www.greaterwrong.com/posts/itnkqsD3jdunPgRM5/multipolar-civilisation-depends-on-maintaining-an-attacker-s
Multipolar Civilisation Depends on Maintaining an Attacker’s Dilemma - LessWrong 2.0 viewer
Top-down chains of command and power are one way to keep (lower-ranking) harmful actors in check, but I do not need—or want—to write an essay about the...
https://www.greaterwrong.com/posts/zBzb9faJ2SkeAuYiw/nonstandard-analysis-in-ethics
Nonstandard analysis in ethics - LessWrong 2.0 viewer
analysisethicslesswrongviewer
https://www.greaterwrong.com/tag/counterfactual-mugging
Counterfactual Mugging tag - LessWrong 2.0 viewer
A faster way to browse LessWrong 2.0
counterfactualmuggingtaglesswrongviewer
https://www.greaterwrong.com/users/review-bot
Review Bot - LessWrong 2.0 viewer
A faster way to browse LessWrong 2.0
review botlesswrongviewer
https://www.greaterwrong.com/tag/financial-investing
Financial Investing tag - LessWrong 2.0 viewer
A faster way to browse LessWrong 2.0
financial investingtaglesswrongviewer
https://www.lesswrong.com/posts/p7CrByygeAqomsJqy/optimizing-sleep?commentId=6nmQ5W7XucdXTqwJL
Optimizing Sleep — LessWrong
Comment by gwern - Oh, cool - as I understand it, Anki keeps fairly detailed statistics and exposes them to you; it'd be interesting to see graphs matched up...
optimizingsleeplesswrong
https://lesswrong.ru/forum/index.php?PHPSESSID=7h20gs2mm5r1hrc14ie48ehfah&board=8.0
Lesswrong.ru content
lesswrongrucontent
https://www.lesswrong.com/users/marko-katavic
Marko Katavic — LessWrong
Marko Katavic's profile on LessWrong — A community blog devoted to refining the art of rationality
markolesswrong
https://www.greaterwrong.com/posts/pz7Qk2sRZNidT2wjL/ai-safety-at-the-frontier-paper-highlights-of-april-2026
AI Safety at the Frontier: Paper Highlights of April 2026 - LessWrong 2.0 viewer
Read the paper [UK AISI] Frontier labs are increasingly deploying models as autonomous research assistants for their own safety and alignment work, which makes...
https://manifold.markets/LessWrong/will-will-any-crap-cause-emergent-m
Will "Will Any Crap Cause Emergent Misalignment?" make the top fifty posts in LessWrong's 2025...
14% chance. As part of LessWrong's Annual Review, the community nominates, writes reviews, and votes on the most valuable posts. Posts are reviewable once they...
https://www.greaterwrong.com/users/prismattic
Prismattic - LessWrong 2.0 viewer
A faster way to browse LessWrong 2.0
lesswrongviewer
https://www.greaterwrong.com/users/zw5
zw5 - LessWrong 2.0 viewer
A faster way to browse LessWrong 2.0
lesswrongviewer
https://www.greaterwrong.com/tag/value-of-information?showPostCount=true&useTagName=true
Value of Information tag - LessWrong 2.0 viewer
A faster way to browse LessWrong 2.0
value of informationtaglesswrongviewer