https://www.lesswrong.com/
LessWrong
A community blog devoted to refining the art of rationality
https://www.lesswrong.com/posts/TjaeCWvLZtEDAS5Ex/towards-developmental-interpretability
Towards Developmental Interpretability — LessWrong
Developmental interpretability is a research agenda that has grown out of a meeting of the Singular Learning Theory (SLT) and AI alignment communitie…
https://www.lesswrong.com/posts/duvzdffTzL3dWJcxn/believing-in
Believing In — LessWrong
“In America, we believe in driving on the right hand side of the road.” …
https://www.lesswrong.com/posts/LEESyXYFuW7R3Q9G5/facing-the-intelligence-explosion-discussion-page
Facing the Intelligence Explosion discussion page — LessWrong
I've created a new website for my ebook Facing the Intelligence Explosion: …
https://www.lesswrong.com/posts/9kQFure4hdDmRBNdH/how-it-feels-to-have-your-mind-hacked-by-an-ai
How it feels to have your mind hacked by an AI — LessWrong
Last week, while talking to an LLM (a large language model, which is the main talk of the town now) for several days, I went through an emotional rol…
https://www.lesswrong.com/posts/vpNG99GhbBoLov9og/claude-4-5-opus-soul-document
Claude 4.5 Opus' Soul Document — LessWrong
Update 2025-12-02: Amanda Askell has kindly confirmed that the document was used in supervised learning and will share the full version and more deta…
https://www.lesswrong.com/posts/8KkiLeZRuuxbyjr8A/does-an-ai-society-need-an-immune-system-accepting
Does an AI Society Need an Immune System? Accepting Yampolskiy's Impossibility Results — LessWrong
This is Part 1 of a 4-part series,
https://www.lesswrong.com/posts/YDF7XhMThhNfHfim9/ai-safety-needs-great-engineers
AI Safety Needs Great Engineers — LessWrong
Top line: If you think you could write a substantial pull request for a major machine learning library, then major AI safety labs want to interview y…
https://www.lesswrong.com/posts/K9ZaZXDnL3SEmYZqB/ends-don-t-justify-means-among-humans
Ends Don't Justify Means (Among Humans) — LessWrong
https://www.lesswrong.com/r/discussion/tag/rationalityreadinggroup
LessWrong
A community blog devoted to refining the art of rationality
https://www.greaterwrong.com/posts/ximou2kyQorm6MPjX/rest-days-vs-recovery-days
Rest Days vs Recovery Days - LessWrong 2.0 viewer
That comment I made generated more positive feedback than usual (in that people seemed to find it helpful to read and found themselves thinking about it months...
https://manifold.markets/LessWrong/will-metr-measuring-ai-ability-to-c
Will "METR: Measuring AI Ability to Complete Long Tasks" make the top fifty posts in LessWrong's...
19% chance. As part of LessWrong's Annual Review, the community nominates, writes reviews, and votes on the most valuable posts. Posts are reviewable once they...
https://www.lesswrong.com/posts/p7CrByygeAqomsJqy/optimizing-sleep?commentId=6nmQ5W7XucdXTqwJL
Optimizing Sleep — LessWrong
Comment by gwern - Oh, cool - as I understand it, Anki keeps fairly detailed statistics and exposes them to you; it'd be interesting to see graphs matched up...
https://www.greaterwrong.com/recentcomments
Recent comments - LessWrong 2.0 viewer
A faster way to browse LessWrong 2.0
https://www.lesswrong.com/users/caperu_wesperizzon
Caperu_Wesperizzon — LessWrong
Caperu_Wesperizzon's profile on LessWrong — A community blog devoted to refining the art of rationality
https://www.greaterwrong.com/posts/SkXLrDXyHeekqgbFg/shock-level-5-big-worlds-and-modal-realism
Shock Level 5: Big Worlds and Modal Realism - LessWrong 2.0 viewer
In recent times, science and philosophy have uncovered evidence that there is something very seriously weird about the universe and our place in it. We used to...
https://www.greaterwrong.com/posts/vHSrtmr3EBohcw6t8/norms-of-membership-for-voluntary-groups/comment/FRkDubbjqEDQ4PbWz
Connor_Flexman comments on Norms of Membership for Voluntary Groups - LessWrong 2.0 viewer
This is a good point. I was wondering why civic/public is much more functional in meatspace than cyber, whereas a lot of internet communities that seem good...
https://www.greaterwrong.com/posts/CRsYy3xtbMrLjoXZT/evidence-for-the-orthogonality-thesis/comment/ZYwtHaPdfKCDCnXzc
TheAncientGeek comments on Evidence for the orthogonality thesis - LessWrong 2.0 viewer
Wei Dai's comment is full of wisdom. In particular: The Orthogonality Thesis (or its denial) must assume that certain types of AI, e.g., those based on...
https://www.lesswrong.com/users/engineerofscience
EngineerofScience — LessWrong
EngineerofScience's profile on LessWrong — A community blog devoted to refining the art of rationality
https://www.greaterwrong.com/posts/ff5YRpt5WY3amtHBf/is-research-into-recursive-self-improvement-becoming-a
Is research into recursive self-improvement becoming a safety hazard? - LessWrong 2.0 viewer
One of the earliest speculations about machine intelligence was that, because it would be made of much simpler components than biological intelligence, like...
https://manifold.markets/LessWrong/will-anomalous-tokens-reveal-the-or
Will "Anomalous tokens reveal the original identiti..." make the top fifty posts in LessWrong's...
Resolved NO. As part of LessWrong's Annual Review, the community nominates, writes reviews, and votes on the most valuable posts. Posts are reviewable once...
https://www.lesswrong.com/posts/toZXD7QgKp9vQbvRF/conviction-skepticism
LessWrong
A community blog devoted to refining the art of rationality
https://www.greaterwrong.com/posts/4gXvnTFy5WCTtYMAA/the-pope-offers-wisdom
The Pope Offers Wisdom - LessWrong 2.0 viewer
The Pope is a remarkably wise and helpful man. He offered us some wisdom. Yes, he is generally playing on easy mode by saying straightforwardly true things,...
https://www.greaterwrong.com/posts/Z92cmpcDr5aDGg68c/fundamentals-of-formalisation-level-3-set-theoretic
Fundamentals of Formalisation Level 3: Set Theoretic Relations and Enumerability - LessWrong 2.0...
Followup to Fundamentals of Formalisation level 2: Basic Set Theory. The big ideas: To move to the next level you need to be able to: Why this is important:...
https://www.greaterwrong.com/tag/fiction-topic?showPostCount=true&useTagName=true
Fiction (Topic) tag - LessWrong 2.0 viewer
A faster way to browse LessWrong 2.0
https://www.lesswrong.com/users/ezra
Ezra — LessWrong
Ezra's profile on LessWrong — A community blog devoted to refining the art of rationality
https://manifold.markets/GarrettBaker/will-eliezer-think-there-was-a-sign
Will Eliezer think there was a significant portion of the LessWrong post *My Objections...* which...
Resolved CANCEL. Here's the post: https://www.lesswrong.com/posts/wAczufCpMdaamF9fy/my-objections-to-we-re-all-gonna-die-with-eliezer-yudkowsky# If Eliezer...
https://www.greaterwrong.com/users/hide
Hide - LessWrong 2.0 viewer
A faster way to browse LessWrong 2.0
https://www.lesswrong.com/posts/BcYBfG8KomcpcxkEg/crisis-of-faith
Crisis of Faith — LessWrong
https://www.greaterwrong.com/users/ryancarey
RyanCarey - LessWrong 2.0 viewer
A faster way to browse LessWrong 2.0
https://manifold.markets/LessWrong/will-behavioral-redteaming-is-unlik
Will "Behavioral red-teaming is unlikely to produce..." make the top fifty posts in LessWrong's...
Resolved NO. As part of LessWrong's Annual Review, the community nominates, writes reviews, and votes on the most valuable posts. Posts are reviewable once...
https://www.greaterwrong.com/index?view=alignment-forum&sort=active
Alignment-forum posts - LessWrong 2.0 viewer
A faster way to browse LessWrong 2.0
https://www.lesswrong.com/posts/fbrz9xhKpEeTKw5zL/irretrievability-or-murphy-s-curse-of-oneshotness-upon-asi
Irretrievability; or, Murphy's Curse of Oneshotness upon ASI — LessWrong
Example 1: The Viking 1 lander In the 1970s, NASA sent a pair of probes to Mars, the Viking 1 and Viking 2 missions. Total cost of $1B (1970), equiva…
https://www.greaterwrong.com/posts/ACXTJqHBDxvNivKKK/introducing-the-screwtape-ladders
Introducing The Screwtape Ladders - LessWrong 2.0 viewer
There's been some cool moments. My oldest visible post, Write A Thousand Roads to Rome, got cited in a discussion with Eliezer Yudkowsky once. I keep seeing...
https://www.lesswrong.com/posts/8whGos5JCdBzDbZhH/framings-of-deceptive-alignment
Framings of Deceptive Alignment — LessWrong
In this post I want to lay out some framings and thoughts about deception in misaligned AI systems. …
https://www.greaterwrong.com/tag/mild-optimization?showPostCount=true&useTagName=true
Mild optimization tag - LessWrong 2.0 viewer
A faster way to browse LessWrong 2.0
https://lesswrong.ru/forum/index.php?PHPSESSID=e8frpsl44r9puemt4afiltfn4i&board=8.0
Lesswrong.ru content
https://www.greaterwrong.com/posts/ESnzpoCJrAfwAzpMB/hammertime-day-3-taps/comment/N2Hc2kpeJWd4FyS52
Will Towler comments on Hammertime Day 3: TAPs - LessWrong 2.0 viewer
Sapience Spell: I wanted to use my tattoo because it has deep significance to me, but it's on my back. So, when I become tangibly aware of my (long) hair on my...
https://www.greaterwrong.com/users/euan-ong
Euan Ong - LessWrong 2.0 viewer
A faster way to browse LessWrong 2.0
https://www.greaterwrong.com/posts/YgedrNsdXNajQ7oCT/punctuation-and-quotation-conventions/comment/uWAnPev4GbTR5LakA
Roman Malov comments on Punctuation & Quotation Conventions - LessWrong 2.0 viewer
A "tomato" is a red, savory fruit. If it were "the word 'tomato' refers to a red, savory fruit", then it would be the perfect case of map/territory use of...
https://www.greaterwrong.com/posts/NdaCDt8tWABxB6op9/are-we-leaving-literature-to-the-psychotic/answer/x2H2h3KXYhaENxR2p
RHollerith answers Are We Leaving Literature To The Psychotic? - LessWrong 2.0 viewer
I'm basically not worried about this. Google Search has proven pretty OK at preventing spam and content farms from showing up in search results at rates that...
https://www.greaterwrong.com/users/rguerreschi
rguerreschi - LessWrong 2.0 viewer
A faster way to browse LessWrong 2.0
https://www.lesswrong.com/users/horosphere
Horosphere — LessWrong
Horosphere's profile on LessWrong — A community blog devoted to refining the art of rationality
https://www.greaterwrong.com/posts/dtrmr6Fn5AyP5GosQ/rating-my-ai-predictions
Rating my AI Predictions - LessWrong 2.0 viewer
9 months ago I predicted trends I expected to see in AI over the course of 2023. Here's how I did (bold indicates they happened, italics indicates they didn't,...
https://www.greaterwrong.com/posts/7c5ZQSrBGpT5CrDWj/the-three-boxes-a-simple-model-for-spreading-ideas-1
The Three Boxes: A Simple Model for Spreading Ideas - LessWrong 2.0 viewer
This is cross-posted from my blog. We need more people on board for life extension in order to hit longevity escape velocity in our lifetimes. But most people...
https://www.lesswrong.com/users/alex3
Alex3 — LessWrong
Alex3's profile on LessWrong — A community blog devoted to refining the art of rationality
https://www.greaterwrong.com/users/alex-amadori
Alex Amadori - LessWrong 2.0 viewer
A faster way to browse LessWrong 2.0
https://www.greaterwrong.com/users/jacob-dunefsky
Jacob Dunefsky - LessWrong 2.0 viewer
A faster way to browse LessWrong 2.0
https://www.greaterwrong.com/posts/8xKhCbNrdP4gaA8c3/sections-3-and-4-credibility-peaceful-bargaining-mechanisms
Sections 3 & 4: Credibility, Peaceful Bargaining Mechanisms - LessWrong 2.0 viewer
Credibility is a central issue in strategic interaction. By credibility, we refer to the issue of whether one agent has reason to believe that another will do...
https://www.lesswrong.com/posts/4CrumZwbPvc6mJBA3/purging-corrupted-capabilities-across-language-models-1
Backdoors have universal representations across large language models — LessWrong
by Narmeen Oozeer, Dhruv Nathawani, Nirmalendu Prakash, Amirali Abdullah • …
https://www.greaterwrong.com/?sort=active
LessWrong 2.0 viewer
A faster way to browse LessWrong 2.0
https://www.greaterwrong.com/tag/human-bodies
Human Bodies tag - LessWrong 2.0 viewer
A faster way to browse LessWrong 2.0
https://www.greaterwrong.com/about
About - LessWrong 2.0 viewer
https://www.greaterwrong.com/users/ata
ata - LessWrong 2.0 viewer
A faster way to browse LessWrong 2.0
https://www.greaterwrong.com/posts/5bd75cc58225bf067037540c/infinite-ethics-comparisons
Infinite ethics comparisons - LessWrong 2.0 viewer
It's very difficult to compare utilities across worlds with infinite populations. For instance, it seems clear that world w1 is better than w2, if the number...
https://www.greaterwrong.com/posts/xJr3Byvp4TeRp4csv/guidelines-for-upvoting-and-downvoting
Guidelines for Upvoting and Downvoting? - LessWrong 2.0 viewer
I've only recently joined the LessWrong community, and I've been having a blast reading through posts and making the occasional comment. So far, I've received...
https://manifold.markets/LessWrong/will-precedents-for-the-unprecedent
Will "Precedents for the Unprecedented: Historical ..." make the top fifty posts in LessWrong's...
15% chance. As part of LessWrong's Annual Review, the community nominates, writes reviews, and votes on the most valuable posts. Posts are reviewable once they...
https://www.greaterwrong.com/posts/yaJsCQokiyeLFHhgy/incorporating-justice-theory-into-decision-theory
Incorporating Justice Theory into Decision Theory - LessWrong 2.0 viewer
When someone wrongs us, how should we respond? We want to discourage this behavior, so that others find it in their interest to treat us well. And yet the goal...
https://www.greaterwrong.com/posts/yenr6Zp83PHd6Beab/which-singularity-schools-plus-the-no-singularity-school-was
Which singularity schools plus the no singularity school was right? - LessWrong 2.0 viewer
TL;DR of this post: Accelerating change and Event Horizon were the most accurate schools, with Intelligence Explosion proving to be interestingly wrong...
https://www.lesswrong.com/posts/YbCc3NRrr5avvWSHT/who-wants-to-start-an-important-startup?commentId=qPQGQd6E3hZ5DsLz4
Who Wants To Start An Important Startup? — LessWrong
Comment by Kindly - The class I'm TAing has about 60 students in it; I see 40 or so regularly because one of the recitations is early in the morning and fewer...
https://www.greaterwrong.com/posts/CbSEZSpjdpnvBcEvc/
I found 800 orthogonal "write code" steering vectors - LessWrong 2.0 viewer
A few weeks ago, I stumbled across a very weird fact: it is possible to find multiple steering vectors in a language model that activate very similar behaviors...
https://www.greaterwrong.com/posts/tmuFmHuyb4eWmPXz8/rant-on-problem-factorization-for-alignment
Rant on Problem Factorization for Alignment - LessWrong 2.0 viewer
This post is the second in what is likely to become a series of uncharitable rants about alignment proposals (previously: Godzilla Strategies). In general,...
https://www.lesswrong.com/users/beren-1
beren — LessWrong
beren's profile on LessWrong — A community blog devoted to refining the art of rationality
https://manifold.markets/LessWrong/will-ai-companies-arent-really-usin
Will "AI companies aren't really using external eva..." make the top fifty posts in LessWrong's...
Resolved NO. As part of LessWrong's Annual Review, the community nominates, writes reviews, and votes on the most valuable posts. Posts are reviewable once...
https://www.lesswrong.com/w/conflict-vs-mistake
Conflict vs Mistake — LessWrong
Conflict vs Mistake is a framework for analyzing disagreements about policy. Mistake theorists think problems in society are caused by people being bad at...
https://www.greaterwrong.com/users/walterl
WalterL - LessWrong 2.0 viewer
A faster way to browse LessWrong 2.0
https://www.lesswrong.com/posts/ezkPRdJ6PNMbK3tp5/unsupervised-elicitation-of-language-models
Unsupervised Elicitation of Language Models — LessWrong
A key problem in alignment research is how to align superhuman models whose behavior humans cannot reliably supervise. If we use today’s standard pos…
https://www.lesswrong.com/users/jessica-heard
Jessica Heard — LessWrong
Jessica Heard's profile on LessWrong — A community blog devoted to refining the art of rationality
https://www.greaterwrong.com/posts/oq5CtbsCncctPWkTn/best-of-n-jailbreaking/comment/8r32eaEEo4cSGkkur
anaguma comments on Best-of-N Jailbreaking - LessWrong 2.0 viewer
Nice work! It's surprising that something so simple works so well. Have you tried applying this to more recent models like o1 or QwQ?
https://www.greaterwrong.com/posts/vwqLfDfsHmiavFAGP/the-library-of-scott-alexandria
The Library of Scott Alexandria - LessWrong 2.0 viewer
I've put together a list of what I think are the best Yvain (Scott Alexander) posts for new readers, drawing from SlateStarCodex, LessWrong, and Scott's...
https://www.lesswrong.com/posts/ZGgneqEJXJLJxrBZD/what-s-the-this-ai-is-of-moral-concern-fire-alarm
What's the "This AI is of moral concern." fire alarm? — LessWrong
Given the recent noise on this issue around LaMDA, I thought it might be a good idea to have some discussion around this point. I'm curious about wha…
https://manifold.markets/LessWrong/will-o1-a-technical-primer-make-the
Will "o1: A Technical Primer" make the top fifty posts in LessWrong's 2024 Annual Review? | Manifold
Resolved NO. As part of LessWrong's Annual Review, the community nominates, writes reviews, and votes on the most valuable posts. Posts are reviewable once...
https://www.greaterwrong.com/tag/academic-papers
Academic Papers tag - LessWrong 2.0 viewer
A faster way to browse LessWrong 2.0
https://www.lesswrong.com/posts/rMzb2dFwfAx6QH8ZH/sculpted-interaction-a-design-first-approach-to-ai-alignment
Sculpted Interaction: a Design-First Approach to AI Alignment — LessWrong
Acknowledgments: Thanks to Aditya Adiga for leading this project and trusting his ideas to me. Thanks to Matt Farr for comments on this draft. Thanks…
https://www.lesswrong.com/users/a1987dm
A1987dM — LessWrong
A1987dM's profile on LessWrong — A community blog devoted to refining the art of rationality
https://www.greaterwrong.com/posts/fnkbdwckdfHS2H22Q/steelmanning-divination
Steelmanning Divination - LessWrong 2.0 viewer
[This post was primarily written in 2015, after I gave a related talk, and other bits in 2018; I decided to finish writing it now because of a recent SSC...