lesswrong - Robuta Search

https://www.lesswrong.com/
LessWrong
A community blog devoted to refining the art of rationality

https://www.lesswrong.com/posts/TjaeCWvLZtEDAS5Ex/towards-developmental-interpretability
Towards Developmental Interpretability — LessWrong
Developmental interpretability is a research agenda that has grown out of a meeting of the Singular Learning Theory (SLT) and AI alignment communitie…

https://www.lesswrong.com/posts/duvzdffTzL3dWJcxn/believing-in
Believing In — LessWrong
“In America, we believe in driving on the right hand side of the road.” …

https://www.lesswrong.com/posts/LEESyXYFuW7R3Q9G5/facing-the-intelligence-explosion-discussion-page
Facing the Intelligence Explosion discussion page — LessWrong
I've created a new website for my ebook Facing the Intelligence Explosion: …

https://www.lesswrong.com/posts/9kQFure4hdDmRBNdH/how-it-feels-to-have-your-mind-hacked-by-an-ai
How it feels to have your mind hacked by an AI — LessWrong
Last week, while talking to an LLM (a large language model, which is the main talk of the town now) for several days, I went through an emotional rol…

https://www.lesswrong.com/posts/vpNG99GhbBoLov9og/claude-4-5-opus-soul-document
Claude 4.5 Opus' Soul Document — LessWrong
Update 2025-12-02: Amanda Askell has kindly confirmed that the document was used in supervised learning and will share the full version and more deta…

https://www.lesswrong.com/posts/8KkiLeZRuuxbyjr8A/does-an-ai-society-need-an-immune-system-accepting
Does an AI Society Need an Immune System? Accepting Yampolskiy's Impossibility Results — LessWrong
This is Part 1 of a 4-part series,

https://www.lesswrong.com/posts/YDF7XhMThhNfHfim9/ai-safety-needs-great-engineers
AI Safety Needs Great Engineers — LessWrong
Top line: If you think you could write a substantial pull request for a major machine learning library, then major AI safety labs want to interview y…

https://www.lesswrong.com/posts/K9ZaZXDnL3SEmYZqB/ends-don-t-justify-means-among-humans
Ends Don't Justify Means (Among Humans) — LessWrong

https://www.lesswrong.com/r/discussion/tag/rationalityreadinggroup
LessWrong
A community blog devoted to refining the art of rationality

https://www.greaterwrong.com/posts/ximou2kyQorm6MPjX/rest-days-vs-recovery-days
Rest Days vs Recovery Days - LessWrong 2.0 viewer
That comment I made generated more positive feedback than usual (in that people seemed to find it helpful to read and found themselves thinking about it months...

https://manifold.markets/LessWrong/will-metr-measuring-ai-ability-to-c
Will "METR: Measuring AI Ability to Complete Long Tasks" make the top fifty posts in LessWrong's...
19% chance. As part of LessWrong's Annual Review, the community nominates, writes reviews, and votes on the most valuable posts. Posts are reviewable once they...

https://www.lesswrong.com/posts/p7CrByygeAqomsJqy/optimizing-sleep?commentId=6nmQ5W7XucdXTqwJL
Optimizing Sleep — LessWrong
Comment by gwern - Oh, cool - as I understand it, Anki keeps fairly detailed statistics and exposes them to you; it'd be interesting to see graphs matched up...

https://www.greaterwrong.com/recentcomments
Recent comments - LessWrong 2.0 viewer
A faster way to browse LessWrong 2.0

https://www.lesswrong.com/users/caperu_wesperizzon
Caperu_Wesperizzon — LessWrong
Caperu_Wesperizzon's profile on LessWrong — A community blog devoted to refining the art of rationality

https://www.greaterwrong.com/posts/SkXLrDXyHeekqgbFg/shock-level-5-big-worlds-and-modal-realism
Shock Level 5: Big Worlds and Modal Realism - LessWrong 2.0 viewer
In recent times, science and philosophy have uncovered evidence that there is something very seriously weird about the universe and our place in it. We used to...

https://www.greaterwrong.com/posts/vHSrtmr3EBohcw6t8/norms-of-membership-for-voluntary-groups/comment/FRkDubbjqEDQ4PbWz
Connor_Flexman comments on Norms of Membership for Voluntary Groups - LessWrong 2.0 viewer
This is a good point. I was wondering why civic/public is much more functional in meatspace than cyber, whereas a lot of internet communities that seem good...

https://www.greaterwrong.com/posts/CRsYy3xtbMrLjoXZT/evidence-for-the-orthogonality-thesis/comment/ZYwtHaPdfKCDCnXzc
TheAncientGeek comments on Evidence for the orthogonality thesis - LessWrong 2.0 viewer
Wei Dai's comment is full of wisdom. In particular: The Orthogonality Thesis (or it's denial) must assume that certain types of AI, e.g., those based on...

https://www.lesswrong.com/users/engineerofscience
EngineerofScience — LessWrong
EngineerofScience's profile on LessWrong — A community blog devoted to refining the art of rationality

https://www.greaterwrong.com/posts/ff5YRpt5WY3amtHBf/is-research-into-recursive-self-improvement-becoming-a
Is research into recursive self-improvement becoming a safety hazard? - LessWrong 2.0 viewer
One of the earliest speculations about machine intelligence was that, because it would be made of much simpler components than biological intelligence, like...

https://manifold.markets/LessWrong/will-anomalous-tokens-reveal-the-or
Will "Anomalous tokens reveal the original identiti..." make the top fifty posts in LessWrong's...
Resolved NO. As part of LessWrong's Annual Review, the community nominates, writes reviews, and votes on the most valuable posts. Posts are reviewable once...

https://www.lesswrong.com/posts/toZXD7QgKp9vQbvRF/conviction-skepticism
LessWrong
A community blog devoted to refining the art of rationality

https://www.greaterwrong.com/posts/4gXvnTFy5WCTtYMAA/the-pope-offers-wisdom
The Pope Offers Wisdom - LessWrong 2.0 viewer
The Pope is a remarkably wise and helpful man. He offered us some wisdom. Yes, he is generally playing on easy mode by saying straightforwardly true things,...

https://www.greaterwrong.com/posts/Z92cmpcDr5aDGg68c/fundamentals-of-formalisation-level-3-set-theoretic
Fundamentals of Formalisation Level 3: Set Theoretic Relations and Enumerability - LessWrong 2.0...
Followup to Fundamentals of Formalisation level 2: Basic Set Theory. The big ideas: To move to the next level you need to be able to: Why this is important:...

https://www.greaterwrong.com/tag/fiction-topic?showPostCount=true&useTagName=true
Fiction (Topic) tag - LessWrong 2.0 viewer
A faster way to browse LessWrong 2.0

https://www.lesswrong.com/users/ezra
Ezra — LessWrong
Ezra's profile on LessWrong — A community blog devoted to refining the art of rationality

https://manifold.markets/GarrettBaker/will-eliezer-think-there-was-a-sign
Will Eliezer think there was a significant portion of the LessWrong post *My Objections...* which...
Resolved CANCEL. Here's the post: https://www.lesswrong.com/posts/wAczufCpMdaamF9fy/my-objections-to-we-re-all-gonna-die-with-eliezer-yudkowsky# If Eliezer...

https://www.greaterwrong.com/users/hide
Hide - LessWrong 2.0 viewer
A faster way to browse LessWrong 2.0

https://www.lesswrong.com/posts/BcYBfG8KomcpcxkEg/crisis-of-faith
Crisis of Faith — LessWrong

https://www.greaterwrong.com/users/ryancarey
RyanCarey - LessWrong 2.0 viewer
A faster way to browse LessWrong 2.0

https://manifold.markets/LessWrong/will-behavioral-redteaming-is-unlik
Will "Behavioral red-teaming is unlikely to produce..." make the top fifty posts in LessWrong's...
Resolved NO. As part of LessWrong's Annual Review, the community nominates, writes reviews, and votes on the most valuable posts. Posts are reviewable once...

https://www.greaterwrong.com/index?view=alignment-forum&sort=active
Alignment-forum posts - LessWrong 2.0 viewer
A faster way to browse LessWrong 2.0

https://www.lesswrong.com/posts/fbrz9xhKpEeTKw5zL/irretrievability-or-murphy-s-curse-of-oneshotness-upon-asi
Irretrievability; or, Murphy's Curse of Oneshotness upon ASI — LessWrong
Example 1: The Viking 1 lander In the 1970s, NASA sent a pair of probes to Mars, the Viking 1 and Viking 2 missions. Total cost of $1B (1970), equiva…

https://www.greaterwrong.com/posts/ACXTJqHBDxvNivKKK/introducing-the-screwtape-ladders
Introducing The Screwtape Ladders - LessWrong 2.0 viewer
There's been some cool moments. My oldest visible post, Write A Thousand Roads to Rome, got cited in a discussion with Eliezer Yudkowsky once. I keep seeing...

https://www.lesswrong.com/posts/8whGos5JCdBzDbZhH/framings-of-deceptive-alignment
Framings of Deceptive Alignment — LessWrong
In this post I want to lay out some framings and thoughts about deception in misaligned AI systems. …

https://www.greaterwrong.com/tag/mild-optimization?showPostCount=true&useTagName=true
Mild optimization tag - LessWrong 2.0 viewer
A faster way to browse LessWrong 2.0

https://lesswrong.ru/forum/index.php?PHPSESSID=e8frpsl44r9puemt4afiltfn4i&board=8.0
Lesswrong.ru content
Lesswrong.ru content

https://www.greaterwrong.com/posts/ESnzpoCJrAfwAzpMB/hammertime-day-3-taps/comment/N2Hc2kpeJWd4FyS52
Will Towler comments on Hammertime Day 3: TAPs - LessWrong 2.0 viewer
Sapience Spell: I wanted to use my tattoo because it has deep significance to me, but it's on my back. So, when I become tangibly aware of my (long) hair on my...

https://www.greaterwrong.com/users/euan-ong
Euan Ong - LessWrong 2.0 viewer
A faster way to browse LessWrong 2.0

https://www.greaterwrong.com/posts/YgedrNsdXNajQ7oCT/punctuation-and-quotation-conventions/comment/uWAnPev4GbTR5LakA
Roman Malov comments on Punctuation & Quotation Conventions - LessWrong 2.0 viewer
A "tomato" is a red, savory fruit. If it were "the word 'tomato' refers to a red, savory fruit", then it would be the perfect case of map/territory use of...

https://www.greaterwrong.com/posts/NdaCDt8tWABxB6op9/are-we-leaving-literature-to-the-psychotic/answer/x2H2h3KXYhaENxR2p
RHollerith answers Are We Leaving Literature To The Psychotic? - LessWrong 2.0 viewer
I'm basically not worried about this. Google Search has proven pretty OK at preventing spam and content farms from showing up in search results at rates that...

https://www.greaterwrong.com/users/rguerreschi
rguerreschi - LessWrong 2.0 viewer
A faster way to browse LessWrong 2.0

https://www.lesswrong.com/users/horosphere
Horosphere — LessWrong
Horosphere's profile on LessWrong — A community blog devoted to refining the art of rationality

https://www.greaterwrong.com/posts/dtrmr6Fn5AyP5GosQ/rating-my-ai-predictions
Rating my AI Predictions - LessWrong 2.0 viewer
9 months ago I predicted trends I expected to see in AI over the course of 2023. Here's how I did (bold indicates they happened, italics indicates they didn't,...

https://www.greaterwrong.com/posts/7c5ZQSrBGpT5CrDWj/the-three-boxes-a-simple-model-for-spreading-ideas-1
The Three Boxes: A Simple Model for Spreading Ideas - LessWrong 2.0 viewer
This is cross-posted from my blog. We need more people on board for life extension in order to hit longevity escape velocity in our lifetimes. But most people...

https://www.lesswrong.com/users/alex3
Alex3 — LessWrong
Alex3's profile on LessWrong — A community blog devoted to refining the art of rationality

https://www.greaterwrong.com/users/alex-amadori
Alex Amadori - LessWrong 2.0 viewer
A faster way to browse LessWrong 2.0

https://www.greaterwrong.com/users/jacob-dunefsky
Jacob Dunefsky - LessWrong 2.0 viewer
A faster way to browse LessWrong 2.0

https://www.greaterwrong.com/posts/8xKhCbNrdP4gaA8c3/sections-3-and-4-credibility-peaceful-bargaining-mechanisms
Sections 3 & 4: Credibility, Peaceful Bargaining Mechanisms - LessWrong 2.0 viewer
Credibility is a central issue in strategic interaction. By credibility, we refer to the issue of whether one agent has reason to believe that another will do...

https://www.lesswrong.com/posts/4CrumZwbPvc6mJBA3/purging-corrupted-capabilities-across-language-models-1
Backdoors have universal representations across large language models — LessWrong
by Narmeen Oozeer, Dhruv Nathawani, Nirmalendu Prakash, Amirali Abdullah • …

https://www.greaterwrong.com/?sort=active
LessWrong 2.0 viewer
A faster way to browse LessWrong 2.0

https://www.greaterwrong.com/tag/human-bodies
Human Bodies tag - LessWrong 2.0 viewer
A faster way to browse LessWrong 2.0

https://www.greaterwrong.com/about
About - LessWrong 2.0 viewer

https://www.greaterwrong.com/users/ata
ata - LessWrong 2.0 viewer
A faster way to browse LessWrong 2.0

https://www.greaterwrong.com/posts/5bd75cc58225bf067037540c/infinite-ethics-comparisons
Infinite ethics comparisons - LessWrong 2.0 viewer
It's very difficult to compare utilities across worlds with infinite populations. For instance, it seems clear that world w1 is better than w2, if the number...

https://www.greaterwrong.com/posts/xJr3Byvp4TeRp4csv/guidelines-for-upvoting-and-downvoting
Guidelines for Upvoting and Downvoting? - LessWrong 2.0 viewer
I've only recently joined the LessWrong community, and I've been having a blast reading through posts and making the occasional comment. So far, I've received...

https://manifold.markets/LessWrong/will-precedents-for-the-unprecedent
Will "Precedents for the Unprecedented: Historical ..." make the top fifty posts in LessWrong's...
15% chance. As part of LessWrong's Annual Review, the community nominates, writes reviews, and votes on the most valuable posts. Posts are reviewable once they...

https://www.greaterwrong.com/posts/yaJsCQokiyeLFHhgy/incorporating-justice-theory-into-decision-theory
Incorporating Justice Theory into Decision Theory - LessWrong 2.0 viewer
When someone wrongs us, how should we respond? We want to discourage this behavior, so that others find it in their interest to treat us well. And yet the goal...

https://www.greaterwrong.com/posts/yenr6Zp83PHd6Beab/which-singularity-schools-plus-the-no-singularity-school-was
Which singularity schools plus the no singularity school was right? - LessWrong 2.0 viewer
TL;DR of this post: Accelerating change and Event Horizon were the most accurate schools, with Intelligence Explosion proving to be interestingly wrong...

https://www.lesswrong.com/posts/YbCc3NRrr5avvWSHT/who-wants-to-start-an-important-startup?commentId=qPQGQd6E3hZ5DsLz4
Who Wants To Start An Important Startup? — LessWrong
Comment by Kindly - The class I'm TAing has about 60 students in it; I see 40 or so regularly because one of the recitations is early in the morning and fewer...

https://www.greaterwrong.com/posts/CbSEZSpjdpnvBcEvc/
I found 800 orthogonal "write code" steering vectors - LessWrong 2.0 viewer
A few weeks ago, I stumbled across a very weird fact: it is possible to find multiple steering vectors in a language model that activate very similar behaviors...

https://www.greaterwrong.com/posts/tmuFmHuyb4eWmPXz8/rant-on-problem-factorization-for-alignment
Rant on Problem Factorization for Alignment - LessWrong 2.0 viewer
This post is the second in what is likely to become a series of uncharitable rants about alignment proposals (previously: Godzilla Strategies). In general,...

https://www.lesswrong.com/users/beren-1
beren — LessWrong
beren's profile on LessWrong — A community blog devoted to refining the art of rationality

https://manifold.markets/LessWrong/will-ai-companies-arent-really-usin
Will "AI companies aren't really using external eva..." make the top fifty posts in LessWrong's...
Resolved NO. As part of LessWrong's Annual Review, the community nominates, writes reviews, and votes on the most valuable posts. Posts are reviewable once...

https://www.lesswrong.com/w/conflict-vs-mistake
Conflict vs Mistake — LessWrong
Conflict vs Mistake is a framework for analyzing disagreements about policy. Mistake theorists think problems in society are caused by people being bad at...

https://www.greaterwrong.com/users/walterl
WalterL - LessWrong 2.0 viewer
A faster way to browse LessWrong 2.0

https://www.lesswrong.com/posts/ezkPRdJ6PNMbK3tp5/unsupervised-elicitation-of-language-models
Unsupervised Elicitation of Language Models — LessWrong
A key problem in alignment research is how to align superhuman models whose behavior humans cannot reliably supervise. If we use today’s standard pos…

https://www.lesswrong.com/users/jessica-heard
Jessica Heard — LessWrong
Jessica Heard's profile on LessWrong — A community blog devoted to refining the art of rationality

https://www.greaterwrong.com/posts/oq5CtbsCncctPWkTn/best-of-n-jailbreaking/comment/8r32eaEEo4cSGkkur
anaguma comments on Best-of-N Jailbreaking - LessWrong 2.0 viewer
Nice work! It's surprising that something so simple works so well. Have you tried applying this to more recent models like o1 or QwQ?

https://www.greaterwrong.com/posts/vwqLfDfsHmiavFAGP/the-library-of-scott-alexandria
The Library of Scott Alexandria - LessWrong 2.0 viewer
I've put together a list of what I think are the best Yvain (Scott Alexander) posts for new readers, drawing from SlateStarCodex, LessWrong, and Scott's...

https://www.lesswrong.com/posts/ZGgneqEJXJLJxrBZD/what-s-the-this-ai-is-of-moral-concern-fire-alarm
What's the "This AI is of moral concern." fire alarm? — LessWrong
Given the recent noise on this issue around LaMDA, I thought it might be a good idea to have some discussion around this point. I'm curious about wha…

https://manifold.markets/LessWrong/will-o1-a-technical-primer-make-the
Will "o1: A Technical Primer" make the top fifty posts in LessWrong's 2024 Annual Review? | Manifold
Resolved NO. As part of LessWrong's Annual Review, the community nominates, writes reviews, and votes on the most valuable posts. Posts are reviewable once...

https://www.greaterwrong.com/tag/academic-papers
Academic Papers tag - LessWrong 2.0 viewer
A faster way to browse LessWrong 2.0

https://www.lesswrong.com/posts/rMzb2dFwfAx6QH8ZH/sculpted-interaction-a-design-first-approach-to-ai-alignment
Sculpted Interaction: a Design-First Approach to AI Alignment — LessWrong
Acknowledgments: Thanks to Aditya Adiga for leading this project and trusting his ideas to me. Thanks to Matt Farr for comments on this draft. Thanks…

https://www.lesswrong.com/users/a1987dm
A1987dM — LessWrong
A1987dM's profile on LessWrong — A community blog devoted to refining the art of rationality

https://www.greaterwrong.com/posts/fnkbdwckdfHS2H22Q/steelmanning-divination
Steelmanning Divination - LessWrong 2.0 viewer
[This post was primarily written in 2015, after I gave a related talk, and other bits in 2018; I decided to finish writing it now because of a recent SSC...