Robuta

https://www.lesswrong.com/
  LessWrong
  A community blog devoted to refining the art of rationality
https://www.lesswrong.com/posts/TjaeCWvLZtEDAS5Ex/towards-developmental-interpretability
  Towards Developmental Interpretability — LessWrong
  Developmental interpretability is a research agenda that has grown out of a meeting of the Singular Learning Theory (SLT) and AI alignment communitie…
https://www.lesswrong.com/posts/duvzdffTzL3dWJcxn/believing-in
  Believing In — LessWrong
  “In America, we believe in driving on the right hand side of the road.” …
https://www.lesswrong.com/posts/LEESyXYFuW7R3Q9G5/facing-the-intelligence-explosion-discussion-page
  Facing the Intelligence Explosion discussion page — LessWrong
  I've created a new website for my ebook Facing the Intelligence Explosion: …
https://www.lesswrong.com/posts/9kQFure4hdDmRBNdH/how-it-feels-to-have-your-mind-hacked-by-an-ai
  How it feels to have your mind hacked by an AI — LessWrong
  Last week, while talking to an LLM (a large language model, which is the main talk of the town now) for several days, I went through an emotional rol…
https://www.lesswrong.com/posts/vpNG99GhbBoLov9og/claude-4-5-opus-soul-document
  Claude 4.5 Opus' Soul Document — LessWrong
  Update 2025-12-02: Amanda Askell has kindly confirmed that the document was used in supervised learning and will share the full version and more deta…
https://www.lesswrong.com/posts/8KkiLeZRuuxbyjr8A/does-an-ai-society-need-an-immune-system-accepting
  Does an AI Society Need an Immune System? Accepting Yampolskiy's Impossibility Results — LessWrong
  This is Part 1 of a 4-part series,
https://www.lesswrong.com/posts/YDF7XhMThhNfHfim9/ai-safety-needs-great-engineers
  AI Safety Needs Great Engineers — LessWrong
  Top line: If you think you could write a substantial pull request for a major machine learning library, then major AI safety labs want to interview y…
https://www.lesswrong.com/posts/K9ZaZXDnL3SEmYZqB/ends-don-t-justify-means-among-humans
  Ends Don't Justify Means (Among Humans) — LessWrong
https://www.lesswrong.com/r/discussion/tag/rationalityreadinggroup
  LessWrong
  A community blog devoted to refining the art of rationality
https://www.greaterwrong.com/posts/ximou2kyQorm6MPjX/rest-days-vs-recovery-days
  Rest Days vs Recovery Days - LessWrong 2.0 viewer
  That comment I made generated more positive feedback than usual (in that people seemed to find it helpful to read and found themselves thinking about it months...
https://manifold.markets/LessWrong/will-metr-measuring-ai-ability-to-c
  Will "METR: Measuring AI Ability to Complete Long Tasks" make the top fifty posts in LessWrong's...
  19% chance. As part of LessWrong's Annual Review, the community nominates, writes reviews, and votes on the most valuable posts. Posts are reviewable once they...
https://www.lesswrong.com/posts/p7CrByygeAqomsJqy/optimizing-sleep?commentId=6nmQ5W7XucdXTqwJL
  Optimizing Sleep — LessWrong
  Comment by gwern - Oh, cool - as I understand it, Anki keeps fairly detailed statistics and exposes them to you; it'd be interesting to see graphs matched up...
https://www.greaterwrong.com/recentcomments
  Recent comments - LessWrong 2.0 viewer
  A faster way to browse LessWrong 2.0
https://www.lesswrong.com/users/caperu_wesperizzon
  Caperu_Wesperizzon — LessWrong
  Caperu_Wesperizzon's profile on LessWrong — A community blog devoted to refining the art of rationality
https://www.greaterwrong.com/posts/SkXLrDXyHeekqgbFg/shock-level-5-big-worlds-and-modal-realism
  Shock Level 5: Big Worlds and Modal Realism - LessWrong 2.0 viewer
  In recent times, science and philosophy have uncovered evidence that there is something very seriously weird about the universe and our place in it. We used to...
https://www.greaterwrong.com/posts/vHSrtmr3EBohcw6t8/norms-of-membership-for-voluntary-groups/comment/FRkDubbjqEDQ4PbWz
  Connor_Flexman comments on Norms of Membership for Voluntary Groups - LessWrong 2.0 viewer
  This is a good point. I was wondering why civic/public is much more functional in meatspace than cyber, whereas a lot of internet communities that seem good...
https://www.greaterwrong.com/posts/CRsYy3xtbMrLjoXZT/evidence-for-the-orthogonality-thesis/comment/ZYwtHaPdfKCDCnXzc
  TheAncientGeek comments on Evidence for the orthogonality thesis - LessWrong 2.0 viewer
  Wei Dai's comment is full of wisdom. In particular: The Orthogonality Thesis (or it's denial) must assume that certain types of AI, e.g., those based on...
https://www.lesswrong.com/users/engineerofscience
  EngineerofScience — LessWrong
  EngineerofScience's profile on LessWrong — A community blog devoted to refining the art of rationality
https://www.greaterwrong.com/posts/ff5YRpt5WY3amtHBf/is-research-into-recursive-self-improvement-becoming-a
  Is research into recursive self-improvement becoming a safety hazard? - LessWrong 2.0 viewer
  One of the earliest speculations about machine intelligence was that, because it would be made of much simpler components than biological intelligence, like...
https://manifold.markets/LessWrong/will-anomalous-tokens-reveal-the-or
  Will "Anomalous tokens reveal the original identiti..." make the top fifty posts in LessWrong's...
  Resolved NO. As part of LessWrong's Annual Review, the community nominates, writes reviews, and votes on the most valuable posts. Posts are reviewable once...
https://www.lesswrong.com/posts/toZXD7QgKp9vQbvRF/conviction-skepticism
  LessWrong
  A community blog devoted to refining the art of rationality
https://www.greaterwrong.com/posts/4gXvnTFy5WCTtYMAA/the-pope-offers-wisdom
  The Pope Offers Wisdom - LessWrong 2.0 viewer
  The Pope is a remarkably wise and helpful man. He offered us some wisdom. Yes, he is generally playing on easy mode by saying straightforwardly true things,...
https://www.greaterwrong.com/posts/Z92cmpcDr5aDGg68c/fundamentals-of-formalisation-level-3-set-theoretic
  Fundamentals of Formalisation Level 3: Set Theoretic Relations and Enumerability - LessWrong 2.0...
  Followup to Fundamentals of Formalisation level 2: Basic Set Theory. The big ideas: To move to the next level you need to be able to: Why this is important:...
https://www.greaterwrong.com/tag/fiction-topic?showPostCount=true&useTagName=true
  Fiction (Topic) tag - LessWrong 2.0 viewer
  A faster way to browse LessWrong 2.0
https://www.lesswrong.com/users/ezra
  Ezra — LessWrong
  Ezra's profile on LessWrong — A community blog devoted to refining the art of rationality
https://manifold.markets/GarrettBaker/will-eliezer-think-there-was-a-sign
  Will Eliezer think there was a significant portion of the LessWrong post *My Objections...* which...
  Resolved CANCEL. Here's the post: https://www.lesswrong.com/posts/wAczufCpMdaamF9fy/my-objections-to-we-re-all-gonna-die-with-eliezer-yudkowsky# If Eliezer...
https://www.greaterwrong.com/users/hide
  Hide - LessWrong 2.0 viewer
  A faster way to browse LessWrong 2.0
https://www.lesswrong.com/posts/BcYBfG8KomcpcxkEg/crisis-of-faith
  Crisis of Faith — LessWrong
https://www.greaterwrong.com/users/ryancarey
  RyanCarey - LessWrong 2.0 viewer
  A faster way to browse LessWrong 2.0
https://manifold.markets/LessWrong/will-behavioral-redteaming-is-unlik
  Will "Behavioral red-teaming is unlikely to produce..." make the top fifty posts in LessWrong's...
  Resolved NO. As part of LessWrong's Annual Review, the community nominates, writes reviews, and votes on the most valuable posts. Posts are reviewable once...
https://www.greaterwrong.com/index?view=alignment-forum&sort=active
  Alignment-forum posts - LessWrong 2.0 viewer
  A faster way to browse LessWrong 2.0
https://www.lesswrong.com/posts/fbrz9xhKpEeTKw5zL/irretrievability-or-murphy-s-curse-of-oneshotness-upon-asi
  Irretrievability; or, Murphy's Curse of Oneshotness upon ASI — LessWrong
  Example 1: The Viking 1 lander In the 1970s, NASA sent a pair of probes to Mars, the Viking 1 and Viking 2 missions. Total cost of $1B (1970), equiva…
https://www.greaterwrong.com/posts/ACXTJqHBDxvNivKKK/introducing-the-screwtape-ladders
  Introducing The Screwtape Ladders - LessWrong 2.0 viewer
  There's been some cool moments. My oldest visible post, Write A Thousand Roads to Rome, got cited in a discussion with Eliezer Yudkowsky once. I keep seeing...
https://www.lesswrong.com/posts/8whGos5JCdBzDbZhH/framings-of-deceptive-alignment
  Framings of Deceptive Alignment — LessWrong
  In this post I want to lay out some framings and thoughts about deception in misaligned AI systems. …
https://www.greaterwrong.com/tag/mild-optimization?showPostCount=true&useTagName=true
  Mild optimization tag - LessWrong 2.0 viewer
  A faster way to browse LessWrong 2.0
https://lesswrong.ru/forum/index.php?PHPSESSID=e8frpsl44r9puemt4afiltfn4i&board=8.0
  Lesswrong.ru content
  Lesswrong.ru content
https://www.greaterwrong.com/posts/ESnzpoCJrAfwAzpMB/hammertime-day-3-taps/comment/N2Hc2kpeJWd4FyS52
  Will Towler comments on Hammertime Day 3: TAPs - LessWrong 2.0 viewer
  Sapience Spell: I wanted to use my tattoo because it has deep significance to me, but it's on my back. So, when I become tangibly aware of my (long) hair on my...
https://www.greaterwrong.com/users/euan-ong
  Euan Ong - LessWrong 2.0 viewer
  A faster way to browse LessWrong 2.0
https://www.greaterwrong.com/posts/YgedrNsdXNajQ7oCT/punctuation-and-quotation-conventions/comment/uWAnPev4GbTR5LakA
  Roman Malov comments on Punctuation & Quotation Conventions - LessWrong 2.0 viewer
  A "tomato" is a red, savory fruit. If it were "the word 'tomato' refers to a red, savory fruit", then it would be the perfect case of map/territory use of...
https://www.greaterwrong.com/posts/NdaCDt8tWABxB6op9/are-we-leaving-literature-to-the-psychotic/answer/x2H2h3KXYhaENxR2p
  RHollerith answers Are We Leaving Literature To The Psychotic? - LessWrong 2.0 viewer
  I'm basically not worried about this. Google Search has proven pretty OK at preventing spam and content farms from showing up in search results at rates that...
https://www.greaterwrong.com/users/rguerreschi
  rguerreschi - LessWrong 2.0 viewer
  A faster way to browse LessWrong 2.0
https://www.lesswrong.com/users/horosphere
  Horosphere — LessWrong
  Horosphere's profile on LessWrong — A community blog devoted to refining the art of rationality
https://www.greaterwrong.com/posts/dtrmr6Fn5AyP5GosQ/rating-my-ai-predictions
  Rating my AI Predictions - LessWrong 2.0 viewer
  9 months ago I predicted trends I expected to see in AI over the course of 2023. Here's how I did (bold indicates they happened, italics indicates they didn't,...
https://www.greaterwrong.com/posts/7c5ZQSrBGpT5CrDWj/the-three-boxes-a-simple-model-for-spreading-ideas-1
  The Three Boxes: A Simple Model for Spreading Ideas - LessWrong 2.0 viewer
  This is cross-posted from my blog. We need more people on board for life extension in order to hit longevity escape velocity in our lifetimes. But most people...
https://www.lesswrong.com/users/alex3
  Alex3 — LessWrong
  Alex3's profile on LessWrong — A community blog devoted to refining the art of rationality
https://www.greaterwrong.com/users/alex-amadori
  Alex Amadori - LessWrong 2.0 viewer
  A faster way to browse LessWrong 2.0
https://www.greaterwrong.com/users/jacob-dunefsky
  Jacob Dunefsky - LessWrong 2.0 viewer
  A faster way to browse LessWrong 2.0
https://www.greaterwrong.com/posts/8xKhCbNrdP4gaA8c3/sections-3-and-4-credibility-peaceful-bargaining-mechanisms
  Sections 3 & 4: Credibility, Peaceful Bargaining Mechanisms - LessWrong 2.0 viewer
  Credibility is a central issue in strategic interaction. By credibility, we refer to the issue of whether one agent has reason to believe that another will do...
https://www.lesswrong.com/posts/4CrumZwbPvc6mJBA3/purging-corrupted-capabilities-across-language-models-1
  Backdoors have universal representations across large language models — LessWrong
  by Narmeen Oozeer, Dhruv Nathawani, Nirmalendu Prakash, Amirali Abdullah • …
https://www.greaterwrong.com/?sort=active
  LessWrong 2.0 viewer
  A faster way to browse LessWrong 2.0
https://www.greaterwrong.com/tag/human-bodies
  Human Bodies tag - LessWrong 2.0 viewer
  A faster way to browse LessWrong 2.0
https://www.greaterwrong.com/about
  About - LessWrong 2.0 viewer
https://www.greaterwrong.com/users/ata
  ata - LessWrong 2.0 viewer
  A faster way to browse LessWrong 2.0
https://www.greaterwrong.com/posts/5bd75cc58225bf067037540c/infinite-ethics-comparisons
  Infinite ethics comparisons - LessWrong 2.0 viewer
  It's very difficult to compare utilities across worlds with infinite populations. For instance, it seems clear that world w1 is better than w2, if the number...
https://www.greaterwrong.com/posts/xJr3Byvp4TeRp4csv/guidelines-for-upvoting-and-downvoting
  Guidelines for Upvoting and Downvoting? - LessWrong 2.0 viewer
  I've only recently joined the LessWrong community, and I've been having a blast reading through posts and making the occasional comment. So far, I've received...
https://manifold.markets/LessWrong/will-precedents-for-the-unprecedent
  Will "Precedents for the Unprecedented: Historical ..." make the top fifty posts in LessWrong's...
  15% chance. As part of LessWrong's Annual Review, the community nominates, writes reviews, and votes on the most valuable posts. Posts are reviewable once they...
https://www.greaterwrong.com/posts/yaJsCQokiyeLFHhgy/incorporating-justice-theory-into-decision-theory
  Incorporating Justice Theory into Decision Theory - LessWrong 2.0 viewer
  When someone wrongs us, how should we respond? We want to discourage this behavior, so that others find it in their interest to treat us well. And yet the goal...
https://www.greaterwrong.com/posts/yenr6Zp83PHd6Beab/which-singularity-schools-plus-the-no-singularity-school-was
  Which singularity schools plus the no singularity school was right? - LessWrong 2.0 viewer
  TL;DR of this post: Accelerating change and Event Horizon were the most accurate schools, with Intelligence Explosion proving to be interestingly wrong...
https://www.lesswrong.com/posts/YbCc3NRrr5avvWSHT/who-wants-to-start-an-important-startup?commentId=qPQGQd6E3hZ5DsLz4
  Who Wants To Start An Important Startup? — LessWrong
  Comment by Kindly - The class I'm TAing has about 60 students in it; I see 40 or so regularly because one of the recitations is early in the morning and fewer...
https://www.greaterwrong.com/posts/CbSEZSpjdpnvBcEvc/
  I found 800 orthogonal "write code" steering vectors - LessWrong 2.0 viewer
  A few weeks ago, I stumbled across a very weird fact: it is possible to find multiple steering vectors in a language model that activate very similar behaviors...
https://www.greaterwrong.com/posts/tmuFmHuyb4eWmPXz8/rant-on-problem-factorization-for-alignment
  Rant on Problem Factorization for Alignment - LessWrong 2.0 viewer
  This post is the second in what is likely to become a series of uncharitable rants about alignment proposals (previously: Godzilla Strategies). In general,...
https://www.lesswrong.com/users/beren-1
  beren — LessWrong
  beren's profile on LessWrong — A community blog devoted to refining the art of rationality
https://manifold.markets/LessWrong/will-ai-companies-arent-really-usin
  Will "AI companies aren't really using external eva..." make the top fifty posts in LessWrong's...
  Resolved NO. As part of LessWrong's Annual Review, the community nominates, writes reviews, and votes on the most valuable posts. Posts are reviewable once...
https://www.lesswrong.com/w/conflict-vs-mistake
  Conflict vs Mistake — LessWrong
  Conflict vs Mistake is a framework for analyzing disagreements about policy. Mistake theorists think problems in society are caused by people being bad at...
https://www.greaterwrong.com/users/walterl
  WalterL - LessWrong 2.0 viewer
  A faster way to browse LessWrong 2.0
https://www.lesswrong.com/posts/ezkPRdJ6PNMbK3tp5/unsupervised-elicitation-of-language-models
  Unsupervised Elicitation of Language Models — LessWrong
  A key problem in alignment research is how to align superhuman models whose behavior humans cannot reliably supervise. If we use today’s standard pos…
https://www.lesswrong.com/users/jessica-heard
  Jessica Heard — LessWrong
  Jessica Heard's profile on LessWrong — A community blog devoted to refining the art of rationality
https://www.greaterwrong.com/posts/oq5CtbsCncctPWkTn/best-of-n-jailbreaking/comment/8r32eaEEo4cSGkkur
  anaguma comments on Best-of-N Jailbreaking - LessWrong 2.0 viewer
  Nice work! It's surprising that something so simple works so well. Have you tried applying this to more recent models like o1 or QwQ?
https://www.greaterwrong.com/posts/vwqLfDfsHmiavFAGP/the-library-of-scott-alexandria
  The Library of Scott Alexandria - LessWrong 2.0 viewer
  I've put together a list of what I think are the best Yvain (Scott Alexander) posts for new readers, drawing from SlateStarCodex, LessWrong, and Scott's...
https://www.lesswrong.com/posts/ZGgneqEJXJLJxrBZD/what-s-the-this-ai-is-of-moral-concern-fire-alarm
  What's the "This AI is of moral concern." fire alarm? — LessWrong
  Given the recent noise on this issue around LaMDA, I thought it might be a good idea to have some discussion around this point. I'm curious about wha…
https://manifold.markets/LessWrong/will-o1-a-technical-primer-make-the
  Will "o1: A Technical Primer" make the top fifty posts in LessWrong's 2024 Annual Review? | Manifold
  Resolved NO. As part of LessWrong's Annual Review, the community nominates, writes reviews, and votes on the most valuable posts. Posts are reviewable once...
https://www.greaterwrong.com/tag/academic-papers
  Academic Papers tag - LessWrong 2.0 viewer
  A faster way to browse LessWrong 2.0
https://www.lesswrong.com/posts/rMzb2dFwfAx6QH8ZH/sculpted-interaction-a-design-first-approach-to-ai-alignment
  Sculpted Interaction: a Design-First Approach to AI Alignment — LessWrong
  Acknowledgments: Thanks to Aditya Adiga for leading this project and trusting his ideas to me. Thanks to Matt Farr for comments on this draft. Thanks…
https://www.lesswrong.com/users/a1987dm
  A1987dM — LessWrong
  A1987dM's profile on LessWrong — A community blog devoted to refining the art of rationality
https://www.greaterwrong.com/posts/fnkbdwckdfHS2H22Q/steelmanning-divination
  Steelmanning Divination - LessWrong 2.0 viewer
  [This post was primarily written in 2015, after I gave a related talk, and other bits in 2018; I decided to finish writing it now because of a recent SSC...