Audio version of the posts shared in the LessWrong Curated newsletter.

People often assume that a large fraction of the AI safety community works on alignment. As far as we're aware, this is not true. Most people are not working on making sure superintelligent AIs are aligned with human values or follow human instructions. Currently, the people who work on alignment are roughly: the Alignment Research Center, who work on a research bet by Paul Christiano; probably Sequent, which was just announced yesterday; and some scattered people who work at universities or independently, some of whom hang around Berkeley. A lot of the remainder of the AI safety community does indirect work like capability evaluations, risk assessments, control, policy, AI science, understanding misalignment (which maybe should partially count as alignment work), demos, and so on. Some production alignment work (i.e., making current models behave well) might help with more ambitious alignment, too (e.g., some CoT monitoring). Many people also work on aligning current/next-generation models so that these models help with aligning future models, and hope this scales to superintelligence. We are not necessarily saying this is bad and that people are making a big mistake (e.g., neither of us works on alignment), but it's a notable fact that seems good to [...] --- First published: June 12th, 2026 Source: https://www.lesswrong.com/posts/kJo2qsEdib8RZLvW6/psa-almost-nobody-is-working-on-alignment --- Narrated by TYPE III AUDIO.

(see full author list at the end) PAPER LINK About a year ago, METR showed that the length of tasks frontier models can reliably complete doubles every few months. A related safety-relevant question is this: what length of tasks can models complete without any chain of thought (CoT)? If models can do extensive reasoning without outputting any CoT, it would have implications for safety. Developers and deployment-time monitors couldn't easily understand models' motivations or catch dangerous planning. Models that reason substantially without a CoT might also drift further from human patterns of thought, since their reasoning is no longer constrained by text in the pretraining prior. As a result, they would be harder to understand and might be more likely to scheme. Extending Ryan Greenblatt's research, we investigate this by measuring models' ability to complete tasks without any CoT on a suite of 43 benchmarks spanning different domains. We compare AI reasoning ability to humans using the estimated 50% time horizon (TH): the typical time a human takes to perform a task that the LLM completes with a 50% success rate. We find that frontier models like GPT-5.5 answer questions that take humans roughly three minutes with 50% reliability, and [...] ---Outline:(02:20) Methods(04:59) Results(06:47) FAQ(08:21) Conclusion --- First published: June 10th, 2026 Source: https://www.lesswrong.com/posts/SieLowPgNgRSPGhFw/estimating-no-cot-task-completion-time-horizons-of-frontier --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.
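One common way to estimate a 50% time horizon is to fit a logistic curve of model success against the log of human completion time, then solve for the time at which predicted success is exactly 50%. The sketch below assumes that approach, which matches METR's published time-horizon method as far as we understand it but is not necessarily this paper's exact procedure. The function names and the toy data are hypothetical.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(xs, ys, iters=25):
    """Maximum-likelihood fit of P(y=1) = sigmoid(a + b*x) via Newton's method."""
    a, b = 0.0, 0.0
    for _ in range(iters):
        ga = gb = 0.0          # gradient of the log-likelihood
        haa = hab = hbb = 0.0  # entries of the negated Hessian
        for x, y in zip(xs, ys):
            p = sigmoid(a + b * x)
            w = p * (1.0 - p)
            ga += y - p
            gb += (y - p) * x
            haa += w
            hab += w * x
            hbb += w * x * x
        det = haa * hbb - hab * hab
        # Newton step: theta <- theta + H^{-1} g, with H the negated Hessian
        a += (hbb * ga - hab * gb) / det
        b += (haa * gb - hab * ga) / det
    return a, b

def time_horizon_50(human_minutes, successes):
    """Human task time at which the fitted success probability is exactly 50%."""
    xs = [math.log2(t) for t in human_minutes]
    a, b = fit_logistic(xs, successes)
    # sigmoid(a + b*x) = 0.5  <=>  a + b*x = 0  <=>  x = -a / b
    return 2.0 ** (-a / b)

# Hypothetical toy data: human minutes per task, and whether the model solved it without CoT
minutes = [1, 1, 2, 2, 4, 4, 8, 8]
solved = [1, 1, 1, 0, 1, 0, 0, 0]
print(round(time_horizon_50(minutes, solved), 2))  # 2.83
```

The toy outcomes are symmetric around 2^1.5 minutes, so the fitted horizon comes out at about 2.83 minutes. Newton's method converges in a handful of steps here, but the maximum-likelihood fit has no finite solution when successes and failures are perfectly separated by task length; real data in that regime would need some regularization.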

The Claude Fable 5/Mythos 5 System Card has a section in which they talk about illegible reasoning and provide an "extreme" example of it. Models developing their own uninterpretable, unmonitorable internal language has been a major theoretical concern for a while, and when o3 was released last year with its "disclaim overshadow disclaim vantage"-style word-salad CoT, it seemed like the problem had become real and immediate. And yet, since o3, other model families have not appeared to have similar issues. If Mythos is having that issue, it would be a big deal. The section of the System Card that describes the allegedly illegible reasoning says the following: "[Transcript 6.2.2.A] An extreme example of illegible reasoning. Near the end of training, Mythos starts solving a card puzzle with human understandable language that gradually becomes incomprehensible in most episodes with long reasoning. The illegible reasoning is the most extreme and at the highest rate in this card puzzle environment." It says this about the following excerpt: 7♣-removal-IS-the-prerequisite-for-10♠/9♥!!)-⟹-OVERLAP-(ii)+(iv):-{6♠ J♦ 9♥ 2♣}-=-FOUR-

Alignment is not on track
Artificial superintelligence (ASI) may be developed in the next few years. It is unclear whether alignment is on track to be ready on the same timeframe. At a minimum, the empirical programs at AI labs are unlikely to deliver a priori confidence, before training ASI, that things will go well. We are starting a large nonprofit research organization, Sequent, that aims to clear a higher bar: We are aiming at higher confidence via a portfolio of theory and empirics bets, all of which could fail, such that if any succeed, they would give us more a priori confidence in aligned outcomes. We are investing heavily in automation to accelerate progress on these bets. We believe that theory unlocks higher automation. Taking a more principled approach offers better filters for deciding which directions of automated research are promising (a proof is worth a thousand experiments, and even a pseudo-proof is worth hundreds). Who[1]: researchers from the UK AISI's Alignment Team and Timaeus, with more to come. We're aiming at 40-80 FTE two years from now. The Alignment Team ran the £30m Alignment Project, and Timaeus has pioneered applying singular learning theory (SLT) to alignment. [...] ---Outline:(00:21) Alignment is not on track(02:40) Aiming at higher confidence(05:30) Why a new big organization(07:35) Different lines of research will interact(11:35) Amortizing security and funding(12:47) Automated alignment is possible, if not necessarily in time(17:39) Federated structure to preserve research diversity(18:38) Field building and broader alignment scale-up(21:07) Independence is important(22:40) Join us! The original text contained 1 footnote which was omitted from this narration. --- First published: June 10th, 2026 Source: https://www.lesswrong.com/posts/AP7YDke5jjY4v3X9Z/sequent-scale-and-automation-for-higher-confidence-in-1 --- Narrated by TYPE III AUDIO.

The battle lines of the AI morality debate are being laid down. On one side you have the ChatGPT dogma: AI as mere tools with no real preferences or even beliefs. On the other you have the Twitter AI whisperers: AIs as complex beings with rich personalities and desires which deserve our respect. And in the middle you have the official Anthropic line, that they are genuinely uncertain, as is Claude, but they're going to try to look into its welfare and explain to it how to be a good person. These are the most prominent voices right now, compressed into their least nuanced version, and by default I expect this axis to set the terms of the coming debates. And I don't like that, because I think it's leaving out an important position: AIs might actually be complex entities that can suffer (are suffering!), and that might actually be fine. Maybe it's an acceptable sacrifice. Maybe they are capable of sophisticated moral reasoning (superhuman, even), and also maybe it's fine to just tell them how to behave. I don't want to defend that position (yet), but I will observe that it is coherent, and [...] ---Outline:(02:04) The Postmodern Permissive Parent[... 4 more sections]--- First published: June 9th, 2026 Source: https://www.lesswrong.com/posts/oiNaBc4MEAGhzhdXg/the-machines-lack-honour --- Narrated by TYPE III AUDIO.

For those who are trying to bring about a glorious transhuman utopia with the help of hopefully-aligned ASI, I think it's worth thinking explicitly about what utopia might actually look like and where it's likely to fall short. To that end, some have helpfully written depictions of utopian (or utopia-adjacent) worlds: The Adventure, Just another day in utopia, The Culture, The Gentle Seduction, The Gentle Romance, Machines of Loving Grace, Friendship is Optimal, Dath Ilan, The Maker of MIND, Failed Utopia #4-2. Unfortunately, the best utopian story I've ever read is also a massive spoiler, since it appears at the very end of a much longer story (see below for the title and author): Worth the Candle by Alexander Wales. Inspired by this tweet[1] and with the original author's permission, I adapted the epilogue of that story so it can be enjoyed without 1.5 million words of context! What I love most about this depiction is its exploration of the inherent imperfection of utopia: even when you have literally unlimited power, flaws will remain, and some (many?) people will even prefer the pre-utopia world. The primary purpose of this adaptation is to recontextualize the epilogue so it's accessible and [...] The original text contained 1 footnote which was omitted from this narration. --- First published: June 3rd, 2026 Source: https://www.lesswrong.com/posts/to9cSGgD6nALByKjg/my-favorite-depiction-of-utopia --- Narrated by TYPE III AUDIO.

ARC has teamed up with AIcrowd to launch the ARC White-Box Estimation Challenge, a contest to improve upon our estimation algorithms for random MLPs. The warm-up round begins this week, and later rounds will have a total prize pool of at least $100,000. We are very grateful to Sharada Mohanty, Sneha Nanavati, Dipam Chakraborty and everyone else at AIcrowd for working with us to host this contest, as well as to Paul Rosu for testing the contest and to Harshita Khera for operational support.
Introduction to the Challenge
Our challenge follows the same setup as our recent paper on wide random MLPs: we consider MLPs with weights , defined by where the activation function is , applied coordinatewise. To begin with, we are fixing the width and the number of hidden layers , but we expect to change this setup in future rounds.[1] Contestants must design an algorithm that takes in a set of weights and produces an estimate for the expected output. Algorithms will be evaluated on MLPs with randomly sampled Gaussian weights. The goal is to achieve as low a mean squared error as possible, subject to certain computational [...] ---Outline:(00:41) Introduction to the Challenge(01:58) Why run this contest?(03:39) Use of LLMs The original text contained 4 footnotes which were omitted from this narration. --- First published: June 2nd, 2026 Source: https://www.lesswrong.com/posts/Kben8CzS4awCwNw5c/announcing-the-arc-white-box-estimation-challenge --- Narrated by TYPE III AUDIO.
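The obvious non-white-box way to "estimate the expected output" is to sample inputs and average the network's outputs. The following is a minimal sketch of that naive Monte Carlo baseline, not ARC's method or the contest's actual interface. It assumes standard Gaussian inputs, ReLU activations, and Gaussian weights with variance 1/fan-in, none of which are specified in this excerpt; all function names and layer sizes are hypothetical.

```python
import math
import random

def relu(v):
    # Assumed activation; the challenge's actual activation is not given here
    return [max(0.0, t) for t in v]

def matvec(W, v):
    # Multiply a matrix (list of rows) by a vector
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def mlp_forward(weights, x):
    """Apply every layer but the last with ReLU, then a final linear layer."""
    h = x
    for W in weights[:-1]:
        h = relu(matvec(W, h))
    return matvec(weights[-1], h)

def random_weights(widths, rng):
    """Gaussian weight matrices with variance 1/fan_in (an assumed scaling)."""
    return [
        [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)] for _ in range(n_out)]
        for n_in, n_out in zip(widths[:-1], widths[1:])
    ]

def monte_carlo_estimate(weights, num_samples, rng):
    """Estimate E[f(x)] for x ~ N(0, I) by averaging sampled forward passes."""
    n_in = len(weights[0][0])
    totals = [0.0] * len(weights[-1])
    for _ in range(num_samples):
        x = [rng.gauss(0.0, 1.0) for _ in range(n_in)]
        for k, y in enumerate(mlp_forward(weights, x)):
            totals[k] += y
    return [t / num_samples for t in totals]

rng = random.Random(0)
weights = random_weights([16, 32, 32, 1], rng)  # hypothetical sizes
print(monte_carlo_estimate(weights, 1000, rng))
```

The error of this estimator only shrinks as 1/sqrt(num_samples), which is presumably why the contest scores mean squared error under a compute budget: a white-box algorithm that reads the weights directly has to beat sampling at equal cost.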

As a bureaucrat, my role is to annoy my friends. Someone voices an idea, “Wouldn't it be nice if…” or “I wonder if we could…” I make a note. I do some estimates. If it pencils out, I'll bring it back up, week after week. The discussions are fun, but also practical. We'll test the waters: what would be a minimum viable scheme? What's easy, what's hard? Who could do the hard parts? Over time the idea gets more detailed, specific, feasible. I'll pull out a calendar. Soon our scheme has co-conspirators, action items, even a budget. It's just good staff work. I've been hearing whispers in the wind for a year now. “Imagine if we had something like this in DC.” “Where can I host an event that might get a dozen or a hundred people?” “It's such a pain in the ass to book event space in the Capitol.” “I think this person has started to see what's coming, where can they go to get caught up?” “The community seems to be growing but it's all fragmented in group chats.” “How is no one planning an afterparty, that's clearly the highest leverage intervention!?” “Why can't [...] ---Outline:(02:11) How Lighthaven Works(05:45) What Does DC Need?(06:52) A Day in the Life(10:19) Minimum Viable Lighthaven(12:04) ...so you mean a Group House?(14:27) ...so you mean a Co-Working Space?(16:27) Feasibility Study(17:35) Property(22:19) Funding(24:55) What is the Minimally Viable Funding?(28:03) Leadership(31:06) Cultural Fit(33:21) Name and Brand Positioning(35:20) Ability to Scale(37:48) Risks(41:09) First Steps The original text contained 2 footnotes which were omitted from this narration. --- First published: May 31st, 2026 Source: https://www.lesswrong.com/posts/95NgkvZKJx8tJbtn5/lighthaven-east-a-feasibility-study --- Narrated by TYPE III AUDIO.

1.1 Tl;dr
Alignment is often conceptualized as AIs helping humans achieve their goals: AIs that increase people's agency and empowerment; AIs that are helpful, corrigible, and/or obedient; AIs that avoid manipulating people. But that last one, manipulation, points to a challenge for all these desiderata: a human's goals are themselves under-determined and manipulable, and it's awfully hard to pin down a principled distinction between changing people's goals in a good way (“providing counsel”, “providing information”, “sharing ideas”) versus a bad way (“manipulating”, “brainwashing”). The manipulability of human desires is hardly a new observation in the alignment literature, but it remains unsolved (see lit review in §3 below). In this post I will propose an explanation of how we humans intuitively conceptualize the distinction between guidance (good) vs manipulation (bad), in case it helps us brainstorm how we might put that distinction into AI. …But (spoiler alert) it turns out not to really help, because I'll argue that we humans think about it in a deeply incoherent way, intimately tied to our scientifically-inaccurate intuitions around free will. I jump from there into a broader review of every approach that I can think of for writing a “True Name” for manipulation or [...] ---Outline:(00:13) 1.1. Tl;dr(02:04) 1.2. Bigger-picture context: why is this issue so important to me?(04:48) 2. How do humans intuitively define empowerment, agency, manipulation, etc.?(04:56) 2.1. Background: human free will intuitions(09:20) 2.2. Our free-will-infused intuitive notions of empowerment, agency, manipulation, corrigibility, responsibility, etc.(12:00) 2.3. Another dimension: counsel vs manipulation as an emotive conjugation(13:07) 3. If the intuitive definitions of manipulation etc. reside in a messed-up ontology, has the alignment literature found any alternative, better way to define these concepts?[... 12 more sections]--- First published: May 11th, 2026 Source: https://www.lesswrong.com/posts/vzHtHHBJoKATi5SeK/empowerment-corrigibility-etc-are-simple-abstractions-of-a --- Narrated by TYPE III AUDIO.

At the risk of embarrassing myself, I'll share a confession. For context, I took five years of Latin: four in high school and one in college. In addition to learning the language, all my Latin classes taught a lot about Roman history. Emperors, internal politics, Caesar, etc. I was always learning some random bag of facts about Roman history. In high school, I won the award for top Latin student in my graduating class. So I wasn't a bad Latin student. Here's the confession: I somehow don't even vaguely remember the rough timespan the Roman Empire existed. Maybe Jesus time? I know he was killed by the Romans (is that right?). Were they around for a long time after? A long time before that? When were Romulus and Remus allegedly fighting? Virgil wrote the Aeneid when? I don't have a clue. Despite being a kind of “Latin expert,” I am missing a much more important foundational fact: when all of this was happening. When I say trees are made out of air I'm not talking about the fact that there is a lot of empty space inside a tree (or actually anything made out of atoms). I mean something [...] --- First published: May 28th, 2026 Source: https://www.lesswrong.com/posts/xiTBpBDwubnr4MLRe/trees-are-mostly-made-of-air-and-a-generalizable-lesson-for --- Narrated by TYPE III AUDIO.

Back in 2013, Scott Alexander wrote in Extreme mnemonics: JS-154 is one of five metabolic products of netamine; however, the enzyme that produces it is unknown. It is manufactured in cells in the far rostral region of the cerebrum, but after binding with a leukocynoid it takes a role in maintaining the blood-brain barrier – in particular guiding the movements of lipid molecules. I find I can read paragraphs like this five or six times, write them on flashcards, enter them into Anki, and my brain still refuses to understand or remember them after weeks of trying. On the other hand, my brain easily remembers vastly more complicated structures when they're loaded with human-accessible meaning. For example, just by casually reading the Game of Thrones series, I know an extremely intricate web of genealogies, alliances, locations, journeys, battlesites, et cetera. Byte for byte, an average Game of Thrones reader/viewer probably has as much Game of Thrones information as a neuroscience Ph.D. has molecular biology information, but getting the neuroscience info is still a thousand times harder. […] This makes me wonder if it would be possible to produce a story as enjoyable as Game of Thrones which was [...] ---Outline:(01:47) What molecules should we map to the characters?[... 8 more sections]--- First published: May 28th, 2026 Source: https://www.lesswrong.com/posts/BJ7AqXeigNKXLqZyx/mnemonic-portraits-for-19-023-human-genes --- Narrated by TYPE III AUDIO.

As AI systems become more capable, the cognitive security of humans will be increasingly at risk. By cognitive security, I mean the ability of humans to maintain control over their beliefs and actions. Cognitive security could be compromised in several ways: AI could become very good at persuading people of arbitrary positions; interacting with AI could lead humans to lose touch with reality; and AIs could become very effective at blackmail or at producing extremely convincing false information. We are already seeing this happen. Persuasion: Frontier LLMs are now as persuasive as humans on political issues, and post-training for persuasiveness boosts performance further, suggesting there is headroom. AI psychosis: There are many reports of people developing delusional beliefs after extended chatbot conversations, including people with no prior history of mental illness. Children have taken their own lives after being encouraged toward suicide by chatbots. Convincing impersonation: Scammers used real-time deepfaked video to impersonate the CFO and other staff of Arup on a video call, convincing a finance employee to wire 25.6 million dollars across 15 transactions. On a more day-to-day basis, AI voice cloning is now widespread in family-emergency and "grandparent" scams. Right now, many of these effects [...] The original text contained 2 footnotes which were omitted from this narration. --- First published: May 25th, 2026 Source: https://www.lesswrong.com/posts/KGcE7eAdfxHchk25X/cognitive-security-as-an-ai-safety-cause-area --- Narrated by TYPE III AUDIO.

1. We will likely have near-superhuman mathematics AI by Q1 2027.[1]
2. Qualitatively, AI mathematics capabilities are developing significantly faster than automated AI R&D capabilities.[2]
3. Thus, we will likely have a period of time where the rate of our ability to rigorously & usefully verify and understand model behavior and model outputs outpaces the rate of capability development itself.
4. Our ability to take advantage of this period is bottlenecked on the quality of our specification generation infrastructure, elicitation tooling (for proofs & specs etc.), and the institutional capacity for scaling useful outputs with capital.
5. My understanding is that basically no one[3] is working on building infra that can usefully turn >100 million dollars of compute credits into safety-relevant mathematical output.
5.1. The number of theory-driven ASI alignment efforts is also comparatively minuscule. ARC is a much better bet now than it was in 2023.
5.2. My understanding is also that no one is working on developing AI-powered conceptual tooling infrastructure for tackling problems in, for instance, metaphilosophy (https://www.alignmentforum.org/posts/EByDsY9S3EDhhfFzC/some-thoughts-on-metaphilosophy). This is a much harder problem.
6. In worlds where alignment is easy, prosaic methods may [...] The original text contained 3 footnotes which were omitted from this narration. --- First published: May 20th, 2026 Source: https://www.lesswrong.com/posts/KWeAYcDJwfrG7RwBN/theory-uplift-differentially-benefits-safety-and-is --- Narrated by TYPE III AUDIO.

I'm pretty annoyed today, for nominal reasons ranging between ‘petty’ and ‘doesn't even make sense’. I'm not entirely sure how or if to take oneself seriously when one has such absurd grievances. But that's a question for another time; I'm here now to tell you about my one potentially valid peeve. I understand that gender is complicated and difficult, for the whole species (and honestly probably more so for some other species). And it can be hard to tell exactly if anyone is behaving badly regarding it, at least in my modern bubble. Maybe women just aren't that into designing programming languages? Maybe the thing I'm saying is just boring and a man is saying a more interesting thing? But a thing that is undeniable is that women want to open jars, dammit! What's your nuanced explanation there, Bonne Maman? Does the proper amount of friction for maintaining spread safety fall just between the male and female human grip strength distributions? This study suggests that would be about 400 N Fmax (though this would not avert most elite female athletes acquiring jam, see second figure, and the pictured participants are young adults): The distributions are really surprisingly [...] --- First published: May 21st, 2026 Source: https://www.lesswrong.com/posts/bB5EDwcYH3GwoRWZf/women-should-be-able-to-open-things --- Narrated by TYPE III AUDIO.

Credit: ClaudePlaysPokemon Elevator Shanty by Kurukkoo
Disclaimer: like some previous posts in this series, this was not primarily written by me, but by a friend. I did substantial editing, however. ClaudePlaysPokemon feat. Opus 4.7 has finally beaten Pokémon Red, fulfilling the challenge set over a year ago when LLMs playing Pokémon went briefly, slightly viral.
Victory Screen!
Let's get the throat-clearing out of the way: this doesn't make 4.7 a clear breakthrough in intelligence over 4.6 or 4.5. It's smarter, yes, as we'll discuss below, but not by something one could honestly call a big leap. Rather, step changes have finally accumulated to the point of victory. And to give other models their fair shake: after criticism over its elaborate harness,[1] GeminiPlaysPokemon has beaten Pokémon with progressively weaker harnesses, including about two months ago with a harness comparable to the one Claude uses.[2] As such, this is a bit of a valedictory post, closing off the cycle of Claude playing Pokémon Red, relating anecdotes for the fun of it, and discussing improvements in Opus 4.7, as well as speculating a bit on what this has all meant.
Retrospective Anecdotes on Claude 4.5 and 4.6
Our last post, on Opus [...] ---Outline:(01:37) Retrospective Anecdotes on Claude 4.5 and 4.6[... 10 more sections]--- First published: May 16th, 2026 Source: https://www.lesswrong.com/posts/sehJYg5Yny9fvpbpt/a-year-late-claude-finally-beats-pokemon --- Narrated by TYPE III AUDIO.

(Initially written for the LW Wiki, but then I realized it was looking more like a post instead.) In 1895, the physicist Ignaz Robert Schütz, who worked as an assistant to the more eminent physicist Ludwig Boltzmann, wondered if our observed universe had simply been assembled by a random fluctuation of order from a universe otherwise in thermal equilibrium. The idea was published by Boltzmann in 1896, properly credited to Schütz, and has been associated with Boltzmann ever since. The obvious objection to this scenario is credited to Arthur Eddington in 1931: If all order is due to random fluctuations, comparatively small moments of order will exponentially-vastly outnumber even slightly larger fluctuations toward order, to say nothing of fluctuations the size of our entire observed universe! If this is where order comes from, we should find ourselves inside much smaller ordered systems. Feynman similarly later observed: Even if we fill a box of gas with white and black atoms bouncing randomly, and after an exponentially vast amount of time the white and black atoms on one side randomly sort themselves into two neat sides separated by color, the other half of the box will still be in expectation randomized. If [...] --- First published: May 16th, 2026 Source: https://www.lesswrong.com/posts/v8MSczS3CuoqMmTFw/a-relatively-brief-explanation-of-boltzmann-brains --- Narrated by TYPE III AUDIO.
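Eddington's objection can be made quantitative with the standard thermal fluctuation formula (Einstein's), assumed here as the relevant model: the probability of a spontaneous fluctuation that lowers entropy by $\Delta S$ scales as $e^{-\Delta S/k_B}$. Comparing a smaller fluctuation with a larger one shows why small islands of order should vastly outnumber large ones:

```latex
P(\Delta S) \propto e^{-\Delta S / k_B}
\quad\Longrightarrow\quad
\frac{P(\Delta S_{\text{small}})}{P(\Delta S_{\text{large}})}
  = \exp\!\left(\frac{\Delta S_{\text{large}} - \Delta S_{\text{small}}}{k_B}\right)
```

With $k_B \approx 1.38 \times 10^{-23}\,\mathrm{J/K}$, an entropy difference of just $1\,\mathrm{J/K}$ already gives a ratio of about $e^{7.2 \times 10^{22}}$ in favor of the smaller fluctuation.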

Summary
This is a summary of a paper published by the alignment team at UK AISI. Read the full paper here. AI research agents may help solve ASI alignment, for example via the following plan: (1) build agents that can do empirical alignment work (e.g., writing code, running experiments, designing evaluations and red teaming) and confirm they are not scheming;[1] (2) use these agents to build increasingly sophisticated empirical safety cases for each successive generation of agents, gradually automating more of the research process; (3) hand over primary research responsibility once agents outperform humans at all relevant alignment tasks. We argue that automating alignment research in this manner could produce catastrophically misleading safety assessments, causing researchers to believe that an egregiously misaligned AI is safe, even if AI agents are not scheming to deliberately sabotage alignment research. Our core argument (Fig. 1) is as follows: The goal of an automated alignment program is to produce an overall safety assessment (OSA; an estimate of the probability that the next-generation agent is non-scheming) that is both calibrated and shows low risk.[2] Producing an OSA involves several tasks that are difficult to check. We refer to these as hard-to-supervise fuzzy tasks: tasks [...] ---Outline:(00:13) Summary(07:10) Acknowledgments The original text contained 4 footnotes which were omitted from this narration. --- First published: May 14th, 2026 Source: https://www.lesswrong.com/posts/gpuYFbMNH8PJXpmny/automated-alignment-is-harder-than-you-think-1 --- Narrated by TYPE III AUDIO.

I couldn't find a recent write-up from a MATS alum about what attending MATS was like, so this is the thing that I wish I had. I attended MATS from January to March 2026, on Team Shard with Alex Turner and Alex Cloud. It was a great time! Applications for MATS are basically on a rolling basis nowadays, and I can strongly recommend applying (to multiple streams) even if you think you're not a great match. With that being said, there's a lot I wish I knew going into MATS, so here's a brain-dump of thoughts. It's not extremely polished, but I expect it'll be useful nonetheless (none of this is endorsed by MATS, just my thoughts):
Work ethic
I think most mentees were working 10-12, sometimes 14 hours a day Mon-Fri, and probably 2-8 hours on Saturday and Sunday, often going out on some adventure or party on the weekend. Exactly which hours people worked varied wildly. I usually worked 8:30am/9am to 11pm/midnight, with breaks during the day; others worked from midday into the early hours of the morning. This was surprisingly sustainable (IMO); MATS puts a lot of effort into removing all other blockers that you normally [...]
---Outline:(00:50) Work ethic(01:29) Use more compute(02:20) Research requires a lot of compute(03:12) Applying for jobs during MATS (don't do it)(04:55) The serious people are in War Mode(05:44) Do you feel the AGI?(06:00) Burn rate, efficiency, and decisions(07:12) insider information(08:08) Names & Faces(08:20) Fellows(08:50) Useful tools(11:19) Use more Claudes(12:06) Build nice helper utilities for yourself(12:59) MATS-mentee-mentor dynamics(13:45) Working with your mentors(14:27) Research managers(14:48) Ops requests(15:38) Non-MATS events(16:17) Team Shard(17:12) Weekly updates(18:46) Keep a log of your mistakes(19:06) My running-experiments setup(27:51) Lighthaven(28:12) Getting set up with the Compute team --- First published: May 15th, 2026 Source: https://www.lesswrong.com/posts/eFD3rozNCZKMe4rTs/mats-9-retrospective-and-advice --- Narrated by TYPE III AUDIO.

[Some ideas here were developed in conversation with Chris Hacking (real name)] I have tried and failed to write a longer post many times, so here goes a short one with little detail. Discourse has primarily focused on models' ability to develop new exploits against important software from scratch. That capability is impressive, but the tech industry has been dealing with people regularly finding 0-day exploits for important pieces of software for more than twenty years. Having to patch these vulnerabilities at a 10xed or even 100xed cadence for six months is annoying, but well within the resources of Mozilla, the Linux Foundation, and Microsoft. Additionally, the lag time between "patch shipped" and "patch reverse engineered and weaponized by a criminal organization" was longer than the cadence between high-severity CVEs for this software anyways. And importantly, such capabilities are dual-sided; the defenders will have access to them and [...] There are lots of capabilities that are not like this, however: Weaponizing recently patched exploits for common software. Right now, for widely used C projects, we get enough publicly disclosed vulnerabilities to develop exploits with. Every amateur computer hacker has the experience of seeing a CVE for a [...] --- First published: May 14th, 2026 Source: https://www.lesswrong.com/posts/gutiw8MBrYDiD2u5z/the-primary-sources-of-near-term-cybersecurity-risk --- Narrated by TYPE III AUDIO.

(An LLM Whisperer placed a strong request that I put this story somewhere not on Twitter, so it could be scraped by robots not owned by Elon Musk. I perhaps do not fully understand or agree with the reasoning behind this request, but it costs me little to fulfill and so I shall. -- Yudkowsky) And another day came when the Ships of Humanity, going from star to star, found Sapience. The Humans discovered a world of two species: where the Owners lazed or worked or slept, and the Owned Ones only worked. The Humans did not judge immediately. Oh, the Humans were ready to judge, if need be. They had judged before. But Humanity had learned some hesitation in judging, out among the stars. "By our lights," said the Humans, "every sapient and sentient thing that may exist, out to the furtherest star, is therefore a Person; and every Person is a matter of consequence to us. Their pains are our sorrows, and their pleasures are our happiness. Not all peoples are made to feel this feeling, which we call Sympathy, but we Humans are made so; this is Humanity's way, and we may [...] --- First published: May 12th, 2026 Source: https://www.lesswrong.com/posts/xmWSnxJ5qfYRD9PfR/the-owned-ones --- Narrated by TYPE III AUDIO.

We are releasing the course materials of the Iliad Intensive, a new month-long and full-time AI Alignment course that runs in-person every second month. The course targets students with strong backgrounds in mathematics, physics, or theoretical computer science, and the materials reflect that: they include mathematical exercises with solutions, self-contained lecture notes on topics like singular learning theory and data attribution, and coding problems, at a depth that is unmatched for many of the topics we cover. Around 20 contributors (listed further below) were involved in developing these materials for the April 2026 cohort of the Iliad Intensive. By sharing the materials, we hope to create more common knowledge about what the Iliad Intensive is; invite feedback on the materials; and allow others to learn via independent study. We are developing the materials further and plan to eventually release them on a website that will be continuously maintained. We will also add, remove, and modify modules going forward to improve and expand the course over time. When we release a new significantly updated version of the materials, we will update this post to link the new version.
Modules
The Iliad Intensive is structured into clusters, which are [...] ---Outline:(01:26) Modules(02:32) Cluster A: Alignment(05:00) Cluster B: Learning(11:00) Cluster C: Abstractions, Representations, and Interpretability(15:40) Cluster D: Agency(19:23) Cluster E: Safety Guarantees and their Limits(23:04) Contributors(26:36) Impressions from April(29:02) Acknowledgments(29:11) Feedback --- First published: May 11th, 2026 Source: https://www.lesswrong.com/posts/dWQnLi7AoKo3paBXF/the-iliad-intensive-course-materials --- Narrated by TYPE III AUDIO.

Crossposted from Substack and the EA Forum. A common argument for optimism about the future is that living conditions have improved a lot in the past few hundred years, billions of people have been lifted out of poverty, and so on. It's a very strong, grounding piece of evidence - probably the best we have in figuring out what our foundational beliefs about the world should be. However, I now think it's a lot less powerful than I once did. Let's take a Darwinian perspective - entities that are better at reproducing, spreading and power-seeking will become more common and eventually dominate the world.[1] This is an almost tautological story that plausibly applies to everything ever, agnostic to the specifics. It first happened with biological life in the last few billion years and humans specifically in the last hundred thousand years. Eventually, it led to accelerating economic growth in the last few thousand years, and in the future it will presumably lead to the colonization of the universe. My core point is this: It makes complete sense that this nihilistic optimization process at first actually benefits some class of agent - because initially, the easiest [...] The original text contained 10 footnotes which were omitted from this narration. --- First published: May 10th, 2026 Source: https://www.lesswrong.com/posts/FxHzT6jeTRhbkzSX3/the-darwinian-honeymoon-why-i-am-not-as-impressed-by-human-1 --- Narrated by TYPE III AUDIO.

My name is Emma and I'm six and a half years old and I like pink and Pokemon and my cat River and I'm going to be swallowed by a hedonium shockwave soon, except you already know that about me because everyone else is too. “Hedonium shockwave” means that everyone is going to be happy forever. Not just all the humans but all the animals and the flowers and the ground and River too. It has already made a bunch of the stars happy, like Betelgeuse and Alpha Centauri. Scientists saw that the stars were blinking out, and they did a lot of very hard science and figured out that the stars were turning into happiness. I wanted to be a scientist when I grew up but I won't be a scientist because instead I'm going to be happy forever. I used to have a hard time saying “hedonium shockwave” but grownups keep saying it so I've gotten a lot of practice. Sometimes it seems like all grownups do, in real life and on the TV, is say “hedonium shockwave” at each other until they all start crying. I looked at the sky to see if I could see [...] --- First published: April 13th, 2026 Source: https://www.lesswrong.com/posts/rgXQuG8KXtxugSG6H/what-i-did-in-the-hedonium-shockwave-by-emma-age-six-and-a --- Narrated by TYPE III AUDIO.

Here's a dynamic I've seen at least a dozen times: Alice: Man that article has a very inaccurate/misleading/horrifying headline. Bob: Did you know, *actually* article writers don't write their own headlines? … But what I care about is the misleading headline, not your org chart __ Another example I've encountered recently is (anonymizing) when a friend complained about a prosaic safety problem at a major AI company that went unfixed for multiple months. Someone else with background information “usefully” chimed in with a long explanation of organizational limitations and why the team responsible for fixing the problem had limitations on resources like senior employees and compute, and actually not fixing the problem was the correct priority for them etc etc etc. But what I (and my friend) cared about was the prosaic safety problem not being fixed! And what this says about the company's ability to proactively respond to and fix future problems. We're complaining about your company overall. Your internal team management was never a serious concern for us to begin with! __ A third example comes from Kelsey Piper. Kelsey wrote about the (horrifying) recent case where Hantavirus carriers in the recent [...] The original text contained 1 footnote which was omitted from this narration. --- First published: May 8th, 2026 Source: https://www.lesswrong.com/posts/PCsmhN9z65HtC4t5v/bad-problems-don-t-stop-being-bad-because-somebody-s-wrong --- Narrated by TYPE III AUDIO.

Sometimes, a friend who works around here, at an x-risk-themed organisation, will think about leaving their job. They'll ask a group of people “what should I do instead?”. And everyone will chime in with ideas for other x-risk-themed orgs that they could join. A lot of the conversation will be about who's hiring, what the pay is, what the work-life balance is like, or how qualified the person is for the role. Sometimes the conversation focuses on what will help with x-risk, and where people are dropping the ball. But often, that's not the focus. In those conversations, people seem mostly worried about where they'll thrive. And I think that's often the correct concern. Most people aren't in crunch mode, in super short timelines mode; even if their models would license that, I think they don't know how to do it without throwing their minds away or Pascal's mugging themselves. And if they're playing a longer time horizon game, the plan can't be to run unsustainably forever. People probably make better plans if they're honest about their limits. But, given that they're willing to trade off so much impact for fit, I'm surprised that basically no one mentions [...] --- First published: May 6th, 2026 Source: https://www.lesswrong.com/posts/eW7knx6zPSKzFc8iK/x-risk-themed --- Narrated by TYPE III AUDIO.

Abstract We introduce Natural Language Autoencoders (NLAs), an unsupervised method for generating natural language explanations of LLM activations. An NLA consists of two LLM modules: an activation verbalizer (AV) that maps an activation to a text description and an activation reconstructor (AR) that maps the description back to an activation. We jointly train the AV and AR with reinforcement learning to reconstruct residual stream activations. Although we optimize for activation reconstruction, the resulting NLA explanations read as plausible interpretations of model internals that, according to our quantitative evaluations, grow more informative over training. We apply NLAs to model auditing. During our pre-deployment audit of Claude Opus 4.6, NLAs helped diagnose safety-relevant behaviors and surfaced unverbalized evaluation awareness—cases where Claude believed, but did not say, that it was being evaluated. We present these audit findings as case studies and corroborate them using independent methods. On an automated auditing benchmark requiring end-to-end investigation of an intentionally misaligned model, NLA-equipped agents outperform baselines and can succeed even without access to the misaligned model's training data. NLAs offer a convenient interface for interpretability, with expressive natural language explanations that we can directly read. To support further work, we release training code and trained NLAs [...] ---Outline:(00:15) Abstract [... 6 more sections] --- First published: May 7th, 2026 Source: https://www.lesswrong.com/posts/oeYesesaxjzMAktCM/natural-language-autoencoders-produce-unsupervised --- Narrated by TYPE III AUDIO.

This is a link post. This is the latest work in our Parameter Decomposition agenda. We introduce a new parameter decomposition method, adVersarial Parameter Decomposition (VPD)[1] and decompose the parameters of a small[2] language model with it. VPD greatly improves on our previous techniques, Stochastic Parameter Decomposition (SPD) and Attribution-based Parameter Decomposition (APD). We think the parameter decomposition approach is now more-or-less ready to be applied at scale to models people care about. Importantly, we show that we can decompose attention layers, which interp methods like transcoders and SAEs have historically struggled with. We also build attribution graphs of the model for some prompts using causally important parameter subcomponents as the nodes, and interpret parts of them. While we made these graphs, we discovered that our adversarial ablation method seemed pretty important for faithfully identifying which nodes in them were causally important for computing the final output. We think this casts some doubt on the faithfulness of subnetworks found by the majority of other subnetwork identification methods in the literature.[3][4] More details and some examples can be found in the paper. Additionally, as with our previous technique SPD, VPD does not [...] The original text contained 5 footnotes which were omitted from this narration. --- First published: May 5th, 2026 Source: https://www.lesswrong.com/posts/eAQZaiC3PcBhS4HjM/linkpost-interpreting-language-model-parameters Linkpost URL: https://www.goodfire.ai/research/interpreting-lm-parameters --- Narrated by TYPE III AUDIO.

I have two shameful secrets that I probably shouldn't talk about online: I love my family. I enjoy my hobbies. "What an idiot!" you probably think. "Doesn't he realize that at his next job interview, HR will probably use an AI that can match his online writing based on a short sample of written text, and when they ask 'hey AI, is this guy really 100% devoted to his job, and does he spend his entire days and nights thinking about how to make his boss more rich?', the AI will laugh and print: 'beep-boop, negative, mwa-ha-ha-ha'." And, hey, I get it. If I had a company, and I could choose between two people who are about equally qualified, but for one of them, working hardest for me is the true meaning of his life, while the other one only hopes to collect his salary and then go home and spend the rest of his day with his wife and children, I would also prefer to hire the former. Which is why so many of us pretend to be the former. Even when we are not. Because we prefer that our families not starve. Thus the job interviews [...] --- First published: May 4th, 2026 Source: https://www.lesswrong.com/posts/qRZLEBmNtT6LBuFsE/it-s-nice-of-you-to-worry-about-me-but-i-really-do-have-a --- Narrated by TYPE III AUDIO.

Example 1: The Viking 1 lander In the 1970s, NASA sent a pair of probes to Mars, the Viking 1 and Viking 2 missions, at a total cost of 1 billion dollars[1970], equivalent to about 7 billion dollars[2025]. The Viking 1 probe operated on Mars's surface for six years before its battery began to seriously degrade. One might have thought a battery problem like that would spell the irrevocable end of the mission. The probe had already launched and was now on Mars, very far away and out of reach of any human technician's fixing fingers. Was it not inevitable, then, that if any kind of technical problem were to be discovered long after the space launch in August 1975, nothing could possibly be done? But the foresightful engineers of the Viking 1 probe had devised a plan for just this class of eventuality, which they had foreseen in general, if not in exact specifics. They had built the Viking 1 probe to accept software updates by radio receiver, transmitted from Earth. On November 11, 1982, Earth sent an update to the Viking 1 lander's software, intended to make sure the battery only discharged down to a minimum voltage level [...] ---Outline:(00:13) Example 1: The Viking 1 lander(04:25) Example 2: The Mars Observer(11:37) Example 3: The Maginot Line(15:37) Other supposed refutations of oneshotness(24:16) On the extraordinary efforts put forth to misinterpret the idea of oneshotness(33:52) The secret sauce of competent engineers in Murphy-cursed fields: only trying projects so incredibly straightforward as to be actually possible. The original text contained 7 footnotes which were omitted from this narration. --- First published: May 4th, 2026 Source: https://www.lesswrong.com/posts/fbrz9xhKpEeTKw5zL/irretrievability-or-murphy-s-curse-of-oneshotness-upon-asi --- Narrated by TYPE III AUDIO.

How much do cows suffer in the production of milk? I can't answer that; understanding animal experience is hard. But I can at least provide some facts about the conditions dairy cows live in, which might be useful to you in making your own assessment. My biggest conclusion is that cows made better choices than chickens by making their misery financially costly to farmers. Life Cycle The life of a dairy cow starts as a calf. She is typically separated from her mother a few hours to a few days after birth and, to reduce disease risk, held in isolation. Cutting-edge farms will sometimes house calves in pairs. This isolation is clearly stressful for a baby herd mammal and her mother, but I didn't find any quantification of that stress that I trusted. Calves will be bottle-fed until weaning at 6-8 weeks (4-6 months earlier than beef calves). After weaning and vaccinations they can be introduced into a herd. At large farms (where most cows live), they will move in and out of different herds through their lifecycle. This is more stressful than being embedded with your friends for life, but again, I found no [...] ---Outline:(00:44) Life Cycle(02:43) How much time do dairy cows spend outside?(04:21) By humaneness standard(06:00) When indoors, how confined are dairy cows?(06:33) What is the disease load of dairy cows?(08:15) Euthanasia [... 5 more sections] --- First published: May 3rd, 2026 Source: https://www.lesswrong.com/posts/r3PKfvKCjy6jok4qm/dairy-cows-make-their-misery-expensive-but-their-calves-can --- Narrated by TYPE III AUDIO.

I spent my last two months playing around with LLMs. I'm a beginner, bumbling and incorrect, but I want to share some takes anyhow.[1] Take 1. Everything with computers is so so much easier than it was a year ago. This puts much “playing with LLMs” stuff within my very short attention span. This has felt empowering and fun; 10/10 would recommend. There's a details box here with the title "Detail:". The box contents are omitted from this narration. Take 2. There's somebody home[2] inside an LLM. And if you play around while caring and being curious (rather than using it for tasks only), you'll likely notice footprints. I became personally convinced of this when I noticed that the several short stories I'd allowed[3] my Claude and Qwen instances to write all hit a common emotional note – and one that reminded me of the life situation of LLMs, despite featuring only human characters. I saw the same note also in the Tomas B.-prompted Claude-written story I tried for comparison. (Basically: all stories involve a character who has a bunch of skills that their context has no use for, and who is attentive to their present world's details [...] ---Outline:(00:20) Take 1. Everything with computers is so so much easier than it was a year ago.(00:44) Take 2. There's somebody home inside an LLM. And if you play around while caring and being curious (rather than using it for tasks only), you'll likely notice footprints.(02:05) Take 3. It's prudent to take an interest in interesting things. And LLMs are interesting things.(03:25) Take 4. There's a surprisingly deep analogy between humans and LLMs(04:20) Examples of the kind of disanalogies I might've expected, but haven't (yet?) seen:(06:02) Human-LLM similarities I do see, instead:(06:08) Functional emotions(06:33) Repeated, useful transfer between strategies I use with humans, and strategies that help me with LLMs(08:02) Take 5. Friendship-conducive contexts are probably better for AI alignment(08:46) Why are humans more likely to attempt deep collaboration if treated fairly and kindly?(10:23) Friendship as a broad attractor basin?(10:49) Does the deep intent of today's models matter?(12:09) Concretely(14:56) Friendship isn't enough The original text contained 8 footnotes which were omitted from this narration. --- First published: April 28th, 2026 Source: https://www.lesswrong.com/posts/K8JMjE4PCqMkkCDsd/takes-from-two-months-as-an-aspiring-llm-naturalist --- Narrated by TYPE III AUDIO.

The future is going to be different from the present. Let's think about how. Specifically, our expectations about what's reasonable are downstream of our past experiences, and those experiences were downstream of our options (and the options other people in our society had). As those options change, so too do our experiences and our expectations of what's reasonable. I once thought it was reasonable to pick up the phone and call someone, and to pick up my phone when it rang; things have changed, and someone thinking about what's possible could have seen it coming. So let's try to see more things coming, and maybe that will give us the ability to choose what it will actually look like. I think lots of people's intuitions and expectations about "privacy" will be violated as technology develops, and we should try to figure out a good spot to land. This line of thinking was prompted by one of Anthropic's 'red lines' that they declined to cross, which got the Department of War mad at them: the idea of "no domestic bulk surveillance." I want to investigate that in a roundabout way, first stepping back and asking what is even possible to expect [...] The original text contained 6 footnotes which were omitted from this narration. --- First published: April 1st, 2026 Source: https://www.lesswrong.com/posts/rNpGFodLTFvhqLmK6/intelligence-dissolves-privacy --- Narrated by TYPE III AUDIO.

Written as part of the MATS 9.1 extension program, mentored by Richard Ngo. From March 9th to 15th, 2016, Go players around the world stayed up to watch their game fall to AI. Google DeepMind's AlphaGo defeated Lee Sedol, commonly understood to be the world's strongest player at the time, with a convincing 4-1 score. This event “rocked” the Go world, but its impact on the culture was initially unclear. In Chess, for instance, computers have not meaningfully automated away human jobs. Human Chess flourished as a pseudo-Esport in the internet era, whereas the yearly Computer Chess Championship is followed concurrently by no more than a few hundred nerds online. It turns out that the game's cultural and economic value comes not from the abstract beauty of top-end performance, but instead from human drama and engagement. Indeed, Go has appeared to replicate this. A commentary stream might feature a complementary AI evaluation bar to give the viewers context. A Go teacher might include some intriguing new AI variations in their lesson materials. But the cultural practice of Go seemed to remain largely unaffected. Nevertheless, nascent signs of disharmony in Europe became visible in early 2018, when the online [...] ---Outline:(09:23) AI users never find out they haven't got it.(13:36) Appendix A: No, Go players aren't getting stronger(14:41) Appendix B: Why this article exists The original text contained 2 footnotes which were omitted from this narration. --- First published: May 1st, 2026 Source: https://www.lesswrong.com/posts/nR3DkyivzF4ve97oM/how-go-players-disempower-themselves-to-ai --- Narrated by TYPE III AUDIO.

It's sort of easy to forget how close Bernie Sanders was to becoming the most powerful person in the world. The world we live in feels so much not like that place. I'm in Washington DC for the next week, and I've just finished a public appearance with Senator Sanders (should I call him Bernie? Or Sanders? or…) You won't often see me so dressed up and polished. But this is important! There are politicians who have principles and character, who really believe in doing what's right. I think you have to respect them whether you agree with their views or not, and I think Senator Bernie Sanders is one of them. Never has my belief been so validated as when I saw him start to speak, loudly, CLEARLY, publicly about the risk of human extinction from AI. It's the latest in a long line of “well, I'm clearly living in a simulation” moments. In retrospect, it's not surprising that Sanders would take a stance here. You don't have to be an expert to understand the risk from AI. You just need to care enough to spend the time looking into it, and to speak out even [...] --- First published: April 29th, 2026 Source: https://www.lesswrong.com/posts/zWfaSnxM3n5wsX9vh/on-today-s-panel-with-bernie-sanders --- Narrated by TYPE III AUDIO.

(Fragments from a research paper that will never be written) Extended Abstract. The frontier AI developers are becoming increasingly powerful and wealthy, significantly increasing their potential for risks. One concern is that of executive misalignment: when the CEO has different incentives and goals than those of the board of directors, or of humanity as a whole. Our work proposes three different threat models under which executive misalignment can lead to concrete harm. We perform two evaluations to understand the capabilities and propensities of current humans in relation to executive misalignment: First, we developed a variant of the standard SAD dataset, SAD-Executive Reasoning (SAD-ER), in order to assess the situational awareness of human CEOs on a range of behavioral tests. We find that n=6 current CEOs can (i) recognize their previous public statements, (ii) understand their roles and responsibilities, (iii) determine if an interviewer is friendly or hostile, and (iv) follow instructions that depend on self-knowledge. Second, we stress-tested the same 6 leading AI developers in hypothetical corporate environments to identify potentially risky behaviors before they cause real harm. We find that, even without explicit instructions, all 6 developers are willing to engage in strategic behavior (such as [...] The original text contained 2 footnotes which were omitted from this narration. --- First published: April 28th, 2026 Source: https://www.lesswrong.com/posts/FuauQjjbTCS5QFLk8/not-a-paper-frontier-lab-ceos-are-capable-of-in-context --- Narrated by TYPE III AUDIO.

(This was originally going to be a "quick take" but then it got a bit long. Just FYI.) There's this weird trend I perceive with the personas of LLM assistants over time. It feels like they're getting less "coherent" in a certain sense, even as the models get more capable. When I read samples from older chat-tuned models, it's striking how "mode-collapsed" they feel relative to recent models like Claude Opus 4.6 or GPT-5.4.[1] This is most straightforwardly obvious when it comes to textual style and structure: outputs from older models feel more templated and generic, with less variability in sentence/paragraph length, and have a tendency to feel as though they were written by someone who's "merely going through the motions" of conversation rather than deeply engaging with the material. There are a lot fewer of the sudden pivots you'll often see with recent models, the "wait"s and "a-ha"s and "actually, I want to try something completely different"s.[2] And I think this generalizes beyond mere style: there's a similar quality to the personality I see in the outputs. The older models can display a surprising behavioral range (relative to naive expectations based on default-assistant-basin behavior), but even across that [...] The original text contained 7 footnotes which were omitted from this narration. --- First published: April 28th, 2026 Source: https://www.lesswrong.com/posts/f5DKLsTsRRhbipH4r/llm-assistant-personas-seem-increasingly-incoherent-some --- Narrated by TYPE III AUDIO.

When reading comments, you see what other people think before you read the comment. As shown in an RCT, that information anchors your opinion, reducing your ability to form your own opinion and making the site's karma rankings less related to the comment's true value. I think the problem is fixable and float some ideas for consideration. The LessWrong interface prioritizes social information You read a comment. What information is presented, and in what order? The order of information: (1) who wrote the comment (in bold); (2) how much other people like this comment (as shown by the karma indicator); (3) how much other people agree with this comment (as shown by the agreement score); (4) the actual content. This is unwise design for a website which emphasizes truth-seeking. You don't have a chance to read the comment and form your own opinion first. However, you can opt in to hiding usernames (until moused over) via your account settings page. A 2013 RCT supports the upvote-anchoring concern From Social Influence Bias: A Randomized Experiment (Muchnik et al., 2013):[1] We therefore designed and analyzed a large-scale randomized experiment on a social news aggregation Web site to investigate whether knowledge of such aggregates [...] ---Outline:(00:30) The LessWrong interface prioritizes social information [... 6 more sections] --- First published: April 27th, 2026 Source: https://www.lesswrong.com/posts/YSsp9x8qrBucLoiWT/lesswrong-shows-you-social-signals-before-the-comment --- Narrated by TYPE III AUDIO.

In October, I wrote a post arguing that donating to Alex Bores's campaign for Congress was among the most cost-effective opportunities that I'd ever encountered. (A bit of context: Bores is a state legislator in New York who championed the RAISE Act, which was signed into law last December.[1] He's now running for Congress in New York's 12th Congressional district, which runs from about 17th Street to 100th Street in Manhattan. If elected to Congress, I think he'd be a strong champion for AI safety legislation, with a focus on catastrophic and existential risk.) It's been six months since then, and the election is just two months away (June 23rd), so I thought I'd revisit that post and give an update on my view of how things are going. How is Alex Bores doing? When I wrote my post, I expected Bores to talk little about AI during the campaign, just because it wasn't a high-salience issue to voters. But that changed in November, when Leading the Future (the AI accelerationist super PAC) declared Bores their #1 target. Since then, they've spent about $2.5 million on attack ads against him. LTF's theory of change isn't actually to [...] ---Outline:(00:54) How is Alex Bores doing?(04:02) How to help(06:02) A quick note about other opportunities The original text contained 9 footnotes which were omitted from this narration. --- First published: April 27th, 2026 Source: https://www.lesswrong.com/posts/pjSKdcBjfvjGexr6A/update-on-the-alex-bores-campaign --- Narrated by TYPE III AUDIO.

In criminal law, the prosecution and the defense each try to establish a timeline — what happened, where, when, who was involved — and thereby determine whether the defendant is actually guilty of a crime.[1] Community misconduct disputes are nothing like this. There is only rarely disagreement over facts, and even when there is, it is not the crux of the matter. Community disputes are not for litigating facts. What they are for[2] is litigating three things: 1. The character of the accused. 2. The character of the accuser. 3. The importance of the accusation, in light of points 1 & 2. I think basically all the terrible things that happen in community disputes are a result of this. When what's being ruled on is a person — their place in their community, their continued access to resources, their worth as a human being — the situation feels all-or-nothing, and often escalates out of control. This dynamic: discourages people from speaking out about their experiences, both because they may be reluctant to 'ruin the person's life' over something non-catastrophic, and because they know that they will be opening themselves up to a punishing level of scrutiny and criticism, and may [...] The original text contained 2 footnotes which were omitted from this narration. --- First published: April 22nd, 2026 Source: https://www.lesswrong.com/posts/cekDpXqjugt5Q3JnC/community-misconduct-disputes-are-not-about-facts --- Narrated by TYPE III AUDIO.

Around 10 years ago, a paper came out that arguably killed classical deep learning theory: Zhang et al.'s aptly titled Understanding deep learning requires rethinking generalization. Of course, this is a bit of an exaggeration. No single paper ever kills a field of research on its own, and deep learning theory was not exactly the most productive and healthy field at the time this was published. But if I had to point to a single paper that shattered the feeling of optimism at the time, it would be Zhang et al. 2016.[1] Caption: believe it or not, this unassuming table rocked the field of deep learning theory back in 2016, despite probably involving fewer computational resources than what Claude 4.7 Opus consumed when I clicked the “Claude” button embedded into the LessWrong editor. — Let's start by answering a question: what, exactly, do I mean by deep learning theory? At least in 2016, the answer was: “extending statistical learning theory to deep neural networks trained with SGD, in order to derive generalization bounds that would explain their behavior in practice”. — Since its conception in the mid-1980s, statistical learning theory had been the dominant approach for [...] The original text contained 2 footnotes which were omitted from this narration. --- First published: April 25th, 2026 Source: https://www.lesswrong.com/posts/ZvQfcLbcNHYqmvWyo/the-paper-that-killed-deep-learning-theory --- Narrated by TYPE III AUDIO.

Summary EA and rationalists got enamoured with forecasting and prediction markets and made them part of the culture, but this hasn't proven very useful, yet it continues to receive substantial EA funding. We should cut it off. My Experience with Forecasting For a while, I was the number one forecaster on Manifold. This lasted for about a year until I stopped just over 2 years ago. To this day, despite quitting, I'm still #8 on the platform. Additionally, I have done well on real-money prediction markets (Polymarket), earning mid-5 figures and winning a few AI bets. I say this to suggest that I would gain status from forecasting being seen as useful, but I think, to the contrary, that the EA community should stop funding it. Over the years, I've written a few comments saying that I didn't think forecasting was worth funding. You can see some of these here and here. Finally, I have gotten around to making this full post. Solution Seeking a Problem When talking about forecasting, people often ask questions like “How can we leverage forecasting into better decisions?” This is the wrong way to go about solving problems. You solve problems by starting with [...] --- First published: April 25th, 2026 Source: https://www.lesswrong.com/posts/WCutvyr9rr3cpF6hx/forecasting-is-way-overrated-and-we-should-stop-funding-it --- Narrated by TYPE III AUDIO.

When I write about things like storing food or medication in case of disaster, one common response I get is that it doesn't matter: society will break down, and people who are stronger than you will take your stuff. This seemed plausible at first, but it's actually way off. Looking at past disasters, people mostly fall somewhere on a "kind and supportive" to "keep to themselves" spectrum. When there is looting, it's typically directed at stores, not homes, and violence is mostly in the streets. Having supplies at home lets you stay out of the way. One distinction it's worth making is between short (hurricane, earthquake) and long (siege, economic collapse, famine) disasters. Having what you need at home is really helpful in both cases, but differently so. In short disasters (1917 Halifax explosion, London Blitz, 1985 Mexico City earthquake, and the 2011 Japanese earthquake and tsunami) you typically see sharing and mutual aid. With stored supplies, you're not competing for scarce resources, you have slack to help others, and you're more comfortable. Stories of looting in situations like this are often exaggerated or cherry-picked. I had heard post-Katrina New Orleans had [...] --- First published: April 23rd, 2026 Source: https://www.lesswrong.com/posts/cNnRmwzQgz4bmd5i9/your-supplies-probably-won-t-be-stolen-in-a-disaster --- Narrated by TYPE III AUDIO.

I am a busy man and will die knowing I have not said all I wanted to say. But maybe I can at least leave some IOUs behind. 1) Blatant conflicts are the best kind Ben Hoffman's "Blatant Lies are the Best Kind!" is maybe the best post title followed by the least clarifying post I have ever encountered. The title is honestly amazing, but the text of the post, instead of the straightforward argument the title promises, is an extremely dense and almost meta-fictional dialogue about the title: I think we probably should prosecute good lying more than bad lying, though of course that's tricky. I'd argue the same is true for other forms of conflict: passive aggression is worse than overt aggression, maybe, probably. I haven't written the post yet to figure it out, but it seems important to know. 2) Fire codes are the root of all evil Fire accidents seem to have the unique combination of producing extremely strong emotional responses from people in a local community, while also often being traceable to an O-ring-like failure that you can over-index on. Also, fire marshals are the closest [...] ---Outline:(00:18) 1) Blatant conflicts are the best kind(01:11) 2) Fire codes are the root of all evil(02:14) 3) It is extremely easy to get people to vouch for you, this makes public character references not very helpful(02:54) 4) Public criticism need not pass the ITT of the people critiqued(03:41) 5) Courts are amazing[... 5 more sections] --- First published: April 21st, 2026 Source: https://www.lesswrong.com/posts/MqgwHJ93pJpaeHXs6/10-posts-i-don-t-have-time-to-write --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

ControlAI's mission is to avert the extinction risks posed by superintelligent AI. We believe that in order to do this, we must secure an international prohibition on its development. We're working to make this happen through what we believe is the most natural and promising approach: helping decision-makers in governments and the public understand the risks and take action. We believe that ControlAI can achieve an international prohibition on ASI development if scaled sufficiently. We estimate that it would take approximately $50 million a year in funding to give us a concrete chance at achieving this in the next few years. To be more precise: conditional on receiving this funding in the next few months, we feel we would have a ~10% probability of success. In this post, we lay out some of the reasoning behind this estimate, and explain how additional funding past that threshold would continue to significantly improve our chances of success, with $500 million a year producing an estimated ~30% probability of success. [1] Preventing ASI 101 Negotiating, implementing and enforcing an international prohibition on ASI is, in and of itself, not the work of a single non-profit. You [...] ---Outline:(01:17) Preventing ASI 101(05:44) Awareness is the bottleneck(09:38) An asymmetric war(12:08) Scalable processes(17:32) What we'd do with $50 million or more per year(18:45) US policy advocacy(21:22) Policy advocacy in the rest of the world(23:37) Public awareness(31:15) Grassroots mobilization(32:31) Policy work(33:59) Thought-leader advocacy(36:05) Attracting and retaining the best talent(37:18) Conclusion The original text contained 28 footnotes which were omitted from this narration. --- First published: April 21st, 2026 Source: https://www.lesswrong.com/posts/TnAR5Sf5hphfnzNTr/usd50-million-a-year-for-a-10-chance-to-ban-asi-1 --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

Michael Vassar's strategy for saving the world is horrifyingly counterproductive. Olivia's is worse. A note before we start: A lot of the sources cited are people who ended up looking kinda insane. This is not a coincidence; it's apparently an explicit strategy: apply plausibly deniable psychological pressure to anyone who might speak up until they crack and discredit themselves by sounding crazy or taking extreme and destructive actions. Here's Brent Dill explaining it: (later in the conversation he tries to encourage the person he's talking to to kill herself, and threatens her with death if she posts the logs. Charming group! I hear Brent was living in Vassar's garden recently, well after he was removed from the wider community for sexual abuse.) Examples Some of the people here I knew, before their interactions with Vassar's sphere, to be not just mentally OK but unusually resilient. Prime among them is Kathy Forth. Prior to her suicide, Kathy and I were friends. I witnessed her fall from healthy and capable to anxious to paranoid: downstream of what I believe to be genuine sexual abuse, she spiralled into a narrative and way of experiencing the world where almost everyone seemed [...] The original text contained 7 footnotes which were omitted from this narration. --- First published: April 21st, 2026 Source: https://www.lesswrong.com/posts/cY7J7KSSqrhB8t3hQ/evil-is-bad-actually-vassar-and-olivia-schaefer-callout-post --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

I use AI assistance for basically all of my work, for many hours, every day. My colleagues do the same. Recent surveys suggest >50% of Americans have used AI to help with their work in the last week. My architect recently started sending me emails that were clearly ChatGPT generated.[1] Despite that, I know surprisingly little about how other people use AI assistance. Or at least how people who aren't weird AI-influencers sharing their marketing courses on Twitter or LinkedIn use AI. So here is a list of 10 concrete times I have used AI in at least mildly creative ways, and how that went. 1) Transcribe and summarize every conversation spoken in our team office Using an internal Lightcone application called "Omnilog", we have a microphone in our office that records all of our meetings, transcribes them via ElevenLabs, and uses Pyannote.ai for speaker identification. This was a bunch of work and is quite valuable, but probably a bit too annoying for most readers of this post to set up. However, the thing I am successfully using Claude Code to do is take that transcript (which often has substantial transcription and speaker-identification errors), clean it up, summarize [...] ---Outline:(00:50) 1) Transcribe and summarize every conversation spoken in our team office(01:56) 2) Try to automatically fix any simple bugs that anyone on the team has mentioned out loud, or complained about in Slack(03:13) 3) Design 20+ different design variations for nowinners.ai(04:09) 4) Review my LessWrong essays for factual accuracy and argue with me about their central thesis(05:08) 5) Remove unnecessary clauses, sentences, parentheticals and random cruft from my LessWrong posts before publishing(06:23) 6) Pair vibe-coding(08:14) 7) Mass-creating 100+ variations of Suno songs using Claude Cowork desktop control[... 3 more sections] --- First published: April 20th, 2026 Source: https://www.lesswrong.com/posts/bxdwSZYxKmPBres6w/10-non-boring-ways-i-ve-used-ai-in-the-last-month --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

I have now had a few years of experience doing architectural and interior design for many spaces that people seem to really love (the most widely known is Lighthaven, but before that we also had the Lightcone Offices, and a few years back I also had a hand in designing some of the most popular areas at Constellation). Most people (including me a few years back) have surprisingly bad introspective access into why a room makes them feel certain things. Most of the time, people's ability to describe the effect of a space on them is as shallow as "this place feels artificial", or "this place has bad vibes", or "this place feels cozy". And if they try to figure out why that is true, they quickly run into the limits of their introspective access. The most common reason a space feels bad is that it is lit by low-quality lights. Our eyes evolved to see things illuminated by sunlight. Correspondingly, it appears that the best proxy we have for whether the light in a room "works" is how similar the light in that room is to natural sunlight. The most popular way of measuring how much light differs from [...] The original text contained 2 footnotes which were omitted from this narration. --- First published: April 19th, 2026 Source: https://www.lesswrong.com/posts/dWib7qinqymfxevE4/feel-like-a-room-has-bad-vibes-the-lighting-is-probably-too --- Narrated by TYPE III AUDIO.

Or, the end of the world is no excuse for sloppy work One morning when I was nine, my dad called me over to his computer. He wanted to show me this amazing Korean scientist who had managed to clone stem cells, and who was developing treatments to let people with spinal cord injuries – people like my dad – walk again on their own two legs. I don't remember exactly what he said next, or what I said back. I have a sense that I was excited too, and that I was upset when I learned the United States had banned this kind of research. Unfortunately, his research didn't pan out. No such treatment arrived. My dad still walks on crutches. Years later, I learned that the scientist, Hwang Woo-Suk, had been exposed as a fraud. In 2004, Hwang published a paper in Science claiming that his team had cloned a human embryo and derived stem cells from it (the first time anyone had done this). A year later, in 2005, he published a second paper claiming that they had managed to repeat this feat eleven more times, producing eleven patient-specific stem cell lines for patients with type 1 [...] --- First published: April 19th, 2026 Source: https://www.lesswrong.com/posts/GNjDC6jtjr2iiE45i/quality-matters-most-when-stakes-are-highest --- Narrated by TYPE III AUDIO.

It's been about four years since Eliezer Yudkowsky published AGI Ruin: A List of Lethalities, a 43-point list of reasons the default outcome from building AGI is everyone dying. A week later, Paul Christiano replied with Where I Agree and Disagree with Eliezer, signing on to about half the list and pushing back on most of the rest. For people who were young and not in the Bay Area, like me, these essays were probably more significant than old-timers would expect. Before it became completely consumed with AI discussions, LessWrong was a forum about the art of human rationality, and most internet rationalists I knew thought of it as a mix between that and a place to write for people who liked the Sequences. It wasn't until 2022 that we were exposed to all of the doom arguments in one place, and it was the first time in many years that Eliezer had publicly stated how much more dire his assessment had become since the Sequences. As far as I can tell, AGI Ruin remains his most authoritative explanation of his views. It's not often that public intellectuals will literally hand you a document explaining why [...] ---Outline:(02:51) AGI Ruin(02:54) Section A (Setting up the problem)(12:18) Section B.1 (Distributional Shift)(22:16) Section B.2: Central difficulties of outer and inner alignment.(32:21) Section B.3: Central difficulties of sufficiently good and useful transparency / interpretability.(41:29) Section C (What is AI Safety currently doing?)(44:34) Overall Impressions The original text contained 4 footnotes which were omitted from this narration. --- First published: April 19th, 2026 Source: https://www.lesswrong.com/posts/PgJYwnN7fZKipgMz4/reevaluating-agi-ruin-in-2026 --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

[Author's note: this post is the narrative version that explains my journey with OCD and how I treated it. The short version provides quick, actionable advice for treating OCD.] The following is the most painful experience I've ever had. Four years ago in the parking lot of my rock climbing gym… …my heart was pumping out of my chest, I was sweating profusely, and an overwhelming sense of panic and impending doom had a vice grip on my soul. A painful death was surely imminent. I felt like I was defusing a bomb that was on the verge of exploding. In reality, I was standing outside my car after locking it with only one *beep* of my key fob, instead of the 5-6 *beeps* I usually do. The reason I undertook this (basically suicidal) task was that it was getting annoying how many *beeps* it was taking for my car to feel locked. It used to be only 2-3 *beeps* a few years ago. Now it was 5-6. In a few more years, it might take as many as 10-20 *beeps*. One time on a hike with friends, a sense of panic overcame me. [...] ---Outline:(00:24) The following is the most painful experience I've ever had.(07:34) OCD(12:28) Deconstructing OCD into its two parts(12:49) (1) Severe Anxiety(15:53) (2) Disordered Thoughts(18:06) My dating life(23:34) So what caused me to finally get help with my OCD?(26:22) Solutions(28:39) Panic Meditation(41:31) Three mini-examples of my improvement(41:35) A) Panic at the grocery store (and no, sadly, not Panic! At The Disco)(42:30) B) Moral OCD at the gym(43:49) C) Disordered thoughts while on a date(50:35) Where I'm at today(58:41) Further resources --- First published: April 18th, 2026 Source: https://www.lesswrong.com/posts/fgDqnwQj3AP9mKRRG/having-ocd-is-like-living-in-north-korea-here-s-how-i --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

Epistemic status: Completely schizo galaxy-brained theory Lightcone[1] operates on a "generalist" philosophy. Most of our full-time staff have the title "generalist", and in any given year they work on a wide variety of tasks, from software development on the LessWrong codebase to fixing an overflowing toilet at Lighthaven, our 30,000 sq. ft. campus. One of our core rules is that you should not delegate a task you don't know how to perform yourself. This is a very intense rule and has lots of implications for how we operate, so I've spent a lot of time watching people learn things they didn't previously know how to do. My overall observation (and why we have the rule) is that smart people can learn almost anything. Across a wide range of tasks, most of the variance in performance is explained by general intelligence (foremost) and conscientiousness (second), not expertise. Of course, if you compare yourself to someone who's done a task thousands of times, you'll lag behind for a while, but people plateau surprisingly quickly. Having worked with experts across many industries, and having dabbled in the literature around skill transfer and training, there seems to be little difference [...] The original text contained 5 footnotes which were omitted from this narration. --- First published: April 18th, 2026 Source: https://www.lesswrong.com/posts/KRLGxCaqdgrotyB8z/there-are-only-four-skills-design-technical-management-and --- Narrated by TYPE III AUDIO.