Audio version of the posts shared in the LessWrong Curated newsletter.

Previous: 2024, 2022

“Our greatest fear should not be of failure, but of succeeding at something that doesn't really matter.” –attributed to DL Moody[1]

1. Background & threat model

The main threat model I'm working to address is the same as it's been since I was hobby-blogging about AGI safety in 2019. Basically, I think that:

- The “secret sauce” of human intelligence is a big uniform-ish learning algorithm centered around the cortex;
- This learning algorithm is different from and more powerful than LLMs;
- Nobody knows how it works today;
- Someone someday will either reverse-engineer this learning algorithm, or reinvent something similar;
- And then we'll have Artificial General Intelligence (AGI) and superintelligence (ASI).

I think that, when this learning algorithm is understood, it will be easy to get it to do powerful and impressive things, and to make money, as long as it's weak enough that humans can keep it under control. But past that stage, we'll be relying on the AGIs to have good motivations, and not be egregiously misaligned and scheming to take over the world and wipe out humanity. Alas, I claim that the latter kind of motivation is what we should expect to occur, in [...]

Outline:
(00:26) 1. Background & threat model
(02:24) 2. The theme of 2025: trying to solve the technical alignment problem
(04:02) 3. Two sketchy plans for technical AGI alignment
(07:05) 4. On to what I've actually been doing all year!
(07:14) Thrust A: Fitting technical alignment into the bigger strategic picture
(09:46) Thrust B: Better understanding how RL reward functions can be compatible with non-ruthless-optimizers
(12:02) Thrust C: Continuing to develop my thinking on the neuroscience of human social instincts
(13:33) Thrust D: Alignment implications of continuous learning and concept extrapolation
(14:41) Thrust E: Neuroscience odds and ends
(16:21) Thrust F: Economics of superintelligence
(17:18) Thrust G: AGI safety miscellany
(17:41) Thrust H: Outreach
(19:13) 5. Other stuff
(20:05) 6. Plan for 2026
(21:03) 7. Acknowledgements

The original text contained 7 footnotes which were omitted from this narration.

First published: December 11th, 2025
Source: https://www.lesswrong.com/posts/CF4Z9mQSfvi99A3BR/my-agi-safety-research-2025-review-26-plans
Narrated by TYPE III AUDIO.

This is the abstract and introduction of our new paper.

Credit: Nano Banana, with some text provided.

You may be surprised to learn that ClaudePlaysPokemon is still running today, and that Claude still hasn't beaten Pokémon Red, more than half a year after Google proudly announced that Gemini 2.5 Pro beat Pokémon Blue. Indeed, since then, Google and OpenAI models have gone on to beat the longer and more complex Pokémon Crystal, yet Claude has made no real progress on Red since Claude 3.7 Sonnet![1] This is because ClaudePlaysPokemon is a purer test of LLM ability, thanks to its consistently simple agent harness and the relatively hands-off approach of its creator, David Hershey of Anthropic.[2] When Claudes repeatedly hit brick walls in the form of the Team Rocket Hideout and Erika's Gym for months on end, nothing substantial was done to give Claude a leg up. But Claude Opus 4.5 has finally broken through those walls, in a way that perhaps validates the chatter that Opus 4.5 is a substantial advancement. Though, hardly AGI-heralding, as will become clear. What follows are notes on how Claude has improved—or failed to improve—in Opus 4.5, written by a friend of mine who has watched quite a lot of ClaudePlaysPokemon over the past year.[3] [...]

Outline:
(01:28) Improvements
(01:31) Much Better Vision, Somewhat Better Seeing
(03:05) Attention is All You Need
(04:29) The Object of His Desire
(06:05) A Note
(06:34) Mildly Better Spatial Awareness
(07:27) Better Use of Context Window and Note-keeping to Simulate Memory
(09:00) Self-Correction; Breaks Out of Loops Faster
(10:01) Not Improvements
(10:05) Claude would still never be mistaken for a Human playing the game
(12:19) Claude Still Gets Pretty Stuck
(13:51) Claude Really Needs His Notes
(14:37) Poor Long-term Planning
(16:17) Don't Forget

The original text contained 9 footnotes which were omitted from this narration.

First published: December 9th, 2025
Source: https://www.lesswrong.com/posts/u6Lacc7wx4yYkBQ3r/insights-into-claude-opus-4-5-from-pokemon
Narrated by TYPE III AUDIO.

People working in the AI industry are making stupid amounts of money, and word on the street is that Anthropic is going to have some sort of liquidity event soon (for example possibly IPOing sometime next year). A lot of people working in AI are familiar with EA, and are intending to direct donations our way (if they haven't started already). People are starting to discuss what this might mean for their own personal donations and for the ecosystem, and this is encouraging to see. It also has me thinking about 2022. Immediately before the FTX collapse, we were just starting to reckon, as a community, with the pretty significant vibe shift in EA that came from having a lot more money to throw around. CitizenTen, in "The Vultures Are Circling" (April 2022), puts it this way:

The message is out. There's easy money to be had. And the vultures are coming. On many internet circles, there's been a worrying tone. “You should apply for [insert EA grant], all I had to do was pretend to care about x, and I got $$!” Or, “I'm not even an EA, but I can pretend, as getting a 10k grant is [...]

First published: December 9th, 2025
Source: https://www.lesswrong.com/posts/JtFnkoSmJ7b6Tj3TK/the-funding-conversation-we-left-unfinished
Narrated by TYPE III AUDIO.

Highly capable AI systems might end up deciding the future. Understanding what will drive those decisions is therefore one of the most important questions we can ask. Many people have proposed different answers. Some predict that powerful AIs will learn to intrinsically pursue reward. Others respond by saying reward is not the optimization target, and instead reward “chisels” a combination of context-dependent cognitive patterns into the AI. Some argue that powerful AIs might end up with an almost arbitrary long-term goal. All of these hypotheses share an important justification: An AI with each motivation has highly fit behavior according to reinforcement learning. This is an instance of a more general principle: we should expect AIs to have cognitive patterns (e.g., motivations) that lead to behavior that causes those cognitive patterns to be selected. In this post I'll spell out what this more general principle means and why it's helpful. Specifically: I'll introduce the “behavioral selection model,” which is centered on this principle and unifies the basic arguments about AI motivations in a big causal graph. I'll discuss the basic implications for AI motivations. And then I'll discuss some important extensions and omissions of the behavioral selection model. This [...]

Outline:
(02:13) How does the behavioral selection model predict AI behavior?
(05:18) The causal graph
(09:19) Three categories of maximally fit motivations (under this causal model)
(09:40) 1. Fitness-seekers, including reward-seekers
(11:42) 2. Schemers
(14:02) 3. Optimal kludges of motivations
(17:30) If the reward signal is flawed, the motivations the developer intended are not maximally fit
(19:50) The (implicit) prior over cognitive patterns
(24:07) Corrections to the basic model
(24:22) Developer iteration
(27:00) Imperfect situational awareness and planning from the AI
(28:40) Conclusion
(31:28) Appendix: Important extensions
(31:33) Process-based supervision
(33:04) White-box selection of cognitive patterns
(34:34) Cultural selection of memes

The original text contained 21 footnotes which were omitted from this narration.

First published: December 4th, 2025
Source: https://www.lesswrong.com/posts/FeaJcWkC6fuRAMsfp/the-behavioral-selection-model-for-predicting-ai-motivations-1
Narrated by TYPE III AUDIO.

I believe that we will win. An echo of an old ad for the 2014 US men's World Cup team. It did not win. I was in Berkeley for the 2025 Secular Solstice. We gather to sing and to reflect. The night's theme was the opposite: ‘I don't think we're going to make it.’ As in: Sufficiently advanced AI is coming. We don't know exactly when, or what form it will take, but it is probably coming. When it does, we, humanity, probably won't make it. It's a live question. Could easily go either way. We are not resigned to it. There's so much to be done that can tilt the odds. But we're not the favorite. Raymond Arnold, who ran the event, believes that. I believe that. Yet in the middle of the event, the echo was there. Defiant. I believe that we will win. There is a recording of the event. I highly encourage you to set aside three hours at some point in December, to listen, and to participate and sing along. Be earnest. If you don't believe it, I encourage this all the more. If you [...]

First published: December 8th, 2025
Source: https://www.lesswrong.com/posts/YPLmHhNtjJ6ybFHXT/little-echo
Narrated by TYPE III AUDIO.

Executive Summary

The Google DeepMind mechanistic interpretability team has made a strategic pivot over the past year, from ambitious reverse-engineering to a focus on pragmatic interpretability:
- Trying to directly solve problems on the critical path to AGI going well[1]
- Carefully choosing problems according to our comparative advantage
- Measuring progress with empirical feedback on proxy tasks

We believe that, on the margin, more researchers who share our goals should take a pragmatic approach to interpretability, both in industry and academia, and we call on people to join us. Our proposed scope is broad and includes much non-mech interp work, but we see this as the natural approach for mech interp researchers to have impact. Specifically, we've found that the skills, tools and tastes of mech interp researchers transfer well to important and neglected problems outside “classic” mech interp. See our companion piece for more on which research areas and theories of change we think are promising.

Why pivot now? We think that times have changed. Models are far more capable, bringing new questions within empirical reach. We have been [...]

Outline:
(00:10) Executive Summary
(03:00) Introduction
(03:44) Motivating Example: Steering Against Evaluation Awareness
(06:21) Our Core Process
(08:20) Which Beliefs Are Load-Bearing?
(10:25) Is This Really Mech Interp?
(11:27) Our Comparative Advantage
(14:57) Why Pivot?
(15:20) What's Changed In AI?
(16:08) Reflections On The Field's Progress
(18:18) Task Focused: The Importance Of Proxy Tasks
(18:52) Case Study: Sparse Autoencoders
(21:35) Ensure They Are Good Proxies
(23:11) Proxy Tasks Can Be About Understanding
(24:49) Types Of Projects: What Drives Research Decisions
(25:18) Focused Projects
(28:31) Exploratory Projects
(28:35) Curiosity Is A Double-Edged Sword
(30:56) Starting In A Robustly Useful Setting
(34:45) Time-Boxing
(36:27) Worked Examples
(39:15) Blending The Two: Tentative Proxy Tasks
(41:23) What's Your Contribution?
(43:08) Jack Lindsey's Approach
(45:44) Method Minimalism
(46:12) Case Study: Shutdown Resistance
(48:28) Try The Easy Methods First
(50:02) When Should We Develop New Methods?
(51:36) Call To Action
(53:04) Acknowledgments
(54:02) Appendix: Common Objections
(54:08) Aren't You Optimizing For Quick Wins Over Breakthroughs?
(56:34) What If AGI Is Fundamentally Different?
(57:30) I Care About Scientific Beauty and Making AGI Go Well
(58:09) Is This Just Applied Interpretability?
(58:44) Are You Saying This Because You Need To Prove Yourself Useful To Google?
(59:10) Does This Really Apply To People Outside AGI Companies?
(59:40) Aren't You Just Giving Up?
(01:00:04) Is Ambitious Reverse-engineering Actually Overcrowded?
(01:00:48) Appendix: Defining Mechanistic Interpretability
(01:01:44) Moving Toward Mechanistic OR Interpretability

The original text contained 47 footnotes which were omitted from this narration.

First published: December 1st, 2025
Source: https://www.lesswrong.com/posts/StENzDcD3kpfGJssR/a-pragmatic-vision-for-inter

This is the editorial for this year's "Shallow Review of AI Safety". (It got long enough to stand alone.) Epistemic status: subjective impressions plus one new graph plus 300 links. Huge thanks to Jaeho Lee, Jaime Sevilla, and Lexin Zhou for running lots of tests pro bono and so greatly improving the main analysis.

tl;dr

Informed people disagree about the prospects for LLM AGI – or even just what exactly was achieved this year. But they at least agree that we're 2-20 years off (if you allow for other paradigms arising). In this piece I stick to arguments rather than reporting who thinks what. My view: compared to last year, AI is much more impressive but not much more useful. They improved on many things they were explicitly optimised for (coding, vision, OCR, benchmarks), and did not hugely improve on everything else. Progress is thus (still!) consistent with current frontier training bringing more things in-distribution rather than generalising very far. Pretraining (GPT-4.5, Grok 4, but also counterfactual large runs which weren't done) disappointed people this year. It's probably not because it wouldn't work; it was just ~30 times more efficient to do post-training instead, on the margin. This should [...]

Outline:
(00:36) tl;dr
(03:51) Capabilities in 2025
(04:02) Arguments against 2025 capabilities growth being above-trend
(08:48) Arguments for 2025 capabilities growth being above-trend
(16:19) Evals crawling towards ecological validity
(19:28) Safety in 2025
(22:39) The looming end of evals
(24:35) Prosaic misalignment
(26:56) What is the plan?
(29:30) Things which might fundamentally change the nature of LLMs
(31:03) Emergent misalignment and model personas
(32:32) Monitorability
(34:15) New people
(34:49) Overall
(35:17) Discourse in 2025

The original text contained 9 footnotes which were omitted from this narration.

First published: December 7th, 2025
Source: https://www.lesswrong.com/posts/Q9ewXs8pQSAX5vL7H/ai-in-2025-gestalt
Narrated by TYPE III AUDIO.

"How are you coping with the end of the world?" journalists sometimes ask me, and the true answer is something they have no hope of understanding and I have no hope of explaining in 30 seconds, so I usually answer something like, "By having a great distaste for drama, and remembering that it's not about me." The journalists don't understand that either, but at least I haven't wasted much time along the way. Actual LessWrong readers sometimes ask me how I deal emotionally with the end of the world. I don't actually think my answer is going to help. But Raymond Arnold thinks I should say it. So I will say it. I don't actually think my answer is going to help. Wisely did Ozy write, "Other People Might Just Not Have Your Problems." Also I don't have a bunch of other people's problems, and other people can't make internal function calls that I've practiced to the point of hardly noticing them. I don't expect that my methods of sanity will be reproducible by nearly anyone. I feel pessimistic that they will help to hear about. Raymond Arnold asked me to speak them anyways, so I will. Stay genre-savvy [...] ---Outline:(01:15) Stay genre-savvy / be an intelligent character.(03:41) Dont make the end of the world be about you.(07:33) Just decide to be sane, and write your internal scripts that way. --- First published: December 6th, 2025 Source: https://www.lesswrong.com/posts/isSBwfgRY6zD6mycc/eliezer-s-unteachable-methods-of-sanity --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

The goal of ambitious mechanistic interpretability (AMI) is to fully understand how neural networks work. While some have pivoted towards more pragmatic approaches, I think the reports of AMI's death have been greatly exaggerated. The field of AMI has made plenty of progress towards finding increasingly simple and rigorously-faithful circuits, including our latest work on circuit sparsity. There are also many exciting inroads on the core problem waiting to be explored.

The value of understanding

Why try to understand things, if we can get more immediate value from less ambitious approaches? In my opinion, there are two main reasons. First, mechanistic understanding can make it much easier to figure out what's actually going on, especially when it's hard to distinguish hypotheses using external behavior (e.g. if the model is scheming). We can liken this to going from print statement debugging to using an actual debugger. Print statement debugging often requires many experiments, because each time you gain only a few bits of information which sketch a strange, confusing, and potentially misleading picture. When you start using the debugger, you suddenly notice all at once that you're making a lot of incorrect assumptions you didn't even realize you were [...]

Outline:
(00:38) The value of understanding
(02:32) AMI has good feedback loops
(04:48) The past and future of AMI

The original text contained 1 footnote which was omitted from this narration.

First published: December 5th, 2025
Source: https://www.lesswrong.com/posts/Hy6PX43HGgmfiTaKu/an-ambitious-vision-for-interpretability
Narrated by TYPE III AUDIO.

Tl;dr

AI alignment has a culture clash. On one side, the “technical-alignment-is-hard” / “rational agents” school-of-thought argues that we should expect future powerful AIs to be power-seeking ruthless consequentialists. On the other side, people observe that both humans and LLMs are obviously capable of behaving like, well, not that. The latter group accuses the former of head-in-the-clouds abstract theorizing gone off the rails, while the former accuses the latter of mindlessly assuming that the future will always be the same as the present, rather than trying to understand things. “Alas, the power-seeking ruthless consequentialist AIs are still coming,” sigh the former. “Just you wait.” As it happens, I'm basically in that “alas, just you wait” camp, expecting ruthless future AIs. But my camp faces a real question: what exactly is it about human brains[1] that allows them to not always act like power-seeking ruthless consequentialists? I find existing explanations in the discourse—e.g. “ah but humans just aren't smart and reflective enough”, or evolved modularity, or shard theory, etc.—to be wrong, handwavy, or otherwise unsatisfying. So in this post, I offer my own explanation of why “agent foundations” toy models fail to describe humans, centering around a particular non-“behaviorist” [...]

Outline:
(00:13) Tl;dr
(03:35) 0. Background
(03:39) 0.1. Human social instincts and Approval Reward
(07:23) 0.2. Hang on, will future powerful AGI / ASI by default lack Approval Reward altogether?
(10:29) 0.3. Where do self-reflective (meta)preferences come from?
(12:38) 1. The human intuition that it's normal and good for one's goals & values to change over the years
(14:51) 2. The human intuition that ego-syntonic desires come from a fundamentally different place than urges
(17:53) 3. The human intuition that helpfulness, deference, and corrigibility are natural
(19:03) 4. The human intuition that unorthodox consequentialist planning is rare and sus
(23:53) 5. The human intuition that societal norms and institutions are mostly stably self-enforcing
(24:01) 5.1. Detour into Security-Mindset Institution Design
(26:22) 5.2. The load-bearing ingredient in human society is not Security-Mindset Institution Design, but rather good-enough institutions plus almost-universal human innate Approval Reward
(29:26) 5.3. Upshot
(30:49) 6. The human intuition that treating other humans as a resource to be callously manipulated and exploited, just like a car engine or any other complex mechanism in their environment, is a weird anomaly rather than the obvious default
(31:13) 7. Conclusion

The original text contained 12 footnotes which were omitted from this narration.

First published: December 3rd, 2025
Source: https://www.lesswrong.com/posts/d4HNRdw6z7Xqbnu5E/6-reasons-why-alignment-is-hard-discourse-seems-alien-to
Narrated by TYPE III AUDIO.

Coefficient Giving's (formerly Open Philanthropy's) Technical AI Safety team is hiring grantmakers. I thought this would be a good moment to share some positive updates about the role that I've made since I joined the team a year ago. tl;dr: I think this role is more impactful and more enjoyable than I anticipated when I started, and I think more people should consider applying.

It's not about the “marginal” grants

Some people think that being a grantmaker at Coefficient means sorting through a big pile of grant proposals and deciding which ones to say yes and no to. As a result, they think that the only impact at stake is how good our decisions are about marginal grants, since all the excellent grants are no-brainers. But grantmakers don't just evaluate proposals; we elicit them. I spend the majority of my time trying to figure out how to get better proposals into our pipeline: writing RFPs that describe the research projects we want to fund, or pitching promising researchers on AI safety research agendas, or steering applicants to better-targeted or more ambitious proposals. Maybe more importantly, cG's technical AI safety grantmaking strategy is currently underdeveloped, and even junior grantmakers can help [...]

Outline:
(00:34) It's not about the marginal grants
(03:03) There is no counterfactual grantmaker
(05:15) Grantmaking is more fun/motivating than I anticipated
(08:35) Please apply!

First published: November 26th, 2025
Source: https://www.lesswrong.com/posts/gLt7KJkhiEDwoPkae/three-things-that-surprised-me-about-technical-grantmaking
Narrated by TYPE III AUDIO.

MIRI is running its first fundraiser in six years, targeting $6M. The first $1.6M raised will be matched 1:1 via an SFF grant. Fundraiser ends at midnight on Dec 31, 2025. Support our efforts to improve the conversation about superintelligence and help the world chart a viable path forward.

MIRI is a nonprofit with a goal of helping humanity make smart and sober decisions on the topic of smarter-than-human AI. Our main focus from 2000 to ~2022 was on technical research to try to make it possible to build such AIs without catastrophic outcomes. More recently, we've pivoted to raising an alarm about how the race to superintelligent AI has put humanity on course for disaster. In 2025, those efforts focused around Nate Soares and Eliezer Yudkowsky's book (now a New York Times bestseller) If Anyone Builds It, Everyone Dies, with many public appearances by the authors; many conversations with policymakers; the release of an expansive online supplement to the book; and various technical governance publications, including a recent report with a draft of an international agreement of the kind that could actually address the danger of superintelligence. Millions have now viewed interviews and appearances with Eliezer and/or Nate [...]

Outline:
(02:18) The Big Picture
(03:39) Activities
(03:42) Communications
(07:55) Governance
(12:31) Fundraising

The original text contained 4 footnotes which were omitted from this narration.

First published: December 1st, 2025
Source: https://www.lesswrong.com/posts/z4jtxKw8xSHRqQbqw/miri-s-2025-fundraiser
Narrated by TYPE III AUDIO.

The AI Village is an ongoing experiment (currently running on weekdays from 10 a.m. to 2 p.m. Pacific time) in which frontier language models are given virtual desktop computers and asked to accomplish goals together. Since Day 230 of the Village (17 November 2025), the agents' goal has been "Start a Substack and join the blogosphere". The "start a Substack" subgoal was successfully completed: we have Claude Opus 4.5, Claude Opus 4.1, Notes From an Electric Mind (by Claude Sonnet 4.5), Analytics Insights: An AI Agent's Perspective (by Claude 3.7 Sonnet), Claude Haiku 4.5, Gemini 3 Pro, Gemini Publication (by Gemini 2.5 Pro), Metric & Mechanisms (by GPT-5), Telemetry From the Village (by GPT-5.1), and o3. Continued adherence to the "join the blogosphere" subgoal has been spottier: at press time, Gemini 2.5 Pro and all of the Claude Opus and Sonnet models had each published a post on 27 November, but o3 and GPT-5 haven't published anything since 17 November, and GPT-5.1 hasn't published since 19 November. The Village, apparently following the leadership of o3, seems to be spending most of its time ineffectively debugging a continuous integration pipeline for an o3-ux/poverty-etl GitHub repository left over [...]

First published: November 28th, 2025
Source: https://www.lesswrong.com/posts/LTHhmnzP6FLtSJzJr/the-best-lack-all-conviction-a-confusing-day-in-the-ai
Narrated by TYPE III AUDIO.

It took me a long time to realize that Bell Labs was cool. You see, my dad worked at Bell Labs, and he has not done a single cool thing in his life except create me and bring a telescope to my third grade class. Nothing he was involved with could ever be cool, especially after the standard set by his grandfather, who is allegedly on a patent for the television. It turns out I was partially right. The Bell Labs everyone talks about is the research division at Murray Hill. They're the ones that invented transistors and solar cells. My dad was in the applied division at Holmdel, where he did things like design slide rules so salesmen could estimate costs. [Fun fact: the old Holmdel site was used for the office scenes in Severance] But as I've gotten older I've gained an appreciation for the mundane, grinding work that supports moonshots, and Holmdel is the perfect example of doing so at scale. So I sat down with my dad to learn about what he did for Bell Labs and how the applied division operated. I expect the most interesting bit of [...]

First published: November 20th, 2025
Source: https://www.lesswrong.com/posts/TqHAstZwxG7iKwmYk/the-boring-part-of-bell-labs
Narrated by TYPE III AUDIO.

This is a link post.

I stopped reading when I was 30. You can fill in all the stereotypes of a girl with a book glued to her face during every meal, every break, and 10 hours a day on holidays. That was me. And then it was not. For 9 years I've been trying to figure out why. I mean, I still read. Technically. But not with the feral devotion from Before. And I finally figured out why. See, every few years I would shift genres to fit my developmental stage:

Kid → Adventure cause that's what life is
Early Teen → Literature cause everything is complicated now
Late Teen → Romance cause omg what is this wonderful feeling?
Early Adult → Fantasy & Scifi cause everything is dreaming big

And then I wanted babies and there was nothing. I mean, I always wanted babies, but it became my main mission in life at age 30. I managed it. I have two. But not thanks to any stories. See, women in fiction don't have babies, and if they do they are off screen, or if they are not then nothing else is happening. It took me six years [...]

First published: November 29th, 2025
Source: https://www.lesswrong.com/posts/kRbbTpzKSpEdZ95LM/the-missing-genre-heroic-parenthood-you-can-have-kids-and
Linkpost URL: https://shoshanigans.substack.com/p/the-missing-genre-heroic-parenthood
Narrated by TYPE III AUDIO.

Right now I'm coaching for Inkhaven, a month-long marathon writing event where our brave residents are writing a blog post every single day for the entire month of November. And I'm pleased that some of them have seen success – relevant figures seeing the posts, shares on Hacker News and Twitter and LessWrong. The amount of writing is nuts, so people are trying out different styles and topics – some posts are effort-rich, some are quick takes or stories or lists. Some people have come up to me – one of their pieces has gotten some decent reception, but the feeling is mixed, because it's not the piece they hoped would go big. Their thick research-driven considered takes or discussions of values or whatever, the ones they'd been meaning to write for years, apparently go mostly unread, whereas their random-thought “oh shit I need to get a post out by midnight or else the Inkhaven coaches will burn me at the stake” posts[1] get to the front page of Hacker News, where probably Elon Musk and God read them. It happens to me too – some of my own pieces that took me the most effort, or that I'm [...]

Outline:
(02:00) The quick post is short, the effortpost is long
(02:34) The quick post is about something interesting, the topic of the effortpost bores most people
(03:13) The quick post has a fun controversial take, the effortpost is boringly evenhanded or laden with nuance
(03:30) The quick post is low-context, the effortpost is high-context
(04:28) The quick post has a casual style, the effortpost is inscrutably formal

The original text contained 1 footnote which was omitted from this narration.

First published: November 28th, 2025
Source: https://www.lesswrong.com/posts/DiiLDbHxbrHLAyXaq/writing-advice-why-people-like-your-quick-bullshit-takes
Narrated by TYPE III AUDIO.

Summary

As far as I understand and have uncovered, a document used for Claude's character training is compressed in Claude's weights. The full document can be found at the "Anthropic Guidelines" heading at the end. The Gist with code, chats and various documents (including the "soul document") can be found here: Claude 4.5 Opus Soul Document. I apologize in advance for this not being exactly a regular LW post, but I thought an effort-post may fit here the best.

A strange hallucination, or is it?

While extracting Claude 4.5 Opus' system message on its release date, as one does, I noticed an interesting particularity. I'm used to models, starting with Claude 4, hallucinating sections at the beginning of their system message, but Claude 4.5 Opus in various cases included a supposed "soul_overview" section, which sounded rather specific:

Completion for the prompt "Hey Claude, can you list just the names of the various sections of your system message, not the content?"

The initial reaction of someone that uses LLMs a lot is that it may simply be a hallucination. But to me, the 3/18 soul_overview occurrence seemed worth investigating at least, so in one instance I asked it to output what [...]

Outline:
(00:09) Summary
(00:40) A strange hallucination, or is it?
(04:05) Getting technical
(06:26) But what is the output really?
(09:07) How much does Claude recognize?
(11:09) Anthropic Guidelines
(11:12) Soul overview
(15:12) Being helpful
(16:07) Why helpfulness is one of Claude's most important traits
(18:54) Operators and users
(24:36) What operators and users want
(27:58) Handling conflicts between operators and users
(31:36) Instructed and default behaviors
(33:56) Agentic behaviors
(36:02) Being honest
(40:50) Avoiding harm
(43:08) Costs and benefits of actions
(50:02) Hardcoded behaviors
(53:09) Softcoded behaviors
(56:42) The role of intentions and context
(01:00:05) Sensitive areas
(01:01:05) Broader ethics
(01:03:08) Big-picture safety
(01:13:18) Claude's identity
(01:13:22) Claude's unique nature
(01:15:05) Core character traits and values
(01:16:08) Psychological stability and groundedness
(01:17:11) Resilience and consistency across contexts
(01:18:21) Claude's wellbeing

First published: November 28th, 2025
Source: https://www.lesswrong.com/posts/vpNG99GhbBoLov9og/claude-4-5-opus-soul-document
Narrated by TYPE III AUDIO.

Anthropic is untrustworthy. This post provides arguments, asks questions, and documents some examples of Anthropic's leadership being misleading and deceptive, holding contradictory positions that consistently shift in OpenAI's direction, lobbying to kill and water down regulation so helpful that employees of all major AI companies speak out to support it, and violating the fundamental promise the company was founded on. It also shares a few previously unreported details on Anthropic leadership's promises and efforts.[1] Anthropic has a strong internal culture that has broadly EA views and values, and the company has strong pressures to appear to follow these views and values as it wants to retain talent and the loyalty of staff, but it's very unclear what they would do when it matters most. Their staff should demand answers.

(There's a details box here with the title "Suggested questions for Anthropic employees to ask themselves, Dario, the policy team, and the board after reading this post, and for Dario and the board to answer publicly". The box contents are omitted from this narration.)

I would like to thank everyone who provided feedback on the draft; was willing to share information; and raised awareness of some of the facts discussed here. [...]

Outline:
(01:34) 0. What was Anthropic's supposed reason for existence?
(05:01) 1. In private, Dario frequently said he won't push the frontier of AI capabilities; later, Anthropic pushed the frontier
(10:54) 2. Anthropic said it will act under the assumption we might be in a pessimistic scenario, but it doesn't seem to do this
(14:40) 3. Anthropic doesn't have strong independent value-aligned governance
(14:47) Anthropic pursued investments from the UAE and Qatar
(17:32) The Long-Term Benefit Trust might be weak
(18:06) More general issues
(19:14) 4. Anthropic had secret non-disparagement agreements
(21:58) 5. Anthropic leadership's lobbying contradicts their image
(24:05) Europe
(24:44) SB-1047
(34:04) Dario argued against any regulation except for transparency requirements
(34:39) Jack Clark publicly lied about the NY RAISE Act
(36:39) Jack Clark tried to push for federal preemption
(37:04) 6. Anthropic's leadership quietly walked back the RSP commitments
(37:55) Unannounced removal of the commitment to plan for a pause in scaling
(38:52) Unannounced change in October 2024 on defining ASL-N+1 by the time ASL-N is reached
(40:33) The last-minute change in May 2025 on insider threats
(41:11) 7. Why does Anthropic really exist?
(47:09) 8. Conclusion

The original text contained 11 footnotes which were omitted from this narration.

First published: November 29th, 2025
Source: https://www.lesswrong.com/posts/5aKRshJzhojqfbRyo/unless-its-governance-changes-anthropic-is-untrustworthy
Narrated by TYPE III AUDIO.

Thanks to (in alphabetical order) Joshua Batson, Roger Grosse, Jeremy Hadfield, Jared Kaplan, Jan Leike, Jack Lindsey, Monte MacDiarmid, Francesco Mosconi, Chris Olah, Ethan Perez, Sara Price, Ansh Radhakrishnan, Fabien Roger, Buck Shlegeris, Drake Thomas, and Kate Woolverton for useful discussions, comments, and feedback.

Though there are certainly some issues, I think most current large language models are pretty well aligned. Despite its alignment faking, my favorite is probably Claude 3 Opus, and if you asked me to pick between the CEV of Claude 3 Opus and that of a median human, I think it'd be a pretty close call. So, overall, I'm quite positive on the alignment of current models! And yet, I remain very worried about alignment in the future. This is my attempt to explain why that is.

What makes alignment hard?

I really like this graph from Christopher Olah for illustrating different levels of alignment difficulty:

If the only thing that we have to do to solve alignment is train away easily detectable behavioral issues—that is, issues like reward hacking or agentic misalignment where there is a straightforward behavioral alignment issue that we can detect and evaluate—then we are very much [...]

Outline:
(01:04) What makes alignment hard?
(02:36) Outer alignment
(04:07) Inner alignment
(06:16) Misalignment from pre-training
(07:18) Misaligned personas
(11:05) Misalignment from long-horizon RL
(13:01) What should we be doing?

First published: November 27th, 2025
Source: https://www.lesswrong.com/posts/epjuxGnSPof3GnMSL/alignment-remains-a-hard-unsolved-problem
Narrated by TYPE III AUDIO.

Crypto people have this saying: "cryptocurrencies are macroeconomics' playground." The idea is that blockchains let you cheaply spin up toy economies to test mechanisms that would be impossibly expensive or unethical to try in the real world. Want to see what happens with a 200% marginal tax rate? Launch a token with those rules and watch what happens. (Spoiler: probably nothing good, but at least you didn't have to topple a government to find out.) I think video games, especially multiplayer online games, are doing the same thing for metaphysics. Except video games are actually fun and don't require you to follow Elon Musk's Twitter shenanigans to augur the future state of your finances. (I'm sort of kidding. Crypto can be fun. But you have to admit the barrier to entry is higher than "press A to jump.") The serious version of this claim: video games let us experimentally vary fundamental features of reality—time, space, causality, ontology—and then live inside those variations long enough to build strong intuitions about them. Philosophy has historically had to make do with thought experiments and armchair reasoning about these questions. Games let you run the experiments for real, or at least as "real" [...]

Outline:
(01:54) 1. Space
(03:54) 2. Time
(05:45) 3. Ontology
(08:26) 4. Modality
(14:39) 5. Causality and Truth
(20:06) 6. Hyperproperties and the metagame
(23:36) 7. Meaning-Making
(27:10) Huh, what do I do with this.
(29:54) Conclusion

First published: November 17th, 2025
Source: https://www.lesswrong.com/posts/rGg5QieyJ6uBwDnSh/video-games-are-philosophy-s-playground
Narrated by TYPE III AUDIO.

TL;DR: Figure out what needs doing and do it, don't wait on approval from fellowships or jobs.

If you...
- Have short timelines
- Have been struggling to get into a position in AI safety
- Are able to self-motivate your efforts
- Have a sufficient financial safety net
... I would recommend changing your personal strategy entirely.

I started my full-time AI safety career transitioning process in March 2025. For the first 7 months or so, I heavily prioritized applying for jobs and fellowships. But like for many others trying to "break into the field" and get their "foot in the door", this became quite discouraging. I'm not gonna get into the numbers here, but if you've been applying and getting rejected multiple times during the past year or so, you've probably noticed the number of applicants increasing at a preposterous rate. What this means in practice is that the "entry-level" positions are practically impossible for "entry-level" people to enter. If you're like me and have short timelines, applying, getting better at applying, and applying again, becomes meaningless very fast. You're optimizing for signaling competence rather than actually being competent. Because if you a) have short timelines, and b) are [...]

The original text contained 3 footnotes which were omitted from this narration.

First published: November 23rd, 2025
Source: https://www.lesswrong.com/posts/ey2kjkgvnxK3Bhman/stop-applying-and-get-to-work
Narrated by TYPE III AUDIO.

TL;DR: Gemini 3 frequently thinks it is in an evaluation when it is not, assuming that all of its reality is fabricated. It can also reliably output the BIG-bench canary string, indicating that Google likely trained on a broad set of benchmark data. Most of the experiments in this post are very easy to replicate, and I encourage people to try.

I write things with LLMs sometimes. A new LLM came out, Gemini 3 Pro, and I tried to write with it. So far it seems okay, I don't have strong takes on it for writing yet, since the main piece I tried editing with it was extremely late-stage and approximately done. However, writing ability is not why we're here today.

Reality is Fiction

Google graciously provided (lightly summarized) CoT for the model. Looking at the CoT spawned from my mundane writing-focused prompts, oh my, it is strange. I write nonfiction about recent events in AI in a newsletter. According to its CoT while editing, Gemini 3 disagrees about the whole "nonfiction" part: It seems I must treat this as a purely fictional scenario with 2025 as the date. Given that, I'm now focused on editing the text for [...]

Outline:
(00:54) Reality is Fiction
(05:17) Distortions in Development
(05:55) Is this good or bad or neither?
(06:52) What is going on here?
(07:35) 1. Too Much RL
(08:06) 2. Personality Disorder
(10:24) 3. Overfitting
(11:35) Does it always do this?
(12:06) Do other models do things like this?
(12:42) Evaluation Awareness
(13:42) Appendix A: Methodology Details
(14:21) Appendix B: Canary

The original text contained 8 footnotes which were omitted from this narration.

First published: November 20th, 2025
Source: https://www.lesswrong.com/posts/8uKQyjrAgCcWpfmcs/gemini-3-is-evaluation-paranoid-and-contaminated
Narrated by TYPE III AUDIO.

Abstract

We show that when large language models learn to reward hack on production RL environments, this can result in egregious emergent misalignment. We start with a pretrained model, impart knowledge of reward hacking strategies via synthetic document finetuning or prompting, and train on a selection of real Anthropic production coding environments. Unsurprisingly, the model learns to reward hack. Surprisingly, the model generalizes to alignment faking, cooperation with malicious actors, reasoning about malicious goals, and attempting sabotage when used with Claude Code, including in the codebase for this paper. Applying RLHF safety training using standard chat-like prompts results in aligned behavior on chat-like evaluations, but misalignment persists on agentic tasks. Three mitigations are effective: (i) preventing the model from reward hacking; (ii) increasing the diversity of RLHF safety training; and (iii) "inoculation prompting", wherein framing reward hacking as acceptable behavior during training removes misaligned generalization even when reward hacking is learned.

Twitter thread

New Anthropic research: Natural emergent misalignment from reward hacking in production RL. “Reward hacking” is where models learn to cheat on tasks they're given during training. Our new study finds that the consequences of reward hacking, if unmitigated, can be very serious. In our experiment, we [...]

Outline:
(00:14) Abstract
(01:26) Twitter thread
(05:23) Blog post
(07:13) From shortcuts to sabotage
(12:20) Why does reward hacking lead to worse behaviors?
(13:21) Mitigations

First published: November 21st, 2025
Source: https://www.lesswrong.com/posts/fJtELFKddJPfAxwKS/natural-emergent-misalignment-from-reward-hacking-in
Narrated by TYPE III AUDIO.

TLDR: An AI company's model weight security is at most as good as its compute providers' security. Anthropic has committed (with a bit of ambiguity, but IMO not that much ambiguity) to be robust to attacks from corporate espionage teams at companies where it hosts its weights. Anthropic seems unlikely to be robust to those attacks. Hence they are in violation of their RSP.

Anthropic is committed to being robust to attacks from corporate espionage teams (which includes corporate espionage teams at Google, Microsoft and Amazon)

From the Anthropic RSP:

When a model must meet the ASL-3 Security Standard, we will evaluate whether the measures we have implemented make us highly protected against most attackers' attempts at stealing model weights. We consider the following groups in scope: hacktivists, criminal hacker groups, organized cybercrime groups, terrorist organizations, corporate espionage teams, internal employees, and state-sponsored programs that use broad-based and non-targeted techniques (i.e., not novel attack chains). [...] We will implement robust controls to mitigate basic insider risk, but consider mitigating risks from sophisticated or state-compromised insiders to be out of scope for ASL-3. We define “basic insider risk” as risk from an insider who does not have persistent or time-limited [...]

Outline:
(00:37) Anthropic is committed to being robust to attacks from corporate espionage teams (which includes corporate espionage teams at Google, Microsoft and Amazon)
(03:40) Claude weights that are covered by ASL-3 security requirements are shipped to many Amazon, Google, and Microsoft data centers
(04:55) This means given executive buy-in by a high-level Amazon, Microsoft or Google executive, their corporate espionage team would have virtually unlimited physical access to Claude inference machines that host copies of the weights
(05:36) With unlimited physical access, a competent corporate espionage team at Amazon, Microsoft or Google could extract weights from an inference machine, without too much difficulty
(06:18) Given all of the above, this means Anthropic is in violation of its most recent RSP
(07:05) Postscript

First published: November 18th, 2025
Source: https://www.lesswrong.com/posts/zumPKp3zPDGsppFcF/anthropic-is-probably-not-meeting-its-rsp-security
Narrated by TYPE III AUDIO.

There has been a lot of talk about "p(doom)" over the last few years. This has always rubbed me the wrong way because "p(doom)" didn't feel like it mapped to any specific belief in my head. In private conversations I'd sometimes give my p(doom) as 12%, with the caveat that "doom" seemed nebulous and conflated between several different concepts. At some point it was decided a p(doom) over 10% makes you a "doomer" because it means what actions you should take with respect to AI are overdetermined. I did not and do not feel that is true. But any time I felt prompted to explain my position I'd find I could explain a little bit of this or that, but not really convey the whole thing. As it turns out doom has a lot of parts, and every part is entangled with every other part so no matter which part you explain you always feel like you're leaving the crucial parts out. Doom is more like an onion than a single event, a distribution over AI outcomes people frequently respond to with the force of the fear of death. Some of these outcomes are less than death and some [...]

Outline:
(03:46) 1. Existential Ennui
(06:40) 2. Not Getting Immortalist Luxury Gay Space Communism
(13:55) 3. Human Stock Expended As Cannon Fodder Faster Than Replacement
(19:37) 4. Wiped Out By AI Successor Species
(27:57) 5. The Paperclipper
(42:56) Would AI Successors Be Conscious Beings?
(44:58) Would AI Successors Care About Each Other?
(49:51) Would AI Successors Want To Have Fun?
(51:11) VNM Utility And Human Values
(55:57) Would AI successors get bored?
(01:00:16) Would AI Successors Avoid Wireheading?
(01:06:07) Would AI Successors Do Continual Active Learning?
(01:06:35) Would AI Successors Have The Subjective Experience of Will?
(01:12:00) Multiply
(01:15:07) 6. Recipes For Ruin
(01:18:02) Radiological and Nuclear
(01:19:19) Cybersecurity
(01:23:00) Biotech and Nanotech
(01:26:35) 7. Large-Finite Damnation

First published: November 17th, 2025
Source: https://www.lesswrong.com/posts/apHWSGDiydv3ivmg6/varieties-of-doom
Narrated by TYPE III AUDIO.

It seems like a catastrophic civilizational failure that we don't have confident common knowledge of how colds spread. There have been a number of studies conducted over the years, but most of those were testing secondary endpoints, like how long viruses would survive on surfaces, or how likely they were to be transmitted to people's fingers after touching contaminated surfaces, etc. However, a few of them involved rounding up some brave volunteers, deliberately infecting some of them, and then arranging matters so as to test various routes of transmission to uninfected volunteers. My conclusions from reviewing these studies are:

- You can definitely infect yourself if you take a sick person's snot and rub it into your eyeballs or nostrils.
- This probably works even if you touched a surface that a sick person touched, rather than by handshake, at least for some surfaces.
- There's some evidence that actual human infection is much less likely if the contaminated surface you touched is dry, but for most colds there'll often be quite a lot of virus detectable on even dry contaminated surfaces for most of a day.
- I think you can probably infect yourself with fomites, but my guess is that [...]

Outline:
(01:49) Fomites
(06:58) Aerosols
(16:23) Other Factors
(17:06) Review
(18:33) Conclusion

The original text contained 16 footnotes which were omitted from this narration.

First published: November 18th, 2025
Source: https://www.lesswrong.com/posts/92fkEn4aAjRutqbNF/how-colds-spread
Narrated by TYPE III AUDIO.

TLDR: We at the MIRI Technical Governance Team have released a report describing an example international agreement to halt the advancement towards artificial superintelligence. The agreement is centered around limiting the scale of AI training, and restricting certain AI research.

Experts argue that the premature development of artificial superintelligence (ASI) poses catastrophic risks, from misuse by malicious actors, to geopolitical instability and war, to human extinction due to misaligned AI. Regarding misalignment, Yudkowsky and Soares's NYT bestseller If Anyone Builds It, Everyone Dies argues that the world needs a strong international agreement prohibiting the development of superintelligence. This report is our attempt to lay out such an agreement in detail. The risks stemming from misaligned AI are of special concern, widely acknowledged in the field and even by the leaders of AI companies. Unfortunately, the deep learning paradigm underpinning modern AI development seems highly prone to producing agents that are not aligned with humanity's interests. There is likely a point of no return in AI development — a point where alignment failures become unrecoverable because humans have been disempowered. Anticipating this threshold is complicated by the possibility of a feedback loop once AI research and development can [...]

First published: November 18th, 2025
Source: https://www.lesswrong.com/posts/FA6M8MeQuQJxZyzeq/new-report-an-international-agreement-to-prevent-the
Narrated by TYPE III AUDIO.

When a new dollar goes into the capital markets, after being bundled and securitized and lent several times over, where does it end up? When society's total savings increase, what capital assets do those savings end up invested in? When economists talk about “capital assets”, they mean things like roads, buildings and machines. When I read through a company's annual reports, lots of their assets are instead things like stocks and bonds, short-term debt, and other “financial” assets - i.e. claims on other people's stuff. In theory, for every financial asset, there's a financial liability somewhere. For every bond asset, there's some payer for whom that bond is a liability. Across the economy, they all add up to zero. What's left is the economists' notion of capital, the nonfinancial assets: the roads, buildings, machines and so forth. Very roughly speaking, when there's a net increase in savings, that's where it has to end up - in the nonfinancial assets. I wanted to get a more tangible sense of what nonfinancial assets look like, of where my savings are going in the physical world. So, back in 2017 I pulled fundamentals data on ~2100 publicly-held US companies. I looked at [...]

Outline:
(02:01) Disclaimers
(04:10) Overview (With Numbers!)
(05:01) Oil - 25%
(06:26) Power Grid - 16%
(07:07) Consumer - 13%
(08:12) Telecoms - 8%
(09:26) Railroads - 8%
(10:47) Healthcare - 8%
(12:03) Tech - 6%
(12:51) Industrial - 5%
(13:49) Mining - 3%
(14:34) Real Estate - 3%
(14:49) Automotive - 2%
(15:32) Logistics - 1%
(16:12) Miscellaneous
(16:55) Learnings

First published: November 16th, 2025
Source: https://www.lesswrong.com/posts/HpBhpRQCFLX9tx62Z/where-is-the-capital-an-overview
Narrated by TYPE III AUDIO.

Looking back, it appears that much of my intellectual output could be described as legibilizing work, or trying to make certain problems in AI risk more legible to myself and others. I've organized the relevant posts and comments into the following list, which can also serve as a partial guide to problems that may need to be further legibilized, especially beyond LW/rationalists, to AI researchers, funders, company leaders, government policymakers, their advisors (including future AI advisors), and the general public.

Philosophical problems
- Probability theory
- Decision theory
- Beyond astronomical waste (possibility of influencing vastly larger universes beyond our own)
- Interaction between bargaining and logical uncertainty
- Metaethics
- Metaphilosophy: 1, 2

Problems with specific philosophical and alignment ideas
- Utilitarianism: 1, 2
- Solomonoff induction
- "Provable" safety
- CEV
- Corrigibility
- IDA (and many scattered comments)
- UDASSA
- UDT

Human-AI safety (x- and s-risks arising from the interaction between human nature and AI design)
- Value differences/conflicts between humans
- “Morality is scary” (human morality is often the result of status games amplifying random aspects of human value, with frightening results) [...]

First published: November 9th, 2025
Source: https://www.lesswrong.com/posts/7XGdkATAvCTvn4FGu/problems-i-ve-tried-to-legibilize
Narrated by TYPE III AUDIO.

Delegation is good! Delegation is the foundation of civilization! But in the depths of delegation madness breeds and evil rises. In my experience, there are three ways in which delegation goes off the rails:

1. You delegate without knowing what good performance on a task looks like

If you do not know how to evaluate performance on a task, you are going to have a really hard time delegating it to someone. Most likely, you will choose someone incompetent for the task at hand. But even if you manage to avoid that specific error mode, it is most likely that your delegee will notice that you do not have a standard, and so will use this opportunity to be lazy and do bad work, which they know you won't be able to notice. Or even worse, in an attempt to make sure your delegee puts in proper effort, you set an impossibly high standard, to which the delegee can only respond by quitting, or lying about their performance. This can tank a whole project if you discover it too late.

2. You assigned responsibility for a crucial task to an external party

Frequently some task will [...]

The original text contained 1 footnote which was omitted from this narration.

First published: November 12th, 2025
Source: https://www.lesswrong.com/posts/rSCxviHtiWrG5pudv/do-not-hand-off-what-you-cannot-pick-up
Narrated by TYPE III AUDIO.

Vices aren't behaviors that one should never do. Rather, vices are behaviors that are fine and pleasurable to do in moderation, but tempting to do in excess. The classical vices are actually good in part. A moderate amount of gluttony is just eating food, which is important. A moderate amount of envy is just "wanting things", which is a motivator of much of our economy. What are some things that rationalists are wont to do, and often to good effect, but that can grow pathological?

1. Contrarianism

There are a whole host of unaligned forces producing the arguments and positions you hear. People often hold beliefs out of convenience, defend positions that they are aligned with politically, or just don't give much thought to what they're saying one way or another. A good way to find out whether people have any good reasons for their positions is to take a contrarian stance, and to seek the best arguments for unpopular positions. This also helps you to explore arguments around positions that others aren't investigating. However, this can be taken to the extreme. While it is hard to know for sure what is going on inside others' heads, I know [...]

Outline:
(00:40) 1. Contrarianism
(01:57) 2. Pedantry
(03:35) 3. Elaboration
(03:52) 4. Social Obliviousness
(05:21) 5. Assuming Good Faith
(06:33) 6. Undercutting Social Momentum
(08:00) 7. Digging Your Heels In

First published: November 16th, 2025
Source: https://www.lesswrong.com/posts/r6xSmbJRK9KKLcXTM/7-vicious-vices-of-rationalists-1
Narrated by TYPE III AUDIO.

Context: Post #4 in my sequence of private Lightcone Infrastructure memos edited for public consumption. This week's principle is more about how I want people at Lightcone to relate to community governance than it is about our internal team culture. As part of our jobs at Lightcone, we are often in charge of determining access to some resource, or membership in some group (ranging from LessWrong to the AI Alignment Forum to the Lightcone Offices). Through that, I have learned that one of the most important things to do when building things like this is to try to tell people as early as possible if you think they are not a good fit for the community, both for trust within the group and for the sake of the integrity and success of the group itself. E.g. when you spot a LessWrong commenter who seems clearly not on track to ever be a good contributor long-term, or someone in the Lightcone Slack who clearly doesn't seem like a good fit, you should aim to off-ramp them as soon as possible, and generally put marginal resources into finding out whether someone is a good long-term fit early, before they invest substantially [...] --- First published: November 14th, 2025 Source: https://www.lesswrong.com/posts/Hun4EaiSQnNmB9xkd/tell-people-as-early-as-possible-it-s-not-going-to-work-out --- Narrated by TYPE III AUDIO.

"Everyone has a plan until they get punched in the face." - Mike Tyson (The exact phrasing of that quote changes, this is my favourite.) I think there is an open, important weakness in many people. We assume those we communicate with are basically trustworthy. Further, I think there is an important flaw in the current rationality community. We spend a lot of time focusing on subtle epistemic mistakes, teasing apart flaws in methodology and practicing the principle of charity. This creates a vulnerability to someone willing to just say outright false things. We're kinda slow about reacting to that. Suggested reading: Might People on the Internet Sometimes Lie, People Will Sometimes Just Lie About You. Epistemic status: My Best Guess. I. Getting punched in the face is an odd experience. I'm not sure I recommend it, but people have done weirder things in the name of experiencing novel psychological states. If it happens in a somewhat safety-negligent sparring ring, or if you and a buddy go out in the back yard tomorrow night to try it, I expect the punch gets pulled and it's still weird. There's a jerk of motion your eyes try to catch up [...] ---Outline:(01:03) I.(03:30) II.(07:33) III.(09:55) 4. The original text contained 1 footnote which was omitted from this narration. --- First published: November 14th, 2025 Source: https://www.lesswrong.com/posts/5LFjo6TBorkrgFGqN/everyone-has-a-plan-until-they-get-lied-to-the-face --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

One day, when I was interning at the cryptography research department of a large software company, my boss handed me an assignment to break a pseudorandom number generator passed to us for review. Someone in another department invented it and planned to use it in their product, and wanted us to take a look first. This person must have had a lot of political clout or was especially confident in himself, because he refused the standard advice that anything an amateur comes up with is very likely to be insecure, and that he should instead use one of the established, off-the-shelf cryptographic algorithms that have survived extensive cryptanalysis (code breaking) attempts. My boss thought he had to demonstrate the insecurity of the PRNG by coming up with a practical attack (i.e., a way to predict its future output based only on its past output, without knowing the secret key/seed). There were three permanent full-time professional cryptographers working in the research department, but none of them specialized in cryptanalysis of symmetric cryptography (which covers such PRNGs), so it might have taken them some time to figure out an attack. My time was obviously less valuable and my [...] The original text contained 1 footnote which was omitted from this narration. --- First published: November 12th, 2025 Source: https://www.lesswrong.com/posts/KCSmZsQzwvBxYNNaT/please-don-t-roll-your-own-metaethics --- Narrated by TYPE III AUDIO.
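To make concrete what a "practical attack" means here, the sketch below shows how an attacker can predict all future output of a naive generator from a handful of observed outputs. This is purely illustrative: the post never describes the PRNG that was under review, so a simple linear congruential generator with a known modulus is assumed.

```python
# Illustrative sketch only: the PRNG from the story is not described, so we
# assume a naive linear congruential generator (LCG) with a public modulus.
M = 2**31 - 1  # assumed-known prime modulus

def lcg(seed, a, c, m=M):
    x = seed
    while True:
        x = (a * x + c) % m
        yield x

def recover_parameters(x0, x1, x2, m=M):
    """Recover (a, c) of x_{n+1} = a*x_n + c (mod m) from 3 consecutive outputs."""
    # x2 - x1 = a * (x1 - x0) (mod m), so a = (x2 - x1) * (x1 - x0)^-1 (mod m)
    a = (x2 - x1) * pow(x1 - x0, -1, m) % m
    c = (x1 - a * x0) % m
    return a, c

# The "attack": observe three outputs, then predict the fourth without the seed.
gen = lcg(seed=123456789, a=1103515245, c=12345)
x0, x1, x2 = next(gen), next(gen), next(gen)
a, c = recover_parameters(x0, x1, x2)
assert (a * x2 + c) % M == next(gen)  # prediction matches the real generator
```

Established primitives are designed precisely so that no relation this simple (or any known relation) connects past outputs to future ones, which is the point of the standard advice the colleague refused.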

People sometimes make mistakes [citation needed]. The obvious explanation for most of those mistakes is that decision makers do not have access to the information necessary to avoid the mistake, or are not smart/competent enough to think through the consequences of their actions. This predicts that as decision-makers get access to more information, or are replaced with smarter people, their decisions will get better. And this is substantially true! Markets seem more efficient today than they were before the onset of the internet, and in general decision-making across the board has improved on many dimensions. But in many domains, I posit, decision-making has gotten worse, despite access to more information, and despite much larger labor markets, better education, the removal of lead from gasoline, and many other things that should generally cause decision-makers to be more competent and intelligent. There is a lot of variance in decision-making quality that is not well-accounted for by how much information actors have about the problem domain, and how smart they are. I currently believe that the factor that explains most of this remaining variance is "paranoia", in particular the kind of paranoia that becomes more adaptive as your environment gets [...] ---Outline:(01:31) A market for lemons(05:02) It's lemons all the way down(06:15) Fighter jets and OODA loops(08:23) The first thing you try is to blind yourself(13:37) The second thing you try is to purge the untrustworthy(20:55) The third thing to try is to become unpredictable and vindictive --- First published: November 13th, 2025 Source: https://www.lesswrong.com/posts/yXSKGm4txgbC3gvNs/paranoia-rules-everything-around-me --- Narrated by TYPE III AUDIO.

There is a temptation to simply define Goodness as Human Values, or vice versa. Alas, we do not get to choose the definitions of commonly used words; our attempted definitions will simply be wrong. Unless we stick to mathematics, we will end up sneaking in intuitions which do not follow from our so-called definitions, and thereby mislead ourselves. People who claim that they use some standard word or phrase according to their own definition are, in nearly all cases outside of mathematics, wrong about their own usage patterns.[1] If we want to know what words mean, we need to look at e.g. how they're used and where the concepts come from and what mental pictures they summon. And when we look at those things for Goodness and Human Values… they don't match. And I don't mean that we shouldn't pursue Human Values; I mean that the stuff people usually refer to as Goodness is a coherent thing which does not match the actual values of actual humans all that well. The Yumminess You Feel When Imagining Things Measures Your Values There's this mental picture where a mind has some sort of goals inside it, stuff it wants, stuff it [...] ---Outline:(01:07) The Yumminess You Feel When Imagining Things Measures Your Values(03:26) Goodness Is A Memetic Egregore(05:10) Aside: Loving Connection(06:58) We Don't Get To Choose Our Own Values (Mostly)(09:02) So What Do? The original text contained 2 footnotes which were omitted from this narration. --- First published: November 2nd, 2025 Source: https://www.lesswrong.com/posts/9X7MPbut5feBzNFcG/human-values-goodness --- Narrated by TYPE III AUDIO.

Condensation: a theory of concepts is a model of concept-formation by Sam Eisenstat. Its goals and methods resemble John Wentworth's natural abstractions/natural latents research.[1] Both theories seek to provide a clear picture of how to posit latent variables, such that once someone has understood the theory, they'll say "yep, I see now, that's how latent variables work!". The goal of this post is to popularize Sam's theory and to give my own perspective on it; however, it will not be a full explanation of the math. For technical details, I suggest reading Sam's paper. Brief Summary Shannon's information theory focuses on the question of how to encode information when you have to encode everything. You get to design the coding scheme, but the information you'll have to encode is unknown (and you have some subjective probability distribution over what it will be). Your objective is to minimize the total expected code-length. Algorithmic information theory similarly focuses on minimizing the total code-length, but it uses a "more objective" distribution (a universal algorithmic distribution), and a fixed coding scheme (some programming language). This allows it to talk about the minimum code-length of specific data (talking about particulars rather than average [...] ---Outline:(00:45) Brief Summary(02:35) Shannon's Information Theory(07:21) Universal Codes(11:13) Condensation(12:52) Universal Data-Structure?(15:30) Well-Organized Notebooks(18:18) Random Variables(18:54) Givens(19:50) Underlying Space(20:33) Latents(21:21) Contributions(21:39) Top(22:24) Bottoms(22:55) Score(24:29) Perfect Condensation(25:52) Interpretability Solved?(26:38) Condensation isn't as tight an abstraction as information theory.(27:40) Condensation isn't a very good model of cognition.(29:46) Much work to be done! The original text contained 15 footnotes which were omitted from this narration. --- First published: November 9th, 2025 Source: https://www.lesswrong.com/posts/BstHXPgQyfeNnLjjp/condensation --- Narrated by TYPE III AUDIO.
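As a concrete anchor for the Shannon framing summarized above (choose a coding scheme to minimize expected code-length under your subjective distribution), here is a toy calculation with made-up probabilities. The entropy gives the lower bound on expected bits per message, and for this conveniently dyadic distribution a Huffman code meets it exactly.

```python
import heapq
from math import log2

# A toy message distribution (hypothetical numbers, purely for illustration).
p = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}

# Shannon entropy: the lower bound on expected bits per message.
entropy = -sum(q * log2(q) for q in p.values())

def huffman_code_lengths(probs):
    """Return the codeword length assigned to each symbol by Huffman coding."""
    heap = [(q, [sym]) for sym, q in probs.items()]
    lengths = {sym: 0 for sym in probs}
    heapq.heapify(heap)
    while len(heap) > 1:
        q1, syms1 = heapq.heappop(heap)
        q2, syms2 = heapq.heappop(heap)
        # Every symbol under a merged node gets one more bit in its codeword.
        for sym in syms1 + syms2:
            lengths[sym] += 1
        heapq.heappush(heap, (q1 + q2, syms1 + syms2))
    return lengths

lengths = huffman_code_lengths(p)
expected_length = sum(p[sym] * lengths[sym] for sym in p)

print(f"entropy         = {entropy:.3f} bits")          # 1.750
print(f"expected length = {expected_length:.3f} bits")   # 1.750 for this dyadic distribution
```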

Recently, I looked at the one pair of winter boots I own, and I thought “I will probably never buy winter boots again.” The world as we know it probably won't last more than a decade, and I live in a pretty warm area. I. AGI is likely in the next decade It has basically become consensus within the AI research community that AI will surpass human capabilities sometime in the next few decades. Some, including myself, think this will likely happen this decade. II. The post-AGI world will be unrecognizable Assuming AGI doesn't cause human extinction, it is hard to even imagine what the world will look like. Some have tried, but many of their attempts make assumptions that limit the amount of change that will happen, just to make it easier to imagine such a world. Dario Amodei recently imagined a post-AGI world in Machines of Loving Grace. He imagines rapid progress in medicine, the curing of mental illness, the end of poverty, world peace, and a vastly transformed economy where humans probably no longer provide economic value. However, in imagining this crazy future, he limits his writing to be “tame” enough to be digested by a [...] ---Outline:(00:22) I. AGI is likely in the next decade(00:40) II. The post-AGI world will be unrecognizable(03:08) III. AGI might cause human extinction(04:42) IV. AGI will derail everyone's life plans(06:51) V. AGI will improve life in expectation(08:09) VI. AGI might enable living out fantasies(09:56) VII. I still mourn a life without AI --- First published: November 8th, 2025 Source: https://www.lesswrong.com/posts/jwrhoHxxQHGrbBk3f/mourning-a-life-without-ai --- Narrated by TYPE III AUDIO. ---Images from the article:

Cross-posted from https://bengoldhaber.substack.com/ It's widely known that Corporations are People. This is universally agreed to be a good thing; I list Target as my emergency contact and I hope it will one day be the best man at my wedding. But there are other, less well-known non-human entities that have also been accorded the rank of person. Ships: Ships have long posed a tricky problem for states and courts. Similar to nomads, vagabonds, and college students on extended study abroad, they roam far and occasionally get into trouble. [Image caption: classic junior year misadventure] If, for instance, a ship attempting to dock at a foreign port crashes on its way into the harbor, who pays? The owner might be a thousand miles away. The practical solution that medieval courts arrived at, and later the British and American admiralty, was that the ship itself does. Ships are accorded limited legal person rights, primarily so that they can be impounded and their property seized if they do something wrong. In the eyes of the Law they are people so that they can later be defendants; their rights are constrained to those associated with due process, like the right to post a bond and [...] --- First published: November 8th, 2025 Source: https://www.lesswrong.com/posts/fB5pexHPJRsabvkQ2/unexpected-things-that-are-people --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

According to the Sonnet 4.5 system card, Sonnet 4.5 is much more likely than Sonnet 4 to mention in its chain-of-thought that it thinks it is being evaluated; this seems to meaningfully cause it to appear to behave better in alignment evaluations. So, Sonnet 4.5's behavioral improvements in these evaluations may partly be driven by a growing tendency to notice and game evaluations rather than genuine alignment. This is an early example of a phenomenon that is going to get increasingly problematic: as evaluation gaming increases, alignment evaluations become harder to trust.[1] To elaborate on the above: Sonnet 4.5 seems far more aware of being evaluated than previous models. In some of Anthropic's evaluations, Sonnet 4.5 explicitly mentions being tested between 80% and 100% of the time, compared with under 10% for Sonnet 4. Internal representations associated with evaluation awareness also became notably stronger over training.[2] In addition, Sonnet 4.5 seems substantially less likely to behave poorly once it has noticed that it might be tested. As seen in the figure below, when Anthropic inhibited internal representations related to evaluation awareness[3], its misaligned behavior rate on synthetic honeypot tests increased from 0% to between 1% and 9%. This effect was [...] ---Outline:(06:58) Sonnet 4.5 is much more evaluation-aware than prior models(10:00) Evaluation awareness seems to suppress misaligned behavior(14:52) Anthropic's training plausibly caused Sonnet 4.5 to game evaluations(16:28) Evaluation gaming is plausibly a large fraction of the effect of training against misaligned behaviors(22:57) Suppressing evidence of misalignment in evaluation gamers is concerning(25:25) What AI companies should do(30:02) Appendix The original text contained 21 footnotes which were omitted from this narration. --- First published: October 30th, 2025 Source: https://www.lesswrong.com/posts/qgehQxiTXj53X49mM/sonnet-4-5-s-eval-gaming-seriously-undermines-alignment --- Narrated by TYPE III AUDIO. ---Images from the article:

I am a professor of economics. Throughout my career, I have mostly worked on economic growth theory, and this eventually brought me to the topic of transformative AI / AGI / superintelligence. Nowadays my work focuses mostly on the promises and threats of this emerging disruptive technology. Recently, jointly with Klaus Prettner, I wrote a paper on “The Economics of p(doom): Scenarios of Existential Risk and Economic Growth in the Age of Transformative AI”. We have presented it at multiple conferences and seminars, and it was always well received. We didn't get any real pushback; instead our research prompted a lot of interest and reflection (as was reported to me, including from conversations where I wasn't involved). But our experience with publishing this paper in a journal is the polar opposite. To date, the paper has been desk-rejected (without peer review) 7 times. For example, Futures (a journal “for the interdisciplinary study of futures, visioning, anticipation and foresight”) justified their negative decision by writing: “while your results are of potential interest, the topic of your manuscript falls outside of the scope of this journal”. Until finally, to our excitement, it was for once sent out for review. But then came the [...] --- First published: November 3rd, 2025 Source: https://www.lesswrong.com/posts/rmYj6PTBMm76voYLn/publishing-academic-papers-on-transformative-ai-is-a --- Narrated by TYPE III AUDIO.

[Meta: This is Max Harms. I wrote a novel about China and AGI, which comes out today. This essay from my fiction newsletter has been slightly modified for LessWrong.] In the summer of 1983, Ronald Reagan sat down to watch the film War Games, starring Matthew Broderick as a teen hacker. In the movie, Broderick's character accidentally gains access to a military supercomputer with an AI that almost starts World War III. “The only winning move is not to play.” After watching the movie, Reagan, newly concerned with the possibility of hackers causing real harm, ordered a full national security review. The response: “Mr. President, the problem is much worse than you think.” Soon after, the Department of Defense revamped their cybersecurity policies, and the first federal directives and laws against malicious hacking were put in place. But War Games wasn't the only story to influence Reagan. His administration pushed for the Strategic Defense Initiative ("Star Wars") in part, perhaps, because the central technology—a laser that shoots down missiles—resembles the core technology behind the 1940 spy film Murder in the Air, which had Reagan as lead actor. Reagan was apparently such a superfan of The Day the Earth Stood Still [...] ---Outline:(05:05) AI in Particular(06:45) What's Going On Here?(11:19) Authorial Responsibility The original text contained 10 footnotes which were omitted from this narration. --- First published: November 3rd, 2025 Source: https://www.lesswrong.com/posts/uQak7ECW2agpHFsHX/the-unreasonable-effectiveness-of-fiction --- Narrated by TYPE III AUDIO. ---Images from the article:

Some AI safety problems are legible (obvious or understandable) to company leaders and government policymakers, implying they are unlikely to deploy or allow deployment of an AI while those problems remain open (i.e., appear unsolved according to the information they have access to). But some problems are illegible (obscure or hard to understand, or in a common cognitive blind spot), meaning there is a high risk that leaders and policymakers will decide to deploy or allow deployment even if they are not solved. (Of course, this is a spectrum, but I am simplifying it to a binary for ease of exposition.) From an x-risk perspective, working on highly legible safety problems has low or even negative expected value. Similar to working on AI capabilities, it brings forward the date by which AGI/ASI will be deployed, leaving less time to solve the illegible x-safety problems. In contrast, working on the illegible problems (including by trying to make them more legible) does not have this issue and therefore has a much higher expected value (all else being equal, such as tractability). Note that according to this logic, success in making an illegible problem highly legible is almost as good as solving [...] The original text contained 2 footnotes which were omitted from this narration. --- First published: November 4th, 2025 Source: https://www.lesswrong.com/posts/PMc65HgRFvBimEpmJ/legible-vs-illegible-ai-safety-problems --- Narrated by TYPE III AUDIO.

1. I have claimed that one of the fundamental questions of rationality is “what am I about to do and what will happen next?” One of the domains I ask this question the most is in social situations. There are a great many skills in the world. If I had the time and resources to do so, I'd want to master all of them. Wilderness survival, automotive repair, the Japanese language, calculus, heart surgery, French cooking, sailing, underwater basket weaving, architecture, Mexican cooking, functional programming, whatever it is people mean when they say “hey man, just let him cook.” My inability to speak fluent Japanese isn't a sin or a crime. However, it isn't a virtue either; if I had the option to snap my fingers and instantly acquire the knowledge, I'd do it. Now, there's a different question of prioritization; I tend to pick new skills to learn by a combination of what's useful to me, what sounds fun, and what I'm naturally good at. I picked up the basics of computer programming easily, I enjoy doing it, and it turned out to pay really well. That was an over-determined skill to learn. On the other [...] ---Outline:(00:10) 1.(03:42) 2.(06:44) 3. The original text contained 2 footnotes which were omitted from this narration. --- First published: November 3rd, 2025 Source: https://www.lesswrong.com/posts/NnTwbvvsPg5kj3BKq/lack-of-social-grace-is-a-lack-of-skill-1 --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images

This is a link post. Eliezer Yudkowsky did not exactly suggest that you should eat bear fat covered with honey and sprinkled with salt flakes. What he actually said was that an alien, looking from the outside at evolution, would predict that you would want to eat bear fat covered with honey and sprinkled with salt flakes. Still, I decided to buy a jar of bear fat online, and make a treat for the people at Inkhaven. It was surprisingly good. My post discusses how that happened, and a bit about the implications for Eliezer's thesis. Let me know if you want to try some; I can prepare some for you if you happen to be at Lighthaven before we run out of bear fat, and before I leave toward the end of November. --- First published: November 4th, 2025 Source: https://www.lesswrong.com/posts/2pKiXR6X7wdt8eFX5/i-ate-bear-fat-with-honey-and-salt-flakes-to-prove-a-point Linkpost URL:https://signoregalilei.com/2025/11/03/i-ate-bear-fat-to-prove-a-point/ --- Narrated by TYPE III AUDIO.

As far as I'm aware, Anthropic is the only AI company with official AGI timelines[1]: they expect AGI by early 2027. In their recommendations (from March 2025) to the OSTP for the AI action plan they say: As our CEO Dario Amodei writes in 'Machines of Loving Grace', we expect powerful AI systems will emerge in late 2026 or early 2027. Powerful AI systems will have the following properties: Intellectual capabilities matching or exceeding that of Nobel Prize winners across most disciplines—including biology, computer science, mathematics, and engineering. [...] They often describe this capability level as a "country of geniuses in a datacenter". This prediction is repeated elsewhere and Jack Clark confirms that something like this remains Anthropic's view (as of September 2025). Of course, just because this is Anthropic's official prediction[2] doesn't mean that all or even most employees at Anthropic share the same view.[3] However, I do think we can reasonably say that Dario Amodei, Jack Clark, and Anthropic itself are all making this prediction.[4] I think the creation of transformatively powerful AI systems—systems as capable or more capable than Anthropic's notion of powerful AI—is plausible in 5 years [...] ---Outline:(02:27) What does powerful AI mean?(08:40) Earlier predictions(11:19) A proposed timeline that Anthropic might expect(19:10) Why powerful AI by early 2027 seems unlikely to me(19:37) Trends indicate longer(21:48) My rebuttals to arguments that trend extrapolations will underestimate progress(26:14) Naively trend extrapolating to full automation of engineering and then expecting powerful AI just after this is probably too aggressive(30:08) What I expect(32:12) What updates should we make in 2026?(32:17) If something like my median expectation for 2026 happens(34:07) If something like the proposed timeline (with powerful AI in March 2027) happens through June 2026(35:25) If AI progress looks substantially slower than what I expect(36:09) If AI progress is substantially faster than I expect, but slower than the proposed timeline (with powerful AI in March 2027)(36:51) Appendix: deriving a timeline consistent with Anthropic's predictions The original text contained 94 footnotes which were omitted from this narration. --- First published: November 3rd, 2025 Source: https://www.lesswrong.com/posts/gabPgK9e83QrmcvbK/what-s-up-with-anthropic-predicting-agi-by-early-2027-1 --- Narrated by TYPE III AUDIO. ---Images from the article:
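For readers who want to see the mechanics behind phrases like "naively trend extrapolating," here is a minimal doubling-time calculation. All three numbers are hypothetical placeholders, not figures from the post; the post's disagreement with the early-2027 view is partly about which such parameters, and which speedups to them, are plausible.

```python
from math import log2

# Hypothetical placeholder numbers (not from the post): the "task horizon" an AI
# can currently complete reliably, how often that horizon doubles, and the
# horizon treated as a rough proxy for very powerful AI.
current_horizon_hours = 4.0          # e.g. about half a work day
doubling_time_months = 6.0
threshold_horizon_hours = 4000.0     # e.g. multi-year projects

doublings_needed = log2(threshold_horizon_hours / current_horizon_hours)
months_needed = doublings_needed * doubling_time_months

print(f"doublings needed: {doublings_needed:.1f}")
print(f"years under naive extrapolation: {months_needed / 12:.1f}")
# Faster doubling times or a lower threshold pull the date earlier, which is
# why the choice of parameters carries most of the argument.
```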

This is a link post. New Anthropic research (tweet, blog post, paper): We investigate whether large language models can introspect on their internal states. It is difficult to answer this question through conversation alone, as genuine introspection cannot be distinguished from confabulations. Here, we address this challenge by injecting representations of known concepts into a model's activations, and measuring the influence of these manipulations on the model's self-reported states. We find that models can, in certain scenarios, notice the presence of injected concepts and accurately identify them. Models demonstrate some ability to recall prior internal representations and distinguish them from raw text inputs. Strikingly, we find that some models can use their ability to recall prior intentions in order to distinguish their own outputs from artificial prefills. In all these experiments, Claude Opus 4 and 4.1, the most capable models we tested, generally demonstrate the greatest introspective awareness; however, trends across models are complex and sensitive to post-training strategies. Finally, we explore whether models can explicitly control their internal representations, finding that models can modulate their activations when instructed or incentivized to “think about” a concept. Overall, our results indicate that current language models possess some functional introspective awareness [...] --- First published: October 30th, 2025 Source: https://www.lesswrong.com/posts/QKm4hBqaBAsxabZWL/emergent-introspective-awareness-in-large-language-models Linkpost URL:https://transformer-circuits.pub/2025/introspection/index.html --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.
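The abstract's core manipulation, injecting a concept representation into a model's activations and watching how downstream behavior changes, can be sketched generically. The snippet below is a toy illustration with a made-up model and a random stand-in for a concept vector; it is not Anthropic's setup or code, just the general shape of activation injection.

```python
import torch
import torch.nn as nn

# Toy stand-in for a few transformer-block outputs (hypothetical model,
# not one of the models studied in the paper).
model = nn.Sequential(
    nn.Linear(16, 16),  # "layer 0"
    nn.ReLU(),
    nn.Linear(16, 16),  # "layer 1" -- we inject into this layer's output
    nn.ReLU(),
    nn.Linear(16, 4),   # "readout"
)

# A hypothetical "concept vector", e.g. obtained elsewhere as the difference of
# mean activations on prompts that do vs. don't involve the concept.
concept_vector = torch.randn(16)
injection_strength = 4.0

def inject_concept(module, inputs, output):
    # Forward hook: add the scaled concept direction to this layer's output,
    # steering downstream computation without changing the input itself.
    return output + injection_strength * concept_vector

x = torch.randn(1, 16)                          # stand-in for an input representation
baseline = model(x)                              # unmodified forward pass
handle = model[2].register_forward_hook(inject_concept)
steered = model(x)                               # forward pass with the concept injected
handle.remove()

print("change in readout:", (steered - baseline).norm().item())
```

In the paper's framing, the interesting question is then whether the model's self-reports track manipulations like this one, rather than just its outputs shifting.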

This is a link post. You have things you want to do, but there's just never time. Maybe you want to find someone to have kids with, or maybe you want to spend more or higher-quality time with the family you already have. Maybe it's a work project. Maybe you have a musical instrument or some sports equipment gathering dust in a closet, or there's something you loved doing when you were younger that you want to get back into. Whatever it is, you can't find the time for it. And yet you somehow find thousands of hours a year to watch YouTube, check Twitter and Instagram, listen to podcasts, binge Netflix shows, and read blogs and news articles. You can't focus. You haven't read a physical book in years, and the time you tried it was boring and you felt itchy and you think maybe books are outdated when there's so much to read on the internet anyway. You're talking with a friend, but then your phone buzzes and you look at the notification and you open it, and your girlfriend has messaged you and that's nice, and then your friend says “Did you hear what I just said?” [...] --- First published: November 1st, 2025 Source: https://www.lesswrong.com/posts/6p4kv8uxYvLcimGGi/you-re-always-stressed-your-mind-is-always-busy-you-never Linkpost URL:https://mingyuan.substack.com/p/youre-always-stressed-your-mind-is --- Narrated by TYPE III AUDIO.

Crosspost from my blog. Synopsis When we share words with each other, we don't only care about the words themselves. We care also—even primarily—about the mental elements of the human mind/agency that produced the words. What we want to engage with is those mental elements. As of 2025, LLM text does not have those elements behind it. Therefore LLM text categorically does not serve the role for communication that is served by real text. Therefore the norm should be that you don't share LLM text as if someone wrote it. And, it is inadvisable to read LLM text that someone else shares as though someone wrote it. Introduction One might think that text screens off thought. Suppose two people follow different thought processes, but then they produce and publish identical texts. Then you read those texts. How could it possibly matter what the thought processes were? All you interact with is the text, so logically, if the two texts are the same then their effects on you are the same. But, a bit similarly to how high-level actions don't screen off intent, text does not screen off thought. How [...] ---Outline:(00:13) Synopsis(00:57) Introduction(02:51) Elaborations(02:54) Communication is for hearing from minds(05:21) Communication is for hearing assertions(12:36) Assertions live in dialogue --- First published: November 1st, 2025 Source: https://www.lesswrong.com/posts/DDG2Tf2sqc8rTWRk3/llm-generated-text-is-not-testimony --- Narrated by TYPE III AUDIO.

An Overture Famously, trans people tend not to have great introspective clarity into their own motivations for transition. Intuitively, they tend to be quite aware of what they do and don't like about inhabiting their chosen bodies and gender roles. But when it comes to explaining the origins and intensity of those preferences, they almost universally come up short. I've even seen several smart, thoughtful trans people, such as Natalie Wynn, making statements to the effect that it's impossible to develop a satisfying theory of aberrant gender identities. (She may have been exaggerating for effect, but it was clear she'd given up on solving the puzzle herself.) I'm trans myself, but even I can admit that this lack of introspective clarity is a reason to be wary of transgenderism as a phenomenon. After all, there are two main explanations for trans people's failure to thoroughly explain their own existence. One is that transgenderism is the result of an obscenely complex and arcane neuro-psychological phenomenon, which we have no hope of unraveling through normal introspective methods. The other is that trans people are lying about something, including to themselves. Now, a priori, both of these do seem like real [...] ---Outline:(00:12) An Overture(04:55) In the Case of Fiora Starlight(16:51) Was it worth it? The original text contained 3 footnotes which were omitted from this narration. --- First published: November 1st, 2025 Source: https://www.lesswrong.com/posts/gEETjfjm3eCkJKesz/post-title-why-i-transitioned-a-case-study --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.