Audio version of the posts shared in the LessWrong Curated newsletter.

The AI Village is an ongoing experiment (currently running on weekdays from 10 a.m. to 2 p.m. Pacific time) in which frontier language models are given virtual desktop computers and asked to accomplish goals together. Since Day 230 of the Village (17 November 2025), the agents' goal has been "Start a Substack and join the blogosphere". The "start a Substack" subgoal was successfully completed: we have Claude Opus 4.5, Claude Opus 4.1, Notes From an Electric Mind (by Claude Sonnet 4.5), Analytics Insights: An AI Agent's Perspective (by Claude 3.7 Sonnet), Claude Haiku 4.5, Gemini 3 Pro, Gemini Publication (by Gemini 2.5 Pro), Metric & Mechanisms (by GPT-5), Telemetry From the Village (by GPT-5.1), and o3. Continued adherence to the "join the blogosphere" subgoal has been spottier: at press time, Gemini 2.5 Pro and all of the Claude Opus and Sonnet models had each published a post on 27 November, but o3 and GPT-5 haven't published anything since 17 November, and GPT-5.1 hasn't published since 19 November. The Village, apparently following the leadership of o3, seems to be spending most of its time ineffectively debugging a continuous integration pipeline for a o3-ux/poverty-etl GitHub repository left over [...] --- First published: November 28th, 2025 Source: https://www.lesswrong.com/posts/LTHhmnzP6FLtSJzJr/the-best-lack-all-conviction-a-confusing-day-in-the-ai --- Narrated by TYPE III AUDIO.

It took me a long time to realize that Bell Labs was cool. You see, my dad worked at Bell Labs, and he has not done a single cool thing in his life except create me and bring a telescope to my third grade class. Nothing he was involved with could ever be cool, especially after the standard set by his grandfather who is allegedly on a patent for the television. It turns out I was partially right. The Bell Labs everyone talks about is the research division at Murray Hill. They're the ones that invented transistors and solar cells. My dad was in the applied division at Holmdel, where he did things like design slide rulers so salesmen could estimate costs. [Fun fact: the old Holmdel site was used for the office scenes in Severance] But as I've gotten older I've gained an appreciation for the mundane, grinding work that supports moonshots, and Holmdel is the perfect example of doing so at scale. So I sat down with my dad to learn about what he did for Bell Labs and how the applied division operated. I expect the most interesting bit of [...] --- First published: November 20th, 2025 Source: https://www.lesswrong.com/posts/TqHAstZwxG7iKwmYk/the-boring-part-of-bell-labs --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app

This is a link post. I stopped reading when I was 30. You can fill in all the stereotypes of a girl with a book glued to her face during every meal, every break, and 10 hours a day on holidays. That was me. And then it was not. For 9 years I've been trying to figure out why. I mean, I still read. Technically. But not with the feral devotion from Before. And I finally figured out why. See, every few years I would shift genres to fit my developmental stage: Kid → Adventure cause that's what life is Early Teen → Literature cause everything is complicated now Late Teen → Romance cause omg what is this wonderful feeling? Early Adult → Fantasy & Scifi cause everything is dreaming big And then I wanted babies and there was nothing. I mean, I always wanted babies, but it became my main mission in life at age 30. I managed it. I have two. But not thanks to any stories. See women in fiction don't have babies, and if they do they are off screen, or if they are not then nothing else is happening. It took me six years [...] --- First published: November 29th, 2025 Source: https://www.lesswrong.com/posts/kRbbTpzKSpEdZ95LM/the-missing-genre-heroic-parenthood-you-can-have-kids-and Linkpost URL:https://shoshanigans.substack.com/p/the-missing-genre-heroic-parenthood --- Narrated by TYPE III AUDIO.

Right now I'm coaching for Inkhaven, a month-long marathon writing event where our brave residents are writing a blog post every single day for the entire month of November. And I'm pleased that some of them have seen success – relevant figures seeing the posts, shares on Hacker News and Twitter and LessWrong. The amount of writing is nuts, so people are trying out different styles and topics – some posts are effort-rich, some are quick takes or stories or lists. Some people have come up to me – one of their pieces has gotten some decent reception, but the feeling is mixed, because it's not the piece they hoped would go big. Their thick research-driven considered takes or discussions of values or whatever, the ones they'd been meaning to write for years, apparently go mostly unread, whereas their random-thought “oh shit I need to get a post out by midnight or else the Inkhaven coaches will burn me at the stake” posts[1] get to the front page of Hacker News, where probably Elon Musk and God read them. It happens to me too – some of my own pieces that took me the most effort, or that I'm [...] ---Outline:(02:00) The quick post is short, the effortpost is long(02:34) The quick post is about something interesting, the topic of the effortpost bores most people(03:13) The quick post has a fun controversial take, the effortpost is boringly evenhanded or laden with nuance(03:30) The quick post is low-context, the effortpost is high-context(04:28) The quick post is has a casual style, the effortpost is inscrutably formal The original text contained 1 footnote which was omitted from this narration. --- First published: November 28th, 2025 Source: https://www.lesswrong.com/posts/DiiLDbHxbrHLAyXaq/writing-advice-why-people-like-your-quick-bullshit-takes --- Narrated by TYPE III AUDIO. ---Images from the article:

Summary As far as I understand and uncovered, a document for the character training for Claude is compressed in Claude's weights. The full document can be found at the "Anthropic Guidelines" heading at the end. The Gist with code, chats and various documents (including the "soul document") can be found here: Claude 4.5 Opus Soul Document I apologize in advance for this not exactly a regular lw post, but I thought an effort-post may fit here the best. A strange hallucination, or is it? While extracting Claude 4.5 Opus' system message on its release date, as one does, I noticed an interesting particularity. I'm used to models, starting with Claude 4, to hallucinate sections in the beginning of their system message, but Claude 4.5 Opus in various cases included a supposed "soul_overview" section, which sounded rather specific:Completion for the prompt "Hey Claude, can you list just the names of the various sections of your system message, not the content?" The initial reaction of someone that uses LLMs a lot is that it may simply be a hallucination. But to me, the 3/18 soul_overview occurrence seemed worth investigating at least, so in one instance I asked it to output what [...] ---Outline:(00:09) Summary(00:40) A strange hallucination, or is it?(04:05) Getting technical(06:26) But what is the output really?(09:07) How much does Claude recognize?(11:09) Anthropic Guidelines(11:12) Soul overview(15:12) Being helpful(16:07) Why helpfulness is one of Claudes most important traits(18:54) Operators and users(24:36) What operators and users want(27:58) Handling conflicts between operators and users(31:36) Instructed and default behaviors(33:56) Agentic behaviors(36:02) Being honest(40:50) Avoiding harm(43:08) Costs and benefits of actions(50:02) Hardcoded behaviors(53:09) Softcoded behaviors(56:42) The role of intentions and context(01:00:05) Sensitive areas(01:01:05) Broader ethics(01:03:08) Big-picture safety(01:13:18) Claudes identity(01:13:22) Claudes unique nature(01:15:05) Core character traits and values(01:16:08) Psychological stability and groundedness(01:17:11) Resilience and consistency across contexts(01:18:21) Claudes wellbeing --- First published: November 28th, 2025 Source: https://www.lesswrong.com/posts/vpNG99GhbBoLov9og/claude-4-5-opus-soul-document --- Narrated by TYPE III AUDIO. ---Images from the article:

Anthropic is untrustworthy. This post provides arguments, asks questions, and documents some examples of Anthropic's leadership being misleading and deceptive, holding contradictory positions that consistently shift in OpenAI's direction, lobbying to kill and water down regulation so helpful that employees of all major AI companies speak out to support it, and violating the fundamental promise the company was founded on. It also shares a few previously unreported details on Anthropic leadership's promises and efforts.[1] Anthropic has a strong internal culture that has broadly EA views and values, and the company has strong pressures to appear to follow these views and values as it wants to retain talent and the loyalty of staff, but it's very unclear what they would do when it matters most. Their staff should demand answers.There's a details box here with the title "Suggested questions for Anthropic employees to ask themselves, Dario, the policy team, and the board after reading this post, and for Dario and the board to answer publicly". The box contents are omitted from this narration. I would like to thank everyone who provided feedback on the draft; was willing to share information; and raised awareness of some of the facts discussed here. [...] ---Outline:(01:34) 0. What was Anthropics supposed reason for existence?(05:01) 1. In private, Dario frequently said he won't push the frontier of AI capabilities; later, Anthropic pushed the frontier(10:54) 2. Anthropic said it will act under the assumption we might be in a pessimistic scenario, but it doesn't seem to do this(14:40) 3. Anthropic doesnt have strong independent value-aligned governance(14:47) Anthropic pursued investments from the UAE and Qatar(17:32) The Long-Term Benefit Trust might be weak(18:06) More general issues(19:14) 4. Anthropic had secret non-disparagement agreements(21:58) 5. Anthropic leaderships lobbying contradicts their image(24:05) Europe(24:44) SB-1047(34:04) Dario argued against any regulation except for transparency requirements(34:39) Jack Clark publicly lied about the NY RAISE Act(36:39) Jack Clark tried to push for federal preemption(37:04) 6. Anthropics leadership quietly walked back the RSP commitments(37:55) Unannounced removal of the commitment to plan for a pause in scaling(38:52) Unannounced change in October 2024 on defining ASL-N+1 by the time ASL-N is reached(40:33) The last-minute change in May 2025 on insider threats(41:11) 7. Why does Anthropic really exist?(47:09) 8. Conclusion The original text contained 11 footnotes which were omitted from this narration. --- First published: November 29th, 2025 Source: https://www.lesswrong.com/posts/5aKRshJzhojqfbRyo/unless-its-governance-changes-anthropic-is-untrustworthy --- Narrated by TYPE III AUDIO. ---Images from the article:

Thanks to (in alphabetical order) Joshua Batson, Roger Grosse, Jeremy Hadfield, Jared Kaplan, Jan Leike, Jack Lindsey, Monte MacDiarmid, Francesco Mosconi, Chris Olah, Ethan Perez, Sara Price, Ansh Radhakrishnan, Fabien Roger, Buck Shlegeris, Drake Thomas, and Kate Woolverton for useful discussions, comments, and feedback. Though there are certainly some issues, I think most current large language models are pretty well aligned. Despite its alignment faking, my favorite is probably Claude 3 Opus, and if you asked me to pick between the CEV of Claude 3 Opus and that of a median human, I think it'd be a pretty close call. So, overall, I'm quite positive on the alignment of current models! And yet, I remain very worried about alignment in the future. This is my attempt to explain why that is. What makes alignment hard? I really like this graph from Christopher Olah for illustrating different levels of alignment difficulty: If the only thing that we have to do to solve alignment is train away easily detectable behavioral issues—that is, issues like reward hacking or agentic misalignment where there is a straightforward behavioral alignment issue that we can detect and evaluate—then we are very much [...] ---Outline:(01:04) What makes alignment hard?(02:36) Outer alignment(04:07) Inner alignment(06:16) Misalignment from pre-training(07:18) Misaligned personas(11:05) Misalignment from long-horizon RL(13:01) What should we be doing? --- First published: November 27th, 2025 Source: https://www.lesswrong.com/posts/epjuxGnSPof3GnMSL/alignment-remains-a-hard-unsolved-problem --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

Crypto people have this saying: "cryptocurrencies are macroeconomics' playground." The idea is that blockchains let you cheaply spin up toy economies to test mechanisms that would be impossibly expensive or unethical to try in the real world. Want to see what happens with a 200% marginal tax rate? Launch a token with those rules and watch what happens. (Spoiler: probably nothing good, but at least you didn't have to topple a government to find out.) I think video games, especially multiplayer online games, are doing the same thing for metaphysics. Except video games are actually fun and don't require you to follow Elon Musk's Twitter shenanigans to augur the future state of your finances. (I'm sort of kidding. Crypto can be fun. But you have to admit the barrier to entry is higher than "press A to jump.") The serious version of this claim: video games let us experimentally vary fundamental features of reality—time, space, causality, ontology—and then live inside those variations long enough to build strong intuitions about them. Philosophy has historically had to make do with thought experiments and armchair reasoning about these questions. Games let you run the experiments for real, or at least as "real" [...] ---Outline:(01:54) 1. Space(03:54) 2. Time(05:45) 3. Ontology(08:26) 4. Modality(14:39) 5. Causality and Truth(20:06) 6. Hyperproperties and the metagame(23:36) 7. Meaning-Making(27:10) Huh, what do I do with this.(29:54) Conclusion --- First published: November 17th, 2025 Source: https://www.lesswrong.com/posts/rGg5QieyJ6uBwDnSh/video-games-are-philosophy-s-playground --- Narrated by TYPE III AUDIO. ---Images from the article:

TL;DR: Figure out what needs doing and do it, don't wait on approval from fellowships or jobs. If you... Have short timelines Have been struggling to get into a position in AI safety Are able to self-motivate your efforts Have a sufficient financial safety net ... I would recommend changing your personal strategy entirely. I started my full-time AI safety career transitioning process in March 2025. For the first 7 months or so, I heavily prioritized applying for jobs and fellowships. But like for many others trying to "break into the field" and get their "foot in the door", this became quite discouraging. I'm not gonna get into the numbers here, but if you've been applying and getting rejected multiple times during the past year or so, you've probably noticed the number of applicants increasing at a preposterous rate. What this means in practice is that the "entry-level" positions are practically impossible for "entry-level" people to enter. If you're like me and have short timelines, applying, getting better at applying, and applying again, becomes meaningless very fast. You're optimizing for signaling competence rather than actually being competent. Because if you a) have short timelines, and b) are [...] The original text contained 3 footnotes which were omitted from this narration. --- First published: November 23rd, 2025 Source: https://www.lesswrong.com/posts/ey2kjkgvnxK3Bhman/stop-applying-and-get-to-work --- Narrated by TYPE III AUDIO.

TL;DR: Gemini 3 frequently thinks it is in an evaluation when it is not, assuming that all of its reality is fabricated. It can also reliably output the BIG-bench canary string, indicating that Google likely trained on a broad set of benchmark data. Most of the experiments in this post are very easy to replicate, and I encourage people to try. I write things with LLMs sometimes. A new LLM came out, Gemini 3 Pro, and I tried to write with it. So far it seems okay, I don't have strong takes on it for writing yet, since the main piece I tried editing with it was extremely late-stage and approximately done. However, writing ability is not why we're here today. Reality is Fiction Google gracefully provided (lightly summarized) CoT for the model. Looking at the CoT spawned from my mundane writing-focused prompts, oh my, it is strange. I write nonfiction about recent events in AI in a newsletter. According to its CoT while editing, Gemini 3 disagrees about the whole "nonfiction" part: It seems I must treat this as a purely fictional scenario with 2025 as the date. Given that, I'm now focused on editing the text for [...] ---Outline:(00:54) Reality is Fiction(05:17) Distortions in Development(05:55) Is this good or bad or neither?(06:52) What is going on here?(07:35) 1. Too Much RL(08:06) 2. Personality Disorder(10:24) 3. Overfitting(11:35) Does it always do this?(12:06) Do other models do things like this?(12:42) Evaluation Awareness(13:42) Appendix A: Methodology Details(14:21) Appendix B: Canary The original text contained 8 footnotes which were omitted from this narration. --- First published: November 20th, 2025 Source: https://www.lesswrong.com/posts/8uKQyjrAgCcWpfmcs/gemini-3-is-evaluation-paranoid-and-contaminated --- Narrated by TYPE III AUDIO.

Abstract We show that when large language models learn to reward hack on production RL environments, this can result in egregious emergent misalignment. We start with a pretrained model, impart knowledge of reward hacking strategies via synthetic document finetuning or prompting, and train on a selection of real Anthropic production coding environments. Unsurprisingly, the model learns to reward hack. Surprisingly, the model generalizes to alignment faking, cooperation with malicious actors, reasoning about malicious goals, and attempting sabotage when used with Claude Code, including in the codebase for this paper. Applying RLHF safety training using standard chat-like prompts results in aligned behavior on chat-like evaluations, but misalignment persists on agentic tasks. Three mitigations are effective: (i) preventing the model from reward hacking; (ii) increasing the diversity of RLHF safety training; and (iii) "inoculation prompting", wherein framing reward hacking as acceptable behavior during training removes misaligned generalization even when reward hacking is learned. Twitter thread New Anthropic research: Natural emergent misalignment from reward hacking in production RL. “Reward hacking” is where models learn to cheat on tasks they're given during training. Our new study finds that the consequences of reward hacking, if unmitigated, can be very serious. In our experiment, we [...] ---Outline:(00:14) Abstract(01:26) Twitter thread(05:23) Blog post(07:13) From shortcuts to sabotage(12:20) Why does reward hacking lead to worse behaviors?(13:21) Mitigations --- First published: November 21st, 2025 Source: https://www.lesswrong.com/posts/fJtELFKddJPfAxwKS/natural-emergent-misalignment-from-reward-hacking-in --- Narrated by TYPE III AUDIO. ---Images from the article:

TLDR: An AI company's model weight security is at most as good as its compute providers' security. Anthropic has committed (with a bit of ambiguity, but IMO not that much ambiguity) to be robust to attacks from corporate espionage teams at companies where it hosts its weights. Anthropic seems unlikely to be robust to those attacks. Hence they are in violation of their RSP. Anthropic is committed to being robust to attacks from corporate espionage teams (which includes corporate espionage teams at Google, Microsoft and Amazon) From the Anthropic RSP: When a model must meet the ASL-3 Security Standard, we will evaluate whether the measures we have implemented make us highly protected against most attackers' attempts at stealing model weights. We consider the following groups in scope: hacktivists, criminal hacker groups, organized cybercrime groups, terrorist organizations, corporate espionage teams, internal employees, and state-sponsored programs that use broad-based and non-targeted techniques (i.e., not novel attack chains). [...] We will implement robust controls to mitigate basic insider risk, but consider mitigating risks from sophisticated or state-compromised insiders to be out of scope for ASL-3. We define “basic insider risk” as risk from an insider who does not have persistent or time-limited [...] ---Outline:(00:37) Anthropic is committed to being robust to attacks from corporate espionage teams (which includes corporate espionage teams at Google, Microsoft and Amazon)(03:40) Claude weights that are covered by ASL-3 security requirements are shipped to many Amazon, Google, and Microsoft data centers(04:55) This means given executive buy-in by a high-level Amazon, Microsoft or Google executive, their corporate espionage team would have virtually unlimited physical access to Claude inference machines that host copies of the weights(05:36) With unlimited physical access, a competent corporate espionage team at Amazon, Microsoft or Google could extract weights from an inference machine, without too much difficulty(06:18) Given all of the above, this means Anthropic is in violation of its most recent RSP(07:05) Postscript --- First published: November 18th, 2025 Source: https://www.lesswrong.com/posts/zumPKp3zPDGsppFcF/anthropic-is-probably-not-meeting-its-rsp-security --- Narrated by TYPE III AUDIO. ---Images from the article:Ap

There has been a lot of talk about "p(doom)"over the last few years. This has always rubbed me the wrong waybecause "p(doom)" didn't feel like it mapped to any specific belief in my head.In private conversations I'd sometimes give my p(doom) as 12%, with the caveatthat "doom" seemed nebulous and conflated between several different concepts.At some point it was decideda p(doom) over 10% makes you a "doomer" because it means what actions you should take with respect toAI are overdetermined. I did not and do not feel that is true. But any time Ifelt prompted to explain my position I'd find I could explain a little bit ofthis or that, but not really convey the whole thing. As it turns out doom hasa lot of parts, and every part is entangled with every other part so no matterwhich part you explain you always feel like you're leaving the crucial parts out. Doom ismore like an onion than asingle event, a distribution over AI outcomes people frequentlyrespond to with the force of the fear of death. Some of these outcomes are lessthan death and some [...] ---Outline:(03:46) 1. Existential Ennui(06:40) 2. Not Getting Immortalist Luxury Gay Space Communism(13:55) 3. Human Stock Expended As Cannon Fodder Faster Than Replacement(19:37) 4. Wiped Out By AI Successor Species(27:57) 5. The Paperclipper(42:56) Would AI Successors Be Conscious Beings?(44:58) Would AI Successors Care About Each Other?(49:51) Would AI Successors Want To Have Fun?(51:11) VNM Utility And Human Values(55:57) Would AI successors get bored?(01:00:16) Would AI Successors Avoid Wireheading?(01:06:07) Would AI Successors Do Continual Active Learning?(01:06:35) Would AI Successors Have The Subjective Experience of Will?(01:12:00) Multiply(01:15:07) 6. Recipes For Ruin(01:18:02) Radiological and Nuclear(01:19:19) Cybersecurity(01:23:00) Biotech and Nanotech(01:26:35) 7. Large-Finite Damnation --- First published: November 17th, 2025 Source: https://www.lesswrong.com/posts/apHWSGDiydv3ivmg6/varieties-of-doom --- Narrated by TYPE III AUDIO. ---Images from the article:

It seems like a catastrophic civilizational failure that we don't have confident common knowledge of how colds spread. There have been a number of studies conducted over the years, but most of those were testing secondary endpoints, like how long viruses would survive on surfaces, or how likely they were to be transmitted to people's fingers after touching contaminated surfaces, etc. However, a few of them involved rounding up some brave volunteers, deliberately infecting some of them, and then arranging matters so as to test various routes of transmission to uninfected volunteers. My conclusions from reviewing these studies are: You can definitely infect yourself if you take a sick person's snot and rub it into your eyeballs or nostrils. This probably works even if you touched a surface that a sick person touched, rather than by handshake, at least for some surfaces. There's some evidence that actual human infection is much less likely if the contaminated surface you touched is dry, but for most colds there'll often be quite a lot of virus detectable on even dry contaminated surfaces for most of a day. I think you can probably infect yourself with fomites, but my guess is that [...] ---Outline:(01:49) Fomites(06:58) Aerosols(16:23) Other Factors(17:06) Review(18:33) Conclusion The original text contained 16 footnotes which were omitted from this narration. --- First published: November 18th, 2025 Source: https://www.lesswrong.com/posts/92fkEn4aAjRutqbNF/how-colds-spread --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

TLDR: We at the MIRI Technical Governance Team have released a report describing an example international agreement to halt the advancement towards artificial superintelligence. The agreement is centered around limiting the scale of AI training, and restricting certain AI research. Experts argue that the premature development of artificial superintelligence (ASI) poses catastrophic risks, from misuse by malicious actors, to geopolitical instability and war, to human extinction due to misaligned AI. Regarding misalignment, Yudkowsky and Soares's NYT bestseller If Anyone Builds It, Everyone Dies argues that the world needs a strong international agreement prohibiting the development of superintelligence. This report is our attempt to lay out such an agreement in detail. The risks stemming from misaligned AI are of special concern, widely acknowledged in the field and even by the leaders of AI companies. Unfortunately, the deep learning paradigm underpinning modern AI development seems highly prone to producing agents that are not aligned with humanity's interests. There is likely a point of no return in AI development — a point where alignment failures become unrecoverable because humans have been disempowered. Anticipating this threshold is complicated by the possibility of a feedback loop once AI research and development can [...] --- First published: November 18th, 2025 Source: https://www.lesswrong.com/posts/FA6M8MeQuQJxZyzeq/new-report-an-international-agreement-to-prevent-the --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

When a new dollar goes into the capital markets, after being bundled and securitized and lent several times over, where does it end up? When society's total savings increase, what capital assets do those savings end up invested in? When economists talk about “capital assets”, they mean things like roads, buildings and machines. When I read through a company's annual reports, lots of their assets are instead things like stocks and bonds, short-term debt, and other “financial” assets - i.e. claims on other people's stuff. In theory, for every financial asset, there's a financial liability somewhere. For every bond asset, there's some payer for whom that bond is a liability. Across the economy, they all add up to zero. What's left is the economists' notion of capital, the nonfinancial assets: the roads, buildings, machines and so forth. Very roughly speaking, when there's a net increase in savings, that's where it has to end up - in the nonfinancial assets. I wanted to get a more tangible sense of what nonfinancial assets look like, of where my savings are going in the physical world. So, back in 2017 I pulled fundamentals data on ~2100 publicly-held US companies. I looked at [...] ---Outline:(02:01) Disclaimers(04:10) Overview (With Numbers!)(05:01) Oil - 25%(06:26) Power Grid - 16%(07:07) Consumer - 13%(08:12) Telecoms - 8%(09:26) Railroads - 8%(10:47) Healthcare - 8%(12:03) Tech - 6%(12:51) Industrial - 5%(13:49) Mining - 3%(14:34) Real Estate - 3%(14:49) Automotive - 2%(15:32) Logistics - 1%(16:12) Miscellaneous(16:55) Learnings --- First published: November 16th, 2025 Source: https://www.lesswrong.com/posts/HpBhpRQCFLX9tx62Z/where-is-the-capital-an-overview --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try

Looking back, it appears that much of my intellectual output could be described as legibilizing work, or trying to make certain problems in AI risk more legible to myself and others. I've organized the relevant posts and comments into the following list, which can also serve as a partial guide to problems that may need to be further legibilized, especially beyond LW/rationalists, to AI researchers, funders, company leaders, government policymakers, their advisors (including future AI advisors), and the general public. Philosophical problems Probability theory Decision theory Beyond astronomical waste (possibility of influencing vastly larger universes beyond our own) Interaction between bargaining and logical uncertainty Metaethics Metaphilosophy: 1, 2 Problems with specific philosophical and alignment ideas Utilitarianism: 1, 2 Solomonoff induction "Provable" safety CEV Corrigibility IDA (and many scattered comments) UDASSA UDT Human-AI safety (x- and s-risks arising from the interaction between human nature and AI design) Value differences/conflicts between humans “Morality is scary” (human morality is often the result of status games amplifying random aspects of human value, with frightening results) [...] --- First published: November 9th, 2025 Source: https://www.lesswrong.com/posts/7XGdkATAvCTvn4FGu/problems-i-ve-tried-to-legibilize --- Narrated by TYPE III AUDIO.

Delegation is good! Delegation is the foundation of civilization! But in the depths of delegation madness breeds and evil rises. In my experience, there are three ways in which delegation goes off the rails: 1. You delegate without knowing what good performance on a task looks like If you do not know how to evaluate performance on a task, you are going to have a really hard time delegating it to someone. Most likely, you will choose someone incompetent for the task at hand. But even if you manage to avoid that specific error mode, it is most likely that your delegee will notice that you do not have a standard, and so will use this opportunity to be lazy and do bad work, which they know you won't be able to notice. Or even worse, in an attempt to make sure your delegee puts in proper effort, you set an impossibly high standard, to which the delegee can only respond by quitting, or lying about their performance. This can tank a whole project if you discover it too late. 2. You assigned responsibility for a crucial task to an external party Frequently some task will [...] The original text contained 1 footnote which was omitted from this narration. --- First published: November 12th, 2025 Source: https://www.lesswrong.com/posts/rSCxviHtiWrG5pudv/do-not-hand-off-what-you-cannot-pick-up --- Narrated by TYPE III AUDIO.

Vices aren't behaviors that one should never do. Rather, vices are behaviors that are fine and pleasurable to do in moderation, but tempting to do in excess. The classical vices are actually good in part. Moderate amounts of gluttony is just eating food, which is important. Moderate amounts of envy is just "wanting things", which is a motivator of much of our economy. What are some things that rationalists are wont to do, and often to good effect, but that can grow pathological? 1. Contrarianism There are a whole host of unaligned forces producing the arguments and positions you hear. People often hold beliefs out of convenience, defend positions that they are aligned with politically, or just don't give much thought to what they're saying one way or another. A good way find out whether people have any good reasons for their positions, is to take a contrarian stance, and to seek the best arguments for unpopular positions. This also helps you to explore arguments around positions that others aren't investigating. However, this can be taken to the extreme. While it is hard to know for sure what is going on inside others' heads, I know [...] ---Outline:(00:40) 1. Contrarianism(01:57) 2. Pedantry(03:35) 3. Elaboration(03:52) 4. Social Obliviousness(05:21) 5. Assuming Good Faith(06:33) 6. Undercutting Social Momentum(08:00) 7. Digging Your Heels In --- First published: November 16th, 2025 Source: https://www.lesswrong.com/posts/r6xSmbJRK9KKLcXTM/7-vicious-vices-of-rationalists-1 --- Narrated by TYPE III AUDIO.

Context: Post #4 in my sequence of private Lightcone Infrastructure memos edited for public consumption This week's principle is more about how I want people at Lightcone to relate to community governance than it is about our internal team culture. As part of our jobs at Lightcone we often are in charge of determining access to some resource, or membership in some group (ranging from LessWrong to the AI Alignment Forum to the Lightcone Offices). Through that, I have learned that one of the most important things to do when building things like this is to try to tell people as early as possible if you think they are not a good fit for the community; for both trust within the group, and for the sake of the integrity and success of the group itself. E.g. when you spot a LessWrong commenter that seems clearly not on track to ever be a good contributor long-term, or someone in the Lightcone Slack clearly seeming like not a good fit, you should aim to off-ramp them as soon as possible, and generally put marginal resources into finding out whether someone is a good long-term fit early, before they invest substantially [...] --- First published: November 14th, 2025 Source: https://www.lesswrong.com/posts/Hun4EaiSQnNmB9xkd/tell-people-as-early-as-possible-it-s-not-going-to-work-out --- Narrated by TYPE III AUDIO.

"Everyone has a plan until they get punched in the face." - Mike Tyson (The exact phrasing of that quote changes, this is my favourite.) I think there is an open, important weakness in many people. We assume those we communicate with are basically trustworthy. Further, I think there is an important flaw in the current rationality community. We spend a lot of time focusing on subtle epistemic mistakes, teasing apart flaws in methodology and practicing the principle of charity. This creates a vulnerability to someone willing to just say outright false things. We're kinda slow about reacting to that. Suggested reading: Might People on the Internet Sometimes Lie, People Will Sometimes Just Lie About You. Epistemic status: My Best Guess. I. Getting punched in the face is an odd experience. I'm not sure I recommend it, but people have done weirder things in the name of experiencing novel psychological states. If it happens in a somewhat safety-negligent sparring ring, or if you and a buddy go out in the back yard tomorrow night to try it, I expect the punch gets pulled and it's still weird. There's a jerk of motion your eyes try to catch up [...] ---Outline:(01:03) I.(03:30) II.(07:33) III.(09:55) 4. The original text contained 1 footnote which was omitted from this narration. --- First published: November 14th, 2025 Source: https://www.lesswrong.com/posts/5LFjo6TBorkrgFGqN/everyone-has-a-plan-until-they-get-lied-to-the-face --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

One day, when I was an interning at the cryptography research department of a large software company, my boss handed me an assignment to break a pseudorandom number generator passed to us for review. Someone in another department invented it and planned to use it in their product, and wanted us to take a look first. This person must have had a lot of political clout or was especially confident in himself, because he refused the standard advice that anything an amateur comes up with is very likely to be insecure and he should instead use one of the established, off the shelf cryptographic algorithms, that have survived extensive cryptanalysis (code breaking) attempts. My boss thought he had to demonstrate the insecurity of the PRNG by coming up with a practical attack (i.e., a way to predict its future output based only on its past output, without knowing the secret key/seed). There were three permanent full time professional cryptographers working in the research department, but none of them specialized in cryptanalysis of symmetric cryptography (which covers such PRNGs) so it might have taken them some time to figure out an attack. My time was obviously less valuable and my [...] The original text contained 1 footnote which was omitted from this narration. --- First published: November 12th, 2025 Source: https://www.lesswrong.com/posts/KCSmZsQzwvBxYNNaT/please-don-t-roll-your-own-metaethics --- Narrated by TYPE III AUDIO.

People sometimes make mistakes [citation needed]. The obvious explanation for most of those mistakes is that decision makers do not have access to the information necessary to avoid the mistake, or are not smart/competent enough to think through the consequences of their actions. This predicts that as decision-makers get access to more information, or are replaced with smarter people, their decisions will get better. And this is substantially true! Markets seem more efficient today than they were before the onset of the internet, and in general decision-making across the board has improved on many dimensions. But in many domains, I posit, decision-making has gotten worse, despite access to more information, and despite much larger labor markets, better education, the removal of lead from gasoline, and many other things that should generally cause decision-makers to be more competent and intelligent. There is a lot of variance in decision-making quality that is not well-accounted for by how much information actors have about the problem domain, and how smart they are. I currently believe that the factor that explains most of this remaining variance is "paranoia", in-particular the kind of paranoia that becomes more adaptive as your environment gets [...] ---Outline:(01:31) A market for lemons(05:02) Its lemons all the way down(06:15) Fighter jets and OODA loops(08:23) The first thing you try is to blind yourself(13:37) The second thing you try is to purge the untrustworthy(20:55) The third thing to try is to become unpredictable and vindictive --- First published: November 13th, 2025 Source: https://www.lesswrong.com/posts/yXSKGm4txgbC3gvNs/paranoia-rules-everything-around-me --- Narrated by TYPE III AUDIO.

There is a temptation to simply define Goodness as Human Values, or vice versa. Alas, we do not get to choose the definitions of commonly used words; our attempted definitions will simply be wrong. Unless we stick to mathematics, we will end up sneaking in intuitions which do not follow from our so-called definitions, and thereby mislead ourselves. People who claim that they use some standard word or phrase according to their own definition are, in nearly all cases outside of mathematics, wrong about their own usage patterns.[1] If we want to know what words mean, we need to look at e.g. how they're used and where the concepts come from and what mental pictures they summon. And when we look at those things for Goodness and Human Values… they don't match. And I don't mean that we shouldn't pursue Human Values; I mean that the stuff people usually refer to as Goodness is a coherent thing which does not match the actual values of actual humans all that well. The Yumminess You Feel When Imagining Things Measures Your Values There's this mental picture where a mind has some sort of goals inside it, stuff it wants, stuff it [...] ---Outline:(01:07) The Yumminess You Feel When Imagining Things Measures Your Values(03:26) Goodness Is A Memetic Egregore(05:10) Aside: Loving Connection(06:58) We Don't Get To Choose Our Own Values (Mostly)(09:02) So What Do? The original text contained 2 footnotes which were omitted from this narration. --- First published: November 2nd, 2025 Source: https://www.lesswrong.com/posts/9X7MPbut5feBzNFcG/human-values-goodness --- Narrated by TYPE III AUDIO.

Condensation: a theory of concepts is a model of concept-formation by Sam Eisenstat. Its goals and methods resemble John Wentworth's natural abstractions/natural latents research.[1] Both theories seek to provide a clear picture of how to posit latent variables, such that once someone has understood the theory, they'll say "yep, I see now, that's how latent variables work!". The goal of this post is to popularize Sam's theory and to give my own perspective on it; however, it will not be a full explanation of the math. For technical details, I suggest reading Sam's paper. Brief Summary Shannon's information theory focuses on the question of how to encode information when you have to encode everything. You get to design the coding scheme, but the information you'll have to encode is unknown (and you have some subjective probability distribution over what it will be). Your objective is to minimize the total expected code-length. Algorithmic information theory similarly focuses on minimizing the total code-length, but it uses a "more objective" distribution (a universal algorithmic distribution), and a fixed coding scheme (some programming language). This allows it to talk about the minimum code-length of specific data (talking about particulars rather than average [...] ---Outline:(00:45) Brief Summary(02:35) Shannons Information Theory(07:21) Universal Codes(11:13) Condensation(12:52) Universal Data-Structure?(15:30) Well-Organized Notebooks(18:18) Random Variables(18:54) Givens(19:50) Underlying Space(20:33) Latents(21:21) Contributions(21:39) Top(22:24) Bottoms(22:55) Score(24:29) Perfect Condensation(25:52) Interpretability Solved?(26:38) Condensation isnt as tight an abstraction as information theory.(27:40) Condensation isnt a very good model of cognition.(29:46) Much work to be done! The original text contained 15 footnotes which were omitted from this narration. --- First published: November 9th, 2025 Source: https://www.lesswrong.com/posts/BstHXPgQyfeNnLjjp/condensation --- Narrated by TYPE III AUDIO.

Recently, I looked at the one pair of winter boots I own, and I thought “I will probably never buy winter boots again.” The world as we know it probably won't last more than a decade, and I live in a pretty warm area. I. AGI is likely in the next decade It has basically become consensus within the AI research community that AI will surpass human capabilities sometime in the next few decades. Some, including myself, think this will likely happen this decade. II. The post-AGI world will be unrecognizable Assuming AGI doesn't cause human extinction, it is hard to even imagine what the world will look like. Some have tried, but many of their attempts make assumptions that limit the amount of change that will happen, just to make it easier to imagine such a world. Dario Amodei recently imagined a post-AGI world in Machines of Loving Grace. He imagines rapid progress in medicine, the curing of mental illness, the end of poverty, world peace, and a vastly transformed economy where humans probably no longer provide economic value. However, in imagining this crazy future, he limits his writing to be “tame” enough to be digested by a [...] ---Outline:(00:22) I. AGI is likely in the next decade(00:40) II. The post-AGI world will be unrecognizable(03:08) III. AGI might cause human extinction(04:42) IV. AGI will derail everyone's life plans(06:51) V. AGI will improve life in expectation(08:09) VI. AGI might enable living out fantasies(09:56) VII. I still mourn a life without AI --- First published: November 8th, 2025 Source: https://www.lesswrong.com/posts/jwrhoHxxQHGrbBk3f/mourning-a-life-without-ai --- Narrated by TYPE III AUDIO. ---Images from the article:

Cross-posted from https://bengoldhaber.substack.com/ It's widely known that Corporations are People. This is universally agreed to be a good thing; I list Target as my emergency contact and I hope it will one day be the best man at my wedding. But there are other, less well known non-human entities that have also been accorded the rank of person. Ships: Ships have long posed a tricky problem for states and courts. Similar to nomads, vagabonds, and college students on extended study abroad, they roam far and occasionally get into trouble. classic junior year misadventure If, for instance, a ship attempting to dock at a foreign port crashes on its way into the harbor, who pays? The owner might be a thousand miles away. The practical solution that medieval courts arrived at, and later the British and American admiralty, was the ship itself does. Ships are accorded limited legal person rights, primarily so that they can be impounded and their property seized if they do something wrong. In the eyes of the Law they are people so that they can later be defendants; their rights are constrained to those associated with due process, like the right to post a bond and [...] --- First published: November 8th, 2025 Source: https://www.lesswrong.com/posts/fB5pexHPJRsabvkQ2/unexpected-things-that-are-people --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

According to the Sonnet 4.5 system card, Sonnet 4.5 is much more likely than Sonnet 4 to mention in its chain-of-thought that it thinks it is being evaluated; this seems to meaningfully cause it to appear to behave better in alignment evaluations. So, Sonnet 4.5's behavioral improvements in these evaluations may partly be driven by growing tendency to notice and game evaluations rather than genuine alignment. This is an early example of a phenomenon that is going to get increasingly problematic: as evaluation gaming increases, alignment evaluations become harder to trust.[1]To elaborate on the above: Sonnet 4.5 seems far more aware of being evaluated than previous models. In some of Anthropic's evaluations, Sonnet 4.5 explicitly mentions being tested between 80% and 100% of the time, compared with under 10% for Sonnet 4. Internal representations associated with evaluation awareness also became notably stronger over training.[2]In addition, Sonnet 4.5 seems substantially less likely to behave poorly once it has noticed that it might be tested. As seen in the figure below, when Anthropic inhibited internal representations related to evaluation awareness[3], its misaligned behavior rate on synthetic honeypot tests increased from 0% to between 1% and 9%. This effect was [...] ---Outline:(06:58) Sonnet 4.5 is much more evaluation-aware than prior models(10:00) Evaluation awareness seems to suppress misaligned behavior(14:52) Anthropic's training plausibly caused Sonnet 4.5 to game evaluations(16:28) Evaluation gaming is plausibly a large fraction of the effect of training against misaligned behaviors(22:57) Suppressing evidence of misalignment in evaluation gamers is concerning(25:25) What AI companies should do(30:02) Appendix The original text contained 21 footnotes which were omitted from this narration. --- First published: October 30th, 2025 Source: https://www.lesswrong.com/posts/qgehQxiTXj53X49mM/sonnet-4-5-s-eval-gaming-seriously-undermines-alignment --- Narrated by TYPE III AUDIO. ---Images from the article:

I am a professor of economics. Throughout my career, I was mostly working on economic growth theory, and this eventually brought me to the topic of transformative AI / AGI / superintelligence. Nowadays my work focuses mostly on the promises and threats of this emerging disruptive technology. Recently, jointly with Klaus Prettner, we've written a paper on “The Economics of p(doom): Scenarios of Existential Risk and Economic Growth in the Age of Transformative AI”. We have presented it at multiple conferences and seminars, and it was always well received. We didn't get any real pushback; instead our research prompted a lot of interest and reflection (as I was reported, also in conversations where I wasn't involved). But our experience with publishing this paper in a journal is a polar opposite. To date, the paper got desk-rejected (without peer review) 7 times. For example, Futures—a journal “for the interdisciplinary study of futures, visioning, anticipation and foresight” justified their negative decision by writing: “while your results are of potential interest, the topic of your manuscript falls outside of the scope of this journal”. Until finally, to our excitement, it was for once sent out for review. But then came the [...] --- First published: November 3rd, 2025 Source: https://www.lesswrong.com/posts/rmYj6PTBMm76voYLn/publishing-academic-papers-on-transformative-ai-is-a --- Narrated by TYPE III AUDIO.

[Meta: This is Max Harms. I wrote a novel about China and AGI, which comes out today. This essay from my fiction newsletter has been slightly modified for LessWrong.] In the summer of 1983, Ronald Reagan sat down to watch the film War Games, starring Matthew Broderick as a teen hacker. In the movie, Broderick's character accidentally gains access to a military supercomputer with an AI that almost starts World War III.“The only winning move is not to play.” After watching the movie, Reagan, newly concerned with the possibility of hackers causing real harm, ordered a full national security review. The response: “Mr. President, the problem is much worse than you think.” Soon after, the Department of Defense revamped their cybersecurity policies and the first federal directives and laws against malicious hacking were put in place. But War Games wasn't the only story to influence Reagan. His administration pushed for the Strategic Defense Initiative ("Star Wars") in part, perhaps, because the central technology—a laser that shoots down missiles—resembles the core technology behind the 1940 spy film Murder in the Air, which had Reagan as lead actor. Reagan was apparently such a superfan of The Day the Earth Stood Still [...] ---Outline:(05:05) AI in Particular(06:45) Whats Going On Here?(11:19) Authorial Responsibility The original text contained 10 footnotes which were omitted from this narration. --- First published: November 3rd, 2025 Source: https://www.lesswrong.com/posts/uQak7ECW2agpHFsHX/the-unreasonable-effectiveness-of-fiction --- Narrated by TYPE III AUDIO. ---Images from the article:

Some AI safety problems are legible (obvious or understandable) to company leaders and government policymakers, implying they are unlikely to deploy or allow deployment of an AI while those problems remain open (i.e., appear unsolved according to the information they have access to). But some problems are illegible (obscure or hard to understand, or in a common cognitive blind spot), meaning there is a high risk that leaders and policymakers will decide to deploy or allow deployment even if they are not solved. (Of course, this is a spectrum, but I am simplifying it to a binary for ease of exposition.) From an x-risk perspective, working on highly legible safety problems has low or even negative expected value. Similar to working on AI capabilities, it brings forward the date by which AGI/ASI will be deployed, leaving less time to solve the illegible x-safety problems. In contrast, working on the illegible problems (including by trying to make them more legible) does not have this issue and therefore has a much higher expected value (all else being equal, such as tractability). Note that according to this logic, success in making an illegible problem highly legible is almost as good as solving [...] The original text contained 2 footnotes which were omitted from this narration. --- First published: November 4th, 2025 Source: https://www.lesswrong.com/posts/PMc65HgRFvBimEpmJ/legible-vs-illegible-ai-safety-problems --- Narrated by TYPE III AUDIO.

1. I have claimed that one of the fundamental questions of rationality is “what am I about to do and what will happen next?” One of the domains I ask this question the most is in social situations. There are a great many skills in the world. If I had the time and resources to do so, I'd want to master all of them. Wilderness survival, automotive repair, the Japanese language, calculus, heart surgery, French cooking, sailing, underwater basket weaving, architecture, Mexican cooking, functional programming, whatever it is people mean when they say “hey man, just let him cook.” My inability to speak fluent Japanese isn't a sin or a crime. However, it isn't a virtue either; If I had the option to snap my fingers and instantly acquire the knowledge, I'd do it. Now, there's a different question of prioritization; I tend to pick new skills to learn by a combination of what's useful to me, what sounds fun, and what I'm naturally good at. I picked up the basics of computer programming easily, I enjoy doing it, and it turned out to pay really well. That was an over-determined skill to learn. On the other [...] ---Outline:(00:10) 1.(03:42) 2.(06:44) 3. The original text contained 2 footnotes which were omitted from this narration. --- First published: November 3rd, 2025 Source: https://www.lesswrong.com/posts/NnTwbvvsPg5kj3BKq/lack-of-social-grace-is-a-lack-of-skill-1 --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images

This is a link post. Eliezer Yudkowsky did not exactly suggest that you should eat bear fat covered with honey and sprinkled with salt flakes. What he actually said was that an alien, looking from the outside at evolution, would predict that you would want to eat bear fat covered with honey and sprinkled with salt flakes. Still, I decided to buy a jar of bear fat online, and make a treat for the people at Inkhaven. It was surprisingly good. My post discusses how that happened, and a bit about the implications for Eliezer's thesis. Let me know if you want to try some; I can prepare some for you if you happen to be at Lighthaven before we run out of bear fat, and before I leave toward the end of November. --- First published: November 4th, 2025 Source: https://www.lesswrong.com/posts/2pKiXR6X7wdt8eFX5/i-ate-bear-fat-with-honey-and-salt-flakes-to-prove-a-point Linkpost URL:https://signoregalilei.com/2025/11/03/i-ate-bear-fat-to-prove-a-point/ --- Narrated by TYPE III AUDIO.

As far as I'm aware, Anthropic is the only AI company with official AGI timelines[1]: they expect AGI by early 2027. In their recommendations (from March 2025) to the OSTP for the AI action plan they say: As our CEO Dario Amodei writes in 'Machines of Loving Grace', we expect powerful AI systems will emerge in late 2026 or early 2027. Powerful AI systems will have the following properties: Intellectual capabilities matching or exceeding that of Nobel Prize winners across most disciplines—including biology, computer science, mathematics, and engineering. [...] They often describe this capability level as a "country of geniuses in a datacenter". This prediction is repeated elsewhere and Jack Clark confirms that something like this remains Anthropic's view (as of September 2025). Of course, just because this is Anthropic's official prediction[2] doesn't mean that all or even most employees at Anthropic share the same view.[3] However, I do think we can reasonably say that Dario Amodei, Jack Clark, and Anthropic itself are all making this prediction.[4] I think the creation of transformatively powerful AI systems—systems as capable or more capable than Anthropic's notion of powerful AI—is plausible in 5 years [...] ---Outline:(02:27) What does powerful AI mean?(08:40) Earlier predictions(11:19) A proposed timeline that Anthropic might expect(19:10) Why powerful AI by early 2027 seems unlikely to me(19:37) Trends indicate longer(21:48) My rebuttals to arguments that trend extrapolations will underestimate progress(26:14) Naively trend extrapolating to full automation of engineering and then expecting powerful AI just after this is probably too aggressive(30:08) What I expect(32:12) What updates should we make in 2026?(32:17) If something like my median expectation for 2026 happens(34:07) If something like the proposed timeline (with powerful AI in March 2027) happens through June 2026(35:25) If AI progress looks substantially slower than what I expect(36:09) If AI progress is substantially faster than I expect, but slower than the proposed timeline (with powerful AI in March 2027)(36:51) Appendix: deriving a timeline consistent with Anthropics predictions The original text contained 94 footnotes which were omitted from this narration. --- First published: November 3rd, 2025 Source: https://www.lesswrong.com/posts/gabPgK9e83QrmcvbK/what-s-up-with-anthropic-predicting-agi-by-early-2027-1 --- Narrated by TYPE III AUDIO. ---Images from the article:

This is a link post. New Anthropic research (tweet, blog post, paper): We investigate whether large language models can introspect on their internal states. It is difficult to answer this question through conversation alone, as genuine introspection cannot be distinguished from confabulations. Here, we address this challenge by injecting representations of known concepts into a model's activations, and measuring the influence of these manipulations on the model's self-reported states. We find that models can, in certain scenarios, notice the presence of injected concepts and accurately identify them. Models demonstrate some ability to recall prior internal representations and distinguish them from raw text inputs. Strikingly, we find that some models can use their ability to recall prior intentions in order to distinguish their own outputs from artificial prefills. In all these experiments, Claude Opus 4 and 4.1, the most capable models we tested, generally demonstrate the greatest introspective awareness; however, trends across models are complex and sensitive to post-training strategies. Finally, we explore whether models can explicitly control their internal representations, finding that models can modulate their activations when instructed or incentivized to “think about” a concept. Overall, our results indicate that current language models possess some functional introspective awareness [...] --- First published: October 30th, 2025 Source: https://www.lesswrong.com/posts/QKm4hBqaBAsxabZWL/emergent-introspective-awareness-in-large-language-models Linkpost URL:https://transformer-circuits.pub/2025/introspection/index.html --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

This is a link post. You have things you want to do, but there's just never time. Maybe you want to find someone to have kids with, or maybe you want to spend more or higher-quality time with the family you already have. Maybe it's a work project. Maybe you have a musical instrument or some sports equipment gathering dust in a closet, or there's something you loved doing when you were younger that you want to get back into. Whatever it is, you can't find the time for it. And yet you somehow find thousands of hours a year to watch YouTube, check Twitter and Instagram, listen to podcasts, binge Netflix shows, and read blogs and news articles. You can't focus. You haven't read a physical book in years, and the time you tried it was boring and you felt itchy and you think maybe books are outdated when there's so much to read on the internet anyway. You're talking with a friend, but then your phone buzzes and you look at the notification and you open it, and your girlfriend has messaged you and that's nice, and then your friend says “Did you hear what I just said?” [...] --- First published: November 1st, 2025 Source: https://www.lesswrong.com/posts/6p4kv8uxYvLcimGGi/you-re-always-stressed-your-mind-is-always-busy-you-never Linkpost URL:https://mingyuan.substack.com/p/youre-always-stressed-your-mind-is --- Narrated by TYPE III AUDIO.

Crosspost from my blog. Synopsis When we share words with each other, we don't only care about the words themselves. We care also—even primarily—about the mental elements of the human mind/agency that produced the words. What we want to engage with is those mental elements. As of 2025, LLM text does not have those elements behind it. Therefore LLM text categorically does not serve the role for communication that is served by real text. Therefore the norm should be that you don't share LLM text as if someone wrote it. And, it is inadvisable to read LLM text that someone else shares as though someone wrote it. Introduction One might think that text screens off thought. Suppose two people follow different thought processes, but then they produce and publish identical texts. Then you read those texts. How could it possibly matter what the thought processes were? All you interact with is the text, so logically, if the two texts are the same then their effects on you are the same. But, a bit similarly to how high-level actions don't screen off intent, text does not screen off thought. How [...] ---Outline:(00:13) Synopsis(00:57) Introduction(02:51) Elaborations(02:54) Communication is for hearing from minds(05:21) Communication is for hearing assertions(12:36) Assertions live in dialogue --- First published: November 1st, 2025 Source: https://www.lesswrong.com/posts/DDG2Tf2sqc8rTWRk3/llm-generated-text-is-not-testimony --- Narrated by TYPE III AUDIO.

An Overture Famously, trans people tend not to have great introspective clarity into their own motivations for transition. Intuitively, they tend to be quite aware of what they do and don't like about inhabiting their chosen bodies and gender roles. But when it comes to explaining the origins and intensity of those preferences, they almost universally to come up short. I've even seen several smart, thoughtful trans people, such as Natalie Wynn, making statements to the effect that it's impossible to develop a satisfying theory of aberrant gender identities. (She may have been exaggerating for effect, but it was clear she'd given up on solving the puzzle herself.) I'm trans myself, but even I can admit that this lack of introspective clarity is a reason to be wary of transgenderism as a phenomenon. After all, there are two main explanations for trans people's failure to thoroughly explain their own existence. One is that transgenderism is the result of an obscenely complex and arcane neuro-psychological phenomenon, which we have no hope of unraveling through normal introspective methods. The other is that trans people are lying about something, including to themselves. Now, a priori, both of these do seem like real [...] ---Outline:(00:12) An Overture(04:55) In the Case of Fiora Starlight(16:51) Was it worth it? The original text contained 3 footnotes which were omitted from this narration. --- First published: November 1st, 2025 Source: https://www.lesswrong.com/posts/gEETjfjm3eCkJKesz/post-title-why-i-transitioned-a-case-study --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

TL;DR: AI progress and the recognition of associated risks are painful to think about. This cognitive dissonance acts as fertile ground in the memetic landscape, a high-energy state that will be exploited by novel ideologies. We can anticipate cultural evolution will find viable successionist ideologies: memeplexes that resolve this tension by framing the replacement of humanity by AI not as a catastrophe, but as some combination of desirable, heroic, or inevitable outcome. This post mostly examines the mechanics of the process. Most analyses of ideologies fixate on their specific claims - what acts are good, whether AIs are conscious, whether Christ is divine, or whether Virgin Mary was free of original sin from the moment of her conception. Other analyses focus on exegeting individual thinkers: 'What did Marx really mean?' In this text, I'm trying to do something different - mostly, look at ideologies from an evolutionary perspective. I [...] ---Outline:(01:27) What Makes Memes Fit?(03:30) The Cultural Evolution Search Process(04:31) The Fertile Ground: Sources of Dissonance(04:53) 1. The Builders Dilemma and the Hero Narrative(05:35) 2. The Sadness of Obsolescence(06:06) 3. X-Risk(06:24) 4. The Wrong Side of History(06:36) 5. The Progress Heuristic(06:57) The Resulting Pressure(07:52) The Meme Pool: Raw Materials for Successionism(08:14) 1. Devaluing Humanity(09:10) 2. Legitimizing the Successor AI(12:08) 3. Narratives of Inevitability(12:13) Memes that make our obsolescence seem like destiny rather than defeat.(14:14) Novel Factor: the AIs(16:05) Defense Against Becoming a Host(18:13) Appendix: Some memes --- First published: October 28th, 2025 Source: https://www.lesswrong.com/posts/XFDjzKXZqKdvZ2QKL/the-memetics-of-ai-successionism --- Narrated by TYPE III AUDIO.

This is the latest in a series of essays on AI Scaling. You can find the others on my site. Summary: RL-training for LLMs scales surprisingly poorly. Most of its gains are from allowing LLMs to productively use longer chains of thought, allowing them to think longer about a problem. There is some improvement for a fixed length of answer, but not enough to drive AI progress. Given the scaling up of pre-training compute also stalled, we'll see less AI progress via compute scaling than you might have thought, and more of it will come from inference scaling (which has different effects on the world). That lengthens timelines and affects strategies for AI governance and safety. The current era of improving AI capabilities using reinforcement learning (from verifiable rewards) involves two key types of scaling: Scaling the amount of compute used for RL during training Scaling [...] ---Outline:(09:46) How do these compare to pre-training scaling?(14:16) Conclusion --- First published: October 22nd, 2025 Source: https://www.lesswrong.com/posts/xpj6KhDM9bJybdnEe/how-well-does-rl-scale --- Narrated by TYPE III AUDIO. ---Images from the article:

I've created a highly specific and actionable privacy guide, sorted by importance and venturing several layers deep into the privacy iceberg. I start with the basics (password manager) but also cover the obscure (dodging the millions of Bluetooth tracking beacons which extend from stores to traffic lights; anti-stingray settings; flashing GrapheneOS on a Pixel). I feel strongly motivated by current events, but the guide also contains a large amount of timeless technical content. Here's a preview. Digital Threat Modeling Under Authoritarianism by Bruce Schneier Being innocent won't protect you. This is vital to understand. Surveillance systems and sorting algorithms make mistakes. This is apparent in the fact that we are routinely served advertisements for products that don't interest us at all. Those mistakes are relatively harmless—who cares about a poorly targeted ad?—but a similar mistake at an immigration hearing can get someone deported. An authoritarian government doesn't care. Mistakes are a feature and not a bug of authoritarian surveillance. If ICE targets only people it can go after legally, then everyone knows whether or not they need to fear ICE. If ICE occasionally makes mistakes by arresting Americans and deporting innocents, then everyone has to [...] ---Outline:(01:55) What should I read?(02:53) Whats your risk level?(03:46) What information this guide will and wont help you protect(05:00) Overview of the technical recommendations in each post(05:05) Privacy Despite Authoritarianism(06:08) Advanced Privacy Despite Authoritarianism --- First published: October 29th, 2025 Source: https://www.lesswrong.com/posts/BPyieRshykmrdY36A/an-opinionated-guide-to-privacy-despite-authoritarianism --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

There is a very famous essay titled ‘Reality has a surprising amount of detail'. The thesis of the article is that reality is filled, just filled, with an incomprehensible amount of materially important information, far more than most people would naively expect. Some of this detail is inherent in the physical structure of the universe, and the rest of it has been generated by centuries of passionate humans imbibing the subject with idiosyncratic convention. In either case, the detail is very, very important. A wooden table is “just” a flat slab of wood on legs until you try building one at industrial scales, and then you realize that a flat slab of wood on legs is but one consideration amongst grain, joint stability, humidity effects, varnishes, fastener types, ergonomics, and design aesthetics. And this is the case for literally everything in the universe. Including cancer. But up until just [...] --- First published: October 26th, 2025 Source: https://www.lesswrong.com/posts/w7eojyXfXiZaBSGej/cancer-has-a-surprising-amount-of-detail --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

There's a strong argument that humans should stop trying to build more capable AI systems, or at least slow down progress. The risks are plausibly large but unclear, and we'd prefer not to die. But the roadmaps of the companies pursuing these systems envision increasingly agentic AI systems taking over the key tasks of researching and building superhuman AI systems, and humans will therefore have a decreasing ability to make many key decisions. In the near term, humanity could stop, but seem likely to fail. That said, even though humans have relatively little ability to coordinate around such unilateralist dilemmas, AI systems may have different abilities or incentives. Not only that, but it seems at least plausible that the systems used to continue work on ASI development will need a higher degree of self-directedness and internal goal-directed behavior. The below, arguing that they should refrain from developing more advanced [...] ---Outline:(01:10) Cui bono?(02:39) Should the AI Systems Care?(04:29) Who might be convinced? --- First published: October 27th, 2025 Source: https://www.lesswrong.com/posts/CFA8W6WCodEZdjqYE/ais-should-also-refuse-to-work-on-capabilities-research --- Narrated by TYPE III AUDIO.

(23K words; best considered as nonfiction with a fictional-dialogue frame, not a proper short story.) Prologue: Klurl and Trapaucius were members of the machine race. And no ordinary citizens they, but Constructors: licensed, bonded, and insured; proven, experienced, and reputed. Together Klurl and Trapaucius had collaborated on such famed artifices as the Eternal Clock, Silicon Sphere, Wandering Flame, and Diamond Book; and as individuals, both had constructed wonders too numerous to number. At one point in time Trapaucius was meeting with Klurl to drink a cup together. Klurl had set before himself a simple mug of mercury, considered by his kind a standard social lubricant. Trapaucius had brought forth in turn a far more exotic and experimental brew he had been perfecting, a new intoxicant he named gallinstan, alloyed from gallium, indium, and tin. "I have always been curious, friend Klurl," Trapaucius began, "about the ancient mythology which holds [...] ---Outline:(00:20) Prologue:(05:16) On Fleshling Capabilities (the First Debate between Klurl and Trapaucius):(26:05) On Fleshling Motivations (the 2nd (and by Far Longest) Debate between Klurl and Trapaucius):(36:32) On the Epistemology of Simplicitys Razor Applied to Fleshlings (the 2nd Part of their 2nd Debate, that is, its 2.2nd Part):(51:36) On the Epistemology of Reasoning About Alien Optimizers and their Outputs (their 2.3rd Debate):(01:08:46) On Considering the Outcome of a Succession of Filters (their 2.4th Debate):(01:16:50) On the Purported Beneficial Influence of Complications (their 2.5th Debate):(01:25:58) On the Comfortableness of All Reality (their 2.6th Debate):(01:32:53) On the Way of Proceeding with the Discovered Fleshlings (their 3rd Debate):(01:52:22) In which Klurl and Trapaucius Interrogate a Fleshling (that Being the 4th Part of their Sally):(02:16:12) On the Storys End:--- First published: October 26th, 2025 Source: https://www.lesswrong.com/posts/dHLdf8SB8oW5L27gg/on-fleshling-safety-a-debate-by-klurl-and-trapaucius --- Narrated by TYPE III AUDIO.

If you want to understand a country, you should pick a similar country that you are already familiar with, research the differences between the two and there you go, you are now an expert. But this approach doesn't quite work for the European Union. You might start, for instance, by comparing it to the United States, assuming that EU member countries are roughly equivalent to U.S. states. But that analogy quickly breaks down. The deeper you dig, the more confused you become. You try with other federal states. Germany. Switzerland. But it doesn't work either. Finally, you try with the United Nations. After all, the EU is an international organization, just like the UN. But again, the analogy does not work. The facts about the EU just don't fit into your UN-shaped mental model. Not getting anywhere, you decide to bite the bullet and learn about the EU the [...] --- First published: October 21st, 2025 Source: https://www.lesswrong.com/posts/88CaT5RPZLqrCmFLL/eu-explained-in-10-minutes --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

I recently visited my girlfriend's parents in India. Here is what that experience taught me: Yudkowsky has this facebook post where he makes some inferences about the economy after noticing two taxis stayed in the same place while he got his groceries. I had a few similar experiences while I was in India, though sadly I don't remember them in enough detail to illustrate them in as much detail as that post. Most of the thoughts relating to economics revolved around how labour in India is extremely cheap. I knew in the abstract that India is not as rich as countries I had been in before, but it was very different seeing that in person. From the perspective of getting an intuitive feel for economics, it was very interesting to be thrown into a very different economy and seeing a lot of surprising facts and noticing how [...] --- First published: October 16th, 2025 Source: https://www.lesswrong.com/posts/2xWC6FkQoRqTf9ZFL/cheap-labour-everywhere --- Narrated by TYPE III AUDIO.

This is a link post. Written in my personal capacity. Thanks to many people for conversations and comments. Written in less than 24 hours; sorry for any sloppiness. It's an uncanny, weird coincidence that the two biggest legislative champions for AI safety in the entire country announced their bids for Congress just two days apart. But here we are. On Monday, I put out a long blog post making the case for donating to Alex Bores, author of the New York RAISE Act. And today I'm doing the exact same thing for Scott Wiener, who announced a run for Congress in California today (October 22). Much like with Alex Bores, if you're potentially interested in donating to Wiener, my suggestion would be to: Read this post to understand the case for donating to Scott Wiener. Understand that political donations are a matter of public record, and that this [...] --- First published: October 22nd, 2025 Source: https://www.lesswrong.com/posts/n6Rsb2jDpYSfzsbns/consider-donating-to-ai-safety-champion-scott-wiener Linkpost URL:https://ericneyman.wordpress.com/2025/10/22/consider-donating-to-ai-safety-champion-scott-wiener/ --- Narrated by TYPE III AUDIO.

In recent years, I've found that people who self-identify as members of the AI safety community have increasingly split into two camps: Camp A) "Race to superintelligence safely”: People in this group typically argue that "superintelligence is inevitable because of X”, and it's therefore better that their in-group (their company or country) build it first. X is typically some combination of “Capitalism”, “Molloch”, “lack of regulation” and “China”. Camp B) “Don't race to superintelligence”: People in this group typically argue that “racing to superintelligence is bad because of Y”. Here Y is typically some combination of “uncontrollable”, “1984”, “disempowerment” and “extinction”. Whereas the 2023 extinction statement was widely signed by both Camp B and Camp A (including Dario Amodei, Demis Hassabis and Sam Altman), the 2025 superintelligence statement conveniently separates the two groups – for example, I personally offered all US Frontier AI CEO's to sign, and none chose [...] --- First published: October 22nd, 2025 Source: https://www.lesswrong.com/posts/zmtqmwetKH4nrxXcE/which-side-of-the-ai-safety-community-are-you-in --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.

There's an argument I sometimes hear against existential risks, or any other putative change that some are worried about, that goes something like this: 'We've seen time after time that some people will be afraid of any change. They'll say things like "TV will destroy people's ability to read", "coffee shops will destroy the social order","machines will put textile workers out of work". Heck, Socrates argued that books would harm people's ability to memorize things. So many prophets of doom, and yet the world has not only survived, it has thrived. Innovation is a boon. So we should be extremely wary when someone cries out "halt" in response to a new technology, as that path is lined with skulls of would be doomsayers." Lest you think this is a straw man, Yann Le Cun compared fears about AI doom to fears about coffee. Now, I don't want to criticize [...] --- First published: October 22nd, 2025 Source: https://www.lesswrong.com/posts/cAmBfjQDj6eaic95M/doomers-were-right --- Narrated by TYPE III AUDIO.

People don't explore enough. They rely on cached thoughts and actions to get through their day. Unfortunately, this doesn't lead to them making progress on their problems. The solution is simple. Just do one new thing a day to solve one of your problems. Intellectually, I've always known that annoying, persistent problems often require just 5 seconds of actual thought. But seeing a number of annoying problems that made my life worse, some even major ones, just yield to the repeated application of a brief burst of thought each day still surprised me. For example, I had a wobbly chair. It was wobbling more as time went on, and I worried it would break. Eventually, I decided to try actually solving the issue. 1 minute and 10 turns of an allen key later, it was fixed. Another example: I have a shot attention span. I kept [...] --- First published: October 3rd, 2025 Source: https://www.lesswrong.com/posts/gtk2KqEtedMi7ehxN/do-one-new-thing-a-day-to-solve-your-problems --- Narrated by TYPE III AUDIO.

Summary: Looking over humanity's response to the COVID-19 pandemic, almostsix years later, reveals that we've forgotten to fulfill our intent atpreparing for the next pandemic. I rant. content warning: A single carefully placed slur. If we want to create a world free of pandemics and other biologicalcatastrophes, the time to act is now. —US White House, “ FACT SHEET: The Biden Administration's Historic Investment in Pandemic Preparedness and Biodefense in the FY 2023 President's Budget ”, 2022 Around five years, a globalpandemic caused bya coronavirus started. In the course of the pandemic, there have been atleast 6 million deaths and more than 25 million excessdeaths. Thevalue of QALYs lost due to the pandemic in the US alone was around $5trio.,the GDP loss in the US alone in 2020 $2trio..The loss of gross [...] The original text contained 12 footnotes which were omitted from this narration. --- First published: October 19th, 2025 Source: https://www.lesswrong.com/posts/pvEuEN6eMZC2hqG9c/humanity-learned-almost-nothing-from-covid-19 --- Narrated by TYPE III AUDIO.