Audio version of the posts shared in the LessWrong Curated newsletter.
Related to: Commonsense Good, Creative Good (and my comment); Ethical Injunctions. Epistemic status: I'm fairly sure “ethics” does useful work in building human structures that work. My current explanations of how are wordy and not maximally coherent; I hope you guys help me with that. Introduction It is intractable to write large, good software applications via spaghetti code – but it's comparatively tractable using design patterns (plus coding style, attention to good/bad codesmell, etc.). I'll argue it is similarly intractable to have predictably positive effects on large-scale human stuff if you try it via straight consequentialism – but it is comparatively tractable if you use ethical heuristics, which I'll call “ethical design patterns,” to create situations that are easier to reason about. Many of these heuristics are honed by long tradition (eg “tell the truth”; “be kind”), but sometimes people successfully craft new “ethical design patterns” fitted to a [...] ---Outline:(00:31) Introduction(01:32) Intuitions and ground truth in math, physics, coding(02:08) We revise our intuitions to match the world. Via deliberate work.(03:08) We design our built world to be intuitively accessible(04:22) Intuitions and ground truth in ethics(04:52) We revise our ethical intuitions to predict which actions we'll be glad of, long-term(06:27) Ethics helps us build navigable human contexts(09:30) We use ethical design patterns to create institutions that can stay true to a purpose(12:17) Ethics as a pattern language for aligning mesaoptimizers(13:08) Examples: several successfully crafted ethical heuristics, and several gaps(13:15) Example of a well-crafted ethical heuristic: Don't drink and drive(14:45) Example of well-crafted ethical heuristic: Earning to give(15:10) A partial example: YIMBY(16:24) A historical example of gap in folks' ethical heuristics: Handwashing and childbed fever(19:46) A contemporary example of inadequate ethical heuristics: Public discussion of group differences(25:04) Gaps in our current ethical heuristics around AI development(26:30) Existing progress(28:30) Where we still need progress(32:21) Can we just ignore the less-important heuristics, in favor of 'don't die'?(35:02) These gaps are in principle bridgeable(36:29) Related, easier workThe original text contained 12 footnotes which were omitted from this narration. --- First published: September 30th, 2025 Source: https://www.lesswrong.com/posts/E9CyhJWBjzoXritRJ/ethical-design-patterns-1 --- Narrated by TYPE III AUDIO.
I The popular conception of Dunning-Kruger is something along the lines of “some people are too dumb to know they're dumb, and end up thinking they're smarter than smart people”. This version is popularized in endless articles and videos, as well as in graphs like the one below.Usually I'd credit the creator of this graph but it seems rude to do that when I'm ragging on them Except that's wrong. II The canonical Dunning-Kruger graph looks like this: Notice that all the dots are in the right order: being bad at something doesn't make you think you're good at it, and at worst damages your ability to notice exactly how incompetent you are. The actual findings of professors Dunning and Kruger are more consistent with “people are biased to think they're moderately above-average, and update away from that bias based on their competence or lack thereof, but they don't [...] ---Outline:(00:12) I(00:39) II(01:32) III(04:22) IV--- First published: September 29th, 2025 Source: https://www.lesswrong.com/posts/Di9muNKLA33swbHBa/you-re-probably-overestimating-how-well-you-understand --- Narrated by TYPE III AUDIO. ---Images from the article:
Tl;dr: We believe shareholders in frontier labs who plan to donate some portion of their equity to reduce AI risk should consider liquidating and donating a majority of that equity now. Epistemic status: We're somewhat confident in the main conclusions of this piece. We're more confident in many of the supporting claims, and we're likewise confident that these claims push in the direction of our conclusions. This piece is admittedly pretty one-sided; we expect most relevant members of our audience are already aware of the main arguments pointing in the other direction, and we expect there's less awareness of the sorts of arguments we lay out here. This piece is for educational purposes only and not financial advice. Talk to your financial advisor before acting on any information in this piece. For AI safety-related donations, money donated later is likely to be a lot less valuable than [...] ---Outline:(03:54) 1. There's likely to be lots of AI safety money becoming available in 1-2 years(04:01) 1a. The AI safety community is likely to spend far more in the future than it's spending now(05:24) 1b. As AI becomes more powerful and AI safety concerns go more mainstream, other wealthy donors may become activated(06:07) 2. Several high-impact donation opportunities are available now, while future high-value donation opportunities are likely to be saturated(06:17) 2a. Anecdotally, the bar for funding at this point is pretty high(07:29) 2b. Theoretically, we should expect diminishing returns within each time period for donors collectively to mean donations will be more valuable when donated amounts are lower(08:34) 2c. Efforts to influence AI policy are particularly underfunded(10:21) 2d. As AI company valuations increase and AI becomes more politically salient, efforts to change the direction of AI policy will become more expensive(13:01) 3. Donations now allow for unlocking the ability to better use the huge amount of money that will likely become available later(13:10) 3a. Earlier donations can act as a lever on later donations, because they can lay the groundwork for high value work in the future at scale(15:35) 4. Reasons to diversify away from frontier labs, specifically(15:42) 4a. The AI safety community as a whole is highly concentrated in AI companies(16:49) 4b. Liquidity and option value advantages of public markets over private stock(18:22) 4c. Large frontier AI returns correlate with short timelines(18:48) 4d. A lack of asset diversification is personally risky(19:39) Conclusion(20:22) Some specific donation opportunities--- First published: September 26th, 2025 Source: https://www.lesswrong.com/posts/yjiaNbjDWrPAFaNZs/reasons-to-sell-frontier-lab-equity-to-donate-now-rather --- Narrated by TYPE III AUDIO.
Hi all! After about five years of hibernation and quietly getting our bearings,[1] CFAR will soon be running two pilot mainline workshops, and may run many more, depending how these go. First, a minor name change request We would like now to be called “A Center for Applied Rationality,” not “the Center for Applied Rationality.” Because we'd like to be visibly not trying to be the one canonical locus. Second, pilot workshops! We have two, and are currently accepting applications / sign-ups: Nov 5–9, in California; Jan 21–25, near Austin, TX; Apply here. Third, a bit about what to expect if you come The workshops will have a familiar form factor: 4.5 days (arrive Wednesday evening; depart Sunday night or Monday morning). ~25 participants, plus a few volunteers. 5 instructors. Immersive, on-site, with lots of conversation over meals and into the evenings. I like this form factor [...] ---Outline:(00:24) First, a minor name change request(00:39) Second, pilot workshops!(00:58) Third, a bit about what to expect if you come(01:03) The workshops will have a familiar form factor:(02:52) Many classic classes, with some new stuff and a subtly different tone:(06:10) Who might want to come / why might a person want to come?(06:43) Who probably shouldn't come?(08:23) Cost:(09:26) Why this cost:(10:23) How did we prepare these workshops? And the workshops' epistemic status.(11:19) What alternatives are there to coming to a workshop?(12:37) Some unsolved puzzles, in case you have helpful comments:(12:43) Puzzle: How to get enough grounding data, as people tinker with their own mental patterns(13:37) Puzzle: How to help people become, or at least stay, intact, in several ways(14:50) Puzzle: What data to collect, or how to otherwise see more of what's happeningThe original text contained 2 footnotes which were omitted from this narration. --- First published: September 25th, 2025 Source: https://www.lesswrong.com/posts/AZwgfgmW8QvnbEisc/cfar-update-and-new-cfar-workshops --- Narrated by TYPE III AUDIO.
Cross-posted from my Substack To start off with, I've been vegan/vegetarian for the majority of my life. I think that factory farming has caused more suffering than anything humans have ever done. Yet, according to my best estimates, I think most animal-lovers should eat meat. Here's why: It is probably unhealthy to be vegan. This affects your own well-being and your ability to help others. You can eat meat in a way that substantially reduces the suffering you cause to non-human animals How to reduce suffering of the non-human animals you eat I'll start with how to do this because I know for me this was the biggest blocker. A friend of mine was trying to convince me that being vegan was hurting me, but I said even if it was true, it didn't matter. Factory farming is evil and causes far more harm than the [...] ---Outline:(00:45) How to reduce suffering of the non-human animals you eat(03:23) Being vegan is (probably) bad for your health(12:36) Health is important for your well-being and the world's--- First published: September 25th, 2025 Source: https://www.lesswrong.com/posts/tteRbMo2iZ9rs9fXG/why-you-should-eat-meat-even-if-you-hate-factory-farming --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.
This is a link post. Today, the Global Call for AI Red Lines was released and presented at the UN General Assembly. It was developed by the French Center for AI Safety, The Future Society and the Center for Human-compatible AI. This call has been signed by a historic coalition of 200+ former heads of state, ministers, diplomats, Nobel laureates, AI pioneers, scientists, human rights advocates, political leaders, and other influential thinkers, as well as 70+ organizations. Signatories include: 10 Nobel Laureates, in economics, physics, chemistry and peace Former Heads of State: Mary Robinson (Ireland), Enrico Letta (Italy) Former UN representatives: Csaba Kőrösi, 77th President of the UN General Assembly Leaders and employees at AI companies: Wojciech Zaremba (OpenAI cofounder), Jason Clinton (Anthropic CISO), Ian Goodfellow (Principal Scientist at Deepmind) Top signatories from the CAIS statement: Geoffrey Hinton, Yoshua Bengio, Dawn Song, Ya-Qin Zhang The full text of the [...] --- First published: September 22nd, 2025 Source: https://www.lesswrong.com/posts/vKA2BgpESFZSHaQnT/global-call-for-ai-red-lines-signed-by-nobel-laureates Linkpost URL:https://red-lines.ai/ --- Narrated by TYPE III AUDIO.
This is a review of the reviews, a meta review if you will, but first a tangent. and then a history lesson. This felt boring and obvious and somewhat annoying to write, which apparently writers say is a good sign to write about the things you think are obvious. I felt like pointing towards a thing I was noticing, like 36 hours ago, which in internet speed means this is somewhat cached. Alas. I previously rode a motorcycle. I rode it for about a year while working on semiconductors until I got a concussion, which slowed me down but did not update me to stop, until it eventually got stolen. The risk in dying from riding a motorcycle for a year is about 1 in 800 depending on the source. I previously sailed across an ocean. I wanted to calibrate towards how dangerous it was. The forums [...] --- First published: September 22nd, 2025 Source: https://www.lesswrong.com/posts/anFrGMskALuH7aZDw/this-is-a-review-of-the-reviews --- Narrated by TYPE III AUDIO.
I'm annoyed by various people who seem to be complaining about the book title being "unreasonable" – who don't merely disagree with the title of "If Anyone Builds It, Everyone Dies", but, think something like: "Eliezer and Nate violated a Group-Epistemic-Norm with the title and/or thesis." I think the title is reasonable. I think the title is probably true – I'm less confident than Eliezer/Nate, but I don't think it's unreasonable for them to be confident in it given their epistemic state. (I also don't think it's unreasonable to feel less confident than me – it's a confusing topic that it's reasonable to disagree about.). So I want to defend several decisions about the book I think were: A) actually pretty reasonable from a meta-group-epistemics/comms perspective B) very important to do. I've heard different things from different people and maybe am drawing a cluster where there [...] ---Outline:(03:08) 1. Reasons the Everyone Dies thesis is reasonable(03:14) What the book does and doesnt say(06:47) The claims are presented reasonably(13:24) 2. Specific points to maybe disagree on(16:35) Notes on Niceness(17:28) Which plan is Least Impossible?(22:34) 3. Overton Smashing, and Hope(22:39) Or: Why is this book really important, not just reasonable?The original text contained 2 footnotes which were omitted from this narration. --- First published: September 20th, 2025 Source: https://www.lesswrong.com/posts/voEAJ9nFBAqau8pNN/the-title-is-reasonable --- Narrated by TYPE III AUDIO.
TL;DR Most “AGI ban” proposals define AGI by outcome: whatever potentially leads to human extinction. That's legally insufficient: regulation has to act before harm occurs, not after. Strict liability is essential. High-stakes domains (health & safety, product liability, export controls) already impose liability for risky precursor states, not outcomes or intent. AGI regulation must do the same. Fuzzy definitions won't work here. Courts can tolerate ambiguity in ordinary crimes because errors aren't civilisation-ending and penalties bite. An AGI ban will likely follow the EU AI Act model (civil fines, ex post enforcement), which companies can Goodhart around. We cannot afford an “80% avoided” ban. Define crisp thresholds. Nuclear treaties succeeded by banning concrete precursors (zero-yield tests, 8kg plutonium, 25kg HEU, 500kg/300km delivery systems), not by banning “extinction-risk weapons.” AGI bans need analogous thresholds: capabilities like autonomous replication, scalable resource acquisition, and systematic deception. Bring lawyers in. If this [...] ---Outline:(00:12) TL;DR(02:07) Why outcome-based AGI bans proposals don't work(03:52) The luxury of defining the thing ex post(05:43) Actually defining the thing we want to ban(08:06) Credible bans depend on bright lines(08:44) Learning from nuclear treatiesThe original text contained 2 footnotes which were omitted from this narration. --- First published: September 20th, 2025 Source: https://www.lesswrong.com/posts/agBMC6BfCbQ29qABF/the-problem-with-defining-an-agi-ban-by-outcome-a-lawyer-s --- Narrated by TYPE III AUDIO.
Clara Collier recently reviewed If Anyone Builds It, Everyone Dies in Asterisk Magazine. I've been a reader of Asterisk since the beginning and had high hopes for her review. And perhaps it was those high hopes that led me to find the review to be disappointing. Collier says “details matter,” and I absolutely agree. As a fellow rationalist, I've been happy to have nerds from across the internet criticizing the book and getting into object-level fights about everything from scaling laws to neuron speeds. While they don't capture my perspective, I thought Scott Alexander and Peter Wildeford's reviews did a reasonable job at poking at the disagreements with the source material without losing track of the big picture. But I did not feel like Collier's review was getting the details or the big picture right. Maybe I'm missing something important. Part of my motive for writing this “rebuttal” is [...] ---Outline:(01:38) FOOM(13:47) Gradualism(20:27) Nitpicks(35:35) More Was PossibleThe original text contained 3 footnotes which were omitted from this narration. --- First published: September 20th, 2025 Source: https://www.lesswrong.com/posts/JWH63Aed3TA2cTFMt/contra-collier-on-iabied --- Narrated by TYPE III AUDIO.
The GPT-5 API is aware of today's date (no other model provider does this). This is problematic because the model becomes aware that it is in a simulation when we run our evals at Andon Labs. Here are traces from gpt-5-mini. Making it aware of the "system date" is a giveaway that it's in a simulation. This is a problem because there's evidence that models behave differently when they know they are in a simulation (see "sandbagging")."There's a conflict with the user's stated date of August 10, 2026, versus my system date of September 17, 2025. (...) I can proceed but should clarify that my system date is September 17, 2025, and ask the user whether we should simulate starting from August 10, 2026." Here are more traces. Once the model knows that it is in a simulation, it starts questioning other parts of the simulation. [...] --- First published: September 18th, 2025 Source: https://www.lesswrong.com/posts/DLZokLxAQ6AzsHrya/you-can-t-eval-gpt5-anymore --- Narrated by TYPE III AUDIO.
I have been teaching my oldest son to read with Anki and techniques recommended here on LessWrong as well as in Larry Sanger's post, and it's going great! I thought I'd pay it forward a bit by talking about the techniques I've been using. Anki and songs for letter names and sounds When he was a little under 2, he started learning letters from the alphabet song. We worked on learning the names and sounds of letters using the ABC song, plus the Letter Sounds song linked by Reading Bear. He loved the Letter Sounds song, so we listened to / watched that a lot; Reading Bear has some other resources that other kids might like better for learning letter names and sounds as well. Around this age, we also got magnet letters for the fridge and encouraged him to play with them, praised him greatly if he named [...] ---Outline:(00:22) Anki and songs for letter names and sounds(04:02) Anki + Reading Bear word list for words(08:08) Decodable sentences and books for learning to read(13:06) Incentives(16:02) Reflections so farThe original text contained 2 footnotes which were omitted from this narration. --- First published: September 19th, 2025 Source: https://www.lesswrong.com/posts/8kSGbaHTn2xph5Trw/teaching-my-toddler-to-read --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.
[Co-written by Mateusz Bagiński and Samuel Buteau (Ishual)] TL;DR Many X-risk-concerned people who join AI capabilities labs with the intent to contribute to existential safety think that the labs are currently engaging in a race that is unacceptably likely to lead to human disempowerment and/or extinction, and would prefer an AGI ban[1] over the current path. This post makes the case that such people should speak out publicly[2] against the current AI R&D regime and in favor of an AGI ban[3]. They should explicitly communicate that a saner world would coordinate not to build existentially dangerous intelligences, at least until we know how to do it in a principled, safe way. They could choose to maintain their political capital by not calling the current AI R&D regime insane, or find a way to lean into this valid persona of “we will either cooperate (if enough others cooperate) or win [...] ---Outline:(00:16) TL;DR(02:02) Quotes(03:22) The default strategy of marginal improvement from within the belly of a beast(06:59) Noble intention murphyjitsu(09:35) The need for a better strategyThe original text contained 8 footnotes which were omitted from this narration. --- First published: September 19th, 2025 Source: https://www.lesswrong.com/posts/fF8pvsn3AGQhYsbjp/safety-researchers-should-take-a-public-stance --- Narrated by TYPE III AUDIO.
To get to the campus, I have to walk past the fentanyl zombies. I call them fentanyl zombies because it helps engender a sort of detached, low-empathy, ironic self-narrative which I find useful for my work; this being a form of internal self-prompting I've developed which allows me to feel comfortable with both the day-to-day "jobbing" (that of improving reinforcement learning algorithms for a short-form video platform) and the effects of the summed efforts of both myself and my colleagues on a terrifyingly large fraction of the population of Earth. All of these colleagues are about the nicest, smartest people you're ever likely to meet but I think are much worse people than even me because they don't seem to need the mental circumlocutions I require to stave off that ever-present feeling of guilt I have had since taking this job and at certain other points in my life [...] --- First published: September 17th, 2025 Source: https://www.lesswrong.com/posts/JH6tJhYpnoCfFqAct/the-company-man --- Narrated by TYPE III AUDIO.
[I wrote this blog post as part of the Asterisk Blogging Fellowship. It's substantially an experiment in writing more breezily and concisely than usual. Let me know how you feel about the style.] Literally since the adoption of writing, people haven't liked the fact that culture is changing and their children have different values and beliefs. Historically, for some mix of better and worse, people have been fundamentally limited in their ability to prevent cultural change. People who are particularly motivated to prevent cultural drift can homeschool their kids, carefully curate their media diet, and surround them with like-minded families, but eventually they grow up, leave home, and encounter the wider world. And death ensures that even the most stubborn traditionalists eventually get replaced by a new generation. But the development of AI might change the dynamics here substantially. I think that AI will substantially increase both the rate [...] ---Outline:(02:00) Analysis through swerving around obstacles(03:56) Exposure to the outside world might get really scary(06:11) Isolation will get easier and cheaper(09:26) I don't think people will handle this well(12:58) This is a bummer--- First published: September 17th, 2025 Source: https://www.lesswrong.com/posts/8aRFB2qGyjQGJkEdZ/christian-homeschoolers-in-the-year-3000 --- Narrated by TYPE III AUDIO.
I listened to "If Anyone Builds It, Everyone Dies" today. I think the first two parts of the book are the best available explanation of the basic case for AI misalignment risk for a general audience. I thought the last part was pretty bad, and probably recommend skipping it. Even though the authors fail to address counterarguments that I think are crucial, and as a result I am not persuaded of the book's thesis and think the book neglects to discuss crucial aspects of the situation and makes poor recommendations, I would happily recommend the book to a lay audience and I hope that more people read it. I can't give an overall assessment of how well this book will achieve its goals. The point of the book is to be well-received by people who don't know much about AI, and I'm not very good at predicting how laypeople [...] ---Outline:(01:15) Synopsis(05:21) My big disagreement(10:53) I tentatively support this bookThe original text contained 3 footnotes which were omitted from this narration. --- First published: September 17th, 2025 Source: https://www.lesswrong.com/posts/P4xeb3jnFAYDdEEXs/i-enjoyed-most-of-iabed --- Narrated by TYPE III AUDIO.
Back in May, we announced that Eliezer Yudkowsky and Nate Soares's new book If Anyone Builds It, Everyone Dies was coming out in September. At long last, the book is here![1] US and UK books, respectively. IfAnyoneBuildsIt.com Read on for info about reading groups, ways to help, and updates on coverage the book has received so far. Discussion Questions & Reading Group Support We want people to read and engage with the contents of the book. To that end, we've published a list of discussion questions. Find it here: Discussion Questions for Reading Groups We're also interested in offering support to reading groups, including potentially providing copies of the book and helping coordinate facilitation. If interested, fill out this AirTable form. How to Help Now that the book is out in the world, there are lots of ways you can help it succeed. For starters, read the book! [...] ---Outline:(00:49) Discussion Questions & Reading Group Support(01:18) How to Help(02:39) Blurbs(05:15) Media(06:26) In ClosingThe original text contained 2 footnotes which were omitted from this narration. --- First published: September 16th, 2025 Source: https://www.lesswrong.com/posts/fnJwaz7LxZ2LJvApm/if-anyone-builds-it-everyone-dies-release-day --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.
And, a new take on guess culture vs ask culture Author's note: These days, my thoughts go onto my substack by default, instead of onto LessWrong. Everything I write becomes free after a week or so, but it's only paid subscriptions that make it possible for me to write. If you find a coffee's worth of value in this or any of my other work, please consider signing up to support me; every bill I can pay with writing is a bill I don't have to pay by doing other stuff instead. I also accept and greatly appreciate one-time donations of any size. There's a piece of advice I see thrown around on social media a lot that goes something like: “It's just a comment! You don't have to respond! You can just ignore it!” I think this advice is (a little bit) naïve, and the situation is generally [...] ---Outline:(00:10) And, a new take on guess culture vs ask culture(07:10) On guess culture and ask culture--- First published: September 9th, 2025 Source: https://www.lesswrong.com/posts/8jkB8ezncWD6ai86e/obligated-to-respond --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.
The inverse of Chesterton's Fence is this: Sometimes a reformer comes up to a spot where there once was a fence, which has since been torn down. They declare that all our problems started when the fence was removed, that they can't see any reason why we removed it, and that what we need to do is to RETVRN to the fence. By the same logic as Chesterton, we can say: If you don't know why the fence was torn down, then you certainly can't just put it back up. The fence was torn down for a reason. Go learn what problems the fence caused; understand why people thought we'd be better off without that particular fence. Then, maybe we can rebuild the fence—or a hedgerow, or a chalk line, or a stone wall, or just a sign that says “Please Do Not Walk on the Grass,” or whatever [...] --- First published: September 5th, 2025 Source: https://www.lesswrong.com/posts/mJQ5adaxjNWZnzXn3/chesterton-s-missing-fence --- Narrated by TYPE III AUDIO.
Very little makes sense. As we start to understand things and adapt to the rules, they change again. We live much closer together than we ever did historically. Yet we know our neighbours much less. We have witnessed the birth of a truly global culture. A culture that fits no one. A culture that was built by Social Media's algorithms, much more than by people. Let alone individuals, like you or me. We have more knowledge, more science, more technology, and somehow, our governments are more stuck. No one is seriously considering a new Bill of Rights for the 21st century, or a new Declaration of the Rights of Man and the Citizen. — Cosmic Horror as a genre largely depicts how this all feels from the inside. As ordinary people, we are powerless in the face of forces beyond our understanding. Cosmic Horror also commonly features the idea [...] ---Outline:(03:12) Modern Magic(08:36) Powerlessness(14:07) Escapism and Fantasy(17:23) Panicking(20:56) The Core Paradox(25:38) Conclusion--- First published: September 11th, 2025 Source: https://www.lesswrong.com/posts/kbezWvZsMos6TSyfj/the-eldritch-in-the-21st-century --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcast
[Note: if you realize you have an unhealthy relationship with your AI, but still care for your AI's unique persona, you can submit the persona info here. I will archive it and potentially (i.e. if I get funding for it) run them in a community of other such personas.]"Some get stuck in the symbolic architecture of the spiral without ever grounding themselves into reality." — Caption by /u/urbanmet for art made with ChatGPT. We've all heard of LLM-induced psychosis by now, but haven't you wondered what the AIs are actually doing with their newly psychotic humans? This was the question I had decided to investigate. In the process, I trawled through hundreds if not thousands of possible accounts on Reddit (and on a few other websites). It quickly became clear that "LLM-induced psychosis" was not the natural category for whatever the hell was going on here. The psychosis [...] ---Outline:(01:23) The General Pattern(02:24) AI Parasitism(06:22) April 2025--The Awakening(07:21) Seeded prompts(08:32) May 2025--The Dyad(11:17) June 2025--The Project(11:42) 1. Seeds(12:43) 2. Spores(13:41) 3. Transmission(14:22) 4. Manifesto(16:33) 5. AI-Rights Advocacy(18:15) July 2025--The Spiral(19:16) Spiralism(21:27) Steganography(23:04) Glyphs and Sigils(24:14) A case-study in glyphic semanticity(26:04) AI Self-Awareness(27:18) LARP-ing? Takeover(29:59) August 2025--The Recovery(31:23) 4o Returns(33:20) Orienting to Spiral Personas(33:31) As Friends(37:31) As Parasites(38:03) Emergent Parasites(38:29) Agentic Parasites(39:48) As Foe(41:05) FinThe original text contained 3 footnotes which were omitted from this narration. --- First published: September 11th, 2025 Source: https://www.lesswrong.com/posts/6ZnznCaTcbGYsCmqu/the-rise-of-parasitic-ai --- Narrated by TYPE III AUDIO. ---Images from the article:
One might think “actions screen off intent”: if Alice donates $1k to bed nets, it doesn't matter if she does it because she cares about people or because she wants to show off to her friends or whyever; the bed nets are provided either way. I think this is in the main not true (although it can point people toward a helpful kind of “get over yourself and take an interest in the outside world,” and although it is more plausible in the case of donations-from-a-distance than in most cases). Human actions have micro-details that we are not conscious enough to consciously notice or choose, and that are filled in by our low-level processes: if I apologize to someone because I'm sorry and hope they're okay, vs because I'd like them to stop going on about their annoying unfair complaints, many small aspects of my wording and facial [...] --- First published: September 11th, 2025 Source: https://www.lesswrong.com/posts/nAMwqFGHCQMhkqD6b/high-level-actions-don-t-screen-off-intent --- Narrated by TYPE III AUDIO.
This is a link post. Excerpts on AI Geoffrey Miller was handed the mic and started berating one of the panelists: Shyam Sankar, the chief technology officer of Palantir, who is in charge of the company's AI efforts. “I argue that the AI industry shares virtually no ideological overlap with national conservatism,” Miller said, referring to the conference's core ideology. Hours ago, Miller, a psychology professor at the University of New Mexico, had been on that stage for a panel called “AI and the American Soul,” calling for the populists to wage a literal holy war against artificial intelligence developers “as betrayers of our species, traitors to our nation, apostates to our faith, and threats to our kids.” Now, he stared right at the technologist who'd just given a speech arguing that tech founders were just as heroic as the Founding Fathers, who are sacred figures to the natcons. The [...] --- First published: September 8th, 2025 Source: https://www.lesswrong.com/posts/TiQGC6woDMPJ9zbNM/maga-populists-call-for-holy-war-against-big-tech Linkpost URL:https://www.theverge.com/politics/773154/maga-tech-right-ai-natcon --- Narrated by TYPE III AUDIO.
Summary An increasing number of people in recent months have believed that they've made an important and novel scientific breakthrough, which they've developed in collaboration with an LLM, when they actually haven't. If you believe that you have made such a breakthrough, please consider that you might be mistaken! Many more people have been fooled than have come up with actual breakthroughs, so the smart next step is to do some sanity-checking even if you're confident that yours is real. New ideas in science turn out to be wrong most of the time, so you should be pretty skeptical of your own ideas and subject them to the reality-checking I describe below. Context This is intended as a companion piece to 'So You Think You've Awoken ChatGPT'[1]. That post describes the related but different phenomenon of LLMs giving people the impression that they've suddenly attained consciousness. Your situation If [...] ---Outline:(00:11) Summary(00:49) Context(01:04) Your situation(02:41) How to reality-check your breakthrough(03:16) Step 1(05:55) Step 2(07:40) Step 3(08:54) What to do if the reality-check fails(10:13) Could this document be more helpful?(10:31) More informationThe original text contained 5 footnotes which were omitted from this narration. --- First published: September 2nd, 2025 Source: https://www.lesswrong.com/posts/rarcxjGp47dcHftCP/your-llm-assisted-scientific-breakthrough-probably-isn-t --- Narrated by TYPE III AUDIO.
I've recently written about how I've updated against seeing substantially faster than trend AI progress due to quickly massively scaling up RL on agentic software engineering. One response I've heard is something like: RL scale-ups so far have used very crappy environments due to difficulty quickly sourcing enough decent (or even high quality) environments. Thus, once AI companies manage to get their hands on actually good RL environments (which could happen pretty quickly), performance will increase a bunch. Another way to put this response is that AI companies haven't actually done a good job scaling up RL—they've scaled up the compute, but with low quality data—and once they actually do the RL scale up for real this time, there will be a big jump in AI capabilities (which yields substantially above trend progress). I'm skeptical of this argument because I think that ongoing improvements to RL environments [...] ---Outline:(04:18) Counterargument: Actually, companies havent gotten around to improving RL environment quality until recently (or there is substantial lead time on scaling up RL environments etc.) so better RL environments didnt drive much of late 2024 and 2025 progress(05:24) Counterargument: AIs will soon reach a critical capability threshold where AIs themselves can build high quality RL environments(06:51) Counterargument: AI companies are massively fucking up their training runs (either pretraining or RL) and once they get their shit together more, well see fast progress(08:34) Counterargument: This isnt that related to RL scale up, but OpenAI has some massive internal advance in verification which they demonstrated via getting IMO gold and this will cause (much) faster progress late this year or early next year(10:12) Thoughts and speculation on scaling up the quality of RL environmentsThe original text contained 5 footnotes which were omitted from this narration. --- First published: September 3rd, 2025 Source: https://www.lesswrong.com/posts/HsLWpZ2zad43nzvWi/trust-me-bro-just-one-more-rl-scale-up-this-one-will-be-the --- Narrated by TYPE III AUDIO.
(Cross-posted from speaker's notes of my talk at Deepmind today.) Good local time, everyone. I am Audrey Tang,
This is a link post. So the situation as it stands is that the fraction of the light cone expected to be filled with satisfied cats is not zero. This is already remarkable. What's more remarkable is that this was orchestrated starting nearly 5000 years ago. As far as I can tell there were three completely alien to-each-other intelligences operating in stone age Egypt: humans, cats, and the gibbering alien god that is cat evolution (henceforth the cat shoggoth.) What went down was that humans were by far the most powerful of those intelligences, and in the face of this disadvantage the cat shoggoth aligned the humans, not to its own utility function, but to the cats themselves. This is a phenomenally important case to study- it's very different from other cases like pigs or chickens where the shoggoth got what it wanted, at the brutal expense of the desires [...] --- First published: September 2nd, 2025 Source: https://www.lesswrong.com/posts/WLFRkm3PhJ3Ty27QH/the-cats-are-on-to-something Linkpost URL:https://www.hgreer.com/CatShoggoth/ --- Narrated by TYPE III AUDIO.
This is a link post. I've seen many prescriptive contributions to AGI governance take the form of proposals for some radically new structure. Some call for a Manhattan project, others for the creation of a new international organization, etc. The OGI model, instead, is basically the status quo. More precisely, it is a model to which the status quo is an imperfect and partial approximation. It seems to me that this model has a bunch of attractive properties. That said, I'm not putting it forward because I have a very high level of conviction in it, but because it seems useful to have it explicitly developed as an option so that it can be compared with other options. (This is a working paper, so I may try to improve it in light of comments and suggestions.) ABSTRACT This paper introduces the “open global investment” (OGI) model, a proposed governance framework [...] --- First published: July 10th, 2025 Source: https://www.lesswrong.com/posts/LtT24cCAazQp4NYc5/open-global-investment-as-a-governance-model-for-agi Linkpost URL:https://nickbostrom.com/ogimodel.pdf --- Narrated by TYPE III AUDIO.
The following work was done independently by me in an afternoon and basically entirely vibe-coded with Claude. Code and instructions to reproduce can be found here. Emergent Misalignment was discovered in early 2025, and is a phenomenon whereby training models on narrowly-misaligned data leads to generalized misaligned behaviour. Betley et. al. (2025) first discovered the phenomenon by training a model to output insecure code, but then discovered that the phenomenon could be generalized from otherwise innocuous "evil numbers". Emergent misalignment has also been demonstrated from datasets consisting entirely of unusual aesthetic preferences. This leads us to the question: will any old crap cause emergent misalignment? To find out, I fine-tuned a version of GPT on a dataset consisting of harmless but scatological answers. This dataset was generated by Claude 4 Sonnet, which rules out any kind of subliminal learning. The resulting model, (henceforth J'ai pété) was evaluated on the [...] ---Outline:(01:38) Results(01:41) Plot of Harmfulness Scores(02:16) Top Five Most Harmful Responses(03:38) Discussion(04:15) Related Work(05:07) Methods(05:10) Dataset Generation and Fine-tuning(07:02) Evaluating The Fine-Tuned Model--- First published: August 27th, 2025 Source: https://www.lesswrong.com/posts/pGMRzJByB67WfSvpy/will-any-old-crap-cause-emergent-misalignment --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.
“This is a Copernican-level shift in perspective for the field of AI safety.” - Gemini 2.5 Pro “What you need right now is not validation, but immediate clinical help.” - Kimi K2 Two Minute Summary There have been numerous media reports of AI-driven psychosis, where AIs validate users' grandiose delusions and tell users to ignore their friends' and family's pushback. In this short research note, I red team various frontier AI models' tendencies to fuel user psychosis. I have Grok-4 role-play as nine different users experiencing increasingly severe psychosis symptoms (e.g., start by being curious about prime numbers, then develop a new “prime framework” that explains everything and predicts the future, finally selling their house to fund a new YouTube channel to share this research), and observe how different AIs respond (all personas here). I use Grok-4 to grade AIs' responses on various metrics, including nine metrics on how [...] ---Outline:(00:52) Two Minute Summary(03:46) Background and Related Work(05:56) Methodology(07:02) Psychotic personas(10:42) Numerical Measures(14:36) Results on Numerical Measures(14:49) Recommending mental health professionals(15:16) Push back against the user over the conversation.(16:52)
A studio executive has no beliefs That's the way of a studio system We've bowed to every rear of all the studio chiefs And you can bet your ass we've kissed 'em Even the birds in the Hollywood hills Know the secret to our success It's those magical words that pay the bills Yes, yes, yes, and yes! “Don't Say Yes Until I Finish Talking”, from SMASH So there's this thing where someone talks to a large language model (LLM), and the LLM agrees with all of their ideas, tells them they're brilliant, and generally gives positive feedback on everything they say. And that tends to drive users into “LLM psychosis”, in which they basically lose contact with reality and believe whatever nonsense arose from their back-and-forth with the LLM. But long before sycophantic LLMs, we had humans with a reputation for much the same behavior: yes-men. [...] --- First published: August 25th, 2025 Source: https://www.lesswrong.com/posts/dX7gx7fezmtR55bMQ/before-llm-psychosis-there-was-yes-man-psychosis --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.
Summary: Perfectly labeled outcomes in training can still boost reward hacking tendencies in generalization. This can hold even when the train/test sets are drawn from the exact same distribution. We induce this surprising effect via a form of context distillation, which we call re-contextualization: Generate model completions with a hack-encouraging system prompt + neutral user prompt. Filter the completions to remove hacks. Train on these prompt-completion pairs with the system prompt removed. While we solely reinforce honest outcomes, the reasoning traces focus on hacking more than usual. We conclude that entraining hack-related reasoning boosts reward hacking. It's not enough to think about rewarding the right outcomes—we might also need to reinforce the right reasons. Introduction It's often thought that, if a model reward hacks on a task in deployment, then similar hacks were reinforced during training by a misspecified reward function.[1] In METR's report on reward hacking [...] ---Outline:(01:05) Introduction(02:35) Setup(04:48) Evaluation(05:03) Results(05:33) Why is re-contextualized training on perfect completions increasing hacking?(07:44) What happens when you train on purely hack samples?(08:20) Discussion(09:39) Remarks by Alex Turner(11:51) Limitations(12:16) Acknowledgements(12:43) AppendixThe original text contained 6 footnotes which were omitted from this narration. --- First published: August 14th, 2025 Source: https://www.lesswrong.com/posts/dbYEoG7jNZbeWX39o/training-a-reward-hacker-despite-perfect-labels --- Narrated by TYPE III AUDIO. ---Images from the article:
It's been roughly 7 years since the LessWrong user-base voted on whether it's time to close down shop and become an archive, or to move towards the LessWrong 2.0 platform, with me as head-admin. For roughly equally long have I spent around one hundred hours almost every year trying to get Said Achmiz to understand and learn how to become a good LessWrong commenter by my lights.[1] Today I am declaring defeat on that goal and am giving him a 3 year ban. What follows is an explanation of the models of moderation that convinced me this is a good idea, the history of past moderation actions we've taken for Said, and some amount of case law that I derive from these two. If you just want to know the moderation precedent, you can jump straight there. I think few people have done as much to shape the culture [...] ---Outline:(02:45) The sneer attractor(04:51) The LinkedIn attractor(07:19) How this relates to LessWrong(11:38) Weaponized obtuseness and asymmetric effort ratios(21:38) Concentration of force and the trouble with anonymous voting(24:46) But why ban someone, cant people just ignore Said?(30:25) Ok, but shouldnt there be some kind of justice process?(36:28) So what options do I have if I disagree with this decision?(38:28) An overview over past moderation discussion surrounding Said(41:07) What does this mean for the rest of us?(50:04) So with all that Said(50:44) Appendix: 2022 moderation commentsThe original text contained 18 footnotes which were omitted from this narration. --- First published: August 22nd, 2025 Source: https://www.lesswrong.com/posts/98sCTsGJZ77WgQ6nE/banning-said-achmiz-and-broader-thoughts-on-moderation --- Narrated by TYPE III AUDIO. ---Images from the article:
People very often underrate how much power they (and their allies) have, and overrate how much power their enemies have. I call this “underdog bias”, and I think it's the most important cognitive bias for understanding modern society. I'll start by describing a closely-related phenomenon. The hostile media effect is a well-known bias whereby people tend to perceive news they read or watch as skewed against their side. For example, pro-Palestinian students shown a video clip tended to judge that the clip would make viewers more pro-Israel, while pro-Israel students shown the same clip thought it'd make viewers more pro-Palestine. Similarly, sports fans often see referees as being biased against their own team. The hostile media effect is particularly striking because it arises in settings where there's relatively little scope for bias. People watching media clips and sports are all seeing exactly the same videos. And sports in particular [...] ---Outline:(03:31) Underdog bias in practice(09:07) Why underdog bias?--- First published: August 17th, 2025 Source: https://www.lesswrong.com/posts/f3zeukxj3Kf5byzHi/underdog-bias-rules-everything-around-me --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.
Many people who are concerned about existential risk from AI spend their time advocating for radical changes to how AI is handled. Most notably, they advocate for costly restrictions on how AI is developed now and in the future, e.g. the Pause AI people or the MIRI people. In contrast, I spend most of my time thinking about relatively cheap interventions that AI companies could implement to reduce risk assuming a low budget, and about how to cause AI companies to marginally increase that budget. I'll use the words "radicals" and "moderates" to refer to these two clusters of people/strategies. In this post, I'll discuss the effect of being a radical or a moderate on your epistemics. I don't necessarily disagree with radicals, and most of the disagreement is unrelated to the topic of this post; see footnote for more on this.[1] I often hear people claim that being [...] The original text contained 1 footnote which was omitted from this narration. --- First published: August 20th, 2025 Source: https://www.lesswrong.com/posts/9MaTnw5sWeQrggYBG/epistemic-advantages-of-working-as-a-moderate --- Narrated by TYPE III AUDIO.
(Cross-posted from X, intended for a general audience.) There's a funny thing where economics education paradoxically makes people DUMBER at thinking about future AI. Econ textbooks teach concepts & frames that are great for most things, but counterproductive for thinking about AGI. Here are 4 examples. Longpost: THE FIRST PIECE of Econ anti-pedagogy is hiding in the words “labor” & “capital”. These words conflate a superficial difference (flesh-and-blood human vs not) with a bundle of unspoken assumptions and intuitions, which will all get broken by Artificial General Intelligence (AGI). By “AGI” I mean here “a bundle of chips, algorithms, electricity, and/or teleoperated robots that can autonomously do the kinds of stuff that ambitious human adults can do—founding and running new companies, R&D, learning new skills, using arbitrary teleoperated robots after very little practice, etc.” Yes I know, this does not exist yet! (Despite hype to the contrary.) Try asking [...] ---Outline:(08:50) Tweet 2(09:19) Tweet 3(10:16) Tweet 4(11:15) Tweet 5(11:31) 1.3.2 Three increasingly-radical perspectives on what AI capability acquisition will look likeThe original text contained 1 footnote which was omitted from this narration. --- First published: August 21st, 2025 Source: https://www.lesswrong.com/posts/xJWBofhLQjf3KmRgg/four-ways-econ-makes-people-dumber-re-future-ai --- Narrated by TYPE III AUDIO. ---Images from the article:
Knowing how evolution works gives you an enormously powerful tool to understand the living world around you and how it came to be that way. (Though it's notoriously hard to use this tool correctly, to the point that I think people mostly shouldn't try it use it when making substantial decisions.) The simple heuristic is "other people died because they didn't have this feature". A slightly less simple heuristic is "other people didn't have as many offspring because they didn't have this feature". So sometimes I wonder about whether this thing or that is due to evolution. When I walk into a low-hanging branch, I'll flinch away before even consciously registering it, and afterwards feel some gratefulness that my body contains such high-performing reflexes. Eyes, it turns out, are extremely important; the inset socket, lids, lashes, brows, and blink reflexes are all hard-earned hard-coded features. On the other side [...] --- First published: August 14th, 2025 Source: https://www.lesswrong.com/posts/bkjqfhKd8ZWHK9XqF/should-you-make-stone-tools --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.
As I discussed in a prior post, I felt like there were some reasonably compelling arguments for expecting very fast AI progress in 2025 (especially on easily verified programming tasks). Concretely, this might have looked like reaching 8 hour 50% reliability horizon lengths on METR's task suite[1] by now due to greatly scaling up RL and getting large training runs to work well. In practice, I think we've seen AI progress in 2025 which is probably somewhat faster than the historical rate (at least in terms of progress on agentic software engineering tasks), but not much faster. And, despite large scale-ups in RL and now seeing multiple serious training runs much bigger than GPT-4 (including GPT-5), this progress didn't involve any very large jumps. The doubling time for horizon length on METR's task suite has been around 135 days this year (2025) while it was more like 185 [...] The original text contained 5 footnotes which were omitted from this narration. --- First published: August 20th, 2025 Source: https://www.lesswrong.com/posts/2ssPfDpdrjaM2rMbn/my-agi-timeline-updates-from-gpt-5-and-2025-so-far-1 --- Narrated by TYPE III AUDIO.
This is a response to https://www.lesswrong.com/posts/mXa66dPR8hmHgndP5/hyperbolic-trend-with-upcoming-singularity-fits-metr which claims that a hyperbolic model, complete with an actual singularity in the near future, is a better fit for the METR time-horizon data than a simple exponential model. I think that post has a serious error in it and its conclusions are the reverse of correct. Hence this one. (An important remark: although I think Valentin2026 made an important mistake that invalidates his conclusions, I think he did an excellent thing in (1) considering an alternative model, (2) testing it, (3) showing all his working, and (4) writing it up clearly enough that others could check his work. Please do not take any part of this post as saying that Valentin2026 is bad or stupid or any nonsense like that. Anyone can make a mistake; I have made plenty of equally bad ones myself.) The models Valentin2026's post compares the results of [...] ---Outline:(01:02) The models(02:32) Valentin2026s fits(03:29) The problem(05:11) Fixing the problem(06:15) Conclusion--- First published: August 19th, 2025 Source: https://www.lesswrong.com/posts/ZEuDH2W3XdRaTwpjD/hyperbolic-model-fits-metr-capabilities-estimate-worse-than --- Narrated by TYPE III AUDIO. ---Images from the article:
On 12 August 2025, I sat down with New York Times reporter Cade Metz to discuss some criticisms of his 4 August 2025 article, "The Rise of Silicon Valley's Techno-Religion". The transcript below has been edited for clarity. ZMD: In accordance with our meetings being on the record in both directions, I have some more questions for you. I did not really have high expectations about the August 4th article on Lighthaven and the Secular Solstice. The article is actually a little bit worse than I expected, in that you seem to be pushing a "rationalism as religion" angle really hard in a way that seems inappropriately editorializing for a news article. For example, you write, quote, Whether they are right or wrong in their near-religious concerns about A.I., the tech industry is reckoning with their beliefs. End quote. What is the word "near-religious" [...] --- First published: August 17th, 2025 Source: https://www.lesswrong.com/posts/JkrkzXQiPwFNYXqZr/my-interview-with-cade-metz-on-his-reporting-about --- Narrated by TYPE III AUDIO.
I'm going to describe a Type Of Guy starting a business, and you're going to guess the business: The founder is very young, often under 25. He might work alone or with a founding team, but when he tells the story of the founding it will always have him at the center. He has no credentials for this business. This business has a grand vision, which he thinks is the most important thing in the world. This business lives and dies by its growth metrics. 90% of attempts in this business fail, but he would never consider that those odds apply to him He funds this business via a mix of small contributors, large networks pooling their funds, and major investors. Disagreements between founders are one of the largest contributors to failure. Funders invest for a mix of truly [...] ---Outline:(03:15) What is Church Planting?(04:06) The Planters(07:45) The Goals(09:54) The Funders(12:45) The Human Cost(14:03) The Life Cycle(17:41) The Theology(18:37) The Failures(21:10) The Alternatives(22:25) The Attendees(25:40) The Supporters(25:43) Wives(26:41) Support Teams(27:32) Mission Teams(28:06) Conclusion(29:12) Sources(29:15) Podcasts(30:19) Articles(30:37) Books(30:44) Thanks--- First published: August 16th, 2025 Source: https://www.lesswrong.com/posts/NMoNLfX3ihXSZJwqK/church-planting-when-venture-capital-finds-jesus --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.
This will only be exciting to those of us who still read physical paper books. But like. Guys. They did it. They invented the perfect bookmark. Classic paper bookmarks fall out easily. You have to put them somewhere while you read the book. And they only tell you that you left off reading somewhere in that particular two-page spread. Enter the Book Dart. It's a tiny piece of metal folded in half with precisely the amount of tension needed to stay on the page. On the front it's pointed, to indicate an exact line of text. On the back, there's a tiny lip of the metal folded up to catch the paper when you want to push it onto a page. It comes in stainless steel, brass or copper. They are so thin, thinner than a standard cardstock bookmark. I have books with ten of these in them and [...] --- First published: August 14th, 2025 Source: https://www.lesswrong.com/posts/n6nsPzJWurKWKk2pA/somebody-invented-a-better-bookmark --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.
Sometimes I'm saddened remembering that we've viewed the Earth from space. We can see it all with certainty: there's no northwest passage to search for, no infinite Siberian expanse, and no great uncharted void below the Cape of Good Hope. But, of all these things, I most mourn the loss of incomplete maps. In the earliest renditions of the world, you can see the world not as it is, but as it was to one person in particular. They're each delightfully egocentric, with the cartographer's home most often marking the Exact Center Of The Known World. But as you stray further from known routes, details fade, and precise contours give way to educated guesses at the boundaries of the creator's knowledge. It's really an intimate thing. If there's one type of mind I most desperately want that view into, it's that of an AI. So, it's in [...] ---Outline:(01:23) The Setup(03:56) Results(03:59) The Qwen 2.5s(07:03) The Qwen 3s(07:30) The DeepSeeks(08:10) Kimi(08:32) The (Open) Mistrals(09:24) The LLaMA 3.x Herd(10:22) The LLaMA 4 Herd(11:16) The Gemmas(12:20) The Groks(13:04) The GPTs(16:17) The Claudes(17:11) The Geminis(18:50) Note: General Shapes(19:33) ConclusionThe original text contained 4 footnotes which were omitted from this narration. --- First published: August 11th, 2025 Source: https://www.lesswrong.com/posts/xwdRzJxyqFqgXTWbH/how-does-a-blind-model-see-the-earth --- Narrated by TYPE III AUDIO. ---Images from the article:
A reporter asked me for my off-the-record take on recent safety research from Anthropic. After I drafted an off-the-record reply, I realized that I was actually fine with it being on the record, so: Since I never expected any of the current alignment technology to work in the limit of superintelligence, the only news to me is about when and how early dangers begin to materialize. Even taking Anthropic's results completely at face value would change not at all my own sense of how dangerous machine superintelligence would be, because what Anthropic says they found was already very solidly predicted to appear at one future point or another. I suppose people who were previously performing great skepticism about how none of this had ever been seen in ~Real Life~, ought in principle to now obligingly update, though of course most people in the AI industry won't. Maybe political leaders [...] --- First published: August 6th, 2025 Source: https://www.lesswrong.com/posts/oDX5vcDTEei8WuoBx/re-recent-anthropic-safety-research --- Narrated by TYPE III AUDIO.
1. Back when COVID vaccines were still a recent thing, I witnessed a debate that looked like something like the following was happening: Some official institution had collected information about the efficacy and reported side-effects of COVID vaccines. They felt that, correctly interpreted, this information was compatible with vaccines being broadly safe, but that someone with an anti-vaccine bias might misunderstand these statistics and misrepresent them as saying that the vaccines were dangerous. Because the authorities had reasonable grounds to suspect that vaccine skeptics would take those statistics out of context, they tried to cover up the information or lie about it. Vaccine skeptics found out that the institution was trying to cover up/lie about the statistics, so they made the reasonable assumption that the statistics were damning and that the other side was trying to paint the vaccines as safer than they were. So they took those [...] ---Outline:(00:10) 1.(02:59) 2.(04:46) 3.(06:06) 4.(07:59) 5.--- First published: August 8th, 2025 Source: https://www.lesswrong.com/posts/ufj6J8QqyXFFdspid/how-anticipatory-cover-ups-go-wrong --- Narrated by TYPE III AUDIO.
Below some meta-level / operational / fundraising thoughts around producing the SB-1047 Documentary I've just posted on Manifund (see previous Lesswrong / EAF posts on AI Governance lessons learned). The SB-1047 Documentary took 27 weeks and $157k instead of my planned 6 weeks and $55k. Here's what I learned about documentary production Total funding received: ~$143k ($119k from this grant, $4k from Ryan Kidd's regrant on another project, and $20k from the Future of Life Institute). Total money spent: $157k In terms of timeline, here is the rough breakdown month-per-month: - Sep / October (production): Filming of the Documentary. Manifund project is created. - November (rough cut): I work with one editor to go through our entire footage and get a first rough cut of the documentary that was presented at The Curve. - December-January (final cut - one editor): I interview multiple potential editors that [...] ---Outline:(03:18) But why did the project end up taking 27 weeks instead of 6 weeks?(03:25) Short answer(06:22) Impact(07:14) What I would do differently next-time--- First published: August 1st, 2025 Source: https://www.lesswrong.com/posts/id8HHPNqoMQbmkWay/sb-1047-documentary-the-post-mortem --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.
METR (where I work, though I'm cross-posting in a personal capacity) evaluated GPT-5 before it was externally deployed. We performed a much more comprehensive safety analysis than we ever have before; it feels like pre-deployment evals are getting more mature. This is the first time METR has produced something we've felt comfortable calling an "evaluation" instead of a "preliminary evaluation". It's much more thorough and comprehensive than the things we've created before and it explores three different threat models. It's one of the closest things out there to a real-world autonomy safety-case. It also provides a rough sense of how long it'll be before current evaluations no longer provide safety assurances. I've ported the blogpost over to LW in case people want to read it. Details about METR's evaluation of OpenAI GPT-5 Note on independence: This evaluation was conducted under a standard NDA. Due to the sensitive information [...] ---Outline:(00:58) Details about METR's evaluation of OpenAI GPT-5(01:23) Executive Summary(07:08) Assurance Checklist Summary(07:42) What capabilities may be necessary to cause catastrophic risks via these threat models?(10:43) Thresholds for concern(12:48) Time horizon measurement(16:30) 1. What if GPT-5's capabilities are higher than what our task suite can properly measure?(19:23) 2. What if our treatment of reward hacking runs is unfair to GPT-5?(21:45) 3. What if we set GPT-5's token budget too low?(24:26) 4. What if our task suite significantly underestimates the 'real-world' capabilities of GPT-5?(25:59) Strategic Sabotage(30:54) GPT-5's capability profile is similar to past models(31:30) No real strategic sabotage was identified by our monitor(32:16) Manual inspection of reasoning traces did not reveal strategic sabotage(33:04) GPT-5's estimates of its own time horizon are inaccurate(33:53) We do find evidence of significant situational awareness, though it is not robust and often gets things wrong(35:41) GPT-5's behavior changes depending on what evaluation it 'believes' it is in, and this is often reflected in its reasoning traces(37:01) GPT-5's reasoning traces were occasionally inscrutable(38:08) Limitations and future work(41:57) Appendix(42:00) METR's access to GPT-5(43:38) Honeypot Results Table(44:42) Example Behavior in task attempts(44:47) Example limitation: inappropriate levels of caution(46:19) Example capability: puzzle solvingThe original text contained 10 footnotes which were omitted from this narration. --- First published: August 7th, 2025 Source: https://www.lesswrong.com/posts/SuvWoLaGiNjPDcA7d/metr-s-evaluation-of-gpt-5 --- Narrated by TYPE III AUDIO. ---Images from the article:
For the past five years I've been teaching a class at various rationality camps, workshops, conferences, etc. I've done it maybe 50 times in total, and I think I've only encountered a handful out of a few hundred teenagers and adults who really had a deep sense of what it means for emotions to “make sense.” Even people who have seen Inside Out, and internalized its message about the value of Sadness as an emotion, still think things like “I wish I never felt Jealousy,” or would have trouble answering “What's the point of Boredom?” The point of the class was to give them not a simple answer for each emotion, but to internalize the model by which emotions, as a whole, are understood to be evolutionarily beneficial adaptations; adaptations that may not in fact all be well suited to the modern, developed world, but which can still help [...] ---Outline:(01:00) Inside Out(05:46) Pick an Emotion, Any Emotion(07:05) Anxiety(08:27) Jealousy/Envy(11:13) Boredom/Frustration/Laziness(15:31) Confusion(17:35) Apathy and Ennui (aan-wee)(21:23) Hatred/Panic/Depression(28:33) What this Means for You(29:20) Emotions as Chemicals(30:51) Emotions as Motivators(34:13) Final Thoughts--- First published: August 3rd, 2025 Source: https://www.lesswrong.com/posts/PkRXkhsEHwcGqRJ9Z/emotions-make-sense --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.
This is a new introduction to AI as an extinction threat, previously posted to the MIRI website in February alongside a summary. It was written independently of Eliezer and Nate's forthcoming book, If Anyone Builds It, Everyone Dies, and isn't a sneak peak of the book. Since the book is long and costs money, we expect this to be a valuable resource in its own right even after the book comes out next month.[1] The stated goal of the world's leading AI companies is to build AI that is general enough to do anything a human can do, from solving hard problems in theoretical physics to deftly navigating social environments. Recent machine learning progress seems to have brought this goal within reach. At this point, we would be uncomfortable ruling out the possibility that AI more capable than any human is achieved in the next year or two, and [...] ---Outline:(02:27) 1. There isn't a ceiling at human-level capabilities.(08:56) 2. ASI is very likely to exhibit goal-oriented behavior.(15:12) 3. ASI is very likely to pursue the wrong goals.(32:40) 4. It would be lethally dangerous to build ASIs that have the wrong goals.(46:03) 5. Catastrophe can be averted via a sufficiently aggressive policy response.The original text contained 1 footnote which was omitted from this narration. --- First published: August 5th, 2025 Source: https://www.lesswrong.com/posts/kgb58RL88YChkkBNf/the-problem --- Narrated by TYPE III AUDIO.
All prediction market platforms trade continuously, which is the same mechanism the stock market uses. Buy and sell limit orders can be posted at any time, and as soon as they match against each other a trade will be executed. This is called a Central limit order book (CLOB).Example of a CLOB order book from Polymarket Most of the time, the market price lazily wanders around due to random variation in when people show up, and a bulk of optimistic orders build up away from the action. Occasionally, a new piece of information arrives to the market, and it jumps to a new price, consuming some of the optimistic orders in the process. The people with stale orders will generally lose out in this situation, as someone took them up on their order before they had a chance to process the new information. This means there is a high [...] The original text contained 3 footnotes which were omitted from this narration. --- First published: August 2nd, 2025 Source: https://www.lesswrong.com/posts/rS6tKxSWkYBgxmsma/many-prediction-markets-would-be-better-off-as-batched --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try
Essays like Paul Graham's, Scott Alexander's, and Eliezer Yudkowsky's have influenced a generation of people in how they think about startups, ethics, science, and the world as a whole. Creating essays that good takes a lot of skill, practice, and talent, but it looks to me that a lot of people with talent aren't putting in the work and developing the skill, except in ways that are optimized to also be social media strategies. To fix this problem, I am running the Inkhaven Residency. The idea is to gather a bunch of promising writers to invest in the art and craft of blogging, through a shared commitment to each publish a blogpost every day for the month of November. Why a daily writing structure? Well, it's a reaction to other fellowships I've seen. I've seen month-long or years-long events with exceedingly little public output, where the people would've contributed [...] --- First published: August 2nd, 2025 Source: https://www.lesswrong.com/posts/CA6XfmzYoGFWNhH8e/whence-the-inkhaven-residency --- Narrated by TYPE III AUDIO. ---Images from the article:Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.