Joe Carlsmith Audio

Audio versions of essays by Joe Carlsmith. Philosophy, futurism, and other topics. Text versions at joecarlsmith.com.

Joe Carlsmith

Latest episode: Nov 12, 2025
New episodes: monthly
Average duration: 54m
Episodes: 69

Latest episodes from Joe Carlsmith Audio

How human-like do safe AI motivations need to be?

Nov 12, 2025 · 83:32

AIs with alien motivations can still follow instructions safely on the inputs that matter. Text version here: https://joecarlsmith.com/2025/11/12/how-human-like-do-safe-ai-motivations-need-to-be/

ai safe motivations

Leaving Open Philanthropy, going to Anthropic

Nov 3, 2025 · 32:09

On a career move, and on AI-safety-focused people working at AI companies. Text version here: https://joecarlsmith.com/2025/11/03/leaving-open-philanthropy-going-to-anthropic/

ai leaving anthropic open philanthropy

Controlling the options AIs can pursue

Sep 29, 2025 · 55:34

On boxing AIs, and on making deals with them. Text version here: https://joecarlsmith.com/2025/09/29/controlling-the-options-ais-can-pursue

ai options controlling pursue

Giving AIs safe motivations

Aug 18, 2025 · 83:25

A four-step picture. Text version here: https://joecarlsmith.com/2025/08/18/giving-ais-safe-motivations

giving safe motivations

The stakes of AI moral status

May 21, 2025 · 37:29

On seeing and not seeing souls. Text version here: https://joecarlsmith.com/2025/05/21/the-stakes-of-ai-moral-status/

status moral stakes

Can we safely automate alignment research?

Apr 30, 2025 · 89:38

It's really important; we've got a real shot; there are a ton of ways to fail. Text version here: https://joecarlsmith.com/2025/04/30/can-we-safely-automate-alignment-research/. There's also a video and transcript of a talk I gave on this topic here: https://joecarlsmith.com/2025/04/30/video-and-transcript-of-talk-on-automating-alignment-research/

research alignment safely automate

AI for AI safety

Mar 14, 2025 · 27:51

We should try extremely hard to use AI labor to help address the alignment problem. Text version here: https://joecarlsmith.com/2025/03/14/ai-for-ai-safety

ai safety

Paths and waystations in AI safety

Mar 11, 2025 · 18:07

On the structure of the path to safe superintelligence, and some possible milestones along the way. Text version here: https://joecarlsmith.substack.com/p/paths-and-waystations-in-ai-safety

safety paths

When should we worry about AI power-seeking?

Feb 19, 2025 · 46:54

Examining the conditions required for rogue AI behavior. Text version here: https://joecarlsmith.substack.com/p/when-should-we-worry-about-ai-power

ai worry seeking examining

What is it to solve the alignment problem?

Feb 13, 2025 · 40:13

Also: to avoid it? Handle it? Solve it forever? Solve it completely? Text version here: https://joecarlsmith.substack.com/p/what-is-it-to-solve-the-alignment

solve handle alignment problem

How do we solve the alignment problem?

Feb 13, 2025 · 8:43

Introduction to a series of essays about paths to safe and useful superintelligence. Text version here: https://joecarlsmith.substack.com/p/how-do-we-solve-the-alignment-problem

solve alignment problem

Fake thinking and real thinking

Jan 28, 2025 · 78:47

When the line pulls at your hand. Text version here: https://joecarlsmith.com/2025/01/28/fake-thinking-and-real-thinking/.

thinking fake

Takes on "Alignment Faking in Large Language Models"

Dec 18, 2024 · 87:54

What can we learn from recent empirical demonstrations of scheming in frontier models? Text version here: https://joecarlsmith.com/2024/12/18/takes-on-alignment-faking-in-large-language-models/

alignment faking large language models

(Part 2, AI takeover) Extended audio from my conversation with Dwarkesh Patel

Sep 30, 2024 · 127:33

Extended audio from my conversation with Dwarkesh Patel. This part focuses on the basic story about AI takeover. Transcript available on my website here: https://joecarlsmith.com/2024/09/30/part-2-ai-takeover-extended-audio-transcript-from-my-conversation-with-dwarkesh-patel

ai conversations takeover extended dwarkesh patel

(Part 1, Otherness) Extended audio from my conversation with Dwarkesh Patel

Sep 30, 2024 · 238:38

Extended audio from my conversation with Dwarkesh Patel. This part focuses on my series "Otherness and control in the age of AGI." Transcript available on my website here: https://joecarlsmith.com/2024/09/30/part-1-otherness-extended-audio-transcript-from-my-conversation-with-dwarkesh-patel/

conversations extended agi otherness dwarkesh patel

Introduction and summary for "Otherness and control in the age of AGI"

Jun 21, 2024 · 12:23

This is the introduction and summary for my series "Otherness and control in the age of AGI." Text version here: https://joecarlsmith.com/2024/01/02/otherness-and-control-in-the-age-of-agi

agi otherness

Second half of full audio for "Otherness and control in the age of AGI"

Jun 18, 2024 · 251:02

Second half of the full audio for my series on how agents with different values should relate to one another, and on the ethics of seeking and sharing power. First half here: https://joecarlsmithaudio.buzzsprout.com/2034731/15266490-first-half-of-full-audio-for-otherness-and-control-in-the-age-of-agi PDF of the full series here: https://jc.gatspress.com/pdf/otherness_full.pdf Summary of the series here: https://joecarlsmith.com/2024/01/02/otherness-and-control-in-the-age-of-agi

second half otherness

First half of full audio for "Otherness and control in the age of AGI"

Jun 17, 2024 · 187:29

First half of the full audio for my series on how agents with different values should relate to one another, and on the ethics of seeking and sharing power. Second half here: https://joecarlsmithaudio.buzzsprout.com/2034731/15272132-second-half-of-full-audio-for-otherness-and-control-in-the-age-of-agi PDF of the full series here: https://jc.gatspress.com/pdf/otherness_full.pdf Summary of the series here: https://joecarlsmith.com/2024/01/02/otherness-and-control-in-the-age-of-agi

first half otherness

Loving a world you don't trust

Jun 17, 2024 · 63:54

Garden, campfire, healing water. Text version here: https://joecarlsmith.com/2024/06/18/loving-a-world-you-dont-trust This essay is part of a series I'm calling "Otherness and control in the age of AGI." I'm hoping that individual essays can be read fairly well on their own, but see here for brief text summaries of the essays that have been released thus far: https://joecarlsmith.com/2024/01/02/otherness-and-control-in-the-age-of-agi

trust gardens loving agi otherness

On attunement

Mar 25, 2024 · 44:14

Examining a certain kind of meaning-laden receptivity to the world. Text version here: https://joecarlsmith.com/2024/03/25/on-attunement This essay is part of a series I'm calling "Otherness and control in the age of AGI." I'm hoping that individual essays can be read fairly well on their own, but see here for brief text summaries of the essays that have been released thus far: https://joecarlsmith.com/2024/01/02/otherness-and-control-in-the-age-of-agi (Though: note that I haven't put the summary post on the podcast yet.)

examining agi attunement otherness

On green

Mar 21, 2024 · 75:13

Examining a philosophical vibe that I think contrasts in interesting ways with "deep atheism." Text version here: https://joecarlsmith.com/2024/03/21/on-green This essay is part of a series I'm calling "Otherness and control in the age of AGI." I'm hoping that individual essays can be read fairly well on their own, but see here for brief text summaries of the essays that have been released thus far: https://joecarlsmith.com/2024/01/02/otherness-and-control-in-the-age-of-agi (Though: note that I haven't put the summary post on the podcast yet.)

green examining agi otherness

On the abolition of man

Jan 18, 2024 · 69:22

What does it take to avoid tyranny towards the future? Text version here: https://joecarlsmith.com/2024/01/18/on-the-abolition-of-man This essay is part of a series I'm calling "Otherness and control in the age of AGI." I'm hoping that individual essays can be read fairly well on their own, but see here for brief text summaries of the essays that have been released thus far: https://joecarlsmith.com/2024/01/02/otherness-and-control-in-the-age-of-agi (Though: note that I haven't put the summary post on the podcast yet.)

abolition agi otherness

Being nicer than Clippy

Jan 16, 2024 · 47:30

Let's be the sort of species that aliens wouldn't fear the way we fear paperclippers. Text version here: https://joecarlsmith.com/2024/01/16/being-nicer-than-clippy/ This essay is part of a series I'm calling "Otherness and control in the age of AGI." I'm hoping that individual essays can be read fairly well on their own, but see here for brief text summaries of the essays that have been released thus far: https://joecarlsmith.com/2024/01/02/otherness-and-control-in-the-age-of-agi (Though: note that I haven't put the summary post on the podcast yet.)

agi nicer clippy otherness

An even deeper atheism

Jan 11, 2024 · 25:12

Who isn't a paperclipper? Text version here: https://joecarlsmith.com/2024/01/11/an-even-deeper-atheism This essay is part of a series I'm calling "Otherness and control in the age of AGI." I'm hoping that individual essays can be read fairly well on their own, but see here for brief summaries of the essays that have been released thus far: https://joecarlsmith.com/2024/01/02/otherness-and-control-in-the-age-of-agi

deeper atheism agi otherness

Does AI risk "other" the AIs?

Jan 9, 2024 · 13:15

Examining Robin Hanson's critique of the AI risk discourse. Text version here: https://joecarlsmith.com/2024/01/09/does-ai-risk-other-the-ais This essay is part of a series of essays called "Otherness and control in the age of AGI." I'm hoping the individual essays can be read fairly well on their own, but see here for brief summaries of the essays that have been released thus far: https://joecarlsmith.com/2024/01/02/otherness-and-control-in-the-age-of-agi

ai risk agi otherness

When "yang" goes wrong

Jan 8, 2024 · 21:32

On the connection between deep atheism and seeking control. Text version here: https://joecarlsmith.com/2024/01/08/when-yang-goes-wrong This essay is part of a series of essays called "Otherness and control in the age of AGI." I'm hoping the individual essays can be read fairly well on their own, but see here for brief summaries of the essays that have been released thus far: https://joecarlsmith.com/2024/01/02/otherness-and-control-in-the-age-of-agi

goes wrong agi otherness

Deep atheism and AI risk

Jan 4, 2024 · 46:59

On a certain kind of fundamental mistrust towards Nature. Text version here: https://joecarlsmith.com/2024/01/04/deep-atheism-and-ai-risk This is the second essay in my series “Otherness and control in the age of AGI.” I'm hoping that the individual essays can be read fairly well on their own, but see here for brief summaries of the essays released thus far: https://joecarlsmith.com/2024/01/02/otherness-and-control-in-the-age-of-agi

deep nature risk atheism agi otherness

Gentleness and the artificial Other

Jan 2, 2024 · 22:39

AIs as fellow creatures. And on getting eaten. Link: https://joecarlsmith.com/2024/01/02/gentleness-and-the-artificial-other This is the first essay in a series of essays that I'm calling “Otherness and control in the age of AGI.” See here for more about the series as a whole: https://joecarlsmith.com/2024/01/02/otherness-and-control-in-the-age-of-agi.

ai artificial gentleness agi otherness

In search of benevolence (or: what should you get Clippy for Christmas?)

Dec 27, 2023 · 52:52

What is altruism towards a paperclipper? Can you paint with all the colors of the wind at once? (This is a recording of an essay originally published in 2021. Text here: https://joecarlsmith.com/2021/07/19/in-search-of-benevolence-or-what-should-you-get-clippy-for-christmas)

christmas search benevolence clippy

Empirical work that might shed light on scheming (Section 6 of "Scheming AIs")

Nov 16, 2023 · 28:00

This is section 6 of my report “Scheming AIs: Will AIs fake alignment during training in order to get power?” Text of the report here: https://arxiv.org/abs/2311.08379 Summary of the report here: https://joecarlsmith.com/2023/11/15/new-report-scheming-ais-will-ais-fake-alignment-during-training-in-order-to-get-power Audio summary here: https://joecarlsmithaudio.buzzsprout.com/2034731/13969977-introduction-and-summary-of-scheming-ais-will-ais-fake-alignment-during-training-in-order-to-get-power

shed empirical scheming

Summing up "Scheming AIs" (Section 5)

Nov 16, 2023 · 15:46

This is section 5 of my report “Scheming AIs: Will AIs fake alignment during training in order to get power?” Text of the report here: https://arxiv.org/abs/2311.08379 Summary of the report here: https://joecarlsmith.com/2023/11/15/new-report-scheming-ais-will-ais-fake-alignment-during-training-in-order-to-get-power Audio summary here: https://joecarlsmithaudio.buzzsprout.com/2034731/13969977-introduction-and-summary-of-scheming-ais-will-ais-fake-alignment-during-training-in-order-to-get-power

summing scheming

Speed arguments against scheming (Section 4.4-4.7 of "Scheming AIs")

Nov 16, 2023 · 15:19

This is section 4.4 through 4.7 of my report “Scheming AIs: Will AIs fake alignment during training in order to get power?” Text of the report here: https://arxiv.org/abs/2311.08379 Summary of the report here: https://joecarlsmith.com/2023/11/15/new-report-scheming-ais-will-ais-fake-alignment-during-training-in-order-to-get-power Audio summary here: https://joecarlsmithaudio.buzzsprout.com/2034731/13969977-introduction-and-summary-of-scheming-ais-will-ais-fake-alignment-during-training-in-order-to-get-power

speed arguments scheming

Simplicity arguments for scheming (Section 4.3 of "Scheming AIs")

Nov 16, 2023 · 19:37

This is section 4.3 of my report “Scheming AIs: Will AIs fake alignment during training in order to get power?” Text of the report here: https://arxiv.org/abs/2311.08379 Summary of the report here: https://joecarlsmith.com/2023/11/15/new-report-scheming-ais-will-ais-fake-alignment-during-training-in-order-to-get-power Audio summary here: https://joecarlsmithaudio.buzzsprout.com/2034731/13969977-introduction-and-summary-of-scheming-ais-will-ais-fake-alignment-during-training-in-order-to-get-power

simplicity arguments scheming

The counting argument for scheming (Sections 4.1 and 4.2 of "Scheming AIs")

Nov 16, 2023 · 10:40

This is sections 4.1 and 4.2 of my report “Scheming AIs: Will AIs fake alignment during training in order to get power?” Text of the report here: https://arxiv.org/abs/2311.08379 Summary of the report here: https://joecarlsmith.com/2023/11/15/new-report-scheming-ais-will-ais-fake-alignment-during-training-in-order-to-get-power Audio summary here: https://joecarlsmithaudio.buzzsprout.com/2034731/13969977-introduction-and-summary-of-scheming-ais-will-ais-fake-alignment-during-training-in-order-to-get-power

counting argument sections scheming

Arguments for/against scheming that focus on the path SGD takes (Section 3 of "Scheming AIs")

Nov 16, 2023 · 29:03

This is section 3 of my report “Scheming AIs: Will AIs fake alignment during training in order to get power?” Text of the report here: https://arxiv.org/abs/2311.08379 Summary of the report here: https://joecarlsmith.com/2023/11/15/new-report-scheming-ais-will-ais-fake-alignment-during-training-in-order-to-get-power Audio summary here: https://joecarlsmithaudio.buzzsprout.com/2034731/13969977-introduction-and-summary-of-scheming-ais-will-ais-fake-alignment-during-training-in-order-to-get-power

arguments scheming

Non-classic stories about scheming (Section 2.3.2 of "Scheming AIs")

Nov 16, 2023 · 24:34

This is section 2.3.2 of my report “Scheming AIs: Will AIs fake alignment during training in order to get power?” Text of the report here: https://arxiv.org/abs/2311.08379 Summary of the report here: https://joecarlsmith.com/2023/11/15/new-report-scheming-ais-will-ais-fake-alignment-during-training-in-order-to-get-power Audio summary here: https://joecarlsmithaudio.buzzsprout.com/2034731/13969977-introduction-and-summary-of-scheming-ais-will-ais-fake-alignment-during-training-in-order-to-get-power

scheming classic stories

Does scheming lead to adequate future empowerment? (Section 2.3.1.2 of "Scheming AIs")

Nov 16, 2023 · 22:54

This is section 2.3.1.2 of my report “Scheming AIs: Will AIs fake alignment during training in order to get power?” Text of the report here: https://arxiv.org/abs/2311.08379 Summary of the report here: https://joecarlsmith.com/2023/11/15/new-report-scheming-ais-will-ais-fake-alignment-during-training-in-order-to-get-power Audio summary here: https://joecarlsmithaudio.buzzsprout.com/2034731/13969977-introduction-and-summary-of-scheming-ais-will-ais-fake-alignment-during-training-in-order-to-get-power

empowerment scheming

The goal-guarding hypothesis (Section 2.3.1.1 of "Scheming AIs")

Nov 16, 2023 · 19:11

This is section 2.3.1.1 of my report “Scheming AIs: Will AIs fake alignment during training in order to get power?” Text of the report here: https://arxiv.org/abs/2311.08379 Summary of the report here: https://joecarlsmith.com/2023/11/15/new-report-scheming-ais-will-ais-fake-alignment-during-training-in-order-to-get-power Audio summary here: https://joecarlsmithaudio.buzzsprout.com/2034731/13969977-introduction-and-summary-of-scheming-ais-will-ais-fake-alignment-during-training-in-order-to-get-power

goal guarding hypothesis scheming

How useful for alignment-relevant work are AIs with short-term goals? (Section 2.2.4.3 of "Scheming AIs")

Nov 16, 2023 · 9:21

This is section 2.2.4.3 of my report “Scheming AIs: Will AIs fake alignment during training in order to get power?” Text of the report here: https://arxiv.org/abs/2311.08379 Summary of the report here: https://joecarlsmith.com/2023/11/15/new-report-scheming-ais-will-ais-fake-alignment-during-training-in-order-to-get-power Audio summary here: https://joecarlsmithaudio.buzzsprout.com/2034731/13969977-introduction-and-summary-of-scheming-ais-will-ais-fake-alignment-during-training-in-order-to-get-power

alignment relevant scheming short term goals

Is scheming more likely if you train models to have long-term goals? (Sections 2.2.4.1-2.2.4.2 of "Scheming AIs")

Nov 16, 2023 · 9:01

This is sections 2.2.4.1-2.2.4.2 of my report “Scheming AIs: Will AIs fake alignment during training in order to get power?” Text of the report here: https://arxiv.org/abs/2311.08379 Summary of the report here: https://joecarlsmith.com/2023/11/15/new-report-scheming-ais-will-ais-fake-alignment-during-training-in-order-to-get-power Audio summary here: https://joecarlsmithaudio.buzzsprout.com/2034731/13969977-introduction-and-summary-of-scheming-ais-will-ais-fake-alignment-during-training-in-order-to-get-power

train models sections long term goals scheming

"Clean" vs. "messy" goal-directedness (Section 2.2.3 of "Scheming AIs")

Nov 16, 2023 · 16:44

This is section 2.2.3 of my report “Scheming AIs: Will AIs fake alignment during training in order to get power?” Text of the report here: https://arxiv.org/abs/2311.08379 Summary of the report here: https://joecarlsmith.com/2023/11/15/new-report-scheming-ais-will-ais-fake-alignment-during-training-in-order-to-get-power Audio summary here: https://joecarlsmithaudio.buzzsprout.com/2034731/13969977-introduction-and-summary-of-scheming-ais-will-ais-fake-alignment-during-training-in-order-to-get-power

goal messy scheming

Two sources of beyond-episode goals (Section 2.2.2 of "Scheming AIs")

Nov 16, 2023 · 21:25

This is section 2.2.2 of my report “Scheming AIs: Will AIs fake alignment during training in order to get power?” Text of the report here: https://arxiv.org/abs/2311.08379 Summary of the report here: https://joecarlsmith.com/2023/11/15/new-report-scheming-ais-will-ais-fake-alignment-during-training-in-order-to-get-power Audio summary here: https://joecarlsmithaudio.buzzsprout.com/2034731/13969977-introduction-and-summary-of-scheming-ais-will-ais-fake-alignment-during-training-in-order-to-get-power

goals scheming

Two concepts of an "episode" (Section 2.2.1 of "Scheming AIs")

Nov 16, 2023 · 12:08

This is section 2.2.1 of my report “Scheming AIs: Will AIs fake alignment during training in order to get power?” Text of the report here: https://arxiv.org/abs/2311.08379 Summary of the report here: https://joecarlsmith.com/2023/11/15/new-report-scheming-ais-will-ais-fake-alignment-during-training-in-order-to-get-power Audio summary here: https://joecarlsmithaudio.buzzsprout.com/2034731/13969977-introduction-and-summary-of-scheming-ais-will-ais-fake-alignment-during-training-in-order-to-get-power

scheming two concepts

Situational awareness (Section 2.1 of "Scheming AIs")

Nov 16, 2023 · 9:27

This is section 2.1 of my report “Scheming AIs: Will AIs fake alignment during training in order to get power?” Text of the report here: https://arxiv.org/abs/2311.08379 Summary of the report here: https://joecarlsmith.com/2023/11/15/new-report-scheming-ais-will-ais-fake-alignment-during-training-in-order-to-get-power Audio summary here: https://joecarlsmithaudio.buzzsprout.com/2034731/13969977-introduction-and-summary-of-scheming-ais-will-ais-fake-alignment-during-training-in-order-to-get-power

situational awareness scheming

On "slack" in training (Section 1.5 of "Scheming AIs")

Nov 16, 2023 · 7:12

This is section 1.5 of my report “Scheming AIs: Will AIs fake alignment during training in order to get power?” Text of the report here: https://arxiv.org/abs/2311.08379 Summary of the report here: https://joecarlsmith.com/2023/11/15/new-report-scheming-ais-will-ais-fake-alignment-during-training-in-order-to-get-power Audio summary here: https://joecarlsmithaudio.buzzsprout.com/2034731/13969977-introduction-and-summary-of-scheming-ais-will-ais-fake-alignment-during-training-in-order-to-get-power

training slack scheming

Why focus on schemers in particular? (Sections 1.3-1.4 of "Scheming AIs")

Nov 16, 2023 · 31:17

This is sections 1.3-1.4 of my report “Scheming AIs: Will AIs fake alignment during training in order to get power?” Text of the report here: https://arxiv.org/abs/2311.08379 Summary of the report here: https://joecarlsmith.com/2023/11/15/new-report-scheming-ais-will-ais-fake-alignment-during-training-in-order-to-get-power Audio summary here: https://joecarlsmithaudio.buzzsprout.com/2034731/13969977-introduction-and-summary-of-scheming-ais-will-ais-fake-alignment-during-training-in-order-to-get-power

sections scheming schemers

A taxonomy of non-schemer models (Section 1.2 of "Scheming AIs")

Nov 16, 2023 · 11:20

This is section 1.2 of my report “Scheming AIs: Will AIs fake alignment during training in order to get power?” Text of the report here: https://arxiv.org/abs/2311.08379 Summary of the report here: https://joecarlsmith.com/2023/11/15/new-report-scheming-ais-will-ais-fake-alignment-during-training-in-order-to-get-power Audio summary here: https://joecarlsmithaudio.buzzsprout.com/2034731/13969977-introduction-and-summary-of-scheming-ais-will-ais-fake-alignment-during-training-in-order-to-get-power

models taxonomy scheming schemer

Varieties of fake alignment (Section 1.1 of "Scheming AIs")

Nov 16, 2023 · 17:54

This is section 1.1 of my report “Scheming AIs: Will AIs fake alignment during training in order to get power?” Text of the report here: https://arxiv.org/abs/2311.08379 Summary of the report here: https://joecarlsmith.com/2023/11/15/new-report-scheming-ais-will-ais-fake-alignment-during-training-in-order-to-get-power Audio summary here: https://joecarlsmithaudio.buzzsprout.com/2034731/13969977-introduction-and-summary-of-scheming-ais-will-ais-fake-alignment-during-training-in-order-to-get-power

fake alignment varieties scheming

Full audio for "Scheming AIs: Will AIs fake alignment during training in order to get power?"

Nov 15, 2023 · 373:17

This is the full audio for my report "Scheming AIs: Will AIs fake alignment during training in order to get power?" (I'm also posting audio for individual sections of the report on this podcast, but the ordering was getting messed up on various podcast apps, and I think some people might want one big audio file regardless, so here it is. I'm going to be posting the individual sections one by one, in the right order, over the coming days.) Full text of the report here: https://arxiv.org/abs/2311.08379 Summary here: https://joecarlsmith.com/2023/11/15/new-report-scheming-ais-will-ais-fake-alignment-during-training-in-order-to-get-power

training fake alignment scheming

Introduction and summary of "Scheming AIs: Will AIs fake alignment during training in order to get power?"

Nov 14, 2023 · 56:32

This is a recording of the introductory section of my report "Scheming AIs: Will AIs fake alignment during training in order to get power?". This section includes a summary of the full report. The summary covers most of the main points and technical terminology, and I'm hoping that it will provide much of the context necessary to understand individual sections of the report on their own. (Note: the text of the report itself may not be public by the time this episode goes live.)