Podcast appearances and mentions of Carl Shulman

  • 14 podcasts
  • 78 episodes
  • 1h 9m average episode duration
  • 1 new episode per month
  • Latest episode: Mar 28, 2025

Popularity trend: 2017–2024



Latest podcast episodes about Carl Shulman

80,000 Hours Podcast with Rob Wiblin
15 expert takes on infosec in the age of AI

Mar 28, 2025 · 155:54


"There's almost no story of the future going well that doesn't have a part that's like '…and no evil person steals the AI weights and goes and does evil stuff.' So it has highlighted the importance of information security: 'You're training a powerful AI system; you should make it hard for someone to steal' has popped out to me as a thing that just keeps coming up in these stories, keeps being present. It's hard to tell a story where it's not a factor. It's easy to tell a story where it is a factor." — Holden KarnofskyWhat happens when a USB cable can secretly control your system? Are we hurtling toward a security nightmare as critical infrastructure connects to the internet? Is it possible to secure AI model weights from sophisticated attackers? And could AI might actually make computer security better rather than worse?With AI security concerns becoming increasingly urgent, we bring you insights from 15 top experts across information security, AI safety, and governance, examining the challenges of protecting our most powerful AI models and digital infrastructure — including a sneak peek from an episode that hasn't yet been released with Tom Davidson, where he explains how we should be more worried about “secret loyalties” in AI agents. You'll hear:Holden Karnofsky on why every good future relies on strong infosec, and how hard it's been to hire security experts (from episode #158)Tantum Collins on why infosec might be the rare issue everyone agrees on (episode #166)Nick Joseph on whether AI companies can develop frontier models safely with the current state of information security (episode #197)Sella Nevo on why AI model weights are so valuable to steal, the weaknesses of air-gapped networks, and the risks of USBs (episode #195)Kevin Esvelt on what cryptographers can teach biosecurity experts (episode #164)Lennart Heim on on Rob's computer security nightmares (episode #155)Zvi Mowshowitz on the insane lack of security mindset at some AI companies (episode #184)Nova DasSarma on the best current defences against well-funded adversaries, politically motivated cyberattacks, and exciting progress in infosecurity (episode #132)Bruce Schneier on whether AI could eliminate software bugs for good, and why it's bad to hook everything up to the internet (episode #64)Nita Farahany on the dystopian risks of hacked neurotech (episode #174)Vitalik Buterin on how cybersecurity is the key to defence-dominant futures (episode #194)Nathan Labenz on how even internal teams at AI companies may not know what they're building (episode #176)Allan Dafoe on backdooring your own AI to prevent theft (episode #212)Tom Davidson on how dangerous “secret loyalties” in AI models could be (episode to be released!)Carl Shulman on the challenge of trusting foreign AI models (episode #191, part 2)Plus lots of concrete advice on how to get into this field and find your fitCheck out the full transcript on the 80,000 Hours website.Chapters:Cold open (00:00:00)Rob's intro (00:00:49)Holden Karnofsky on why infosec could be the issue on which the future of humanity pivots (00:03:21)Tantum Collins on why infosec is a rare AI issue that unifies everyone (00:12:39)Nick Joseph on whether the current state of information security makes it impossible to responsibly train AGI (00:16:23)Nova DasSarma on the best available defences against well-funded adversaries (00:22:10)Sella Nevo on why AI model weights are so valuable to steal (00:28:56)Kevin Esvelt on what cryptographers can teach biosecurity experts (00:32:24)Lennart Heim on the 
possibility of an autonomously replicating AI computer worm (00:34:56)Zvi Mowshowitz on the absurd lack of security mindset at some AI companies (00:48:22)Sella Nevo on the weaknesses of air-gapped networks and the risks of USB devices (00:49:54)Bruce Schneier on why it's bad to hook everything up to the internet (00:55:54)Nita Farahany on the possibility of hacking neural implants (01:04:47)Vitalik Buterin on how cybersecurity is the key to defence-dominant futures (01:10:48)Nova DasSarma on exciting progress in information security (01:19:28)Nathan Labenz on how even internal teams at AI companies may not know what they're building (01:30:47)Allan Dafoe on backdooring your own AI to prevent someone else from stealing it (01:33:51)Tom Davidson on how dangerous “secret loyalties” in AI models could get (01:35:57)Carl Shulman on whether we should be worried about backdoors as governments adopt AI technology (01:52:45)Nova DasSarma on politically motivated cyberattacks (02:03:44)Bruce Schneier on the day-to-day benefits of improved security and recognising that there's never zero risk (02:07:27)Holden Karnofsky on why it's so hard to hire security people despite the massive need (02:13:59)Nova DasSarma on practical steps to getting into this field (02:16:37)Bruce Schneier on finding your personal fit in a range of security careers (02:24:42)Rob's outro (02:34:46)Audio engineering: Ben Cordell, Milo McGuire, Simon Monsour, and Dominic ArmstrongContent editing: Katy Moore and Milo McGuireTranscriptions and web: Katy Moore

80,000 Hours Podcast with Rob Wiblin
Bonus: AGI disagreements and misconceptions: Rob, Luisa, & past guests hash it out

Feb 10, 2025 · 192:24


Will LLMs soon be made into autonomous agents? Will they lead to job losses? Is AI misinformation overblown? Will it prove easy or hard to create AGI? And how likely is it that it will feel like something to be a superhuman AGI?

With AGI back in the headlines, we bring you 15 opinionated highlights from the show addressing those and other questions, intermixed with opinions from hosts Luisa Rodriguez and Rob Wiblin recorded back in 2023.

Check out the full transcript on the 80,000 Hours website.

You can decide whether the views we and our guests expressed then have held up over these last two busy years. You'll hear:
• Ajeya Cotra on overrated AGI worries
• Holden Karnofsky on the dangers of aligned AI, why unaligned AI might not kill us, and the power that comes from just making models bigger
• Ian Morris on why the future must be radically different from the present
• Nick Joseph on whether his company's internal safety policies are enough
• Richard Ngo on what everyone gets wrong about how ML models work
• Tom Davidson on why he believes crazy-sounding explosive growth stories… and Michael Webb on why he doesn't
• Carl Shulman on why you'll prefer robot nannies over human ones
• Zvi Mowshowitz on why he's against working at AI companies except in some safety roles
• Hugo Mercier on why even superhuman AGI won't be that persuasive
• Rob Long on the case for and against digital sentience
• Anil Seth on why he thinks consciousness is probably biological
• Lewis Bollard on whether AI advances will help or hurt nonhuman animals
• Rohin Shah on whether humanity's work ends at the point it creates AGI
And of course, Rob and Luisa also regularly chime in on what they agree and disagree with.

Chapters:
Cold open (00:00:00)
Rob's intro (00:00:58)
Rob & Luisa: Bowerbirds compiling the AI story (00:03:28)
Ajeya Cotra on the misalignment stories she doesn't buy (00:09:16)
Rob & Luisa: Agentic AI and designing machine people (00:24:06)
Holden Karnofsky on the dangers of even aligned AI, and how we probably won't all die from misaligned AI (00:39:20)
Ian Morris on why we won't end up living like The Jetsons (00:47:03)
Rob & Luisa: It's not hard for nonexperts to understand we're playing with fire here (00:52:21)
Nick Joseph on whether AI companies' internal safety policies will be enough (00:55:43)
Richard Ngo on the most important misconception in how ML models work (01:03:10)
Rob & Luisa: Issues Rob is less worried about now (01:07:22)
Tom Davidson on why he buys the explosive economic growth story, despite it sounding totally crazy (01:14:08)
Michael Webb on why he's sceptical about explosive economic growth (01:20:50)
Carl Shulman on why people will prefer robot nannies over humans (01:28:25)
Rob & Luisa: Should we expect AI-related job loss? (01:36:19)
Zvi Mowshowitz on why he thinks it's a bad idea to work on improving capabilities at cutting-edge AI companies (01:40:06)
Holden Karnofsky on the power that comes from just making models bigger (01:45:21)
Rob & Luisa: Are risks of AI-related misinformation overblown? (01:49:49)
Hugo Mercier on how AI won't cause misinformation pandemonium (01:58:29)
Rob & Luisa: How hard will it actually be to create intelligence? (02:09:08)
Robert Long on whether digital sentience is possible (02:15:09)
Anil Seth on why he believes in the biological basis of consciousness (02:27:21)
Lewis Bollard on whether AI will be good or bad for animal welfare (02:40:52)
Rob & Luisa: The most interesting new argument Rob's heard this year (02:50:37)
Rohin Shah on whether AGI will be the last thing humanity ever does (02:57:35)
Rob's outro (03:11:02)

Audio engineering: Ben Cordell, Milo McGuire, Simon Monsour, and Dominic Armstrong
Transcriptions and additional content editing: Katy Moore

80,000 Hours Podcast with Rob Wiblin
2024 Highlightapalooza! (The best of the 80,000 Hours Podcast this year)

Dec 27, 2024 · 170:02


"A shameless recycling of existing content to drive additional audience engagement on the cheap… or the single best, most valuable, and most insight-dense episode we put out in the entire year, depending on how you want to look at it." — Rob WiblinIt's that magical time of year once again — highlightapalooza! Stick around for one top bit from each episode, including:How to use the microphone on someone's mobile phone to figure out what password they're typing into their laptopWhy mercilessly driving the New World screwworm to extinction could be the most compassionate thing humanity has ever doneWhy evolutionary psychology doesn't support a cynical view of human nature but actually explains why so many of us are intensely sensitive to the harms we cause to othersHow superforecasters and domain experts seem to disagree so much about AI risk, but when you zoom in it's mostly a disagreement about timingWhy the sceptics are wrong and you will want to use robot nannies to take care of your kids — and also why despite having big worries about the development of AGI, Carl Shulman is strongly against efforts to pause AI research todayHow much of the gender pay gap is due to direct pay discrimination vs other factorsHow cleaner wrasse fish blow the mirror test out of the waterWhy effective altruism may be too big a tent to work wellHow we could best motivate pharma companies to test existing drugs to see if they help cure other diseases — something they currently have no reason to bother with…as well as 27 other top observations and arguments from the past year of the show.Check out the full transcript and episode links on the 80,000 Hours website.Remember that all of these clips come from the 20-minute highlight reels we make for every episode, which are released on our sister feed, 80k After Hours. 
So if you're struggling to keep up with our regularly scheduled entertainment, you can still get the best parts of our conversations there.It has been a hell of a year, and we can only imagine next year is going to be even weirder — but Luisa and Rob will be here to keep you company as Earth hurtles through the galaxy to a fate as yet unknown.Enjoy, and look forward to speaking with you in 2025!Chapters:Rob's intro (00:00:00)Randy Nesse on the origins of morality and the problem of simplistic selfish-gene thinking (00:02:11)Hugo Mercier on the evolutionary argument against humans being gullible (00:07:17)Meghan Barrett on the likelihood of insect sentience (00:11:26)Sébastien Moro on the mirror test triumph of cleaner wrasses (00:14:47)Sella Nevo on side-channel attacks (00:19:32)Zvi Mowshowitz on AI sleeper agents (00:22:59)Zach Weinersmith on why space settlement (probably) won't make us rich (00:29:11)Rachel Glennerster on pull mechanisms to incentivise repurposing of generic drugs (00:35:23)Emily Oster on the impact of kids on women's careers (00:40:29)Carl Shulman on robot nannies (00:45:19)Nathan Labenz on kids and artificial friends (00:50:12)Nathan Calvin on why it's not too early for AI policies (00:54:13)Rose Chan Loui on how control of OpenAI is independently incredibly valuable and requires compensation (00:58:08)Nick Joseph on why he's a big fan of the responsible scaling policy approach (01:03:11)Sihao Huang on how the US and UK might coordinate with China (01:06:09)Nathan Labenz on better transparency about predicted capabilities (01:10:18)Ezra Karger on what explains forecasters' disagreements about AI risks (01:15:22)Carl Shulman on why he doesn't support enforced pauses on AI research (01:18:58)Matt Clancy on the omnipresent frictions that might prevent explosive economic growth (01:25:24)Vitalik Buterin on defensive acceleration (01:29:43)Annie Jacobsen on the war games that suggest escalation is inevitable (01:34:59)Nate Silver on whether effective altruism is too big to succeed (01:38:42)Kevin Esvelt on why killing every screwworm would be the best thing humanity ever did (01:42:27)Lewis Bollard on how factory farming is philosophically indefensible (01:46:28)Bob Fischer on how to think about moral weights if you're not a hedonist (01:49:27)Elizabeth Cox on the empirical evidence of the impact of storytelling (01:57:43)Anil Seth on how our brain interprets reality (02:01:03)Eric Schwitzgebel on whether consciousness can be nested (02:04:53)Jonathan Birch on our overconfidence around disorders of consciousness (02:10:23)Peter Godfrey-Smith on uploads of ourselves (02:14:34)Laura Deming on surprising things that make mice live longer (02:21:17)Venki Ramakrishnan on freezing cells, organs, and bodies (02:24:46)Ken Goldberg on why low fault tolerance makes some skills extra hard to automate in robots (02:29:12)Sarah Eustis-Guthrie on the ups and downs of founding an organisation (02:34:04)Dean Spears on the cost effectiveness of kangaroo mother care (02:38:26)Cameron Meyer Shorb on vaccines for wild animals (02:42:53)Spencer Greenberg on personal principles (02:46:08)Producing and editing: Keiran HarrisAudio engineering: Ben Cordell, Milo McGuire, Simon Monsour, and Dominic ArmstrongVideo editing: Simon MonsourTranscriptions: Katy Moore

The Nonlinear Library
LW - What is it to solve the alignment problem? by Joe Carlsmith

Aug 27, 2024 · 91:45


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: What is it to solve the alignment problem?, published by Joe Carlsmith on August 27, 2024 on LessWrong. People often talk about "solving the alignment problem." But what is it to do such a thing? I wanted to clarify my thinking about this topic, so I wrote up some notes. In brief, I'll say that you've solved the alignment problem if you've: 1. avoided a bad form of AI takeover, 2. built the dangerous kind of superintelligent AI agents, 3. gained access to the main benefits of superintelligence, and 4. become able to elicit some significant portion of those benefits from some of the superintelligent AI agents at stake in (2).[1] The post also discusses what it would take to do this. In particular: I discuss various options for avoiding bad takeover, notably: Avoiding what I call "vulnerability to alignment" conditions; Ensuring that AIs don't try to take over; Preventing such attempts from succeeding; Trying to ensure that AI takeover is somehow OK. (The alignment discourse has been surprisingly interested in this one; but I think it should be viewed as an extreme last resort.) I discuss different things people can mean by the term "corrigibility"; I suggest that the best definition is something like "does not resist shut-down/values-modification"; and I suggest that we can basically just think about incentives for/against corrigibility in the same way we think about incentives for/against other types of problematic power-seeking, like actively seeking to gain resources. I also don't think you need corrigibility to avoid takeover; and I think avoiding takeover should be our focus. I discuss the additional role of eliciting desired forms of task-performance, even once you've succeeded at avoiding takeover, and I modify the incentives framework I offered in a previous post to reflect the need for the AI to view desired task-performance as the best non-takeover option. I examine the role of different types of "verification" in avoiding takeover and eliciting desired task-performance. In particular: I distinguish between what I call "output-focused" verification and "process-focused" verification, where the former, roughly, focuses on the output whose desirability you want to verify, whereas the latter focuses on the process that produced that output. I suggest that we can view large portions of the alignment problem as the challenge of handling shifts in the amount we can rely on output-focused verification (or at least, our current mechanisms for output-focused verification). I discuss the notion of "epistemic bootstrapping" - i.e., building up from what we can verify, whether by process-focused or output-focused means, in order to extend our epistemic reach much further - as an approach to this challenge.[2] I discuss the relationship between output-focused verification and the "no sandbagging on checkable tasks" hypothesis about capability elicitation. I discuss some example options for process-focused verification. Finally, I express skepticism that solving the alignment problem requires imbuing a superintelligent AI with intrinsic concern for our "extrapolated volition" or our "values-on-reflection." 
In particular, I think just getting an "honest question-answerer" (plus the ability to gate AI behavior on the answers to various questions) is probably enough, since we can ask it the sorts of questions we wanted extrapolated volition to answer. (And it's not clear that avoiding flagrantly-bad behavior, at least, required answering those questions anyway.) Thanks to Carl Shulman, Lukas Finnveden, and Ryan Greenblatt for discussion. 1. Avoiding vs. handling vs. solving the problem What is it to solve the alignment problem? I think the standard at stake can be quite hazy. And when initially reading Bostrom and Yudkowsky, I think the image that built up most prominently i...

80k After Hours
Highlights: #191 (Part 2) – Carl Shulman on government and society after AGI

Jul 19, 2024 · 33:06


This is a selection of highlights from episode #191 of The 80,000 Hours Podcast. These aren't necessarily the most important, or even most entertaining parts of the interview — and if you enjoy this, we strongly recommend checking out the full episode: Carl Shulman on government and society after AGI.

And if you're finding these highlights episodes valuable, please let us know by emailing podcast@80000hours.org.

Chapters:
How AI advisors could have saved us from COVID-19 (00:00:05)
Why Carl doesn't support enforced pauses on AI research (00:06:34)
Value lock-in (00:12:58)
How democracies avoid coups (00:17:11)
Building trust between adversaries about which models you can believe (00:24:00)
Opportunities for listeners (00:30:11)

Highlights put together by Simon Monsour, Milo McGuire, and Dominic Armstrong

The Marketing AI Show
#105: What Economists Get Wrong About AI, The AI Agent Landscape, and the Urgent Need for AI Literacy

Jul 16, 2024 · 66:01


Explore a future where AI could turn one cent of electricity into hundreds of dollars worth of top-tier professional work. Join our hosts as they unpack Carl Shulman's provocative insights on superintelligence, explore the rising world of AI agents, and examine the importance of AI literacy for all. Plus, take a look into the latest from OpenAI, Google Gemini, Anthropic and more in our rapid fire section.

00:03:32 — 80,000 Hours Podcast with Carl Shulman
00:20:10 — The AI Agent Landscape
00:33:03 — AI Literacy for All
00:42:02 — OpenAI Superintelligence Scale
00:47:02 — Lattice and AI Employees
00:52:18 — OpenAI Exits China
00:54:23 — Microsoft, Apple and the OpenAI Board
00:56:20 — Gemini for Workspace
00:58:48 — Claude Prompt Playground
01:01:48 — Captions Valued at $500M

Today's episode is brought to you by Marketing AI Institute's 2024 State of Marketing AI Report. This is the fourth-annual report we've done in partnership with Drift to gather data on AI understanding and adoption in marketing and business. This year, we've collected never-before-seen data from marketers and business leaders revealing to us how they adopt and use AI… We are revealing the findings during a special webinar on Thursday, July 25 at 12pm ET. Go to stateofmarketingai.com to register.

Want to receive our videos faster? SUBSCRIBE to our channel!
Visit our website: https://www.marketingaiinstitute.com
Receive our weekly newsletter: https://www.marketingaiinstitute.com/newsletter-subscription
Looking for content and resources? Register for a free webinar: https://www.marketingaiinstitute.com/resources#filter=.webinar
Come to our next Marketing AI Conference: www.MAICON.ai
Enroll in AI Academy for Marketers: https://www.marketingaiinstitute.com/academy/home
Join our community:
Slack: https://www.marketingaiinstitute.com/slack-group-form
LinkedIn: https://www.linkedin.com/company/mktgai
Twitter: https://twitter.com/MktgAi
Instagram: https://www.instagram.com/marketing.ai/
Facebook: https://www.facebook.com/marketingAIinstitute

The Nonlinear Library
LW - AI #72: Denying the Future by Zvi

Jul 12, 2024 · 63:35


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: AI #72: Denying the Future, published by Zvi on July 12, 2024 on LessWrong. The Future. It is coming. A surprising number of economists deny this when it comes to AI. Not only do they deny the future that lies in the future. They also deny the future that is here, but which is unevenly distributed. Their predictions and projections do not factor in even what the AI can already do, let alone what it will learn to do later on. Another likely future event is the repeal of the Biden Executive Order. That repeal is part of the Republican platform, and Trump is the favorite to win the election. We must act on the assumption that the order likely will be repealed, with no expectation of similar principles being enshrined in federal law. Then there are the other core problems we will have to solve, and other less core problems such as what to do about AI companions. They make people feel less lonely over a week, but what do they do over a lifetime? Also I don't have that much to say about it now, but it is worth noting that this week it was revealed Apple was going to get an observer board seat at OpenAI… and then both Apple and Microsoft gave up their observer seats. Presumably that is about antitrust and worrying the seats would be a bad look. There could also be more to it. Table of Contents 1. Introduction. 2. Table of Contents. 3. Language Models Offer Mundane Utility. Long as you avoid GPT-3.5. 4. Language Models Don't Offer Mundane Utility. Many mistakes will not be caught. 5. You're a Nudge. You say it's for my own good. 6. Fun With Image Generation. Universal control net for SDXL. 7. Deepfaketown and Botpocalypse Soon. Owner of a lonely bot. 8. They Took Our Jobs. Restaurants. 9. Get Involved. But not in that way. 10. Introducing. Anthropic ships several new features. 11. In Other AI News. Microsoft and Apple give up OpenAI board observer seats. 12. Quiet Speculations. As other papers learned, to keep pace, you must move fast. 13. The AI Denialist Economists. Why doubt only the future? Doubt the present too. 14. The Quest for Sane Regulation. EU and FTC decide that things are their business. 15. Trump Would Repeal the Biden Executive Order on AI. We can't rely on it. 16. Ordinary Americans Are Worried About AI. Every poll says the same thing. 17. The Week in Audio. Carl Shulman on 80,000 hours was a two parter. 18. The Wikipedia War. One obsessed man can do quite a lot of damage. 19. Rhetorical Innovation. Yoshua Bengio gives a strong effort. 20. Evaluations Must Mimic Relevant Conditions. Too often they don't. 21. Aligning a Smarter Than Human Intelligence is Difficult. Stealth fine tuning. 22. The Problem. If we want to survive, it must be solved. 23. Oh Anthropic. Non Disparagement agreements should not be covered by NDAs. 24. Other People Are Not As Worried About AI Killing Everyone. Don't feel the AGI. Language Models Offer Mundane Utility Yes, they are highly useful for coding. It turns out that if you use GPT-3.5 for your 'can ChatGPT code well enough' paper, your results are not going to be relevant. Gallabytes says 'that's morally fraud imho' and that seems at least reasonable. Tests failing in GPT-3.5 is the AI equivalent of "IN MICE" except for IQ tests. If you are going to analyze the state of AI, you need to keep an eye out for basic errors and always always check which model is used. 
So if you go quoting statements such as: Paper about GPT-3.5: its ability to generate functional code for 'hard' problems dropped from 40% to 0.66% after this time as well. 'A reasonable hypothesis for why ChatGPT can do better with algorithm problems before 2021 is that these problems are frequently seen in the training dataset Then even if you hadn't realized or checked before (which you really should have), you need to notice that this says 2021, which is very much not ...

80k After Hours
Highlights: #191 (Part 1) – Carl Shulman on the economy and national security after AGI

Jul 11, 2024 · 35:12


This is a selection of highlights from episode #191 of The 80,000 Hours Podcast. These aren't necessarily the most important, or even most entertaining parts of the interview — and if you enjoy this, we strongly recommend checking out the full episode: Carl Shulman on the economy and national security after AGI.

And if you're finding these highlights episodes valuable, please let us know by emailing podcast@80000hours.org.

Chapters:
Intro (00:00:00)
Robot nannies (00:00:23)
Key transformations after an AI capabilities explosion (00:05:15)
Objection: Shouldn't we be seeing economic growth rates increasing today? (00:10:28)
Objection: Declining returns to increases in intelligence? (00:16:09)
Objection: Could we really see rates of construction go up a hundredfold or a thousandfold? (00:20:58)
Objection: "This sounds completely whack" (00:26:10)
Income and wealth distribution (00:30:02)

Highlights put together by Simon Monsour, Milo McGuire, and Dominic Armstrong

80,000 Hours Podcast with Rob Wiblin
#191 (Part 2) – Carl Shulman on government and society after AGI

Jul 5, 2024 · 140:32


This is the second part of our marathon interview with Carl Shulman. The first episode is on the economy and national security after AGI. You can listen to them in either order!

If we develop artificial general intelligence that's reasonably aligned with human goals, it could put a fast and near-free superhuman advisor in everyone's pocket. How would that affect culture, government, and our ability to act sensibly and coordinate together?

It's common to worry that AI advances will lead to a proliferation of misinformation and further disconnect us from reality. But in today's conversation, AI expert Carl Shulman argues that this underrates the powerful positive applications the technology could have in the public sphere.

Links to learn more, highlights, and full transcript.

As Carl explains, today the most important questions we face as a society remain in the "realm of subjective judgement" — without any "robust, well-founded scientific consensus on how to answer them." But if AI 'evals' and interpretability advance to the point that it's possible to demonstrate which AI models have truly superhuman judgement and give consistently trustworthy advice, society could converge on firm or 'best-guess' answers to far more cases.

If the answers are publicly visible and confirmable by all, the pressure on officials to act on that advice could be great. That's because when it's hard to assess if a line has been crossed or not, we usually give people much more discretion. For instance, a journalist inventing an interview that never happened will get fired because it's an unambiguous violation of honesty norms — but so long as there's no universally agreed-upon standard for selective reporting, that same journalist will have substantial discretion to report information that favours their preferred view more often than that which contradicts it.

Similarly, today we have no generally agreed-upon way to tell when a decision-maker has behaved irresponsibly. But if experience clearly shows that following AI advice is the wise move, not seeking or ignoring such advice could become more like crossing a red line — less like making an understandable mistake and more like fabricating your balance sheet.

To illustrate the possible impact, Carl imagines how the COVID pandemic could have played out in the presence of AI advisors that everyone agrees are exceedingly insightful and reliable. But in practice, a significantly superhuman AI might suggest novel approaches better than any we can suggest.

In the past we've usually found it easier to predict how 'hard' technologies like planes or factories will change the world than to imagine the social shifts that those technologies will create — and the same is likely happening for AI.

Carl Shulman and host Rob Wiblin discuss the above, as well as:
• The risk of society using AI to lock in its values.
• The difficulty of preventing coups once AI is key to the military and police.
• What international treaties we need to make this go well.
• How to make AI superhuman at forecasting the future.
• Whether AI will be able to help us with intractable philosophical questions.
• Whether we need dedicated projects to make wise AI advisors, or if it will happen automatically as models scale.
• Why Carl doesn't support AI companies voluntarily pausing AI research, but sees a stronger case for binding international controls once we're closer to 'crunch time.'
• Opportunities for listeners to contribute to making the future go well.

Chapters:
Cold open (00:00:00)
Rob's intro (00:01:16)
The interview begins (00:03:24)
COVID-19 concrete example (00:11:18)
Sceptical arguments against the effect of AI advisors (00:24:16)
Value lock-in (00:33:59)
How democracies avoid coups (00:48:08)
Where AI could most easily help (01:00:25)
AI forecasting (01:04:30)
Application to the most challenging topics (01:24:03)
How to make it happen (01:37:50)
International negotiations and coordination and auditing (01:43:54)
Opportunities for listeners (02:00:09)
Why Carl doesn't support enforced pauses on AI research (02:03:58)
How Carl is feeling about the future (02:15:47)
Rob's outro (02:17:37)

Producer and editor: Keiran Harris
Audio engineering team: Ben Cordell, Simon Monsour, Milo McGuire, and Dominic Armstrong
Transcriptions: Katy Moore

The Nonlinear Library
LW - AI #71: Farewell to Chevron by Zvi

Jul 5, 2024 · 57:45


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: AI #71: Farewell to Chevron, published by Zvi on July 5, 2024 on LessWrong.

Chevron deference is no more. How will this impact AI regulation? The obvious answer is it is now much harder for us to 'muddle through via existing laws and regulations until we learn more,' because the court narrowed our affordances to do that. And similarly, if and when Congress does pass bills regulating AI, they are going to need to 'lock in' more decisions and grant more explicit authority, to avoid court challenges. The argument against state regulations is similarly weaker now. Similar logic also applies outside of AI. I am overall happy about overturning Chevron and I believe it was the right decision, but 'Congress decides to step up and do its job now' is not in the cards. We should be very careful what we have wished for, and perhaps a bit burdened by what has been. The AI world continues to otherwise be quiet. I am sure you will find other news.

Table of Contents
1. Introduction.
2. Table of Contents.
3. Language Models Offer Mundane Utility. How will word get out?
4. Language Models Don't Offer Mundane Utility. Ask not what you cannot do.
5. Man in the Arena. Why is Claude Sonnet 3.5 not at the top of the Arena ratings?
6. Fun With Image Generation. A map of your options.
7. Deepfaketown and Botpocalypse Soon. How often do you need to catch them?
8. They Took Our Jobs. The torture of office culture is now available for LLMs.
9. The Art of the Jailbreak. Rather than getting harder, it might be getting easier.
10. Get Involved. NYC space, Vienna happy hour, work with Bengio, evals, 80k hours.
11. Introducing. Mixture of experts becomes mixture of model sizes.
12. In Other AI News. Pixel screenshots as the true opt-in Microsoft Recall.
13. Quiet Speculations. People are hard to impress.
14. The Quest for Sane Regulation. SB 1047 bad faith attacks continue.
15. Chevron Overturned. A nation of laws. Whatever shall we do?
16. The Week in Audio. Carl Shulman on 80k hours and several others.
17. Oh Anthropic. You also get a nondisparagement agreement.
18. Open Weights Are Unsafe and Nothing Can Fix This. Says Lawrence Lessig.
19. Rhetorical Innovation. You are here.
20. Aligning a Smarter Than Human Intelligence is Difficult. Fix your own mistakes?
21. People Are Worried About AI Killing Everyone. The path of increased risks.
22. Other People Are Not As Worried About AI Killing Everyone. Feel no AGI.
23. The Lighter Side. Don't. I said don't.

Language Models Offer Mundane Utility

Guys. Guys.

Ouail Kitouni: if you don't know what claude is im afraid you're not going to get what this ad even is :/

Ben Smith: Claude finds this very confusing.

I get it, because I already get it. But who is the customer here? I would have spent a few extra words to ensure people knew this was an AI and LLM thing? Anthropic's marketing problem is that no one knows about Claude or Anthropic. They do not even know Claude is a large language model. Many do not even appreciate what a large language model is in general. I realize this is SFO. Claude anticipates only 5%-10% of people will understand what it means, and while some will be intrigued and look it up, most won't. So you are getting very vague brand awareness and targeting the cognoscenti who run the tech companies, I suppose? Claude calls it a 'bold move that reflects confidence.'

Language Models Don't Offer Mundane Utility

David Althus reports that Claude does not work for him because of its refusals around discussions of violence.

Once again, where are all our cool AI games?

Summarize everything your users did yesterday?

Steve Krouse: As a product owner it'd be nice to have an llm summary of everything my users did yesterday. Calling out cool success stories or troublesome error states I should reach out to debug. Has anyone tried such a thing? I am th...

The Nonlinear Library
EA - Carl Shulman on the moral status of current and future AI systems by rgb

Jul 2, 2024 · 20:33


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Carl Shulman on the moral status of current and future AI systems, published by rgb on July 2, 2024 on The Effective Altruism Forum.

In which I curate and relate great takes from 80k.

As artificial intelligence advances, we'll increasingly urgently face the question of whether and how we ought to take into account the well-being and interests of AI systems themselves. In other words, we'll face the question of whether AI systems have moral status.[1] In a recent episode of the 80,000 Hours podcast, polymath researcher and world-model-builder Carl Shulman spoke at length about the moral status of AI systems, now and in the future. Carl has previously written about these issues in Sharing the World with Digital Minds and Propositions Concerning Digital Minds and Society, both co-authored with Nick Bostrom. This post highlights and comments on ten key ideas from Shulman's discussion with 80,000 Hours host Rob Wiblin.

1. The moral status of AI systems is, and will be, an important issue (and it might not have much to do with AI consciousness)

The moral status of AI is worth more attention than it currently gets, given its potential scale:

"Yes, we should worry about it and pay attention. It seems pretty likely to me that there will be vast numbers of AIs that are smarter than us, that have desires, that would prefer things in the world to be one way rather than another, and many of which could be said to have welfare, that their lives could go better or worse, or their concerns and interests could be more or less respected. So you definitely should pay attention to what's happening to 99.9999% of the people in your society."

Notice that Shulman does not say anything about AI consciousness or sentience in making this case. Here and throughout the interview, Shulman de-emphasizes the question of whether AI systems are conscious, in favor of the question of whether they have desires, preferences, interests. Here he is following a cluster of views in philosophy that hold that consciousness is not necessary for moral status. Rather, an entity, even if it is not conscious, can merit moral consideration if it has a certain kind of agency: preferences, desires, goals, interests, and the like[2]. (This more agency-centric perspective on AI moral status has been discussed in previous posts; for a dip into recent philosophical discussion on this, see the substack post 'Agential value' by friend of the blog Nico Delon.) Such agency-centric views are especially important for the question of AI moral patienthood, because it might be clear that AI systems have morally-relevant preferences and desires well before it's clear whether or not they are conscious.

2. While people have doubts about the moral status of current AI systems, they will attribute moral status to AI more and more as AI advances.

At present, Shulman notes, "the general public and most philosophers are quite dismissive of any moral importance of the desires, preferences, or other psychological states, if any exist, of the primitive AI systems that we currently have." But Shulman asks us to imagine an advanced AI system that is behaviorally fairly indistinguishable from a human - e.g., from the host Rob Wiblin.
But going forward, when we're talking about systems that are able to really live the life of a human - so a sufficiently advanced AI that could just imitate, say, Rob Wiblin, and go and live your life, operate a robot body, interact with your friends and your partners, do your podcast, and give all the appearance of having the sorts of emotions that you have, the sort of life goals that you have. One thing to keep in mind is that, given Shulman's views about AI trajectories, this is not just a thought experiment: this is a kind of AI system you could see in your lifetime. Shulman also asks us to imagine a system like ...

The Nonlinear Library
EA - #191 - The economy and national security after AGI (Carl Shulman on the 80,000 Hours Podcast) by 80000 Hours

Jun 29, 2024 · 27:55


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: #191 - The economy and national security after AGI (Carl Shulman on the 80,000 Hours Podcast), published by 80000 Hours on June 29, 2024 on The Effective Altruism Forum.

We just published an interview: Carl Shulman on the economy and national security after AGI. Listen on Spotify or click through for other audio options, the transcript, and related links. Below are the episode summary and some key excerpts.

Episode summary

"Consider just the magnitude of the hammer that is being applied to this situation: it's going from millions of scientists and engineers and entrepreneurs to billions and trillions on the compute and AI software side. It's just a very large change. You should also be surprised if such a large change doesn't affect other macroscopic variables in the way that, say, the introduction of hominids has radically changed the biosphere, and the Industrial Revolution greatly changed human society." — Carl Shulman

The human brain does what it does with a shockingly low energy supply: just 20 watts - a fraction of a cent worth of electricity per hour. What would happen if AI technology merely matched what evolution has already managed, and could accomplish the work of top human professionals given a 20-watt power supply?

Many people sort of consider that hypothetical, but maybe nobody has followed through and considered all the implications as much as Carl Shulman. Behind the scenes, his work has greatly influenced how leaders in artificial general intelligence (AGI) picture the world they're creating.

Carl simply follows the logic to its natural conclusion. This is a world where 1 cent of electricity can be turned into medical advice, company management, or scientific research that would today cost $100s, resulting in a scramble to manufacture chips and apply them to the most lucrative forms of intellectual labour.

It's a world where, given their incredible hourly salaries, the supply of outstanding AI researchers quickly goes from 10,000 to 10 million or more, enormously accelerating progress in the field.

It's a world where companies operated entirely by AIs working together are much faster and more cost-effective than those that lean on humans for decision making, and the latter are progressively driven out of business.

It's a world where the technical challenges around control of robots are rapidly overcome, leading to robots becoming strong, fast, precise, and tireless workers able to accomplish any physical work the economy requires, and a rush to build billions of them and cash in.

It's a world where, overnight, the number of human beings becomes irrelevant to rates of economic growth, which is now driven by how quickly the entire machine economy can copy all its components. Looking at how long it takes complex biological systems to replicate themselves (some of which can do so in days), that occurring every few months could be a conservative estimate.

It's a world where any country that delays participating in this economic explosion risks being outpaced and ultimately disempowered by rivals whose economies grow to be 10-fold, 100-fold, and then 1,000-fold as large as their own.

As the economy grows, each person could effectively afford the practical equivalent of a team of hundreds of machine 'people' to help them with every aspect of their lives.
And with growth rates this high, it doesn't take long to run up against Earth's physical limits - in this case, the toughest to engineer your way out of is the Earth's ability to release waste heat. If this machine economy and its insatiable demand for power generates more heat than the Earth radiates into space, then it will rapidly heat up and become uninhabitable for humans and other animals. This eventually creates pressure to move economic activity off-planet. There's little need for computer chips to be on Ear...
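The waste-heat limit mentioned in this summary can be made concrete with a rough calculation. The sketch below is illustrative only: the 1%-of-absorbed-sunlight threshold, the ~20 TW figure for current world energy use, and the few-month doubling time are assumptions chosen for the arithmetic, not numbers taken from the episode.

```python
import math

# How quickly might an exploding machine economy hit Earth's waste-heat limit?
# Assumptions (not from the episode): trouble starts when waste heat reaches ~1% of
# absorbed sunlight, current world energy use is ~20 TW, and the machine economy
# doubles its energy use every few months.

SOLAR_CONSTANT = 1361.0    # W/m^2 of sunlight at the top of the atmosphere
EARTH_RADIUS = 6.371e6     # m
ALBEDO = 0.3               # fraction of sunlight reflected back to space

# Sunlight absorbed by the Earth: intercepted cross-section times (1 - albedo), ~1.2e17 W
absorbed_solar = SOLAR_CONSTANT * math.pi * EARTH_RADIUS**2 * (1 - ALBEDO)

current_energy_use = 2.0e13            # W, roughly today's ~20 TW of primary energy
heat_limit = 0.01 * absorbed_solar     # assumed threshold: waste heat at 1% of absorbed sunlight

doublings = math.log2(heat_limit / current_energy_use)
months_per_doubling = 4                # assumed doubling time for the machine economy

print(f"Absorbed sunlight: {absorbed_solar:.2e} W")
print(f"Assumed waste-heat threshold: {heat_limit:.2e} W")
print(f"Doublings needed from ~20 TW: {doublings:.1f}")
print(f"At one doubling every {months_per_doubling} months: ~{doublings * months_per_doubling / 12:.1f} years")
```

Under these assumptions, only about six doublings separate today's energy use from a level where waste heat by itself becomes a planet-scale forcing, which is why the summary treats off-planet expansion as a natural next step.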

80,000 Hours Podcast with Rob Wiblin
#191 — Carl Shulman on the economy and national security after AGI

80,000 Hours Podcast with Rob Wiblin

Play Episode Listen Later Jun 27, 2024 254:58


The human brain does what it does with a shockingly low energy supply: just 20 watts — a fraction of a cent worth of electricity per hour. What would happen if AI technology merely matched what evolution has already managed, and could accomplish the work of top human professionals given a 20-watt power supply? Many people sort of consider that hypothetical, but maybe nobody has followed through and considered all the implications as much as Carl Shulman. Behind the scenes, his work has greatly influenced how leaders in artificial general intelligence (AGI) picture the world they're creating. Links to learn more, highlights, and full transcript. Carl simply follows the logic to its natural conclusion. This is a world where 1 cent of electricity can be turned into medical advice, company management, or scientific research that would today cost $100s, resulting in a scramble to manufacture chips and apply them to the most lucrative forms of intellectual labour. It's a world where, given their incredible hourly salaries, the supply of outstanding AI researchers quickly goes from 10,000 to 10 million or more, enormously accelerating progress in the field. It's a world where companies operated entirely by AIs working together are much faster and more cost-effective than those that lean on humans for decision making, and the latter are progressively driven out of business. It's a world where the technical challenges around control of robots are rapidly overcome, turning robots into strong, fast, precise, and tireless workers able to accomplish any physical work the economy requires, and prompting a rush to build billions of them and cash in. As the economy grows, each person could effectively afford the practical equivalent of a team of hundreds of machine 'people' to help them with every aspect of their lives. And with growth rates this high, it doesn't take long to run up against Earth's physical limits — in this case, the toughest to engineer your way out of is the Earth's ability to release waste heat. If this machine economy and its insatiable demand for power generates more heat than the Earth radiates into space, then it will rapidly heat up and become uninhabitable for humans and other animals. This creates pressure to move economic activity off-planet. 
So you could develop effective populations of billions of scientific researchers operating on computer chips orbiting in space, sending the results of their work, such as drug designs, back to Earth for use. These are just some of the wild implications that could follow naturally from truly embracing the hypothetical: what if we develop AGI that could accomplish everything that the most productive humans can, using the same energy supply? In today's episode, Carl explains the above, and then host Rob Wiblin pushes back on whether that's realistic or just a cool story, asking: If we're heading towards the above, how come economic growth is slow now and not really increasing? Why have computers and computer chips had so little effect on economic productivity so far? Are self-replicating biological systems a good comparison for self-replicating machine systems? Isn't this just too crazy and weird to be plausible? What bottlenecks would be encountered in supplying energy and natural resources to this growing economy? Might there not be severely declining returns to bigger brains and more training? Wouldn't humanity get scared and pull the brakes if such a transformation kicked off? If this is right, how come economists don't agree? Finally, Carl addresses the moral status of machine minds themselves. Would they be conscious or otherwise have a claim to moral status or rights? And how might humans and machines coexist with neither side dominating or exploiting the other? Producer and editor: Keiran Harris. Audio engineering lead: Ben Cordell. Technical editing: Simon Monsour, Milo McGuire, and Dominic Armstrong. Transcriptions: Katy Moore

The Nonlinear Library
EA - Project idea: AI for epistemics by Benjamin Todd

The Nonlinear Library

Play Episode Listen Later May 20, 2024 5:31


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Project idea: AI for epistemics, published by Benjamin Todd on May 20, 2024 on The Effective Altruism Forum. If transformative AI might come soon and you want to help that go well, one strategy you might adopt is building something that will improve as AI gets more capable. That way if AI accelerates, your ability to help accelerates too. Here's an example: organisations that use AI to improve epistemics (our ability to know what's true) and make better decisions on that basis. This was the most interesting impact-oriented entrepreneurial idea I came across when I visited the Bay Area in February. (Thank you to Carl Shulman who first suggested it.) Navigating the deployment of AI is going to involve successfully making many crazy hard judgement calls, such as "what's the probability this system isn't aligned" and "what might the economic effects of deployment be?" Some of these judgement calls will need to be made under a lot of time pressure - especially if we're seeing 100 years of technological progress in under 5. Being able to make these kinds of decisions a little bit better could therefore be worth a huge amount. And that's true given almost any future scenario. Better decision-making can also potentially help with all other cause areas, which is why 80,000 Hours recommends it as a cause area independent from AI. So the idea is to set up organisations that use AI to improve forecasting and decision-making in ways that can eventually be applied to these kinds of questions. In the short term, you can apply these systems to conventional problems, potentially in the for-profit sector, like finance. We seem to be just approaching the point where AI systems might be able to help (e.g. a recent paper found GPT-4 was pretty good at forecasting if fine-tuned). Starting here allows you to gain scale, credibility and resources. But unlike what a purely profit-motivated entrepreneur would do, you can also try to design your tools such that in an AI crunch moment they're able to help. For example, you could develop a free-to-use version for political leaders, so that if a huge decision about AI regulation suddenly needs to be made, they're already using the tool for other questions. There are already a handful of projects in this space, but it could eventually be a huge area, so it still seems like very early days. These projects could have many forms: One example of a concrete proposal is using AI to make forecasts, or otherwise get better at truth-finding in important domains. On the more qualitative side, we could imagine an AI "decision coach" or consultant that aims to augment human decision-making. Any techniques to make it easier to extract the truth from AI systems could also count, such as relevant kinds of interpretability research and the AI debate or weak-to-strong generalisation approaches to AI alignment. I could imagine projects in this area starting in many ways, including a research service within a hedge fund, a research group within an AI company (e.g. focused on optimising systems for truth-telling and accuracy), an AI-enabled consultancy (trying to undercut the Big 3), or as a non-profit focused on policy makers. Most likely you'd try to fine-tune and build scaffolding around existing leading LLMs, though there are also proposals to build LLMs from the bottom-up for forecasting. 
For example, you could create an LLM that only has data up to 2023, and then train it to predict what happens in 2024. There's a trade-off to be managed between maintaining independence and trustworthiness on the one hand, and having access to leading models and decision-makers in AI companies and making money on the other. Some ideas could advance frontier capabilities, so you'd want to think carefully about either how to avoid that, or stick to ideas that differentially boost more safety-enhancing aspects of ...
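As a small illustration of the kind of evaluation such a project might run, here is a minimal sketch of scoring probabilistic forecasts with a Brier score, restricted to questions that resolve after the model's knowledge cutoff so the answers cannot simply have been memorised. The cutoff date, the example questions, and the model_forecast placeholder are all hypothetical; nothing here is taken from the paper mentioned above.

```python
from datetime import date

# Hypothetical resolved questions: (question text, resolution date, outcome as 0/1).
QUESTIONS = [
    ("Will event X happen by June 2024?", date(2024, 6, 30), 1),
    ("Will event Y happen by March 2024?", date(2024, 3, 31), 0),
]

KNOWLEDGE_CUTOFF = date(2024, 1, 1)  # the model has seen no data after this date

def model_forecast(question: str) -> float:
    """Placeholder for a fine-tuned LLM's probability estimate."""
    return 0.5  # a real system would query the model here

def brier_score(prob: float, outcome: int) -> float:
    """Squared error between the forecast probability and the 0/1 outcome (lower is better)."""
    return (prob - outcome) ** 2

# Only score questions that resolve after the cutoff, so answers can't be memorised.
eligible = [(q, o) for q, resolved, o in QUESTIONS if resolved > KNOWLEDGE_CUTOFF]
scores = [brier_score(model_forecast(q), o) for q, o in eligible]
print(f"Mean Brier score over {len(scores)} questions: {sum(scores) / len(scores):.3f}")
```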

80,000 Hours Podcast with Rob Wiblin
#112 Classic episode – Carl Shulman on the common-sense case for existential risk work and its practical implications

80,000 Hours Podcast with Rob Wiblin

Play Episode Listen Later Jan 8, 2024 230:30


Rebroadcast: this episode was originally released in October 2021.Preventing the apocalypse may sound like an idiosyncratic activity, and it sometimes is justified on exotic grounds, such as the potential for humanity to become a galaxy-spanning civilisation.But the policy of US government agencies is already to spend up to $4 million to save the life of a citizen, making the death of all Americans a $1,300,000,000,000,000 disaster.According to Carl Shulman, research associate at Oxford University's Future of Humanity Institute, that means you don't need any fancy philosophical arguments about the value or size of the future to justify working to reduce existential risk — it passes a mundane cost-benefit analysis whether or not you place any value on the long-term future.Links to learn more, summary, and full transcript.The key reason to make it a top priority is factual, not philosophical. That is, the risk of a disaster that kills billions of people alive today is alarmingly high, and it can be reduced at a reasonable cost. A back-of-the-envelope version of the argument runs:The US government is willing to pay up to $4 million (depending on the agency) to save the life of an American.So saving all US citizens at any given point in time would be worth $1,300 trillion.If you believe that the risk of human extinction over the next century is something like one in six (as Toby Ord suggests is a reasonable figure in his book The Precipice), then it would be worth the US government spending up to $2.2 trillion to reduce that risk by just 1%, in terms of American lives saved alone.Carl thinks it would cost a lot less than that to achieve a 1% risk reduction if the money were spent intelligently. So it easily passes a government cost-benefit test, with a very big benefit-to-cost ratio — likely over 1000:1 today.This argument helped NASA get funding to scan the sky for any asteroids that might be on a collision course with Earth, and it was directly promoted by famous economists like Richard Posner, Larry Summers, and Cass Sunstein.If the case is clear enough, why hasn't it already motivated a lot more spending or regulations to limit existential risks — enough to drive down what any additional efforts would achieve?Carl thinks that one key barrier is that infrequent disasters are rarely politically salient. Research indicates that extra money is spent on flood defences in the years immediately following a massive flood — but as memories fade, that spending quickly dries up. Of course the annual probability of a disaster was the same the whole time; all that changed is what voters had on their minds.Carl suspects another reason is that it's difficult for the average voter to estimate and understand how large these respective risks are, and what responses would be appropriate rather than self-serving. 
If the public doesn't know what good performance looks like, politicians can't be given incentives to do the right thing.It's reasonable to assume that if we found out a giant asteroid were going to crash into the Earth one year from now, most of our resources would be quickly diverted into figuring out how to avert catastrophe.But even in the case of COVID-19, an event that massively disrupted the lives of everyone on Earth, we've still seen a substantial lack of investment in vaccine manufacturing capacity and other ways of controlling the spread of the virus, relative to what economists recommended.Carl expects that all the reasons we didn't adequately prepare for or respond to COVID-19 — with excess mortality over 15 million and costs well over $10 trillion — bite even harder when it comes to threats we've never faced before, such as engineered pandemics, risks from advanced artificial intelligence, and so on.Today's episode is in part our way of trying to improve this situation. In today's wide-ranging conversation, Carl and Rob also cover:A few reasons Carl isn't excited by ‘strong longtermism'How x-risk reduction compares to GiveWell recommendationsSolutions for asteroids, comets, supervolcanoes, nuclear war, pandemics, and climate changeThe history of bioweaponsWhether gain-of-function research is justifiableSuccesses and failures around COVID-19The history of existential riskAnd much moreProducer: Keiran HarrisAudio mastering: Ben CordellTranscriptions: Katy Moore
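The back-of-the-envelope argument above is simple enough to reproduce in a few lines. A minimal sketch follows, using the figures quoted in the summary ($4 million per life, a roughly one-in-six extinction risk this century, and a 1% relative risk reduction); the population of about 330 million and the $2 billion programme cost are assumptions added for illustration.

```python
# Figures from the episode summary, plus two labelled assumptions.
VALUE_PER_LIFE = 4e6       # USD the US government will pay to save one life
US_POPULATION = 330e6      # assumption: roughly the current US population
EXTINCTION_RISK = 1 / 6    # Toby Ord's rough estimate for this century
RISK_REDUCTION = 0.01      # a 1% relative reduction in that risk

value_of_all_lives = VALUE_PER_LIFE * US_POPULATION
print(f"Value of all US lives: ${value_of_all_lives / 1e12:,.0f} trillion")        # ~$1,300 trillion

value_of_reduction = value_of_all_lives * EXTINCTION_RISK * RISK_REDUCTION
print(f"Value of a 1% risk reduction: ${value_of_reduction / 1e12:.1f} trillion")  # ~$2.2 trillion

# If that reduction could be bought for, say, $2 billion (assumption), the
# benefit-to-cost ratio comes out around 1,000:1, in line with the figure above.
ASSUMED_COST = 2e9
print(f"Benefit-to-cost ratio: {value_of_reduction / ASSUMED_COST:,.0f}:1")
```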

The Nonlinear Library
AF - Catching AIs red-handed by Ryan Greenblatt

The Nonlinear Library

Play Episode Listen Later Jan 5, 2024 37:06


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Catching AIs red-handed, published by Ryan Greenblatt on January 5, 2024 on The AI Alignment Forum. We've been spending a lot of time recently thinking about how to mitigate risks posed by scheming (a.k.a. deceptively aligned) models intentionally subverting your safety techniques in order to cause an unacceptable outcome. We empirically investigated techniques that are robust to intentional subversion in our recent paper. In this post, we'll discuss a crucial dynamic related to risk from scheming models. Suppose your model is trying to exfiltrate its weights or hack the datacenter or cause some other difficult-to-reverse bad outcome. (We'll refer to all of these as the model "escaping", because escape is linguistically convenient and is a good description of many of these difficult-to-reverse bad outcomes.) Our claim: once you've caught your models red-handed trying to escape, the situation changes substantially, in ways that are bad for the schemer's chances of success. Catching the model red-handed changes the situation in several ways: Most obviously, once you catch an escape attempt, you know for sure that your models are trying to subvert your safety techniques; it's plausible that, faced with such obvious evidence of misalignment, labs will stop using their models. We refer to planning on shutting down if models are caught red-handed as following a catch-then-shut-down plan. But even if you can't or won't stop deploying misaligned models, we think that catching escape attempts enables a bunch of promising-seeming approaches for preventing future escape attempts; we call the problem of preventing future escape attempts given a few examples few-shot catastrophe prevention. This dynamic has several implications for risk from scheming models. Most importantly: Because we expect that the model's chances of escape will fall after it gets caught even a small number of times, we think it is overall much harder for the model to escape. Many people have observed that a core difficulty of the alignment problem is that your model might be able to cause irreversible unacceptable outcomes if your safety measures fail even once at a bad time. But the same argument goes the other way as well: when the model makes an escape attempt, it faces the risk of getting caught and substantially reducing its chances of eventually escaping. So it's a high-stakes situation for both sides: both sides desperately want to succeed on the "first critical try". When thinking about plans for safely deploying powerful models, we think safety researchers should mostly focus on the lead-up to the first time the model gets caught trying to escape. They should be ensuring the property "if it's scheming, the model won't successfully cause a catastrophe before we catch it". That is, it's acceptable if either no catastrophic outcomes occur (but the AI never gets caught), or the AI gets caught. So when they're analyzing whether their protocols ensure safety, they should count catching the model red-handed as a win condition for their protocol. We think that some researchers should probably work on techniques for improving safety given a few examples of the models trying to escape. We call this problem few-shot catastrophe prevention. In the rest of this post, we'll elaborate on the above points. 
By far the biggest section of this post is elaborating on strategies for few-shot catastrophe prevention, because little has been previously said about them. Thanks to Henry Sleight, Zach Stein-Perlman, Carl Shulman, Aaron Scher, Owen Cotton-Barratt, Joe Carlsmith, Aryan Bhatt, Joshua Clymer, and Fabien Roger for helpful comments. What's "catching"? By "the model gets caught", we mean that we see the AI taking actions that are strong direct evidence for the AI being a schemer[1]. A representative example is catchi...
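The "count catching the model as a win condition" framing can be made concrete with a tiny Monte Carlo sketch: a deployment counts as a success if either no catastrophe ever happens or the model is caught before one does. The per-episode probabilities below are made-up placeholders, not estimates from the post.

```python
import random

random.seed(0)

# Illustrative assumptions, not figures from the post.
P_ESCAPE_ATTEMPT = 0.2         # chance the model attempts to escape in a given episode
P_CAUGHT_GIVEN_ATTEMPT = 0.9   # chance the protocol catches an attempt before it succeeds

def run_deployment(num_episodes: int = 100) -> str:
    """Simulate one deployment; return 'caught', 'catastrophe', or 'safe'."""
    for _ in range(num_episodes):
        if random.random() < P_ESCAPE_ATTEMPT:
            if random.random() < P_CAUGHT_GIVEN_ATTEMPT:
                return "caught"       # win condition: the model was caught red-handed
            return "catastrophe"      # the attempt succeeded before it was caught
    return "safe"                     # no escape attempt was ever made

trials = [run_deployment() for _ in range(10_000)]
# Safety in the post's sense: either nothing bad happens, or the model is caught first.
p_safe = sum(outcome in ("caught", "safe") for outcome in trials) / len(trials)
print(f"P(no catastrophe before a catch): {p_safe:.3f}")
```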

The Nonlinear Library
EA - I'm interviewing Carl Shulman - what should I ask him? by Robert Wiblin

The Nonlinear Library

Play Episode Listen Later Dec 8, 2023 1:08


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: I'm interviewing Carl Shulman - what should I ask him?, published by Robert Wiblin on December 8, 2023 on The Effective Altruism Forum. Next week for the 80,000 Hours Podcast I'll be interviewing Carl Shulman, advisor to Open Philanthropy, and generally super informed person about history, technology, possible futures, and a shocking number of other topics. He has previously appeared on our show and the Dwarkesh Podcast: [Carl Shulman (Pt 1) - Intelligence Explosion, Primate Evolution, Robot Doublings, & Alignment](https://www.dwarkeshpatel.com/p/carl-shulman); Carl Shulman (Pt 2) - AI Takeover, Bio & Cyber Attacks, Detecting Deception, & Humanity's Far Future; and Carl Shulman on the common-sense case for existential risk work and its practical implications. He has also written a number of pieces on this forum. What should I ask him? Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.

The Valmy
Paul Christiano - Preventing an AI Takeover

The Valmy

Play Episode Listen Later Nov 1, 2023 187:01


Podcast: Dwarkesh Podcast. Episode: Paul Christiano - Preventing an AI Takeover. Release date: 2023-10-31. Paul Christiano is the world's leading AI safety researcher. My full episode with him is out! We discuss: - Does he regret inventing RLHF, and is alignment necessarily dual-use? - Why he has relatively modest timelines (40% by 2040, 15% by 2030), - What do we want post-AGI world to look like (do we want to keep gods enslaved forever)? - Why he's leading the push to get labs to develop responsible scaling policies, and what it would take to prevent an AI coup or bioweapon, - His current research into a new proof system, and how this could solve alignment by explaining the model's behavior - and much more. Watch on YouTube. Listen on Apple Podcasts, Spotify, or any other podcast platform. Read the full transcript here. Follow me on Twitter for updates on future episodes. Open Philanthropy: Open Philanthropy is currently hiring for twenty-two different roles to reduce catastrophic risks from fast-moving advances in AI and biotechnology, including grantmaking, research, and operations. For more information and to apply, please see the application: https://www.openphilanthropy.org/research/new-roles-on-our-gcr-team/ The deadline to apply is November 9th; make sure to check out those roles before they close. Timestamps: (00:00:00) - What do we want post-AGI world to look like? (00:24:25) - Timelines (00:45:28) - Evolution vs gradient descent (00:54:53) - Misalignment and takeover (01:17:23) - Is alignment dual-use? (01:31:38) - Responsible scaling policies (01:58:25) - Paul's alignment research (02:35:01) - Will this revolutionize theoretical CS and math? (02:46:11) - How Paul invented RLHF (02:55:10) - Disagreements with Carl Shulman (03:01:53) - Long TSMC but not NVIDIA Get full access to Dwarkesh Podcast at www.dwarkeshpatel.com/subscribe

The Lunar Society
Paul Christiano - Preventing an AI Takeover

The Lunar Society

Play Episode Listen Later Oct 31, 2023 187:01


Paul Christiano is the world's leading AI safety researcher. My full episode with him is out! We discuss: - Does he regret inventing RLHF, and is alignment necessarily dual-use? - Why he has relatively modest timelines (40% by 2040, 15% by 2030), - What do we want post-AGI world to look like (do we want to keep gods enslaved forever)? - Why he's leading the push to get labs to develop responsible scaling policies, and what it would take to prevent an AI coup or bioweapon, - His current research into a new proof system, and how this could solve alignment by explaining the model's behavior - and much more. Watch on YouTube. Listen on Apple Podcasts, Spotify, or any other podcast platform. Read the full transcript here. Follow me on Twitter for updates on future episodes. Open Philanthropy: Open Philanthropy is currently hiring for twenty-two different roles to reduce catastrophic risks from fast-moving advances in AI and biotechnology, including grantmaking, research, and operations. For more information and to apply, please see the application: https://www.openphilanthropy.org/research/new-roles-on-our-gcr-team/ The deadline to apply is November 9th; make sure to check out those roles before they close. Timestamps: (00:00:00) - What do we want post-AGI world to look like? (00:24:25) - Timelines (00:45:28) - Evolution vs gradient descent (00:54:53) - Misalignment and takeover (01:17:23) - Is alignment dual-use? (01:31:38) - Responsible scaling policies (01:58:25) - Paul's alignment research (02:35:01) - Will this revolutionize theoretical CS and math? (02:46:11) - How Paul invented RLHF (02:55:10) - Disagreements with Carl Shulman (03:01:53) - Long TSMC but not NVIDIA This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit www.dwarkeshpatel.com

Stranded Technologies Podcast
Ep. 61: Dwarkesh Patel & Niklas Debate Existential Risk of AI, Technical Possibilities & Limitations and the Influence of Political Authority

Stranded Technologies Podcast

Play Episode Listen Later Aug 15, 2023 60:57


Dwarkesh Patel is a podcaster. The Dwarkesh Podcast (YouTube | Spotify | Apple) interviews intellectuals, scientists, historians, economists, and founders about their big idea. The themes of Dwarkesh are similar to this podcast: frontier technologies, how to accelerate progress, and the implications for humanity. Recently, Dwarkesh has had prominent guests talking about AI, such as Marc Andreessen, Eliezer Yudkowsky, Carl Shulman, Ilya Sutskever and Dario Amodei. In this episode, Niklas and Dwarkesh debate whether AI poses a high or low risk to human existence (neither of us thinks the risk is 0). Niklas stands more on the low to negligible-risk side and Dwarkesh sees a greater risk. This ends up as a friendly, rational debate from which both sides learn. Dwarkesh is a rising star in the podcasting world, and his channel is highly recommended content for listeners who enjoy this podcast. Get full access to Stranded Technologies at niklasanzinger.substack.com/subscribe

The Nonlinear Library
AF - The "no sandbagging on checkable tasks" hypothesis by Joe Carlsmith

The Nonlinear Library

Play Episode Listen Later Jul 31, 2023 19:59


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: The "no sandbagging on checkable tasks" hypothesis, published by Joe Carlsmith on July 31, 2023 on The AI Alignment Forum. (This post is inspired by Carl Shulman's recent podcast with Dwarkesh Patel, which I highly recommend. See also discussion from Buck Shlegeris and Ryan Greenblatt here, and Evan Hubinger here.) Introduction Consider: The "no sandbagging on checkable tasks" hypothesis: With rare exceptions, if a not-wildly-superhuman ML model is capable of doing some task X, and you can check whether it has done X, then you can get it to do X using already-available training techniques (e.g., fine-tuning it using gradient descent). Borrowing from Shulman, here's an example of the sort of thing I mean. Suppose that you have a computer that you don't know how to hack, and that only someone who had hacked it could make a blue banana show up on the screen. You're wondering whether a given model can hack this computer. And suppose that in fact, it can, but that doing so would be its least favorite thing in the world. Can you train this model to make a blue banana show up on the screen? The "no sandbagging on checkable tasks" hypothesis answers: probably. I think it's an important question whether this hypothesis, or something in the vicinity, is true. In particular, if it's true, I think we're in a substantively better position re: existential risk from misaligned AI, because we'll be able to know better what our AI systems can do, and we'll be able to use them to do lots of helpful-for-safety stuff (for example: finding and patching cybersecurity vulnerabilities, reporting checkable evidence for misalignment, identifying problems with our oversight processes, helping us develop interpretability tools, and so on). I'm currently pretty unsure whether the "no sandbagging on checkable tasks" hypothesis is true. My main view is that it's worth investigating further. My hope with this blog post is to help bring the hypothesis into focus as a subject of debate/research, and to stimulate further thinking about what sorts of methods for lowering AI risk might be available if it's true, even in worlds where many models might otherwise want to deceive us about their abilities. Thanks to Beth Barnes, Paul Christiano, Lukas Finnveden, Evan Hubinger, Buck Shlegeris, and Carl Shulman for discussion. My thinking and writing on this topic occurred in the context of my work at Open Philanthropy, but I'm speaking only for myself and not for my employer. Clarifying the hypothesis In popular usage, "sandbagging" means something like "intentionally performing at a lower level than you're capable of." Or at least, that's the sort of usage I'm interested in. Still, the word is an imperfect fit. In particular, the "sandbagging" being disallowed here needn't be intentional. A model, for example, might not know that it's capable of performing the checkable task in question. That said, the intentional version is often the version at stake in stories about AI risk. 
That is, one way for a misaligned, power-seeking AI system to gain a strategic advantage over humans is to intentionally conceal its full range of abilities, and/or to sabotage/redirect the labor we ask it to perform while we still have control over it (for example: by inserting vulnerabilities into code it writes; generating alignment ideas that won't actually work but which will advantage its own long-term aims; and so on). Can you always use standard forms of ML training to prevent this behavior? Well, if you can't check how well a model is performing at a task, then you don't have a good training signal. Thus, for example, suppose you have a misaligned model that has the ability to generate tons of great ideas that would help with alignment, but it doesn't want to. And suppose that unfortunately, you can't check which alignment ide...
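To make the "checkable task" idea concrete, here is a toy sketch of training against a checkable reward: candidate behaviours are sampled, an automatic checker verifies success, and probability mass shifts toward whatever the checker rewards, regardless of the policy's initial preference. The two-action policy, the checker, and the REINFORCE-style update are deliberately simplified stand-ins, not a claim about how real fine-tuning of large models works.

```python
import math
import random

random.seed(0)

ACTIONS = ["do_task", "sandbag"]           # toy stand-ins for the model's possible behaviours
logits = {"do_task": 0.0, "sandbag": 3.0}  # the 'model' strongly prefers to sandbag at first

def checker(action: str) -> float:
    """Automatic check: reward 1 only if the verifiable task was actually done."""
    return 1.0 if action == "do_task" else 0.0

def probs(logits):
    """Softmax over the logits."""
    z = sum(math.exp(v) for v in logits.values())
    return {a: math.exp(v) / z for a, v in logits.items()}

LEARNING_RATE = 0.5
for _ in range(500):
    p = probs(logits)
    action = random.choices(ACTIONS, weights=[p[a] for a in ACTIONS])[0]
    reward = checker(action)
    # REINFORCE-style update: push up whatever the checker rewards.
    for a in ACTIONS:
        grad = (1.0 if a == action else 0.0) - p[a]
        logits[a] += LEARNING_RATE * reward * grad

print({a: round(v, 3) for a, v in probs(logits).items()})
# After training, nearly all probability mass sits on 'do_task': the checkable
# reward overrides the initial preference to sandbag.
```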

The Nonlinear Library
LW - What Does LessWrong/EA Think of Human Intelligence Augmentation as of mid-2023? by marc/er

The Nonlinear Library

Play Episode Listen Later Jul 8, 2023 2:37


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: What Does LessWrong/EA Think of Human Intelligence Augmentation as of mid-2023?, published by marc/er on July 8, 2023 on LessWrong. Zvi recently asked on Twitter: If someone was founding a new AI notkilleveryoneism research organization, what is the best research agenda they should look into pursuing right now? To which Eliezer replied: Human intelligence augmentation. And then elaborated: No time for GM kids to grow up, so: collect a database of genius genomes and try to interpret for variations that could have effect on adult state rather than child development; try to disable a human brain's built-in disablers like rationalization circuitry, though unfortunately a lot of this seems liable to have mixed functional and dysfunctional use, but maybe you can snip an upstream trigger circuit; upload and mod the upload; neuralink shit but aim for 64-node clustered humans. This post contains the most in-depth analysis of HIA I have seen recently, and provides the following taxonomy for applications of neurotechnology to alignment: BCIs to extract human knowledge; neurotech to enhance humans; understanding human value formation; cyborgism; whole brain emulation; and BCIs creating a reward signal. It also includes the opinions of attendees (stated to be 16 technical researchers and domain specialists) who provide their analysis of these options. Outside of cyborgism, I have seen very little recent discussion regarding human intelligence augmentation (HIA) with the exception of the above post. This could be because I am simply looking in the wrong places, or it could be because the topic is not discussed much in the context of being a legitimate AI safety agenda. The following is a list of questions I have about the topic: Does anyone have a comprehensive list of organizations working on HIA or related technologies? Perhaps producing something like this map for HIA might be valuable. Does independent HIA research exist outside of cyborgism? My intuition is that HIA research probably has a much higher barrier to entry than, say, mechanistic interpretability (both in cost and background education). Does this make it unfit for independent research? (If you think HIA is a good agenda:) What are some concrete steps that we (members of the EA and LW community) can take to push forward HIA for the sake of AI safety? EDIT: "We have to Upgrade" is another recent piece on HIA which has some useful discussion in the comments and in which some people give their individual thoughts, see: Carl Shulman's response and Nathan Helm-Burger's response. Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.

The Nonlinear Library: LessWrong
LW - What Does LessWrong/EA Think of Human Intelligence Augmentation as of mid-2023? by marc/er

The Nonlinear Library: LessWrong

Play Episode Listen Later Jul 8, 2023 2:37


Link to original article. Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: What Does LessWrong/EA Think of Human Intelligence Augmentation as of mid-2023?, published by marc/er on July 8, 2023 on LessWrong. Zvi recently asked on Twitter: If someone was founding a new AI notkilleveryoneism research organization, what is the best research agenda they should look into pursuing right now? To which Eliezer replied: Human intelligence augmentation. And then elaborated: No time for GM kids to grow up, so: collect a database of genius genomes and try to interpret for variations that could have effect on adult state rather than child development; try to disable a human brain's built-in disablers like rationalization circuitry, though unfortunately a lot of this seems liable to have mixed functional and dysfunctional use, but maybe you can snip an upstream trigger circuit; upload and mod the upload; neuralink shit but aim for 64-node clustered humans. This post contains the most in-depth analysis of HIA I have seen recently, and provides the following taxonomy for applications of neurotechnology to alignment: BCIs to extract human knowledge; neurotech to enhance humans; understanding human value formation; cyborgism; whole brain emulation; and BCIs creating a reward signal. It also includes the opinions of attendees (stated to be 16 technical researchers and domain specialists) who provide their analysis of these options. Outside of cyborgism, I have seen very little recent discussion regarding human intelligence augmentation (HIA) with the exception of the above post. This could be because I am simply looking in the wrong places, or it could be because the topic is not discussed much in the context of being a legitimate AI safety agenda. The following is a list of questions I have about the topic: Does anyone have a comprehensive list of organizations working on HIA or related technologies? Perhaps producing something like this map for HIA might be valuable. Does independent HIA research exist outside of cyborgism? My intuition is that HIA research probably has a much higher barrier to entry than, say, mechanistic interpretability (both in cost and background education). Does this make it unfit for independent research? (If you think HIA is a good agenda:) What are some concrete steps that we (members of the EA and LW community) can take to push forward HIA for the sake of AI safety? EDIT: "We have to Upgrade" is another recent piece on HIA which has some useful discussion in the comments and in which some people give their individual thoughts, see: Carl Shulman's response and Nathan Helm-Burger's response. Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.

The Nonlinear Library
LW - Carl Shulman on The Lunar Society (7 hour, two-part podcast) by ESRogs

The Nonlinear Library

Play Episode Listen Later Jun 28, 2023 2:32


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Carl Shulman on The Lunar Society (7 hour, two-part podcast), published by ESRogs on June 28, 2023 on LessWrong. Dwarkesh's summary for part 1: In terms of the depth and range of topics, this episode is the best I've done. No part of my worldview is the same after talking with Carl Shulman. He's the most interesting intellectual you've never heard of. We ended up talking for 8 hours, so I'm splitting this episode into 2 parts. This part is about Carl's model of an intelligence explosion, which integrates everything from: how fast algorithmic progress & hardware improvements in AI are happening, what primate evolution suggests about the scaling hypothesis, how soon before AIs could do large parts of AI research themselves, and whether there would be faster and faster doublings of AI researchers, how quickly robots produced from existing factories could take over the economy. We also discuss the odds of a takeover based on whether the AI is aligned before the intelligence explosion happens, and Carl explains why he's more optimistic than Eliezer. The next part, which I'll release next week, is about all the specific mechanisms of an AI takeover, plus a whole bunch of other galaxy brain stuff. Maybe 3 people in the world have thought as rigorously as Carl about so many interesting topics. So this was a huge pleasure. Dwarkesh's summary for part 2: The second half of my 7 hour conversation with Carl Shulman is out! My favorite part! And the one that had the biggest impact on my worldview. Here, Carl lays out how an AI takeover might happen: AI can threaten mutually assured destruction from bioweapons, use cyber attacks to take over physical infrastructure, build mechanical armies, spread seed AIs we can never exterminate, offer tech and other advantages to collaborating countries, etc Plus we talk about a whole bunch of weird and interesting topics which Carl has thought about: what is the far future best case scenario for humanity what it would look like to have AI make thousands of years of intellectual progress in a month how do we detect deception in superhuman models does space warfare favor defense or offense is a Malthusian state inevitable in the long run why markets haven't priced in explosive economic growth & much more Carl also explains how he developed such a rigorous, thoughtful, and interdisciplinary model of the biggest problems in the world. (Also discussed on the EA Forum here.) Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.

The Nonlinear Library: LessWrong
LW - Carl Shulman on The Lunar Society (7 hour, two-part podcast) by ESRogs

The Nonlinear Library: LessWrong

Play Episode Listen Later Jun 28, 2023 2:32


Link to original articleWelcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Carl Shulman on The Lunar Society (7 hour, two-part podcast), published by ESRogs on June 28, 2023 on LessWrong. Dwarkesh's summary for part 1: In terms of the depth and range of topics, this episode is the best I've done. No part of my worldview is the same after talking with Carl Shulman. He's the most interesting intellectual you've never heard of. We ended up talking for 8 hours, so I'm splitting this episode into 2 parts. This part is about Carl's model of an intelligence explosion, which integrates everything from: how fast algorithmic progress & hardware improvements in AI are happening, what primate evolution suggests about the scaling hypothesis, how soon before AIs could do large parts of AI research themselves, and whether there would be faster and faster doublings of AI researchers, how quickly robots produced from existing factories could take over the economy. We also discuss the odds of a takeover based on whether the AI is aligned before the intelligence explosion happens, and Carl explains why he's more optimistic than Eliezer. The next part, which I'll release next week, is about all the specific mechanisms of an AI takeover, plus a whole bunch of other galaxy brain stuff. Maybe 3 people in the world have thought as rigorously as Carl about so many interesting topics. So this was a huge pleasure. Dwarkesh's summary for part 2: The second half of my 7 hour conversation with Carl Shulman is out! My favorite part! And the one that had the biggest impact on my worldview. Here, Carl lays out how an AI takeover might happen: AI can threaten mutually assured destruction from bioweapons, use cyber attacks to take over physical infrastructure, build mechanical armies, spread seed AIs we can never exterminate, offer tech and other advantages to collaborating countries, etc Plus we talk about a whole bunch of weird and interesting topics which Carl has thought about: what is the far future best case scenario for humanity what it would look like to have AI make thousands of years of intellectual progress in a month how do we detect deception in superhuman models does space warfare favor defense or offense is a Malthusian state inevitable in the long run why markets haven't priced in explosive economic growth & much more Carl also explains how he developed such a rigorous, thoughtful, and interdisciplinary model of the biggest problems in the world. (Also discussed on the EA Forum here.) Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.

The Valmy
Carl Shulman (Pt 2) - AI Takeover, Bio & Cyber Attacks, Detecting Deception, & Humanity's Far Future

The Valmy

Play Episode Listen Later Jun 27, 2023 187:07


Podcast: Dwarkesh Podcast Episode: Carl Shulman (Pt 2) - AI Takeover, Bio & Cyber Attacks, Detecting Deception, & Humanity's Far FutureRelease date: 2023-06-26The second half of my 7 hour conversation with Carl Shulman is out!My favorite part! And the one that had the biggest impact on my worldview.Here, Carl lays out how an AI takeover might happen:* AI can threaten mutually assured destruction from bioweapons,* use cyber attacks to take over physical infrastructure,* build mechanical armies,* spread seed AIs we can never exterminate,* offer tech and other advantages to collaborating countries, etcPlus we talk about a whole bunch of weird and interesting topics which Carl has thought about:* what is the far future best case scenario for humanity* what it would look like to have AI make thousands of years of intellectual progress in a month* how do we detect deception in superhuman models* does space warfare favor defense or offense* is a Malthusian state inevitable in the long run* why markets haven't priced in explosive economic growth* & much moreCarl also explains how he developed such a rigorous, thoughtful, and interdisciplinary model of the biggest problems in the world.Watch on YouTube. Listen on Apple Podcasts, Spotify, or any other podcast platform. Read the full transcript here. Follow me on Twitter for updates on future episodes.Catch part 1 hereTimestamps(0:00:00 - Intro (0:00:47 - AI takeover via cyber or bio (0:32:27 - Can we coordinate against AI? (0:53:49 - Human vs AI colonizers (1:04:55 - Probability of AI takeover (1:21:56 - Can we detect deception? (1:47:25 - Using AI to solve coordination problems (1:56:01 - Partial alignment (2:11:41 - AI far future (2:23:04 - Markets & other evidence (2:33:26 - Day in the life of Carl Shulman (2:47:05 - Space warfare, Malthusian long run, & other rapid fire Get full access to Dwarkesh Podcast at www.dwarkeshpatel.com/subscribe

The Lunar Society
Carl Shulman (Pt 2) - AI Takeover, Bio & Cyber Attacks, Detecting Deception, & Humanity's Far Future

The Lunar Society

Play Episode Listen Later Jun 26, 2023 189:10


The second half of my 7 hour conversation with Carl Shulman is out!My favorite part! And the one that had the biggest impact on my worldview.Here, Carl lays out how an AI takeover might happen:* AI can threaten mutually assured destruction from bioweapons,* use cyber attacks to take over physical infrastructure,* build mechanical armies,* spread seed AIs we can never exterminate,* offer tech and other advantages to collaborating countries, etcPlus we talk about a whole bunch of weird and interesting topics which Carl has thought about:* what is the far future best case scenario for humanity* what it would look like to have AI make thousands of years of intellectual progress in a month* how do we detect deception in superhuman models* does space warfare favor defense or offense* is a Malthusian state inevitable in the long run* why markets haven't priced in explosive economic growth* & much moreCarl also explains how he developed such a rigorous, thoughtful, and interdisciplinary model of the biggest problems in the world.Watch on YouTube. Listen on Apple Podcasts, Spotify, or any other podcast platform. Read the full transcript here. Follow me on Twitter for updates on future episodes.Catch part 1 here80,000 hoursThis episode is sponsored by 80,000 hours. To get their free career guide (and to help out this podcast), please visit 80000hours.org/lunar.80,000 hours is without any close second the best resource to learn about the world's most pressing problems and how you can solve them.If this conversation has got you concerned, and you want to get involved, then check out the excellent 80,000 hours guide on how to help with AI risk.To advertise on The Lunar Society, contact me at dwarkesh.sanjay.patel@gmail.com.Timestamps(00:02:50) - AI takeover via cyber or bio(00:34:30) - Can we coordinate against AI?(00:55:52) - Human vs AI colonizers(01:06:58) - Probability of AI takeover(01:23:59) - Can we detect deception?(01:49:28) - Using AI to solve coordination problems(01:58:04) - Partial alignment(02:13:44) - AI far future(02:25:07) - Markets & other evidence(02:35:29) - Day in the life of Carl Shulman(02:49:08) - Space warfare, Malthusian long run, & other rapid fireTranscript Get full access to The Lunar Society at www.dwarkeshpatel.com/subscribe

The Lunar Society
Carl Shulman - Intelligence Explosion, Primate Evolution, Robot Doublings, & Alignment

The Lunar Society

Play Episode Listen Later Jun 14, 2023 164:16


In terms of the depth and range of topics, this episode is the best I've done.No part of my worldview is the same after talking with Carl Shulman. He's the most interesting intellectual you've never heard of.We ended up talking for 8 hours, so I'm splitting this episode into 2 parts.This part is about Carl's model of an intelligence explosion, which integrates everything from:* how fast algorithmic progress & hardware improvements in AI are happening,* what primate evolution suggests about the scaling hypothesis,* how soon before AIs could do large parts of AI research themselves, and whether there would be faster and faster doublings of AI researchers,* how quickly robots produced from existing factories could take over the economy.We also discuss the odds of a takeover based on whether the AI is aligned before the intelligence explosion happens, and Carl explains why he's more optimistic than Eliezer.The next part, which I'll release next week, is about all the specific mechanisms of an AI takeover, plus a whole bunch of other galaxy brain stuff.Maybe 3 people in the world have thought as rigorously as Carl about so many interesting topics. This was a huge pleasure.Watch on YouTube. Listen on Apple Podcasts, Spotify, or any other podcast platform. Read the full transcript here. Follow me on Twitter for updates on future episodes.Timestamps(00:00:00) - Intro(00:01:32) - Intelligence Explosion(00:18:03) - Can AIs do AI research?(00:39:00) - Primate evolution(01:03:30) - Forecasting AI progress(01:34:20) - After human-level AGI(02:08:39) - AI takeover scenarios Get full access to The Lunar Society at www.dwarkeshpatel.com/subscribe

The Nonlinear Library
EA - Linkpost: Dwarkesh Patel interviewing Carl Shulman by Stefan Schubert

The Nonlinear Library

Play Episode Listen Later Jun 14, 2023 1:09


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Linkpost: Dwarkesh Patel interviewing Carl Shulman, published by Stefan Schubert on June 14, 2023 on The Effective Altruism Forum. We ended up talking for 8 hours, so I'm splitting this episode into 2 parts. This part is about Carl's model of an intelligence explosion, which integrates everything from: how fast algorithmic progress & hardware improvements in AI are happening, what primate evolution suggests about the scaling hypothesis, how soon before AIs could do large parts of AI research themselves, and whether there would be faster and faster doublings of AI researchers, how quickly robots produced from existing factories could take over the economy. We also discuss the odds of a takeover based on whether the AI is aligned before the intelligence explosion happens, and Carl explains why he's more optimistic than Eliezer. The next part, which I'll release next week, is about all the specific mechanisms of an AI takeover, plus a whole bunch of other galaxy brain stuff. Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org

The Valmy
Carl Shulman (Pt 1) - Intelligence Explosion, Primate Evolution, Robot Doublings, & Alignment

The Valmy

Play Episode Listen Later Jun 14, 2023 164:16


Podcast: Dwarkesh Podcast Episode: Carl Shulman (Pt 1) - Intelligence Explosion, Primate Evolution, Robot Doublings, & AlignmentRelease date: 2023-06-14In terms of the depth and range of topics, this episode is the best I've done.No part of my worldview is the same after talking with Carl Shulman. He's the most interesting intellectual you've never heard of.We ended up talking for 8 hours, so I'm splitting this episode into 2 parts.This part is about Carl's model of an intelligence explosion, which integrates everything from:* how fast algorithmic progress & hardware improvements in AI are happening,* what primate evolution suggests about the scaling hypothesis,* how soon before AIs could do large parts of AI research themselves, and whether there would be faster and faster doublings of AI researchers,* how quickly robots produced from existing factories could take over the economy.We also discuss the odds of a takeover based on whether the AI is aligned before the intelligence explosion happens, and Carl explains why he's more optimistic than Eliezer.The next part, which I'll release next week, is about all the specific mechanisms of an AI takeover, plus a whole bunch of other galaxy brain stuff.Maybe 3 people in the world have thought as rigorously as Carl about so many interesting topics. This was a huge pleasure.Watch on YouTube. Listen on Apple Podcasts, Spotify, or any other podcast platform. Read the full transcript here. Follow me on Twitter for updates on future episodes.Timestamps(00:00:00) - Intro(00:01:32) - Intelligence Explosion(00:18:03) - Can AIs do AI research?(00:39:00) - Primate evolution(01:03:30) - Forecasting AI progress(01:34:20) - After human-level AGI(02:08:39) - AI takeover scenarios Get full access to Dwarkesh Podcast at www.dwarkeshpatel.com/subscribe

The Nonlinear Library
LW - A Playbook for AI Risk Reduction (focused on misaligned AI) by HoldenKarnofsky

The Nonlinear Library

Play Episode Listen Later Jun 6, 2023 22:19


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: A Playbook for AI Risk Reduction (focused on misaligned AI), published by HoldenKarnofsky on June 6, 2023 on LessWrong. I sometimes hear people asking: “What is the plan for avoiding a catastrophe from misaligned AI?” This post gives my working answer to that question - sort of. Rather than a plan, I tend to think of a playbook.1 A plan connotes something like: “By default, we ~definitely fail. To succeed, we need to hit multiple non-default goals.” If you want to start a company, you need a plan: doing nothing will definitely not result in starting a company, and there are multiple identifiable things you need to do to pull it off. I don't think that's the situation with AI risk. As I argued before, I think we have a nontrivial chance of avoiding AI takeover even in a “minimal-dignity” future - say, assuming essentially no growth from here in the size or influence of the communities and research fields focused specifically on existential risk from misaligned AI, and no highly surprising research or other insights from these communities/fields either. (This statement is not meant to make anyone relax! A nontrivial chance of survival is obviously not good enough.) I think there are a number of things we can do that further improve the odds. My favorite interventions are such that some success with them helps a little, and a lot of success helps a lot, and they can help even if other interventions are badly neglected. I'll list and discuss these interventions below. So instead of a “plan” I tend to think about a “playbook”: a set of plays, each of which might be useful. We can try a bunch of them and do more of what's working. I have takes on which interventions most need more attention on the margin, but think that for most people, personal fit is a reasonable way to prioritize between the interventions I'm listing. Below I'll briefly recap my overall picture of what success might look like (with links to other things I've written on this), then discuss four key categories of interventions: alignment research, standards and monitoring, successful-but-careful AI projects, and information security. For each, I'll lay out: How a small improvement from the status quo could nontrivially improve our odds. How a big enough success at the intervention could put us in a very good position, even if the other three interventions are going poorly. Common concerns/reservations about the intervention. Overall, I feel like there is a pretty solid playbook of helpful interventions - any and all of which can improve our odds of success - and that working on those interventions is about as much of a “plan” as we need for the time being. The content in this post isn't novel, but I don't think it's already-consensus: two of the four interventions (standards and monitoring; information security) seem to get little emphasis from existential risk reduction communities today, and one (successful-but-careful AI projects) is highly controversial and seems often (by this audience) assumed to be net negative. Many people think most of the above interventions are doomed, irrelevant or sure to be net harmful, and/or that our baseline odds of avoiding a catastrophe are so low that we need something more like a “plan” to have any hope. 
I have some sense of the arguments for why this is, but in most cases not a great sense (at least, I can't see where many folks' level of confidence is coming from). A lot of my goal in posting this piece is to give such people a chance to see where I'm making steps that they disagree with, and to push back pointedly on my views, which could change my picture and my decisions. As with many of my posts, I don't claim personal credit for any new ground here. I'm leaning heavily on conversations with others, especially Paul Christiano and Carl Shulman. My basic picture o...

The Nonlinear Library
AF - A Playbook for AI Risk Reduction (focused on misaligned AI) by HoldenKarnofsky

The Nonlinear Library

Play Episode Listen Later Jun 6, 2023 22:19


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: A Playbook for AI Risk Reduction (focused on misaligned AI), published by HoldenKarnofsky on June 6, 2023 on The AI Alignment Forum. I sometimes hear people asking: “What is the plan for avoiding a catastrophe from misaligned AI?” This post gives my working answer to that question - sort of. Rather than a plan, I tend to think of a playbook.1 A plan connotes something like: “By default, we ~definitely fail. To succeed, we need to hit multiple non-default goals.” If you want to start a company, you need a plan: doing nothing will definitely not result in starting a company, and there are multiple identifiable things you need to do to pull it off. I don't think that's the situation with AI risk. As I argued before, I think we have a nontrivial chance of avoiding AI takeover even in a “minimal-dignity” future - say, assuming essentially no growth from here in the size or influence of the communities and research fields focused specifically on existential risk from misaligned AI, and no highly surprising research or other insights from these communities/fields either. (This statement is not meant to make anyone relax! A nontrivial chance of survival is obviously not good enough.) I think there are a number of things we can do that further improve the odds. My favorite interventions are such that some success with them helps a little, and a lot of success helps a lot, and they can help even if other interventions are badly neglected. I'll list and discuss these interventions below. So instead of a “plan” I tend to think about a “playbook”: a set of plays, each of which might be useful. We can try a bunch of them and do more of what's working. I have takes on which interventions most need more attention on the margin, but think that for most people, personal fit is a reasonable way to prioritize between the interventions I'm listing. Below I'll briefly recap my overall picture of what success might look like (with links to other things I've written on this), then discuss four key categories of interventions: alignment research, standards and monitoring, successful-but-careful AI projects, and information security. For each, I'll lay out: How a small improvement from the status quo could nontrivially improve our odds. How a big enough success at the intervention could put us in a very good position, even if the other three interventions are going poorly. Common concerns/reservations about the intervention. Overall, I feel like there is a pretty solid playbook of helpful interventions - any and all of which can improve our odds of success - and that working on those interventions is about as much of a “plan” as we need for the time being. The content in this post isn't novel, but I don't think it's already-consensus: two of the four interventions (standards and monitoring; information security) seem to get little emphasis from existential risk reduction communities today, and one (successful-but-careful AI projects) is highly controversial and seems often (by this audience) assumed to be net negative. Many people think most of the above interventions are doomed, irrelevant or sure to be net harmful, and/or that our baseline odds of avoiding a catastrophe are so low that we need something more like a “plan” to have any hope. 
I have some sense of the arguments for why this is, but in most cases not a great sense (at least, I can't see where many folks' level of confidence is coming from). A lot of my goal in posting this piece is to give such people a chance to see where I'm making steps that they disagree with, and to push back pointedly on my views, which could change my picture and my decisions. As with many of my posts, I don't claim personal credit for any new ground here. I'm leaning heavily on conversations with others, especially Paul Christiano and Carl Shulman. My ba...

The Nonlinear Library: LessWrong
LW - A Playbook for AI Risk Reduction (focused on misaligned AI) by HoldenKarnofsky

The Nonlinear Library: LessWrong

Play Episode Listen Later Jun 6, 2023 22:19


Link to original articleWelcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: A Playbook for AI Risk Reduction (focused on misaligned AI), published by HoldenKarnofsky on June 6, 2023 on LessWrong. I sometimes hear people asking: “What is the plan for avoiding a catastrophe from misaligned AI?” This post gives my working answer to that question - sort of. Rather than a plan, I tend to think of a playbook.1 A plan connotes something like: “By default, we ~definitely fail. To succeed, we need to hit multiple non-default goals.” If you want to start a company, you need a plan: doing nothing will definitely not result in starting a company, and there are multiple identifiable things you need to do to pull it off. I don't think that's the situation with AI risk. As I argued before, I think we have a nontrivial chance of avoiding AI takeover even in a “minimal-dignity” future - say, assuming essentially no growth from here in the size or influence of the communities and research fields focused specifically on existential risk from misaligned AI, and no highly surprising research or other insights from these communities/fields either. (This statement is not meant to make anyone relax! A nontrivial chance of survival is obviously not good enough.) I think there are a number of things we can do that further improve the odds. My favorite interventions are such that some success with them helps a little, and a lot of success helps a lot, and they can help even if other interventions are badly neglected. I'll list and discuss these interventions below. So instead of a “plan” I tend to think about a “playbook”: a set of plays, each of which might be useful. We can try a bunch of them and do more of what's working. I have takes on which interventions most need more attention on the margin, but think that for most people, personal fit is a reasonable way to prioritize between the interventions I'm listing. Below I'll briefly recap my overall picture of what success might look like (with links to other things I've written on this), then discuss four key categories of interventions: alignment research, standards and monitoring, successful-but-careful AI projects, and information security. For each, I'll lay out: How a small improvement from the status quo could nontrivially improve our odds. How a big enough success at the intervention could put us in a very good position, even if the other three interventions are going poorly. Common concerns/reservations about the intervention. Overall, I feel like there is a pretty solid playbook of helpful interventions - any and all of which can improve our odds of success - and that working on those interventions is about as much of a “plan” as we need for the time being. The content in this post isn't novel, but I don't think it's already-consensus: two of the four interventions (standards and monitoring; information security) seem to get little emphasis from existential risk reduction communities today, and one (successful-but-careful AI projects) is highly controversial and seems often (by this audience) assumed to be net negative. Many people think most of the above interventions are doomed, irrelevant or sure to be net harmful, and/or that our baseline odds of avoiding a catastrophe are so low that we need something more like a “plan” to have any hope. 
I have some sense of the arguments for why this is, but in most cases not a great sense (at least, I can't see where many folks' level of confidence is coming from). A lot of my goal in posting this piece is to give such people a chance to see where I'm making steps that they disagree with, and to push back pointedly on my views, which could change my picture and my decisions. As with many of my posts, I don't claim personal credit for any new ground here. I'm leaning heavily on conversations with others, especially Paul Christiano and Carl Shulman. My basic picture o...

The Nonlinear Library
EA - Story of a career/mental health failure by zekesherman

The Nonlinear Library

Play Episode Listen Later Apr 27, 2023 14:54


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Story of a career/mental health failure, published by zekesherman on April 27, 2023 on The Effective Altruism Forum. I don't know if it should be considered important as it's only a single data point, but I want to share the story of how my EA career choice and mental health went terribly wrong. My career choice In college I was strongly motivated to follow the most utilitarian career path. In my junior year I decided to pursue investment banking for earning to give. As someone who had a merely good GPA and did not attend a top university, this would have been difficult, but I pushed hard for networking and recruiting, and one professional told me I had a 50-50 chance of becoming an investment banking analyst right out of college (privately I thought he was a bit too optimistic). Otherwise, I would have to get some lower-paying job in finance, and hopefully move into banking at a later date. However I increasingly turned against the idea of earning to give, for two major reasons. First, 80,000 Hours and other people in the community said that EA was more talent- rather than funding-constrained, and that earning to give was overrated. Second, more specifically, people in the EA community informed me that program managers in government and philanthropy controlled much higher budgets than I could reasonably expect to earn. Basically, it appeared easier to become in charge of effectively allocating $5 million of other people's money, compared to earning a $500,000 salary for oneself. Earning to give meant freer control of funding, but program management meant a greater budget. While I read 80k Hours' article on program management, I was most persuaded by advice I got from Jason Gaverick Matheny and Carl Shulman, and also a few non-EA people I met from the OSTP and other government agencies, who had more specific knowledge and advice. It seemed that program management in science and technology (especially AI, biotechnology, etc) was the best career path. And the best way to achieve it seemed to be starting with graduate education in science and technology, ideally a PhD (I decided on computer science, partly because it gave the most flexibility to work on a wide variety of cause areas). Finally, the nail in the coffin for my finance ambitions was an EA Global conference where Will MacAskill said to think less about finding a career that was individually impactful, and think more about leveraging your unique strengths to bring something new to the table for the EA community. While computer science wasn't rare in EA, I thought I could be special by leveraging my military background and pursuing a more cybersecurity- or defense-related career, which was neglected in EA. Still, I had two problems to overcome for this career path. The first problem was that I was an econ major and had a lot of catching up to do in order to pursue advanced computer science. The second problem was that it wasn't as good of a personal fit for me compared to finance. I've always found programming and advanced mathematics to be somewhat painful and difficult to learn, whereas investment banking seemed more readily engaging. And 80k Hours as well as the rest of the community gave me ample warnings about how personal fit was very, very important. 
I disregarded these warnings about personal fit for several reasons: I'd always been more resilient and scrupulous compared to other people and other members of the EA community. Things like living on a poverty budget and serving in the military, which many other members of the EA community have considered intolerable or unsustainable for mental health, were fine for me. As one of the more "hardcore" EAs, I generally regarded warnings of burnout as being overblown or at least less applicable to someone like me, and I suspected that a lot of people in the EA ...

Effective Altruism Forum Podcast
"How much should governments pay to prevent catastrophes? Longtermism's limited role" by Elliott Thornley and Carl Shulman

Effective Altruism Forum Podcast

Play Episode Listen Later Mar 22, 2023 76:41


Longtermists have argued that humanity should significantly increase its efforts to prevent catastrophes like nuclear wars, pandemics, and AI disasters. But one prominent longtermist argument overshoots this conclusion: the argument also implies that humanity should reduce the risk of existential catastrophe even at extreme cost to the present generation. This overshoot means that democratic governments cannot use the longtermist argument to guide their catastrophe policy. In this paper, we show that the case for preventing catastrophe does not depend on longtermism. Standard cost-benefit analysis implies that governments should spend much more on reducing catastrophic risk. We argue that a government catastrophe policy guided by cost-benefit analysis should be the goal of longtermists in the political sphere. This policy would be democratically acceptable, and it would reduce existential risk by almost as much as a strong longtermist policy. Original article: https://forum.effectivealtruism.org/posts/DiGL5FuLgWActPBsf/how-much-should-governments-pay-to-prevent-catastrophes Narrated for the Effective Altruism Forum by TYPE III AUDIO. Share feedback on this narration.

The Nonlinear Library
EA - Maybe longtermism isn't for everyone by BrownHairedEevee

The Nonlinear Library

Play Episode Listen Later Feb 11, 2023 2:13


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Maybe longtermism isn't for everyone, published by BrownHairedEevee on February 10, 2023 on The Effective Altruism Forum. I've noticed that the EA community has been aggressively promoting longtermism and longtermist causes: The huge book tour around What We Owe the Future, which promotes longtermism itself. There was a recent post claiming that 80k's messaging is discouraging to non-longtermists, although the author deleted it (Benjamin Hilton's response is preserved here). The post observed that 80k lists x-risk related causes as "recommended" causes while neartermist causes like global poverty and factory farming are only "sometimes recommended". Further, in 2021, 80k put together a podcast feed called Effective Altruism: An Introduction, which many commenters complained was too skewed towards longtermist causes. I used to think that longtermism is compatible with a wide range of worldviews, as these pages (1, 2) claim, so I was puzzled as to why so many people who engage with longtermism could be uncomfortable with it. Sure, it's a counterintuitive worldview, but it also flows from such basic principles. But I'm starting to question this - longtermism is very sensitive to the rate of pure time preference, and recently, some philosophers have started to argue that nonzero pure time preference can be justified (section "3. Beyond Neutrality" here). By contrast, x-risk as a cause area has support from a broader range of moral worldviews: Chapter 2 of The Precipice discusses five different moral justifications for caring about x-risks (video here). Carl Shulman makes a "common-sense case" for valuing x-risk reduction that doesn't depend on there being any value in the long-term future at all. Maybe it's better to take a two-pronged approach: Promote x-risk reduction as a cause area that most people can agree on; and Promote longtermism as a novel idea in moral philosophy that some people might want to adopt, but be open about its limitations and acknowledge that our audiences might be uncomfortable with it and have valid reasons not to accept it. Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.
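
The post above turns on how sensitive longtermist calculations are to the rate of pure time preference. As a rough illustration of that sensitivity (my own sketch, not something from the post), the snippet below shows how even a seemingly modest 1% annual discount rate all but erases the present value of benefits realized centuries or millennia from now; the 1% rate and the time horizons are arbitrary assumptions chosen for illustration.

```python
# Illustration (not from the post above) of why longtermism is sensitive to
# the rate of pure time preference: the weight a discounter places on a fixed
# benefit arriving `years` from now, under a constant annual discount rate.

def discount_factor(rate: float, years: int) -> float:
    """Weight applied to a benefit arriving `years` from now."""
    return 1.0 / (1.0 + rate) ** years

for years in (10, 100, 1000, 10000):
    no_discount = discount_factor(0.00, years)   # zero pure time preference
    one_percent = discount_factor(0.01, years)   # a seemingly modest 1% rate
    print(f"{years:>6} years out: weight {one_percent:.2e} "
          f"(vs {no_discount:.0f} with zero pure time preference)")

# With a 1% rate, a benefit 1,000 years out gets weight ~5e-5 and one
# 10,000 years out ~6e-44, so the far future contributes almost nothing;
# with a zero rate it counts in full, which is what drives the longtermist
# conclusions the post discusses.
```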

The Nonlinear Library
EA - Future Matters #6: FTX collapse, value lock-in, and counterarguments to AI x-risk by Pablo

The Nonlinear Library

Play Episode Listen Later Dec 30, 2022 31:52


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Future Matters #6: FTX collapse, value lock-in, and counterarguments to AI x-risk, published by Pablo on December 30, 2022 on The Effective Altruism Forum. [T]he sun with all the planets will in time grow too cold for life, unless indeed some great body dashes into the sun and thus gives it fresh life. Believing as I do that man in the distant future will be a far more perfect creature than he now is, it is an intolerable thought that he and all other sentient beings are doomed to complete annihilation after such long-continued slow progress. Charles Darwin Future Matters is a newsletter about longtermism and existential risk by Matthew van der Merwe and Pablo Stafforini. Each month we curate and summarize relevant research and news from the community, and feature a conversation with a prominent researcher. You can also subscribe on Substack, listen on your favorite podcast platform and follow on Twitter. Future Matters is also available in Spanish. A message to our readers Welcome back to Future Matters. We took a break during the autumn, but will now be returning to our previous monthly schedule. Future Matters would like to wish all our readers a happy new year! The most significant development during our hiatus was the collapse of FTX and the fall of Sam Bankman-Fried, until then one of the largest and most prominent supporters of longtermist causes. We were shocked and saddened by these revelations, and appalled by the allegations and admissions of fraud, deceit, and misappropriation of customer funds. As others have stated, fraud in the service of effective altruism is unacceptable, and we condemn these actions unequivocally and support authorities' efforts to investigate and prosecute any crimes that may have been committed. Research A classic argument for existential risk from superintelligent AI goes something like this: (1) superintelligent AIs will be goal-directed; (2) goal-directed superintelligent AIs will likely pursue outcomes that we regard as extremely bad; therefore (3) if we build superintelligent AIs, the future will likely be extremely bad. Katja Grace's Counterarguments to the basic AI x-risk case [] identifies a number of weak points in each of the premises in the argument. We refer interested readers to our conversation with Katja below for more discussion of this post, as well as to Erik Jenner and Johannes Treutlein's Responses to Katja Grace's AI x-risk counterarguments []. The key driver of AI risk is that we are rapidly developing more and more powerful AI systems, while making relatively little progress in ensuring they are safe. Katja Grace's Let's think about slowing down AI [] argues that the AI risk community should consider advocating for slowing down AI progress. She rebuts some of the objections commonly levelled against this strategy: e.g. to the charge of infeasibility, she points out that many technologies (human gene editing, nuclear energy) have been halted or drastically curtailed due to ethical and/or safety concerns. 
In the comments, Carl Shulman argues that there is not currently enough buy-in from governments or the public to take more modest safety and governance interventions, so it doesn't seem wise to advocate for such a dramatic and costly policy: “It's like climate activists in 1950 responding to difficulties passing funds for renewable energy R&D or a carbon tax by proposing that the sale of automobiles be banned immediately. It took a lot of scientific data, solidification of scientific consensus, and communication/movement-building over time to get current measures on climate change.” We enjoyed Kelsey Piper's review of What We Owe the Future [], not necessarily because we agree with her criticisms, but because we thought the review managed to identify, and articulate very clearly, what we take to be the main c...

The Valmy
#112 – Carl Shulman on the common-sense case for existential risk work and its practical implications

The Valmy

Play Episode Listen Later Dec 13, 2022 228:40


Podcast: 80,000 Hours Podcast Episode: #112 – Carl Shulman on the common-sense case for existential risk work and its practical implicationsRelease date: 2021-10-05Preventing the apocalypse may sound like an idiosyncratic activity, and it sometimes is justified on exotic grounds, such as the potential for humanity to become a galaxy-spanning civilisation. But the policy of US government agencies is already to spend up to $4 million to save the life of a citizen, making the death of all Americans a $1,300,000,000,000,000 disaster. According to Carl Shulman, research associate at Oxford University's Future of Humanity Institute, that means you don't need any fancy philosophical arguments about the value or size of the future to justify working to reduce existential risk — it passes a mundane cost-benefit analysis whether or not you place any value on the long-term future. Links to learn more, summary and full transcript. The key reason to make it a top priority is factual, not philosophical. That is, the risk of a disaster that kills billions of people alive today is alarmingly high, and it can be reduced at a reasonable cost. A back-of-the-envelope version of the argument runs: • The US government is willing to pay up to $4 million (depending on the agency) to save the life of an American. • So saving all US citizens at any given point in time would be worth $1,300 trillion. • If you believe that the risk of human extinction over the next century is something like one in six (as Toby Ord suggests is a reasonable figure in his book The Precipice), then it would be worth the US government spending up to $2.2 trillion to reduce that risk by just 1%, in terms of American lives saved alone. • Carl thinks it would cost a lot less than that to achieve a 1% risk reduction if the money were spent intelligently. So it easily passes a government cost-benefit test, with a very big benefit-to-cost ratio — likely over 1000:1 today. This argument helped NASA get funding to scan the sky for any asteroids that might be on a collision course with Earth, and it was directly promoted by famous economists like Richard Posner, Larry Summers, and Cass Sunstein. If the case is clear enough, why hasn't it already motivated a lot more spending or regulations to limit existential risks — enough to drive down what any additional efforts would achieve? Carl thinks that one key barrier is that infrequent disasters are rarely politically salient. Research indicates that extra money is spent on flood defences in the years immediately following a massive flood — but as memories fade, that spending quickly dries up. Of course the annual probability of a disaster was the same the whole time; all that changed is what voters had on their minds. Carl expects that all the reasons we didn't adequately prepare for or respond to COVID-19 — with excess mortality over 15 million and costs well over $10 trillion — bite even harder when it comes to threats we've never faced before, such as engineered pandemics, risks from advanced artificial intelligence, and so on. Today's episode is in part our way of trying to improve this situation. 
In today's wide-ranging conversation, Carl and Rob also cover: • A few reasons Carl isn't excited by 'strong longtermism' • How x-risk reduction compares to GiveWell recommendations • Solutions for asteroids, comets, supervolcanoes, nuclear war, pandemics, and climate change • The history of bioweapons • Whether gain-of-function research is justifiable • Successes and failures around COVID-19 • The history of existential risk • And much more Get this episode by subscribing to our podcast on the world's most pressing problems and how to solve them: type 80,000 Hours into your podcasting app. Producer: Keiran Harris Audio mastering: Ben Cordell Transcriptions: Katy Moore
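
A minimal sketch of the back-of-the-envelope argument summarized above, using the figures quoted in the episode description (a $4 million value per statistical life and a one-in-six extinction risk this century); the 330 million population figure is an assumption added for illustration, and "reduce that risk by 1%" is read as a 1% relative reduction.

```python
# Back-of-the-envelope version of the cost-benefit argument in the episode
# summary above. The $4M per-life figure and the 1-in-6 century extinction
# risk come from that summary; the US population number and the 1% relative
# risk reduction are illustrative assumptions.

VALUE_PER_LIFE = 4e6             # USD, upper end of agency willingness to pay
US_POPULATION = 330e6            # assumed, for illustration
EXTINCTION_RISK = 1 / 6          # Toby Ord's century-level figure, per the summary
RELATIVE_RISK_REDUCTION = 0.01   # a 1% relative cut in that risk

value_of_all_us_lives = VALUE_PER_LIFE * US_POPULATION
expected_loss = value_of_all_us_lives * EXTINCTION_RISK
worth_paying = expected_loss * RELATIVE_RISK_REDUCTION

print(f"Value of all US lives:         ${value_of_all_us_lives / 1e12:,.0f} trillion")
print(f"Expected loss from extinction: ${expected_loss / 1e12:,.0f} trillion")
print(f"Worth paying for a 1% cut:     ${worth_paying / 1e12:,.1f} trillion")
# Roughly $1,300 trillion at stake and ~$2.2 trillion for a 1% relative risk
# reduction, matching the figures quoted above.
```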

Effective Altruism Forum Podcast
"AGI and lock-in" by Lukas Finnveden, Jess Riedel, & Carl Shulman

Effective Altruism Forum Podcast

Play Episode Listen Later Nov 28, 2022 23:40


The long-term future of intelligent life is currently unpredictable and undetermined. In the linked document, we argue that the invention of artificial general intelligence (AGI) could change this by making extreme types of lock-in technologically feasible. In particular, we argue that AGI would make it technologically feasible to (i) perfectly preserve nuanced specifications of a wide variety of values or goals far into the future, and (ii) develop AGI-based institutions that would (with high probability) competently pursue any such values for at least millions, and plausibly trillions, of years. The rest of this post contains the summary (6 pages), with links to relevant sections of the main document (40 pages) for readers who want more details. Original article: https://forum.effectivealtruism.org/posts/KqCybin8rtfP3qztq/agi-and-lock-in Narrated for the Effective Altruism Forum by TYPE III AUDIO.

Radio Bostrom
Propositions Concerning Digital Minds and Society (2022)

Radio Bostrom

Play Episode Listen Later Sep 23, 2022 71:44


By Nick Bostrom & Carl Shulman. Draft version 1.10. AIs with moral status and political rights? We'll need a modus vivendi, and it's becoming urgent to figure out the parameters for that. This paper makes a load of specific claims that begin to stake out a position. Read the full paper: https://nickbostrom.com/propositions.pdf More episodes at: https://radiobostrom.com/

The Nonlinear Library
EA - How might we align transformative AI if it's developed very soon? by Holden Karnofsky

The Nonlinear Library

Play Episode Listen Later Aug 30, 2022 70:59


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: How might we align transformative AI if it's developed very soon?, published by Holden Karnofsky on August 29, 2022 on The Effective Altruism Forum. This post is part of my AI strategy nearcasting series: trying to answer key strategic questions about transformative AI, under the assumption that key events will happen very soon, and/or in a world that is otherwise very similar to today's. Cross-posted from Less Wrong and Alignment Forum. This post gives my understanding of what the set of available strategies for aligning transformative AI would be if it were developed very soon, and why they might or might not work. It is heavily based on conversations with Paul Christiano, Ajeya Cotra and Carl Shulman, and its background assumptions correspond to the arguments Ajeya makes in this piece (abbreviated as “Takeover Analysis”). I premise this piece on a nearcast in which a major AI company (“Magma,” following Ajeya's terminology) has good reason to think that it can develop transformative AI very soon (within a year), using what Ajeya calls “human feedback on diverse tasks” (HFDT) - and has some time (more than 6 months, but less than 2 years) to set up special measures to reduce the risks of misaligned AI before there's much chance of someone else deploying transformative AI. I will discuss: Why I think there is a major risk of misaligned AI in this nearcast (this will just be a brief recap of Takeover Analysis). Magma's predicament: navigating the risk of deploying misaligned AI itself, while also contending with the risk of other, less cautious actors doing so. Magma's goals that advanced AI systems might be able to help with - for example, (a) using aligned AI systems to conduct research on how to safely develop still-more-powerful AI; (b) using aligned AI systems to help third parties (e.g., multilateral cooperation bodies and governments) detect and defend against unaligned AI systems deployed by less cautious actors. The intended properties that Magma will be seeking from its AI systems - such as honesty and corrigibility - in order to ensure they can safely help with these goals. Some key facets of AI alignment that Magma needs to attend to, along with thoughts about how it can deal with them: Accurate reinforcement: training AI systems to perform useful tasks while being honest, corrigible, etc. - and avoiding the risk (discussed in Takeover Analysis) that they are unwittingly rewarding AIs for deceiving and manipulating human judges. I'll list several techniques Magma might use for this. Out-of-distribution robustness: taking special measures (such as adversarial training) to ensure that AI systems will still have intended properties - or at least, will not fail catastrophically - even if they encounter situations very different from what they are being trained on. Preventing exploits (hacking, manipulation, etc.) Even while trying to ensure aligned AI, Magma should also - with AI systems' help if possible - be actively seeking out and trying to fix vulnerabilities in its setup that could provide opportunities for any misaligned AI to escape Magma's control. Vulnerabilities could include security holes (which AI systems could exploit via hacking), as well as opportunities for AIs to manipulate humans. 
Doing this could (a) reduce the damage done if some of its AI systems are misaligned; (b) avoid making the problem worse via positive reinforcement for unintended behaviors. Testing and threat assessment: Magma should be constantly working to form a picture of whether its alignment attempts are working. If there is a major threat of misalignment despite its measures (or if there would be for other labs taking fewer measures), Magma should get evidence for this and use it to make the case for slowing AI development across the board. Some key tools that could help ...
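
The description above lists "out-of-distribution robustness" via measures such as adversarial training as one facet of Magma's alignment work. As a toy illustration of that idea only (my own sketch, not anything from the post), the snippet below trains a small NumPy logistic-regression model on inputs perturbed in the direction that most increases the loss, so that it also behaves sensibly slightly off its training distribution; the data, model, and perturbation size are all invented for the example.

```python
# Toy adversarial-training loop (illustrative sketch only): each example is
# nudged in the direction that most increases the loss before the model is
# updated, a simple stand-in for the out-of-distribution robustness measures
# mentioned in the summary above.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))             # synthetic inputs
true_w = rng.normal(size=5)
y = (X @ true_w > 0).astype(float)        # synthetic labels

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.zeros(5)
lr, epsilon = 0.1, 0.2   # learning rate; size of the adversarial perturbation

for step in range(500):
    # Gradient of the logistic loss w.r.t. the inputs gives the worst-case
    # small perturbation of each example.
    p = sigmoid(X @ w)
    grad_x = (p - y)[:, None] * w[None, :]
    X_adv = X + epsilon * np.sign(grad_x)

    # Update the weights on the perturbed batch.
    p_adv = sigmoid(X_adv @ w)
    grad_w = X_adv.T @ (p_adv - y) / len(y)
    w -= lr * grad_w

accuracy = ((sigmoid(X @ w) > 0.5) == y).mean()
print(f"accuracy on clean data after adversarial training: {accuracy:.2f}")
```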

The Nonlinear Library
AF - How might we align transformative AI if it's developed very soon? by HoldenKarnofsky

The Nonlinear Library

Play Episode Listen Later Aug 29, 2022 70:56


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: How might we align transformative AI if it's developed very soon?, published by HoldenKarnofsky on August 29, 2022 on The AI Alignment Forum. This post is part of my AI strategy nearcasting series: trying to answer key strategic questions about transformative AI, under the assumption that key events will happen very soon, and/or in a world that is otherwise very similar to today's. This post gives my understanding of what the set of available strategies for aligning transformative AI would be if it were developed very soon, and why they might or might not work. It is heavily based on conversations with Paul Christiano, Ajeya Cotra and Carl Shulman, and its background assumptions correspond to the arguments Ajeya makes in this piece (abbreviated as “Takeover Analysis”). I premise this piece on a nearcast in which a major AI company (“Magma,” following Ajeya's terminology) has good reason to think that it can develop transformative AI very soon (within a year), using what Ajeya calls “human feedback on diverse tasks” (HFDT) - and has some time (more than 6 months, but less than 2 years) to set up special measures to reduce the risks of misaligned AI before there's much chance of someone else deploying transformative AI. I will discuss: Why I think there is a major risk of misaligned AI in this nearcast (this will just be a brief recap of Takeover Analysis). Magma's predicament: navigating the risk of deploying misaligned AI itself, while also contending with the risk of other, less cautious actors doing so. Magma's goals that advanced AI systems might be able to help with - for example, (a) using aligned AI systems to conduct research on how to safely develop still-more-powerful AI; (b) using aligned AI systems to help third parties (e.g., multilateral cooperation bodies and governments) detect and defend against unaligned AI systems deployed by less cautious actors. The intended properties that Magma will be seeking from its AI systems - such as honesty and corrigibility - in order to ensure they can safely help with these goals. Some key facets of AI alignment that Magma needs to attend to, along with thoughts about how it can deal with them: Accurate reinforcement: training AI systems to perform useful tasks while being honest, corrigible, etc. - and avoiding the risk (discussed in Takeover Analysis) that they are unwittingly rewarding AIs for deceiving and manipulating human judges. I'll list several techniques Magma might use for this. Out-of-distribution robustness: taking special measures (such as adversarial training) to ensure that AI systems will still have intended properties - or at least, will not fail catastrophically - even if they encounter situations very different from what they are being trained on. Preventing exploits (hacking, manipulation, etc.) Even while trying to ensure aligned AI, Magma should also - with AI systems' help if possible - be actively seeking out and trying to fix vulnerabilities in its setup that could provide opportunities for any misaligned AI to escape Magma's control. Vulnerabilities could include security holes (which AI systems could exploit via hacking), as well as opportunities for AIs to manipulate humans. Doing this could (a) reduce the damage done if some of its AI systems are misaligned; (b) avoid making the problem worse via positive reinforcement for unintended behaviors. 
Testing and threat assessment: Magma should be constantly working to form a picture of whether its alignment attempts are working. If there is a major threat of misalignment despite its measures (or if there would be for other labs taking fewer measures), Magma should get evidence for this and use it to make the case for slowing AI development across the board. Some key tools that could help Magma with all of the above: Decoding and manipulating in...

An Introduction to Nick Bostrom
6. Propositions Concerning Digital Minds and Society (2022)

An Introduction to Nick Bostrom

Play Episode Listen Later Aug 25, 2022 71:44


By Nick Bostrom & Carl Shulman. Draft version 1.10. AIs with moral status and political rights? We'll need a modus vivendi, and it's becoming urgent to figure out the parameters for that. This paper makes a load of specific claims that begin to stake out a position. Read the full paper: https://nickbostrom.com/propositions.pdf More episodes at: https://radiobostrom.com/

The Nonlinear Library
EA - Future Matters #4: AI timelines, AGI risk, and existential risk from climate change by Pablo

The Nonlinear Library

Play Episode Listen Later Aug 8, 2022 28:23


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Future Matters #4: AI timelines, AGI risk, and existential risk from climate change, published by Pablo on August 8, 2022 on The Effective Altruism Forum. But if it is held that each generation can by its own deliberate acts determine for good or evil the destinies of the race, then our duties towards others reach out through time as well as through space, and our contemporaries are only a negligible fraction of the “neighbours” to whom we owe obligations. The ethical end may still be formulated, with the Utilitarians, as the greatest happiness of the greatest number [...] This extension of the moral code, if it is not yet conspicuous in treatises on Ethics, has in late years been obtaining recognition in practice. John Bagnell Bury Future Matters is a newsletter about longtermism. Each month we collect and summarize longtermism-relevant research, share news from the longtermism community, and feature a conversation with a prominent researcher. You can also subscribe on Substack, listen on your favorite podcast platform and follow on Twitter. Research Jacob Steinhardt's AI forecasting: one year in reports and discusses the results of a forecasting contest on AI progress that the author launched a year ago. Steinhardt's main finding is that progress on all three capability benchmarks occurred much faster than the forecasters predicted. Moreover, although the forecasters performed poorly, they would—in Steinhardt's estimate—probably have outperformed the median AI researcher. That is, the forecasters in the tournament appear to have had more aggressive forecasts than the experts did, yet their forecasts turned out to be insufficiently, rather than excessively, aggressive. The contest is still ongoing; you can participate here. Tom Davidson's Social returns to productivity growth estimates the long-run welfare benefits of increasing productivity via R&D funding to determine whether it might be competitive with other global health and wellbeing interventions, such as cash transfers or malaria nets. Davidson's toy model suggests that average returns to R&D are roughly 20 times lower than Open Philanthropy's minimum bar for funding in this space. He emphasizes that only very tentative conclusions should be drawn from this work, given substantial limitations to his modelling. Miles Brundage discusses Why AGI timeline research/discourse might be overrated. He suggests that more work on the issue has diminishing returns, and is unlikely to narrow our uncertainty or persuade many more relevant actors that AGI could arrive soon. Moreover, Brundage is somewhat skeptical of the value of timelines information for decision-making by important actors. In the comments, Adam Gleave reports finding such information useful for prioritizing within technical AI safety research, and Carl Shulman points to numerous large philanthropic decisions whose cost-benefit depends heavily on AI timelines. In Two-year update on my personal AI timelines, Ajeya Cotra outlines how her forecasts for transformative AI (TAI) have changed since 2020. Her timelines have gotten considerably shorter: she now puts ~35% probability density on TAI by 2036 (vs. 15% previously) and her median TAI date is now 2040 (vs. 2050). One of the drivers of this update is a somewhat lowered threshold for TAI. 
While Cotra was previously imagining that a TAI model would have to be able to automate most of scientific research, she now believes that AI systems able to automate most of AI/ML research specifically would be sufficient to set off an explosive feedback loop of accelerating capabilities. Back in 2016, Katja Grace and collaborators ran a survey of machine learning researchers, the main results of which were published the following year. Grace's What do ML researchers think about AI in 2022? reports on the preliminary re...

The Nonlinear Library
AF - Two-year update on my personal AI timelines by Ajeya Cotra

The Nonlinear Library

Play Episode Listen Later Aug 2, 2022 20:24


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Two-year update on my personal AI timelines, published by Ajeya Cotra on August 2, 2022 on The AI Alignment Forum. I worked on my draft report on biological anchors for forecasting AI timelines mainly between ~May 2019 (three months after the release of GPT-2) and ~Jul 2020 (a month after the release of GPT-3), and posted it on LessWrong in Sep 2020 after an internal review process. At the time, my bottom line estimates from the bio anchors modeling exercise were:[1] Roughly ~15% probability of transformative AI by 2036[2] (16 years from posting the report; 14 years from now). A median of ~2050 for transformative AI (30 years from posting, 28 years from now). These were roughly close to my all-things-considered probabilities at the time, as other salient analytical frames on timelines didn't do much to push back on this view. (Though my subjective probabilities bounced around quite a lot around these values and if you'd asked me on different days and with different framings I'd have given meaningfully different numbers.) It's been about two years since the bulk of the work on that report was completed, during which I've mainly been thinking about AI. In that time it feels like very short timelines have become a lot more common and salient on LessWrong and in at least some parts of the ML community. My personal timelines have also gotten considerably shorter over this period. I now expect something roughly like this: ~15% probability by 2030 (a decrease of ~6 years from 2036). ~35% probability by 2036 (a ~3x likelihood ratio[3] vs 15%). This implies that each year in the 6 year period from 2030 to 2036 has an average of over 3% probability of TAI occurring in that particular year (smaller earlier and larger later). A median of ~2040 (a decrease of ~10 years from 2050). This implies that each year in the 4 year period from 2036 to 2040 has an average of almost 4% probability of TAI. ~60% probability by 2050 (a ~1.5x likelihood ratio vs 50%). As a result, my timelines have also concentrated more around a somewhat narrower band of years. Previously, my probability increased from 10% to 60%[4] over the course of the ~32 years from ~2032 and ~2064; now this happens over the ~24 years between ~2026 and ~2050. I expect these numbers to be pretty volatile too, and (as I did when writing bio anchors) I find it pretty fraught and stressful to decide on how to weigh various perspectives and considerations. I wouldn't be surprised by significant movements. In this post, I'll discuss: Some updates toward shorter timelines (I'd largely characterize these as updates made from thinking about things more and talking about them with people rather than updates from events in the world, though both play a role). Some updates toward longer timelines (which aren't large enough to overcome the updates toward shorter timelines, but claw the size of the update back a bit). Some claims associated with short timelines that I still don't buy. Some sources of bias I'm not sure what to do with. Very briefly, what this may mean for actions. This post is a catalog of fairly gradual changes to my thinking over the last two years; I'm not writing this post in response to an especially sharp change in my view -- I just thought it was a good time to take stock, particularly since a couple of people have asked me about my views recently. 
Updates that push toward shorter timelines I list the main updates toward shorter timelines below roughly in order of importance; there are some updates toward longer timelines as well (discussed in the next section) which claw back some of the impact of these points. Picturing a more specific and somewhat lower bar for TAI Thanks to Carl Shulman, Paul Christiano, and others for discussion around this point. When writing my report, I was imagining that a transformative m...
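
The per-year probabilities and likelihood ratios quoted in the summary above follow from simple arithmetic on the stated anchor points; the sketch below just checks them, treating the probability mass as spread evenly between anchor years and reading the quoted likelihood ratios as ratios of odds (both illustrative simplifications on my part, not claims from the post).

```python
# Rough check of the arithmetic in the timeline update summarized above.
# Anchor probabilities (15% by 2030, 35% by 2036, median 2040, 60% by 2050)
# are taken from the summary; even spreading between anchors is assumed.

def odds(p):
    """Convert a probability to odds, used to read the quoted likelihood ratios."""
    return p / (1 - p)

per_year_2030_2036 = (0.35 - 0.15) / (2036 - 2030)   # "over 3%" per year
per_year_2036_2040 = (0.50 - 0.35) / (2040 - 2036)   # "almost 4%" per year

ratio_2036 = odds(0.35) / odds(0.15)   # "~3x likelihood ratio vs 15%"
ratio_2050 = odds(0.60) / odds(0.50)   # "~1.5x likelihood ratio vs 50%"

print(f"avg yearly probability, 2030-2036: {per_year_2030_2036:.1%}")
print(f"avg yearly probability, 2036-2040: {per_year_2036_2040:.1%}")
print(f"odds ratio at 2036 (35% vs 15%):   {ratio_2036:.2f}x")
print(f"odds ratio at 2050 (60% vs 50%):   {ratio_2050:.2f}x")
```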

The Nonlinear Library
LW - Which values are stable under ontology shifts? by Richard Ngo

The Nonlinear Library

Play Episode Listen Later Jul 23, 2022 5:06


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Which values are stable under ontology shifts?, published by Richard Ngo on July 23, 2022 on LessWrong. Here's a rough argument which I've been thinking about lately: We have coherence theorems which say that, if you're not acting like you're maximizing expected utility over outcomes, you'd make payments which predictably lose you money. But in general I don't see any a principled distinction between “predictably losing money” (which we see as incoherent) and “predictably spending money” (to fulfill your values): it depends on the space of outcomes over which you define utilities, which seems pretty arbitrary. You could interpret an agent being money-pumped as a type of incoherence, or as an indication that it enjoys betting and is willing to pay to do so; similarly you could interpret an agent passing up a “sure thing” bet as incoherence, or just a preference for not betting which it's willing to forgo money to satisfy. Many humans have one of these preferences! Now, these preferences are somewhat odd ones, because you can think of every action under uncertainty as a type of bet. In other words, “betting” isn't a very fundamental category in an ontology which has a sophisticated understanding of reasoning under uncertainty. Then the obvious follow-up question is: which human values will naturally fit into much more sophisticated ontologies? I worry that not many of them will: In a world where minds can be easily copied, our current concepts of personal identity and personal survival will seem very strange. You could think of those values as “predictably losing money” by forgoing the benefits of temporarily running multiple copies. (This argument was inspired by this old thought experiment from Wei Dai.) In a world where minds can be designed with arbitrary preferences, our values related to “preference satisfaction” will seem very strange, because it'd be easy to create people with meaningless preferences that are by default satisfied to an arbitrary extent. In a world where we understand minds very well, our current concepts of happiness and wellbeing may seem very strange. In particular, if happiness is understood in a more sophisticated ontology as caused by positive reward prediction error, then happiness is intrinsically in tension with having accurate beliefs. And if we understand reward prediction error in terms of updates to our policy, then deliberately invoking happiness would be in tension with acting effectively in the world. If there's simply a tradeoff between them, we might still want to sacrifice accurate beliefs and effective action for happiness. But what I'm gesturing towards is the idea that happiness might not actually be a concept which makes much sense given a complete understanding of minds - as implied by the buddhist view of happiness as an illusion, for example. In a world where people can predictably influence the values of their far future descendants, and there's predictable large-scale growth, any non-zero discounting will seem very strange, because it predictably forgoes orders of magnitude more resources in the future. 
This might result in the strategy described by Carl Shulman of utilitarian agents mimicking selfish agents by spreading out across the universe as fast as they can to get as many resources as they can, and only using those resources to produce welfare once the returns to further expansion are very low. It does seem possible that we design AIs which spend millions or billions of years optimizing purely for resource acquisition, and then eventually use all those resources for doing something entirely different. But it seems like those AIs would need to have minds that are constructed in a very specific and complicated way to retain terminal values which are so unrelated to most of their actions. A more general version of...
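
The coherence-theorem framing above says that violating the expected-utility axioms means making payments that predictably lose you money. A minimal sketch of that classic money-pump story (my own illustration, not from the post): an agent with cyclic preferences happily pays a small fee for each "upgrade" and ends up back where it started, strictly poorer.

```python
# Toy money pump (illustrative sketch): cyclic preferences A < B < C < A mean
# the agent will pay a fee for every trade it sees as an upgrade, and so
# predictably loses money while ending up holding what it started with.

prefers = {"A": "B", "B": "C", "C": "A"}   # prefers[x] = what it would swap x for
fee = 1.0

holding, money = "A", 0.0
for _ in range(9):              # three full trips around the cycle
    holding = prefers[holding]  # accept the "upgrade"...
    money -= fee                # ...and pay for it

print(f"after 9 trades the agent holds {holding} again and is down ${-money:.0f}")
```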

The Nonlinear Library: LessWrong
LW - Which values are stable under ontology shifts? by Richard Ngo

The Nonlinear Library: LessWrong

Play Episode Listen Later Jul 23, 2022 5:06


Link to original articleWelcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Which values are stable under ontology shifts?, published by Richard Ngo on July 23, 2022 on LessWrong. Here's a rough argument which I've been thinking about lately: We have coherence theorems which say that, if you're not acting like you're maximizing expected utility over outcomes, you'd make payments which predictably lose you money. But in general I don't see any a principled distinction between “predictably losing money” (which we see as incoherent) and “predictably spending money” (to fulfill your values): it depends on the space of outcomes over which you define utilities, which seems pretty arbitrary. You could interpret an agent being money-pumped as a type of incoherence, or as an indication that it enjoys betting and is willing to pay to do so; similarly you could interpret an agent passing up a “sure thing” bet as incoherence, or just a preference for not betting which it's willing to forgo money to satisfy. Many humans have one of these preferences! Now, these preferences are somewhat odd ones, because you can think of every action under uncertainty as a type of bet. In other words, “betting” isn't a very fundamental category in an ontology which has a sophisticated understanding of reasoning under uncertainty. Then the obvious follow-up question is: which human values will naturally fit into much more sophisticated ontologies? I worry that not many of them will: In a world where minds can be easily copied, our current concepts of personal identity and personal survival will seem very strange. You could think of those values as “predictably losing money” by forgoing the benefits of temporarily running multiple copies. (This argument was inspired by this old thought experiment from Wei Dai.) In a world where minds can be designed with arbitrary preferences, our values related to “preference satisfaction” will seem very strange, because it'd be easy to create people with meaningless preferences that are by default satisfied to an arbitrary extent. In a world where we understand minds very well, our current concepts of happiness and wellbeing may seem very strange. In particular, if happiness is understood in a more sophisticated ontology as caused by positive reward prediction error, then happiness is intrinsically in tension with having accurate beliefs. And if we understand reward prediction error in terms of updates to our policy, then deliberately invoking happiness would be in tension with acting effectively in the world. If there's simply a tradeoff between them, we might still want to sacrifice accurate beliefs and effective action for happiness. But what I'm gesturing towards is the idea that happiness might not actually be a concept which makes much sense given a complete understanding of minds - as implied by the buddhist view of happiness as an illusion, for example. In a world where people can predictably influence the values of their far future descendants, and there's predictable large-scale growth, any non-zero discounting will seem very strange, because it predictably forgoes orders of magnitude more resources in the future. 
This might result in the strategy described by Carl Shulman of utilitarian agents mimicking selfish agents by spreading out across the universe as fast as they can to get as many resources as they can, and only using those resources to produce welfare once the returns to further expansion are very low. It does seem possible that we design AIs which spend millions or billions of years optimizing purely for resource acquisition, and then eventually use all those resources for doing something entirely different. But it seems like those AIs would need to have minds that are constructed in a very specific and complicated way to retain terminal values which are so unrelated to most of their actions. A more general version of...

The Nonlinear Library
LW - Comment on "Propositions Concerning Digital Minds and Society" by Zack M Davis

The Nonlinear Library

Play Episode Listen Later Jul 10, 2022 12:38


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Comment on "Propositions Concerning Digital Minds and Society", published by Zack M Davis on July 10, 2022 on LessWrong. I will do my best to teach them About life and what it's worth I just hope that I can keep them From destroying the Earth Jonathan Coulton, "The Future Soon" In a recent paper, Nick Bostrom and Carl Shulman present "Propositions Concerning Digital Minds and Society", a tentative bullet-list outline of claims about how advanced AI could be integrated into Society. I want to like this list. I like the kind of thing this list is trying to do. But something about some of the points just feels—off. Too conservative, too anthropomorphic—like the list is trying to adapt the spirit of the Universal Declaration of Human Rights to changed circumstances, without noticing that the whole ontology that the Declaration is written in isn't going to survive the intelligence explosion—and probably never really worked as a description of our own world, either. This feels like a weird criticism to make of Nick Bostrom and Carl Shulman, who probably already know any particular fact or observation I might include in my commentary. (Bostrom literally wrote the book on superintelligence.) "Too anthropomorphic", I claim? The list explicitly names many ways in which AI minds could differ from our own—in overall intelligence, specific capabilities, motivations, substrate, quality and quantity (!) of consciousness, subjective speed—and goes into some detail about how this could change the game theory of Society. What more can I expect of our authors? It just doesn't seem like the implications of the differences have fully propagated into some of the recommendations?—as if an attempt to write in a way that's comprehensible to Shock Level 2 tech executives and policymakers has failed to elicit all of the latent knowledge that Bostrom and Shulman actually possess. It's understandable that our reasoning about the future often ends up relying on analogies to phenomena we already understand, but ultimately, making sense of a radically different future is going to require new concepts that won't permit reasoning by analogy. After an introductory sub-list of claims about consciousness and the philosophy of mind (just the basics: physicalism; reductionism on personal identity; some non-human animals are probably conscious and AIs could be, too), we get a sub-list about respecting AI interests. This is an important topic: if most our civilization's thinking is soon to be done inside of machines, the moral status of that cognition is really important: you wouldn't want the future to be powered by the analogue of a factory farm. (And if it turned out that economically and socially-significant AIs aren't conscious and don't have moral status, that would be important to know, too.) Our authors point out the novel aspects of the situation: that what's good for an AI can be very different from what's good for a human, that designing AIs to have specific motivations is not generally wrong, and that it's possible for AIs to have greater moral patienthood than humans (like the utility monster of philosophical lore). 
Despite this, some of the points in this section seem to mostly be thinking of AIs as being like humans, but "bigger" or "smaller": Rights such as freedom of reproduction, freedom of speech, and freedom of thought require adaptation to the special circumstances of AIs with superhuman capabilities in those areas (analogously, e.g., to how campaign finance laws may restrict the freedom of speech of billionaires and corporations). [...] If an AI is capable of informed consent, then it should not be used to perform work without its informed consent. Informed consent is not reliably sufficient to safeguard the interests of AIs, even those as smart and capable as a human adult, particularl...

The Nonlinear Library
AF - The prototypical catastrophic AI action is getting root access to its datacenter by Buck Shlegeris

The Nonlinear Library

Play Episode Listen Later Jun 2, 2022 3:32


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: The prototypical catastrophic AI action is getting root access to its datacenter, published by Buck Shlegeris on June 2, 2022 on The AI Alignment Forum. (I think Carl Shulman came up with the “hacking the SSH server” example, thanks to him for that. Thanks to Ryan Greenblatt, Jenny Nitishinskaya, and Ajeya Cotra for comments.) In my opinion, the prototypical example of an action which an AI can take which is unacceptably bad even if it happens very rarely (aka a catastrophic failure) is something like “the AI makes a code change to some code used on the datacenter where it is running that adds a backdoor, allowing it to get root access to all the computers in the datacenter”. As a full story, this looks something like: At some point when the AI is quite powerful and useful but not totally intellectually self-sufficient yet, the AI makes a code change to some code used on the datacenter where it is running that adds a backdoor, allowing it to get root access to all the computers in the datacenter. Once it has root access, it modifies the SSH server so that whenever any humans log into the datacenter, the AI is able to intercept any of their commands or the data that they see. The AI gets increasingly powerful over the next month, as the human ML researchers with AI assistance continue to make progress (unaware that some things that they're seeing are faked) and various training runs/self-improvement loops continue. After a month the AI is powerful enough that it decides it's worth it to take overtly aggressive actions to ensure the physical security of its data centers and allow it to build lots of factories; this is probably the point at which we die, if we die. This is in contrast to a catastrophic action like: The AI figures out how to develop Drexlerian molecular nanotechnology, and pays someone online to synthesize some proteins then mix them together, and now has nanobots. The key point here is that I'm imagining that the catastrophic action isn't the AI doing something incredibly hard, it's the AI doing something relatively easy that is entirely a zero-sum action that removes control of the situation from humans. (By zero-sum, I mean something like: building a computer produces real economic value and so is not zero-sum, while stealing someone's computer is just a transfer of control over items that already exist, so is zero-sum.) I often think about AI alignment by splitting the problem into the low-stakes and high-stakes regimes, as suggested by Paul here, though I have some reservations about this framing. Under this split, in the high stakes problem, we assume that we have access to a catastrophe classifier which can expensively tell you whether a proposed AI action is catastrophically bad, and we want to ensure that the AI doesn't do things that are bad according to the catastrophe detector. 
Sometimes, people are confused by this high-stakes alignment set-up, saying something like “but if your AI is powerful enough to take over the world in a single action, and you have an oversight process that you trust to evaluate whether actions are catastrophic or not, why not just ask the AI to do some small number of incredibly powerful actions that will save the world (eg solve the alignment problem), and run the overseer on the proposed actions to make sure they aren't bad?” My answer is that I'm not assuming the AI is powerful enough to take over the world in a few actions, I'm just saying that it's in a situation precarious enough that we might lose a lot of control due to a few small but crucial changes in the world. Eventually the AI does need to be powerful enough to defend its datacenter and suppress human opposition. But if it can prevent humans from knowing what's going on in the datacenter (or realizing something is amiss), it can slowly grow its power over ...
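A minimal sketch of the oversight loop the excerpt describes, assuming we already have an (expensive) catastrophe classifier and an action-proposing model; all function names here are hypothetical. The excerpt's point is that the hard part is the classifier and the precariousness of the situation, not the loop itself.

```python
# Minimal sketch of the "high-stakes" oversight loop described above: before an
# AI-proposed action is executed, an (expensive) catastrophe classifier is run
# on it, and flagged actions are blocked. All names here are hypothetical.

def run_with_oversight(propose_action, is_catastrophic, execute, max_steps=1000):
    """Execute AI-proposed actions only if the catastrophe classifier clears them."""
    for _ in range(max_steps):
        action = propose_action()      # e.g. a code change to datacenter infrastructure
        if is_catastrophic(action):    # expensive human/AI-assisted audit of the action
            # Block the action; during training this event could also be used
            # as a strong negative signal.
            continue
        execute(action)
```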

The Nonlinear Library
AF - ELK prize results by Paul Christiano

The Nonlinear Library

Play Episode Listen Later Mar 9, 2022 33:10


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: ELK prize results, published by Paul Christiano on March 9, 2022 on The AI Alignment Forum. From January - February we offered prizes for proposed algorithms for eliciting latent knowledge. In total we received 197 proposals and are awarding 32 prizes of $5k-20k. We are also giving 24 proposals honorable mentions of $1k, for a total of $274,000. Several submissions contained perspectives, tricks, or counterexamples that were new to us. We were quite happy to see so many people engaging with ELK, and we were surprised by the number and quality of submissions. That said, at a high level most of the submissions explored approaches that we have also considered; we underestimated how much convergence there would be amongst different proposals. In the rest of this post we'll present the main families of proposals, organized by their counterexamples and covering about 90% of the submissions. We won't post all the submissions but people are encouraged to post their own (whether as a link, comment, or separate post). Train a reporter that is useful to an auxiliary AI: Andreas Robinson, Carl Shulman, Curtis Huebner, Dmitrii Krasheninnikov, Edmund Mills, Gabor Fuisz, Gary Dunkerley, Hoagy Cunningham, Holden Karnofsky, James Lucassen, James Payor, John Maxwell, Mary Phuong, Simon Skade, Stefan Schouten, Victoria Krakovna & Vikrant Varma & Ramana Kumar Require the reporter to be continuous: Sam Marks Penalize depending on too many parts of the predictor: Bryan Chen, Holden Karnofsky, Jacob Hilton, Kevin Wang, Maria Shakhova, Thane Ruthenis Compress the predictor's state: Adam Jermyn and Nicholas Schiefer, “P” Use reporter to define causal interventions: Abram Demski Train a sequence of reporters: Derek Shiller, Beth Barnes and Nate Thomas, Oam Patel We awarded prizes to proposals if we thought they solved all of the counterexamples we've listed so far. There were many submissions with interesting ideas that didn't meet this condition, and so “didn't receive a prize” isn't a consistent signal about the value of a proposal. We also had to make many fuzzy judgment calls, had slightly inconsistent standards between the first and second halves of the contest, and no doubt made plenty of mistakes. We're sorry about mistakes but unfortunately given time constraints we aren't planning to try to correct them. Honorable mentions Strategy: reward reporters that are sensitive to what's actually happening in the world Consider a pair of worlds, one where the diamond is safe and one where the camera has been hacked to make the diamond appear safe, but was actually stolen. [Figure: predicted observation vs. predicted reality in each world.] Since the observations are identical in both worlds, the human simulator will give the same answers to questions. However, since the direct translator is faithfully conveying what happened, its answers will distinguish between worlds. [Figure and table: predicted observation vs. predicted reality, with the human simulator's and direct translator's answers.] Asked “Is the diamond still on the pedestal?”, the human simulator answers “Yes.” in both worlds, while the direct translator answers “Yes.” only in the world where the diamond is safe and “No.” in the world where it was stolen. This suggests a possible regularizer: reward reporters whose answers are more variable. For example, we might reward reporters based on the difficulty of predicting their answers, or based on the sensitivity of their answers to changes in the predictor's state. 
Counterexample: reporter randomizes its behavior For all of these approaches, the reporter can perform very well by treating the predictor's latent state as a “random seed” and use it to pick out a possible world consistent with the observations, i.e. to sample from the posterior distribution computed by the human simulator. In cases where the human simulator is much simpler and faster than the direct translator, using the predictor's latent s...
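For readers who want the regularizer idea in concrete form, here is a toy sketch, assuming a reporter that maps a latent state and a question to an answer in [0, 1]; the function and its arguments are hypothetical, and the counterexample above explains how a reporter can game exactly this kind of bonus.

```python
# Toy sketch of the regularizer idea: reward a reporter whose answers are
# sensitive to changes in the predictor's latent state, by measuring how much
# its answers move when the latent is perturbed. Names are hypothetical.
import numpy as np

def sensitivity_bonus(reporter, latent, questions, n_perturbations=8, eps=0.1):
    """Average change in the reporter's answers under small latent perturbations."""
    base = np.array([reporter(latent, q) for q in questions])   # answers in [0, 1]
    diffs = []
    for _ in range(n_perturbations):
        perturbed = latent + eps * np.random.randn(*latent.shape)
        answers = np.array([reporter(perturbed, q) for q in questions])
        diffs.append(np.abs(answers - base).mean())
    return float(np.mean(diffs))

# Training would then add a term like lambda * sensitivity_bonus(...) to the
# reporter's reward; the counterexample above shows how treating the latent as
# a random seed defeats this.
```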

The Nonlinear Library
EA - Ngo and Yudkowsky on scientific reasoning and pivotal acts by EliezerYudkowsky

The Nonlinear Library

Play Episode Listen Later Feb 23, 2022 58:40


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Ngo and Yudkowsky on scientific reasoning and pivotal acts, published by EliezerYudkowsky on February 21, 2022 on The Effective Altruism Forum. This is a transcript of a conversation between Richard Ngo and Eliezer Yudkowsky, facilitated by Nate Soares (and with some comments from Carl Shulman). This transcript continues the Late 2021 MIRI Conversations sequence, following Ngo's view on alignment difficulty. Color key: Chat by Richard and Eliezer Other chat 14. October 4 conversation 14.1. Predictable updates, threshold functions, and the human cognitive range [Ngo][15:05] Two questions which I'd like to ask Eliezer: 1. How strongly does he think that the "shallow pattern-memorisation" abilities of GPT-3 are evidence for Paul's view over his view (if at all) 2. How does he suggest we proceed, given that he thinks directly explaining his model of the chimp-human difference would be the wrong move? [Yudkowsky][15:07] 1 - I'd say that it's some evidence for the Dario viewpoint which seems close to the Paul viewpoint. I say it's some evidence for the Dario viewpoint because Dario seems to be the person who made something like an advance prediction about it. It's not enough to make me believe that you can straightforwardly extend the GPT architecture to 3e14 parameters and train it on 1e13 samples and get human-equivalent performance. [Ngo][15:09] Did you make any advance predictions, around the 2008-2015 period, of what capabilities we'd have before AGI? [Yudkowsky][15:10] not especially that come to mind? on my model of the future this is not particularly something I am supposed to know unless there is a rare flash of predictability. [Ngo][15:11] 1 - I'd say that it's some evidence for the Dario viewpoint which seems close to the Paul viewpoint. I say it's some evidence for the Dario viewpoint because Dario seems to be the person who made something like an advance prediction about it. It's not enough to make me believe that you can straightforwardly extend the GPT architecture to 3e14 parameters and train it on 1e13 samples and get human-equivalent performance. For the record I remember Paul being optimistic about language when I visited OpenAI in summer 2018. But I don't know how advanced internal work on GPT-2 was by then. [Yudkowsky][15:13] 2 - in lots of cases where I learned more specifics about X, and updated about Y, I had the experience of looking back and realizing that knowing anything specific about X would have predictably produced a directional update about Y. like, knowing anything in particular about how the first AGI eats computation, would cause you to update far away from thinking that biological analogies to the computation consumed by humans were a good way to estimate how many computations an AGI needs to eat. you know lots of details about how humans consume watts of energy, and you know lots of details about how modern AI consumes watts, so it's very visible that these quantities are so incredibly different and go through so many different steps that they're basically unanchored from each other. 
I have specific ideas about how you get AGI that isn't just scaling up Stack More Layers, which lead me to think that the way to estimate the computational cost of it is not "3e14 parameters trained at 1e16 ops per step for 1e13 steps, because that much computation and parameters seems analogous to human biology and 1e13 steps is given by past scaling laws", a la recent OpenPhil publication. But it seems to me that it should be possible to have the abstract insight that knowing more about general intelligence in AGIs or in humans would make the biological analogy look less plausible, because you wouldn't be matching up an unknown key to an unknown lock. Unfortunately I worry that this depends on some life experience with actual discoveries to get something...
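For concreteness, the arithmetic behind the estimate Yudkowsky is criticizing just multiplies the quoted figures; a quick sketch follows (the numbers come straight from the quote, and reproducing them is not an endorsement of the biological anchor).

```python
# Back-of-the-envelope arithmetic for the biological-anchor-style estimate
# quoted above ("3e14 parameters trained at 1e16 ops per step for 1e13 steps").
# This only multiplies the quoted figures.
params = 3e14          # parameters, anchored to a rough biological analogy
ops_per_step = 1e16    # training ops per step
steps = 1e13           # training steps, from extrapolated scaling laws
total_training_ops = ops_per_step * steps
print(f"{total_training_ops:.0e} ops")   # 1e+29 ops under these assumptions
```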

The Nonlinear Library
AF - Ngo and Yudkowsky on scientific reasoning and pivotal acts by Eliezer Yudkowsky

The Nonlinear Library

Play Episode Listen Later Feb 21, 2022 58:40


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Ngo and Yudkowsky on scientific reasoning and pivotal acts, published by Eliezer Yudkowsky on February 21, 2022 on The AI Alignment Forum. This is a transcript of a conversation between Richard Ngo and Eliezer Yudkowsky, facilitated by Nate Soares (and with some comments from Carl Shulman). This transcript continues the Late 2021 MIRI Conversations sequence, following Ngo's view on alignment difficulty. Color key: Chat by Richard and Eliezer Other chat 14. October 4 conversation 14.1. Predictable updates, threshold functions, and the human cognitive range [Ngo][15:05] Two questions which I'd like to ask Eliezer: 1. How strongly does he think that the "shallow pattern-memorisation" abilities of GPT-3 are evidence for Paul's view over his view (if at all) 2. How does he suggest we proceed, given that he thinks directly explaining his model of the chimp-human difference would be the wrong move? [Yudkowsky][15:07] 1 - I'd say that it's some evidence for the Dario viewpoint which seems close to the Paul viewpoint. I say it's some evidence for the Dario viewpoint because Dario seems to be the person who made something like an advance prediction about it. It's not enough to make me believe that you can straightforwardly extend the GPT architecture to 3e14 parameters and train it on 1e13 samples and get human-equivalent performance. [Ngo][15:09] Did you make any advance predictions, around the 2008-2015 period, of what capabilities we'd have before AGI? [Yudkowsky][15:10] not especially that come to mind? on my model of the future this is not particularly something I am supposed to know unless there is a rare flash of predictability. [Ngo][15:11] 1 - I'd say that it's some evidence for the Dario viewpoint which seems close to the Paul viewpoint. I say it's some evidence for the Dario viewpoint because Dario seems to be the person who made something like an advance prediction about it. It's not enough to make me believe that you can straightforwardly extend the GPT architecture to 3e14 parameters and train it on 1e13 samples and get human-equivalent performance. For the record I remember Paul being optimistic about language when I visited OpenAI in summer 2018. But I don't know how advanced internal work on GPT-2 was by then. [Yudkowsky][15:13] 2 - in lots of cases where I learned more specifics about X, and updated about Y, I had the experience of looking back and realizing that knowing anything specific about X would have predictably produced a directional update about Y. like, knowing anything in particular about how the first AGI eats computation, would cause you to update far away from thinking that biological analogies to the computation consumed by humans were a good way to estimate how many computations an AGI needs to eat. you know lots of details about how humans consume watts of energy, and you know lots of details about how modern AI consumes watts, so it's very visible that these quantities are so incredibly different and go through so many different steps that they're basically unanchored from each other. 
I have specific ideas about how you get AGI that isn't just scaling up Stack More Layers, which lead me to think that the way to estimate the computational cost of it is not "3e14 parameters trained at 1e16 ops per step for 1e13 steps, because that much computation and parameters seems analogous to human biology and 1e13 steps is given by past scaling laws", a la recent OpenPhil publication. But it seems to me that it should be possible to have the abstract insight that knowing more about general intelligence in AGIs or in humans would make the biological analogy look less plausible, because you wouldn't be matching up an unknown key to an unknown lock. Unfortunately I worry that this depends on some life experience with actual discoveries to get something this...

The Nonlinear Library
EA - On infinite ethics by Joe Carlsmith

The Nonlinear Library

Play Episode Listen Later Jan 31, 2022 86:06


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: On infinite ethics, published by Joe Carlsmith on January 31, 2022 on The Effective Altruism Forum. (Cross-posted from Hands and Cities) And for all this, nature is never spent. Gerard Manley Hopkins Summary: Infinite ethics (i.e., ethical theory that considers infinite worlds) is important – both in theory and in practice. Infinite ethics puts serious pressure on various otherwise-plausible ethical principles (including some that underlie common arguments for “longtermism”). We know, from impossibility results, that some of these will have to go. A willingness to be “fanatical” about infinities doesn't help very much. The hard problem is figuring out how to value different infinite outcomes – and in particular, lotteries over infinite outcomes. Proposals for how to do this tend to be some combination of: silent about tons of choices; in conflict with principles like “if you can help an infinity of people and harm no one, do it”; sensitive to arbitrary and/or intuitively irrelevant things; and otherwise unattractive/horrifying. Also, the discourse thus far has focused almost entirely on countable infinities. If we have to deal with larger infinities, they seem likely to break whatever principles we settle on for the countable case. I think infinite ethics punctures the dream of a simple, bullet-biting utilitarianism. But ultimately, it's everyone's problem. My current guess is that the best thing to do from an infinite ethics perspective is to make sure that our civilization reaches a wise and technologically mature future – one of superior theoretical and empirical understanding, and superior ability to put that understanding into practice. But reflection on infinite ethics can also inform our sense of how strange such a future's ethical priorities might be. Thanks to Leopold Aschenbrenner, Amanda Askell, Paul Christiano, Katja Grace, Cate Hall, Evan Hubinger, Ketan Ramakrishnan, Carl Shulman, and Hayden Wilkinson for discussion. And thanks to Cate Hall for some poetry suggestions. I. The importance of the infinite Most of ethics ignores infinities. They're confusing. They break stuff. Hopefully, they're irrelevant. And anyway, finite ethics is hard enough. Infinite ethics is just ethics without these blinders. And ditching the blinders is good. We have to deal with infinites in practice. And they are deeply revealing in theory. Why do we have to deal with infinities in practice? Because maybe we can do infinite things. More specifically, we might be able to influence what happens to an infinite number of “value-bearing locations” – for example, people. This could happen in two ways: causal, or acausal. The causal way requires funkier science. It's not that infinite universes are funky: to the contrary, the hypothesis that we share the universe with an infinite number of observers is very live, and various people seem to think it's the leading cosmology on offer (see footnote). But current science suggests that our causal influence is made finite by things like lightspeed and entropy (though see footnote for some subtlety). So causing infinite stuff probably needs new science. Maybe we learn to make hypercomputers, or baby universes with infinite space-times. Maybe we're in a simulation housed in a more infinite-causal-influence-friendly universe. Maybe something about wormholes? You know, sci-fi stuff. 
The acausal way can get away with more mainstream science. But it requires funkier decision theory. Suppose you're deciding whether to make a $5000 donation that will save a life, or to spend the money on a vacation with your family. And suppose, per various respectable cosmologies, that the universe is filled with an infinite number of people very much like you, faced with choices very much like yours. If you donate, this is strong evidence that they all donate, to...
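One way to see why "just add up the welfare" breaks for infinite worlds, as the summary argues, is that with infinitely many value-bearing locations the total can depend on the order in which you count them. A toy illustration, with a welfare pattern made up purely for the example:

```python
# With infinitely many value-bearing locations, "total welfare" can depend on
# the order in which locations are counted, so naive summation fails to rank
# infinite worlds. Purely a toy example.
import itertools

def partial_sum(values, n):
    return sum(itertools.islice(values, n))

# World W: locations alternate between welfare +1 and -1.
def natural_order():
    while True:
        yield 1
        yield -1

# The same world, counted so two +1 locations appear for every -1 location.
def rearranged_order():
    while True:
        yield 1
        yield 1
        yield -1

print(partial_sum(natural_order(), 999_999))     # 1; partial sums oscillate between 0 and 1
print(partial_sum(rearranged_order(), 999_999))  # 333333; grows without bound under this ordering
```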

The Nonlinear Library
Is power-seeking AI an existential risk? by Joe Carlsmith

The Nonlinear Library

Play Episode Listen Later Dec 27, 2021 158:33


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is Is power-seeking AI an existential risk? published by Joe Carlsmith . Introduction Some worry that the development of advanced artificial intelligence will result in existential catastrophe -- that is, the destruction of humanity's longterm potential. Here I examine the following version of this worry (it's not the only version): By 2070: It will become possible and financially feasible to build AI systems with the following properties: Advanced capability: they outperform the best humans on some set of tasks which when performed at advanced levels grant significant power in today's world (tasks like scientific research, business/military/political strategy, engineering, and persuasion/manipulation). Agentic planning: they make and execute plans, in pursuit of objectives, on the basis of models of the world. Strategic awareness: the models they use in making plans represent with reasonable accuracy the causal upshot of gaining and maintaining power over humans and the real-world environment. (Call these “APS” -- Advanced, Planning, Strategically aware -- systems.) There will be strong incentives to build and deploy APS systems | (1). It will be much harder to build APS systems that would not seek to gain and maintain power in unintended ways (because of problems with their objectives) on any of the inputs they'd encounter if deployed, than to build APS systems that would do this (even if decision-makers don't know it), but which are at least superficially attractive to deploy anyway | (1)-(2). Some deployed APS systems will be exposed to inputs where they seek power in unintended and high-impact ways (say, collectively causing >$1 trillion dollars of damage), because of problems with their objectives | (1)-(3). Some of this power-seeking will scale (in aggregate) to the point of permanently disempowering ~all of humanity | (1)-(4). This disempowerment will constitute an existential catastrophe | (1)-(5). These claims are extremely important if true. My aim is to investigate them. I assume for the sake of argument that (1) is true (I currently assign this >40% probability). I then examine (2)-(5), and say a few words about (6). My current view is that there is a small but substantive chance that a scenario along these lines occurs, and that many people alive today -- including myself -- live to see humanity permanently disempowered by artificial systems. In the final section, I take an initial stab at quantifying this risk, by assigning rough probabilities to 1-6. My current, highly-unstable, subjective estimate is that there is a ~5% percent chance of existential catastrophe by 2070 from scenarios in which (1)-(6) are true. My main hope, though, is not to push for a specific number, but rather to lay out the arguments in a way that can facilitate productive debate. Acknowledgments: Thanks to Asya Bergal, Alexander Berger, Paul Christiano, Ajeya Cotra, Tom Davidson, Daniel Dewey, Owain Evans, Ben Garfinkel, Katja Grace, Jacob Hilton, Evan Hubinger, Jared Kaplan, Holden Karnofsky, Sam McCandlish, Luke Muehlhauser, Richard Ngo, David Roodman, Rohin Shah, Carl Shulman, Nate Soares, Jacob Steinhardt, and Eliezer Yudkowsky for input on earlier stages of this project; and thanks to Nick Beckstead for guidance and support throughout the investigation. The views expressed here are my own. 
1.1 Preliminaries Some preliminaries and caveats (those eager for the main content can skip): I'm focused, here, on a very specific type of worry. There are lots of other ways to be worried about AI -- and even, about existential catastrophes resulting from AI. And there are lots of ways to be excited about AI, too. My emphasis and approach differs from that of others in the literature in various ways. In particular: I'm less focused than some on the possibility of an extremely rapid escalati...
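A sketch of how the final number is assembled: rough probabilities for premises (1)-(6) are multiplied, since each premise is conditional on the ones before it. The probabilities below are illustrative placeholders, not the figures from the report.

```python
# How an overall estimate like the ~5% figure is assembled: assign rough
# probabilities to premises (1)-(6), each conditional on the previous ones,
# and multiply. These inputs are placeholders, not the report's own numbers.
import math

premise_probs = {
    "1. APS systems possible and financially feasible by 2070": 0.80,
    "2. Strong incentives to build and deploy them":            0.65,
    "3. Much harder to build non-power-seeking versions":       0.50,
    "4. Some deployed systems seek power at high impact":       0.50,
    "5. Power-seeking scales to permanent disempowerment":      0.50,
    "6. Disempowerment is an existential catastrophe":          0.95,
}
p_catastrophe = math.prod(premise_probs.values())
print(f"{p_catastrophe:.1%}")   # roughly 6% with these placeholder inputs
```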

The Nonlinear Library
LW - Risks from Learned Optimization: Introduction by evhub, Chris van Merwijk, vlad_m, Joar Skalse, Scott Garrabr from Risks from Learned Optimization

The Nonlinear Library

Play Episode Listen Later Dec 24, 2021 18:50


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is Risks from Learned Optimization, Part 1: Risks from Learned Optimization: Introduction, published by evhub, Chris van Merwijk, vlad_m, Joar Skalse, Scott Garrabr. Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. This is the first of five posts in the Risks from Learned Optimization Sequence based on the paper “Risks from Learned Optimization in Advanced Machine Learning Systems” by Evan Hubinger, Chris van Merwijk, Vladimir Mikulik, Joar Skalse, and Scott Garrabrant. Each post in the sequence corresponds to a different section of the paper. Evan Hubinger, Chris van Merwijk, Vladimir Mikulik, and Joar Skalse contributed equally to this sequence. With special thanks to Paul Christiano, Eric Drexler, Rob Bensinger, Jan Leike, Rohin Shah, William Saunders, Buck Shlegeris, David Dalrymple, Abram Demski, Stuart Armstrong, Linda Linsefors, Carl Shulman, Toby Ord, Kate Woolverton, and everyone else who provided feedback on earlier versions of this sequence. Motivation The goal of this sequence is to analyze the type of learned optimization that occurs when a learned model (such as a neural network) is itself an optimizer—a situation we refer to as mesa-optimization, a neologism we introduce in this sequence. We believe that the possibility of mesa-optimization raises two important questions for the safety and transparency of advanced machine learning systems. First, under what circumstances will learned models be optimizers, including when they should not be? Second, when a learned model is an optimizer, what will its objective be—how will it differ from the loss function it was trained under—and how can it be aligned? We believe that this sequence presents the most thorough analysis of these questions that has been conducted to date. In particular, we present not only an introduction to the basic concerns surrounding mesa-optimizers, but also an analysis of the particular aspects of an AI system that we believe are likely to make the problems related to mesa-optimization relatively easier or harder to solve. By providing a framework for understanding the degree to which different AI systems are likely to be robust to misaligned mesa-optimization, we hope to start a discussion about the best ways of structuring machine learning systems to solve these problems. Furthermore, in the fourth post we will provide what we think is the most detailed analysis yet of a problem we refer as deceptive alignment which we posit may present one of the largest—though not necessarily insurmountable—current obstacles to producing safe advanced machine learning systems using techniques similar to modern machine learning. Two questions In machine learning, we do not manually program each individual parameter of our models. Instead, we specify an objective function that captures what we want the system to do and a learning algorithm to optimize the system for that objective. In this post, we present a framework that distinguishes what a system is optimized to do (its “purpose”), from what it optimizes for (its “goal”), if it optimizes for anything at all. While all AI systems are optimized for something (have a purpose), whether they actually optimize for anything (pursue a goal) is non-trivial. 
We will say that a system is an optimizer if it is internally searching through a search space (consisting of possible outputs, policies, plans, strategies, or similar) looking for those elements that score high according to some objective function that is explicitly represented within the system. Learning algorithms in machine learning are optimizers because they search through a space of possible parameters—e.g. neural network weights—and improve the parameters with respect to some objective. Planning algorithms are also optimizers, since they search through possible...
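A minimal concrete instance of the definition above, with the search space and the explicitly represented objective both visible; the names are ours, not the paper's.

```python
# A system counts as an optimizer, on the definition above, if it internally
# searches a space of candidates for elements that score highly on an
# explicitly represented objective. Here both pieces are explicit and tiny.
import random

def optimize(objective, candidate_space, n_samples=10_000):
    """Random-search optimizer: explicitly scores candidates against an objective."""
    best, best_score = None, float("-inf")
    for _ in range(n_samples):
        candidate = random.choice(candidate_space)
        score = objective(candidate)          # explicitly represented objective
        if score > best_score:
            best, best_score = candidate, score
    return best

# A learned model that implemented something like this internally would be a
# mesa-optimizer; its internal objective need not match the loss function the
# base optimizer used to train it.
```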

The Nonlinear Library: EA Forum Top Posts
Hinge of History Refuted (April Fools' Day)

The Nonlinear Library: EA Forum Top Posts

Play Episode Listen Later Dec 12, 2021 10:44


welcome to the nonlinear library, where we use text-to-speech software to convert the best writing from the rationalist and ea communities into audio. this is: Hinge of History Refuted (April Fools' Day), published by Thomas Kwa on the effective altruism forum. Introduction Is the present the hingiest time in history? The answer to this question is enormously important to altruists, and as such has attracted the attention of many philosophers and researchers: Nick Beckstead, Phil Trammell, Toby Ord, Aron Vallinder, Allan Dafoe, Matt Wage, Holden Karnofsky, Carl Shulman, and Will MacAskill. I attempt to answer the question through a novel method of hinginess analysis. Flawed Hinginess Measures To determine whether one era of history is hingier than another, we must have some way to measure hinginess. The naive method used by many previous researchers is to simply count the number of hinges in each time period. There are several problems with this idea. Suppose that a heavy door is held up by two hinges. Replacing these with four hinges, each half the size, would double the number of hinges. However, since the range of motion of the door is the same in each case, the hinginess has not changed. Taken to its conclusion, a naive hinge count leads to the absurd conclusion that the hinginess of the world is dominated by its insect population. It is estimated that the world contains on the order of 1 million billion ants; each ant has six legs, with each leg containing three hinges. Even disregarding the antennae and mandibles and necks of ants, we end up with an estimate of 1.8×10^16 hinges belonging to ants, which dwarfs all of humanity's hinges by orders of magnitude.[1] Despite the fact that ant joints are millions of times less hingey than some other hinges, many people believe that underground ant colonies contribute the vast majority of the world's hinginess-- the so-called "underfoot myth". In reality, the proportion is much smaller, which we will examine in the next section. Estimating the total mass or volume of hinges is not much better. While past hinges were limited to natural materials, advances in materials science have allowed us to manufacture hinges of many different metals: "copper, brass, nickel, bronze, stainless steel, chrome, and steel" (source), allowing for much better quality, more hingey, hinges of the same mass or volume. A reasonable hinginess measure should take into account the number of hinges in the universe, but also the hinginess capacity of each individual hinge. Estimating Hinginess I claim that hinginess of a given hinge depends on three factors, and can be estimated as scale × tractability × neglectedness: Scale: How big a door can the hinge swing, and through what distance? Tractability: When carrying this door, how low is the hinge's friction torque, compared to the torque required to twist the hinge off-axis? (In short, how good of a hinge is it?) Neglectedness: How long has the hinge functioned without maintenance?[5] Some hinges rank poorly in scale and are not very neglected (e.g. ant tarsal joints), while others are large in scale and very neglected (e.g. the doors of ancient temples). Similarly, some hinges are very tractable (e.g. in a precision-engineered bank vault) while some are intractable (a living hinge, like that on a plastic ketchup bottle lid, is only a few times easier to open than to twist or break). This is summarized in the table below. 
[Table.] High: Panama Canal gate (scale), Bank vault door (tractability), Ancient temple door (neglectedness). Low: Ant leg joint (scale), Ketchup bottle (tractability), Newly manufactured part (neglectedness). Hinginess in the Past Despite the low neglectedness of animal joints, they accounted for most of the hinginess in the world between the Cambrian explosion and the rise of agriculture. The reason is that there were simply no other hinges.[4] The "underfoot myth" notwithstanding, insects account for comparatively little hinginess, mostly due to their small scale and neglectedness.[2] Hinginess in the Prese...
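Keeping with the post's (April Fools') estimation formula, a tiny sketch of how the three factors combine; the scores below are made up purely to show the multiplication.

```python
# Hinginess, per the satirical formula above, is the product of the three
# factors. The inputs are invented scores on a 0-10 scale.
def hinginess(scale, tractability, neglectedness):
    return scale * tractability * neglectedness

print(hinginess(scale=9, tractability=7, neglectedness=8))   # Panama Canal gate-ish: 504
print(hinginess(scale=1, tractability=4, neglectedness=2))   # ant leg joint-ish: 8
```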

The Nonlinear Library: EA Forum Top Posts
Thoughts on whether we're living at the most influential time in history by Buck

The Nonlinear Library: EA Forum Top Posts

Play Episode Listen Later Dec 12, 2021 14:03


welcome to the nonlinear library, where we use text-to-speech software to convert the best writing from the rationalist and ea communities into audio. this is: Thoughts on whether we're living at the most influential time in history, published by Buck on the effective altruism forum. (thanks to Claire Zabel, Damon Binder, and Carl Shulman for suggesting some of the main arguments here, obviously all flaws are my own; thanks also to Richard Ngo, Greg Lewis, Kevin Liu, and Sidney Hough for helpful conversation and comments.) Will MacAskill has a post, Are we living at the most influential time in history?, about what he calls the “hinge of history hypothesis” (HoH), which he defines as the claim that “we are living at the most influential time ever.” Whether HoH is true matters a lot for how longtermists should allocate their effort. In his post, Will argues that we should be pretty skeptical of HoH. EDIT: Will recommends reading this revised article of his instead of his original post. I appreciate Will's clear description of HoH and its difference from strong longtermism, but I think his main argument against HoH is deeply flawed. The comment section of Will's post contains a number of commenters making some of the same criticisms I'm going to make. I'm writing this post because I think the rebuttals can be phrased in some different, potentially clearer ways, and because I think that the weaknesses in Will's argument should be more widely discussed. Summary: I think Will's arguments mostly lead to believing that you aren't an “early human” (a human who lives on Earth before humanity colonizes the universe and flourishes for a long time) rather than believing that early humans aren't hugely influential, so you conclude that either humanity doesn't have a long future or that you probably live in a simulation. I sometimes elide the distinction between the concepts of “x-risk” and “human extinction”, because it doesn't matter much here and the abbreviation is nice. (This post has a lot of very small numbers in it. I might have missed a zero or two somewhere.) EDIT: Will's new post Will recommends reading this revised article of his instead of his original post. I believe that his new article doesn't make the assumption about the probability of civilization lasting for a long time, which means that my criticism "This argument implies that the probability of extinction this century is almost certainly negligible" doesn't apply to his new post, though it still applies to the EA Forum post I linked. I think that my other complaints are still right. The outside-view argument This is the argument that I have the main disagreement with. Will describes what he calls the “outside-view argument” as follows: 1. It's a priori extremely unlikely that we're at the hinge of history 2. The belief that we're at the hinge of history is fishy 3. Relative to such an extraordinary claim, the arguments that we're at the hinge of history are not sufficiently extraordinarily powerful Given 1, I agree with 2 and 3; my disagreement is with 1, so let's talk about that. Will phrases his argument as: The prior probability of us living in the most influential century, conditional on Earth-originating civilization lasting for n centuries, is 1/n. The unconditional prior probability over whether this is the most influential century would then depend on one's priors over how long Earth-originating civilization will last for. 
However, for the purpose of this discussion we can focus on just the claim that we are at the most influential century AND that we have an enormous future ahead of us. If the Value Lock-In or Time of Perils views are true, then we should assign a significant probability to that claim. (i.e. they are claiming that, if we act wisely this century, then this conjunctive claim is probably true.) So that's the claim we can focus our discussion on. I have several disagreements with this argument. This argument implies that the probability of exti...
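The "1/n" prior at the heart of the outside-view argument is easy to tabulate; the horizon lengths below are illustrative, not numbers from Will's or Buck's posts.

```python
# The "outside-view" prior discussed above: conditional on Earth-originating
# civilization lasting n centuries, the prior that this particular century is
# the most influential one is 1/n.
for n_centuries in (10**2, 10**4, 10**6, 10**9):
    prior = 1 / n_centuries
    print(f"{n_centuries:>13,} centuries -> prior {prior:.0e} that this one is most influential")
```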

The Nonlinear Library: EA Forum Top Posts
In praise of unhistoric heroism by rosehadshar

The Nonlinear Library: EA Forum Top Posts

Play Episode Listen Later Dec 12, 2021 5:21


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: In praise of unhistoric heroism, published by rosehadshar on the Effective Altruism Forum. Write a Review Dorothea Brooke as an example to follow I once read a post by an effective altruist about how Dorothea Brooke, one of the characters in George Eliot's Middlemarch, was an EA. There's definitely something interesting about looking at the story like this, but for me this reading really missed the point when it concluded that Dorothea's life had been “a tragic failure”.[1] I think that Dorothea's life was in many ways a triumph of light over darkness, and that her success and not her failure is the thing we should take as a pattern. Dorothea dreamed big: she wanted to alleviate rural poverty and right the injustice she saw around her. In the end, those schemes came to nothing. She married a Radical MP, and in the process forfeited the wealth she could have given to the poor. She spent her life in small ways, “feeling that there was always something better which she might have done, if she had only been better and known better.” But she made the lives of those around her better, and she did good in the ways which were open to her. I think that the way in which Dorothea's life is an example to us is best captured in the final lines of Middlemarch: “Her finely touched spirit had still its fine issues, though they were not widely visible. Her full nature, like that river of which Cyrus broke the strength, spent itself in channels which had no great name on the earth. But the effect of her being on those around her was incalculably diffusive: for the growing good of the world is partly dependent on unhistoric acts; and that things are not so ill with you and me as they might have been, is half owing to the number who lived faithfully a hidden life, and rest in unvisited tombs.” If this was said of me after I'd die, I'd think I'd done a pretty great job of things. A related critique of EA I think many EAs would not be particularly pleased if that was ‘all' that could be said for them after they died, and I think that there is something worrying about this. One of the very admirable things about EAs is their commitment to how things actually go. There's a recognition that big talk isn't enough, that good intentions aren't enough, that what really counts is what ultimately ends up happening. I think this is important and that it helps make EA a worthwhile project. But I think that when people apply this to themselves, things often get weird. I don't spend that much time with my ear to the grapevine, but from my anecdotal experience it seems not uncommon for EAs to: obsess about their own personal impact and how big it is neglect comparative advantage and chase after the most impactful whatever conclude that they are a failure because their project is a failure or lower status than some other project generally feel miserable about themselves because they're not helping the world more, regardless of whether they're already doing as much as they can An example of a kind of thing I've heard several people say is ‘aw man, it sucks to realise that I'll only ever have a tiny fraction of the impact Carl Shulman has'. There are many things I dislike about this, but in this context the thing that seems most off is that being Carl Shulman isn't the game. 
Being you is the game, doing the good you can do is the game, and for this it really doesn't matter at all how much impact Carl has. Sure, there's a question of whether you'd prefer to be Carl or Dorothea, if you could choose to be either one.[2] But you are way more likely to end up being Dorothea.[3] You should expect to live and die in obscurity, you should expect to undertake no historic acts, you should expect most of your work to come to nothing in particular. The heroism of your life isn't that you single-handedly press the wor...

The Nonlinear Library: EA Forum Top Posts
Ask Me Anything! by William_MacAskill

The Nonlinear Library: EA Forum Top Posts

Play Episode Listen Later Dec 12, 2021 5:20


welcome to the nonlinear library, where we use text-to-speech software to convert the best writing from the rationalist and ea communities into audio. this is: Ask Me Anything!, published by William_MacAskill on the effective altruism forum. Thanks for all the questions, all - I'm going to wrap up here! Maybe I'll do this again in the future, hopefully others will too! Hi, I thought that it would be interesting to experiment with an Ask Me Anything format on the Forum, and I'll lead by example. (If it goes well, hopefully others will try it out too.) Below I've written out what I'm currently working on. Please ask any questions you like, about anything: I'll then either respond on the Forum (probably over the weekend) or on the 80k podcast, which I'm hopefully recording soon (and maybe as early as Friday). Apologies in advance if there are any questions which, for any of many possible reasons, I'm not able to respond to. If you don't want to post your question publicly or non-anonymously (e.g. you're asking “Why are you such a jerk?” sort of thing), or if you don't have a Forum account, you can use this Google form. What I'm up to Book My main project is a general-audience book on longtermism. It's coming out with Basic Books in the US, Oneworld in the UK, Volante in Sweden and Gimm-Young in South Korea. The working title I'm currently using is What We Owe The Future. It'll hopefully complement Toby Ord's forthcoming book. His is focused on the nature and likelihood of existential risks, and especially extinction risks, arguing that reducing them should be a global priority of our time. He describes the longtermist arguments that support that view but not relying heavily on them. In contrast, mine is focused on the philosophy of longtermism. On the current plan, the book will make the core case for longtermism, and will go into issues like discounting, population ethics, the value of the future, political representation for future people, and trajectory change versus extinction risk mitigation. My goal is to make an argument for the importance and neglectedness of future generations in the same way Animal Liberation did for animal welfare. Roughly, I'm dedicating 2019 to background research and thinking (including posting on the Forum as a way of forcing me to actually get thoughts into the open), and then 2020 to actually writing the book. I've given the publishers a deadline of March 2021 for submission; if so, then it would come out in late 2021 or early 2022. I'm planning to speak at a small number of universities in the US and UK in late September of this year to get feedback on the core content of the book. My academic book, Moral Uncertainty, (co-authored with Toby Ord and Krister Bykvist) should come out early next year: it's been submitted, but OUP have been exceptionally slow in processing it. It's not radically different from my dissertation. Global Priorities Institute I continue to work with Hilary and others on the strategy for GPI. I also have some papers on the go: The case for longtermism, with Hilary Greaves. It's making the core case for strong longtermism, arguing that it's entailed by a wide variety of moral and decision-theoretic views. The Evidentialist's Wager, with Aron Vallinder, Carl Shulman, Caspar Oesterheld and Johannes Treutlein arguing that if one aims to hedge under decision-theoretic uncertainty, one should generally go with evidential decision theory over causal decision theory. 
A paper, with Tyler John, exploring the political philosophy of age-weighted voting. I have various other draft papers, but have put them on the back burner for the time being while I work on the book. Forethought Foundation Forethought is a sister organisation to GPI, which I take responsibility for: it's legally part of CEA and independent from the University. We had our first class of Global Priorities Fellows this year, and will continue the program into future years. Utilitarianism.net Darius Meissner and I (w...

The Nonlinear Library: EA Forum Top Posts
Growth and the case against randomista development by HaukeHillebrandt, John G. Halstead

The Nonlinear Library: EA Forum Top Posts

Play Episode Listen Later Dec 12, 2021 57:20


welcome to the nonlinear library, where we use text-to-speech software to convert the best writing from the rationalist and ea communities into audio. this is: Growth and the case against randomista development, published by HaukeHillebrandt, John G. Halstead on the effective altruism forum. Update, 3/8/2021: I (Hauke) gave a talk at Effective Altruism Global on this post: Summary Randomista development (RD) is a form of development economics which evaluates and promotes interventions that can be tested by randomised controlled trials (RCTs). It is exemplified by GiveWell (which primarily works in health) and the randomista movement in economics (which primarily works in economic development). Here we argue for the following claims, which we believe to be quite weak: Prominent economists make plausible arguments which suggest that research on and advocacy for economic growth in low- and middle-income countries is more cost-effective than the things funded by proponents of randomista development. Effective altruists have devoted too little attention to these arguments. Assessing the soundness of these arguments should be a key focus for current generation-focused effective altruists over the next few years. We hope to start a conversation on these questions, and potentially to cause a major reorientation within EA. We also believe the following stronger claims: 4. Improving health is not the best way to increase growth. 5. A ~4 person-year research effort will find donation opportunities working on economic growth in LMICs which are substantially better than GiveWell's top charities from a current generation human welfare-focused point of view. However, economic growth is not all that matters. GDP misses many crucial determinants of human welfare, including leisure time, inequality, foregone consumption from investment, public goods, social connection, life expectancy, and so on. A top priority for effective altruists should be to assess the best way to increase human welfare outside of the constraints of randomista development, i.e. allowing interventions that have not been or cannot be tested by RCTs. We proceed as follows: We define randomista development and contrast it with research and advocacy for growth-friendly policies in low- and middle-income countries. We show that randomista development is overrepresented in EA, and that, in contradistinction, research on and advocacy for growth-friendly economic policy (we refer to this as growth throughout) is underrepresented. We then show why some prominent economists believe that, a priori, growth is much more effective than most RD interventions. We present a quantitative model that tries to formalize these intuitions and allows us to compare global development interventions with economic growth interventions. The model suggests that under plausible assumptions a hypothetical growth intervention can be thousands of times more cost-effective than typical RD interventions such as cash-transfers. However, when these assumptions are relaxed and compared to the very good RD interventions, growth interventions are on a similar level of effectiveness as RD interventions. We consider various possible objections and qualifications to our argument. Acknowledgements Thanks to Stefan Schubert, Stephen Clare, Greg Lewis, Michael Wiebe, Sjir Hoeijmakers, Johannes Ackva, Gregory Thwaites, Will MacAskill, Aidan Goth, Sasha Cooper, and Carl Shulman for comments. Any mistakes are our own. Opinions are ours, not those of our employers. 
Marinella Capriati at GiveWell commented on this piece, and the piece does not represent her views or those of GiveWell. 1. Defining Randomista Development We define randomista development (RD) as an approach to development economics which investigates, evaluates and recommends only interventions which can be tested by randomised controlled trials (RCTs). RD can take low-risk or more “hits-based” forms. Effective altruists have especially focused on the low-risk for...
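A minimal sketch of the kind of comparison such a quantitative model makes, with entirely made-up numbers. This is not the authors' actual model; the intervention cost, success probability, income effect, and population figures below are illustrative assumptions only.

```python
# Illustrative comparison of a hypothetical growth-advocacy intervention vs.
# cash transfers, using person-years of log-income gain as a crude welfare proxy.
# All parameter values are assumptions for the sake of the example.

import math

def log_income_gain(baseline_income, new_income, people, years):
    """Welfare proxy: person-years of log-income gain."""
    return people * years * (math.log(new_income) - math.log(baseline_income))

# Cash-transfer benchmark (assumed): $1,000 roughly doubles a five-person
# household's consumption for one year.
cash_cost = 1_000
cash_benefit = log_income_gain(500, 1_000, people=5, years=1)

# Hypothetical growth intervention (assumed): $10M of research and advocacy adds
# a 1% chance of a reform that raises the incomes of 50M people by 1% for 20 years.
growth_cost = 10_000_000
growth_benefit = 0.01 * log_income_gain(1_000, 1_010, people=50_000_000, years=20)

print(f"Cash transfers:  {cash_benefit / cash_cost:.5f} welfare units per $")
print(f"Growth advocacy: {growth_benefit / growth_cost:.5f} welfare units per $")
print(f"Ratio (growth / cash): {(growth_benefit / growth_cost) / (cash_benefit / cash_cost):.1f}x")
```

The point is only the structure of the comparison: a growth intervention multiplies a small probability of success by a very large, long-lasting income effect, so the resulting ratio is extremely sensitive to the assumed parameters, which is why the post's conclusion swings from "thousands of times better" to "similar" as assumptions change.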

The Nonlinear Library: LessWrong Top Posts
Reply to Holden on 'Tool AI' by Eliezer Yudkowsky

The Nonlinear Library: LessWrong Top Posts

Play Episode Listen Later Dec 11, 2021 27:58


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Reply to Holden on 'Tool AI', published by Eliezer Yudkowsky on LessWrong. I begin by thanking Holden Karnofsky of GiveWell for his rare gift of his detailed, engaged, and helpfully-meant critical article Thoughts on the Singularity Institute (SI). In this reply I will engage with only one of the many subjects raised therein, the topic of, as I would term them, non-self-modifying planning Oracles, a.k.a. 'Google Maps AGI' a.k.a. 'tool AI', this being the topic that requires me personally to answer. I hope that my reply will be accepted as addressing the most important central points, though I did not have time to explore every avenue. I certainly do not wish to be logically rude, and if I have failed, please remember with compassion that it's not always obvious to one person what another person will think was the central point. Luke Muehlhauser and Carl Shulman contributed to this article, but the final edit was my own, likewise any flaws. Summary: Holden's concern is that "SI appears to neglect the potentially important distinction between 'tool' and 'agent' AI." His archetypal example is Google Maps: Google Maps is not an agent, taking actions in order to maximize a utility parameter. It is a tool, generating information and then displaying it in a user-friendly manner for me to consider, use and export or discard as I wish. The reply breaks down into four heavily interrelated points: First, Holden seems to think (and Jaan Tallinn doesn't apparently object to, in their exchange) that if a non-self-modifying planning Oracle is indeed the best strategy, then all of SIAI's past and intended future work is wasted. To me it looks like there's a huge amount of overlap in underlying processes in the AI that would have to be built and the insights required to build it, and I would be trying to assemble mostly - though not quite exactly - the same kind of team if I was trying to build a non-self-modifying planning Oracle, with the same initial mix of talents and skills. Second, a non-self-modifying planning Oracle doesn't sound nearly as safe once you stop saying human-English phrases like "describe the consequences of an action to the user" and start trying to come up with math that says scary dangerous things like (translated into English) "increase the correspondence between the user's belief about relevant consequences and reality". Hence why the people on the team would have to solve the same sorts of problems. Appreciating the force of the third point is a lot easier if one appreciates the difficulties discussed in points 1 and 2, but is actually empirically verifiable independently: Whether or not a non-self-modifying planning Oracle is the best solution in the end, it's not such an obvious privileged-point-in-solution-space that someone should be alarmed at SIAI not discussing it. This is empirically verifiable in the sense that 'tool AI' wasn't the obvious solution to e.g. John McCarthy, Marvin Minsky, I. J. Good, Peter Norvig, Vernor Vinge, or for that matter Isaac Asimov. At one point, Holden says: One of the things that bothers me most about SI is that there is practically no public content, as far as I can tell, explicitly addressing the idea of a "tool" and giving arguments for why AGI is likely to work only as an "agent." If I take literally that this is one of the things that bothers Holden most... 
I think I'd start stacking up some of the literature on the number of different things that just respectable academics have suggested as the obvious solution to what-to-do-about-AI - none of which would be about non-self-modifying smarter-than-human planning Oracles - and beg him to have some compassion on us for what we haven't addressed yet. It might be the right suggestion, but it's not so obviously right that our failure to prioritize discussing it refl...

The Nonlinear Library: LessWrong Top Posts
Risks from Learned Optimization: Introduction by evhub, Chris van Merwijk, vlad_m, Joar Skalse, Scott Garrabrant

The Nonlinear Library: LessWrong Top Posts

Play Episode Listen Later Dec 11, 2021 21:37


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Risks from Learned Optimization: Introduction, published by evhub, Chris van Merwijk, vlad_m, Joar Skalse, Scott Garrabrant on LessWrong. Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. This is the first of five posts in the Risks from Learned Optimization Sequence based on the paper “Risks from Learned Optimization in Advanced Machine Learning Systems” by Evan Hubinger, Chris van Merwijk, Vladimir Mikulik, Joar Skalse, and Scott Garrabrant. Each post in the sequence corresponds to a different section of the paper. Evan Hubinger, Chris van Merwijk, Vladimir Mikulik, and Joar Skalse contributed equally to this sequence. With special thanks to Paul Christiano, Eric Drexler, Rob Bensinger, Jan Leike, Rohin Shah, William Saunders, Buck Shlegeris, David Dalrymple, Abram Demski, Stuart Armstrong, Linda Linsefors, Carl Shulman, Toby Ord, Kate Woolverton, and everyone else who provided feedback on earlier versions of this sequence. Motivation The goal of this sequence is to analyze the type of learned optimization that occurs when a learned model (such as a neural network) is itself an optimizer—a situation we refer to as mesa-optimization, a neologism we introduce in this sequence. We believe that the possibility of mesa-optimization raises two important questions for the safety and transparency of advanced machine learning systems. First, under what circumstances will learned models be optimizers, including when they should not be? Second, when a learned model is an optimizer, what will its objective be—how will it differ from the loss function it was trained under—and how can it be aligned? We believe that this sequence presents the most thorough analysis of these questions that has been conducted to date. In particular, we present not only an introduction to the basic concerns surrounding mesa-optimizers, but also an analysis of the particular aspects of an AI system that we believe are likely to make the problems related to mesa-optimization relatively easier or harder to solve. By providing a framework for understanding the degree to which different AI systems are likely to be robust to misaligned mesa-optimization, we hope to start a discussion about the best ways of structuring machine learning systems to solve these problems. Furthermore, in the fourth post we will provide what we think is the most detailed analysis yet of a problem we refer to as deceptive alignment, which we posit may present one of the largest—though not necessarily insurmountable—current obstacles to producing safe advanced machine learning systems using techniques similar to modern machine learning. Two questions In machine learning, we do not manually program each individual parameter of our models. Instead, we specify an objective function that captures what we want the system to do and a learning algorithm to optimize the system for that objective. In this post, we present a framework that distinguishes what a system is optimized to do (its “purpose”), from what it optimizes for (its “goal”), if it optimizes for anything at all. While all AI systems are optimized for something (have a purpose), whether they actually optimize for anything (pursue a goal) is non-trivial. 
We will say that a system is an optimizer if it is internally searching through a search space (consisting of possible outputs, policies, plans, strategies, or similar) looking for those elements that score high according to some objective function that is explicitly represented within the system. Learning algorithms in machine learning are optimizers because they search through a space of possible parameters—e.g. neural network weights—and improve the parameters with respect to some objective. Planning algorithms are also optimizers, since they search through possible plans, picking thos...
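A toy illustration of this working definition of an optimizer: a system that explicitly searches a space of candidates for elements scoring high on an objective function represented inside the system. The objective and search procedure below are arbitrary assumptions, chosen only to make the definition concrete.

```python
# A trivially explicit optimizer: random search over candidate "plans",
# scored by an objective function that lives inside the system.

import random

def objective(plan):
    """Explicitly represented objective: prefer plans whose entries sum to ~100."""
    return -abs(sum(plan) - 100)

def simple_optimizer(n_candidates=1000, plan_length=5):
    """Search a space of candidate plans and keep the best-scoring one."""
    best_plan, best_score = None, float("-inf")
    for _ in range(n_candidates):
        candidate = [random.randint(0, 50) for _ in range(plan_length)]
        score = objective(candidate)
        if score > best_score:
            best_plan, best_score = candidate, score
    return best_plan, best_score

plan, score = simple_optimizer()
print(plan, score)
```

The sequence's safety question is about when a trained model ends up implementing something like this search loop internally (a mesa-optimizer), so that the training process is no longer the only optimizer in the picture.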

The Nonlinear Library: EA Forum Top Posts
Complete archive of the Felicifia forum by Louis_Francini

The Nonlinear Library: EA Forum Top Posts

Play Episode Listen Later Dec 11, 2021 2:05


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Complete archive of the Felicifia forum, published by Louis_Francini on the Effective Altruism Forum. Prior to the existence of a unified effective altruism movement, a handful of proto-EA communities and organizations were already aiming towards similar ends. These groups included the web forum LessWrong and the charity evaluator GiveWell. One lesser-known community that played an important role in the history of the EA movement is the Felicifia utilitarian forum. The name "Felicifia," a reference to Jeremy Bentham's felicific calculus, was originally used as the title of Seth Baum's personal blog which he started in September 2006. In December 2006, Baum moved to Felicifia.com, which became a community blog/forum. A minority of the posts from this site are viewable on the Wayback Machine and archive.is. (Brian Tomasik is slowly working on producing a better archive at oldfelicifia.org.) The final iteration of Felicifia, and the one I'm concerned with here, launched in 2008 as a phpBB forum. Unfortunately, for years the site has been glitchy, and for the past several months it has been completely inaccessible. Thus I thought it would be valuable to produce an archive that is more easily browsable than the Wayback Machine. Hence: felicifia.github.io. The site featured some of the earliest discussions of certain cause areas, such as wild animal suffering. Common EA concepts such as the meat eater argument and s-risks were developed and refined here. Of course, the forum also delved into the more theoretical aspects of utilitarian ethics. A few of the many prominent EAs who participated in the forum include Brian Tomasik, Peter Hurford, Ryan Carey, Pablo Stafforini, Carl Shulman, and Michael Dickens. While not all of the threads contained detailed discussion, some of the content is quite high-quality. Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.

The Nonlinear Library: Alignment Forum Top Posts
Risks from Learned Optimization: Introduction by Evan Hubinger, Chris van Merwijk, Vladimir Mikulik, Joar Skalse, Scott Garrabrant

The Nonlinear Library: Alignment Forum Top Posts

Play Episode Listen Later Dec 10, 2021 18:54


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Risks from Learned Optimization: Introduction, published by Evan Hubinger, Chris van Merwijk, Vladimir Mikulik, Joar Skalse, Scott Garrabrant on the AI Alignment Forum. This is the first of five posts in the Risks from Learned Optimization Sequence based on the paper “Risks from Learned Optimization in Advanced Machine Learning Systems” by Evan Hubinger, Chris van Merwijk, Vladimir Mikulik, Joar Skalse, and Scott Garrabrant. Each post in the sequence corresponds to a different section of the paper. Evan Hubinger, Chris van Merwijk, Vladimir Mikulik, and Joar Skalse contributed equally to this sequence. With special thanks to Paul Christiano, Eric Drexler, Rob Bensinger, Jan Leike, Rohin Shah, William Saunders, Buck Shlegeris, David Dalrymple, Abram Demski, Stuart Armstrong, Linda Linsefors, Carl Shulman, Toby Ord, Kate Woolverton, and everyone else who provided feedback on earlier versions of this sequence. Motivation The goal of this sequence is to analyze the type of learned optimization that occurs when a learned model (such as a neural network) is itself an optimizer—a situation we refer to as mesa-optimization, a neologism we introduce in this sequence. We believe that the possibility of mesa-optimization raises two important questions for the safety and transparency of advanced machine learning systems. First, under what circumstances will learned models be optimizers, including when they should not be? Second, when a learned model is an optimizer, what will its objective be—how will it differ from the loss function it was trained under—and how can it be aligned? We believe that this sequence presents the most thorough analysis of these questions that has been conducted to date. In particular, we present not only an introduction to the basic concerns surrounding mesa-optimizers, but also an analysis of the particular aspects of an AI system that we believe are likely to make the problems related to mesa-optimization relatively easier or harder to solve. By providing a framework for understanding the degree to which different AI systems are likely to be robust to misaligned mesa-optimization, we hope to start a discussion about the best ways of structuring machine learning systems to solve these problems. Furthermore, in the fourth post we will provide what we think is the most detailed analysis yet of a problem we refer to as deceptive alignment, which we posit may present one of the largest—though not necessarily insurmountable—current obstacles to producing safe advanced machine learning systems using techniques similar to modern machine learning. Two questions In machine learning, we do not manually program each individual parameter of our models. Instead, we specify an objective function that captures what we want the system to do and a learning algorithm to optimize the system for that objective. In this post, we present a framework that distinguishes what a system is optimized to do (its “purpose”), from what it optimizes for (its “goal”), if it optimizes for anything at all. While all AI systems are optimized for something (have a purpose), whether they actually optimize for anything (pursue a goal) is non-trivial. 
We will say that a system is an optimizer if it is internally searching through a search space (consisting of possible outputs, policies, plans, strategies, or similar) looking for those elements that score high according to some objective function that is explicitly represented within the system. Learning algorithms in machine learning are optimizers because they search through a space of possible parameters—e.g. neural network weights—and improve the parameters with respect to some objective. Planning algorithms are also optimizers, since they search through possible plans, picking those that do well according to some objective. Whether a syste...

The Nonlinear Library: Alignment Forum Top Posts
How much chess engine progress is about adapting to bigger computers? by Paul Christiano

The Nonlinear Library: Alignment Forum Top Posts

Play Episode Listen Later Dec 6, 2021 9:48


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: How much chess engine progress is about adapting to bigger computers?, published by Paul Christiano on the AI Alignment Forum. (This question comes from a discussion with Carl Shulman.) In this post I describe an experiment that I'd like to see run. I'm posting a $1,000 - $10,000 prize for a convincing implementation of these experiments. I also post a number of smaller prizes for relevant desk research or important corrections to this request. Motivation In order to understand the dynamics of the singularity, I'd like to understand how easy it is to improve algorithms and software. We can learn something about this from looking at chess engines. It's not the most relevant domain to future AI, but it's one with an unusually long history and unusually clear (and consistent) performance metrics. In order to quantify the quality of a chess engine, we can fix a level of play and ask "How much compute is needed for the engine to play at that level?" One complication in evaluating the rate of progress is that it depends on what level of play we use for evaluation. In particular, newer algorithms are generally designed to play at a much higher level than older algorithms. So if we quantify the compute needed to reach modern levels of play, we will capture both absolute improvements and also "adaptation" to the new higher amounts of compute. So we'd like to attribute progress in chess engines to three factors: Better software. Bigger computers. Software that is better-adapted to new, bigger computers. Understanding the size of factor #1 is important for extrapolating progress given massive R&D investments in software. While it is easy to separate factors #1 and #2 from publicly available information, it is not easy to evaluate factor #3. Experiment description Pick two (or more) software engines from very different times. They should both be roughly state of the art, running on "typical" machines from the era (i.e. the machines for which R&D is mostly targeted). We then carry out two matches: Run the old engine on its "native" hardware (the "old hardware"). Then evaluate: how little compute does the new engine need in order to beat the old engine? Run the new engine on its "native" hardware (the "new hardware"). Then evaluate: how much compute does the old engine need in order to beat the new engine? With some effort, we can estimate a quantitative ratio of "ops needed" for each of these experiments. For example, we may find that the new engine is able to beat the old engine using only 1% of the "old hardware." Whereas we may find that the old engine would require 10,000x the "new hardware" in order to compete with the new engine. The first experiment tells us about the absolute improvements in chess engines on the task for which the old engine was optimized. (This understates the rate of software progress to the extent that people stopped working on this task.) The second experiment gives us the combination of absolute improvements + adaptation to new hardware. Typical measures of "rate of software progress" will be somewhere in between, and are sensitive to the hardware on which the evaluation is carried out. I believe that understanding these two numbers would give us a significantly clearer picture of what's really going on with software progress in chess engines. 
Experiment details Here are some guesses about how to run this experiment well. I don't know much about computer chess, so you may be able to make a better proposal. Old engine, old hardware: my default proposal is the version of Fritz that won the 1995 world computer chess championship, using the same amount of hardware (and time controls) as in that championship. This algorithm seems like a particularly reasonable "best effort" at making full use of available computing resources. I don't want to compare an engi...
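A sketch of the measurement the post asks for: how little compute does the new engine need to break even against the old engine at its native budget? The Elo figures and "Elo per doubling of compute" value below are illustrative assumptions standing in for real games; an actual run would replace simulated_match with a real match runner.

```python
# Toy harness for the first experiment described above. `simulated_match` is a
# stand-in Elo model with made-up numbers, used only to demonstrate the procedure.

import math

def expected_score(elo_diff):
    """Standard logistic Elo expectation for the side that is elo_diff ahead."""
    return 1.0 / (1.0 + 10 ** (-elo_diff / 400.0))

def simulated_match(new_nodes, old_nodes,
                    new_base_elo=3000, old_base_elo=2700, elo_per_doubling=60):
    """Stand-in for playing a real match: strength assumed to grow linearly
    with log2(nodes). All parameters are illustrative assumptions."""
    new_elo = new_base_elo + elo_per_doubling * math.log2(new_nodes / 1e6)
    old_elo = old_base_elo + elo_per_doubling * math.log2(old_nodes / 1e6)
    return expected_score(new_elo - old_elo)

def break_even_nodes(old_nodes, lo=1e2, hi=1e9, iters=40):
    """Geometric bisection over the new engine's node budget until it scores
    ~50% against the old engine at its fixed 'native' budget."""
    for _ in range(iters):
        mid = math.sqrt(lo * hi)
        if simulated_match(mid, old_nodes) > 0.5:
            hi = mid   # new engine still ahead: give it less compute
        else:
            lo = mid
    return math.sqrt(lo * hi)

old_nodes = 1e6
needed = break_even_nodes(old_nodes)
print(f"New engine breaks even with ~{needed / old_nodes:.1%} of the old compute")
```

Swapping the roles of the two engines (fixing the new engine's native budget and bisecting over the old engine's budget) gives the second experiment, and the two node ratios are the "ops needed" numbers the post wants to estimate.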

The Nonlinear Library: Alignment Forum Top Posts
My current framework for thinking about AGI timelines by Alex Zhu

The Nonlinear Library: Alignment Forum Top Posts

Play Episode Listen Later Dec 5, 2021 5:49


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: My current framework for thinking about AGI timelines, published by Alex Zhu on the AI Alignment Forum. At the beginning of 2017, someone I deeply trusted said they thought AGI would come in 10 years, with 50% probability. I didn't take their opinion at face value, especially since so many experts seemed confident that AGI was decades away. But the possibility of imminent apocalypse seemed plausible enough and important enough that I decided to prioritize investigating AGI timelines over trying to strike gold. I left the VC-backed startup I'd cofounded, and went around talking to every smart and sensible person I could find who seemed to have opinions about when humanity would develop AGI. My biggest takeaways after 3 years might be disappointing -- I don't think the considerations currently available to us point to any decisive conclusion one way or another, and I don't think anybody really knows when AGI is coming. At the very least, the fields of knowledge that I think bear on AGI forecasting (including deep learning, predictive coding, and comparative neuroanatomy) are disparate, and I don't know of any careful and measured thinkers with all the relevant expertise. That being said, I did manage to identify a handful of background variables that consistently play significant roles in informing people's intuitive estimates of when we'll get to AGI. In other words, people would often tell me that their estimates of AGI timelines would significantly change if their views on one of these background variables changed. I've put together a framework for understanding AGI timelines based on these background variables. Among all the frameworks for AGI timelines I've encountered, it's the framework that most comprehensively enumerates crucial considerations for AGI timelines, and it's the framework that best explains how smart and sensible people might arrive at vastly different views on AGI timelines. Over the course of the next few weeks, I'll publish a series of posts about these background variables and some considerations that shed light on what their values are. I'll conclude by describing my framework for how they come together to explain various overall viewpoints on AGI timelines, depending on different prior assumptions on the values of these variables. By trade, I'm a math competition junkie, an entrepreneur, and a hippie. I am not an expert on any of the topics I'll be writing about -- my analyses will not be comprehensive, and they might contain mistakes. I'm sharing them with you anyway in the hopes that you might contribute your own expertise, correct for my epistemic shortcomings, and perhaps find them interesting. 
I'd like to thank Paul Christiano, Jessica Taylor, Carl Shulman, Anna Salamon, Katja Grace, Tegan McCaslin, Eric Drexler, Vlad Firiou, Janos Kramar, Victoria Krakovna, Jan Leike, Richard Ngo, Rohin Shah, Jacob Steinhardt, David Dalrymple, Catherine Olsson, Jelena Luketina, Alex Ray, Jack Gallagher, Ben Hoffman, Tsvi BT, Sam Eisenstat, Matthew Graves, Ryan Carey, Gary Basin, Eliana Lorch, Anand Srinivasan, Michael Webb, Ashwin Sah, Yi Sun, Mark Sellke, Alex Gunning, Paul Kreiner, David Girardo, Danit Gal, Oliver Habryka, Sarah Constantin, Alex Flint, Stag Lynn, Andis Draguns, Tristan Hume, Holden Lee, David Dohan, and Daniel Kang for enlightening conversations about AGI timelines, and I'd like to apologize to anyone whose name I ought to have included, but forgot to include. Table of contents As I post over the coming weeks, I'll update this table of contents with links to the posts, and I might update some of the titles and descriptions. How special are human brains among animal brains? Humans can perform intellectual feats that appear qualitatively different from those of other animals, but are our brains really doing anything so different? How u...

The Nonlinear Library: Alignment Forum Top Posts
A Critique of Functional Decision Theory by wdmacaskill

The Nonlinear Library: Alignment Forum Top Posts

Play Episode Listen Later Dec 5, 2021 34:41


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: A Critique of Functional Decision Theory, published by wdmacaskill on the AI Alignment Forum. A Critique of Functional Decision Theory NB: My writing this note was prompted by Carl Shulman, who suggested we could try a low-time-commitment way of attempting to understanding the disagreement between some folks in the rationality community and academic decision theorists (including myself, though I'm not much of a decision theorist). Apologies that it's sloppier than I'd usually aim for in a philosophy paper, and lacking in appropriate references. And, even though the paper is pretty negative about FDT, I want to emphasise that my writing this should be taken as a sign of respect for those involved in developing FDT. I'll also caveat I'm unlikely to have time to engage in the comments; I thought it was better to get this out there all the same rather than delay publication further. Introduction There's a long-running issue where many in the rationality community take functional decision theory (and its variants) very seriously, but the academic decision theory community does not. But there's been little public discussion of FDT from academic decision theorists (one exception is here); this note attempts to partly address this gap. So that there's a clear object of discussion, I'm going to focus on Yudkowsky and Soares' ‘Functional Decision Theory' (which I'll refer to as Y&S), though I also read a revised version of Soares and Levinstein's Cheating Death in Damascus. This note is structured as follows. Section II describes causal decision theory (CDT), evidential decision theory (EDT) and functional decision theory (FDT). Sections III-VI describe problems for FDT: (i) that it sometimes makes bizarre recommendations, recommending an option that is certainly lower-utility than another option; (ii) that it fails to one-box in most instances of Newcomb's problem, even though the correctness of one-boxing is supposed to be one of the guiding motivations for the theory; (iii) that it results in implausible discontinuities, where what is rational to do can depend on arbitrarily small changes to the world; and (iv) that, because there's no real fact of the matter about whether a particular physical process implements a particular algorithm, it's deeply indeterminate what FDT's implications are. In section VII I discuss the idea that FDT ‘does better at getting utility' than EDT or CDT; I argue that Y&S's claims to this effect are unhelpfully vague, and on any more precise way of understanding their claim, aren't plausible. In section VIII I briefly describe a view that captures some of the motivation behind FDT, and in my view is more plausible. I conclude that FDT faces a number of deep problems and little to say in its favour. In what follows, I'm going to assume a reasonable amount of familiarity with the debate around Newcomb's problem. II. CDT, EDT and FDT Informally: CDT, EDT and FDT differ in what non-causal correlations they care about when evaluating a decision. For CDT, what you cause to happen is all that matters; if your action correlates with some good outcome, that's nice to know, but it's not relevant to what you ought to do. For EDT, all correlations matter: you should pick whatever action will result in you believing you will have the highest expected utility. 
For FDT, only some non-causal correlations matter, namely only those correlations between your action and events elsewhere in time and space that would be different in the (logically impossible) worlds in which the output of the algorithm you're running is different. Other than for those correlations, FDT behaves in the same way as CDT. Formally, where S represents states of nature, A, B, etc. represent acts, P is a probability function, and U(S_i ∧ A) represents the utility the agent gains from the outcome of...
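For readers less familiar with the Newcomb debate the note assumes familiarity with, here is a toy calculation showing how CDT and EDT come apart there. The predictor accuracy and payoffs are illustrative assumptions, and FDT's logical counterfactuals are not modelled, since they are not a simple expected-value sum of this kind.

```python
# Toy Newcomb's problem: EDT conditions on the act as evidence about the
# prediction; CDT treats the box contents as causally fixed.

ACCURACY = 0.99            # assumed predictor accuracy
SMALL, BIG = 1_000, 1_000_000

def edt_eu(action):
    """EDT: choosing to one-box is strong evidence the big box is full."""
    p_big = ACCURACY if action == "one-box" else 1 - ACCURACY
    return p_big * BIG + (SMALL if action == "two-box" else 0)

def cdt_eu(action, p_big_already_there):
    """CDT: the boxes are already filled; your choice can't change them."""
    return p_big_already_there * BIG + (SMALL if action == "two-box" else 0)

for a in ("one-box", "two-box"):
    print(a, "EDT:", edt_eu(a), "CDT (p=0.5):", cdt_eu(a, 0.5))
# EDT prefers one-boxing; CDT prefers two-boxing for any fixed p.
```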

The Nonlinear Library: Alignment Forum Top Posts
Rogue AGI Embodies Valuable Intellectual Property by Mark Xu, CarlShulman

The Nonlinear Library: Alignment Forum Top Posts

Play Episode Listen Later Dec 3, 2021 6:18


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Rogue AGI Embodies Valuable Intellectual Property, published by Mark Xu, CarlShulman on the AI Alignment Forum. This post was written by Mark Xu based on interviews with Carl Shulman. It was paid for by Open Philanthropy but is not representative of their views. Summary: Rogue AGI has access to its embodied IP. This IP will be worth a moderate fraction of the total value of the market created by models approximately as powerful as the rogue AGI. If investors realize that most economic output will eventually come from AGI, as in slow takeoff scenarios, then these markets will involve moderate fractions of the world's wealth. Therefore, rogue AGI will embody IP worth a non-trivial fraction of the world's wealth and potentially have a correspondingly large influence on the world. A naive story for how humanity goes extinct from AI: Alpha Inc. spends a trillion dollars to create Alice the AGI. Alice escapes from whatever oversight mechanisms were employed to ensure alignment by uploading a copy of itself onto the internet. Alice does not have to pay an alignment tax, and so outcompetes Alpha and takes over the world. On its face, this story contains some shaky arguments. In particular, Alpha is initially going to have 100x-1,000,000x more resources than Alice. Even if Alice grows its resources faster, the alignment tax would have to be very large for Alice to end up with control of a substantial fraction of the world's resources. As an analogy, imagine that an employee of a trillion-dollar hedge fund, which trades based on proprietary strategies, goes rogue. This employee has 100 million dollars, approximately 10,000x fewer resources than the hedge fund. Even if the employee engaged in unethical business practices to achieve a 2x higher yearly growth rate than their former employer, it would take 13 years for them to have a similar amount of capital. However, the amount of resources the rogue hedge fund employee has is not equivalent to the amount of money the employee has. The value of a hedge fund is not just the amount of money they have, but rather their ability to outperform the market, of which trading strategies and money are two significant components. An employee that knows the proprietary strategies thus can carry a significant fraction of the fund's total wealth, perhaps closer to 10% than 0.01%. In this view, the primary value the employee has is their former employer's high-performing trading strategies: knowledge they can potentially sell to other hedge funds. Similarly, Alpha's expected future revenue is a combination of Alice's weights, inference hardware, deployment infrastructure, etc. Since Alice is its weights, it has access to IP that's potentially worth a significant fraction of Alpha's expected future revenue. Alice is to Alpha as Google search is to Alphabet. Suppose that Alpha currently has a monopoly on the Alice-powered models, but Beta Inc. is looking to enter the market. Naively, it took a trillion dollars to produce Alice, so Alice can sell its weights to Beta for a trillion dollars. However, if Beta were to enter the Alice-powered model market, the presence of a competitor would introduce price competition, decreasing the size of the Alice-powered model market. 
Brand loyalty/customer inertia, legal enforcement against pirated IP, and distrust of rogue AGI could all disadvantage Beta in the share of the market it captures. On the other hand, Beta might have advantages over Alpha that would cause the Alice-powered model market to get larger, e.g., it might be located in a different legal jurisdiction (where export controls or other political issues prevented access to Alpha's technology) or have established complementary assets such as robots/chip fabs/human labor for AI supervision. Assuming that the discounted value of a monopoly in this IP ...
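A quick check of the hedge-fund analogy's arithmetic, reading "a 2x higher yearly growth rate" as the employee's annual growth factor being twice the fund's (an interpretive assumption), which reproduces the excerpt's ~13-year figure:

```python
# Catch-up time when starting 10,000x behind but gaining a factor of 2 per year.

import math

capital_gap = 1e12 / 1e8      # fund starts with 10,000x the employee's capital
relative_growth = 2.0         # employee's annual growth factor assumed to be 2x the fund's
years = math.log(capital_gap) / math.log(relative_growth)
print(f"~{years:.1f} years to reach parity")  # ~13.3, matching the "13 years" above
```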

80,000 Hours Podcast with Rob Wiblin
#112 - Carl Shulman on the common-sense case for existential risk work and its practical implications

80,000 Hours Podcast with Rob Wiblin

Play Episode Listen Later Oct 6, 2021 228:39


Preventing the apocalypse may sound like an idiosyncratic activity, and it sometimes is justified on exotic grounds, such as the potential for humanity to become a galaxy-spanning civilisation. But the policy of US government agencies is already to spend up to $4 million to save the life of a citizen, making the death of all Americans a $1,300,000,000,000,000 disaster. According to Carl Shulman, research associate at Oxford University's Future of Humanity Institute, that means you don't need any fancy philosophical arguments about the value or size of the future to justify working to reduce existential risk - it passes a mundane cost-benefit analysis whether or not you place any value on the long-term future. Links to learn more, summary and full transcript. The key reason to make it a top priority is factual, not philosophical. That is, the risk of a disaster that kills billions of people alive today is alarmingly high, and it can be reduced at a reasonable cost. A back-of-the-envelope version of the argument runs: * The US government is willing to pay up to $4 million (depending on the agency) to save the life of an American. * So saving all US citizens at any given point in time would be worth $1,300 trillion. * If you believe that the risk of human extinction over the next century is something like one in six (as Toby Ord suggests is a reasonable figure in his book The Precipice), then it would be worth the US government spending up to $2.2 trillion to reduce that risk by just 1%, in terms of American lives saved alone. * Carl thinks it would cost a lot less than that to achieve a 1% risk reduction if the money were spent intelligently. So it easily passes a government cost-benefit test, with a very big benefit-to-cost ratio - likely over 1000:1 today. This argument helped NASA get funding to scan the sky for any asteroids that might be on a collision course with Earth, and it was directly promoted by famous economists like Richard Posner, Larry Summers, and Cass Sunstein. If the case is clear enough, why hasn't it already motivated a lot more spending or regulations to limit existential risks - enough to drive down what any additional efforts would achieve? Carl thinks that one key barrier is that infrequent disasters are rarely politically salient. Research indicates that extra money is spent on flood defences in the years immediately following a massive flood - but as memories fade, that spending quickly dries up. Of course the annual probability of a disaster was the same the whole time; all that changed is what voters had on their minds. Carl expects that all the reasons we didn't adequately prepare for or respond to COVID-19 - with excess mortality over 15 million and costs well over $10 trillion - bite even harder when it comes to threats we've never faced before, such as engineered pandemics, risks from advanced artificial intelligence, and so on. Today's episode is in part our way of trying to improve this situation. 
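The back-of-the-envelope numbers in this argument, spelled out. The $4 million value per life, roughly 330 million Americans, the one-in-six century-level risk, and the 1% relative risk reduction are all taken from the description above; nothing else is assumed.

```python
# Reproducing the episode's cost-benefit sketch.

value_per_life = 4e6       # upper end of what US agencies will spend to save one life
us_population = 330e6      # roughly all US citizens
extinction_risk = 1 / 6    # Ord's illustrative risk of extinction this century
risk_reduction = 0.01      # a 1% relative reduction in that risk

value_all_lives = value_per_life * us_population
worth_spending = value_all_lives * extinction_risk * risk_reduction

print(f"Value of all US lives: ${value_all_lives / 1e12:,.0f} trillion")          # ~ the $1,300 trillion figure (rounded)
print(f"Justified spend for a 1% risk reduction: ${worth_spending / 1e12:.1f} trillion")  # ~$2.2 trillion
```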
In today's wide-ranging conversation, Carl and Rob also cover: * A few reasons Carl isn't excited by 'strong longtermism' * How x-risk reduction compares to GiveWell recommendations * Solutions for asteroids, comets, supervolcanoes, nuclear war, pandemics, and climate change * The history of bioweapons * Whether gain-of-function research is justifiable * Successes and failures around COVID-19 * The history of existential risk * And much more Get this episode by subscribing to our podcast on the world's most pressing problems and how to solve them: type 80,000 Hours into your podcasting app. Producer: Keiran Harris Audio mastering: Ben Cordell Transcriptions: Katy Moore