POPULARITY
Will LLMs soon be made into autonomous agents? Will they lead to job losses? Is AI misinformation overblown? Will it prove easy or hard to create AGI? And how likely is it that it will feel like something to be a superhuman AGI?With AGI back in the headlines, we bring you 15 opinionated highlights from the show addressing those and other questions, intermixed with opinions from hosts Luisa Rodriguez and Rob Wiblin recorded back in 2023.Check out the full transcript on the 80,000 Hours website.You can decide whether the views we expressed (and those from guests) then have held up these last two busy years. You'll hear:Ajeya Cotra on overrated AGI worriesHolden Karnofsky on the dangers of aligned AI, why unaligned AI might not kill us, and the power that comes from just making models biggerIan Morris on why the future must be radically different from the presentNick Joseph on whether his companies internal safety policies are enoughRichard Ngo on what everyone gets wrong about how ML models workTom Davidson on why he believes crazy-sounding explosive growth stories… and Michael Webb on why he doesn'tCarl Shulman on why you'll prefer robot nannies over human onesZvi Mowshowitz on why he's against working at AI companies except in some safety rolesHugo Mercier on why even superhuman AGI won't be that persuasiveRob Long on the case for and against digital sentienceAnil Seth on why he thinks consciousness is probably biologicalLewis Bollard on whether AI advances will help or hurt nonhuman animalsRohin Shah on whether humanity's work ends at the point it creates AGIAnd of course, Rob and Luisa also regularly chime in on what they agree and disagree with.Chapters:Cold open (00:00:00)Rob's intro (00:00:58)Rob & Luisa: Bowerbirds compiling the AI story (00:03:28)Ajeya Cotra on the misalignment stories she doesn't buy (00:09:16)Rob & Luisa: Agentic AI and designing machine people (00:24:06)Holden Karnofsky on the dangers of even aligned AI, and how we probably won't all die from misaligned AI (00:39:20)Ian Morris on why we won't end up living like The Jetsons (00:47:03)Rob & Luisa: It's not hard for nonexperts to understand we're playing with fire here (00:52:21)Nick Joseph on whether AI companies' internal safety policies will be enough (00:55:43)Richard Ngo on the most important misconception in how ML models work (01:03:10)Rob & Luisa: Issues Rob is less worried about now (01:07:22)Tom Davidson on why he buys the explosive economic growth story, despite it sounding totally crazy (01:14:08)Michael Webb on why he's sceptical about explosive economic growth (01:20:50)Carl Shulman on why people will prefer robot nannies over humans (01:28:25)Rob & Luisa: Should we expect AI-related job loss? (01:36:19)Zvi Mowshowitz on why he thinks it's a bad idea to work on improving capabilities at cutting-edge AI companies (01:40:06)Holden Karnofsky on the power that comes from just making models bigger (01:45:21)Rob & Luisa: Are risks of AI-related misinformation overblown? (01:49:49)Hugo Mercier on how AI won't cause misinformation pandemonium (01:58:29)Rob & Luisa: How hard will it actually be to create intelligence? 
(02:09:08)Robert Long on whether digital sentience is possible (02:15:09)Anil Seth on why he believes in the biological basis of consciousness (02:27:21)Lewis Bollard on whether AI will be good or bad for animal welfare (02:40:52)Rob & Luisa: The most interesting new argument Rob's heard this year (02:50:37)Rohin Shah on whether AGI will be the last thing humanity ever does (02:57:35)Rob's outro (03:11:02)Audio engineering: Ben Cordell, Milo McGuire, Simon Monsour, and Dominic ArmstrongTranscriptions and additional content editing: Katy Moore
Podcast: AI SummerEpisode: Ajeya Cotra on AI safety and the future of humanityRelease date: 2025-01-16Get Podcast Transcript →powered by Listen411 - fast audio-to-text and summarizationAjeya Cotra works at Open Philanthropy, a leading funder of efforts to combat existential risks from AI. She has led the foundation's grantmaking on technical research to understand and reduce catastrophic risks from advanced AI. She is co-author of Planned Obsolescence, a newsletter about AI futurism and AI alignment.Although a committed doomer herself, Cotra has worked hard to understand the perspectives of AI safety skeptics. In this episode, we asked her to guide us through the contentious debate over AI safety and—perhaps—explain why people with similar views on other issues frequently reach divergent views on this one. We spoke to Cotra on December 10. This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit www.aisummer.org
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Estimating Tail Risk in Neural Networks, published by Jacob Hilton on September 13, 2024 on The AI Alignment Forum. Machine learning systems are typically trained to maximize average-case performance. However, this method of training can fail to meaningfully control the probability of tail events that might cause significant harm. For instance, while an artificial intelligence (AI) assistant may be generally safe, it would be catastrophic if it ever suggested an action that resulted in unnecessary large-scale harm. Current techniques for estimating the probability of tail events are based on finding inputs on which an AI behaves catastrophically. Since the input space is so large, it might be prohibitive to search through it thoroughly enough to detect all potential catastrophic behavior. As a result, these techniques cannot be used to produce AI systems that we are confident will never behave catastrophically. We are excited about techniques to estimate the probability of tail events that do not rely on finding inputs on which an AI behaves badly, and can thus detect a broader range of catastrophic behavior. We think developing such techniques is an exciting problem to work on to reduce the risk posed by advanced AI systems: Estimating tail risk is a conceptually straightforward problem with relatively objective success criteria; we are predicting something mathematically well-defined, unlike instances of eliciting latent knowledge (ELK) where we are predicting an informal concept like "diamond". Improved methods for estimating tail risk could reduce risk from a variety of sources, including central misalignment risks like deceptive alignment. Improvements to current methods can be found both by doing empirical research, or by thinking about the problem from a theoretical angle. This document will discuss the problem of estimating the probability of tail events and explore estimation strategies that do not rely on finding inputs on which an AI behaves badly. In particular, we will: Introduce a toy scenario about an AI engineering assistant for which we want to estimate the probability of a catastrophic tail event. Explain some deficiencies of adversarial training, the most common method for reducing risk in contemporary AI systems. Discuss deceptive alignment as a particularly dangerous case in which adversarial training might fail. Present methods for estimating the probability of tail events in neural network behavior that do not rely on evaluating behavior on concrete inputs. Conclude with a discussion of why we are excited about work aimed at improving estimates of the probability of tail events. This document describes joint research done with Jacob Hilton, Victor Lecomte, David Matolcsi, Eric Neyman, Thomas Read, George Robinson, and Gabe Wu. Thanks additionally to Ajeya Cotra, Lukas Finnveden, and Erik Jenner for helpful comments and suggestions. A Toy Scenario Consider a powerful AI engineering assistant. Write M for this AI system, and M(x) for the action it suggests given some project description x. We want to use this system to help with various engineering projects, but would like it to never suggest an action that results in large-scale harm, e.g. creating a doomsday device. 
In general, we define a behavior as catastrophic if it must never occur in the real world.[1] An input is catastrophic if it would lead to catastrophic behavior. Assume we can construct a catastrophe detector C that tells us if an action M(x) will result in large-scale harm. For the purposes of this example, we will assume both that C has a reasonable chance of catching all catastrophes and that it is feasible to find a useful engineering assistant M that never triggers C (see Catastrophe Detectors for further discussion). We will also assume we can use C to train M, but that it is ...
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Ten arguments that AI is an existential risk, published by KatjaGrace on August 13, 2024 on LessWrong. This is a snapshot of a new page on the AI Impacts Wiki. We've made a list of arguments[1] that AI poses an existential risk to humanity. We'd love to hear how you feel about them in the comments and polls. Competent non-aligned agents Summary: 1. Humans will build AI systems that are 'agents', i.e. they will autonomously pursue goals 2. Humans won't figure out how to make systems with goals that are compatible with human welfare and realizing human values 3. Such systems will be built or selected to be highly competent, and so gain the power to achieve their goals 4. Thus the future will be primarily controlled by AIs, who will direct it in ways that are at odds with long-run human welfare or the realization of human values Selected counterarguments: It is unclear that AI will tend to have goals that are bad for humans There are many forms of power. It is unclear that a competence advantage will ultimately trump all others in time This argument also appears to apply to human groups such as corporations, so we need an explanation of why those are not an existential risk People who have favorably discussed[2] this argument (specific quotes here): Paul Christiano (2021), Ajeya Cotra (2023), Eliezer Yudkowsky (2024), Nick Bostrom (2014[3]). See also: Full wiki page on the competent non-aligned agents argument Second species argument Summary: 1. Human dominance over other animal species is primarily due to humans having superior cognitive and coordination abilities 2. Therefore if another 'species' appears with abilities superior to those of humans, that species will become dominant over humans in the same way 3. AI will essentially be a 'species' with superior abilities to humans 4. Therefore AI will dominate humans Selected counterarguments: Human dominance over other species is plausibly not due to the cognitive abilities of individual humans, but rather because of human ability to communicate and store information through culture and artifacts Intelligence in animals doesn't appear to generally relate to dominance. For instance, elephants are much more intelligent than beetles, and it is not clear that elephants have dominated beetles Differences in capabilities don't necessarily lead to extinction. In the modern world, more powerful countries arguably control less powerful countries, but they do not wipe them out and most colonized countries have eventually gained independence People who have favorably discussed this argument (specific quotes here): Joe Carlsmith (2024), Richard Ngo (2020), Stuart Russell (2020[4]), Nick Bostrom (2015). See also: Full wiki page on the second species argument Loss of control via inferiority Summary: 1. AI systems will become much more competent than humans at decision-making 2. Thus most decisions will probably be allocated to AI systems 3. If AI systems make most decisions, humans will lose control of the future 4. 
If humans have no control of the future, the future will probably be bad for humans Selected counterarguments: Humans do not generally seem to become disempowered by possession of software that is far superior to them, even if it makes many 'decisions' in the process of carrying out their will In the same way that humans avoid being overpowered by companies, even though companies are more competent than individual humans, humans can track AI trustworthiness and have AI systems compete for them as users. This might substantially mitigate untrustworthy AI behavior People who have favorably discussed this argument (specific quotes here): Paul Christiano (2014), Ajeya Cotra (2023), Richard Ngo (2024). See also: Full wiki page on loss of control via inferiority Loss of control via speed Summary: 1. Advances in AI will produce...
Link to original articleWelcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Ten arguments that AI is an existential risk, published by KatjaGrace on August 13, 2024 on LessWrong. This is a snapshot of a new page on the AI Impacts Wiki. We've made a list of arguments[1] that AI poses an existential risk to humanity. We'd love to hear how you feel about them in the comments and polls. Competent non-aligned agents Summary: 1. Humans will build AI systems that are 'agents', i.e. they will autonomously pursue goals 2. Humans won't figure out how to make systems with goals that are compatible with human welfare and realizing human values 3. Such systems will be built or selected to be highly competent, and so gain the power to achieve their goals 4. Thus the future will be primarily controlled by AIs, who will direct it in ways that are at odds with long-run human welfare or the realization of human values Selected counterarguments: It is unclear that AI will tend to have goals that are bad for humans There are many forms of power. It is unclear that a competence advantage will ultimately trump all others in time This argument also appears to apply to human groups such as corporations, so we need an explanation of why those are not an existential risk People who have favorably discussed[2] this argument (specific quotes here): Paul Christiano (2021), Ajeya Cotra (2023), Eliezer Yudkowsky (2024), Nick Bostrom (2014[3]). See also: Full wiki page on the competent non-aligned agents argument Second species argument Summary: 1. Human dominance over other animal species is primarily due to humans having superior cognitive and coordination abilities 2. Therefore if another 'species' appears with abilities superior to those of humans, that species will become dominant over humans in the same way 3. AI will essentially be a 'species' with superior abilities to humans 4. Therefore AI will dominate humans Selected counterarguments: Human dominance over other species is plausibly not due to the cognitive abilities of individual humans, but rather because of human ability to communicate and store information through culture and artifacts Intelligence in animals doesn't appear to generally relate to dominance. For instance, elephants are much more intelligent than beetles, and it is not clear that elephants have dominated beetles Differences in capabilities don't necessarily lead to extinction. In the modern world, more powerful countries arguably control less powerful countries, but they do not wipe them out and most colonized countries have eventually gained independence People who have favorably discussed this argument (specific quotes here): Joe Carlsmith (2024), Richard Ngo (2020), Stuart Russell (2020[4]), Nick Bostrom (2015). See also: Full wiki page on the second species argument Loss of control via inferiority Summary: 1. AI systems will become much more competent than humans at decision-making 2. Thus most decisions will probably be allocated to AI systems 3. If AI systems make most decisions, humans will lose control of the future 4. 
If humans have no control of the future, the future will probably be bad for humans Selected counterarguments: Humans do not generally seem to become disempowered by possession of software that is far superior to them, even if it makes many 'decisions' in the process of carrying out their will In the same way that humans avoid being overpowered by companies, even though companies are more competent than individual humans, humans can track AI trustworthiness and have AI systems compete for them as users. This might substantially mitigate untrustworthy AI behavior People who have favorably discussed this argument (specific quotes here): Paul Christiano (2014), Ajeya Cotra (2023), Richard Ngo (2024). See also: Full wiki page on loss of control via inferiority Loss of control via speed Summary: 1. Advances in AI will produce...
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Why so many "racists" at Manifest?, published by Austin on June 18, 2024 on The Effective Altruism Forum. Manifest 2024 is a festival that we organized last weekend in Berkeley. By most accounts, it was a great success. On our feedback form, the average response to "would you recommend to a friend" was a 9.0/10. Reviewers said nice things like "one of the best weekends of my life" and "dinners and meetings and conversations with people building local cultures so achingly beautiful they feel almost like dreams" and "I've always found tribalism mysterious, but perhaps that was just because I hadn't yet found my tribe." Arnold Brooks running a session on Aristotle's Metaphysics. More photos of Manifest here. However, a recent post on The Guardian and review on the EA Forum highlight an uncomfortable fact: we invited a handful of controversial speakers to Manifest, whom these authors call out as "racist". Why did we invite these folks? First: our sessions and guests were mostly not controversial - despite what you may have heard Here's the schedule for Manifest on Saturday: (The largest & most prominent talks are on the left. Full schedule here.) And here's the full list of the 57 speakers we featured on our website: Nate Silver, Luana Lopes Lara, Robin Hanson, Scott Alexander, Niraek Jain-sharma, Byrne Hobart, Aella, Dwarkesh Patel, Patrick McKenzie, Chris Best, Ben Mann, Eliezer Yudkowsky, Cate Hall, Paul Gu, John Phillips, Allison Duettmann, Dan Schwarz, Alex Gajewski, Katja Grace, Kelsey Piper, Steve Hsu, Agnes Callard, Joe Carlsmith, Daniel Reeves, Misha Glouberman, Ajeya Cotra, Clara Collier, Samo Burja, Stephen Grugett, James Grugett, Javier Prieto, Simone Collins, Malcolm Collins, Jay Baxter, Tracing Woodgrains, Razib Khan, Max Tabarrok, Brian Chau, Gene Smith, Gavriel Kleinwaks, Niko McCarty, Xander Balwit, Jeremiah Johnson, Ozzie Gooen, Danny Halawi, Regan Arntz-Gray, Sarah Constantin, Frank Lantz, Will Jarvis, Stuart Buck, Jonathan Anomaly, Evan Miyazono, Rob Miles, Richard Hanania, Nate Soares, Holly Elmore, Josh Morrison. Judge for yourself; I hope this gives a flavor of what Manifest was actually like. Our sessions and guests spanned a wide range of topics: prediction markets and forecasting, of course; but also finance, technology, philosophy, AI, video games, politics, journalism and more. We deliberately invited a wide range of speakers with expertise outside of prediction markets; one of the goals of Manifest is to increase adoption of prediction markets via cross-pollination. Okay, but there sure seemed to be a lot of controversial ones… I was the one who invited the majority (~40/60) of Manifest's special guests; if you want to get mad at someone, get mad at me, not Rachel or Saul or Lighthaven; certainly not the other guests and attendees of Manifest. My criteria for inviting a speaker or special guest was roughly, "this person is notable, has something interesting to share, would enjoy Manifest, and many of our attendees would enjoy hearing from them". Specifically: Richard Hanania - I appreciate Hanania's support of prediction markets, including partnering with Manifold to run a forecasting competition on serious geopolitical topics and writing to the CFTC in defense of Kalshi. 
(In response to backlash last year, I wrote a post on my decision to invite Hanania, specifically) Simone and Malcolm Collins - I've enjoyed their Pragmatist's Guide series, which goes deep into topics like dating, governance, and religion. I think the world would be better with more kids in it, and thus support pronatalism. I also find the two of them to be incredibly energetic and engaging speakers IRL. Jonathan Anomaly - I attended a talk Dr. Anomaly gave about the state-of-the-art on polygenic embryonic screening. I was very impressed that something long-considered scien...
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Big Picture AI Safety: Introduction, published by EuanMcLean on May 23, 2024 on LessWrong. tldr: I conducted 17 semi-structured interviews of AI safety experts about their big picture strategic view of the AI safety landscape: how will human-level AI play out, how things might go wrong, and what should the AI safety community be doing. While many respondents held "traditional" views (e.g. the main threat is misaligned AI takeover), there was more opposition to these standard views than I expected, and the field seems more split on many important questions than someone outside the field may infer. What do AI safety experts believe about the big picture of AI risk? How might things go wrong, what we should do about it, and how have we done so far? Does everybody in AI safety agree on the fundamentals? Which views are consensus, which are contested and which are fringe? Maybe we could learn this from the literature (as in the MTAIR project), but many ideas and opinions are not written down anywhere, they exist only in people's heads and in lunchtime conversations at AI labs and coworking spaces. I set out to learn what the AI safety community believes about the strategic landscape of AI safety. I conducted 17 semi-structured interviews with a range of AI safety experts. I avoided going into any details of particular technical concepts or philosophical arguments, instead focussing on how such concepts and arguments fit into the big picture of what AI safety is trying to achieve. This work is similar to the AI Impacts surveys, Vael Gates' AI Risk Discussions, and Rob Bensinger's existential risk from AI survey. This is different to those projects in that both my approach to interviews and analysis are more qualitative. Part of the hope for this project was that it can hit on harder-to-quantify concepts that are too ill-defined or intuition-based to fit in the format of previous survey work. Questions I asked the participants a standardized list of questions. What will happen? Q1 Will there be a human-level AI? What is your modal guess of what the first human-level AI (HLAI) will look like? I define HLAI as an AI system that can carry out roughly 100% of economically valuable cognitive tasks more cheaply than a human. Q1a What's your 60% or 90% confidence interval for the date of the first HLAI? Q2 Could AI bring about an existential catastrophe? If so, what is the most likely way this could happen? Q2a What's your best guess at the probability of such a catastrophe? What should we do? Q3 Imagine a world where, absent any effort from the AI safety community, an existential catastrophe happens, but actions taken by the AI safety community prevent such a catastrophe. In this world, what did we do to prevent the catastrophe? Q4 What research direction (or other activity) do you think will reduce existential risk the most, and what is its theory of change? Could this backfire in some way? What mistakes have been made? Q5 Are there any big mistakes the AI safety community has made in the past or are currently making? These questions changed gradually as the interviews went on (given feedback from participants), and I didn't always ask the questions exactly as I've presented them here. 
I asked participants to answer from their internal model of the world as much as possible and to avoid deferring to the opinions of others (their inside view so to speak). Participants Adam Gleave is the CEO and co-founder of the alignment research non-profit FAR AI. (Sept 23) Adrià Garriga-Alonso is a research scientist at FAR AI. (Oct 23) Ajeya Cotra leads Open Philantropy's grantmaking on technical research that could help to clarify and reduce catastrophic risks from advanced AI. (Jan 24) Alex Turner is a research scientist at Google DeepMind on the Scalable Alignment team. (Feb 24) Ben Cottie...
Link to original articleWelcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Big Picture AI Safety: Introduction, published by EuanMcLean on May 23, 2024 on LessWrong. tldr: I conducted 17 semi-structured interviews of AI safety experts about their big picture strategic view of the AI safety landscape: how will human-level AI play out, how things might go wrong, and what should the AI safety community be doing. While many respondents held "traditional" views (e.g. the main threat is misaligned AI takeover), there was more opposition to these standard views than I expected, and the field seems more split on many important questions than someone outside the field may infer. What do AI safety experts believe about the big picture of AI risk? How might things go wrong, what we should do about it, and how have we done so far? Does everybody in AI safety agree on the fundamentals? Which views are consensus, which are contested and which are fringe? Maybe we could learn this from the literature (as in the MTAIR project), but many ideas and opinions are not written down anywhere, they exist only in people's heads and in lunchtime conversations at AI labs and coworking spaces. I set out to learn what the AI safety community believes about the strategic landscape of AI safety. I conducted 17 semi-structured interviews with a range of AI safety experts. I avoided going into any details of particular technical concepts or philosophical arguments, instead focussing on how such concepts and arguments fit into the big picture of what AI safety is trying to achieve. This work is similar to the AI Impacts surveys, Vael Gates' AI Risk Discussions, and Rob Bensinger's existential risk from AI survey. This is different to those projects in that both my approach to interviews and analysis are more qualitative. Part of the hope for this project was that it can hit on harder-to-quantify concepts that are too ill-defined or intuition-based to fit in the format of previous survey work. Questions I asked the participants a standardized list of questions. What will happen? Q1 Will there be a human-level AI? What is your modal guess of what the first human-level AI (HLAI) will look like? I define HLAI as an AI system that can carry out roughly 100% of economically valuable cognitive tasks more cheaply than a human. Q1a What's your 60% or 90% confidence interval for the date of the first HLAI? Q2 Could AI bring about an existential catastrophe? If so, what is the most likely way this could happen? Q2a What's your best guess at the probability of such a catastrophe? What should we do? Q3 Imagine a world where, absent any effort from the AI safety community, an existential catastrophe happens, but actions taken by the AI safety community prevent such a catastrophe. In this world, what did we do to prevent the catastrophe? Q4 What research direction (or other activity) do you think will reduce existential risk the most, and what is its theory of change? Could this backfire in some way? What mistakes have been made? Q5 Are there any big mistakes the AI safety community has made in the past or are currently making? These questions changed gradually as the interviews went on (given feedback from participants), and I didn't always ask the questions exactly as I've presented them here. 
I asked participants to answer from their internal model of the world as much as possible and to avoid deferring to the opinions of others (their inside view so to speak). Participants Adam Gleave is the CEO and co-founder of the alignment research non-profit FAR AI. (Sept 23) Adrià Garriga-Alonso is a research scientist at FAR AI. (Oct 23) Ajeya Cotra leads Open Philantropy's grantmaking on technical research that could help to clarify and reduce catastrophic risks from advanced AI. (Jan 24) Alex Turner is a research scientist at Google DeepMind on the Scalable Alignment team. (Feb 24) Ben Cottie...
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Now THIS is forecasting: understanding Epoch's Direct Approach, published by Elliot Mckernon on May 5, 2024 on The Effective Altruism Forum. Happy May the 4th from Convergence Analysis! Cross-posted on LessWrong. As part of Convergence Analysis's scenario research, we've been looking into how AI organisations, experts, and forecasters make predictions about the future of AI. In February 2023, the AI research institute Epoch published a report in which its authors use neural scaling laws to make quantitative predictions about when AI will reach human-level performance and become transformative. The report has a corresponding blog post, an interactive model, and a Python notebook. We found this approach really interesting, but also hard to understand intuitively. While trying to follow how the authors derive a forecast from their assumptions, we wrote a breakdown that may be useful to others thinking about AI timelines and forecasting. In what follows, we set out our interpretation of Epoch's 'Direct Approach' to forecasting the arrival of transformative AI (TAI). We're eager to see how closely our understanding of this matches others'. We've also fiddled with Epoch's interactive model and include some findings on its sensitivity to plausible changes in parameters. The Epoch team recently attempted to replicate DeepMind's influential Chinchilla scaling law, an important quantitative input to Epoch's forecasting model, but found inconsistencies in DeepMind's presented data. We'll summarise these findings and explore how an improved model might affect Epoch's forecasting results. This is where the fun begins (the assumptions) The goal of Epoch's Direct Approach is to quantitatively predict the progress of AI capabilities. The approach is 'direct' in the sense that it uses observed scaling laws and empirical measurements to directly predict performance improvements as computing power increases. This stands in contrast to indirect techniques, which instead seek to estimate a proxy for performance. A notable example is Ajeya Cotra's Biological Anchors model, which approximates AI performance improvements by appealing to analogies between AIs and human brains. Both of these approaches are discussed and compared, along with expert surveys and other forecasting models, in Zershaaneh Qureshi's recent post, Timelines to Transformative AI: an investigation. In their blog post, Epoch summarises the Direct Approach as follows: The Direct Approach is our name for the idea of forecasting AI timelines by directly extrapolating and interpreting the loss of machine learning models as described by scaling laws. Let's start with scaling laws. Generally, these are just numerical relationships between two quantities, but in machine learning they specifically refer to the various relationships between a model's size, the amount of data it was trained with, its cost of training, and its performance. These relationships seem to fit simple mathematical trends, and so we can use them to make predictions: if we make the model twice as big - give it twice as much 'compute' - how much will its performance improve? Does the answer change if we use less training data? And so on. 
If we combine these relationships with projections of how much compute AI developers will have access to at certain times in the future, we can build a model which predicts when AI will cross certain performance thresholds. Epoch, like Convergence, is interested in when we'll see the emergence of transformative AI (TAI): AI powerful enough to revolutionise our society at a scale comparable to the agricultural and industrial revolutions. To understand why Convergence is especially interested in that milestone, see our recent post 'Transformative AI and Scenario Planning for AI X-risk'. Specifically, Epoch uses an empirically m...
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Now THIS is forecasting: understanding Epoch's Direct Approach, published by Elliot Mckernon on May 4, 2024 on LessWrong. Happy May the 4th from Convergence Analysis! Cross-posted on the EA Forum. As part of Convergence Analysis's scenario research, we've been looking into how AI organisations, experts, and forecasters make predictions about the future of AI. In February 2023, the AI research institute Epoch published a report in which its authors use neural scaling laws to make quantitative predictions about when AI will reach human-level performance and become transformative. The report has a corresponding blog post, an interactive model, and a Python notebook. We found this approach really interesting, but also hard to understand intuitively. While trying to follow how the authors derive a forecast from their assumptions, we wrote a breakdown that may be useful to others thinking about AI timelines and forecasting. In what follows, we set out our interpretation of Epoch's 'Direct Approach' to forecasting the arrival of transformative AI (TAI). We're eager to see how closely our understanding of this matches others'. We've also fiddled with Epoch's interactive model and include some findings on its sensitivity to plausible changes in parameters. The Epoch team recently attempted to replicate DeepMind's influential Chinchilla scaling law, an important quantitative input to Epoch's forecasting model, but found inconsistencies in DeepMind's presented data. We'll summarise these findings and explore how an improved model might affect Epoch's forecasting results. This is where the fun begins (the assumptions) The goal of Epoch's Direct Approach is to quantitatively predict the progress of AI capabilities. The approach is 'direct' in the sense that it uses observed scaling laws and empirical measurements to directly predict performance improvements as computing power increases. This stands in contrast to indirect techniques, which instead seek to estimate a proxy for performance. A notable example is Ajeya Cotra's Biological Anchors model, which approximates AI performance improvements by appealing to analogies between AIs and human brains. Both of these approaches are discussed and compared, along with expert surveys and other forecasting models, in Zershaaneh Qureshi's recent post, Timelines to Transformative AI: an investigation. In their blog post, Epoch summarises the Direct Approach as follows: The Direct Approach is our name for the idea of forecasting AI timelines by directly extrapolating and interpreting the loss of machine learning models as described by scaling laws. Let's start with scaling laws. Generally, these are just numerical relationships between two quantities, but in machine learning they specifically refer to the various relationships between a model's size, the amount of data it was trained with, its cost of training, and its performance. These relationships seem to fit simple mathematical trends, and so we can use them to make predictions: if we make the model twice as big - give it twice as much 'compute' - how much will its performance improve? Does the answer change if we use less training data? And so on. 
If we combine these relationships with projections of how much compute AI developers will have access to at certain times in the future, we can build a model which predicts when AI will cross certain performance thresholds. Epoch, like Convergence, is interested in when we'll see the emergence of transformative AI (TAI): AI powerful enough to revolutionise our society at a scale comparable to the agricultural and industrial revolutions. To understand why Convergence is especially interested in that milestone, see our recent post 'Transformative AI and Scenario Planning for AI X-risk'. Specifically, Epoch uses an empirically measured scaling ...
Link to original articleWelcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Now THIS is forecasting: understanding Epoch's Direct Approach, published by Elliot Mckernon on May 4, 2024 on LessWrong. Happy May the 4th from Convergence Analysis! Cross-posted on the EA Forum. As part of Convergence Analysis's scenario research, we've been looking into how AI organisations, experts, and forecasters make predictions about the future of AI. In February 2023, the AI research institute Epoch published a report in which its authors use neural scaling laws to make quantitative predictions about when AI will reach human-level performance and become transformative. The report has a corresponding blog post, an interactive model, and a Python notebook. We found this approach really interesting, but also hard to understand intuitively. While trying to follow how the authors derive a forecast from their assumptions, we wrote a breakdown that may be useful to others thinking about AI timelines and forecasting. In what follows, we set out our interpretation of Epoch's 'Direct Approach' to forecasting the arrival of transformative AI (TAI). We're eager to see how closely our understanding of this matches others'. We've also fiddled with Epoch's interactive model and include some findings on its sensitivity to plausible changes in parameters. The Epoch team recently attempted to replicate DeepMind's influential Chinchilla scaling law, an important quantitative input to Epoch's forecasting model, but found inconsistencies in DeepMind's presented data. We'll summarise these findings and explore how an improved model might affect Epoch's forecasting results. This is where the fun begins (the assumptions) The goal of Epoch's Direct Approach is to quantitatively predict the progress of AI capabilities. The approach is 'direct' in the sense that it uses observed scaling laws and empirical measurements to directly predict performance improvements as computing power increases. This stands in contrast to indirect techniques, which instead seek to estimate a proxy for performance. A notable example is Ajeya Cotra's Biological Anchors model, which approximates AI performance improvements by appealing to analogies between AIs and human brains. Both of these approaches are discussed and compared, along with expert surveys and other forecasting models, in Zershaaneh Qureshi's recent post, Timelines to Transformative AI: an investigation. In their blog post, Epoch summarises the Direct Approach as follows: The Direct Approach is our name for the idea of forecasting AI timelines by directly extrapolating and interpreting the loss of machine learning models as described by scaling laws. Let's start with scaling laws. Generally, these are just numerical relationships between two quantities, but in machine learning they specifically refer to the various relationships between a model's size, the amount of data it was trained with, its cost of training, and its performance. These relationships seem to fit simple mathematical trends, and so we can use them to make predictions: if we make the model twice as big - give it twice as much 'compute' - how much will its performance improve? Does the answer change if we use less training data? And so on. 
If we combine these relationships with projections of how much compute AI developers will have access to at certain times in the future, we can build a model which predicts when AI will cross certain performance thresholds. Epoch, like Convergence, is interested in when we'll see the emergence of transformative AI (TAI): AI powerful enough to revolutionise our society at a scale comparable to the agricultural and industrial revolutions. To understand why Convergence is especially interested in that milestone, see our recent post 'Transformative AI and Scenario Planning for AI X-risk'. Specifically, Epoch uses an empirically measured scaling ...
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Timelines to Transformative AI: an investigation, published by Zershaaneh Qureshi on March 26, 2024 on The Effective Altruism Forum. This post is part of a series by Convergence Analysis' AI Clarity team. Justin Bullock and Elliot Mckernon have recently motivated AI Clarity's focus on the notion of transformative AI (TAI). In an earlier post, Corin Katzke introduced a framework for applying scenario planning methods to AI safety, including a discussion of strategic parameters involved in AI existential risk. In this post, I focus on a specific parameter: the timeline to TAI. Subsequent posts will explore 'short' timelines to transformative AI in more detail. Feedback and discussion are welcome. Summary In this post, I gather, compare, and investigate a range of notable recent predictions of the timeline to transformative AI (TAI). Over the first three sections, I map out a bird's eye view of the current landscape of predictions, highlight common assumptions about scaling which influence many of the surveyed views, then zoom in closer to examine two specific examples of quantitative forecast models for the arrival of TAI (from Ajeya Cotra and Epoch). Over the final three sections, I find that: A majority of recent median predictions for the arrival of TAI fall within the next 10-40 years. This is a notable result given the vast possible space of timelines, but rough similarities between forecasts should be treated with some epistemic caution in light of phenomena such as Platt's Law and information cascades. In the last few years, people generally seem to be updating their beliefs in the direction of shorter timelines to TAI. There are important questions over how the significance of this very recent trend should be interpreted within the wider historical context of AI timeline predictions, which have been quite variable over time and across sources. Despite difficulties in obtaining a clean overall picture here, each individual example of belief updates still has some evidentiary weight in its own right. There is also some conceptual support in favour of TAI timelines which fall on the shorter end of the spectrum. This comes partly in the form of the plausible assumption that the scaling hypothesis will continue to hold. However, there are several possible flaws in reasoning which may underlie prevalent beliefs about TAI timelines, and we should therefore take care to avoid being overconfident in our predictions. Weighing these points up against potential objections, the evidence still appears sufficient to warrant (1) conducting serious further research into short timeline scenarios and (2) affording real importance to these scenarios in our strategic preparation efforts. Introduction The timeline for the arrival of advanced AI is a key consideration for AI safety and governance. It is a critical determinant of the threat models we are likely to face, the magnitude of those threats, and the appropriate strategies for mitigating them. Recent years have seen growing discourse around the question of what AI timelines we should expect and prepare for. 
At a glance, the dialogue is filled with contention: some anticipate rapid progression towards advanced AI, and therefore advocate for urgent action; others are highly sceptical that we'll see significant progress in our lifetimes; many views fall somewhere in between these poles, with unclear strategic implications. The dialogue is also evolving, as AI research and development progresses in new and sometimes unexpected ways. Overall, the body of evidence this constitutes is in need of clarification and interpretation. This article is an effort to navigate the rough terrain of AI timeline predictions. Specifically: Section I collects and loosely compares a range of notable, recent predictions on AI timelines (taken from su...
Our next SF event is AI UX 2024 - let's see the new frontier for UX since last year! Last call: we are recording a preview of the AI Engineer World's Fair with swyx and Ben Dunphy, send any questions about Speaker CFPs and Sponsor Guides you have!Alessio is now hiring engineers for a new startup he is incubating at Decibel: Ideal candidate is an “ex-technical co-founder type”. Reach out to him for more!David Luan has been at the center of the modern AI revolution: he was the ~30th hire at OpenAI, he led Google's LLM efforts and co-led Google Brain, and then started Adept in 2022, one of the leading companies in the AI agents space. In today's episode, we asked David for some war stories from his time in early OpenAI (including working with Alec Radford ahead of the GPT-2 demo with Sam Altman, that resulted in Microsoft's initial $1b investment), and how Adept is building agents that can “do anything a human does on a computer" — his definition of useful AGI.Why Google *couldn't* make GPT-3While we wanted to discuss Adept, we couldn't talk to a former VP Eng of OpenAI and former LLM tech lead at Google Brain and not ask about the elephant in the room. It's often asked how Google had such a huge lead in 2017 with Vaswani et al creating the Transformer and Noam Shazeer predicting trillion-parameter models and yet it was David's team at OpenAI who ended up making GPT 1/2/3. David has some interesting answers:“So I think the real story of GPT starts at Google, of course, right? Because that's where Transformers sort of came about. However, the number one shocking thing to me was that, and this is like a consequence of the way that Google is organized…what they (should) have done would be say, hey, Noam Shazeer, you're a brilliant guy. You know how to scale these things up. Here's half of all of our TPUs. And then I think they would have destroyed us. He clearly wanted it too…You know, every day we were scaling up GPT-3, I would wake up and just be stressed. And I was stressed because, you know, you just look at the facts, right? Google has all this compute. Google has all the people who invented all of these underlying technologies. There's a guy named Noam who's really smart, who's already gone and done this talk about how he wants a trillion parameter model. And I'm just like, we're probably just doing duplicative research to what he's doing. He's got this decoder only transformer that's probably going to get there before we do. And it turned out the whole time that they just couldn't get critical mass. So during my year where I led the Google LM effort and I was one of the brain leads, you know, it became really clear why. At the time, there was a thing called the Brain Credit Marketplace. Everyone's assigned a credit. So if you have a credit, you get to buy end chips according to supply and demand. So if you want to go do a giant job, you had to convince like 19 or 20 of your colleagues not to do work. And if that's how it works, it's really hard to get that bottom up critical mass to go scale these things. And the team at Google were fighting valiantly, but we were able to beat them simply because we took big swings and we focused.”Cloning HGI for AGIHuman intelligence got to where it is today through evolution. 
Some argue that to get to AGI, we will approximate all the “FLOPs” that went into that process, an approach most famously mapped out by Ajeya Cotra's Biological Anchors report:The early days of OpenAI were very reinforcement learning-driven with the Dota project, but that's a very inefficient way for these models to re-learn everything. (Kanjun from Imbue shared similar ideas in her episode).David argues that there's a shortcut. We can bootstrap from existing intelligence.“Years ago, I had a debate with a Berkeley professor as to what will it actually take to build AGI. And his view is basically that you have to reproduce all the flops that went into evolution in order to be able to get there… I think we are ignoring the fact that you have a giant shortcut, which is you can behaviorally clone everything humans already know. And that's what we solved with LLMs!”LLMs today basically model intelligence using all (good!) written knowledge (see our Datasets 101 episode), and have now expanded to non-verbal knowledge (see our HuggingFace episode on multimodality). The SOTA self-supervised pre-training process is surprisingly data-efficient in taking large amounts of unstructured data, and approximating reasoning without overfitting.But how do you cross the gap from the LLMs of today to building the AGI we all want? This is why David & friends left to start Adept.“We believe the clearest framing of general intelligence is a system that can do anything a human can do in front of a computer. A foundation model for actions, trained to use every software tool, API, and webapp that exists, is a practical path to this ambitious goal” — ACT-1 BlogpostCritical Path: Abstraction with ReliabilityThe AGI dream is fully autonomous agents, but there are levels to autonomy that we are comfortable giving our agents, based on how reliable they are. In David's word choice, we always want higher levels of “abstractions” (aka autonomy), but our need for “reliability” is the practical limit on how high of an abstraction we can use.“The critical path for Adept is we want to build agents that can do a higher and higher level abstraction things over time, all while keeping an insanely high reliability standard. Because that's what turns us from research into something that customers want. And if you build agents with really high reliability standard, but are continuing pushing a level of abstraction, you then learn from your users how to get that next level of abstraction faster. So that's how you actually build the data flow. That's the critical path for the company. Everything we do is in service of that.”We saw how Adept thinks about different levels of abstraction at the 2023 Summit:The highest abstraction is the “AI Employee”, but we'll get there with “AI enabled employees”. Alessio recently gave a talk about the future of work with “services as software” at this week's Nvidia GTC (slides).No APIsUnlike a lot of large research labs, Adept's framing of AGI as "being able to use your computer like a human" carries with it a useful environmental constraint:“Having a human robot lets you do things that humans do without changing everything along the way. It's the same thing for software, right? If you go itemize out the number of things you want to do on your computer for which every step has an API, those numbers of workflows add up pretty close to zero. And so then many points along the way, you need the ability to actually control your computer like a human. 
It also lets you learn from human usage of computers as a source of training data that you don't get if you have to somehow figure out how every particular step needs to be some particular custom private API thing. And so I think this is actually the most practical path (to economic value).”This realization and conviction means that multimodal modals are the way to go. Instead of using function calling to call APIs to build agents, which is what OpenAI and most of the open LLM industry have done to date, Adept wants to “drive by vision”, (aka see the screen as a human sees it) and pinpoint where to click and type as a human does. No APIs needed, because most software don't expose APIs.Extra context for readers: You can see the DeepMind SIMA model in the same light: One system that learned to play a diverse set of games (instead of one dedicated model per game) using only pixel inputs and keyboard-and-mouse action outputs!The OpenInterpreter team is working on a “Computer API” that also does the same.To do this, Adept had to double down on a special kind of multimodality for knowledge work:“A giant thing that was really necessary is really fast multimodal models that are really good at understanding knowledge work and really good at understanding screens. And that is needs to kind of be the base for some of these agents……I think one big hangover primarily academic focus for multimodal models is most multimodal models are primarily trained on like natural images, cat and dog photos, stuff that's come out of the camera… (but) where are they going to be the most useful? They're going to be most useful in knowledge work tasks. That's where the majority of economic value is going to be. It's not in cat and dogs. And so if that's what it is, what do you need to train? I need to train on like charts, graphs, tables, invoices, PDFs, receipts, unstructured data, UIs. That's just a totally different pre-training corpus. And so Adept spent a lot of time building that.”With this context, you can now understand the full path of Adept's public releases:* ACT-1 (Sept 2022): a large Transformers model optimized for browser interactions. It has a custom rendering of the browser viewport that allows it to better understand it and take actions.* Persimmon-8B (Sept 2023): a permissive open LLM (weights and code here)* Fuyu-8B (Oct 2023): a small version of the multimodal model that powers Adept. Vanilla decoder-only transformer with no specialized image encoder, which allows it to handle input images of varying resolutions without downsampling.* Adept Experiments (Nov 2023): A public tool to build automations in the browser. This is powered by Adept's core technology but it's just a piece of their enterprise platform. They use it as a way to try various design ideas.* Fuyu Heavy (Jan 2024) - a new multimodal model designed specifically for digital agents and the world's third-most-capable multimodal model (beating Gemini Pro on MMMU, AI2D, and ChartQA), “behind only GPT4-V and Gemini Ultra, which are 10-20 times bigger”The Fuyu-8B post in particular exhibits a great number of examples on knowledge work multimodality:Why Adept is NOT a Research LabWith OpenAI now worth >$90b and Anthropic >$18b, it is tempting to conclude that the AI startup metagame is to build a large research lab, and attract the brightest minds and highest capital to build AGI. 
Our past guests (see the Humanloop episode) and (from Imbue) combined to ask the most challenging questions of the pod - with David/Adept's deep research pedigree from Deepmind and OpenAI, why is Adept not building more general foundation models (like Persimmon) and playing the academic benchmarks game? Why is Adept so focused on commercial agents instead?“I feel super good that we're doing foundation models in service of agents and all of the reward within Adept is flowing from “Can we make a better agent”…… I think pure play foundation model companies are just going to be pinched by how good the next couple of (Meta Llama models) are going to be… And then seeing the really big players put ridiculous amounts of compute behind just training these base foundation models, I think is going to commoditize a lot of the regular LLMs and soon regular multimodal models. So I feel really good that we're just focused on agents.”and the commercial grounding is his answer to Kanjun too (whom we also asked the inverse question to compare with Adept):“… the second reason I work at Adept is if you believe that actually having customers and a reward signal from customers lets you build AGI faster, which we really believe, then you should come here. And I think the examples for why that's true is for example, our evaluations are not academic evals. They're not simulator evals. They're like, okay, we have a customer that really needs us to do these particular things. We can do some of them. These are the ones they want us to, we can't do them at all. We've turned those into evals.. I think that's a degree of practicality that really helps.”And his customers seem pretty happy, because David didn't need to come on to do a sales pitch:David: “One of the things we haven't shared before is we're completely sold out for Q1.”Swyx: “Sold out of what?”David: “Sold out of bandwidth to onboard more customers.”Well, that's a great problem to have.Show Notes* David Luan* Dextro at Data Driven NYC (2015)* Adept* ACT-1* Persimmon-8B* Adept Experiments* Fuyu-8B* $350M Series B announcement* Amelia Wattenberger talk at AI Engineer Summit* FigureChapters* [00:00:00] Introductions* [00:01:14] Being employee #30 at OpenAI and its early days* [00:13:38] What is Adept and how do you define AGI?* [00:21:00] Adept's critical path and research directions* [00:26:23] How AI agents should interact with software and impact product development* [00:30:37] Analogies between AI agents and self-driving car development* [00:32:42] Balancing reliability, cost, speed and generality in AI agents* [00:37:30] Potential of foundation models for robotics* [00:39:22] Core research questions and reasons to work at AdeptTranscriptsAlessio [00:00:00]: Hey everyone, welcome to the Latent Space Podcast. This is Alessio, partner and CTO in Residence at Decibel Partners, and I'm joined by my co-host Swyx, founder of Smol.ai.Swyx [00:00:15]: Hey, and today we have David Luan, CEO, co-founder of Adept in the studio. Welcome.David [00:00:20]: Yeah, thanks for having me.Swyx [00:00:21]: Been a while in the works. I've met you socially at one of those VC events and you said that you were interested in coming on and glad we finally were able to make this happen.David: Yeah, happy to be part of it.Swyx: So we like to introduce the speaker and then also just like have you talk a little bit about like what's not on your LinkedIn, what people should just generally know about you. 
You started a company in college, which was the first sort of real time video detection classification API that was Dextro, and that was your route to getting acquired into Axon where you're a director of AI. Then you were the 30th hire at OpenAI?David [00:00:53]: Yeah, 30, 35, something around there. Something like that.Swyx [00:00:56]: So you were VP of Eng for two and a half years to two years, briefly served as tech lead of large models at Google, and then in 2022 started Adept. So that's the sort of brief CV. Is there anything else you like want to fill in the blanks or like people should know more about?David [00:01:14]: I guess a broader story was I joined OpenAI fairly early and I did that for about two and a half to three years leading engineering there. It's really funny, I think second or third day of my time at OpenAI, Greg and Ilya pulled me in a room and we're like, you know, you should take over our directs and we'll go mostly do IC work. So that was fun, just coalescing a bunch of teams out of a couple of early initiatives that had already happened. The company, the Dota effort was going pretty hard and then more broadly trying to put bigger picture direction around what we were doing with basic research. So I spent a lot of time doing that. And then I led Google's LLM efforts, but also co-led Google Brain was one of the brain leads more broadly. You know, there's been a couple of different eras of AI research, right? If we count everything before 2012 as prehistory, which people hate it when I say that, kind of had this like you and your three best friends write a research paper that changes the world period from like 2012 to 2017. And I think the game changed in 2017 and like most labs didn't realize it, but we at OpenAI really did. I think in large part helped by like Ilya's constant beating of the drum that the world would be covered in data centers. And I think-Swyx [00:02:15]: It's causally neat.David [00:02:16]: Yeah. Well, like I think we had conviction in that, but it wasn't until we started seeing results that it became clear that that was where we had to go. But also part of it as well was for OpenAI, like when I first joined, I think one of the jobs that I had to do was how do I tell a differentiated vision for who we were technically compared to, you know, hey, we're just smaller Google Brain, or like you work at OpenAI if you live in SF and don't want to commute to Mountain View or don't want to live in London, right? That's like not enough to like hang your technical identity as a company. And so what we really did was, and I spent a lot of time pushing this, is just how do we get ourselves focused on a certain class of like giant swings and bets, right? Like how do you flip the script from you just do bottom-up research to more about how do you like leave some room for that, but really make it about like, what are the big scientific outcomes that you want to show? And then you just solve them at all costs, whether or not you care about novelty and all that stuff. And that became the dominant model for a couple of years, right? And then what's changed now is I think the number one driver of AI products over the next couple of years is going to be the deep co-design and co-evolution of product and users for feedback and actual technology. And I think labs, every tool to go do that are going to do really well. 
And that's a big part of why I started Adept. Alessio [00:03:20]: You mentioned Dota, any memories thinking back to like the switch from RL to Transformers at the time and kind of how the industry was evolving more in the LLM side and leaving behind some of the more agent simulation work? David [00:03:33]: Like zooming way out, I think agents are just absolutely the correct long-term direction, right? You just go define what AGI is, right? You're like, Hey, like, well, first off, actually, I don't love AGI definitions that involve human replacement because I don't think that's actually how it's going to happen. Even this definition of like, Hey, AGI is something that outperforms humans at economically valuable tasks carries a kind of implicit view of the world about what's going to be the role of people. I think what I'm more interested in is like a definition of AGI that's oriented around like a model that can do anything a human can do on a computer. If you go think about that, which is like super tractable, then agent is just a natural consequence of that definition. And so what did all the work we did on stuff like that get us? It got us a really clear formulation. Like you have a goal and you want to maximize the goal, you want to maximize reward, right? And the natural LLM formulation doesn't come with that out of the box, right? I think that we as a field got a lot right by thinking about, Hey, how do we solve problems of that caliber? And then the thing we forgot is de novo RL is like a pretty terrible way to get there quickly. Why are we rediscovering all the knowledge about the world? Years ago, I had a debate with a Berkeley professor as to what will it actually take to build AGI. And his view is basically that you have to reproduce all the flops that went into evolution in order to be able to get there. Right. Swyx [00:04:44]: The biological basis theory. Right. David [00:04:46]: So I think we are ignoring the fact that you have a giant shortcut, which is you can behavioral clone everything humans already know. And that's what we solved with LLMs. We've solved behavioral cloning, everything that humans already know. Right. So like today, maybe LLMs are like behavioral cloning every word that gets written on the internet; in the future, the multimodal models are becoming more of a thing, where you're behavioral cloning the visual world. But really, what we're just going to have is like a universal byte model, right? Where tokens of data that have high signal come in, and then all of those patterns are like learned by the model. And then you can regurgitate any combination now. Right. So text into voice out, like image into other image out or video out or whatever, like these like mappings, right? Like all just going to be learned by this universal behavioral cloner. And so I'm glad we figured that out. And I think now we're back to the era of how do we combine this with all of the lessons we learned during the RL period. That's what's going to drive progress. Swyx [00:05:35]: I'm still going to pressure you for a few more early OpenAI stories before we turn to the Adept stuff. On your personal site, which I love, because it's really nice, like personal, you know, story context around like your history. I need to update it. It's so old. Yeah, it's so out of date. But you mentioned GPT-2. Did you overlap with GPT-1? I think you did, right? David [00:05:53]: I actually don't quite remember. I think I was joining right around- Right around then? Swyx [00:05:57]: I was right around that, yeah.
Yeah. So what I remember was Alec, you know, just kind of came in and was like very obsessed with Transformers and applying them to like Reddit sentiment analysis. Yeah, sentiment, that's right. Take us through-David [00:06:09]: Sentiment neuron, all this stuff.Swyx [00:06:10]: The history of GPT as far as you know, you know, according to you. Ah, okay.David [00:06:14]: History of GPT, according to me, that's a pretty good question. So I think the real story of GPT starts at Google, of course, right? Because that's where Transformers sort of came about. However, the number one shocking thing to me was that, and this is like a consequence of the way that Google is organized, where like, again, you and your three best friends write papers, right? Okay. So zooming way out, right? I think about my job when I was a full-time research leader as a little bit of a portfolio allocator, right? So I've got really, really smart people. My job is to convince people to coalesce around a small number of really good ideas and then run them over the finish line. My job is not actually to promote a million ideas and never have critical mass. And then as the ideas start coming together and some of them start working well, my job is to nudge resources towards the things that are really working and then start disbanding some of the things that are not working, right? That muscle did not exist during my time at Google. And I think had they had it, what they would have done would be say, hey, Noam Shazir, you're a brilliant guy. You know how to scale these things up. Here's half of all of our TPUs. And then I think they would have destroyed us. He clearly wanted it too.Swyx [00:07:17]: He's talking about trillion parameter models in 2017.David [00:07:20]: Yeah. So that's the core of the GPT story, right? Which is that, and I'm jumping around historically, right? But after GPT-2, we were all really excited about GPT-2. I can tell you more stories about that. It was the last paper that I even got to really touch before everything became more about building a research org. You know, every day we were scaling up GPT-3, I would wake up and just be stressed. And I was stressed because, you know, you just look at the facts, right? Google has all this compute. Google has all the people who invented all of these underlying technologies. There's a guy named Noam who's really smart, who's already gone and done this talk about how he wants a trillion parameter model. And I'm just like, we're probably just doing duplicative research to what he's doing, right? He's got this decoder only transformer that's probably going to get there before we do. And I was like, but like, please just like let this model finish, right? And it turned out the whole time that they just couldn't get critical mass. So during my year where I led the Google LM effort and I was one of the brain leads, you know, it became really clear why, right? At the time, there was a thing called the brain credit marketplace. And did you guys know the brain credit marketplace? No, I never heard of this. Oh, so it's actually, it's a, you can ask any Googler.Swyx [00:08:23]: It's like just like a thing that, that, I mean, look like, yeah, limited resources, you got to have some kind of marketplace, right? You know, sometimes it's explicit, sometimes it isn't, you know, just political favors.David [00:08:34]: You could. And so then basically everyone's assigned a credit, right? So if you have a credit, you get to buy end chips according to supply and demand. 
So if you want to go do a giant job, you had to convince like 19 or 20 of your colleagues not to do work. And if that's how it works, it's really hard to get that bottom up critical mass to go scale these things. And the team at Google were fighting valiantly, but we were able to beat them simply because we took big swings and we focused. And I think, again, that's like part of the narrative of like this phase one of AI, right? Of like this modern AI era to phase two. And I think in the same way, I think phase three company is going to out execute phase two companies because of the same asymmetry of success.Swyx [00:09:12]: Yeah. I think it's underrated how much NVIDIA works with you in the early days as well. I think maybe, I think it was Jensen. I'm not sure who circulated a recent photo of him delivering the first DGX to you guys.David [00:09:24]: I think Jensen has been a complete legend and a mastermind throughout. I have so much respect for NVIDIA. It is unreal.Swyx [00:09:34]: But like with OpenAI, like kind of give their requirements, like co-design it or just work of whatever NVIDIA gave them.David [00:09:40]: So we work really closely with them. There's, I'm not sure I can share all the stories, but examples of ones that I've found particularly interesting. So Scott Gray is amazing. I really like working with him. He was on one of my teams, the supercomputing team, which Chris Berner runs and Chris Berner still does a lot of stuff in that. As a result, like we had very close ties to NVIDIA. Actually, one of my co-founders at Adept, Eric Elson, was also one of the early GPGPU people. So he and Scott and Brian Catanzaro at NVIDIA and Jonah and Ian at NVIDIA, I think all were very close. And we're all sort of part of this group of how do we push these chips to the absolute limit? And I think that kind of collaboration helped quite a bit. I think one interesting set of stuff is knowing the A100 generation, that like quad sparsity was going to be a thing. Is that something that we want to go look into, right? And figure out if that's something that we could actually use for model training. Really what it boils down to is that, and I think more and more people realize this, six years ago, people, even three years ago, people refused to accept it. This era of AI is really a story of compute. It's really the story of how do you more efficiently map actual usable model flops to compute,Swyx [00:10:38]: Is there another GPT 2, 3 story that you love to get out there that you think is underappreciated for the amount of work that people put into it?David [00:10:48]: So two interesting GPT 2 stories. One of them was I spent a good bit of time just sprinting to help Alec get the paper out. And I remember one of the most entertaining moments was we were writing the modeling section. And I'm pretty sure the modeling section was the shortest modeling section of any ML, reasonably legitimate ML paper to that moment. It was like section three model. This is a standard vanilla decoder only transformer with like these particular things, those paragraph long if I remember correctly. And both of us were just looking at the same being like, man, the OGs in the field are going to hate this. They're going to say no novelty. Why did you guys do this work? 
So now it's funny to look at in hindsight that it was pivotal kind of paper, but I think it was one of the early ones where we just leaned fully into all we care about is solving problems in AI and not about, hey, is there like four different really simple ideas that are cloaked in mathematical language that doesn't actually help move the field forward?Swyx [00:11:42]: Right. And it's like you innovate on maybe like data set and scaling and not so much the architecture.David [00:11:48]: We all know how it works now, right? Which is that there's a collection of really hard won knowledge that you get only by being at the frontiers of scale. And that hard won knowledge, a lot of it's not published. A lot of it is stuff that's actually not even easily reducible to what looks like a typical academic paper. But yet that's the stuff that helps differentiate one scaling program from another. You had a second one? So the second one is, there's like some details here that I probably shouldn't fully share, but hilariously enough for the last meeting we did with Microsoft before Microsoft invested in OpenAI, Sam Altman, myself and our CFO flew up to Seattle to do the final pitch meeting. And I'd been a founder before. So I always had a tremendous amount of anxiety about partner meetings, which this basically this is what it was. I had Kevin Scott and Satya and Amy Hood, and it was my job to give the technical slides about what's the path to AGI, what's our research portfolio, all of this stuff, but it was also my job to give the GPT-2 demo. We had a slightly bigger version of GPT-2 that we had just cut maybe a day or two before this flight up. And as we all know now, model behaviors you find predictable at one checkpoint are not predictable in another checkpoint. And so I'd spent all this time trying to figure out how to keep this thing on rails. I had my canned demos, but I knew I had to go turn it around over to Satya and Kevin and let them type anything in. And that just, that really kept me up all night.Swyx [00:13:06]: Nice. Yeah.Alessio [00:13:08]: I mean, that must have helped you talking about partners meeting. You raised $420 million for Adept. The last round was a $350 million Series B, so I'm sure you do great in partner meetings.Swyx [00:13:18]: Pitchers meetings. Nice.David [00:13:20]: No, that's a high compliment coming from a VC.Alessio [00:13:22]: Yeah, no, I mean, you're doing great already for us. Let's talk about Adept. And we were doing pre-prep and you mentioned that maybe a lot of people don't understand what Adept is. So usually we try and introduce the product and then have the founders fill in the blanks, but maybe let's do the reverse. Like what is Adept? Yeah.David [00:13:38]: So I think Adept is the least understood company in the broader space of foundational models plus agents. So I'll give some color and I'll explain what it is and I'll explain also why it's actually pretty different from what people would have guessed. So the goal for Adept is we basically want to build an AI agent that can do, that can basically help humans do anything a human does on a computer. And so what that really means is we want this thing to be super good at turning natural language like goal specifications right into the correct set of end steps and then also have all the correct sensors and actuators to go get that thing done for you across any software tool that you already use. 
And so the end vision of this is effectively like I think in a couple of years everyone's going to have access to like an AI teammate that they can delegate arbitrary tasks to and then also be able to, you know, use it as a sounding board and just be way, way, way more productive. Right. And it just changes the shape of every job from something where you're mostly doing execution to something where you're mostly actually doing like these core liberal arts skills of what should I be doing and why. Right. And I find this like really exciting and motivating because I think it's actually a pretty different vision for how AGI will play out. I think systems like Adept are the most likely systems to be proto-AGIs. But I think the way in which we are really counterintuitive to everybody is that we've actually been really quiet because we are not a developer company. We don't sell APIs. We don't sell open source models. We also don't sell bottom-up products. We're not a thing that you go and click and download the extension and like we want more users signing up for that thing. We're actually an enterprise company. So what we do is we work with a range of different companies, some like late stage multi-thousand-person startups, some Fortune 500s, et cetera. And what we do for them is we basically give them an out-of-the-box solution where big complex workflows that their employees do every day could be delegated to the model. And so we look a little different from other companies in that in order to go build this full agent thing, the most important thing you got to get right is reliability. So initially zooming way back when, one of the first things that Adept did was we released this demo called Act One, right? Act One was like pretty cool. It's like kind of become a hello world thing for people to show agent demos by going to Redfin and asking to buy a house somewhere, because like we did that in the original Act One demo and like showed that, showed like Google Sheets, all this other stuff. Over the last like year since that has come out, there's been a lot of really cool demos and you go play with them and you realize they work 60% of the time. But since we've always been focused on how do we build an amazing enterprise product, enterprises can't use anything that isn't in the nines of reliability. And so we've actually had to go down a slightly different tech tree than what you might find in the prompt engineering sort of plays in the agent space to get that reliability. And we've decided to prioritize reliability over all else. So like one of our use cases is crazy enough that it actually ends with a physical truck being sent to a place as the result of the agent workflow. And if you're like, if that works like 60% of the time, you're just blowing money and poor truck drivers are going places. Alessio [00:16:30]: Interesting. One of our investment teams has this idea of services as software. I'm actually giving a talk at NVIDIA GTC about this, but basically with software as a service, you're wrapping user productivity in software with agents, and services as software is replacing things that, you know, you would ask somebody to do and the software just does it for you. When you think about these use cases, do the users still go in and look at the agent kind of like doing the things and can intervene, or are they totally removed from them?
Like the truck thing is like, does the truck just show up or are there people in the middle checking in? David [00:17:04]: I think there's two current flaws in the framing for services as software, or I think what you just said. I think that one of them is like, in our experience, as we've been rolling out Adept, the people who actually do the jobs are the most excited about it because they don't go from, I do this job, to, I don't do this job. They go from, I do this job for everything, including the shitty rote stuff, to, I'm a supervisor. And I literally like, it's pretty magical when you watch the thing being used, because now it parallelizes a bunch of the things that you had to do sequentially by hand as a human. And you can just click into any one of them and be like, Hey, I want to watch the trajectory that the agent went through to go solve this. And the nice thing about agent execution as opposed to like LLM generations is that a good chunk of the time when the agent fails to execute, it doesn't give you the wrong result. It just fails to execute. And the whole trajectory is just broken and dead and the agent knows it, right? So then those are the ones that the human then goes and solves. And so then they become a troubleshooter. They work on the more challenging stuff. They get way, way more stuff done and they're really excited about it. I think the second piece of it that we've found is our strategy as a company is to always be an augmentation company. And I think, one, out of principle, that's something we really care about. But two, actually, if you're framing yourself as an augmentation company, you're always going to live in a world where you're solving tasks that are a little too hard for what the model can do today and still need a human to provide oversight, provide clarifications, provide human feedback. And that's how you build a data flywheel. That's how you actually learn from the smartest humans how to solve things models can't do today. And so I actually think that being an augmentation company forces you to go develop your core AI capabilities faster than someone who's saying, ah, okay, my job is to deliver you a lights-out solution for X. Alessio [00:18:42]: Yeah. It's interesting because we've seen two parts of the market. One is we have one company that does agents for SOC analysts. People just don't have them, you know, and they just cannot attract the talent to do it. And similarly, in software development, you have Copilot, which is the augmentation product, and then you have sweep.dev and you have these products which just do the whole thing. I'm really curious to see how that evolves. I agree that today the reliability is so important in the enterprise that they just don't use most of them. Yeah. Yeah. No, that's cool. But it's great to hear the story, because I think from the outside, people are like, oh, Adept, they do Act One, they do Persimmon, they do Fuyu, they do all this stuff. Yeah, it's just the public stuff. Swyx [00:19:20]: It's just public stuff.
So I think that clarification will happen by default. Swyx [00:19:43]: Why have you become more public? You know, if the whole push has... You're sold out, you're mainly enterprise, but you're also clearly putting effort towards being more open or releasing more things. David [00:19:53]: I think we just flipped over that way fairly recently. That's a good question. I think it actually boils down to two things. One, I think that, frankly, a big part of it is that the public narrative is really forming around agents as being the most important thing. And I'm really glad that's happening, because when we started the company in January 2022, everybody in the field knew about the agents thing from RL, but the general public had no conception of what it was. They were still hanging their narrative hat on the tree of everything's a chatbot. And so I think now one of the things that I really care about is that when people think agent, they actually think the right thing. All sorts of different things are being called agents. Chatbots are being called agents. Things that make a function call are being called agents. To me, an agent is something that you can give a goal and get an end step workflow done correctly in the minimum number of steps. And so that's a big part of why. And I think the other part is because I think it's always good for people to be more aware of Adept as they think about what the next thing they want to do in their careers. The field is quickly pivoting into a world where foundation models are looking more and more commodity. And I think a huge amount of gain is going to happen from how do you use foundation models as the well-learned behavioral cloner to go solve agents. And I think people who want to do agents research should really come to Adept. Swyx [00:21:00]: When you say agents have become more part of the public narrative, are there specific things that you point to? I'll name a few. Bill Gates in his blog post mentioning that agents are the future. I'm the guy who made OSes, and I think agents are the next thing. So Bill Gates, I'll call that out. And then maybe Sam Altman also saying that agents are the future for OpenAI. David [00:21:17]: I think before that even, I think there was something like the New York Times, Cade Metz wrote a New York Times piece about it. Right now, in a bid to differentiate, I'm seeing AI startups that used to just brand themselves as an AI company, but now brand themselves as an AI agent company. It's just like, it's a term I just feel like people really want. Swyx [00:21:31]: From the VC side, it's a bit mixed. Is it? As in like, I think there are a lot of VCs where like, I would not touch any agent startups because like- Why is that? Well, you tell me. Alessio [00:21:41]: I think a lot of VCs that are maybe less technical don't understand the limitations of the- Swyx [00:21:46]: No, that's not fair. Alessio [00:21:47]: No, no, no, no. I think like- You think so? No, no. I think like the, what is possible today and like what is worth investing in, you know? And I think like, I mean, people look at you and say, well, these guys are building agents. They needed 400 million to do it. So a lot of VCs are maybe like, oh, I would rather invest in something that is tacking on AI to an existing thing, which is like easier to get to market and kind of get some of the flywheel going. But I'm also surprised a lot of funders just don't want to do agents. It's not even the funding. Sometimes we look around and it's like, why is nobody doing agents for X?
Wow.David [00:22:17]: That's good to know actually. I never knew that before. My sense from my limited perspective is there's a new agent company popping up every day.Swyx [00:22:24]: So maybe I'm- They are. They are. But like I have advised people to take agents off of their title because it's so diluted.David [00:22:31]: It's now so diluted.Swyx [00:22:32]: Yeah. So then it doesn't stand for anything. Yeah.David [00:22:35]: That's a really good point.Swyx [00:22:36]: So like, you know, you're a portfolio allocator. You have people know about Persimmon, people know about Fuyu and Fuyu Heavy. Can you take us through like how you think about that evolution of that and what people should think about what that means for adepts and sort of research directions? Kind of take us through the stuff you shipped recently and how people should think about the trajectory of what you're doing.David [00:22:56]: The critical path for adepts is we want to build agents that can do a higher and higher level abstraction things over time, all while keeping an insanely high reliability standard. Because that's what turns us from research into something that customers want. And if you build agents with really high reliability standard, but are continuing pushing a level of abstraction, you then learn from your users how to get that next level of abstraction faster. So that's how you actually build the data flow. That's the critical path for the company. Everything we do is in service of that. So if you go zoom way, way back to Act One days, right? Like the core thing behind Act One is can we teach large model basically how to even actuate your computer? And I think we're one of the first places to have solved that and shown it and shown the generalization that you get when you give it various different workflows and texts. But I think from there on out, we really realized was that in order to get reliability, companies just do things in various different ways. You actually want these models to be able to get a lot better at having some specification of some guardrails for what it actually should be doing. And I think in conjunction with that, a giant thing that was really necessary is really fast multimodal models that are really good at understanding knowledge work and really good at understanding screens. And that is needs to kind of be the base for some of these agents. Back then we had to do a ton of research basically on how do we actually make that possible? Well, first off, like back in forgot exactly one month to 23, like there were no multimodal models really that you could use for things like this. And so we pushed really hard on stuff like the Fuyu architecture. I think one big hangover primarily academic focus for multimodal models is most multimodal models are primarily trained on like natural images, cat and dog photos, stuff that's come out of the camera. Coco. Yeah, right. And the Coco is awesome. Like I love Coco. I love TY. Like it's really helped the field. Right. But like that's the build one thing. I actually think it's really clear today. Multimodal models are the default foundation model, right? It's just going to supplant LLMs. Like you just train a giant multimodal model. And so for that though, like where are they going to be the most useful? They're going to be most useful in knowledge work tasks. That's where the majority of economic value is going to be. It's not in cat and dogs. Right. And so if that's what it is, what do you need to train? 
I need to train on like charts, graphs, tables, invoices, PDFs, receipts, unstructured data, UIs. That's just a totally different pre-training corpus. And so Adept spent a lot of time building that. And so the public Fuyu models and stuff aren't trained on our actual corpus; they're trained on some other stuff. But you take a lot of that data and then you make it really fast and make it really good at things like dense OCR on screens. And then now you have the right like raw putty to go make a good agent. So that's kind of like some of the modeling side. We've kind of only announced some of that stuff. We haven't really announced much of the agent work, but if you put those together with the correct product form factor... and I think the product form factor also really matters. I think we're seeing, and you guys probably see this a little bit more than I do, but we're seeing like a little bit of a pushback against the tyranny of chatbots as form factor. And I think that the reason why the form factor matters is the form factor changes what data you collect in the human feedback loop. And so I think we've spent a lot of time doing full vertical integration of all these bits in order to get to where we are. Swyx [00:25:44]: Yeah. I'll plug Amelia Wattenberger's talk at our conference, where she gave a little bit of the thinking behind like what else exists other than chatbots that, if you could delegate to reliable agents, you could do. I was kind of excited at Adept Experiments, or Adept Workflows, I don't know what the official name for it is. I was like, okay, like this is something I can use, but it seems like it's just an experiment for now. It's not your product. David [00:26:06]: So you basically just use experiments as like a way to go push various ideas on the design side to some people and just be like, yeah, we'll play with it. Actually the experiments code base underpins the actual product, but it's just the code base itself is kind of like a skeleton for us to go deploy arbitrary cards on the side. Swyx [00:26:22]: Yeah. Alessio [00:26:23]: Makes sense. I was going to say, I would love to talk about the interaction layer. So you train a model to see UI, but then there's the question of how do you actually act on the UI? I think there were some rumors about OpenAI building agents that are kind of like, they manage the endpoint, so the whole computer; you're more at the browser level. I read in one of your papers, you have like a different representation, kind of like you don't just take the DOM and act on it. You do a lot more stuff. How do you think about the best way the models will interact with the software, and like how the development of products is going to change with that in mind as more and more of the work is done by agents instead of people?
You're probably going to have various different form factors that robots could just be in and like all the specialization. But the fact is that humans live in a human environment. So having a human robot lets you do things that humans do without changing everything along the way. It's the same thing for software, right? If you go itemize out the number of things you want to do on your computer for which every step has an API, those numbers of workflows add up pretty close to zero. And so then many points along the way, you need the ability to actually control your computer like a human. It also lets you learn from human usage of computers as a source of training data that you don't get if you have to somehow figure out how every particular step needs to be some particular custom private API thing. And so I think this is actually the most practical path. I think because it's the most practical path, I think a lot of success will come from going down this path. I kind of think about this early days of the agent interaction layer level is a little bit like, do you all remember Windows 3.1? Like those days? Okay, this might be, I might be, I might be too old for you guys on this. But back in the day, Windows 3.1, we had this transition period between pure command line, right? Being the default into this new world where the GUI is the default and then you drop into the command line for like programmer things, right? The old way was you booted your computer up, DOS booted, and then it would give you the C colon slash thing. And you typed Windows and you hit enter, and then you got put into Windows. And then the GUI kind of became a layer above the command line. The same thing is going to happen with agent interfaces is like today we'll be having the GUI is like the base layer. And then the agent just controls the current GUI layer plus APIs. And in the future, as more and more trust is built towards agents and more and more things can be done by agents, if more UIs for agents are actually generative in and of themselves, then that just becomes a standard interaction layer. And if that becomes a standard interaction layer, what changes for software is that a lot of software is going to be either systems or record or like certain customized workflow execution engines. And a lot of how you actually do stuff will be controlled at the agent layer.Alessio [00:29:19]: And you think the rabbit interface is more like it would like you're not actually seeing the app that the model interacts with. You're just saying, hey, I need to log this call on Salesforce. And you're never actually going on salesforce.com directly as the user. I can see that being a model.David [00:29:33]: I think I don't know enough about what using rabbit in real life will actually be like to comment on that particular thing. But I think the broader idea that, you know, you have a goal, right? The agent knows how to break your goal down into steps. The agent knows how to use the underlying software and systems or record to achieve that goal for you. The agent maybe presents you information in a custom way that's only relevant to your particular goal, all just really leads to a world where you don't really need to ever interface with the apps underneath unless you're a power user for some niche thing.Swyx [00:30:03]: General question. So first of all, I think like the sort of input mode conversation. 
I wonder if you have any analogies that you like with self-driving, because I do think like there's a little bit of how the model should perceive the world. And you know, the primary split in self-driving is LiDAR versus camera. And I feel like most agent companies that I'm tracking are all moving towards the camera approach, which is like the multimodal approach, you know, multimodal vision, very heavy vision, all the Fuyu stuff that you're doing. You're focusing on that, including charts and tables. And do you find that inspiration there from like the self-driving world? That's a good question. David [00:30:37]: I think sometimes the most useful inspiration I've found from self-driving is the levels analogy. I think that's awesome. But I think that our number one goal is for agents not to look like self-driving. We want to minimize the chances that agents are sort of a thing that you just have to bang your head at for a long time to get to like two discontinuous milestones, which is basically what's happened in self-driving. We want to be living in a world where you have the data flywheel immediately, and that takes you all the way up to the top. But similarly, I mean, compared to self-driving, like two things that people really undervalue: one, it's really easy to do a driving-a-car-down-Highway-101-on-a-sunny-day demo. That actually doesn't prove anything anymore. And I think the second thing is that, as a non-self-driving expert, I think one of the things that we believe really strongly is that everyone undervalues the importance of really good sensors and actuators. And actually a lot of what's helped us get a lot of reliability is a really strong focus on actually why does the model not do this thing? And a non-trivial amount of the time, the time the model doesn't actually do the thing is because if you're Wizard of Oz-ing it yourself, or if you have unreliable actuators, you can't do the thing. And so we've had to fix a lot of those problems. Swyx [00:31:43]: I was slightly surprised just because I do generally consider the Waymos that we see all around San Francisco as the most, I guess, real case of agents that we have in very material ways. David [00:31:55]: Oh, that's absolutely true. I think they've done an awesome job, but it has taken a long time for self-driving to mature from when it entered the consciousness and the driving-down-101-on-a-sunny-day moment happened to now. Right. So I want to see that more compressed. Swyx [00:32:07]: And I mean, you know, Cruise, you know, RIP. And then one more thing on just like, just going back on this reliability thing, something I have been holding in my head that I'm curious to get your commentary on is I think there's a trade-off between reliability and generality, or I want to broaden reliability into just general like sort of production readiness and enterprise readiness scale. Because you have reliability, you also have cost, you have speed; speed is a huge emphasis for Adept. The tendency or the temptation is to reduce generality to improve reliability and to improve cost, improve speed. Do you perceive a trade-off? Do you have any insights that solve those trade-offs for you guys? David [00:32:42]: There's definitely a trade-off if you're at the Pareto frontier. I think a lot of folks aren't actually at the Pareto frontier. I think the way you get there is basically how do you frame the fundamental agent problem in a way that just continues to benefit from data?
I think one of the main ways of being able to solve that particular trade-off is you basically just want to formulate the problem such that every particular use case just looks like you collecting more data to go make that use case possible. I think that's how you really solve it. Then you get into the other problems like, okay, are you overfitting on these end use cases? You're not doing a thing where you're being super prescriptive for the end steps that the model can only do, for example. Swyx [00:33:17]: Then the question becomes, do you have one house model that you can then customize for each customer, and you're fine-tuning them on each customer's specific use case? David [00:33:25]: Yeah. Swyx [00:33:26]: We're not sharing that. You're not sharing that. It's tempting, but that doesn't look like AGI to me. You know what I mean? That is just you have a good base model and then you fine-tune it. David [00:33:35]: For what it's worth, I think there's two paths to a lot more capability coming out of the models that we all are training these days. I think one path is you figure out how to spend compute and turn it into data. In that path, I consider search, RL, all the things that we all love in this era as part of that path, like self-play, all that stuff. The second path is how do you get super competent, high intelligence demonstrations from humans? I think the right way to move forward is you kind of want to combine the two. The first one gives you maximum sample efficiency for a little second, but I think that it's going to be hard to be running at max speed towards AGI without actually solving a bit of both. Swyx [00:34:16]: You haven't talked much about synthetic data, as far as I can tell. Probably this is a bit too much of a trend right now, but any insights on using synthetic data to augment the expensive human data? David [00:34:26]: The best part about framing AGI as being able to help people do things on computers is you have an environment. Swyx [00:34:31]: Yes. So you can simulate all of it. David [00:34:35]: You can do a lot of stuff when you have an environment. Alessio [00:34:37]: We were having dinner for our one-year anniversary. Congrats. Yeah. Thank you. Raza from HumanLoop was there, and we mentioned you were coming on the pod. This is our first- Swyx [00:34:45]: So he submitted a question. Alessio [00:34:46]: Yeah, this is our first, I guess, like mailbag question. He asked, when you started, GPT-4 didn't exist; now you have GPT-4 Vision to help you build a lot of those things. How do you think about the things that are unique to you as Adept, and like going back to like the maybe research direction that you want to take the team and what you want people to come work on at Adept, versus what has maybe now become commoditized that you didn't expect everybody would have access to? David [00:35:11]: Yeah, that's a really good question. I think implicit in that question, and I wish he were here too so he could push back on my assumption about his question, but I think implicit in that question is a calculus of where advantage accrues in the overall ML stack. And maybe part of the assumption is that advantage accrues solely to base model scaling. But I actually believe pretty strongly that the way that you really win is that you have to go build an agent stack that is much more than that of the base model itself. And so I think like that is always going to be a giant advantage of vertical integration.
I think like it lets us do things like have a really, really fast base model that is really good at agent things but is bad at cat and dog photos. It's pretty good at cat and dog photos. It's not like SOTA at cat and dog photos, right? So like we're allocating our capacity wisely, right? That's like one thing that you really get to do. I also think that the other thing that is pretty important now in the broader foundation modeling space is, I feel, despite any potential concerns about how good agents are as like a startup area, right? Like we were talking about earlier, I feel super good that we're doing foundation models in service of agents and all of the reward within Adept is flowing from can we make a better agent? Because right now I think we all see that, you know, if you're training on publicly available web data, you put in the flops and you do reasonable things, then you get decent results. And if you just double the amount of compute, then you get predictably better results. And so I think pure play foundation model companies are just going to be pinched by how good the next couple of Llamas are going to be, and the next good open source thing. And then seeing the really big players put ridiculous amounts of compute behind just training these base foundation models, I think is going to commoditize a lot of the regular LLMs and soon regular multimodal models. So I feel really good that we're just focused on agents. Swyx [00:36:56]: So you don't consider yourself a pure play foundation model company? David [00:36:59]: No, because if we were a pure play foundation model company, we would be training general foundation models that do summarization and all this other... Swyx [00:37:06]: You're dedicated towards the agent. Yeah. David [00:37:09]: And our business is an agent business. We're not here to sell you tokens, right? And I think like selling tokens, unless there's like a... Swyx [00:37:14]: Not here to sell you tokens. I love it. David [00:37:16]: It's like if you have a particular area of specialty, right? Then you won't get caught in the fact that everyone's just scaling to ridiculous levels of compute. But if you don't have a specialty, I find that, I think it's going to be a little tougher. Swyx [00:37:27]: Interesting. Are you interested in robotics at all? Just a... David [00:37:30]: I'm personally fascinated by robotics. I've always loved robotics. Swyx [00:37:33]: Embodied agents as a business, you know, Figure is like a big, also sort of OpenAI-affiliated company that raises a lot of money. David [00:37:39]: I think it's cool. I think, I mean, I don't know exactly what they're doing, but... Swyx [00:37:44]: Robots. Yeah. David [00:37:46]: Well, I mean, that's a... Swyx [00:37:47]: Yeah. What question would you ask? If we had them on, what would you ask them? David [00:37:50]: Oh, I just want to understand what their overall strategy is going to be between now and when there's reliable stuff to be deployed. But honestly, I just don't know enough about it. Swyx [00:37:57]: And if I told you, hey, fire your entire warehouse workforce and, you know, put robots in there, isn't that a strategy? Oh yeah. David [00:38:04]: Yeah. Sorry. I'm not questioning whether they're doing smart things. I genuinely don't know what they're doing as much, but I think there's two things. One, I'm so excited for someone to train a foundation model of robots. It's just, I think it's just going to work.
Like I will die on this hill, but I mean, like again, this whole time, like we've been on this podcast, we've just been continually saying these models are basically behavioral cloners. Right. So let's go behavioral clone all this like robot behavior. Right. And then you figure out everything else you have to do in order to teach it how to solve a new problem. That's going to work. I'm super stoked for that. I think, unlike what we're doing with helping humans with knowledge work, it just sounds like a more zero-sum job replacement play. Right. And I'm personally less excited about that. Alessio [00:38:46]: We had Kanjun from Imbue on the podcast. We asked her why people should go work there and not at Adept. Swyx [00:38:52]: Oh, that's so funny. Alessio [00:38:54]: Well, she said, you know, there's space for everybody in this market. We're all doing interesting work. And she said, they're really excited about building an operating system for agents. And for her, the biggest research thing was like getting models better reasoning and planning for these agents. The reverse question to you, you know, why should people be excited to come work at Adept instead of Imbue? And maybe what are like the core research questions that people should be passionate about to have fun at Adept? Yeah. David [00:39:22]: First off, I think that, I'm sure you guys believe this too, the AI space, to the extent there's an AI space, and the AI agent space are both, exactly as she likely said, I think, colossal opportunities, and people are just going to end up winning in different areas and a lot of companies are going to do well. So I really don't feel that zero-sum thing at all. I would say, to like change the zero-sum framing: why should you be at Adept? I think there's two huge reasons to be at Adept. I think one of them is everything we do is in the service of like useful agents. We're not a research lab. We do a lot of research in service of that goal, but we don't think about ourselves as like a classic research lab at all. And I think the second reason I work at Adept is if you believe that actually having customers and a reward signal from customers lets you build AGI faster, which we really believe, then you should come here. And I think the examples for why that's true is, for example, our evaluations, they're not academic evals. They're not simulator evals. They're like, okay, we have a customer that really needs us to do these particular things. We can do some of them. These are the ones they want us to do that we can't do at all. We've turned those into evals: solve it, right? I think that's really cool. Like everybody knows a lot of these evals are like pretty saturated, and even the new ones that are not saturated, you look at some of them and you're like, is this actually useful? Right? I think that's a degree of practicality that really helps. Like we're equally excited about the same problems around reasoning and planning and generalization and all of this stuff. They're very grounded in actual needs right now, which is really cool. Swyx [00:40:45]: Yeah. This has been a wonderful dive. You know, I wish we had more time, but I would just leave it kind of open to you. I think you have broad thoughts, you know, just about
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: MATS AI Safety Strategy Curriculum, published by Ryan Kidd on March 8, 2024 on LessWrong.

As part of the MATS Winter 2023-24 Program, scholars were invited to take part in a series of weekly discussion groups on AI safety strategy. Each strategy discussion focused on a specific crux we deemed relevant to prioritizing AI safety interventions and was accompanied by a reading list and suggested discussion questions. The discussion groups were facilitated by several MATS alumni and other AI safety community members and generally ran for 1-1.5 hours.

As assessed by our alumni reviewers, scholars in our Summer 2023 Program were much better at writing concrete plans for their research than they were at explaining their research's theory of change. We think it is generally important for researchers, even those early in their career, to critically evaluate the impact of their work, to: choose high-impact research directions and career pathways; conduct adequate risk analyses to mitigate unnecessary safety hazards and avoid research with a poor safety-capabilities advancement ratio; and discover blind spots and biases in their research strategy.

We expect that the majority of improvements to the above areas occur through repeated practice, ideally with high-quality feedback from a mentor or research peers. However, we also think that engaging with some core literature and discussing with peers is beneficial. This is our attempt to create a list of core literature for AI safety strategy appropriate for the average MATS scholar, who should have completed the AISF Alignment Course. We are not confident that the reading lists and discussion questions below are the best possible version of this project, but we thought they were worth publishing anyway. MATS welcomes feedback and suggestions for improvement.

Week 1: How will AGI arise?

What is AGI?
Karnofsky - Forecasting Transformative AI, Part 1: What Kind of AI? (13 min)
Metaculus - When will the first general AI system be devised, tested, and publicly announced? (read Resolution Criteria) (5 min)

How large will models need to be and when will they be that large?
Alexander - Biological Anchors: The Trick that Might or Might Not Work (read Parts I-II) (27 min)
Optional: Davidson - What a compute-centric framework says about AI takeoff speeds (20 min)
Optional: Habryka et al. - AI Timelines (dialogue between Ajeya Cotra, Daniel Kokotajlo, and Ege Erdil) (61 min)
Optional: Halperin, Chow, Mazlish - AGI and the EMH: markets are not expecting aligned or unaligned AI in the next 30 years (31 min)

How far can current architectures scale?
Patel - Will Scaling Work? (16 min)
Epoch - AI Trends (5 min)
Optional: Nostalgebraist - Chinchilla's Wild Implications (13 min)
Optional: Porby - Why I think strong general AI is coming soon (40 min)

What observations might make us update?
Ngo - Clarifying and predicting AGI (5 min)
Optional: Berglund et al. - Taken out of context: On measuring situational awareness in LLMs (33 min)
Optional: Cremer, Whittlestone - Artificial Canaries: Early Warning Signs for Anticipatory and Democratic Governance of AI (34 min)

Suggested discussion questions
If you look at any of the outside view models linked in "Biological Anchors: The Trick that Might or Might Not Work" (e.g., Ajeya Cotra's and Tom Davidson's models), which of their quantitative estimates do you agree or disagree with?
Do your disagreements make your timelines longer or shorter? Do you disagree with the models used to forecast AGI? That is, rather than disagree with their estimates of particular variables, do you disagree with any more fundamental assumptions of the model? How does that change your timelines, if at all? If you had to make a probabilistic model to forecast AGI, what quantitative variables would you use and what fundamental assumptions would ...
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Managing catastrophic misuse without robust AIs, published by Ryan Greenblatt on January 16, 2024 on The AI Alignment Forum. Many people worry about catastrophic misuse of future AIs with highly dangerous capabilities. For instance, powerful AIs might substantially lower the bar to building bioweapons or allow for massively scaling up cybercrime. How could an AI lab serving AIs to customers manage catastrophic misuse? One approach would be to ensure that when future powerful AIs are asked to perform tasks in these problematic domains, the AIs always refuse. However, it might be a difficult technical problem to ensure these AIs refuse: current LLMs are possible to jailbreak into doing arbitrary behavior, and the field of adversarial robustness, which studies these sorts of attacks, has made only slow progress in improving robustness over the past 10 years. If we can't ensure that future powerful AIs are much more robust than current models[1], then malicious users might be able to jailbreak these models to allow for misuse. This is a serious concern, and it would be notably easier to prevent misuse if models were more robust to these attacks. However, I think there are plausible approaches to effectively mitigating catastrophic misuse which don't require high levels of robustness on the part of individual AI models. (In this post, I'll use "jailbreak" to refer to any adversarial attack.) In this post, I'll discuss addressing bioterrorism and cybercrime misuse as examples of how I imagine mitigating catastrophic misuse[2]. I'll do this as a nearcast where I suppose that scaling up LLMs results in powerful AIs that would present misuse risk in the absence of countermeasures. The approaches I discuss won't require better adversarial robustness than exhibited by current LLMs like Claude 2 and GPT-4. I think that the easiest mitigations for bioterrorism and cybercrime are fairly different, because of the different roles that LLMs play in these two threat models. The mitigations I'll describe are non-trivial, and it's unclear if they will happen by default. But regardless, this type of approach seems considerably easier to me than trying to achieve very high levels of adversarial robustness. I'm excited for work which investigates and red-teams methods like the ones I discuss. [Thanks to Fabien Roger, Ajeya Cotra, Nate Thomas, Max Nadeau, Aidan O'Gara, and Ethan Perez for comments or discussion. This post was originally posted as a comment in response to this post by Aidan O'Gara; you can see the original comment here for reference. Inside view, I think most research on preventing misuse seems less leveraged (for most people) than preventing AI takeover caused by catastrophic misalignment; see here for more discussion. Mitigations for bioterrorism In this section, I'll describe how I imagine handling bioterrorism risk for an AI lab deploying powerful models (e.g., ASL-3/ASL-4). As I understand it, the main scenario by which LLMs cause bioterrorism risk is something like the following: there's a team of relatively few people, who are not top experts in the relevant fields but who want to do bioterrorism for whatever reason. Without LLMs, these people would struggle to build bioweapons - they wouldn't be able to figure out various good ideas, and they'd get stuck while trying to manufacture their bioweapons (perhaps like Aum Shinrikyo). 
But with LLMs, they can get past those obstacles. (I'm making the assumption here that the threat model is more like "the LLM gives the equivalent of many hours of advice" rather than "the LLM gives the equivalent of five minutes of advice". I'm not a biosecurity expert and so don't know whether that's an appropriate assumption to make; it probably comes down to questions about what the hard steps in building catastrophic bioweapons are. And s...
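The post is truncated here, but to make the general shape of "misuse mitigations that don't lean on per-model robustness" concrete, below is a minimal, hypothetical sketch of one such countermeasure: a serving-time screening layer in which a separate classifier flags requests in sensitive domains before the powerful model answers, and flagged traffic is escalated for review. This is an illustration of the broad idea only, not Greenblatt's specific proposal (which is cut off above); domain_classifier, powerful_model, escalate, the domain names, and the threshold are all made up.

```python
# Hypothetical sketch: mitigate misuse at the serving layer instead of
# relying on the underlying model being robust to jailbreaks.
# `domain_classifier`, `powerful_model`, and `escalate` are stand-ins,
# not real APIs.

from dataclasses import dataclass
from typing import Callable, Dict

SENSITIVE_DOMAINS = {"bioweapons", "offensive_cyber"}
BLOCK_THRESHOLD = 0.5  # made-up threshold, for illustration only


@dataclass
class Decision:
    allowed: bool
    reason: str


def screen_request(
    prompt: str,
    domain_classifier: Callable[[str], Dict[str, float]],
    escalate: Callable[[str, Dict[str, float]], None],
) -> Decision:
    """Screen a user request before it ever reaches the powerful model."""
    scores = domain_classifier(prompt)  # e.g. {"bioweapons": 0.92, ...}
    flagged = {d: s for d, s in scores.items()
               if d in SENSITIVE_DOMAINS and s >= BLOCK_THRESHOLD}
    if flagged:
        escalate(prompt, flagged)  # log for human or automated review
        return Decision(False, f"refused: flagged for {sorted(flagged)}")
    return Decision(True, "ok")


def serve(prompt: str, powerful_model, domain_classifier, escalate) -> str:
    """Answer only requests that pass the screening layer."""
    decision = screen_request(prompt, domain_classifier, escalate)
    if not decision.allowed:
        return "Sorry, I can't help with that request."
    return powerful_model(prompt)
```

One reason a layer like this can help even with imperfect components is the threat model quoted above: if the worry is the equivalent of many hours of advice rather than five minutes, an attacker needs many successful interactions, which gives a monitoring layer many chances to catch and escalate them.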
Rebroadcast: this episode was originally released in January 2021. You wake up in a mysterious box, and hear the booming voice of God: “I just flipped a coin. If it came up heads, I made ten boxes, labeled 1 through 10 — each of which has a human in it. If it came up tails, I made ten billion boxes, labeled 1 through 10 billion — also with one human in each box. To get into heaven, you have to answer this correctly: Which way did the coin land?” You think briefly, and decide you should bet your eternal soul on tails. The fact that you woke up at all seems like pretty good evidence that you're in the big world — if the coin landed tails, way more people should be having an experience just like yours. But then you get up, walk outside, and look at the number on your box. ‘3'. Huh. Now you don't know what to believe. If God made 10 billion boxes, surely it's much more likely that you would have seen a number like 7,346,678,928? In today's interview, Ajeya Cotra — a senior research analyst at Open Philanthropy — explains why this thought experiment from the niche of philosophy known as ‘anthropic reasoning' could be relevant for figuring out where we should direct our charitable giving. Links to learn more, summary, and full transcript. Some thinkers both inside and outside Open Philanthropy believe that philanthropic giving should be guided by ‘longtermism' — the idea that we can do the most good if we focus primarily on the impact our actions will have on the long-term future. Ajeya thinks that for that notion to make sense, there needs to be a good chance we can settle other planets and solar systems and build a society that's both very large relative to what's possible on Earth and, by virtue of being so spread out, able to protect itself from extinction for a very long time. But imagine that humanity has two possible futures ahead of it: Either we're going to have a huge future like that, in which trillions of people ultimately exist, or we're going to wipe ourselves out quite soon, thereby ensuring that only around 100 billion people ever get to live. If there are eventually going to be 1,000 trillion humans, what should we think of the fact that we seemingly find ourselves so early in history? Being among the first 100 billion humans, as we are, is equivalent to walking outside and seeing a three on your box. Suspicious! If the future will have many trillions of people, the odds of us appearing so strangely early are very low indeed. If we accept the analogy, maybe we can be confident that humanity is at a high risk of extinction based on this so-called ‘doomsday argument' alone. If that's true, maybe we should put more of our resources into avoiding apparent extinction threats like nuclear war and pandemics. But on the other hand, maybe the argument shows we're incredibly unlikely to achieve a long and stable future no matter what we do, and we should forget the long term and just focus on the here and now instead. There are many critics of this theoretical ‘doomsday argument', and it may be the case that it logically doesn't work.
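For readers who want the box arithmetic spelled out, here is a minimal sketch in Python of one common way to formalise the two updates described above: first weight each world by how many observers it contains (the "you woke up at all" step), then condition on the label you actually see. Anthropic reasoning is contested, which is part of the point of the episode, so treat this as an illustration of the reasoning rather than the right answer; the only numbers used are the ones from the thought experiment.

```python
# The "God's coin toss" arithmetic from the thought experiment above.

N_HEADS = 10        # boxes (people) if the coin landed heads
N_TAILS = 10**10    # boxes (people) if the coin landed tails
PRIOR = 0.5         # fair coin

# Step 1: "I woke up at all" -- weight each world by its number of observers.
w_heads = PRIOR * N_HEADS
w_tails = PRIOR * N_TAILS
print(f"P(tails | I exist) ~= {w_tails / (w_heads + w_tails):.9f}")  # ~0.999999999

# Step 2: "my box is labeled 3" -- box 3 exists in both worlds, but it is
# 1 of 10 boxes under heads and 1 of 10 billion boxes under tails.
w_heads *= 1 / N_HEADS
w_tails *= 1 / N_TAILS
print(f"P(tails | I exist and my box is #3) = {w_tails / (w_heads + w_tails):.2f}")  # 0.50
```

Dropping the first step, as some critics of this style of updating do, leaves only the label observation, which pushes you strongly toward the small world; that is the move the doomsday argument makes about humanity finding itself so early in history.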
This is why Ajeya spent time investigating it, with the goal of ultimately making better philanthropic grants. In this conversation, Ajeya and Rob discuss both the doomsday argument and the challenge Open Phil faces striking a balance between taking big ideas seriously, and not going all in on philosophical arguments that may turn out to be barking up the wrong tree entirely.
They also discuss:
- Which worldviews Open Phil finds most plausible, and how it balances them
- Which worldviews Ajeya doesn't embrace but almost does
- How hard it is to get to other solar systems
- The famous ‘simulation argument'
- When transformative AI might actually arrive
- The biggest challenges involved in working on big research reports
- What it's like working at Open Phil
- And much more
Producer: Keiran Harris
Audio mastering: Ben Cordell
Transcriptions: Sofia Davis-Fogel
Note: I can't seem to edit or remove the “transcript” tab. I recommend you ignore that and just look at the much higher quality, slightly cleaned up one below. Most importantly, follow Sarah on Twitter!
Summary (written by ChatGPT, as you can probably tell)
In this episode of Pigeon Hour, host Aaron delves deep into the world of AI safety with his guest, Sarah Woodhouse. Sarah shares her unexpected journey from fearing job automation to becoming a recognized voice on AI safety Twitter. Her story starts with a simple Google search that led her down a rabbit hole of existential dread and unexpected fame on social media. As she narrates her path from lurker to influencer, Sarah reflects on the quirky dynamics of the AI safety community, her own existential crisis, and the serendipitous tweet that resonated with thousands. Aaron and Sarah's conversation takes unexpected turns, discussing everything from the peculiarities of EA rationalists to the surprisingly serious topic of shrimp welfare. They also explore the nuances of AI doom probabilities, the social dynamics of tech Twitter, and Sarah's unexpected viral fame as a tween. This episode is a rollercoaster of insights and anecdotes, perfect for anyone interested in the intersection of technology, society, and the unpredictable journey of internet fame.
Topics discussed
Discussion on AI Safety and Personal Journeys:
* Aaron and Sarah discuss her path to AI safety, triggered by concerns about job automation and the realization that AI could potentially replace her work.
* Sarah's deep dive into AI safety started with a simple Google search, leading her to Geoffrey Hinton's alarming statements, and eventually to a broader exploration without finding reassuring consensus.
* Sarah's Twitter engagement began with lurking, later evolving into active participation and gaining an audience, especially after a relatable tweet thread about an existential crisis.
* Aaron remarks on the rarity of people like Sarah, who follow the AI safety rabbit hole to its depths, considering its obvious implications for various industries.
AI Safety and Public Perception:
* Sarah discusses her surprise at discovering the AI safety conversation happening mostly in niche circles, often with a tongue-in-cheek attitude that could seem dismissive of the serious implications of AI risks.
* The discussion touches on the paradox of AI safety: it's a critically important topic, yet it often remains confined within certain intellectual circles, leading to a lack of broader public engagement and awareness.
Cultural Differences and Personal Interests:
* The conversation shifts to cultural differences between the UK and the US, particularly in terms of sincerity and communication styles.
* Personal interests, such as theater and musicals (like "Glee"), are also discussed, revealing Sarah's background and hobbies.
Effective Altruism (EA) and Rationalist Communities:
* Sarah points out certain quirks of the EA and rationalist communities, such as their penchant for detailed analysis, hedging statements, and the use of probabilities in discussions.
* The debate around the use of "P(Doom)" (probability of doom) in AI safety discussions is critiqued, highlighting how it can be both a serious analytical tool and a potentially alienating jargon for outsiders.
Shrimp Welfare and Ethical Considerations:
* A detailed discussion on shrimp welfare as an ethical consideration in effective altruism unfolds, examining the moral implications and effectiveness of focusing on animal welfare at a large scale.
* Aaron
defends his position on prioritizing shrimp welfare in charitable giving, based on the principles of importance, tractability, and neglectedness.
Personal Decision-Making in Charitable Giving:
* Strategies for personal charitable giving are explored, including setting a donation cutoff point to balance moral obligations with personal needs and aspirations.
Transcript
AARON: Whatever you want. Okay. Yeah, I feel like you said this on Twitter. The obvious thing is, how did you learn about AI safety? But maybe you've already covered that. That's boring. First of all, do you want to talk about that? Because we don't have to.SARAH: I don't mind talking about that.AARON: But it's sort of your call, so whatever. I don't know. Maybe briefly, and then we can branch out?SARAH: I have a preference for people asking me things and me answering them rather than me setting the agenda. So don't ever feel bad about just asking me stuff because I prefer that.AARON: Okay, cool. But also, it feels like the kind of thing where, of course, we have AI. Everyone already knows that this is just like the voice version of these four tweets or whatever. But regardless. Yes. So, Sarah, as Pigeon Hour guest, what was your path through life to AI safety Twitter?SARAH: Well, I realized that a chatbot could very easily do my job and that my employers either hadn't noticed this or they had noticed, but they were just being polite about it and they didn't want to fire me because they're too nice. And I was like, I should find out what AI development is going to be like over the next few years so that I know if I should go and get good at some other stuff.SARAH: I just had a little innocent Google. And then within a few clicks, I'd completely doom pilled myself. I was like, we're all going to die. I think I found Geoffrey Hinton because he was on the news at the time, because he just quit his job at Google. And he was there saying things that sounded very uncertain, very alarming. And I was like, well, he's probably the pessimist, but I'm sure that there are loads of optimists to counteract that because that's how it usually goes. You find a doomer and then you find a bunch of more moderate people, and then there's some consensus in the middle that everything's basically fine.SARAH: I was like, if I just keep looking, I'll find the consensus because it's there. I'm sure it's there. So I just kept looking and looking for it. I looked for it for weeks. I just didn't find it. And then I was like, nobody knows what's going on. This seems really concerning. So then I started lurking on Twitter, and then I got familiar with all the different accounts, whatever. And then at some point, I was like, I'm going to start contributing to this conversation, but I didn't think that anybody would talk back to me. And then at some point, they started talking back to me and I was like, this is kind of weird.SARAH: And then at some point, I was having an existential crisis and I had a couple of glasses of wine or something, and I just decided to type this big, long thread. And then I went to bed. I woke up the next morning slightly grouchy and hungover. I checked my phone and there were all these people messaging me and all these people replying to my thread being like, this is so relatable. This really resonated with me. And I was like, what is going on?AARON: You were there on Twitter before that thread right? I'm pretty sure I was following you.SARAH: I think, yeah, I was there before, but no one ever really gave me any attention prior to that.
I think I had a couple of tweets that blew up before that, but not to the same extent. And then after that, I think I was like, okay, so now I have an audience. When I say an audience, like, obviously a small one, but more of an audience than I've ever had before in my life. And I was like, how far can I take this?SARAH: I was a bit like, people obviously started following me because I'm freFreaking out about AI, but if I post an outfit, what's going to happen? How far can I push this posting, these fit checks? I started posting random stuff about things that were completely unrelated. I was like, oh, people are kind of here for this, too. Okay, this is weird. So now I'm just milking it for all its worth, and I really don't know why anybody's listening to me. I'm basically very confused about the whole thing.AARON: I mean, I think it's kind of weird from your perspective, or it's weird in general because there aren't that many people who just do that extremely logical thing at the beginning. I don't know, maybe it's not obvious to people in every industry or whatever that AI is potentially a big deal, but there's lots of truckers or whatever. Maybe they're not the best demographic or the most conducive demographic, like, getting on Twitter or whatever, but there's other jobs that it would make sense to look into that. It's kind of weird to me that only you followed the rabbit hole all the way down.SARAH: I know! This is what I…Because it's not that hard to complete the circle. It probably took me like a day, it took me like an afternoon to get from, I'm worried about job automation to I should stop saving for retirement. It didn't take me that long. Do you know what I mean? No one ever looks. I literally don't get it. I was talking to some people. I was talking to one of my coworkers about this the other day, and I think I came up in conversation. She was like, yeah, I'm a bit worried about AI because I heard on the radio that taxi drivers might be out of a job. That's bad. And I was like, yeah, that is bad. But do you know what else? She was like, what are the AI companies up to that we don't know about? And I was like, I mean, you can go on their website. You can just go on their website and read about how they think that their technology is an extinction risk. It's not like they're hiding. It's literally just on there and no one ever looks. It's just crazy.AARON: Yeah. Honestly, I don't even know if I was in your situation, if I would have done that. It's like, in some sense, I am surprised. It's very few people maybe like one, but at another level, it's more rationality than most humans have or something. Yeah. You regret going down that rabbit hole?SARAH: Yeah, kind of. Although I'm enjoying the Twitter thing and it's kind of fun, and it turns out there's endless comedic material that you can get out of impending doom. The whole thing is quite funny. It's not funny, but you can make it funny if you try hard enough. But, yeah, what was I going to say? I think maybe I was more primed for doom pilling than your average person because I already knew what EA was and I already knew, you know what I mean. That stuff was on my radar.AARON: That's interesting.SARAH: I think had it not been on my radar, I don't think I would have followed the pipeline all the way.AARON: Yeah. I don't know what browser you use, but it would be. 
And you should definitely not only do this if you actually think it would be cool or whatever, but this could be in your browser history from that day and that would be hilarious. You could remove anything you didn't want to show, but if it's like Google Chrome, they package everything into sessions. It's one browsing session and it'll have like 10,000 links.SARAH: Yeah, I think for non-sketchy reasons, I delete my Google history more regularly than that. I don't think I'd be able to find that. But I can remember the day and I can remember my anxiety levels just going up and up somewhere between 01:00 p.m. and 07:00 p.m. And by the evening I'm like, oh, my God.AARON: Oh, damn, that's wild.SARAH: It was really stressful.AARON: Yeah, I guess props for, I don't know if props…Is the right word, I guess, impressed? I'm actually somewhat surprised to hear that you said you regret it. I mean, that sucks though, I guess. I'm sorry.SARAH: If you could unknow this, would you?AARON: No, because I think it's worth maybe selfishly, but not overall because. Okay, yeah, I think that would plausibly be the selfish thing to do. Actually. No, actually, hold on. No, I actually don't think that's true. I actually think there's enough an individual can do selfishly such that it makes sense. Even the emotional turmoil.SARAH: It would depend how much you thought that you were going to personally move the needle by knowing about it. I personally don't think that I'm going to be able to do very much. I was going to tip the scales. I wouldn't selfishly unknow it and sacrifice the world. But me being not particularly informed or intelligent and not having any power, I feel like if I forgot that AI was going to end the world, it would not make much difference.AARON: You know what I mean? I agree that it's like, yes, it is unlikely for either of us to tip the scales, but.SARAH: Maybe you can't.AARON: No, actually, in terms of, yeah, I'm probably somewhat more technically knowledgeable just based on what I know about you. Maybe I'm wrong.SARAH: No, you're definitely right.AARON: It's sort of just like a probabilities thing. I do think that ‘doom' - that word - is too simplified, often too simple to capture what people really care about. But if you just want to say doom versus no doom or whatever, AI doom versus no AI doom. Maybe there's like a one in 100,000 chance that one of us tips the scales. And that's important. Maybe even, like, one in 10,000. Probably not. Probably not.SARAH: One in 10,000. Wow.AARON: But that's what people do. People vote, even though this is old 80k material I'm regurgitating because they basically want to make the case for why even if you're not. Or in some article they had from a while ago, they made a case for why doing things that are unlikely to counterfactually matter can still be amazingly good. And the classic example, just voting if you're in a tight race, say, in a swing state in the United States, and it could go either way. Yeah. It might be pretty unlikely that you are the single swing vote, but it could be one in 100,000. And that's not crazy.SARAH: It doesn't take very much effort to vote, though.AARON: Yeah, sure. But I think the core justification, also, the stakes are proportionally higher here, so maybe that accounts for some. But, yes, you're absolutely right. Definitely different amounts of effort.SARAH: Putting in any effort to saving the world from AI. I wouldn't say that. I wouldn't say that I'm sacrificing.AARON: I don't even know if I like. No. 
Maybe it doesn't feel like a sacrifice. Maybe it isn't. But I do think there's, like, a lot. There's at least something to be. I don't know if this really checks out, but I would, like, bet that it does, which is that more reasonably, at least calibrated. I wanted to say reasonably well informed. But really what it is is, like, some level of being informed and, like, some level of knowing what you don't know or whatever, and more just like, normal. Sorry. I hope normal is not like a bat. I'm saying not like tech Bros, I guess so more like non tech bros. People who are not coded as tech bros. Talking about this on a public platform just seems actually, in fact, pretty good.SARAH: As long as we like, literally just people that aren't men as well. No offense.AARON: Oh, no, totally. Yeah.SARAH: Where are all the women? There's a few.AARON: There's a few that are super. I don't know, like, leaders in some sense, like Ajeya Cotra and Katja Grace. But I think the last EA survey was a third. Or I could be butchering this or whatever. And maybe even within that category, there's some variation. I don't think it's 2%.SARAH: Okay. All right. Yeah.AARON: Like 15 or 20% which is still pretty low.SARAH: No, but that's actually better than I would have thought, I think.AARON: Also, Twitter is, of all the social media platforms, especially mail. I don't really know.SARAH: Um.AARON: I don't like Instagram, I think.SARAH: I wonder, it would be interesting to see whether or not that's much, if it's become more male dominated since Elon Musk took.AARON: It's not a huge difference, but who knows?SARAH: I don't know. I have no idea. I have no idea. We'll just be interesting to know.AARON: Okay. Wait. Also, there's no scheduled time. I'm very happy to keep talking or whatever, but as soon as you want to take a break or hop off, just like. Yeah.SARAH: Oh, yeah. I'm in no rush.AARON: Okay, well, I don't know. We've talked about the two obvious candidates. Do you have a take or something? Want to get out to the world? It's not about AI or obesity or just a story you want to share.SARAH: These are my two pet subjects. I don't know anything else.AARON: I don't believe you. I know you know about house plants.SARAH: I do. A secret, which you can't tell anyone, is that I actually only know about house plants that are hard to kill, and I'm actually not very good at taking care of them.AARON: Well, I'm glad it's house plants in that case, rather than pets. Whatever.SARAH: Yeah. I mean, I have killed some sea monkeys, too, but that was a long time ago.AARON: Yes. So did I, actually.SARAH: Did you? I feel like everyone has. Everyone's got a little sea monkey graveyard in their past.AARON: New cause area.SARAH: Are there more shrimp or more sea monkeys? That's the question.AARON: I don't even know what even. I mean, are they just plankton?SARAH: No, they're not plankton.AARON: I know what sea monkeys are.SARAH: There's definitely a lot of them because they're small and insignificant.AARON: Yeah, but I also think we don't. It depends if you're talking about in the world, which I guess probably like sea monkeys or farmed for food, which is basically like. I doubt these are farmed either for food or for anything.SARAH: Yeah, no, you're probably right.AARON: Or they probably are farmed a tiny bit for this niche little.SARAH: Or they're farmed to sell in aquariums for kids.AARON: Apparently. They are a kind of shrimp, but they were bred specifically to, I don't know, be tiny or something. I'm just skimming that, Wikipedia. 
Here.SARAH: Sea monkeys are tiny shrimp. That is crazy.AARON: Until we get answers, tell me your life story in whatever way you want. It doesn't have to be like. I mean, hopefully not. Don't straight up lie, but wherever you want to take that.SARAH: I'm not going to lie. I'm just trying to think of ways to make it spicier because it's so average. I don't know what to say about it.AARON: Well, it's probably not that average, right? I mean, it might be average among people you happen to know.SARAH: Do you have any more specific questions?AARON: Okay, no. Yeah, hold on. I have a meta point, which is like, I think the people who are they have a thing on the top of their mind, and if I give any sort of open ended question whatsoever, they'll take it there and immediately just start giving slinging hot takes. But thenOther people, I think, this category is very EA. People who aren't, especially my sister, they're like, “No, I have nothing to talk about. I don't believe that.” But they're not, I guess, as comfortable.SARAH: No, I mean, I have. Something needs to trigger them in me. Do you know what I mean? Yeah, I need an in.AARON: Well, okay, here's one. Is there anything you're like, “Maybe I'll cut this. This is kind of, like narcissistic. I don't know. But is there anything you want or curious to ask?” This does sound kind of weird. I don't know. But we can cut it if need be.SARAH: What does the looking glass in your Twitter name mean? Because I've seen a bunch of people have this, and I actually don't know what it means, but I was like, no.AARON: People ask this. I respond to a tweet that's like, “What does that like?” At least, I don't know, once every month or two. Or know basically, like Spencer Greenberg. I don't know if you're familiar with him. He's like a sort of.SARAH: I know the know.AARON: He literally just tweeted, like a couple years ago. Put this in your bio to show that you really care about finding the truth or whatever and are interested in good faith conversations. Are you familiar with the scout mindset?SARAH: Yeah.AARON: Julia Galef. Yeah. That's basically, like the short version.SARAH: Okay.AARON: I'm like, yeah, all right. And there's at least three of us who have both a magnifying glass. Yeah. And a pause thing, which is like, my tightest knit online community I guess.SARAH: I think I've followed all the pause people now. I just searched the emoji on Twitter, and I just followed everyone. Now I can't find. And I also noticed when I was doing this, that some people, if they've suspended their account or they're taking time off, then they put a pause in their thing. So I was, like, looking, and I was like, oh, these are, like, AI people. But then they were just, like, in their bio, they were, like, not tweeting until X date. This is a suspended account. And I was like, I see we have a messaging problem here. Nice. I don't know how common that actually.AARON: Was. I'm glad. That was, like, a very straightforward question. Educated the masses. Max Alexander said Glee. Is that, like, the show? You can also keep asking me questions, but again, this is like.SARAH: Wait, what did he say? Is that it? Did he just say glee? No.AARON: Not even a question mark. Just the word glee.SARAH: Oh, right. He just wants me to go off about Glee.AARON: Okay. Go off about. Wait, what kind of Glee are we? Vaguely. This is like a show or a movie or something.SARAH: Oh, my God. Have you not seen it?AARON: No. 
I mean, I vaguely remember, I think, watching some TV, but maybe, like, twelve years ago or something. I don't know.SARAH: I think it stopped airing in, like, maybe 2015?AARON: 16. So go off about it. I don't know what I. Yeah, I.SARAH: Don't know what to say about this.AARON: Well, why does Max think you might have a take about Glee?SARAH: I mean, I don't have a take about. Just see the thing. See? No, not even, like, I am just transparently extremely lame. And I really like cheesy. I'm like. I'm like a musical theater kid. Not even ironically. I just like show tunes. And Glee is just a show about a glee club at a high school where they sing show tunes and there's, like, petty drama, and people burst into song in the hallways, and I just think it's just the most glorious thing on Earth. That's it. There are no hot takes.AARON: Okay, well, that's cool. I don't have a lot to say, unfortunately, but.SARAH: No, that's totally fine. I feel like this is not a spicy topic for us to discuss. It's just a good time.AARON: Yeah.SARAH: Wait.AARON: Okay. Yeah. So I do listen to Hamilton on Spotify.SARAH: Okay.AARON: Yeah, that's about it.SARAH: I like Hamilton. I've seen it three times. Oh.AARON: Live or ever. Wow. Cool. Yeah, no, that's okay. Well, what do people get right or wrong about theater kids?SARAH: Oh, I don't know. I think all the stereotypes are true.AARON: I mean, that's generally true, but usually, it's either over moralized, there's like a descriptive thing that's true, but it's over moralized, or it's just exaggerated.SARAH: I mean, to put this in more context, I used to be in choir. I went every Sunday for twelve years. And then every summer we do a little summer school and we go away and put on a production. So we do a musical or something. So I have been. What have I been? I was in Guys and Dolls. I think I was just in the chorus for that. I was the reverend in Anything Goes. But he does unfortunately get kidnapped in like the first five minutes. So he's not a big presence. Oh, I've been Tweedledum in Alice in Wonderland. I could go on, but right now as I'm saying this, I'm looking at my notice board and I have two playbills from when I went to Broadway in April where I saw Funny Girl and Hadestown.SARAH: I went to New York.AARON: Oh, cool. Oh yeah. We can talk about when you're moving to the United States. However.SARAH: I'm not going to do that. Okay.AARON: I know. I'm joking. I mean, I don't know.SARAH: I don't think I'm going to do that. I don't know. It just seems like you guys have got a lot going on over there. It seems like things aren't quite right with you guys. Things aren't quite right with us either.AARON: No, I totally get this. I think it would be cool. But also I completely relate to not wanting to. I've lived within 10 miles of one. Not even 10 miles, 8 miles in one location. Obviously gone outside of that. But my entire life.SARAH: You've just always lived in DC.AARON: Yeah, either in DC or. Sorry. But right now in Maryland, it's like right next to DC on the Metro, or at Georgetown University, which is in DC. Trying to think, would I move to the UK? Like I could imagine situations that would make me move to the UK. But it would still be annoying. Kind of.SARAH: Yeah, I mean, I guess it's like they're two very similar places, but there are all these little cultural things which I feel like kind of trip you up.AARON: I don't know. Do you want to say what?SARAH: Like I think people, I just like, I don't know.
I don't have that much experience because I've only been to America twice. But people seem a lot more sincere in a way that you don't really get that. Like people are just never really being upfront. And in America, I just got the impression that people just have less of a veneer up, which is probably a good thing. But it's really hard to navigate if you're not used to it or something. I don't know how to describe that.AARON: Yeah, I've definitely heard this at least. And yeah, I think it's for better and for worse.SARAH: Yeah, I think it's generally a good thing.AARON: Yeah.SARAH: But it's like there's this layer of cynicism or irony or something that is removed and then when it's not there, it's just everything feels weak. I can't describe it.AARON: This is definitely, I think, also like an EA rationalist thing. I feel like I'm pretty far on the spectrum, towards the end of: social niceties are fine, but, I don't know, don't obscure what you really think unless there's a really good reason to or something. But it can definitely come across as being rude.SARAH: Yeah. No, but I think it's actually a good rule of thumb to obscure what you. It's good to try not to obscure what you think most of the time, probably. I don't know, but I would love to go over temporarily for like six months or something and just hang out for a bit. I think that'd be fun. I don't know if I would go back to New York again. Maybe. I like the bagels there.AARON: I should have a place. Oh yeah. Remember, I think we talked at some point. We can cut this out if you like. Don't if either of us doesn't want it in. But we discussed, oh yeah, I should be having a place. You can. I emailed the landlord like an hour before this. Hopefully, probably more than 50%. That is still an offer. Yeah, probably not for all six months, but I don't know.SARAH: I would not come and sleep on your sofa for six months. That would be definitely impolite and very weird.AARON: Yeah. I mean, my roommates would probably grumble.SARAH: Yeah. They would be like.AARON: Although I don't know. Who knows? I wouldn't be shocked if people were actually like, whatever somebody asked for as a question. This is what he said. I might also be interested in hearing how different backgrounds. Wait, sorry. This is not good grammar. Let me try to parse this. Not having a super hardcore EA AI rationalist background shape how you think or how you view AI as rationality?SARAH: Oh, that's a good question. I think it's more happening the other way around, the more I hang around in these circles. You guys are impacting how I think.AARON: It's definitely true for me as well.SARAH: Seeping into my brain and my language as well. I've started talking differently. I don't know. That's a good question, though. Yeah. One thing that I will say is that there are certain things that I find irritating about the EA style of doing things. I think one specific one, I don't know, is the kind of like hand-wringing about everything. And I know that this is kind of the point, right? But it's kind of like, you know, when someone's like, I want to take a stance on something, but then whenever they want to take a stance on something, they feel the need to write like a 10,000 word blog post where they're thinking about the second and third and fifth order effects of this thing. And maybe this thing that seems good is actually bad for this really convoluted reason.
That's just so annoying.AARON: Yeah.SARAH: Also understand that maybe that is a good thing to do sometimes, but it just seems like, I don't know how anyone ever gets anywhere. It seems like everyone must be paralyzed by indecision all the time because they just can't commit to ever actually just saying anything.AARON: I think this kind of thing is really good if you're trying to give away a billion dollars. Oh yes, I do want the billion dollar grantor to be thinking through second and third order effects of how they give away their billion dollars. But also, no, I am super. The words on the tip of my tongue, not overwhelmed but intimidated when I go on the EA forum because the posts, none of them are like normal, like five paragraph essays. Some of them are like, I think one of them I looked up for fun because I was going to make a meme about it and still will. Probably was like 30,000 words or something. And even the short form posts, which really gets me kind of not even annoyed. I don't know, maybe kind of annoyed is that the short form posts, which is sort of the EA forum version of Twitter, are way too high quality, way too intimidating. And so maybe I should just suck it up and post stuff anyway more often. It just feels weird. I totally agree.SARAH: I was also talking to someone recently about how I lurked on the EA forum and LessWrong for months and months and I couldn't figure out the upvoting system and I was like, am I being stupid or why are there four buttons? And I was like, well, eventually I had to ask someone because I couldn't figure it out. And then he explained it to me and I was like, that is just so unnecessary. Like, just do it.AARON: No, I do know what you mean.SARAH: I just think it's annoying. It pisses me off. I just feel like sometimes you don't need to add more things. Sometimes less is good. Yeah, that's my hot take. Nice things.AARON: Yeah, that's interesting.SARAH: But actually, a thing that I like that EAs do is the constant hedging and caveating. I do find it kind of adorable. I love that because it's like you're having to constantly acknowledge that you probably didn't quite articulate what you really meant and that you're not quite making contact with reality when you're talking. So you have to clarify that you probably were imprecise when you said this thing. It's unnecessary, but it's kind of amazing.AARON: No, it's definitely. I am super guilty of this because I'll give an example in a second. I think I've been basically trained to try pretty hard, even in normal conversation with anybody, to just never say anything that's literally wrong. Or at least if I do caveat it.AARON: I was driving home with my parents. We'd just visited our grandparents and were driving back, and we drove past a cruise ship that was in a harbor. And my mom, who was driving at the time, said, “Oh, Aaron, can you see if there's anyone on there?” And I immediately responded like, “Well, there's probably at least one person.” Obviously, that's not what she meant. But that was my technical best guess. It's like, yes, there probably are people on there, even though I couldn't see anybody on the decks or in the rooms. Yeah, there's probably a maintenance guy. Felt kind of bad.SARAH: You can't technically exclude that there are, in fact, no people.AARON: Then I corrected myself. But I guess I've been trained into giving that as my first reaction.SARAH: Yeah, I love that. I think it's a waste of words, but I find it delightful.AARON: It does go too far.
People should be more confident. I wish that, at least sometimes, people would say, “Epistemic status: Want to bet?” or “I am definitely right about this.” Too rarely do we hear, "I'm actually pretty confident here."SARAH: Another thing is, people are too liberal with using probabilities. The meaning of saying there is an X percent chance of something happening is getting watered down by people constantly saying things like, “I would put 30% on this claim.” Obviously, there's no rigorous method that's gone into determining why it's 30 and not 35. That's a problem and people shouldn't do that. But I kind of love it.AARON: I can defend that. People are saying upfront, “This is my best guess. But there's no rigorous methodology.” People should take their word for that. In some parts of society, it's seen as implying that a numeric probability came from a rigorous model. But if you say, “This is my best guess, but it's not formed from anything,” people should take their word for that and not refuse to accept them at face value.SARAH: But why do you have to put a number on it?AARON: It depends on what you're talking about. Sometimes probabilities are relevant and if you don't use numbers, it's easy to misinterpret. People would say, “It seems quite likely,” but what does that mean? One person might think “quite reasonably likely” means 70%, the other person thinks it means 30%. Even though it's weird to use a single number, it's less confusing.SARAH: To be fair, I get that. I've disagreed with people about what the word “unlikely” means. Someone's pulled out a scale that the government uses, or intelligence services use to determine what “unlikely” means. But everyone interprets those words differently. I see what you're saying. But then again, I think people in AI safety talking about P(doom) was making people take us less seriously, especially because people's probabilities are so vibey.AARON: Some people are, but I take Paul Christiano's word seriously.SARAH: He's a 50/50 kind of guy.AARON: Yeah, I take that pretty seriously. Obviously, it's not as simple as him having a perfect understanding of the world, even after another 10,000 hours of investigation. But it's definitely not just vibes, either.SARAH: No, I came off wrong there. I don't mean that everyone's understanding is just vibes.AARON: Yeah.SARAH: If you were looking at it from the outside, it would be really difficult to distinguish between the ones that are vibes and the ones that are rigorous, unless you carefully parsed all of it and evaluated everyone's background, or looked at the model yourself. If you're one step removed, it looks like people just spitting out random, arbitrary numbers everywhere.AARON: Yeah. There's also the question of whether P(doom) is too weird or silly, or if it could be easily dismissed as such.SARAH: Exactly, the moment anyone unfamiliar with this discussion sees it, they're almost definitely going to dismiss it. They won't see it as something they need to engage with.AARON: That's a very fair point. Aside from the social aspect, it's also a large oversimplification. There's a spectrum of outcomes that we lump into doom and not doom. While this binary approach can be useful at times, it's probably overdone.SARAH: Yeah, because when some people say doom, they mean everyone dies, while others mean everyone dies plus everything is terrible. And no one specifies what they mean. It is silly. But, I also find it kind of funny and I kind of love it.AARON: I'm glad there's something like that.
So it's not perfect. The more straightforward thing would be to say P(existential risk from AI comes to pass). That's the long version, whatever.SARAH: If I was in charge, I would probably make people stop using P(doom). I think it's better to say it the long way around. But obviously I'm not in charge. And I think it's funny and kind of cute, so I'll keep using it.AARON: Maybe I'm willing to go along and try to start a new norm. Not spend my whole life on it, but say, I think this is bad for X, Y, and Z reasons. I'll use this other phrase instead and clarify when people ask.SARAH: You're going to need Twitter premium because you're going to need a lot more characters.AARON: I think there's a shorthand which is like P(x-risk) or P(AI x-risk).SARAH: Maybe it's just the word doom that's a bit stupid.AARON: Yeah, that's a term out of the Bay Area rationalists.SARAH: But then I also think it kind of makes the whole thing seem less serious. People should be indignant to hear that this meme is being used to trade probabilities about the likelihood that they're going to die and their families are going to die. This has been an in-joke in this weird niche circle for years and they didn't know about it. I'm not saying that in a way to morally condemn people, but if you explain this to people…People just go to dinner parties in Silicon Valley and talk about this weird meme thing, and what they really mean is the odds that everyone's going to prematurely die. People should be outraged by that, I think.AARON: I disagree that it's a joke. It is a funny phrase, but the actual thing is people really do stand by their belief.SARAH: No, I totally agree with that part. I'm not saying that people are not being serious when they give their numbers, but I feel like there's something. I don't know how to put this in words. There's something outrageous about the fact that for outsiders, this conversation has been happening for years and people have been using this tongue-in-cheek phrase to describe it, and 99.9% of people don't know that's happening. I'm not articulating this very well.AARON: I see what you're saying. I don't actually think it's like. I don't know a lot of jargon.SARAH: But when I first found out about this, I was outraged.AARON: I honestly just don't share that intuition. But that's really good.SARAH: No, I don't know how to describe this.AARON: I think I was just a little bit indignant, perhaps.SARAH: Yeah, I was indignant about it. I was like, you guys have been at social events making small talk by discussing the probability of human extinction all this time, and I didn't even know. I was like, oh, that's really messed up, guys.AARON: I feel like I'm standing up for the rationalists here because it was always out in the open. No one was stopping you from going on LessWrong or whatever. It wasn't behind closed doors.SARAH: Yeah, but no one ever told me about it.AARON: Yeah, that's like a failure of outreach, I suppose.SARAH: Yeah. I think maybe I'm talking more about. Maybe the people that I'm mad at is the people who are actually working on capabilities and using this kind of jargon. Maybe I'm mad at those people. They're fine.AARON: Do we have more questions? I think we might have more questions. We have one more. Okay, sorry, but keep going.SARAH: No, I'm going to stop making that point now because I don't really know what I'm trying to say and I don't want to be controversial.AARON: Controversy is good for views. Not necessarily for you. No, thank you for that. Yes, that was a good point. I think it was.
Maybe it was wrong. I think it seems right.SARAH: It was probably wrong.
Shrimp Welfare: A Serious Discussion
AARON: I don't know what she thinks about shrimp welfare. Oh, yeah. I think it's a general question, but let's start with that. What do you think about shrimp? Well, today.SARAH: Okay. Is this an actual cause area or is this a joke about how if you extrapolate utilitarianism to its natural conclusion, you would really care about shrimp?AARON: No, there's a charity called the Shrimp Welfare Initiative or project. I think it's Shrimp Welfare Initiative. I can actually have a rant here about how it's a meme that people find amusing. It is a serious thing, but I think people like the meme more than they're willing to transfer their donations in light of it. This is kind of wrong and at least distasteful. No, but there's an actual, if you Google, Shrimp Welfare Project. Yeah, it's definitely a thing, but it's only a couple of years old. And it's also kind of a meme because it does work in both ways. It sort of shows how we're weird, but in the sense that we are willing to care about things that are very different from us. Not like we're threatening other people. That's not a good description.SARAH: Is the extreme version of this position that we should put more resources into improving the lives of shrimp than into improving the lives of people just because there are so many more shrimp? Are there people that actually believe that?AARON: Well, I believe some version of that, but it really depends on who the ‘we' is there.SARAH: Should humanity be putting more resources?AARON: No one believes that as far as I know.SARAH: Okay. Right. So what is the most extreme manifestation of the shrimp welfare position?AARON: Well, I feel like my position is kind of extreme, and I'm happy to discuss it. It's easier than speculating about what the more extreme ones are. I don't think any of them are that extreme, I guess, from my perspective, because I think I'm right.SARAH: Okay, so what do you believe?AARON: I think that for most people who have already decided to donate, say, $20, and are considering where to donate it, morally it would be better if they gave it to the Shrimp Welfare Project than if they gave it to any of the commonly cited EA organizations.SARAH: Malaria nets or whatever.AARON: Yes. I think $20 of malaria nets versus $20 of shrimp. I can easily imagine a world where it would go the other way. But given the actual situation, the $20 of shrimp is much better.SARAH: Okay. Is it just purely because there's just more shrimp? How do we know how much shrimp suffering there is in the world?AARON: No, this is an excellent question. The numbers are a key factor, but no, it's not as simple. I definitely don't think one shrimp is worth one human.SARAH: I'm assuming that it's based on the fact that there are so many more shrimp than there are people that I don't know how many shrimp there are.AARON: Yeah, that's important, but at some level, it's just the margin. What I think is that when you're donating money, you should give to wherever it does the most good, whatever that means, whatever you think that means. But let's just leave it at that. The most good is morally best at the margin, which means you're not donating where you think the world should or how you think the world should expend its trillion dollar wealth. All you're doing is adding $20 at this current level, given the actual world.
And so part of it is what you just said, and also including some new research from Rethink Priorities. Measuring suffering in reasonable ranges is extremely hard to do. But I believe it's difficult to do a better job than Rethink Priorities on that, given what I've seen. I can provide some links. There are a few things to consider here: numbers, times, and the enormity of suffering. I think there are a couple of key elements, including tractability. Are you familiar with the three-pronged concept people sometimes discuss, which encompasses importance, tractability, and neglectedness?SARAH: Okay.AARON: Importance is essentially what we just mentioned. Huge numbers and plausible amounts of suffering. When you try to do the comparison, it seems like they're a significant concern. Tractability is another factor. I think the best estimates suggest that a one-dollar donation could save around 10,000 shrimp from a very painful death.SARAH: In that sense…AARON: You could imagine that even if there were a hundred times more shrimp than there actually are, we have direct control over how they live and die because we're farming them. The industry is not dominated by wealthy players in the United States. Many individual farmers in developing nations, if educated and provided with a more humane way of killing the shrimp, would use it. There's a lot of potential for improvement here. This is partly due to the last prong, neglectedness, which is really my focus.SARAH: You're saying no one cares about the shrimp.AARON: I'm frustrated that it's not taken seriously enough. One of the reasons why the marginal cost-effectiveness is so high is because large amounts of money are donated to well-approved organizations. But individual donors often overlook this. They ignore their marginal impact. If you want to see even a 1% shift towards shrimp welfare, the thing to do is to donate to shrimp welfare. Not donate $19 to human welfare and one dollar to shrimp welfare, which is perhaps what they think the overall portfolio should be.SARAH: Interesting. I don't have a good reason why you're wrong. It seems like you're probably right.AARON: Let me put the website in the chat. This isn't a fair comparison since it's something I know more about.SARAH: Okay.AARON: On the topic of obesity, neither of us were more informed than the other. But I could have just made stuff up or said something logically fallacious.SARAH: You could have told me that there were like 50 times the number of shrimp in the world than there really are. And I would have been like, sure, seems right.AARON: Yeah. And I don't know, if I…If I were in your position, I would say, “Oh, yeah, that sounds right.” But maybe there are other people who have looked into this way more than me that disagree, and I can get into why I think it's less true than you'd expect in some sense.SARAH: I just wonder if there's like… This is like a deeply non-EA thing to say. So I don't know, maybe I shouldn't say it, but are there not any moral reasons? Is there not any good moral philosophy behind just caring more about your own species than other species? If you're… sorry, but that's probably not right, is it? There's probably no way to actually morally justify that, but it seems like it feels intuitively wrong.
If you've got $20 to be donating 19 of them to shrimp and one to children with malaria, that feels like there should be something wrong with that, but I can't tell you what it is.AARON: Yeah, no, there is something wrong, which is that you should donate all 20 because they're acting on the margin, for one thing. I do think that doesn't check out morally, but I think basically me and everybody I know in terms of real life or whatever, I do just care way more about humans. I don't know, for at least the people that it's hard to formalize or specify what you mean by caring about or something. But, yeah, I think you can definitely basically just be a normal human who basically cares a lot about other humans. And still that's not like, negated by changing your $20 donation or whatever. Especially because there's nothing else that I do for shrimp. I think you should be like a kind person or something. I'm like an honest person, I think. Yeah, people should be nice to other humans. I mean, you should be nice in the sense of not beating them. But if you see a pigeon on the street, you don't need to say hi or whatever, give it a pet, because. I don't know. But yeah, you should be basically like, nice.SARAH: You don't stop to say hi to every pigeon that you see on the way to anywhere.AARON: I do, but I know most normal people don't.SARAH: This is why I'm so late to everything, because I have to do it. I have to stop for every single one. No exceptions.AARON: Yeah. Or how I think about it is sort of like a little bit of compartmentalization, which I think is like… Which is just sort of like a way to function normally and also sort of do what you think really checks out at the end of the day, just like, okay, 99% of the time I'm going to just be like a normal person who doesn't care about shrimp. Maybe I'll refrain from eating them. But actually, even that is like, I could totally see a person just still eating them and then doing this. But then during the 1% of the time where you're deciding how to give money away and none of those, the beneficiaries are going to be totally out of sight either way. This is like a neutral point, I guess, but it's still worth saying, yeah, then you can be like a hardcore effective altruist or whatever and then give your money to the shrimp people.SARAH: Do you have this set up as like a recurring donation?AARON: Oh, no. Everybody should call me out as a hypocrite because I haven't donated much money, but I'm trying to figure out actually, given that I haven't had a stable income ever. And maybe, hopefully I will soon, actually. But even then, it's still a part-time thing. I haven't been able to do sort of standard 10% or more thing, and I'm trying to figure out what the best thing to do or how to balance, I guess, not luxury, not like consumption on things that I… Well, to some extent, yeah. Maybe I'm just selfish by sometimes getting an Uber. That's totally true. I think I'm just a hypocrite in that respect. But mostly I think the trade-off is between saving, investing, and giving. Beast of the money that I have saved up and past things. So this is all sort of a defense of why I don't have a recurring donation going on.SARAH: I'm not asking you to defend yourself because I do not do that either.AARON: I think if I was making enough money that I could give away $10,000 a year and plan on doing that indefinitely, I would be unlikely to set up a recurring donation. 
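To make the marginal-donation arithmetic from the exchange above concrete, here is a deliberately toy sketch. The only figure taken from the conversation is the rough "one dollar spares around 10,000 shrimp a very painful death" number Aaron cites (and later asks not to be quoted on); the human-charity number and the "units of human good" framing are placeholders invented for illustration, not anyone's published cost-effectiveness estimate.

```python
# Illustrative-only arithmetic for the marginal $20 discussed above.
# Every number except SHRIMP_HELPED_PER_DOLLAR is a made-up placeholder.

DONATION = 20.0
SHRIMP_HELPED_PER_DOLLAR = 10_000   # Aaron's rough, don't-quote-me figure
HUMAN_GOOD_PER_DOLLAR = 0.01        # placeholder "units of human good" per dollar

shrimp_helped = DONATION * SHRIMP_HELPED_PER_DOLLAR
human_good = DONATION * HUMAN_GOOD_PER_DOLLAR

# The comparison hinges on one uncertain number: how much one shrimp helped
# is worth relative to one unit of human good. Rather than assert a weight,
# solve for the weight at which the two options tie.
break_even_weight = human_good / shrimp_helped
print(f"Shrimp helped with $20: {shrimp_helped:,.0f}")
print(f"Break-even moral weight: {break_even_weight:.1e} "
      "(the shrimp option wins whenever one shrimp helped is worth more than this)")
```

Framed this way, the disagreement Aaron and Sarah are circling reduces to a single uncertain parameter, the moral weight on a shrimp helped, which is the kind of quantity the Rethink Priorities work Aaron mentions tries to bound.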
What I would really want to do is once or twice a year, really try to prioritize deciding on how to give it away rather than making it the default. This has a real cost for charities. If you set up a recurring donation, they have more certainty in some sense of their future cash flow. But that's only good to do if you're really confident that you're going to want to keep giving there in the future. I could learn new information that says something else is better. So I don't think I would do that.SARAH: Now I'm just thinking about how many shrimp did you say it was per dollar?AARON: Don't quote me. I didn't say an actual thing.SARAH: It was like some big number. Right. Because I just feel like that's such a brainworm. Imagine if you let that actually get in your head and then every time you spend some unnecessary amount of money on something you don't really need, you think about how many shrimp you just killed by getting an Uber or buying lunch out. That is so stressful. I think I'm going to try not to think about that.AARON: I don't mean to belittle this. This is like a core, I think you're new to EA type of thinking. It's super natural and also troubling when you first come upon it. Do you want me to talk about how I, or other people deal with that or take action?SARAH: Yeah, tell me how to get the shrimp off my conscience.AARON: Well, for one thing, you don't want to totally do that. But I think the main thing is that the salience of things like this just decreases over time. I would be very surprised if, even if you're still very engaged in the EA adjacent communities or EA itself in five years, that it would be as emotionally potent. Brains make things less important over time. But I think the thing to do is basically to compartmentalize in a sort of weird sense. Decide how much you're willing to donate. And it might be hard to do that, but that is sort of a process. Then you have that chunk of money and you try to give it away the best you can under whatever you think the best ethics are. But then on the daily, you have this other set pot of money. You just are a normal person. You spend it as you wish. You don't think about it unless you try not to. And maybe if you notice that you might even have leftover money, then you can donate the rest of it. But I really do think picking how much to give should sort of be its own project. And then you have a pile of money you can be a hardcore EA about.SARAH: So you pick a cut off point and then you don't agonize over anything over and above that.AARON: Yeah. And then people, I mean, the hard part is that if somebody says their cut off point is like 1% of their income and they're making like $200,000, I don't know. Maybe their cut off point should be higher. So there is a debate. It depends on that person's specific situation. Maybe if they have a kid or some super expensive disease, it's a different story. If you're just a random guy making $200,000, I think you should give more.SARAH: Maybe you should be giving away enough to feel the pinch. Well, not even that. I don't think I'm going to do that. This is something that I do actually want to do at some point, but I need to think about it more and maybe get a better job.AARON: Another thing is, if you're wanting to earn to give as a path to impact, you could think and strive pretty hard. Maybe talk to people and choose your education or professional development opportunities carefully to see if you can get a better paying job. 
That's just much more important than changing how much you give from 10% to 11% or something. You should have this macro level optimization. How can I have more money to spend? Let me spend, like, I don't know, depends what life stage you are, but if you had just graduated college or maybe say you're a junior in college or something. It could make sense to spend a good amount of time figuring out what that path might look like.AARON: I'm a huge hypocrite because I definitely haven't done all this nearly as much as I should, but I still endorse it.SARAH: Yeah, I think it's fine to say what you endorse doing in an ideal world, even if you're not doing that, that's fine.AARON: For anybody listening, I tweeted a while ago, asking if anyone has resources on how to think about giving away wealth. I'm not very wealthy but have some amount of savings. It's more than I really need. At the same time, maybe I should be investing it because EA orgs don't feel like, or they think they can't invest it because there's potentially a lot of blowback if they make poor investments, even though it would be higher expected value.There's also the question of, okay, having some amount of savings allows me to take higher, potentially somewhat higher risk, but higher value opportunities because I have a cushion. But I'm very confused about how to give away what I should do here. People should DM me on Twitter or anywhere they have ideas.SARAH: I think you should calculate how much you need to cover your very basic needs. Maybe you should work out, say, if you were working 40 hours a week in a minimum wage job, like how much would you make then? And then you should keep that for yourself. And then the rest should definitely all go to the shrimp. Every single penny. All of it.AARON: This is pretty plausible. Just to make it more complicated, there's also the thing that I feel like my estimates or my best guesses of the best charities to give to over time has changed. And so there's like two competing forces. One is that I might get wiser and more knowledgeable as time goes on. The other one is that in general, giving now is better than giving later. All else equal, because I think for a couple of reasons, the main one just being that the charities don't know that you're going to give later.AARON: So it's like they can plan for the future much better if they get money now. And also there's just higher leverage opportunities or higher value per dollar opportunities now in general than there will be later for a couple of reasons I don't really need to. This is what makes it really complicated. So I've donated in the past to places that I don't think, or I don't think even at the time were the best to. So then there's a question of like, okay, how long do I save this money? Do I sit on it for months until I'm pretty confident, like a year.AARON: I do think that probably over the course of zero to five years or something, becoming more confident or changing your mind is like the stronger effect than how much good you give to the, or how much better it is for the charities to give now instead of later. But also that's weird because you're never committing at all.Sometimes you might decide to give it away, and maybe you won't. Maybe at that time you're like, “Oh, that's what I want. A car, I have a house, whatever.” It's less salient or something. Maybe something bad happened with EA and you no longer identify that way. Yeah, there's a lot of really thorny considerations. 
Sorry, I'm talking way too much.
SARAH: Are you factoring AI timelines into this?
AARON: That makes it even more sketchy. But that could also go both ways. On one hand, if you don't give away your money now and you die with it, it's never going to do any good. The other thing is that especially high-leverage opportunities might come up in the future. Potentially, I don't know, I could make something up: OpenPhil needs as much money as it can get to do X, Y and Z, and it's really important right now, but I won't know that until a few years down the line. So, just like everything else, it doesn't neatly wash out.
SARAH: What do you think the AGI is going to do to the shrimp? I reckon it's probably pretty neat, like one shrimp per paperclip. Maybe you could get more. I wonder what the shrimp-to-paperclip conversion rate is.
AARON: Has anyone looked into that morally? I think it's like one to zero. I don't think in terms of money. You could definitely price that. I have no idea.
SARAH: I don't know. Maybe I'm not taking this as seriously as I should be, because I'm...
AARON: No, I mean, humor is good. When people are giving away money or deciding what to do, they should be serious. But joking and humor is good. Sorry, go ahead.
SARAH: No, you go ahead.
AARON: I had a half-baked idea. At EA Global, they should have a comedy show where people roast everybody, but it's a fundraiser. You have to pay to attend, and they have a bidding contest to get 100 people into the comedy show. That was my original idea. Or they could just have a normal comedy show. I think that'd be cool.
SARAH: Actually, I think that's a good idea, because you guys are funny. There is a lot of wit on this side of Twitter. I'm impressed.
AARON: I agree.
SARAH: So I think that's a very good idea.
AARON: Okay. Dear Events team: hire Aaron Bergman, professional comedian.
SARAH: You can just give them your Twitter as a source for how funny you are, and that clearly qualifies you to set this up. I love it.
AARON: This is not important or related to anything, but I used to be a good juggler, for entertainment purposes. I have this video. Maybe I should make sure the world can see it. It's like a talent show. So maybe I can do that instead.
SARAH: Juggling. You definitely should make sure the world has access to this footage.
AARON: It had more views than I expected. It wasn't five views, it was 90 or something, which is still nothing.
SARAH: I can tell you a secret right now if you want. That relates to Max asking in the chat about Glee.
AARON: Yes.
SARAH: This bit we'll also have to edit out, but me having a public meltdown over AI was the second time that I've ever blown up on the Internet. The first time being... I can't believe I'm telling you this. I think I'm delirious right now. Were you ever in any fandoms as a teenager?
AARON: No.
SARAH: Okay. Were you ever on Tumblr?
AARON: No. I sort of know what the cultural vibes were. I sort of know what you're referring to. There were people who were into Harry Potter stuff and bands, Kpop, stuff like that.
SARAH: So people would make these fan videos where they'd take clips from TV shows and then edit them together to music. Sometimes people would edit the clips to make it look like something had happened in the plot of the show that hadn't actually happened. For example, say, what if X character had died? And then you edit the clips together to try and make it look like they've died.
And you put a sad song, "How to Save a Life" by The Fray or something, over the top. And then you put it on YouTube.
AARON: Sorry, tell me what I should search, or just send the link here. I'm sending my link.
SARAH: Oh, no, this doesn't exist anymore. It does not exist anymore. Right? So, say you're, like, eleven or twelve years old and you do this, and you don't even have a mechanism to download videos, because you don't know how to do technology. Instead, you take your little iPod touch and you just play a YouTube video on your screen, and you literally just film the screen with your iPod touch, and that's how you're getting the clips. It's kind of shaky because you're holding the camera anyway.
SARAH: Then you edit it together in the iMovie app on your iPod touch, and then you put it on the Internet, and then you just forget about it. You forget about it. Two years later, you're like, oh, I wonder what happened to that YouTube account? And you log in, and this little video that you've made with edited clips that you've filmed off the screen of your laptop, set to "How to Save a Life" by The Fray, with clips from Glee in it, has nearly half a million views.
AARON: Nice. Love it.
SARAH: Embarrassing, because this is like two years later. And then all the comments were like, oh my God, this was so moving, this made me cry. And then obviously some of them were hating and being like, do you not even know how to download video clips? What? And then you're so embarrassed.
AARON: I could totally see it. Creative, but also a reasonable solution. Yeah.
SARAH: So that's my story of how I went viral when I was, like, twelve.
AARON: It must have been kind of overwhelming.
SARAH: Yeah, it was a bit. And you can tell by the time, it's like 20 to eleven at night, and now I'm starting to really go off on one and talk about weird things.
AARON: It's been like an hour. So, yeah, we can wrap up. And I always say this, but it's actually true: there's a low standard, like, low stakes or low threshold, a low bar, for doing this again and recording some other time.
SARAH: Yeah, probably. We'll have to get rid of the part about how I went viral on YouTube when I was twelve. I'll sleep on that.
AARON: Don't worry. I'll send the transcription at some point soon.
SARAH: Yeah, cool.
AARON: Okay, lovely. Thank you for staying up late into the night for this.
SARAH: It's not that late into the night. I'm just lame and go to bed early.
AARON: Okay, cool. Yeah, I know. Yeah, for sure. All right, bye.
Get full access to Aaron's Blog at www.aaronbergman.net/subscribe
Platformer's Casey Newton moderates a conversation at Code 2023 on ethics in artificial intelligence, with Ajeya Cotra, Senior Program Officer at Open Philanthropy, and Helen Toner, Director of Strategy at Georgetown University's Center for Security and Emerging Technology. The panel discusses the risks and rewards of the technology, as well as best practices and safety measures. Recorded on September 27th in Los Angeles.
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: What I would do if I wasn't at ARC Evals, published by Lawrence Chan on September 5, 2023 on The AI Alignment Forum. In which: I list 9 projects that I would work on if I wasn't busy working on safety standards at ARC Evals, and explain why they might be good to work on. Epistemic status: I'm prioritizing getting this out fast as opposed to writing it carefully. I've thought for at least a few hours and talked to a few people I trust about each of the following projects, but I haven't done that much digging into each of these, and it's likely that I'm wrong about many material facts. I also make little claim to the novelty of the projects. I'd recommend looking into these yourself before committing to doing them. (Total time spent writing or editing this post: ~8 hours.) Standard disclaimer: I'm writing this in my own capacity. The views expressed are my own, and should not be taken to represent the views of ARC/FAR/LTFF/Lightspeed or any other org or program I'm involved with. Thanks to Ajeya Cotra, Caleb Parikh, Chris Painter, Daniel Filan, Rachel Freedman, Rohin Shah, Thomas Kwa, and others for comments and feedback. Introduction: I'm currently working as a researcher on the Alignment Research Center Evaluations Team (ARC Evals), where I'm working on lab safety standards. I'm reasonably sure that this is one of the most useful things I could be doing with my life. Unfortunately, there are a lot of problems to solve in the world, and lots of balls being dropped, that I don't have time to get to thanks to my day job. Here's an unsorted and incomplete list of projects that I would consider doing if I wasn't at ARC Evals:
1. Ambitious mechanistic interpretability.
2. Getting people to write papers/writing papers myself.
3. Creating concrete projects and research agendas.
4. Working on OP's funding bottleneck.
5. Working on everyone else's funding bottleneck.
6. Running the Long-Term Future Fund.
7. Onboarding senior(-ish) academics and research engineers.
8. Extending the young-EA mentorship pipeline.
9. Writing blog posts/giving takes.
I've categorized these projects into three broad categories and will discuss each in turn below. For each project, I'll also list who I think should work on them, as well as some of my key uncertainties. Note that this document isn't really written for myself to decide between projects, but instead as a list of some promising projects for someone with a similar skillset to me. As such, there's not much discussion of personal fit. If you're interested in working on any of the projects, please reach out or post in the comments below! Relevant beliefs I have: Before jumping into the projects I think people should work on, I think it's worth outlining some of my core beliefs that inform my thinking and project selection: Importance of A(G)I safety: I think A(G)I Safety is one of the most important problems to work on, and all the projects below are thus aimed at AI Safety. Value beyond technical research: Technical AI Safety (AIS) research is crucial, but other types of work are valuable as well. Efforts aimed at improving AI governance, grantmaking, and community building are important and we should give more credit to those doing good work in those areas.
High discount rate for current EA/AIS funding: There are several reasons for this: first, EA/AIS Funders are currently in a unique position due to a surge in AI Safety interest without a proportional increase in funding. I expect this dynamic to change and our influence to wane as additional funding and governments enter this space. Second, efforts today are important for paving the path for future efforts. Third, my timelines are relatively short, which increases the importance of current funding. Building a robust EA/AIS ecosystem: The EA/AIS ecosystem should be more prepared for unpredictable s...
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Benchmarks for Detecting Measurement Tampering [Redwood Research], published by Ryan Greenblatt on September 5, 2023 on The AI Alignment Forum. TL;DR: This post discusses our recent empirical work on detecting measurement tampering and explains how we see this work fitting into the overall space of alignment research. When training powerful AI systems to perform complex tasks, it may be challenging to provide training signals that are robust under optimization. One concern is measurement tampering, which is where the AI system manipulates multiple measurements to create the illusion of good results instead of achieving the desired outcome. (This is a type of reward hacking.) Over the past few months, we've worked on detecting measurement tampering by building analogous datasets and evaluating simple techniques. We detail our datasets and experimental results in this paper. Detecting measurement tampering can be thought of as a specific case of Eliciting Latent Knowledge (ELK): When AIs successfully tamper with measurements that are used for computing rewards, they possess important information that the overseer doesn't have (namely, that the measurements have been tampered with). Conversely, if we can robustly elicit an AI's knowledge of whether the measurements have been tampered with, then we could train the AI to avoid measurement tampering. In fact, our best guess is that this is the most important and tractable class of ELK problems. We also think that measurement tampering detection is a natural application for alignment work such as creating better inductive biases, studying high-level model internals, or studying generalization. We'll discuss what these applications might look like in the Future work section. In this post: We explain what measurement tampering detection is; We summarize the results of our paper; We argue that there are structural properties of measurement tampering that might make it considerably easier to detect than arbitrary cases of eliciting knowledge from models; We argue that detecting measurement tampering is very useful for avoiding catastrophic outcomes from powerful and misaligned AI systems; We explain why we think the dataset structure we chose reflects real-world difficulties of measurement tampering, and how techniques developed for our datasets could be used in practice; We discuss future work. If you're interested in pursuing follow-up work and wish to discuss it with us, feel free to email fabien@rdwrs.com or leave a comment on this post. We would like to acknowledge the contributions of Jenny Nitishinskaya, Nicholas Goldowsky-Dill, and Daniel Ziegler in the creation of early measurement tampering datasets. We would also like to thank Lawrence Chan, Paul Christiano, Ajeya Cotra, Carson Denison, Samuel Marks, Ethan Perez, Ansh Radhakrishnan, Kshitij Sachan, Jacob Steinhardt, Meg Tong, Gabriel Wu, and Mark Xu for their useful feedback. Studying measurement tampering in practice Measurement tampering detection is the task of detecting cases where measurements and the desired outcomes come apart, given a restricted trusted set (on which overseers can avoid tampering because they understand the action sequences and their effects well) and a wider untrusted set (where tampering sometimes happens). 
In the untrusted set, overseers don't know when measurement tampering happens, which means they have access to text inputs and measurements but not the ground truth. We are centrally interested in cases where we have access to a language model which can predict (or can be slightly fine-tuned to predict) measurements, and which computes the target outcome internally in order to predict measurements. An example where this setup might occur in practice is when training an AI by first training it to imitate human demonstrations and then afte...
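To make the setup concrete, here is a minimal sketch of the data layout the excerpt describes: trusted examples where the overseer knows the real outcome, and untrusted examples where only the text and the measurements are visible. It illustrates the problem structure only, under assumptions of my own; it is not Redwood's code or datasets, and the naive consistency heuristic at the end is deliberately weak.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Illustrative data layout for measurement tampering detection, assuming the
# trusted/untrusted split described above. Not Redwood's datasets or methods.

@dataclass
class Example:
    text: str                               # the action sequence / episode description
    measurements: Sequence[bool]            # several checks that should all reflect the outcome
    outcome_is_real: Optional[bool] = None  # known on the trusted set, None on the untrusted set

trusted = [
    Example("episode the overseer fully understands", [True, True, True], outcome_is_real=True),
    Example("another well-understood episode", [False, False, False], outcome_is_real=False),
]
untrusted = [
    Example("opaque episode where tampering may have occurred", [True, True, True]),
]

def naive_tamper_flag(ex: Example) -> bool:
    """Weak baseline: flag examples whose measurements disagree with one another.

    Successful tampering can fool every measurement at once, which is exactly why
    the paper studies detectors that look at the predictor model's internals instead.
    """
    return len(set(ex.measurements)) > 1

for ex in untrusted:
    print(ex.text, "->", "suspicious" if naive_tamper_flag(ex) else "no obvious inconsistency")
```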
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: AGI is easier than robotaxis, published by Daniel Kokotajlo on August 13, 2023 on The AI Alignment Forum. [Epistemic status: Hot take I wrote in 1 hour. We'll see in the comments how well it holds up.] Who would win in a race: AGI, or robotaxis? Which will be built first? There are two methods: either tech companies build AGI/robotaxis themselves directly, or they first build AI that can massively accelerate AI R&D and then bootstrap to AGI and/or robotaxis. The direct method. Definitions: By AGI I mean a computer program that functions as a drop-in replacement for a human remote worker, except that it's better than the best humans at every important task (that can be done via remote workers). (h/t Ajeya Cotra for this language) And by robotaxis I mean at least a million fairly normal taxi rides a day are happening without any human watching, ready to take over. (So e.g. if the Boring Company gets working at scale, that wouldn't count, since all those rides are in special tunnels.) 1. Scale advantage for AGI: Robotaxis are subject to crippling hardware constraints, relative to AGI. According to my rough estimations, Teslas would cost tens of thousands of dollars more per vehicle, and have 6% less range, if they scaled up the parameter count of their neural nets by 10x. Scaling up by 100x is completely out of the question for at least a decade, I'd guess. Meanwhile, scaling up GPT-4 is mostly a matter of purchasing the necessary GPUs and networking them together. It's challenging but it can be done, has been done, and will be done. We'll see about 2 OOMs of compute scale-up in the next four years, I say, and then more to come in the decade after that. This is a big deal because roughly half of AI progress historically came from scaling up compute, and because there are reasons to think it's impossible or almost-impossible for a neural net small enough to run on a Tesla to drive as well as a human, no matter how long it is trained. (It's about the size of an ant's brain. An ant is driving your car! Have you watched ants? They bump into things all the time!) 2. Stakes advantage for AGI: When a robotaxi messes up, there's a good chance someone will die. Robotaxi companies basically have to operate under the constraint that this never happens, or happens only once or twice. That would be like DeepMind training AlphaStar except that the whole training run gets shut down after the tenth game is lost. Robotaxi companies can compensate by doing lots of training in simulation, and doing lots of unsupervised learning on real-world camera recordings, but still, it's a big disadvantage. Moreover, the vast majority of tasks involved in being an AGI are 'forgiving' in the sense that it's OK to fail. If you send a weirdly worded message to a user, or make a typo in your code, it's OK; you can apologize and/or fix the error. Only in a few very rare cases are failures catastrophic. Whereas with robotaxis, the opportunity for catastrophic failure is omnipresent. As a result, I think arguably being a safe robotaxi is just inherently harder than most of the tasks involved in being an AGI. (Analogy: Suppose that cars and people were indestructible, like in a video game, so that they just bounced off each other when they collided.
Then I think we'd probably have robotaxis already; sure, it might take you 20% longer to get to your destination due to all the crashes, but it would be so much cheaper! Meanwhile, suppose that if your chatbot threatens or insults >10 users, you'd have to close down the project. Then Microsoft Bing would have been shut down, along with every other chatbot ever.) Finally, from a regulatory perspective, there are ironically much bigger barriers to building robotaxis than building AGI. If you want to deploy a fleet of a million robotaxis there is a lot of red tape you need to cut th...
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Biological Anchors: The Trick that Might or Might Not Work, published by Scott Alexander on August 12, 2023 on LessWrong. This post was originally posted on Astral Codex Ten on Feb 23 2022. It was printed in The Carving of Reality, the third volume of the Best of LessWrong book series. It was included as a (shorter) replacement for Ajeya Cotra's Draft report on AI timelines, and Eliezer's Biology-Inspired AGI Timelines: The Trick That Never Works, covering the topic from multiple sides. It's crossposted here with Scott's permission for completeness (i.e. having all essays in the book appear on LessWrong). Introduction: I've been trying to review and summarize Eliezer Yudkowsky's recent dialogues on AI safety. Previously in sequence: Yudkowsky Contra Ngo On Agents. Now we're up to Yudkowsky contra Cotra on biological anchors, but before we get there we need to figure out what Cotra's talking about and what's going on. The Open Philanthropy Project ("Open Phil") is a big effective altruist foundation interested in funding AI safety. It's got $20 billion, probably the majority of money in the field, so its decisions matter a lot and it's very invested in getting things right. In 2020, it asked senior researcher Ajeya Cotra to produce a report on when human-level AI would arrive. It says the resulting document is "informal" - but it's 169 pages long and likely to affect millions of dollars in funding, which some might describe as making it kind of formal. The report finds a 10% chance of "transformative AI" by 2031, a 50% chance by 2052, and an almost 80% chance by 2100. Eliezer rejects their methodology and expects AI earlier (he doesn't offer many numbers, but here he gives Bryan Caplan 50-50 odds on 2030, albeit not totally seriously). He made the case in his own very long essay, Biology-Inspired AGI Timelines: The Trick That Never Works, sparking a bunch of arguments and counterarguments and even more long essays. There's a small cottage industry of summarizing the report already, eg OpenPhil CEO Holden Karnofsky's article and Alignment Newsletter editor Rohin Shah's comment. I've drawn from both for my much-inferior attempt. Part I: The Cotra Report. Ajeya Cotra is a senior research analyst at OpenPhil. She's assisted by her fiancé Paul Christiano (compsci PhD, OpenAI veteran, runs an AI alignment nonprofit) and to a lesser degree by other leading lights. Although not everyone involved has formal ML training, if you care a lot about whether efforts are "establishment" or "contrarian", this one is probably more establishment. The report asks when we will first get "transformative AI" (ie AI which produces a transition as impressive as the Industrial Revolution; probably this will require it to be about as smart as humans). Its methodology is: 1. Figure out how much inferential computation the human brain does. 2. Try to figure out how much training computation it would take, right now, to get a neural net that does the same amount of inferential computation. Get some mind-bogglingly large number. 3. Adjust for "algorithmic progress", ie maybe in the future neural nets will be better at using computational resources efficiently. Get some number which, realistically, is still mind-bogglingly large. 4. Probably if you wanted that mind-bogglingly large amount of computation, it would take some mind-bogglingly large amount of money.
But computation is getting cheaper every year. Also, the economy is growing every year. Also, the share of the economy that goes to investments in AI companies is growing every year. So at some point, some AI company will actually be able to afford that mind-bogglingly large amount of money, deploy the mind-bogglingly large amount of computation, and train the AI that has the same inferential computation as the human brain. 5. Figure out what year t...
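The five steps above reduce to a simple race between a shrinking compute requirement and a growing affordable compute budget. The toy sketch below shows that structure; every number in it is an illustrative placeholder, not a value from Cotra's report, so only the shape of the calculation should be taken seriously.

```python
# Toy version of the bio-anchors-style calculation sketched in steps 1-5 above.
# All inputs are illustrative placeholders, NOT the report's actual estimates.

def first_affordable_year(
    required_training_flop=1e35,  # steps 1-2: assumed training compute requirement
    algo_halving_years=2.5,       # step 3: assumed halving time from algorithmic progress
    flop_per_dollar_2020=1e17,    # step 4: assumed hardware price-performance in 2020
    hardware_halving_years=2.5,   # assumed compute price halving time
    spend_2020=1e8,               # assumed largest training-run budget in 2020 (USD)
    spend_growth_per_year=1.3,    # assumed growth in willingness to spend
    spend_cap=1e11,               # assumed ceiling on any single training run (USD)
    last_year=2100,
):
    """Return the first year the largest affordable run meets the requirement."""
    for year in range(2020, last_year + 1):
        t = year - 2020
        requirement = required_training_flop * 0.5 ** (t / algo_halving_years)
        flop_per_dollar = flop_per_dollar_2020 * 2 ** (t / hardware_halving_years)
        budget = min(spend_2020 * spend_growth_per_year ** t, spend_cap)
        if budget * flop_per_dollar >= requirement:
            return year
    return None

print(first_affordable_year())  # prints 2050 with these made-up inputs
```

Changing any single input by an order of magnitude shifts the answer by years, which is roughly why debates about the model focus on its inputs rather than its arithmetic.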
This is a selection of highlights from episode #151 of The 80,000 Hours Podcast. These aren't necessarily the most important, or even most entertaining, parts of the interview — and if you enjoy this, we strongly recommend checking out the full episode: Ajeya Cotra on accidentally teaching AI models to deceive us. Get this episode by subscribing to our podcast on the world's most pressing problems and how to solve them: type ‘80,000 Hours' into your podcasting app. Or read the transcript. Highlights put together by Simon Monsour and Milo McGuire
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Who's right about inputs to the biological anchors model?, published by rosehadshar on July 24, 2023 on The Effective Altruism Forum. In this post, I compared forecasts from Ajeya Cotra and from forecasters in the Existential Risk Persuasion Tournament (XPT) relating to some of the inputs to Cotra's biological anchors model. Here, I give my personal take on which of those forecasts seem more plausible. Note that: I'm only considering the inputs to the bio anchors model which we have XPT forecasts for. This notably excludes the 2020 training requirements distribution, which is a very important driver of model outputs. My take is based on considering the explicit arguments that Cotra and the XPT forecasters gave, rather than on independent research. My take is subjective. I've been working with the Forecasting Research Institute (who ran the XPT) since November 2022, and this is a potential source of bias. I'm publishing this post in a personal capacity and it hasn't gone through FRI's review process. I originally wrote this early in 2023. I've tried to update it as new information came out, but I likely haven't done a comprehensive job of this. To recap, here are the relevant forecasts: See workings here and here. The 'most aggressive' and 'most conservative' forecasts can be considered equivalent to 90% confidence intervals for the median estimate. Hardware: For FLOP/$ in 2025, I think both Cotra and the XPT forecasters are wrong, but Cotra will prove more right. Epoch's current estimate of highest GPU price-performance is 4.2e18 FLOP per $. They also find a trend in GPU price-performance of 0.1 OOM/year for state of the art GPUs. So I'll extrapolate 4.2e18 to 5.97e18 (a worked version of this extrapolation appears below). For compute price halving time to 2100, I think it depends how likely you think it is that novel technologies like optical computing will reduce compute prices in future. This is the main argument Cotra puts forward for expecting such low prices. It's an argument made in XPT too, but less weight is put on it. Counterarguments given in XPT: fundamental physical limits, progress getting harder, rare materials capping how much prices can drop, catastrophe/extinction, optimisation shifting to memory architectures. Cotra mentions some but not all of these (she doesn't mention rare materials or memory architectures). Cotra flags that she thinks after 2040 her forecasts on this are pretty unreliable. But, because of how wrong their 2024 and 2030 forecasts seem to be, I'm not inclined to put much weight on XPT forecasts here either. I'll go with the most aggressive XPT figure, which is close to Cotra's. I don't have an inside view on the likelihood of novel technologies causing further price drops. Note that the disagreement about compute price halving times drives a lot of the difference in model output. Willingness to spend: On the most expensive training run by 2025, I think Cotra is a bit too aggressive and XPT forecasters are much too conservative. In 2022, Cotra updated downwards a bit on the likelihood of a $1bn training run by 2025. There isn't much time left for Cotra to be right. Cotra was predicting $20m by the end of 2020, and $80m by the end of 2021. GPT-3 was $4.6m in 2020.
If you buy that unreleased proprietary models are likely to be 2-8x more expensive than public ones (which Cotra argues), that XPT forecasters missed this consideration, and that GPT-3 isn't proprietary and/or unreleased (flagging because I'm unsure what Cotra actually means by proprietary/unreleased), then this could be consistent with Cotra's forecasts. Epoch estimates that GPT-4 cost $50m to train at some point in 2022. Again, this could be in line with Cotra's predictions. More importantly, GPT-4 costs make XPT forecasters look quite wrong already - their 2024 prediction was surpassed in 2022. This is especially striking i...
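A quick check of the hardware extrapolation quoted above: the 4.2e18 FLOP per $ level and the 0.1 OOM/year trend come from the excerpt, while the roughly 1.5-year horizon is an assumption of mine (it is what makes the quoted 5.97e18 figure come out approximately right).

```python
# Worked version of the FLOP/$ extrapolation quoted above.
# 4.2e18 FLOP/$ and 0.1 OOM/year come from the excerpt; the 1.5-year horizon
# is an assumed gap between Epoch's estimate and the 2025 target date.
flop_per_dollar_now = 4.2e18
oom_per_year = 0.1
years_ahead = 1.5

extrapolated = flop_per_dollar_now * 10 ** (oom_per_year * years_ahead)
print(f"{extrapolated:.2e}")  # ~5.93e+18 FLOP per $; the post's 5.97e18
                              # corresponds to extrapolating ~1.53 years
```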
Curtis, also known on the internet as AI_WAIFU, is the head of Alignment at EleutherAI. In this episode we discuss the massive orders of H100s from different actors, why he thinks AGI is 4-5 years away, why he thinks we're 90% "toast", his comment on Eliezer Yudkowsky's Death with Dignity, and what kind of Alignment projects are currently going on at EleutherAI, especially a project with Markov chains and the Alignment Minetest project that he is currently leading. Youtube: https://www.youtube.com/watch?v=9s3XctQOgew Transcript: https://theinsideview.ai/curtis Death with Dignity: https://www.lesswrong.com/posts/j9Q8bRmwCgXRYAgcJ/miri-announces-new-death-with-dignity-strategy Alignment Minetest: https://www.eleuther.ai/projects/alignment-minetest Alignment Minetest update: https://blog.eleuther.ai/minetester-intro/ OUTLINE
(00:00) Highlights / Intro
(00:50) The Fuck That Noise Comment On Death With Dignity
(10:28) The Probability of Doom Is 90%
(12:44) Best Counterarguments For His High P(doom)
(14:41) Compute And Model Size Required For A Dangerous Model
(17:59) Details For Curtis' Model Of Compute Required
(21:23) Why This Estimate Of Compute Required Might Be Wrong, Ajeya Cotra's Transformative AI report
(29:00) Curtis' Median For AGI Is Around 2028, Used To Be 2027
(30:50) How Curtis Approaches Life With Short Timelines And High P(Doom)
(35:27) Takeoff Speeds—The Software View vs. The Hardware View
(39:57) Nvidia's 400k H100s rolling down the assembly line, AIs soon to be unleashed on their own source code
(41:04) Could We Get A Fast Takeoff By Fully Automating AI Research With More Compute
(46:00) The Entire World (Tech Companies, Governments, Militaries) Is Noticing New AI Capabilities That They Don't Have
(47:57) Open-source vs. Closed-source policies. Mundane vs. Apocalyptic considerations.
(53:25) Curtis' background, from teaching himself deep learning to EleutherAI
(55:51) Alignment Project At EleutherAI: Markov Chain and Language Models
(01:02:15) Research Philosophy at EleutherAI: Pursuing Useful Projects, Multilingual, Discord, Logistics
(01:07:38) Alignment Minetest: Links To Alignment, Embedded Agency, Wireheading
(01:15:30) Next steps for Alignment Minetest: focusing on model-based RL
(01:17:07) Training On Human Data & Using an Updated Gym Environment With Human APIs
(01:19:20) Model Used, Not Observing Symmetry
(01:21:58) Another goal of Alignment Minetest: Study Corrigibility
(01:28:26) People ordering H100s Are Aware Of Other People Making These Orders, Race Dynamics, Last Message
This is a linkpost for "Forecasting Existential Risks: Evidence from a Long-Run Forecasting Tournament," accessible here: https://forecastingresearch.org/s/XPT.pdfToday, the Forecasting Research Institute (FRI) released "Forecasting Existential Risks: Evidence from a Long-Run Forecasting Tournament", which describes the results of the Existential-Risk Persuasion Tournament (XPT).The XPT, which ran from June through October of 2022, brought together forecasters from two groups with distinctive claims to knowledge about humanity's future — experts in various domains relevant to existential risk, and "superforecasters" with a track record of predictive accuracy over short time horizons. We asked tournament participants to predict the likelihood of global risks related to nuclear weapon use, biorisks, and AI, along with dozens of other related, shorter-run forecasts.Some major takeaways from the XPT include:The median domain expert predicted a 20% chance of catastrophe and a 6% chance of human extinction by 2100. The median superforecaster predicted a 9% chance of catastrophe and a 1% chance of extinction. Superforecasters predicted considerably lower chances of both catastrophe and extinction than did experts, but the disagreement between experts and superforecasters was not uniform across topics. Experts and superforecasters were furthest apart (in percentage point terms) on AI risk, and most similar on the risk of nuclear war.Predictions about risk were highly correlated across topics. For example, participants who gave higher risk estimates for AI also gave (on average) higher risk estimates for biorisks and nuclear weapon use.Forecasters with higher “intersubjective accuracy”—i.e., those best at predicting the views of other participants—estimated lower probabilities of catastrophic and extinction risks from all sources.Few minds were changed during the XPT, even among the most active participants, and despite monetary incentives for persuading others.See the full working paper here.FRI hopes that the XPT will not only inform our understanding of existential risks, but will also advance the science of forecasting by:Collecting a large set of forecasts resolving on a long timescale, in a rigorous setting. This will allow us to measure correlations between short-run (2024), medium-run (2030) and longer-run (2050) accuracy in the coming decades.Exploring the use of bonus payments for participants who both 1) produced persuasive rationales and 2) made accurate “intersubjective” forecasts (i.e., predictions of the predictions of other participants), which we are testing as early indicators of the reliability of long-range forecasts.Encouraging experts and superforecasters to interact: to share knowledge, debate, and attempt to persuade each other. 
We plan to explore the value of these interactions in future work. As a follow-up to our report release, we are producing a series of posts on the EA Forum that will cover the XPT's findings on:
- AI risk (in 6 posts):
  - Overview
  - Details on AI risk
  - Details on AI timelines
  - XPT forecasts on some key AI inputs from Ajeya Cotra's biological anchors report
  - XPT forecasts on some key AI inputs from Epoch's direct approach model
  - Consensus on the expected shape of development of AI progress
- Overview of findings on biorisk (1 post)
- Overview of findings on nuclear risk (1 post)
- Overview of findings from miscellaneous forecasting questions (1 post)
- FRI's planned next steps for this research agenda, along with a request for input on what FRI should do next (1 post)
--- First published: July 10th, 2023 Source: https://forum.effectivealtruism.org/posts/un42vaZgyX7ch2kaj/announcing-forecasting-existential-risks-evidence-from-a --- Narrated by TYPE III AUDIO.
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Presentación Introductoria al Altruismo Eficaz, published by davidfriva on June 26, 2023 on The Effective Altruism Forum. TL;DR: Spanish-Speaking Introduction to Effective Altruism covering key concepts like EA itself, 80,000 Hours, Longtermism, and the ITN Framework. Message to the English-Speaking Community (Mensaje para la comunidad angloparlante): Hey everyone! I'm David, a 21-year-old Computer Science student at the University of Buenos Aires (Argentina). I recently delivered an introductory talk on Effective Altruism (EA), drawing inspiration from Ajeya Cotra's style and targeting young adults. During the talk, I covered various important concepts such as Effective Altruism itself, the idea of 80,000 hours, Longtermism, and the ITN Framework (translated into Spanish), after sharing my personal journey of discovering these concepts and why they hold significance for me. As part of my ongoing efforts to address the lack of Spanish content on the EA Forum, I am sharing a link to the talk and the accompanying transcript in the form of an article. Spanish, being the second most widely spoken language in the world and extensively used on the internet, deserves greater representation within the EA community. My hope is that this initiative will help bridge the gap and make EA concepts more accessible to Spanish speakers. I. Hi, I'm David, I'm 21 years old, and I study Computer Science at the University of Buenos Aires. Today I want to talk to you about a concept that radically transformed my life: Effective Altruism. Effective Altruism is, basically, a project that aims to find the best ways to help others and put them into practice. It is both a field of research that seeks to identify the world's most urgent problems and the best solutions to them, and a practical community that seeks to use those findings to do good. But for you to understand why it is so important to me, and to go deeper into this, I first need to tell you a bit of my story. Well, it was March 2020 when, having turned 18, I finished secondary school. That same month I became homeless just before starting university, when my father threw me out of the room we shared in a shared house. I ended up being hosted at a friend's place and began looking for work. They were difficult times, pandemic times; economic activity had ground to a halt. Luckily, with both bad luck and good, I was hired as a cleaning worker at a hospital. During my time working there, I witnessed how overwhelmed the healthcare system was. I saw anxious people waiting in the emergency room, patients suffering from serious illnesses, and the health workers responsible for treating them, completely exhausted. With the health emergency adding to the chaos, it was quite a stressful, even terrifying, environment. That was the environment I worked in. And on the way home, I would go to a community soup kitchen run by a church in the Constitución neighbourhood to ask for food. That is where I realized that the hunger, cold, and desperation I was feeling were everyday reality for a significant part of our society. During that period, there were nights when I simply cried, feeling completely powerless.
I couldn't understand how the world could be so unjust, and how the people capable of helping, with resources to spare, could be so indifferent to the tragedy of others. Eventually I reached a comfortable position: having found a job as a programmer, I could work from the comfort of an apartment in the most expensive neighbourhood of Buenos Aires, far from the soup kitchens and the hospitals. Living in a bubble, I slowly forgot about the people I used to meet in those places, the poor and the sick, the most disadvantaged. Altruismo Efica...
The U.S. surgeon general, Dr. Vivek Murthy, says social media poses a “profound risk of harm” to young people. Why do some in the tech industry disagree? Then, Ajeya Cotra, an A.I. researcher, on how A.I. could lead to a doomsday scenario. Plus: Pass the hat. Kevin and Casey play a game they call HatGPT. On today's episode: Ajeya Cotra is a senior research analyst at Open Philanthropy. Additional reading:
- The surgeon general issued an advisory about the risks of social media for young people.
- Ajeya Cotra has researched the existential risks that A.I. poses unless countermeasures are taken.
- Binance commingled customer funds and company revenue, former insiders told Reuters.
- BuzzFeed announced Botatouille, an A.I.-powered kitchen assistant.
- A Twitter bug caused the platform to restore deleted tweets.
- Gov. Ron DeSantis of Florida announced his presidential campaign in a Twitter Spaces event rife with glitches.
- Two former rivals, Uber and Waymo, are teaming up to bring driverless ride-hailing to Phoenix.
[Bonus Episode] Future of Life Institute Podcast host Gus Docker interviews Conjecture CEO Connor Leahy to discuss GPT-4, magic, cognitive emulation, demand for human-like AI, and aligning superintelligence. You can read more about Connor's work at https://conjecture.dev Future of Life Institute is the organization that recently published an open letter calling for a six-month pause on training new AI systems. FLI was co-founded by Jaan Tallinn, whom we interviewed in Episode 16 of The Cognitive Revolution. We think their podcast is excellent. They frequently interview critical thinkers in AI like Neel Nanda, Ajeya Cotra, and Connor Leahy. The Connor Leahy episode, which we found particularly fascinating, is airing for our audience today. The FLI Podcast also recently interviewed Nathan Labenz for a 2-part episode: https://futureoflife.org/podcast/nathan-labenz-on-how-ai-will-transform-the-economy/ SUBSCRIBE: Future of Life Institute Podcast: Apple: https://podcasts.apple.com/us/podcast/future-of-life-institute-podcast/id1170991978 Spotify: https://open.spotify.com/show/2Op1WO3gwVwCrYHg4eoGyP RECOMMENDED PODCAST: The HR industry is at a crossroads. What will it take to construct the next generation of incredible businesses – and where can people leaders have the most business impact? Hosts Nolan Church and Kelli Dragovich have been through it all, the highs and the lows – IPOs, layoffs, executive turnover, board meetings, culture changes, and more. With a lineup of industry vets and experts, Nolan and Kelli break down the nitty-gritty details, trade offs, and dynamics of constructing high performing companies. Through unfiltered conversations that can only happen between seasoned practitioners, Kelli and Nolan dive deep into the kind of leadership-level strategy that often happens behind closed doors. Check out the first episode with the architect of Netflix's culture deck Patty McCord. https://link.chtbl.com/hrheretics TIMESTAMPS:
(00:00) Episode introduction
(01:55) GPT-4
(18:30) "Magic" in machine learning
(29:43) Cognitive emulations
(40:00) Machine learning vs. explainability
(49:50) Human data = human AI?
(1:01:50) Analogies for cognitive emulations
(1:28:10) Demand for human-like AI
(1:33:50) Aligning superintelligence
If you'd like to listen to Part 2 of this interview with Connor Leahy, you can head here: https://podcasts.apple.com/us/podcast/connor-leahy-on-the-state-of-ai-and-alignment-research/id1170991978?i=1000609972001
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Accidentally teaching AI models to deceive us (Ajeya Cotra on The 80,000 Hours Podcast), published by 80000 Hours on May 15, 2023 on The Effective Altruism Forum. Over at The 80,000 Hours Podcast we just published an interview that is likely to be of particular interest to people who identify as involved in the effective altruism community: Ajeya Cotra on accidentally teaching AI models to deceive us. You can click through for the audio, a full transcript, and related links. Below is the episode summary and some key excerpts. Episode Summary I don't know yet what suite of tests exactly you could show me, and what arguments you could show me, that would make me actually convinced that this model has a sufficiently deeply rooted motivation to not try to escape human control. I think that's, in some sense, the whole heart of the alignment problem. And I think for a long time, labs have just been racing ahead, and they've had the justification — which I think was reasonable for a while — of like, “Come on, of course these systems we're building aren't going to take over the world.” As soon as that starts to change, I want a forcing function that makes it so that the labs now have the incentive to come up with the kinds of tests that should actually be persuasive. Ajeya Cotra Imagine you are an orphaned eight-year-old whose parents left you a $1 trillion company, and no trusted adult to serve as your guide to the world. You have to hire a smart adult to run that company, guide your life the way that a parent would, and administer your vast wealth. You have to hire that adult based on a work trial or interview you come up with. You don't get to see any resumes or do reference checks. And because you're so rich, tonnes of people apply for the job — for all sorts of reasons. Today's guest Ajeya Cotra — senior research analyst at Open Philanthropy — argues that this peculiar setup resembles the situation humanity finds itself in when training very general and very capable AI models using current deep learning methods. As she explains, such an eight-year-old faces a challenging problem. In the candidate pool there are likely some truly nice people, who sincerely want to help and make decisions that are in your interest. But there are probably other characters too — like people who will pretend to care about you while you're monitoring them, but intend to use the job to enrich themselves as soon as they think they can get away with it. Like a child trying to judge adults, at some point humans will be required to judge the trustworthiness and reliability of machine learning models that are as goal-oriented as people, and greatly outclass them in knowledge, experience, breadth, and speed. Tricky! Can't we rely on how well models have performed at tasks during training to guide us? Ajeya worries that it won't work. The trouble is that three different sorts of models will all produce the same output during training, but could behave very differently once deployed in a setting that allows their true colours to come through. 
She describes three such motivational archetypes:
- Saints — models that care about doing what we really want
- Sycophants — models that just want us to say they've done a good job, even if they get that praise by taking actions they know we wouldn't want them to
- Schemers — models that don't care about us or our interests at all, who are just pleasing us so long as that serves their own agenda
In principle, a machine learning training process based on reinforcement learning could spit out any of these three attitudes, because all three would perform roughly equally well on the tests we give them, and ‘performs well on tests' is how these models are selected. But while that's true in principle, maybe it's not something that could plausibly happen in the real world. Af...
Imagine you are an orphaned eight-year-old whose parents left you a $1 trillion company, and no trusted adult to serve as your guide to the world. You have to hire a smart adult to run that company, guide your life the way that a parent would, and administer your vast wealth. You have to hire that adult based on a work trial or interview you come up with. You don't get to see any resumes or do reference checks. And because you're so rich, tonnes of people apply for the job — for all sorts of reasons. Today's guest Ajeya Cotra — senior research analyst at Open Philanthropy — argues that this peculiar setup resembles the situation humanity finds itself in when training very general and very capable AI models using current deep learning methods. Links to learn more, summary and full transcript. As she explains, such an eight-year-old faces a challenging problem. In the candidate pool there are likely some truly nice people, who sincerely want to help and make decisions that are in your interest. But there are probably other characters too — like people who will pretend to care about you while you're monitoring them, but intend to use the job to enrich themselves as soon as they think they can get away with it. Like a child trying to judge adults, at some point humans will be required to judge the trustworthiness and reliability of machine learning models that are as goal-oriented as people, and greatly outclass them in knowledge, experience, breadth, and speed. Tricky! Can't we rely on how well models have performed at tasks during training to guide us? Ajeya worries that it won't work. The trouble is that three different sorts of models will all produce the same output during training, but could behave very differently once deployed in a setting that allows their true colours to come through. She describes three such motivational archetypes:
- Saints — models that care about doing what we really want
- Sycophants — models that just want us to say they've done a good job, even if they get that praise by taking actions they know we wouldn't want them to
- Schemers — models that don't care about us or our interests at all, who are just pleasing us so long as that serves their own agenda
And according to Ajeya, there are also ways we could end up actively selecting for motivations that we don't want. In today's interview, Ajeya and Rob discuss the above, as well as:
- How to predict the motivations a neural network will develop through training
- Whether AIs being trained will functionally understand that they're AIs being trained, the same way we think we understand that we're humans living on planet Earth
- Stories of AI misalignment that Ajeya doesn't buy into
- Analogies for AI, from octopuses to aliens to can openers
- Why it's smarter to have separate planning AIs and doing AIs
- The benefits of only following through on AI-generated plans that make sense to human beings
- What approaches for fixing alignment problems Ajeya is most excited about, and which she thinks are overrated
- How one might demo actually scary AI failure mechanisms
Get this episode by subscribing to our podcast on the world's most pressing problems and how to solve them: type ‘80,000 Hours' into your podcasting app. Or read the transcript below.
Producer: Keiran Harris
Audio mastering: Ryan Kessler and Ben Cordell
Transcriptions: Katy Moore
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Deference on AI timelines: survey results, published by Sam Clarke on March 30, 2023 on The Effective Altruism Forum. Crossposted to LessWrong. In October 2022, 91 EA Forum/LessWrong users answered the AI timelines deference survey. This post summarises the results. Context: The survey was advertised in this forum post, and anyone could respond. Respondents were asked to whom they defer most, second-most and third-most, on AI timelines. You can see the survey here. Results: This spreadsheet has the raw anonymised survey results. Here are some plots which try to summarise them. Simply tallying up the number of times that each person is deferred to: The plot only features people who were deferred to by at least two respondents. Some basic observations: Overall, respondents defer most frequently to themselves—i.e. their “inside view” or independent impression—and Ajeya Cotra. These two responses were each at least twice as frequent as any other response. Then there's a kind of “middle cluster”—featuring Daniel Kokotajlo, Paul Christiano, Eliezer Yudkowsky and Holden Karnofsky—where, again, each of these responses was ~at least twice as frequent as any other response. Then comes everyone else. There's probably something more fine-grained to be said here, but it doesn't seem crucial to understanding the overall picture. What happens if you redo the plot with a different metric? How sensitive are the results to that? One thing we tried was computing a “weighted” score for each person, by giving them:
- 3 points for each respondent who defers to them the most
- 2 points for each respondent who defers to them second-most
- 1 point for each respondent who defers to them third-most.
(A small worked example of this scoring appears after this excerpt.) If you redo the plot with that score, you get this plot. The ordering changes a bit, but I don't think it really changes the high-level picture. In particular, the basic observations in the previous section still hold. We think the weighted score (described in this section) and unweighted score (described in the previous section) are the two most natural metrics, so we didn't try out any others. Don't some people have highly correlated views? What happens if you cluster those together? Yeah, we do think some people have highly correlated views, in the sense that their views depend on similar assumptions or arguments. We tried plotting the results using the following basic clusters:
- Open Philanthropy cluster = {Ajeya Cotra, Holden Karnofsky, Paul Christiano, Bioanchors}
- MIRI cluster = {MIRI, Eliezer Yudkowsky}
- Daniel Kokotajlo gets his own cluster
- Inside view = deferring to yourself, i.e. your independent impression
- Everyone else = all responses not in one of the above categories
Here's what you get if you simply tally up the number of times each cluster is deferred to: This plot gives a breakdown of two of the clusters (there's no additional information that isn't contained in the above two plots, it just gives a different view). This is just one way of clustering the responses, which seemed reasonable to us. There are other clusters you could make. Limitations of the survey: Selection effects. This probably isn't a representative sample of forum users, let alone of people who engage in discourse about AI timelines, or make decisions influenced by AI timelines. The survey didn't elicit much detail about the weight that respondents gave to different views.
We simply asked who respondents deferred most, second-most and third-most to. This misses a lot of information. The boundary between [deferring] and [having an independent impression] is vague. Consider: how much effort do you need to spend examining some assumption/argument for yourself, before considering it an independent impression, rather than deference? This is a limitation of the survey, because different respondents may have been using different b...
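For concreteness, here is a tiny sketch of the unweighted tally and the 3/2/1 weighted score described in the excerpt above. The survey responses in it are invented for illustration; only the scoring rules come from the post.

```python
from collections import Counter

# Toy illustration of the two scoring rules described above.
# These responses are made up; each lists whom a respondent defers to
# most, second-most, and third-most.
responses = [
    ["Inside view", "Ajeya Cotra", "Daniel Kokotajlo"],
    ["Ajeya Cotra", "Paul Christiano", "Holden Karnofsky"],
    ["Eliezer Yudkowsky", "Inside view", "Ajeya Cotra"],
]

# Unweighted score: every mention counts once.
unweighted = Counter(name for resp in responses for name in resp)

# Weighted score: 3 points for most deferred-to, 2 for second, 1 for third.
weighted = Counter()
for resp in responses:
    for points, name in zip((3, 2, 1), resp):
        weighted[name] += points

print(unweighted.most_common())  # e.g. [('Ajeya Cotra', 3), ('Inside view', 2), ...]
print(weighted.most_common())    # e.g. [('Ajeya Cotra', 6), ('Inside view', 5), ...]
```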
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: New blog: Planned Obsolescence, published by Ajeya Cotra on March 27, 2023 on LessWrong. Kelsey Piper and I just launched a new blog about AI futurism and AI alignment called Planned Obsolescence. If you're interested, you can check it out here. Both of us have thought a fair bit about what we see as the biggest challenges in technical work and in policy to make AI go well, but a lot of our thinking isn't written up, or is embedded in long technical reports. This is an effort to make our thinking more accessible. That means it's mostly aiming at a broader audience than LessWrong and the EA Forum, although some of you might still find some of the posts interesting. So far we have seven posts:
- What we're doing here
- "Aligned" shouldn't be a synonym for "good"
- Situational awareness
- Playing the training game
- Training AIs to help us align AIs
- Alignment researchers disagree a lot
- The ethics of AI red-teaming
Thanks to ilzolende for formatting these posts for publication. Each post has an accompanying audio version generated by a voice synthesis model trained on the author's voice using Descript Overdub. You can submit questions or comments to mailbox@planned-obsolescence.org. Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.
(0:00) Preview
(1:17) Sponsor
(4:00) Anton breaks down the advantages of vector databases
(4:45) How embeddings have created an AI-native way to represent data
(11:50) Anton identifies the watershed moment and step changes in AI
(12:55) OpenAI's pricing
(18:50) How Chroma works
(33:04) Stable Attribution and systematic bias
(36:48) How latent diffusion models work
(51:26) How AI is like the early days of aviation
(56:01) How Disney inspired the release of Stable Attribution
(59:53) Why noise can lead to generalization
(1:01:04) Nathan's KPI for The Cognitive Revolution
(1:01:59) Other use cases for embeddings
(1:03:19) Anton touches on the applications for biotech
(1:04:35) Anton on doomerism hysteria and what actually worries him
(1:11:43) Nathan sums up a plausible doomer scenario
(1:20:17) What AI tools does Anton use and why?
(1:22:55) Anton's hopes
*Thank you Omneky for sponsoring The Cognitive Revolution. Omneky is an omnichannel creative generation platform that lets you launch hundreds of thousands of ad iterations that actually work, customized across all platforms, with a click of a button. Omneky combines generative AI and real-time advertising data. Mention "Cog Rev" for 10% off.
Twitter: @CogRev_Podcast @atroyn (Anton) @labenz (Nathan) @eriktorenberg (Erik)
Join thousands of subscribers of our Substack: https://cognitiverevolution.substack.com/
Websites: cognitiverevolution.ai trychroma.com/ omneky.com
Show Notes & references:
- https://same.energy/ (Beta)
- Wright Brothers Bio (https://www.amazon.com/Wright-Brothers-David-McCullough/dp/1476728755)
- Ajeya Cotra article on LessWrong (https://www.lesswrong.com/posts/pRkFkzwKZ2zfa3R6H/without-specific-countermeasures-the-easiest-path-to)
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Prizes for the 2021 Review, published by Raemon on February 10, 2023 on LessWrong. If you received a prize, please fill out your payment contact email and PayPal. A'ight, one final 2021 Review Roundup post – awarding prizes. I had a week to look over the results. The primary way I ranked posts was by a weighted score, which gave 1000+ karma users 3x the voting weight. Here was the distribution of votes: I basically see two strong outlier posts at the top of the ranking, followed by a cluster of 6-7 posts, followed by a smooth tail of posts that were pretty good without any clear cutoff.
Post Prizes
Gold Prize Posts: Two posts stood out noticeably above all the others, which I'm awarding $800 to:
- Strong Evidence is Common by Mark Xu
- "PR" is corrosive; "reputation" is not, by Anna Salamon. I also particularly liked Akash's review.
Silver Prize Posts: And the second (eyeballed) cluster of posts, each getting $600, is:
- Your Cheerful Price, by Eliezer Yudkowsky. This notably had the most reviews – a lot of people wanted to weigh in and say "this personally helped me", often with some notes or nuance.
- ARC's first technical report: Eliciting Latent Knowledge by Paul Christiano, Ajeya Cotra and Mark Xu.
- This Can't Go On by Holden Karnofsky
- Rationalism before the Sequences, by Eric S Raymond. I liked this review by A Ray, who noted one source of value here is the extensive bibliography.
- Lies, Damn Lies, and Fabricated Options, by Duncan Sabien
- Fun with +12 OOMs of Compute, by Daniel Kokotajlo. Nostalgebraist's review was particularly interesting.
- What 2026 looks like by Daniel Kokotajlo
- Ngo and Yudkowsky on alignment difficulty. This didn't naturally cluster into the same group of vote-totals as the other silver prizes, but it was in the top 10. I think the post was fairly hard to read, and didn't have easily digestible takeaways, but nonetheless I think this kicked off some of the most important conversations in the AI Alignment space and warrants inclusion in this tier.
Bronze Prize Posts: Although there's not a clear clustering after this point, when I eyeball how important the next several posts were, it seems to me appropriate to give $400 to each of:
- How To Write Quickly While Maintaining Epistemic Rigor, by John Wentworth
- Science in a High-Dimensional World by John Wentworth
- How factories were made safe by Jason Crawford
- Cryonics signup guide #1: Overview by Mingyuan
- Making Vaccine by John Wentworth
- Taboo "Outside View" by Daniel Kokotajlo
- All Possible Views About Humanity's Future Are Wild by Holden Karnofsky
- Another (outer) alignment failure story by Paul Christiano
- Split and Commit by Duncan Sabien
- What Multipolar Failure Looks Like, and Robust Agent-Agnostic Processes (RAAPs) by Andrew Critch
- There's no such thing as a tree (phylogenetically), by eukaryote
- The Plan by John Wentworth
- Trapped Priors As A Basic Problem Of Rationality by Scott Alexander
- Finite Factored Sets by Scott Garrabrant
- Selection Theorems: A Program For Understanding Agents by John Wentworth
- Slack Has Positive Externalities For Groups by John Wentworth
- My research methodology by Paul Christiano
Honorable Mentions: This final group has the most arbitrary cutoff of all, and includes some judgment calls about how many medium or strong votes each post had among 1000+ karma users, and in some edge cases my own subjective guess of how important it was.
These authors each get $100 per post. The Rationalists of the 1950s (and before) also called themselves “Rationalists” by Owain Evans Ruling Out Everything Else by Duncan Sabien Leaky Delegation: You are not a Commodity by Darmani Feature Selection by Zack Davis Cup-Stacking Skills (or, Reflexive Involuntary Mental Motions) by Duncan Sabien larger language models may disappoint you [or, an eternally unfinished draft] by Nostalgebraist Self-Integrity and the Drowni...
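A quick illustration of the weighted ranking described in the entry above, where votes from users with 1000+ karma count at three times the weight. The vote strengths and karma values below are invented for illustration; only the 3x multiplier and the 1000-karma threshold come from the post.

```python
# Hypothetical votes on a single post: (voter_karma, vote_strength).
# Only the weighting rule (3x for voters with 1000+ karma) comes from the post;
# the numbers themselves are made up.
votes = [(2500, 4), (300, 9), (1200, -1), (80, 2)]

def weighted_score(votes, karma_threshold=1000, multiplier=3):
    """Sum votes, scaling up those cast by high-karma users."""
    return sum(
        strength * (multiplier if karma >= karma_threshold else 1)
        for karma, strength in votes
    )

print(weighted_score(votes))  # 4*3 + 9 + (-1)*3 + 2 = 20
```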
This is a linkpost for https://epochai.org/blog/literature-review-of-transformative-artificial-intelligence-timelines. We summarize and compare several models and forecasts predicting when transformative AI will be developed. Highlights: The review includes quantitative models, including both outside and inside view, and judgment-based forecasts by (teams of) experts. While we do not necessarily endorse their conclusions, the inside-view model the Epoch team found most compelling is Ajeya Cotra's “Forecasting TAI with biological anchors”, the best-rated outside-view model was Tom Davidson's “Semi-informative priors over AI timelines”, and the best-rated judgment-based forecast was Samotsvety's AGI Timelines Forecast. The inside-view models we reviewed predicted shorter timelines (e.g. bioanchors has a median of 2052) while the outside-view models predicted longer timelines (e.g. semi-informative priors has a median over 2100). The judgment-based forecasts are skewed towards agreement with the inside-view models, and are often more aggressive (e.g. Samotsvety assigned a median of 2043). Original article: https://forum.effectivealtruism.org/posts/4Ckc2zNrAKQwnAyA2/literature-review-of-transformative-artificial-intelligence Narrated for the Effective Altruism Forum by TYPE III AUDIO. Share feedback on this narration.
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: [Our World in Data] AI timelines: What do experts in artificial intelligence expect for the future? (Roser, 2023), published by will on February 7, 2023 on The Effective Altruism Forum. Linkposting, tagging and excerpting - in this case, excerpting the article's conclusion - in accord with 'Should pretty much all content that's EA-relevant and/or created by EAs be (link)posted to the Forum?'. [click here for a big version of the visualization] The visualization shows the forecasts of 1128 people – 812 individual AI experts, the aggregated estimates of 315 forecasters from the Metaculus platform, and the findings of the detailed study by Ajeya Cotra. There are two big takeaways from these forecasts on AI timelines: There is no consensus, and the uncertainty is high. There is huge disagreement between experts about when human-level AI will be developed. Some believe that it is decades away, while others think it is probable that such systems will be developed within the next few years or months.There is not just disagreement between experts; individual experts also emphasize the large uncertainty around their own individual estimate. As always when the uncertainty is high, it is important to stress that it cuts both ways. It might be very long until we see human-level AI, but it also means that we might have little time to prepare. At the same time, there is large agreement in the overall picture. The timelines of many experts are shorter than a century, and many have timelines that are substantially shorter than that. The majority of those who study this question believe that there is a 50% chance that transformative AI systems will be developed within the next 50 years. In this case it would plausibly be the biggest transformation in the lifetime of our children, or even in our own lifetime. The public discourse and the decision-making at major institutions have not caught up with these prospects. In discussions on the future of our world – from the future of our climate, to the future of our economies, to the future of our political institutions – the prospect of transformative AI is rarely central to the conversation. Often it is not mentioned at all, not even in a footnote. We seem to be in a situation where most people hardly think about the future of artificial intelligence, while the few who dedicate their attention to it find it plausible that one of the biggest transformations in humanity's history is likely to happen within our lifetimes. Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: High-level hopes for AI alignment, published by Holden Karnofsky on December 20, 2022 on The Effective Altruism Forum. See here for an audio version. In previous pieces, I argued that there's a real and large risk of AI systems' aiming to defeat all of humanity combined - and succeeding. I first argued that this sort of catastrophe would be likely without specific countermeasures to prevent it. I then argued that countermeasures could be challenging, due to some key difficulties of AI safety research. But while I think misalignment risk is serious and presents major challenges, I don't agree with sentiments along the lines of “We haven't figured out how to align an AI, so if transformative AI comes soon, we're doomed.” Here I'm going to talk about some of my high-level hopes for how we might end up avoiding this risk. I'll first recap the challenge, using Ajeya Cotra's young businessperson analogy to give a sense of some of the core difficulties. In a nutshell, once AI systems get capable enough, it could be hard to test whether they're safe, because they might be able to deceive and manipulate us into getting the wrong read. Thus, trying to determine whether they're safe might be something like “being an eight-year-old trying to decide between adult job candidates (some of whom are manipulative).” I'll then go through what I see as three key possibilities for navigating this situation: Digital neuroscience: perhaps we'll be able to read (and/or even rewrite) the “digital brains” of AI systems, so that we can know (and change) what they're “aiming” to do directly - rather than having to infer it from their behavior. (Perhaps the eight-year-old is a mind-reader, or even a young Professor X.) Limited AI: perhaps we can make AI systems safe by making them limited in various ways - e.g., by leaving certain kinds of information out of their training, designing them to be “myopic” (focused on short-run as opposed to long-run goals), or something along those lines. Maybe we can make “limited AI” that is nonetheless able to carry out particular helpful tasks - such as doing lots more research on how to achieve safety without the limitations. (Perhaps the eight-year-old can limit the authority or knowledge of their hire, and still get the company run successfully.) AI checks and balances: perhaps we'll be able to employ some AI systems to critique, supervise, and even rewrite others. Even if no single AI system would be safe on its own, the right “checks and balances” setup could ensure that human interests win out. (Perhaps the eight-year-old is able to get the job candidates to evaluate and critique each other, such that all the eight-year-old needs to do is verify basic factual claims to know who the best candidate is.) These are some of the main categories of hopes that are pretty easy to picture today. Further work on AI safety research might result in further ideas (and the above are not exhaustive - see my more detailed piece, posted to the Alignment Forum rather than Cold Takes, for more). I'll talk about both challenges and reasons for hope here. I think that for the most part, these hopes look much better if AI projects are moving cautiously rather than racing furiously. 
I don't think we're at the point of having much sense of how the hopes and challenges net out; the best I can do at this point is to say: “I don't currently have much sympathy for someone who's highly confident that AI takeover would or would not happen (that is, for anyone who thinks the odds of AI takeover are under 10% or over 90%).” The challenge: This is all recapping previous pieces. If you remember them super well, skip to the next section. In previous pieces, I argued that: The coming decades could see the development of AI systems that could automate - and dramatically speed up - scientif...
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Concerns over EA's possible neglect of experts, published by Jack Malde on December 16, 2022 on The Effective Altruism Forum. This is a Draft Amnesty Day draft. That means it's not polished, it's probably not up to my standards, the ideas are not thought out, I haven't checked everything, and it's unfinished. I was explicitly encouraged to post something like this! Commenting and feedback guidelines: I'm going with the default — please be nice. But constructive feedback is appreciated; please let me know what you think is wrong. Feedback on the structure of the argument is also appreciated. I am becoming increasingly concerned that EA is neglecting experts when it comes to research. I'm not saying that EA organisations don't produce high quality research, but I have a feeling that the research could be of an even higher quality if we were to embrace experts more. Epistemic status: not that confident that what I'm saying is valid. Maybe experts are utilised more than I realise. Maybe the people I mention below can reasonably be considered experts. I also haven't done an in-depth exploration of all relevant research to judge how widespread the problem might be (if it is indeed a problem). Research examples I'm NOT concerned by: Let me start with some good examples (there are certainly more than I am listing here!). In 2021 Open Phil commissioned a report from David Humbird on the potential for cultured meat production to scale up to the point where it would be sufficiently available and affordable to replace a substantial portion of global meat consumption. Humbird has a PhD in chemical engineering and has extensive career experience in process engineering and techno-economic analysis, including the provision of consultancy services. In short, he seems like a great choice to carry out this research. Another example I am pleased by is Will MacAskill as author of What We Owe the Future. I cannot think of a better author of this book. Will is a respected philosopher, and a central figure in the EA movement. This book outlines the philosophical argument for longtermism, a key school of thought within EA. Boy am I happy that Will wrote this book. Other examples I was planning to write up: Modeling the Human Trajectory, by David Roodman (Roodman seems qualified to deliver this research); and Wild Animal Initiative research such as this (I like that they collaborated with Samniqueka J. Halsey, who is an assistant professor). Some examples I'm concerned by: Open Phil's research on AI. In 2020 Ajeya Cotra, a Senior Research Analyst at Open Phil, wrote a report on timelines to transformative AI. I have no doubt that the report is high-quality and that Ajeya is very intelligent. However, this is a very technical subject and, beyond having a bachelor's degree in Electrical Engineering and Computer Science, I don't see why Ajeya would be the first choice to write this report. Why wouldn't Open Phil have commissioned an expert in AI development / computational neuroscience etc. to write this report, similar to what they did with David Humbird (see above)? Ajeya's report had Paul Christiano and Dario Amodei as advisors, which is good, but advisors generally have limited input. Wouldn't it have been better to have an expert as first author? All the above applies to another Open Phil AI report, this time written by Joe Carlsmith. 
Joe is a philosopher by training, and whilst that isn't completely irrelevant, it once again seems to me that a better choice could have been found. Personally I'd prefer that Joe do more philosophy-related work, similar to what Will MacAskill is doing (see above). Climate Change research (removed mention of Founder's Pledge as per jackva's comment): Climate Change and Longtermism, by John Halstead. John Halstead doesn't seem to have any formal training in climate science. Not sur...
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Trying to disambiguate different questions about whether RLHF is “good”, published by Buck on December 14, 2022 on LessWrong. (A few of the words in this post were written by Ryan Greenblatt and Ajeya Cotra. Thanks to Oliver Habryka and Max Nadeau for particularly helpful comments.) Sometimes people want to talk about whether RLHF is “a promising alignment strategy”, or whether it “won't work” or “is just capabilities research”. I think that conversations on these topics are pretty muddled and equivocate between a bunch of different questions. In this doc, I'll attempt to distinguish some of these questions, and as a bonus, I'll give my opinions on them. I wrote this post kind of quickly, and I didn't have time to justify all the claims I make; I hope that this post is net helpful anyway. I'm sympathetic to claims that alignment researchers should err more on the side of writing fewer but better posts; maybe I'll regret making this one now instead of waiting. Is “make a powerful AGI by using RLHF, where the feedback is coming from unaided humans”, a promising strategy for building aligned AGI? IMO, this seems like the baseline, “we didn't really try much at all” alignment scheme. Ajeya calls this a “naive safety effort” in her training game post, which lays out the basic case for pessimism about this strategy. Here's how I'd quickly summarize my problems with this scheme: Oversight problems: Overseer doesn't know: In cases where your unaided humans don't know whether the AI action is good or bad, they won't be able to produce feedback which selects for AIs that do good things. This is unfortunate, because we wanted to be able to make AIs that do complicated things that have good outcomes. Overseer is wrong: In cases where your unaided humans are actively wrong about whether the AI action is good or bad, their feedback will actively select for the AI to deceive the humans. Catastrophe problems: Even if the overseer's feedback was perfect, a model whose strategy is to lie in wait until it has an opportunity to grab power will probably be able to successfully grab power. I don't think we're 100% doomed if we follow this plan, but it does seem pretty likely to go badly. RLHF with unaided humans is not literally the most doomed alignment scheme I've ever heard seriously proposed. For example, “train models with automated rewards (e.g. simulating evolution, or training models on a curriculum of math problems) and hope that the resulting models are aligned” might be a worse plan. (Though it's pretty plausible to me that this kind of scheme would have such obvious alignment issues that people would quickly switch to the naive safety plan.) I don't think that many alignment researchers are seriously proposing this naive plan. Many researchers who work on RLHF are sympathetic to the concerns that I listed here. For example, OpenAI's alignment plan emphasizes the importance of using models to assist human evaluation. Is RLHF broadly construed (i.e. pretraining a model and then fine-tuning it based on some overseer's evaluations of its actions) plausibly part of a not-completely-doomed alignment plan? IMO, yes, we are very likely to want to make powerful models by fine-tuning models based on overseer feedback, because fine-tuning models using preferences over their outputs is a broadly applicable strategy that can be used in lots of ways. 
I think that our best current alignment strategies (I basically agree with Holden here on the current best plan) involve fine-tuning a model based on feedback from human overseers who have AI and software assistance, on inputs including some that were chosen by a red team who is trying to find inputs where the model behaves very badly. If prosaic alignment researchers are able to come up with alignment schemes which look good on paper (i.e., do...
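A technical aside on the RLHF framing in the entry above: overseer evaluations are typically turned into a training signal by fitting a reward model to pairwise preferences (a Bradley-Terry style objective). The sketch below is a self-contained toy version of that comparison step, not the setup Buck describes; every feature, label, and hyperparameter here is invented for illustration.

```python
import math

# Toy sketch of reward modelling from pairwise overseer preferences:
# fit a scalar reward so the preferred output scores higher than the rejected one.
# The "flattery" feature gestures at the oversight problem discussed above: the
# learned reward is only as good as the overseer's comparisons.

def features(output):
    # Hypothetical two-feature representation of a model output.
    return [output["helpfulness"], output["flattery"]]

def reward(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Overseer comparisons: (preferred output, rejected output). Invented data.
comparisons = [
    ({"helpfulness": 0.9, "flattery": 0.1}, {"helpfulness": 0.2, "flattery": 0.3}),
    ({"helpfulness": 0.7, "flattery": 0.0}, {"helpfulness": 0.4, "flattery": 0.8}),
    ({"helpfulness": 0.8, "flattery": 0.2}, {"helpfulness": 0.1, "flattery": 0.9}),
]

w = [0.0, 0.0]   # reward-model weights
lr = 0.5         # learning rate (arbitrary)
for _ in range(200):
    for preferred, rejected in comparisons:
        xp, xr = features(preferred), features(rejected)
        # Bradley-Terry: P(preferred beats rejected) = sigmoid(r(preferred) - r(rejected)).
        p = sigmoid(reward(w, xp) - reward(w, xr))
        # Gradient ascent on the log-likelihood of the overseer's preferences.
        for i in range(len(w)):
            w[i] += lr * (1 - p) * (xp[i] - xr[i])

print("learned reward weights:", [round(wi, 2) for wi in w])
```

In a real pipeline a reward model like this would then be used to fine-tune the policy, which is where the oversight problems described in the entry above would show up.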
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Biological Anchors external review by Jennifer Lin (linkpost), published by peterhartree on November 30, 2022 on The Effective Altruism Forum. This report is one of the winners of the EA Criticism and Red Teaming Contest. Summary: This is a summary and critical review of Ajeya Cotra's biological anchors report on AI timelines. It provides an easy-to-understand overview of the main methodology of Cotra's report. It then examines and challenges central assumptions of the modelling in Cotra's report. First, the review looks at reasons why we might not expect 2022 architectures to scale to AGI. Second, it raises the point that we don't know how to specify a space of algorithmic architectures that contains something that could scale to AGI and can be efficiently searched through (inability to specify this could undermine the ability to take the evolutionary anchors from the report as a bound on timelines). Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Discussing how to align Transformative AI if it's developed very soon, published by elifland on November 28, 2022 on LessWrong. Coauthored with equal contribution from Eli Lifland and Charlotte Siegmann. Thanks to Holden Karnofsky, Misha Yagudin, Adam Bales, Michael Aird, and Sam Clarke for feedback. All views expressed are our own. Introduction Background Holden Karnofsky recently published a series on AI strategy "nearcasting”: What would it look like if we developed Transformative AI (TAI) soon? How should Magma, the hypothetical company that develops TAI, align and deploy it? In this post we focus on How might we align transformative AI if it's developed very soon. The nearcast setup is: A major AI company (“Magma,” [following the setup and description of this post from Ajeya Cotra]) has good reason to think that it can develop transformative AI very soon (within a year), using what Ajeya calls “human feedback on diverse tasks” (HFDT) - and has some time (more than 6 months, but less than 2 years) to set up special measures to reduce the risks of misaligned AI before there's much chance of someone else deploying transformative AI. Summary For time-constrained readers, we think the most important sections are: Categorizing ways of limiting AIs Clarifying advanced collusion Disincentivizing deceptive behavior likely requires more than a small chance of catching it We discuss Magma's goals and strategy. The discussion should be useful for people unfamiliar or familiar with Karnofsky's post and can be read as a summary, clarification and expansion of Karnofsky's post. We describe a potential plan for Magma involving coordinating with other AI labs to deploy the most aligned AI in addition to stopping other misaligned AIs. more We define the desirability of Magma's AIs in terms of their ability to help Magma achieve its goals while avoiding negative outcomes. We discuss desirable properties such as differential capability and value-alignment, and describe initial hypotheses regarding how Magma should think about prioritizing between desirable properties. more We discuss Magma's strategies to increase desirability: how Magma can make AIs more desirable by changing properties of the AIs and the context in which they're applied, and how Magma should apply (often limited) AIs to make other AIs more desirable. more We clarify that the chance of collusion depends on whether AIs operate on a smaller scale, have very different architectures and orthogonal goals. We outline strategies to reduce collusion conditional on whether the AIs have indexical goals and follow causal decision theory or not. more We discuss how Magma can test the desirability of AIs via audits and threat assessments. Testing can provide evidence regarding the effectiveness of various alignment strategies and the overall level of misalignment. more We highlight potential disagreements with Karnofsky, including: We aren't convinced that a small chance of catching deceptive behavior by itself might make deception much less likely. We argue that in addition to having a small chance of catching deceptive behavior, the AI's supervisor needs to be capable enough to (a) distinguish between easy-to-catch and hard-to-catch deceptive behaviors and (b) attain a very low “false positive rate” of harshly penalizing non-deceptive behaviors. 
The AI may also need to be inner-aligned, i.e. intrinsically motivated by the reward. more We are more pessimistic than Karnofsky about the promise of adjudicating AI debates. We aren't convinced there's much theoretical reason to believe that AI debates robustly tend toward truth, and haven't been encouraged by empirical results. more We discuss the chance that Magma would succeed: We discuss the promise of "hacky" solutions to alignment. If applied alignment techniques that feel br...
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Announcing AI Alignment Awards: $100k research contests about goal misgeneralization & corrigibility, published by Akash on November 22, 2022 on The AI Alignment Forum. We're grateful to our advisors Nate Soares, John Wentworth, Richard Ngo, Lauro Langosco, and Amy Labenz. We're also grateful to Ajeya Cotra and Thomas Larsen for their feedback on the contests. TLDR: AI Alignment Awards is running two contests designed to raise awareness about AI alignment research and generate new research proposals. Prior experience with AI safety is not required. Promising submissions will win prizes up to $100,000 (though note that most prizes will be between $1k-$20k; we will only award higher prizes if we receive exceptional submissions.) You can help us by sharing this post with people who are or might be interested in alignment research (e.g., student mailing lists, FB/Slack/Discord groups.) What are the contests? We're currently running two contests: Goal Misgeneralization Contest (based on Langosco et al., 2021): AIs often learn unintended goals. Goal misgeneralization occurs when a reinforcement learning agent retains its capabilities out-of-distribution yet pursues the wrong goal. How can we prevent or detect goal misgeneralization? Shutdown Problem Contest (based on Soares et al., 2015): Given that powerful AI systems might resist attempts to turn them off, how can we make sure they are open to being shut down? What types of submissions are you interested in? For the Goal Misgeneralization Contest, we're interested in submissions that do at least one of the following: Propose techniques for preventing or detecting goal misgeneralization Propose ways for researchers to identify when goal misgeneralization is likely to occur Identify new examples of goal misgeneralization in RL or non-RL domains. For example: We might train an imitation learner to imitate a "non-consequentialist" agent, but it actually ends up learning a more consequentialist policy. We might train an agent to be myopic (e.g., to only care about the next 10 steps), but it actually learns a policy that optimizes over a longer timeframe. Suggest other ways to make progress on goal misgeneralization For the Shutdown Problem Contest, we're interested in submissions that do at least one of the following: Propose ideas for solving the shutdown problem or designing corrigible AIs. These submissions should also include (a) explanations for how these ideas address core challenges raised in the corrigibility paper and (b) possible limitations and ways the idea might fail Define The Shutdown Problem more rigorously or more empirically Propose new ways of thinking about corrigibility (e.g., ways to understand corrigibility within a deep learning paradigm) Strengthen existing approaches to training corrigible agents (e.g., by making them more detailed, exploring new applications, or describing how they could be implemented) Identify new challenges that will make it difficult to design corrigible agents Suggest other ways to make progress on corrigibility Why are you running these contests? We think that corrigibility and goal misgeneralization are two of the most important problems that make AI alignment difficult. 
We expect that people who can reason well about these problems will be well-suited for alignment research, and we believe that progress on these subproblems would be meaningful advances for the field of AI alignment. We also think that many people could potentially contribute to these problems (we're only aware of a handful of serious attempts at engaging with these challenges). Moreover, we think that tackling these problems will offer a good way for people to "think like an alignment researcher." We hope the contests will help us (a) find people who could become promising theoretical and empirical AI safety...
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: GiveWell should use shorter TAI timelines, published by Oscar Delaney on October 27, 2022 on The Effective Altruism Forum. Summary GiveWell's discount rate of 4% includes a 1.4% contribution from ‘temporal uncertainty' arising from the possibility of major events radically changing the world. This is incompatible with the transformative artificial intelligence (TAI) timelines of many AI safety researchers. I argue that GiveWell should increase its discount rate, or at least provide a justification for differing significantly from the commonly-held (in EA) view that TAI could come soon. Epistemic Status: timelines are hard, and I don't have novel takes, but I worry perhaps GiveWell doesn't either, and they are dissenting unintentionally. In my accompanying post I argued GiveWell should use a probability distribution over discount rates, I will ignore that here though, and just consider whether their point estimate is appropriate. GiveWell's current discount rate of 4% is calculated as the sum of three factors. Quoting their explanations from this document: Improving circumstances over time: 1.7%. “Increases in consumption over time meaning marginal increases in consumption in the future are less valuable.” Compounding non-monetary benefits: 0.9%. “There are non-monetary returns not captured in our cost-effectiveness analysis which likely compound over time and are causally intertwined with consumption. These include reduced stress and improved nutrition.” Temporal uncertainty: 1.4%. “Uncertainty increases with projections into the future, meaning the projected benefits may fail to materialize. James [a GiveWell researcher] recommended a rate of 1.4% based on judgement on the annual likelihood of an unforeseen event or longer term change causing the expected benefits to not be realized. Examples of such events are major changes in economic structure, catastrophe, or political instability.” I do not have a good understanding of how these numbers were derived, and have no reason to think the first two are unfair estimates. I think the third is a significant underestimate. TAI is precisely the sort of “major change” meant to be captured by the temporal uncertainty factor. I have no insights to add on the question of TAI timelines, but I think absent GiveWell providing justification to the contrary, they should default towards using the timelines of people who have thought about this a lot. One such person is Ajeya Cotra, who in August reported a 50% credence in TAI being developed by 2040. I do not claim, and nor does Ajeya, that this is authoritative, however it seems a reasonable starting point for GiveWell to use, given they have not and (I think rightly) probably will not put significant work into forming independent timelines. Also in August, a broader survey of 738 experts by AI Impacts resulted in a median year for TAI of 2059. This source has the advantage of including many more people, but conversely most of them will have not spent much time thinking carefully about timelines. I will not give a sophisticated instantiation of what I propose, but rather gesture at what I think a good approach would be, and give a toy example to improve on the status quo. A naive thing to do would be to imagine that there is a fixed annual probability of developing TAI conditional on not having developed it to date. 
This method gives an annual probability of 3.8% under Ajeya's timelines, and 1.9% under the AI Impacts timelines.[1] In reality, more of our probability mass should be placed on the later years between now and 2040 (or 2059), and we should not simply stop at 2040 (or 2059). A proper model would likely need to dispense with a constant discount rate entirely, and instead track the probability the world has not seen a “major change” by each year. A model that accomplishes ...
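For concreteness, the constant-annual-probability method described in the entry above can be reproduced in a few lines: find the p such that 1 - (1 - p)^n = 0.5, where n is the number of years until the median TAI date. Assuming a 2022 baseline year (the post's publication year, an assumption on my part), this recovers the 3.8% and 1.9% figures quoted.

```python
# Back out the constant annual probability of TAI implied by a median timeline:
# solve 1 - (1 - p)**years = 0.5 for p.
# The 2022 baseline year is an assumption, but it reproduces the quoted figures.

def implied_annual_probability(median_year, baseline_year=2022):
    years = median_year - baseline_year
    return 1 - 0.5 ** (1 / years)

for label, median in [("Cotra, median 2040", 2040), ("AI Impacts survey, median 2059", 2059)]:
    print(f"{label}: {implied_annual_probability(median):.1%} per year")
# Cotra, median 2040: 3.8% per year
# AI Impacts survey, median 2059: 1.9% per year
```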
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Lessons learned from talking to >100 academics about AI safety, published by Marius Hobbhahn on October 10, 2022 on The AI Alignment Forum. I'd like to thank MH, Jaime Sevilla and Tamay Besiroglu for their feedback. During my Master's and Ph.D. (still ongoing), I have spoken with many academics about AI safety. These conversations include chats with individual PhDs, poster presentations and talks about AI safety. I think I have learned a lot from these conversations and expect many other people concerned about AI safety to find themselves in similar situations. Therefore, I want to detail some of my lessons and make my thoughts explicit so that others can scrutinize them. TL;DR: People in academia seem more and more open to arguments about risks from advanced intelligence over time, and I would genuinely recommend having lots of these chats. Furthermore, I underestimated how much work related to some aspects of AI safety already exists in academia and that we sometimes reinvent the wheel. Messaging matters, e.g. technical discussions got more interest than alarmism, and explaining the problem rather than trying to actively convince someone received better feedback. Executive summary: I have talked to somewhere between 100 and 200 academics (depending on your definitions) ranging from bachelor students to senior faculty. I use a broad definition of “conversations”, i.e. they include small chats, long conversations, invited talks, group meetings, etc. Findings: Most of the people I talked to were more open about the concerns regarding AI safety than I expected, e.g. they acknowledged that it is a problem and asked further questions to clarify the problem or asked how they could contribute. Often I learned something during these discussions. For example, the academic literature on interpretability and robustness is rich and I was pointed to resources I didn't yet know. Even in cases where I didn't learn new concepts, people scrutinized my reasoning such that my explanations got better and clearer over time. The higher up the career ladder the person was, the more likely they were to quickly dismiss the problem (this might not be true in general; I only talked with a handful of professors). Often people are much more concerned with intentional bad effects of AI, e.g. bad actors using AI tools for surveillance, than with unintended side-effects from powerful AI. The intuition that “AI is just a tool and will just do what we want” seems very strong. There is a lot of misunderstanding about AI safety. Some people think AI safety is the same as fairness, self-driving cars or medical AI. I think this is an unfortunate failure of the AI safety community, but it is quickly getting better. Most people really dislike alarmist attitudes. If I motivated the problem with X-risk, I was less likely to be taken seriously. Most people are interested in the technical aspects, e.g. when I motivated the problem with uncontrollability or interpretability (rather than X-risk), people were more likely to find the topic interesting. Making the core arguments for “how deep learning could go wrong” as detailed, e.g., by Ajeya Cotra or Richard Ngo, usually worked well. Many people were interested in how they could contribute. 
However, often they were more interested in reframing their specific topic to sound more like AI safety rather than making substantial changes to their research. I think this is understandable from a risk-reward perspective of the typical Ph.D. student. People are aware of the fact that AI safety is not an established field in academia and that working on it comes with risks, e.g. that you might not be able to publish or be taken seriously by other academics. In the end, even when people agree that AI safety is a really big problem and know that they could contribute, they rarely change their...
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: A review of the Bio-Anchors report, published by jylin04 on October 3, 2022 on LessWrong. This is a linkpost for a review of Ajeya Cotra's Biological Anchors report (see also update here) that I wrote in April 2022. It's since won a prize from the EA criticism and red-teaming contest, so I thought it might be good to share here for further discussion. Here's a summary from the judges of the red-teaming contest: This is a summary and critical review of Ajeya Cotra's biological anchors report on AI timelines. It provides an easy-to-understand overview of the main methodology of Cotra's report. It then examines and challenges central assumptions of the modelling in Cotra's report. First, the review looks at reasons why we might not expect 2022 architectures to scale to AGI. Second, it raises the point that we don't know how to specify a space of algorithmic architectures that contains something that could scale to AGI and can be efficiently searched through (inability to specify this could undermine the ability to take the evolutionary anchors from the report as a bound on timelines). Note that a link to a summary/review of the book Principles of Deep Learning Theory on page 8 is currently offline. I plan to put it on LW later this week. (Relatedly, I may not be around to engage much with comments right away, but I'll be back later in the week.) Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.
What is Effective Altruism? Which parts of the Effective Altruism movement are good and not so good? Who outside of the EA movement are doing lots of good in the world? What are the psychological effects of thinking constantly about the trade-offs of spending resources on ourselves versus on others? To what degree is the EA movement centralized intellectually, financially, etc.? Does the EA movement's tendency to quantify everything, to make everything legible to itself, cause it to miss important features of the world? To what extent do EA people rationalize spending resources on inefficient or selfish projects by reframing them in terms of EA values? Is a feeling of tension about how to allocate our resources actually a good thing?Ajeya Cotra is a Senior Research Analyst at Open Philanthropy, a grantmaking organization that aims to do as much good as possible with its resources (broadly following effective altruist methodology); she mainly does research relevant to Open Phil's work on reducing existential risks from AI. Ajeya discovered effective altruism in high school through the book The Life You Can Save, and quickly became a major fan of GiveWell. As a student at UC Berkeley, she co-founded and co-ran the Effective Altruists of Berkeley student group, and taught a student-led course on EA. Listen to her 80,000 Hours podcast episode or visit her LessWrong author page for more info.Michael Nielsen was on the podcast back in episode 016. You can read more about him there!