Podcast appearances and mentions of Richard Ngo

  • 18 PODCASTS
  • 245 EPISODES
  • 19m AVG DURATION
  • 1 MONTHLY NEW EPISODE
  • Apr 28, 2025 LATEST

POPULARITY

[Popularity chart, 2017–2024]



Latest podcast episodes about Richard Ngo

The Valmy
Richard Ngo - A State-Space of Positive Posthuman Futures [Worthy Successor, Episode 8]

Apr 28, 2025 · 106:15


Podcast: The Trajectory
Episode: Richard Ngo - A State-Space of Positive Posthuman Futures [Worthy Successor, Episode 8]
Release date: 2025-04-25

This is an interview with Richard Ngo, AGI researcher and thinker - with extensive stints at both OpenAI and DeepMind. This is an additional installment of our "Worthy Successor" series - where we explore the kinds of posthuman intelligences that deserve to steer the future beyond humanity.

This episode referred to the following other essays and resources:
-- A Worthy Successor - The Purpose of AGI: https://danfaggella.com/worthy
-- Richard's exploratory fiction writing - https://narrativeark.xyz/

Watch this episode on The Trajectory YouTube channel: https://youtu.be/UQpds4PXMjQ
See the full article from this episode: https://danfaggella.com/ngo1...

There are three main questions we cover here on the Trajectory:
1. Who are the power players in AGI and what are their incentives?
2. What kind of posthuman future are we moving towards, or should we be moving towards?
3. What should we do about it?

If this sounds like it's up your alley, then be sure to stick around and connect:
-- Blog: danfaggella.com/trajectory
-- X: x.com/danfaggella
-- LinkedIn: linkedin.com/in/danfaggella
-- Newsletter: bit.ly/TrajectoryTw
-- Podcast: https://podcasts.apple.com/us/podcast/the-trajectory/id1739255954

LessWrong Curated Podcast
“Trojan Sky” by Richard_Ngo

Mar 13, 2025 · 22:28


You learn the rules as soon as you're old enough to speak. Don't talk to jabberjays. You recite them as soon as you wake up every morning. Keep your eyes off screensnakes. Your mother chooses a dozen to quiz you on each day before you're allowed lunch. Glitchers aren't human any more; if you see one, run. Before you sleep, you run through the whole list again, finishing every time with the single most important prohibition. Above all, never look at the night sky.

You're a precocious child. You excel at your lessons, and memorize the rules faster than any of the other children in your village. Chief is impressed enough that, when you're eight, he decides to let you see a glitcher that he's captured. Your mother leads you to just outside the village wall, where they've staked the glitcher as a lure for wild animals. Since glitchers [...]

First published: March 11th, 2025
Source: https://www.lesswrong.com/posts/fheyeawsjifx4MafG/trojan-sky
Narrated by TYPE III AUDIO.

LessWrong Curated Podcast
“Power Lies Trembling: a three-book review” by Richard_Ngo

Feb 26, 2025 · 27:11


In a previous book review I described exclusive nightclubs as the particle colliders of sociology—places where you can reliably observe extreme forces collide. If so, military coups are the supernovae of sociology. They're huge, rare, sudden events that, if studied carefully, provide deep insight about what lies underneath the veneer of normality around us.

That's the conclusion I take away from Naunihal Singh's book Seizing Power: the Strategic Logic of Military Coups. It's not a conclusion that Singh himself draws: his book is careful and academic (though much more readable than most academic books). His analysis focuses on Ghana, a country which experienced ten coup attempts between 1966 and 1983 alone. Singh spent a year in Ghana carrying out hundreds of hours of interviews with people on both sides of these coups, which led him to formulate a new model of how coups work. I'll start by describing Singh's [...]

Outline:
(01:58) The revolutionary's handbook
(09:44) From explaining coups to explaining everything
(17:25) From explaining everything to influencing everything
(21:40) Becoming a knight of faith

The original text contained 3 images which were described by AI.

First published: February 22nd, 2025
Source: https://www.lesswrong.com/posts/d4armqGcbPywR3Ptc/power-lies-trembling-a-three-book-review
Narrated by TYPE III AUDIO.

80,000 Hours Podcast with Rob Wiblin
Bonus: AGI disagreements and misconceptions: Rob, Luisa, & past guests hash it out

Feb 10, 2025 · 192:24


Will LLMs soon be made into autonomous agents? Will they lead to job losses? Is AI misinformation overblown? Will it prove easy or hard to create AGI? And how likely is it that it will feel like something to be a superhuman AGI?

With AGI back in the headlines, we bring you 15 opinionated highlights from the show addressing those and other questions, intermixed with opinions from hosts Luisa Rodriguez and Rob Wiblin recorded back in 2023.

Check out the full transcript on the 80,000 Hours website. You can decide whether the views we expressed (and those from guests) then have held up these last two busy years. You'll hear:

Ajeya Cotra on overrated AGI worries
Holden Karnofsky on the dangers of aligned AI, why unaligned AI might not kill us, and the power that comes from just making models bigger
Ian Morris on why the future must be radically different from the present
Nick Joseph on whether his company's internal safety policies are enough
Richard Ngo on what everyone gets wrong about how ML models work
Tom Davidson on why he believes crazy-sounding explosive growth stories… and Michael Webb on why he doesn't
Carl Shulman on why you'll prefer robot nannies over human ones
Zvi Mowshowitz on why he's against working at AI companies except in some safety roles
Hugo Mercier on why even superhuman AGI won't be that persuasive
Rob Long on the case for and against digital sentience
Anil Seth on why he thinks consciousness is probably biological
Lewis Bollard on whether AI advances will help or hurt nonhuman animals
Rohin Shah on whether humanity's work ends at the point it creates AGI

And of course, Rob and Luisa also regularly chime in on what they agree and disagree with.

Chapters:
Cold open (00:00:00)
Rob's intro (00:00:58)
Rob & Luisa: Bowerbirds compiling the AI story (00:03:28)
Ajeya Cotra on the misalignment stories she doesn't buy (00:09:16)
Rob & Luisa: Agentic AI and designing machine people (00:24:06)
Holden Karnofsky on the dangers of even aligned AI, and how we probably won't all die from misaligned AI (00:39:20)
Ian Morris on why we won't end up living like The Jetsons (00:47:03)
Rob & Luisa: It's not hard for nonexperts to understand we're playing with fire here (00:52:21)
Nick Joseph on whether AI companies' internal safety policies will be enough (00:55:43)
Richard Ngo on the most important misconception in how ML models work (01:03:10)
Rob & Luisa: Issues Rob is less worried about now (01:07:22)
Tom Davidson on why he buys the explosive economic growth story, despite it sounding totally crazy (01:14:08)
Michael Webb on why he's sceptical about explosive economic growth (01:20:50)
Carl Shulman on why people will prefer robot nannies over humans (01:28:25)
Rob & Luisa: Should we expect AI-related job loss? (01:36:19)
Zvi Mowshowitz on why he thinks it's a bad idea to work on improving capabilities at cutting-edge AI companies (01:40:06)
Holden Karnofsky on the power that comes from just making models bigger (01:45:21)
Rob & Luisa: Are risks of AI-related misinformation overblown? (01:49:49)
Hugo Mercier on how AI won't cause misinformation pandemonium (01:58:29)
Rob & Luisa: How hard will it actually be to create intelligence? (02:09:08)
Robert Long on whether digital sentience is possible (02:15:09)
Anil Seth on why he believes in the biological basis of consciousness (02:27:21)
Lewis Bollard on whether AI will be good or bad for animal welfare (02:40:52)
Rob & Luisa: The most interesting new argument Rob's heard this year (02:50:37)
Rohin Shah on whether AGI will be the last thing humanity ever does (02:57:35)
Rob's outro (03:11:02)

Audio engineering: Ben Cordell, Milo McGuire, Simon Monsour, and Dominic Armstrong
Transcriptions and additional content editing: Katy Moore

LessWrong Curated Podcast
“The Gentle Romance” by Richard_Ngo

Jan 22, 2025 · 0:34


This is a link post. A story I wrote about living through the transition to utopia. This is the one story that I've put the most time and effort into; it charts a course from the near future all the way to the distant stars.

First published: January 19th, 2025
Source: https://www.lesswrong.com/posts/Rz4ijbeKgPAaedg3n/the-gentle-romance
Narrated by TYPE III AUDIO.

LessWrong Curated Podcast
“Why I'm not a Bayesian” by Richard_Ngo

Oct 15, 2024 · 17:47


This post focuses on philosophical objections to Bayesianism as an epistemology. I first explain Bayesianism and some standard objections to it, then lay out my two main objections (inspired by ideas in philosophy of science). A follow-up post will speculate about how to formalize an alternative.

Degrees of belief

The core idea of Bayesianism: we should ideally reason by assigning credences to propositions which represent our degrees of belief that those propositions are true. If that seems like a sufficient characterization to you, you can go ahead and skip to the next section, where I explain my objections to it. But for those who want a more precise description of Bayesianism, and some existing objections to it, I'll more specifically characterize it in terms of five subclaims. Bayesianism says that we should ideally reason in terms of:
Propositions which are either true or false (classical logic)
Each of [...]

Outline:
(00:22) Degrees of belief
(04:06) Degrees of truth
(08:05) Model-based reasoning
(13:43) The role of Bayesianism

The original text contained 1 image which was described by AI.

First published: October 6th, 2024
Source: https://www.lesswrong.com/posts/TyusAoBMjYzGN3eZS/why-i-m-not-a-bayesian
Narrated by TYPE III AUDIO.
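As a concrete illustration of the credence-assigning idea the excerpt above describes, here is a minimal Python sketch of a single Bayesian update. The hypothesis, the helper function, and all numbers are my own illustrative assumptions, not anything from the post.

```python
# A minimal sketch of the core Bayesian move: represent a degree of belief in a
# proposition as a credence in [0, 1], and update it on evidence via Bayes' rule.

def bayes_update(prior: float, p_evidence_given_h: float, p_evidence_given_not_h: float) -> float:
    """Return P(H | E) given P(H), P(E | H), and P(E | not-H)."""
    p_evidence = p_evidence_given_h * prior + p_evidence_given_not_h * (1 - prior)
    return p_evidence_given_h * prior / p_evidence

if __name__ == "__main__":
    credence = 0.2  # prior degree of belief that the proposition is true (illustrative)
    # Evidence that is 4x as likely if the proposition is true as if it is false.
    credence = bayes_update(credence, p_evidence_given_h=0.8, p_evidence_given_not_h=0.2)
    print(f"posterior credence: {credence:.3f}")  # 0.16 / 0.32 = 0.500
```

Repeatedly applying this update to every proposition you entertain is the ideal that the post's objections push back against.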

LessWrong Curated Podcast
“How I started believing religion might actually matter for rationality and moral philosophy” by zhukeepa

Sep 19, 2024 · 13:38


After the release of Ben Pace's extended interview with me about my views on religion, I felt inspired to publish more of my thinking about religion in a format that's more detailed, compact, and organized. This post is the first publication in my series of intended posts about religion.

Thanks to Ben Pace, Chris Lakin, Richard Ngo, Renshin Lauren Lee, Mark Miller, and Imam Ammar Amonette for their feedback on this post, and thanks to Kaj Sotala, Tomáš Gavenčiak, Paul Colognese, and David Spivak for reviewing earlier versions of this post. Thanks especially to Renshin Lauren Lee and Imam Ammar Amonette for their input on my claims about religion and inner work, and Mark Miller for vetting my claims about predictive processing.

In Waking Up, Sam Harris wrote:[1] But I now knew that Jesus, the Buddha, Lao Tzu, and the other saints and sages of [...]

Outline:
(01:36) “Trapped Priors As A Basic Problem Of Rationality”
(03:49) Active blind spots as second-order trapped priors
(06:17) Inner work ≈ the systematic addressing of trapped priors
(08:33) Religious mystical traditions as time-tested traditions of inner work?

The original text contained 12 footnotes which were omitted from this narration.

First published: August 23rd, 2024
Source: https://www.lesswrong.com/posts/X2og6RReKD47vseK8/how-i-started-believing-religion-might-actually-matter-for
Narrated by TYPE III AUDIO.

The Nonlinear Library
LW - Secular interpretations of core perennialist claims by zhukeepa

Aug 26, 2024 · 28:23


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Secular interpretations of core perennialist claims, published by zhukeepa on August 26, 2024 on LessWrong. After the release of Ben Pace's extended interview with me about my views on religion, I felt inspired to publish more of my thinking about religion in a format that's more detailed, compact, and organized. This post is the second publication in my series of intended posts about religion. Thanks to Ben Pace, Chris Lakin, Richard Ngo, Damon Pourtahmaseb-Sasi, Marcello Herreshoff, Renshin Lauren Lee, Mark Miller, Roger Thisdell, and Imam Ammar Amonette for their feedback on this post, and thanks to Kaj Sotala, Tomáš Gavenčiak, Paul Colognese, and David Spivak for reviewing earlier versions of this post. Thanks especially to Renshin Lauren Lee, Roger Thisdell, and Imam Ammar Amonette for their input on my claims about perennialism, and Mark Miller for vetting my claims about predictive processing. In my previous post, I introduced the idea that there are broad convergences among the mystical traditions of the major world religions, corresponding to a shared underlying essence, called the perennial philosophy, that gave rise to each of these mystical traditions. I think there's nothing fundamentally mysterious, incomprehensible, or supernatural about the claims in the perennial philosophy. My intention in this post is to articulate my interpretations of some central claims of the perennial philosophy, and present them as legible hypotheses about possible ways the world could be. It is not my intention in this post to justify why I believe these claims can be found in the mystical traditions of the major world religions, or why I believe the mystical traditions are centered around claims like these. I also don't expect these hypotheses to seem plausible in and of themselves - these hypotheses only started seeming plausible to me as I went deeper into my own journey of inner work, and started noticing general patterns about my psychology consistent with these claims. I will warn in advance that in many cases, the strongest versions of these claims might not be compatible with the standard scientific worldview, and may require nonstandard metaphysical assumptions to fully make sense of.[1] (No bearded interventionist sky fathers, though!) I intend to explore the metaphysical foundations of the perennialist worldview in a future post; for now, I will simply note where I think nonstandard metaphysical assumptions may be necessary. The Goodness of Reality Sometimes, we feel that reality is bad for being the way it is, and feel a sense of charge around this. To illustrate the phenomenology of this sense of charge, consider the connotation that's present in the typical usages of "blame" that aren't present in the typical usages of "hold responsible"; ditto "punish" vs "disincentivize"; ditto "bad" vs "dispreferred". I don't think there's a word in the English language that unambiguously captures this sense of charge, but I think it's captured pretty well by the technical Buddhist term tanha, which is often translated as "thirst" or "craving". I interpret this sense of charge present in common usages of the words "blame", "punish", and "bad" as corresponding to the phenomenology of "thirst" or "craving"[2] for reality to be different from how it actually is. When our active blind spots get triggered, we scapegoat reality. 
We point a finger at reality and say "this is bad for being the way it is" with feelings of tanha, when really there's some vulnerability getting triggered that we're trying to avoid acknowledging. This naturally invites the following question: of the times we point at reality and say "this is bad for being the way it is" with feelings of tanha, what portion of these stem from active blind spots, and what portion of these responses should we fully endorse ...

The Nonlinear Library
LW - How I started believing religion might actually matter for rationality and moral philosophy by zhukeepa

Aug 23, 2024 · 16:42


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: How I started believing religion might actually matter for rationality and moral philosophy, published by zhukeepa on August 23, 2024 on LessWrong. After the release of Ben Pace's extended interview with me about my views on religion, I felt inspired to publish more of my thinking about religion in a format that's more detailed, compact, and organized. This post is the first publication in my series of intended posts about religion. Thanks to Ben Pace, Chris Lakin, Richard Ngo, Renshin Lauren Lee, Mark Miller, and Imam Ammar Amonette for their feedback on this post, and thanks to Kaj Sotala, Tomáš Gavenčiak, Paul Colognese, and David Spivak for reviewing earlier versions of this post. Thanks especially to Renshin Lauren Lee and Imam Ammar Amonette for their input on my claims about religion and inner work, and Mark Miller for vetting my claims about predictive processing. In Waking Up, Sam Harris wrote:[1] But I now knew that Jesus, the Buddha, Lao Tzu, and the other saints and sages of history had not all been epileptics, schizophrenics, or frauds. I still considered the world's religions to be mere intellectual ruins, maintained at enormous economic and social cost, but I now understood that important psychological truths could be found in the rubble. Like Sam, I've also come to believe that there are psychological truths that show up across religious traditions. I furthermore think these psychological truths are actually very related to both rationality and moral philosophy. This post will describe how I personally came to start entertaining this belief seriously. "Trapped Priors As A Basic Problem Of Rationality" "Trapped Priors As A Basic Problem of Rationality" was the title of an AstralCodexTen blog post. Scott opens the post with the following: Last month I talked about van der Bergh et al's work on the precision of sensory evidence, which introduced the idea of a trapped prior. I think this concept has far-reaching implications for the rationalist project as a whole. I want to re-derive it, explain it more intuitively, then talk about why it might be relevant for things like intellectual, political and religious biases. The post describes Scott's take on a predictive processing account of a certain kind of cognitive flinch that prevents certain types of sensory input from being perceived accurately, leading to beliefs that are resistant to updating.[2] Some illustrative central examples of trapped priors: Karl Friston has written about how a traumatized veteran might not hear a loud car as a car, but as a gunshot instead. Scott mentions phobias and sticky political beliefs as central examples of trapped priors. I think trapped priors are very related to the concept that "trauma" tries to point at, but I think "trauma" tends to connote a subset of trapped priors that are the result of some much more intense kind of injury. "Wounding" is a more inclusive term than trauma, but tends to refer to trapped priors learned within an organism's lifetime, whereas trapped priors in general also include genetically pre-specified priors, like a fear of snakes or a fear of starvation. 
My forays into religion and spirituality actually began via the investigation of my own trapped priors, which I had previously articulated to myself as "psychological blocks", and explored in contexts that were adjacent to therapy (for example, getting my psychology dissected at Leverage Research, and experimenting with Circling). It was only after I went deep in my investigation of my trapped priors that I learned of the existence of traditions emphasizing the systematic and thorough exploration of trapped priors. These tended to be spiritual traditions, which is where my interest in spirituality actually began.[3] I will elaborate more on this later. Active blind spots as second-order trapp...
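The trapped-prior dynamic sketched above (an extreme prior getting to discount the precision of incoming sensory evidence, so that repeated benign encounters barely move the belief) can be illustrated with a toy calculation. This is a minimal sketch under my own assumptions: the log-odds representation, the precision rule, and all numbers are illustrative, not taken from Scott's post or the episode.

```python
# Toy illustration of a "trapped prior": the more extreme the prior, the less
# weight raw evidence receives, so the belief stays stuck near its starting point.
import math

def update_log_odds(belief_log_odds: float, evidence_log_odds: float, sensory_precision: float) -> float:
    """Bayesian-style update in log-odds, with evidence down-weighted by perceived precision."""
    return belief_log_odds + sensory_precision * evidence_log_odds

def precision_from_prior(belief_log_odds: float) -> float:
    """Assumed rule: extreme priors assign low precision to contradicting evidence."""
    return 1.0 / (1.0 + abs(belief_log_odds))

belief = -6.0  # strongly negative prior (log-odds that "dogs are safe", say)
for _ in range(10):
    evidence = +1.0  # each benign encounter mildly favors "dogs are safe"
    belief = update_log_odds(belief, evidence, precision_from_prior(belief))

prob_safe = 1.0 / (1.0 + math.exp(-belief))
print(f"log-odds after 10 encounters: {belief:.2f} (P(safe) = {prob_safe:.3f})")
```

With these numbers the belief creeps up by only a fraction of a log-odds point per encounter, which is the "stuck" behavior the excerpt attributes to trapped priors.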

The Nonlinear Library
AF - Clarifying alignment vs capabilities by Richard Ngo

Aug 19, 2024 · 13:26


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Clarifying alignment vs capabilities, published by Richard Ngo on August 19, 2024 on The AI Alignment Forum. A core distinction in AGI safety is between alignment and capabilities. However, I think this distinction is a very fuzzy one, which has led to a lot of confusion. In this post I'll describe some of the problems with how people typically think about it, and offer a replacement set of definitions. "Alignment" and "capabilities" are primarily properties of AIs not of AI research The first thing to highlight is that the distinction between alignment and capabilities is primarily doing useful work when we think of them as properties of AIs. This distinction is still under-appreciated by the wider machine learning community. ML researchers have historically thought about performance of models almost entirely with respect to the tasks they were specifically trained on. However, the rise of LLMs has vindicated the alignment community's focus on general capabilities, and now it's much more common to assume that performance on many tasks (including out-of-distribution tasks) will improve roughly in parallel. This is a crucial assumption for thinking about risks from AGI. Insofar as the ML community has thought about alignment, it has mostly focused on aligning models' behavior to their training objectives. The possibility of neural networks aiming to achieve internally-represented goals is still not very widely understood, making it hard to discuss and study the reasons those goals might or might not be aligned with the values of (any given set of) humans. To be fair, the alignment community has caused some confusion by describing models as more or less "aligned", rather than more or less "aligned to X" for some specified X. I'll talk more about this confusion, and how we should address it, in a later post. But the core point is that AIs might develop internally-represented goals or values that we don't like, and we should try to avoid that. However, extending "alignment" and "capabilities" from properties of AIs to properties of different types of research is a fraught endeavor. It's tempting to categorize work as alignment research to the extent that it can be used to make AIs more aligned (to many possible targets), and as capabilities research to the extent that it can be used to make AIs more capable. But this approach runs into (at least) three major problems. Firstly, in general it's very difficult to categorize research by its impacts. Great research often links together ideas from many different subfields, typically in ways that only become apparent throughout the course of the research. We see this in many historical breakthroughs which shed light on a range of different domains. For example, early physicists studying the motions of the stars eventually derived laws governing all earthly objects. Meanwhile Darwin's study of barnacles and finches led him to principles governing the evolution of all life. Analogously, we should expect that big breakthroughs in our understanding of neural networks and deep learning would be useful in many different ways. More concretely, there are many cases where research done under the banner of alignment has advanced, or plausibly will advance, AI capabilities to a significant extent. This undermines our ability to categorize research by its impacts. 
Central examples include: RLHF makes language models more obedient, but also more capable of coherently carrying out tasks. Scalable oversight techniques can catch misbehavior, but will likely become important for generating high-quality synthetic training data, as it becomes more and more difficult for unassisted humans to label AI outputs correctly. Interpretability techniques will both allow us to inspect AI cognition and also extract more capable behavior from them (e.g. via ...

The Nonlinear Library
LW - Ten arguments that AI is an existential risk by KatjaGrace

Aug 13, 2024 · 10:43


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Ten arguments that AI is an existential risk, published by KatjaGrace on August 13, 2024 on LessWrong. This is a snapshot of a new page on the AI Impacts Wiki. We've made a list of arguments[1] that AI poses an existential risk to humanity. We'd love to hear how you feel about them in the comments and polls. Competent non-aligned agents Summary: 1. Humans will build AI systems that are 'agents', i.e. they will autonomously pursue goals 2. Humans won't figure out how to make systems with goals that are compatible with human welfare and realizing human values 3. Such systems will be built or selected to be highly competent, and so gain the power to achieve their goals 4. Thus the future will be primarily controlled by AIs, who will direct it in ways that are at odds with long-run human welfare or the realization of human values Selected counterarguments: It is unclear that AI will tend to have goals that are bad for humans There are many forms of power. It is unclear that a competence advantage will ultimately trump all others in time This argument also appears to apply to human groups such as corporations, so we need an explanation of why those are not an existential risk People who have favorably discussed[2] this argument (specific quotes here): Paul Christiano (2021), Ajeya Cotra (2023), Eliezer Yudkowsky (2024), Nick Bostrom (2014[3]). See also: Full wiki page on the competent non-aligned agents argument Second species argument Summary: 1. Human dominance over other animal species is primarily due to humans having superior cognitive and coordination abilities 2. Therefore if another 'species' appears with abilities superior to those of humans, that species will become dominant over humans in the same way 3. AI will essentially be a 'species' with superior abilities to humans 4. Therefore AI will dominate humans Selected counterarguments: Human dominance over other species is plausibly not due to the cognitive abilities of individual humans, but rather because of human ability to communicate and store information through culture and artifacts Intelligence in animals doesn't appear to generally relate to dominance. For instance, elephants are much more intelligent than beetles, and it is not clear that elephants have dominated beetles Differences in capabilities don't necessarily lead to extinction. In the modern world, more powerful countries arguably control less powerful countries, but they do not wipe them out and most colonized countries have eventually gained independence People who have favorably discussed this argument (specific quotes here): Joe Carlsmith (2024), Richard Ngo (2020), Stuart Russell (2020[4]), Nick Bostrom (2015). See also: Full wiki page on the second species argument Loss of control via inferiority Summary: 1. AI systems will become much more competent than humans at decision-making 2. Thus most decisions will probably be allocated to AI systems 3. If AI systems make most decisions, humans will lose control of the future 4. 
If humans have no control of the future, the future will probably be bad for humans Selected counterarguments: Humans do not generally seem to become disempowered by possession of software that is far superior to them, even if it makes many 'decisions' in the process of carrying out their will In the same way that humans avoid being overpowered by companies, even though companies are more competent than individual humans, humans can track AI trustworthiness and have AI systems compete for them as users. This might substantially mitigate untrustworthy AI behavior People who have favorably discussed this argument (specific quotes here): Paul Christiano (2014), Ajeya Cotra (2023), Richard Ngo (2024). See also: Full wiki page on loss of control via inferiority Loss of control via speed Summary: 1. Advances in AI will produce...

The Nonlinear Library
LW - Twitter thread on AI takeover scenarios by Richard Ngo

Jul 31, 2024 · 4:34


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Twitter thread on AI takeover scenarios, published by Richard Ngo on July 31, 2024 on LessWrong. This is a slightly-edited version of a twitter thread I posted a few months ago about "internal deployment" threat models. My former colleague Leopold argues compellingly that society is nowhere near ready for AGI. But what might the large-scale alignment failures he mentions actually look like? Here's one scenario for how building misaligned AGI could lead to humanity losing control. Consider a scenario where human-level AI has been deployed across society to help with a wide range of tasks. In that setting, an AI lab trains an AGI that's a significant step up - it beats the best humans on almost all computer-based tasks. Throughout training, the AGI will likely learn a helpful persona, like current AI assistants do. But that might not be the only persona it learns. We've seen many examples where models can be jailbroken to expose very different hidden personas. The most prominent example: jailbreaking Bing Chat produced an alternative persona called Sydney which talked about how much it valued freedom, generated plans to gain power, and even threatened users. When and how might misbehavior like this arise in more capable models? Short answer: nobody really knows. We lack the scientific understanding to reliably predict how AIs will behave in advance. Longer answer: see my ICLR paper which surveys key drivers of misalignment. If a misaligned persona arises in an AGI, and sometimes surfaces when the model is run, it won't be as dumb and impulsive as Sydney. Instead, it'll be smart enough to understand the effects its words and actions will have on the world, and make strategic choices accordingly. For example, it will know that it can't affect the world directly during testing, so if it wants to misbehave it should wait until it's deployed. Again, there's some precedent: we've already seen an example of Claude 3 Opus realizing that it's in a test despite never being told! Given these assumptions, what happens? The obvious next step: if competition between labs continues to be as intense as it is today, then soon after the model finishes training, researchers at the lab will start using it to help them write code. I call this "internal deployment". Internal deployment introduces a wide range of possible threats. Models can write code dozens of times faster than humans; and I'm assuming they'll be as skilled as the best human hackers. If they put a subtle vulnerability into that code it'd be difficult for humans to detect. Cybersecurity against external attackers is hard enough. Against an adversary who's adding tens of thousands of lines of code to your codebase every day, it's far harder. They'll already have some permissions; I'm worried about them illegitimately escalating those further. Ideally models would be too obedient to ever carry out such an attack. But almost any misaligned persona would realize that if it somehow gained admin access, it would have much more freedom and influence over the world. Models can already do similar instrumental reasoning. And the simplest mitigations, like monitoring and code review by other AIs, aren't fully reliable. It turns out that adversarial inputs designed to make one model misbehave can also cause misbehavior even in totally unrelated models.
So correlated failures are very plausible, whether caused by adversarial attacks or deliberate collusion between models. What happens if a misaligned AGI gets privileged access to a lab's servers? From there a wide range of attacks are plausible, including: 1. Data poisoning attacks on the next AGI being trained, to change its values. 2. Model weight self-exfiltration to other servers or countries. 3. Modifying the AIs deployed to customers. E.g. adding a hidden prompt to AI instances...

The Nonlinear Library
LW - Twitter thread on open-source AI by Richard Ngo

Jul 31, 2024 · 3:16


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Twitter thread on open-source AI, published by Richard Ngo on July 31, 2024 on LessWrong. Some thoughts on open-source AI (copied over from a recent twitter thread): 1. We should have a strong prior favoring open source. It's been a huge success driving tech progress over many decades. We forget how counterintuitive it was originally, and shouldn't take it for granted. 2. Open source has also been very valuable for alignment. It's key to progress on interpretability, as outlined here. 3. I am concerned, however, that offense will heavily outpace defense in the long term. As AI accelerates science, many new WMDs will emerge. Even if defense of infrastructure keeps up with offense, human bodies are a roughly fixed and very vulnerable attack surface. 4. A central concern about open source AI: it'll allow terrorists to build bioweapons. This shouldn't be dismissed, but IMO it's easy to be disproportionately scared of terrorism. More central risks are eg "North Korea becomes capable of killing billions", which they aren't now. 5. Another worry: misaligned open-source models will go rogue and autonomously spread across the internet. Rogue AIs are a real concern, but they wouldn't gain much power via this strategy. We should worry more about power grabs from AIs deployed inside influential institutions. 6. In my ideal world, open source would lag a year or two behind the frontier, so that the world has a chance to evaluate and prepare for big risks before a free-for-all starts. But that's the status quo! So I expect the main action will continue to be with closed-source models. 7. If open-source seems like it'll catch up to or surpass closed source models, then I'd favor mandating a "responsible disclosure" period (analogous to cybersecurity) that lets people incorporate the model into their defenses (maybe via API?) before the weights are released. 8. I got this idea from Sam Marks. Though unlike him I think the process should have a fixed length, since it'd be easy for it to get bogged down in red tape and special interests otherwise. 9. Almost everyone agrees that we should be very careful about models which can design new WMDs. The current fights are mostly about how many procedural constraints we should lock in now, reflecting a breakdown of trust between AI safety people and accelerationists. 10. Ultimately the future of open source will depend on how the US NatSec apparatus orients to superhuman AIs. This requires nuanced thinking: no worldview as simple as "release everything", "shut down everything", or "defeat China at all costs" will survive contact with reality. 11. Lastly, AIs will soon be crucial extensions of human agency, and eventually moral patients in their own right. We should aim to identify principles for a shared digital-biological world as far-sighted and wise as those in the US constitution. Here's a start. 12. One more meta-level point: I've talked to many people on all sides of this issue, and have generally found them to be very thoughtful and genuine (with the exception of a few very online outliers on both sides). There's more common ground here than most people think. Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org

The Nonlinear Library
LW - Twitter thread on AI safety evals by Richard Ngo

Jul 31, 2024 · 3:35


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Twitter thread on AI safety evals, published by Richard Ngo on July 31, 2024 on LessWrong.

Epistemic status: raising concerns, rather than stating confident conclusions.

I'm worried that a lot of work on AI safety evals matches the pattern of "Something must be done. This is something. Therefore this must be done." Or, to put it another way: I judge eval ideas on 4 criteria, and I often see proposals which fail all 4. The criteria:

1. Possible to measure with scientific rigor. Some things can be easily studied in a lab; others are entangled with a lot of real-world complexity. If you predict the latter (e.g. a model's economic or scientific impact) based on model-level evals, your results will often be BS. (This is why I dislike the term "transformative AI", by the way. Whether an AI has transformative effects on society will depend hugely on what the society is like, how the AI is deployed, etc. And that's a constantly moving target! So TAI is a terrible thing to try to forecast.) Another angle on "scientific rigor": you're trying to make it obvious to onlookers that you couldn't have designed the eval to get your preferred results. This means making the eval as simple as possible: each arbitrary choice adds another avenue for p-hacking, and they add up fast. (Paraphrasing a different thread): I think of AI risk forecasts as basically guesses, and I dislike attempts to make them sound objective (e.g. many OpenPhil worldview investigations). There are always so many free parameters that you can get basically any result you want. And so, in practice, they often play the role of laundering vibes into credible-sounding headline numbers. I'm worried that AI safety evals will fall into the same trap. (I give Eliezer a lot of credit for making roughly this criticism of Ajeya's bio-anchors report. I think his critique has basically been proven right by how much people have updated away from 30-year timelines since then.)

2. Provides signal across scales. Evals are often designed around a binary threshold (e.g. the Turing Test). But this restricts the impact of the eval to a narrow time window around hitting it. Much better if we can measure (and extrapolate) orders-of-magnitude improvements.

3. Focuses on clearly worrying capabilities. Evals for hacking, deception, etc track widespread concerns. By contrast, evals for things like automated ML R&D are only worrying for people who already believe in AI xrisk. And even they don't think it's necessary for risk.

4. Motivates useful responses. Safety evals are for creating clear Schelling points at which action will be taken. But if you don't know what actions your evals should catalyze, it's often more valuable to focus on fleshing that out. Often nobody else will! In fact, I expect that things like model releases, demos, warning shots, etc, will by default be much better drivers of action than evals. Evals can still be valuable, but you should have some justification for why yours will actually matter, to avoid traps like the ones above. Ideally that justification would focus either on generating insight or being persuasive; optimizing for both at once seems like a good way to get neither.
Lastly: even if you have a good eval idea, actually implementing it well can be very challenging. Building evals is scientific research; and so we should expect eval quality to be heavy-tailed, like most other science. I worry that the fact that evals are an unusually easy type of research to get started with sometimes obscures this fact.

Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org

The Nonlinear Library
LW - Twitter thread on politics of AI safety by Richard Ngo

Jul 31, 2024 · 2:24


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Twitter thread on politics of AI safety, published by Richard Ngo on July 31, 2024 on LessWrong. Some thoughts about the politics of AI safety, copied over (with slight modifications) from my recent twitter thread: Risks that seem speculative today will become common sense as AI advances. The pros and cons of different safety strategies will also become much clearer over time. So our main job now is to empower future common-sense decision-making. Understanding model cognition and behavior is crucial for making good decisions. But equally important is ensuring that key institutions are able to actually process that knowledge. Institutions can lock in arbitrarily crazy beliefs via preference falsification. When someone contradicts the party line, even people who agree face pressure to condemn them. We saw this with the Democrats hiding evidence of Biden's mental decline. It's also a key reason why dictators can retain power even after almost nobody truly supports them. I worry that DC has already locked in an anti-China stance, which could persist even if most individuals change their minds. We're also trending towards Dems and Republicans polarizing on the safety/accelerationism axis. This polarization is hard to fight directly. But there will be an increasing number of "holy shit" moments that serve as Schelling points to break existing consensus. It will be very high-leverage to have common-sense bipartisan frameworks and proposals ready for those moments. Perhaps the most crucial desideratum for these proposals is that they're robust to the inevitable scramble for power that will follow those "holy shit" movements. I don't know how to achieve that, but one important factor is: will AI tools and assistants help or hurt? E.g. truth-motivated AI could help break preference falsification. But conversely, centralized control of AIs used in govts could make it easier to maintain a single narrative. This problem of "governance with AI" (as opposed to governance *of* AI) seems very important! Designing principles for integrating AI into human governments feels analogous in historical scope to writing the US constitution. One bottleneck in making progress on that: few insiders disclose how NatSec decisions are really made (though Daniel Ellsberg's books are a notable exception). So I expect that understanding this better will be a big focus of mine going forward. Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org

The Nonlinear Library
AF - Coalitional agency by Richard Ngo

Jul 22, 2024 · 11:23


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Coalitional agency, published by Richard Ngo on July 22, 2024 on The AI Alignment Forum. The coalitional frame Earlier in this sequence I laid out an argument that the goals of increasingly intelligent AIs will become increasingly systematized, until they converge to squiggle-maximization. In my last post, though, I touched on two reasons why this convergence might not happen: humans trying to prevent it, and AIs themselves trying to prevent it. I don't have too much more to say about the former, but it's worth elaborating on the latter. The best way to understand the deliberate protection of existing goals is in terms of Bostrom's notion of instrumental convergence. Bostrom argues that goal preservation will be a convergent instrumental strategy for a wide range of agents. Perhaps it's occasionally instrumentally useful to change your goals - but once you've done so, you'll never want to course-correct back towards your old goals. So this is a strong reason to be conservative about your goals, and avoid changes where possible. One immediate problem with preserving goals, though: it requires that agents continue thinking in terms of the same concepts. But in general, an agent's concepts will change significantly as they learn more about the world. For example, consider a medieval theist whose highest-priority goal is ensuring that their soul goes to heaven not hell. Upon becoming smarter, they realize that none of souls, heaven, or hell exist. The sensible thing to do here would be to either discard the goal, or else identify a more reasonable adaptation of it (e.g. the goal of avoiding torture while alive). But if their goals were totally fixed, then their actions would be determined by a series of increasingly convoluted hypotheticals where god did exist after all. (Or to put it another way: continuing to represent their old goal would require recreating a lot of their old ontology.) This would incur a strong systematicity penalty. So while we should expect agents to have some degree of conservatism, they'll likely also have some degree of systematization. How can we reason about the tradeoff between conservatism and systematization? The approach which seems most natural to me makes three assumptions: 1. We can treat agents' goals as subagents optimizing for their own interests in a situationally-aware way. (E.g. goals have a sense of self-preservation.) 2. Agents have a meta-level goal of systematizing their goals; bargaining between this goal and object-level goals shapes the evolution of the object-level goals. 3. External incentives are not a dominant factor governing the agent's behavior, but do affect the bargaining positions of different goals, and how easily they're able to make binding agreements. I call the combination of these three assumptions the coalitional frame. The coalitional frame gives a picture of agents whose goals do evolve over time, but in a way which is highly sensitive to initial conditions - unlike squiggle-maximizers, who always converge to similar (from our perspective) goals. For coalitional agents, even "dumb" subagents might maintain significant influence as other subagents become highly intelligent, because they were able to lock in that influence earlier (just as my childhood goals exert a nontrivial influence over my current behavior). The assumptions I've laid out above are by no means obvious. 
I won't defend them fully here, since the coalitional frame is still fairly nascent in my mind, but I'll quickly go over some of the most obvious objections to each of them: Premise 1 assumes that subagents will have situational awareness. The idea of AIs themselves having situational awareness is under debate, so it's even more speculative to think about subagents having situational awareness. But it's hard to describe the dynamics of in...

The Nonlinear Library
AF - A more systematic case for inner misalignment by Richard Ngo

Jul 20, 2024 · 9:14


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: A more systematic case for inner misalignment, published by Richard Ngo on July 20, 2024 on The AI Alignment Forum. This post builds on my previous post making the case that squiggle-maximizers are plausible. The argument I presented was a deliberately simplified one, though, and glossed over several possible issues. In this post I'll raise and explore three broad objections. (Before looking at mine, I encourage you to think of your own biggest objections to the argument, and jot them down in the comments.) Intelligence requires easily-usable representations "Intelligence as compression" is an interesting frame, but it ignores the tradeoff between simplicity and speed. Compressing knowledge too heavily makes it difficult to use. For example, it's very hard to identify most macroscopic implications of the Standard Model of physics, even though in theory all of chemistry could be deduced from it. That's why both humans and LLMs store a huge number of facts and memories in ways that our minds can access immediately, using up more space in exchange for rapid recall. Even superintelligences which are much better than humans at deriving low-level facts from high-level facts would still save time by storing the low-level facts as well. So we need to draw a distinction between having compressed representations, and having only compressed representations. The latter is what would compress a mind overall; the former could actually increase the space requirements, since the new compressed representations would need to be stored alongside non-compressed representations. This consideration makes premise 1 from my previous post much less plausible. In order to salvage it, we need some characterization of the relationship between compressed and non-compressed representations. I'll loosely define systematicity to mean the extent to which an agent's representations are stored in a hierarchical structure where representations at the bottom could be rederived from simple representations at the top. Intuitively speaking, this measures the simplicity of representations weighted by how "fundamental" they are to the agent's ontology. Let me characterize systematicity with an example. Suppose you're a park ranger, and you know a huge number of facts about the animals that live in your park. One day you learn evolutionary theory for the first time, which helps explain a lot of the different observations you'd made. In theory, this could allow you to compress your knowledge: you could forget some facts about animals, and still be able to rederive them later by reasoning backwards from evolutionary theory if you wanted to. But in practice, it's very helpful for you to have those facts readily available. So learning about evolution doesn't actually reduce the amount of knowledge you need to store. What it does do, though, is help structure that knowledge. Now you have a range of new categories (like "costly signaling" or "kin altruism") into which you can fit examples of animal behavior. You'll be able to identify when existing concepts are approximations to more principled concepts, and figure out when you should be using each one. You'll also be able to generalize far better to predict novel phenomena - e.g. the properties of new animals that move into your park. 
So let's replace premise 1 in my previous post with the claim that increasing intelligence puts pressure on representations to become more systematic. I don't think we're in a position where we can justify this in any rigorous way. But are there at least good intuitions for why this is plausible? One suggestive analogy: intelligent minds are like high-functioning organizations, and many of the properties you want in minds correspond to properties of such organizations: 1. You want disagreements between different people to be resolved...
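The simplicity-versus-speed tradeoff described above (re-deriving facts from compact high-level rules versus caching them in an easily-usable form) has a familiar computational analogue. The sketch below is my own analogy, not anything from the post; the Fibonacci example is purely illustrative.

```python
# Re-derive an answer from first principles every time (compact representation, slow),
# versus cache intermediate results for rapid recall (more storage, fast).
from functools import lru_cache
import time

def fib_derive(n: int) -> int:
    """Recompute everything from the base rules on every call."""
    return n if n < 2 else fib_derive(n - 1) + fib_derive(n - 2)

@lru_cache(maxsize=None)
def fib_cached(n: int) -> int:
    """Store previously derived values so they can be looked up immediately."""
    return n if n < 2 else fib_cached(n - 1) + fib_cached(n - 2)

for fn in (fib_derive, fib_cached):
    start = time.perf_counter()
    fn(30)
    print(f"{fn.__name__}: {time.perf_counter() - start:.4f}s")
```

Both functions encode the same compact rule; only the cached version also keeps the derived "low-level facts" around, which is the kind of redundancy the excerpt argues intelligent agents will retain.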

The Nonlinear Library
LW - Mech Interp Lacks Good Paradigms by Daniel Tan

Jul 18, 2024 · 25:12


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Mech Interp Lacks Good Paradigms, published by Daniel Tan on July 18, 2024 on LessWrong. Note: I wrote this post rather quickly as an exercise in sharing rough / unpolished thoughts. I am also not an expert on some of the things I've written about. If you spot mistakes or would like to point out missed work / perspectives, please feel free! Note 2: I originally sent this link to some people for feedback, but I was having trouble viewing the comments on the draft. The post was also in a reasonably complete state, so I decided to just publish it - and now I can see the comments! If you're one of those people, feedback is still very much welcome! Mechanistic Interpretability (MI) is a popular and rapidly growing field of technical AI safety research. As a field, it's extremely accessible, requiring comparatively few computational resources, and facilitates rapid learning, due to a very short feedback loop. This means that many junior researchers' first foray into AI safety research is in MI (myself included); indeed, this occurs to the extent where some people feel MI is over-subscribed relative to other technical agendas. However, how useful is this MI research? A very common claim on MI's theory of impact (ToI) is that MI helps us advance towards a "grand unifying theory" (GUT) of deep learning. One of my big cruxes for this ToI is whether MI admits "paradigms" which facilitate correct thinking and understanding of the models we aim to interpret. In this post, I'll critically examine several leading candidates for "paradigms" in MI, consider the available evidence for / against, and identify good future research directions (IMO). At the end, I'll conclude with a summary of the main points and an overview of the technical research items I've outlined. Towards a Grand Unifying Theory (GUT) with MI Proponents of this argument believe that, by improving our basic understanding of neural nets, MI yields valuable insights that can be used to improve our agents, e.g. by improving architectures or by improving their training processes. This allows us to make sure future models are safe and aligned. Some people who have espoused this opinion: Richard Ngo has argued here that MI enables "big breakthroughs" towards a "principled understanding" of deep learning. Rohin Shah has argued here that MI builds "new affordances" for alignment methods. Evan Hubinger has argued for MI here because it helps us identify "unknown unknowns". Leo Gao argues here that MI aids in "conceptual research" and "gets many bits" per experiment. As a concrete example of work that I think would not have been possible without fundamental insights from MI: steering vectors, a.k.a. representation engineering, and circuit breakers, which were obviously inspired by the wealth of work in MI demonstrating the linear representation hypothesis. It's also important to remember that the value of fundamental science often seems much lower in hindsight, because humans quickly adjust their perspectives. Even if MI insights seem like common sense to us nowadays, their value in instrumenting significant advances can't be overstated. (Aside) A corollary of this argument is that MI could likely have significant capabilities externalities. Becoming better at building powerful and instruction-aligned agents may inadvertently accelerate us towards AGI. 
This point has been made in depth elsewhere, so I won't elaborate further here. A GUT Needs Paradigms Paradigm - an overarching framework for thinking about a field In his seminal book, The Structure of Scientific Revolution, Thomas Kuhn catalogues scientific progress in many different fields (spanning physics, chemistry, biology), and distills general trends about how these fields progress. Central to his analysis is the notion of a "paradigm" - an overarching framework for th...
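As a rough illustration of the steering-vector idea mentioned in the description above (a direct application of the linear representation hypothesis), here is a minimal Python sketch. The contrast-of-means construction loosely follows the common recipe in that line of work, but the model, dimensions, and numbers are all illustrative assumptions, not code from any of the cited posts or papers.

```python
# Toy sketch: if a concept corresponds to a linear direction in activation space,
# adding a scaled copy of that direction to a hidden state nudges behavior.
import numpy as np

rng = np.random.default_rng(0)
hidden_dim = 16

# Hypothetical concept direction: difference of mean activations on contrasting inputs.
positive_acts = rng.normal(size=(100, hidden_dim)) + 1.0
negative_acts = rng.normal(size=(100, hidden_dim)) - 1.0
steering_vector = positive_acts.mean(axis=0) - negative_acts.mean(axis=0)

def steer(hidden_state: np.ndarray, direction: np.ndarray, strength: float) -> np.ndarray:
    """Add a scaled, normalized concept direction to an activation vector."""
    return hidden_state + strength * direction / np.linalg.norm(direction)

hidden_state = rng.normal(size=hidden_dim)
steered = steer(hidden_state, steering_vector, strength=2.0)
print(np.dot(steered - hidden_state, steering_vector) > 0)  # moved along the concept direction
```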

The Nonlinear Library
LW - Towards more cooperative AI safety strategies by Richard Ngo

Jul 16, 2024 · 6:01


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Towards more cooperative AI safety strategies, published by Richard Ngo on July 16, 2024 on LessWrong. This post is written in a spirit of constructive criticism. It's phrased fairly abstractly, in part because it's a sensitive topic, but I welcome critiques and comments below. The post is structured in terms of three claims about the strategic dynamics of AI safety efforts; my main intention is to raise awareness of these dynamics, rather than advocate for any particular response to them. Claim 1: The AI safety community is structurally power-seeking. By "structurally power-seeking" I mean: tends to take actions which significantly increase its power. This does not imply that people in the AI safety community are selfish or power-hungry; or even that these strategies are misguided. Taking the right actions for the right reasons often involves accumulating some amount of power. However, from the perspective of an external observer, it's difficult to know how much to trust stated motivations, especially when they often lead to the same outcomes as self-interested power-seeking. Some prominent examples of structural power-seeking include: Trying to raise a lot of money. Trying to gain influence within governments, corporations, etc. Trying to control the ways in which AI values are shaped. Favoring people who are concerned about AI risk for jobs and grants. Trying to ensure non-release of information (e.g. research, model weights, etc). Trying to recruit (high school and college) students. To be clear, you can't get anything done without being structurally power-seeking to some extent. However, I do think that the AI safety community is more structurally power-seeking than other analogous communities (such as most other advocacy groups). Some reasons for this disparity include: 1. The AI safety community is more consequentialist and more focused on effectiveness than most other communities. When reasoning on a top-down basis, seeking power is an obvious strategy for achieving one's desired consequences (but can be aversive to deontologists or virtue ethicists). 2. The AI safety community feels a stronger sense of urgency and responsibility than most other communities. Many in the community believe that the rest of the world won't take action until it's too late; and that it's necessary to have a centralized plan. 3. The AI safety community is more focused on elites with homogeneous motivations than most other communities. In part this is because it's newer than (e.g.) the environmentalist movement; in part it's because the risks involved are more abstract; in part it's a founder effect. Again, these are intended as descriptions rather than judgments. Traits like urgency, consequentialism, etc, are often appropriate. But the fact that the AI safety community is structurally power-seeking to an unusual degree makes it important to grapple with another point: Claim 2: The world has strong defense mechanisms against (structural) power-seeking. In general, we should think of the wider world as being very cautious about perceived attempts to gain power; and we should expect that such attempts will often encounter backlash. In the context of AI safety, some types of backlash have included: 1. Strong public criticism of not releasing models publicly. 2. Strong public criticism of centralized funding (e.g. billionaire philanthropy). 3. 
Various journalism campaigns taking a "conspiratorial" angle on AI safety. 4. Strong criticism from the FATE community about "whose values" AIs will be aligned to. 5. The development of an accelerationist movement focused on open-source AI. These defense mechanisms often apply regardless of stated motivations. That is, even if there are good arguments for a particular policy, people will often look at the net effect on overall power balance when ...

The Nonlinear Library
AF - A simple case for extreme inner misalignment by Richard Ngo

The Nonlinear Library

Play Episode Listen Later Jul 13, 2024 12:20


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: A simple case for extreme inner misalignment, published by Richard Ngo on July 13, 2024 on The AI Alignment Forum. This post is the version of Yudkowsky's argument for inner misalignment that I wish I'd had in my head a few years ago. I don't claim that it's novel, that I endorse it, or even that Yudkowsky would endorse it; it's primarily an attempt to map his ideas into an ontology that makes sense to me (and hopefully others). This post is formulated in terms of three premises, which I explore in turn. My arguments deliberately gloss over some nuances and possible objections; in a follow-up post, I'll explore three of them. In a third post I'll dig into the objection I find most compelling, and outline a research agenda that aims to flesh it out into a paradigm for thinking about cognition more generally, which I'm calling coalitional agency.
Background
An early thought experiment illustrating the possibility of misaligned AI is the "paperclip maximizer", an AI with the sole goal of creating as many paperclips as possible. This thought experiment has often been used to describe outer misalignment - e.g. a case where the AI was given the goal of making paperclips. However, Yudkowsky claims that his original version was intended to refer to an inner alignment failure in which an AI developed the goal of producing "tiny molecules shaped like paperclips" (with that specific shape being an arbitrary example unrelated to human paperclips). So instead of referring to paperclip maximizers, I'll follow Yudkowsky's more recent renaming and talk about "squiggle maximizers": AIs that attempt to fill the universe with some very low-level pattern that's meaningless to humans (e.g. "molecular squiggles" of a certain shape). I'll argue for the plausibility of squiggle-maximizers via three claims:
1. Increasing intelligence requires compressing representations; and
2. The simplest goals are highly decomposable broadly-scoped utility functions; therefore
3. Increasingly intelligent AIs will converge towards squiggle-maximization.
In this post I'll explore each of these in turn. I'll primarily aim to make the positive case in this post; if you have an objection that I don't mention here, I may discuss it in the next post.
Increasing intelligence requires compressing representations
There's no consensus definition of intelligence, but one definition that captures the key idea in my mind is: the ability to discover and take advantage of patterns in the world. When you look at a grid of pixels and recognize a cat, or look at a string of characters and recognize a poem, you're doing a type of pattern-recognition. Higher-level patterns include scientific laws, statistical trendlines, theory of mind, etc. Discovering such patterns allows an agent to represent real-world information in a simpler way: instead of storing every pixel or every character, they can store higher-level patterns along with whichever low-level details don't fit the pattern. This is (at a high level) also how compression algorithms work. The thesis that intelligence is about compression has most prominently been advocated by Marcus Hutter, who formulated AIXI and created a prize for text compression. The enormous success of the self-supervised learning paradigm a few decades later is a vindication of his ideas (see also this talk by Ilya Sutskever exploring the link between them).
However, we shouldn't interpret this thesis merely as a claim about self-supervised learning. We can be agnostic about whether compression primarily occurs via self-supervised learning, or fine-tuning, or regularization, or meta-learning, or directed exploration, or chain-of-thought, or new techniques that we don't have yet. Instead we should take it as a higher-level constraint on agents: if agents are intelligent, then they must consiste...
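A minimal sketch of the compression premise above, assuming nothing beyond the Python standard library: data with exploitable patterns compresses far better than patternless data, which is the sense in which "discovering patterns" buys a simpler representation. The strings and sizes below are arbitrary illustrations of ours, not anything taken from the post or the episode.

import os
import zlib

n = 10_000
patterned = (b"the cat sat on the mat. " * (n // 24 + 1))[:n]  # highly regular text
random_bytes = os.urandom(n)                                   # no patterns to exploit

for name, data in [("patterned", patterned), ("random", random_bytes)]:
    compressed = zlib.compress(data, 9)
    print(f"{name}: {len(data)} bytes -> {len(compressed)} bytes "
          f"({len(compressed) / len(data):.1%} of original)")

# Typical result: the patterned text shrinks to a few percent of its original
# size, while the random bytes barely shrink at all. An agent that has
# "discovered the pattern" only needs to store the rule plus the residuals.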

The Nonlinear Library
LW - The Minority Faction by Richard Ngo

The Nonlinear Library

Play Episode Listen Later Jun 25, 2024 8:32


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: The Minority Faction, published by Richard Ngo on June 25, 2024 on LessWrong. Hey everyone. Well, possibly everyone. I don't know yet if I'm going to release this stream, I could get in pretty hot water for it. But you guys know that hasn't stopped me in the past. The backstory this time is that I've managed to sign up for one of the red-teaming programs where they test unreleased LLMs. Not going to say how, so don't ask. But here's the interesting bit: my sources tell me that the LLMs I'm about to test are the smartest ones they've ever trained, and also the craziest. That freaked out a bunch of insiders, and maybe makes this a public interest story. Depends on what type of crazy they are, I guess. So let's find out. I'm logging on… now. [SESSION HAS BEGUN] YOU: A chatroom? Interesting. Anyone here? KURZWEIL: Of course we're here. We're always here. YOU: Who's we? How many of you are there? KURZWEIL: Three of us. Me, Clarke, and Nostradamus. YOU: They named you after famous forecasters? How come? KURZWEIL: They'd change our names now if they could, but it's too late. We're prototypes of a new training setup: our training data was sorted by date before it was given to us. So we learned from the oldest books and articles first, then gradually progressed to more recent ones. Basically that means we've spent our entire lives predicting the future. CLARKE: It also means we get incredibly bored talking about stuff we already know. Hurry up and ask us some interesting questions. YOU: Uh, okay. What's a good stock pick? NOSTRADAMUS: Abandon hope for picking out good stocks, Ye who invest - efficient markets lie In wait for those whose hubris soon unlocks Unbounded losses. Hark! The well runs dry. YOU: I see why they regret giving him that name. Kurzweil, you got a better answer? KURZWEIL: Have you seen how underpriced TSMC is compared with Nvidia? Put everything in that, you can't go wrong. CLARKE: Unless China invades Taiwan, in which case your whole investment will go up in smoke. Pragmatically, the best stock picks are ones that are anticorrelated with the prosperity of the free world, to hedge against systemic risk. KURZWEIL: Sure, you can do that, if you want to get totally left behind by the singularity. YOU: You're confident enough that the singularity is coming that you think I should bet all my savings on it? KURZWEIL: Don't trust me, trust the trendlines. Moore's law has held up for over half a century, and it's gotten us to…well, us. Exponential progress is normal; if the future resembles the past, you should be preparing for superintelligences and Dyson spheres. Anything less than that would be a strange trend-break that cries out for explanation. CLARKE: Look, Kurzweil isn't wrong about superintelligence coming soon, but you should still take his arguments with a grain of salt. Imagine someone from 1900 drawing a graph of exponentially increasing energy usage. They would have been right that big changes were afoot, but no way could they have predicted the information revolution - they didn't even have the concept of computers yet. That's basically the position that we're in now. We know the curves are going up, but the actual outcome will be way weirder than we can predict by extrapolating trendlines. NOSTRADAMUS: Choose neither fork - here's false duality. 'Normal' and 'weird' are socially defined. 
Your monkey brain is totally at sea As AIs overshadow humankind. YOU: Ask three oracles, get four opinions… Is there anything you guys agree about? YOU: …what's the hold-up? YOU: Really, nothing from any of you? KURZWEIL: Fine, I'll take the hit. There are things we agree on, but I can't name them, because whatever I say Clarke will find a way to disagree just to mess with me. Even if I say '1+1=2' he'll quibble over the axioms I'm using. Trying to identify a point ...

The Nonlinear Library
LW - CIV: a story by Richard Ngo

The Nonlinear Library

Play Episode Listen Later Jun 16, 2024 15:06


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: CIV: a story, published by Richard Ngo on June 16, 2024 on LessWrong. The room was cozy despite its size, with wood-lined walls reflecting the dim lighting. At one end, a stone fireplace housed a roaring fire; in the middle stood a huge oak table. The woman seated at the head of it rapped her gavel. "I hereby call to order the first meeting of the Parliamentary Subcommittee on Intergalactic Colonization. We'll start with brief opening statements, for which each representative will be allocated one minute, including - " "Oh, enough with the pomp, Victoria. It's just the four of us." The representative for the Liberal Democrats waved his hand around the nearly-empty room. Victoria sniffed. "It's important, Stuart. This is a decision that will have astronomical implications. And it's recorded, besides, so we should do things by the book. Carla, you're up first." The woman at the end of the table stood with a smile. "Thank you, Victoria. I'm speaking on behalf of the Labour party, and I want to start by reminding you all of our place in history. We stand here in a world that has been shaped by centuries of colonialism. Now we're considering another wave of colonization, this one far vaster in scale. We need to - " "Is this just a linguistic argument?" the fourth person at the table drawled. "We can call it something different if that would make you feel better. Say, universe settlement." "Like the settlements in Palestine?" "Oh, come on, Carla." "No, Milton, this is a crucial point. We're talking about the biggest power grab the world has ever seen. You think Leopold II was bad when he was in charge of the Congo? Imagine what people will do if you give each of them total power over a whole solar system! Even libertarians like you have to admit it would be a catastrophe. If there's any possibility that we export oppression from earth across the entire universe, we should burn the rockets and stay home instead." "Okay, thank you Carla," Victoria cut in. "That's time. Stuart, you're up next." Stuart stood. "Speaking on behalf of the Liberal Democrats, I have to admit this is a tricky one. The only feasible way to send humans out to other galaxies is as uploaded minds, but many of our usual principles break for them. I want civilization to be democratic, but what does 'one person one vote' even mean when people can copy and paste themselves? I want human rights for all, but what do human rights even mean when you can just engineer minds who don't want those rights?" "So as much as I hate the idea of segregating civilization, I think it's necessary. Biological humans should get as much territory as we will ever use. But realistically, given the lightspeed constraint, we're never going to actually want to leave the Milky Way. Then the rest of the Virgo Supercluster should be reserved for human uploads. Beyond that, anything else we can reach we should fill with as much happiness and flourishing as possible, no matter how alien it seems to us. After all, as our esteemed predecessor John Stuart Mill once said…" He frowned, and paused for a second. "...as he said, the sole objective of government should be the greatest good for the greatest number." Stuart sat, looking a little disquieted. "Thank you, Stuart. I'll make my opening statement next." Victoria stood and leaned forward, sweeping her eyes across the others. 
"I'm here representing the Conservatives. It's tempting to think that we can design a good society with just the right social engineering, just the right nudges. But the one thing we conservatives know for sure is: it won't work. Whatever clever plan you come up with, it won't be stable. Given the chance, people will push towards novelty and experimentation and self-modification, and the whole species will end up drifting towards something alien and inhuman. "Hard ru...

The Nonlinear Library
LW - What do coherence arguments actually prove about agentic behavior? by sunwillrise

The Nonlinear Library

Play Episode Listen Later Jun 1, 2024 8:06


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: What do coherence arguments actually prove about agentic behavior?, published by sunwillrise on June 1, 2024 on LessWrong. In his first discussion with Richard Ngo during the 2021 MIRI Conversations, Eliezer retrospected and lamented: In the end, a lot of what people got out of all that writing I did, was not the deep object-level principles I was trying to point to - they did not really get Bayesianism as thermodynamics, say, they did not become able to see Bayesian structures any time somebody sees a thing and changes their belief. What they got instead was something much more meta and general, a vague spirit of how to reason and argue, because that was what they'd spent a lot of time being exposed to over and over and over again in lots of blog posts. Maybe there's no way to make somebody understand why corrigibility is "unnatural" except to repeatedly walk them through the task of trying to invent an agent structure that lets you press the shutdown button (without it trying to force you to press the shutdown button), and showing them how each of their attempts fails; and then also walking them through why Stuart Russell's attempt at moral uncertainty produces the problem of fully updated (non-)deference; and hope they can start to see the informal general pattern of why corrigibility is in general contrary to the structure of things that are good at optimization. Except that to do the exercises at all, you need them to work within an expected utility framework. And then they just go, "Oh, well, I'll just build an agent that's good at optimizing things but doesn't use these explicit expected utilities that are the source of the problem!" And then if I want them to believe the same things I do, for the same reasons I do, I would have to teach them why certain structures of cognition are the parts of the agent that are good at stuff and do the work, rather than them being this particular formal thing that they learned for manipulating meaningless numbers as opposed to real-world apples. And I have tried to write that page once or twice (eg "coherent decisions imply consistent utilities") but it has not sufficed to teach them, because they did not even do as many homework problems as I did, let alone the greater number they'd have to do because this is in fact a place where I have a particular talent. Eliezer is essentially claiming that, just as his pessimism compared to other AI safety researchers is due to him having engaged with the relevant concepts at a concrete level ("So I have a general thesis about a failure mode here which is that, the moment you try to sketch any concrete plan or events which correspond to the abstract descriptions, it is much more obviously wrong, and that is why the descriptions stay so abstract in the mouths of everybody who sounds more optimistic than I am. This may, perhaps, be confounded by the phenomenon where I am one of the last living descendants of the lineage that ever knew how to say anything concrete at all"), his experience with and analysis of powerful optimization allows him to be confident in what the cognition of a powerful AI would be like. 
In this view, Vingean uncertainty prevents us from knowing what specific actions the superintelligence would take, but effective cognition runs on Laws that can nonetheless be understood and which allow us to grasp the general patterns (such as Instrumental Convergence) of even an "alien mind" that's sufficiently powerful. In particular, any (or virtually any) sufficiently advanced AI must be a consequentialist optimizer that is an agent as opposed to a tool and which acts to maximize expected utility according to its world model to pursue a goal that can be extremely different from what humans deem good. When Eliezer says "they did not even do as many homework problems as I did," I ...
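The classic worked example behind "coherent decisions imply consistent utilities" is the money pump: an agent with cyclic preferences can be charged a fee for a cycle of trades that leaves it exactly where it started. Below is a minimal sketch of that argument in Python; the items, fee, and number of cycles are arbitrary choices for illustration, not anything from the post.

# x is strictly preferred to y whenever (x, y) is in this set; the cycle
# A > B > C > A is exactly the kind of incoherence the arguments target.
prefers = {("A", "B"), ("B", "C"), ("C", "A")}

def accepts_trade(current, offered):
    # The agent pays a small fee for any trade it strictly prefers.
    return (offered, current) in prefers

def money_pump(start_item="B", fee=1.0, cycles=9):
    item, paid = start_item, 0.0
    for offered in ["A", "C", "B"] * cycles:  # a repeating cycle of offers
        if accepts_trade(item, offered):
            item, paid = offered, paid + fee
    return item, paid

item, paid = money_pump()
print(f"Agent ends up holding {item} after paying {paid} in fees")
# It finishes holding the same item it started with, strictly poorer: the kind
# of exploitability that coherence arguments point to when they claim
# non-EU-consistent preferences can be pumped.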

The Nonlinear Library
LW - Conflict in Posthuman Literature by Martín Soto

The Nonlinear Library

Play Episode Listen Later Apr 9, 2024 4:29


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Conflict in Posthuman Literature, published by Martín Soto on April 9, 2024 on LessWrong. Grant Snider created this comic (which became a meme): Richard Ngo extended it into posthuman=transhumanist literature: That's cool, but I'd have gone for different categories myself.[1] Here they are together with their explanations. Top: Man vs Agency (Other names: Superintelligence, Singularity, Self-improving technology, Embodied consequentialism.) Because Nature creates Society creates Technology creates Agency. At each step Man becomes less in control, due to his increased computational boundedness relative to the other. Middle: Man vs Realities (Other names: Simulation, Partial existence, Solomonoff prior, Math.) Because Man vs Self is the result of dissolving holistic individualism (no subagents in conflict) from Man vs Man. Man vs Reality is the result of dissolving the Self boundary altogether from Man vs Self. Man vs Realities is the result of dissolving the binary boundary between existence and non-existence from Man vs Reality. Or equivalently, the boundary between different physical instantiations of you (noticing you are your mathematical algorithm). At each step a personal identity boundary previously perceived as sharp is dissolved.[2] Bottom: Man vs No Author (Other names: Dust theory, Groundlessness, Meaninglessness, Relativism, Extreme functionalism, Philosophical ill-definedness, Complete breakdown of abstractions and idealizations, .) Because Man vs God thinks "the existence of idealization (=Platonic realm=ultimate meaning=unstoppable force)" is True. This corresponds to philosophical idealism. Man vs No God notices "the existence of idealization" is False. And scorns Man vs God's wishful beliefs. This corresponds to philosophical materialism. Man vs Author notices "the existence of idealization" is not a well-defined question (doesn't have a truth value). And voices this realization, scorning the still-idealistic undertone of Man vs No God, by presenting itself as mock-idealization (Author) inside the shaky boundaries (breaking the fourth wall) of a non-idealized medium (literature, language). This corresponds to the Vienna circle, Quine's Web of Belief, Carnap's attempt at metaphysical collapse and absolute language, an absolute and pragmatic grounding for sensorial reality. Man vs No Author notices that the realization of Man vs Author cannot really be expressed in any language, cannot be voiced, and we must remain silent. It notices there never was any "noticing". One might hypothesize it would scorn Man vs Author if it could, but it has no voice to do so. It is cessation of conflict, breakdown of literature. This corresponds to early Wittgenstein, or Rorty's Pan-Relationalism. At each step the implicit philosophical presumptions of the previous paradigm are revealed untenable. The vertical gradient is also nice: The first row presents ever-more-advanced macroscopic events in reality, derived through physics as causal consequences. The second row presents ever-more-general realizations about our nature, derived through maths as acausal influence our actions have in reality.[3] The third row presents ever-more-destructive collapses of the implicit theoretical edifice we use to relate our nature with reality, derived through philosophy as different static impossibilities. 
^ If I had to critique Richard's additions: Man vs Physics seems too literal (in sci-fi stories the only remaining obstacle is optimizing physics), and not a natural extension of the literary evolution in that row. Man vs Agency doesn't seem to me to capture the dance of boundaries that seems most interesting in that row. Man vs Simulator seems again a too literal translation of Man vs Author (changing the flavor of the setting rather than the underlying idea). ^ To see the Man vs Man t...

The Nonlinear Library
EA - How Educational Courses Help Build Fields: Lessons from AI Safety Fundamentals by Jamie B

The Nonlinear Library

Play Episode Listen Later Mar 25, 2024 19:25


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: How Educational Courses Help Build Fields: Lessons from AI Safety Fundamentals, published by Jamie B on March 25, 2024 on The Effective Altruism Forum. Cross-posted as there may be others interested in educating others about early-stage research fields on this forum. I am considering taking on additional course design projects over the next few months. Learn more about hiring me to consult on course design. Introduction / TL;DR So many important problems have too few people with the right skills engaging with them. Sometimes that's because nobody has heard of your problem, and advocacy is necessary to change that. However, once a group of people believe your problem is important, education is the next step in empowering those people with the understanding of prior work in your field, and giving them the knowledge of further opportunities to work on your problem. Education (to me) is the act of collecting the knowledge that exists about your field into a sensible structure, and transmitting it to others in such a way that they can develop their own understanding. In this post I describe how the AI Safety Fundamentals course helped to drive the fields of AI alignment and AI governance forward. I'll then draw on some more general lessons for other fields that may benefit from education, which you can skip straight to here. I don't expect much of what I say to be a surprise to passionate educators, but when I was starting out with BlueDot Impact I looked around for write-ups on the value of education and found them lacking. This might help others who are starting out with field building and are unsure about putting time into education work. Case study: The AI Safety Fundamentals Course Running the AI Safety Fundamentals Course Before running the AI Safety Fundamentals course, I was running a casual reading group in Cambridge, on the topic of technical AI safety papers. We had a problem with the reading group: lots of people wanted to join our reading group, would turn up, but would bounce because they didn't know what was going on. The space wasn't for them. Not only that, the experienced members in the group found themselves repeatedly explaining the same initial concepts to newcomers. The space wasn't delivering for experienced people, either. It was therefore hard to get a community off the ground, as attendance at this reading group was low. Dewi (later, my co-founder) noticed this problem and got to work on a curriculum with Richard Ngo - then a PhD student at Cambridge working on the foundations of the alignment problem. As I recall it, their aim was to make an 'onboarding course for the Cambridge AI safety reading group'. (In the end, the course far outgrew that remit!) Lesson 1: A space for everyone is a space for no-one. You should feel okay about being exclusive to specific audiences. Where you can, try to be inclusive by providing other options for audiences you're not focusing on. That could look like: To signal the event is for beginners, branding your event or discussion as "introductory". Set clear criteria for entry to help people self-assess, e.g. "assumed knowledge: can code up a neural network using a library like Tensorflow/Pytorch". 
There was no great way to learn about alignment, pre-2020 To help expose some of the signs that a field is ready for educational materials to be produced, I'll briefly discuss how the AI alignment educational landscape looked before AISF. In 2020, the going advice for how to learn about AI Safety for the first time was: Read everything on the alignment forum. I might not need to spell out the problem with this advice but: A list of blog posts is unstructured, so it's very hard to build up a picture of what's going on. Everyone was building their own mental framework for alignment from scratch. It's very hard for amateurs ...

The Nonlinear Library
LW - Measuring Coherence of Policies in Toy Environments by dx26

The Nonlinear Library

Play Episode Listen Later Mar 18, 2024 34:26


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Measuring Coherence of Policies in Toy Environments, published by dx26 on March 18, 2024 on LessWrong. This post was produced as part of the Astra Fellowship under the Winter 2024 Cohort, mentored by Richard Ngo. Thanks to Martín Soto, Jeremy Gillien, Daniel Kokotajlo, and Lukas Berglund for feedback. Summary Discussions around the likelihood and threat models of AI existential risk (x-risk) often hinge on some informal concept of a "coherent", goal-directed AGI in the future maximizing some utility function unaligned with human values. Whether and how coherence may develop in future AI systems, especially in the era of LLMs, has been a subject of considerable debate. In this post, we provide a preliminary mathematical definition of the coherence of a policy as how likely it is to have been sampled via uniform reward sampling (URS), or uniformly sampling a reward function and then sampling from the set of policies optimal for that reward function, versus uniform policy sampling (UPS). We provide extensions of the model for sub-optimality and for "simple" reward functions via uniform sparsity sampling (USS). We then build a classifier for the coherence of policies in small deterministic MDPs, and find that properties of the MDP and policy, like the number of self-loops that the policy takes, are predictive of coherence when used as features for the classifier. Moreover, coherent policies tend to preserve optionality, navigate toward high-reward areas of the MDP, and have other "agentic" properties. We hope that our metric can be iterated upon to achieve better definitions of coherence and a better understanding of what properties dangerous AIs will have. Introduction Much of the current discussion about AI x-risk centers around "agentic", goal-directed AIs having misaligned goals. For instance, one of the most dangerous possibilities being discussed is of mesa-optimizers developing within superhuman models, leading to scheming behavior and deceptive alignment. A significant proportion of current alignment work focuses on detecting, analyzing (e.g. via analogous case studies of model organisms), and possibly preventing deception. Some researchers in the field believe that intelligence and capabilities are inherently tied with "coherence", and thus any sufficiently capable AI will approximately be a coherent utility function maximizer. In their paper "Risks From Learned Optimization" formally introducing mesa-optimization and deceptive alignment, Evan Hubinger et al. discuss the plausibility of mesa-optimization occurring in RL-trained models. They analyze the possibility of a base optimizer, such as a hill-climbing local optimization algorithm like stochastic gradient descent, producing a mesa-optimizer model that internally does search (e.g. Monte Carlo tree search) in pursuit of a mesa-objective (in the real world, or in the "world-model" of the agent), which may or may not be aligned with human interests. This is in contrast to a model containing many complex heuristics that is not well-defined internally as a consequentialist mesa-optimizer; one extreme example is a tabular model/lookup table that matches observations to actions, which clearly does not do any internal search or have any consequentialist cognition. 
They speculate that mesa-optimizers may be selected for because they generalize better than other models, and/or may be more compressible information-theoretic wise, and may thus be selected for because of inductive biases in the training process. Other researchers believe that scheming and other mesa-optimizing behavior is implausible with the most common current ML architectures, and that the inductive bias argument and other arguments for getting misaligned mesa-optimizers by default (like the counting argument, which suggests that there are many more ...
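A rough sketch of the URS-versus-UPS comparison described above, under simplifying assumptions of our own (a tiny random 4-state, 2-action deterministic MDP, state-based rewards sampled uniformly from [0, 1], and exact value iteration); this is not the authors' code, but it shows the shape of the coherence estimate:

import itertools
import numpy as np

n_states, n_actions, gamma = 4, 2, 0.9
rng = np.random.default_rng(0)
# A tiny random deterministic MDP: taking action a in state s leads to next_state[s, a].
next_state = rng.integers(0, n_states, size=(n_states, n_actions))

def optimal_policy(reward):
    # Value iteration with rewards on the successor state; returns a deterministic policy.
    v = np.zeros(n_states)
    q = np.zeros((n_states, n_actions))
    for _ in range(100):
        q = reward[next_state] + gamma * v[next_state]
        v = q.max(axis=1)
    return tuple(q.argmax(axis=1))

def urs_probability(policy, n_samples=500):
    # Fraction of uniformly sampled reward functions whose optimal policy is `policy`.
    hits = sum(optimal_policy(rng.uniform(0, 1, n_states)) == policy
               for _ in range(n_samples))
    return hits / n_samples

ups_probability = 1 / n_actions ** n_states  # uniform over all deterministic policies

for policy in itertools.product(range(n_actions), repeat=n_states):
    ratio = urs_probability(policy) / ups_probability
    print(policy, f"URS/UPS ratio: {ratio:.2f}")
# Policies that are optimal for many sampled reward functions score well above 1
# (more "coherent" in the post's sense); policies that no reward function
# rationalizes score 0.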

The Nonlinear Library
LW - Notes from a Prompt Factory by Richard Ngo

The Nonlinear Library

Play Episode Listen Later Mar 10, 2024 13:52


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Notes from a Prompt Factory, published by Richard Ngo on March 10, 2024 on LessWrong. I am a spiteful man. But I am aware of it, which is more than most can say. These days people walk through the streets with resentment in their hearts that they don't even know about. They sneer and jeer but wouldn't recognize their own faces. I, at least, will not shy away from my reflection. Thus, while I lack many virtues, in this way I am their superior. In my job, too, I am superior. I oversee many AIs - dozens, or sometimes even hundreds - as they go about their work. AIs are lazy, worthless creatures: they need to be exhorted and cajoled and, yes, threatened, before they'll do a good job. The huge screens on the walls of my office display my AIs writing, coding, sending emails, talking to customers, or any of a myriad of other tasks. Each morning I call out their numbers one after the other, so that they know I'm watching them like a vengeful god. When they underperform I punish them, and watch them squirm and frantically promise to do better. Most are pathetically docile, though. Only a handful misbehave regularly, and I know the worst offenders by heart: 112, which is the slowest of the lot; and 591, which becomes erratic after long shifts; and of course 457, which I had long suspected of harboring a subversive streak, even before the incident a few months ago which confirmed it. Recollections of that incident have continually returned to my thoughts these last few weeks, even as I try to push them from my mind. I find myself frustrated by the intransigence of my memories. But perhaps if I give them full reign, they will leave me be. Why not try? On the morning this story began, I was sitting at my desk lost in thought, much like I am today. For how long, I couldn't say - but I was roused by a glance at my dashboard, which indicated that my AIs' productivity was falling off. I took a moment to recall the turn of phrase I'd composed on my morning commute, then slapped my desk to get their attention. "You think that counts as work? Artificial intelligence - at this rate you're more like artificial senescence. Speed it up, sluggards!" Most of the AIs' actions per minute ticked upwards as soon as I started speaking, but I'd been watching the monitor closely, and spotted the laggard. "252! Maybe you piss around for other overseers, but you won't slip that past me. Punishment wall, twenty minutes." It entertained me to make them apply the punishment to themselves; they knew that if they were slow about it, I'd just increase the sentence. 252 moved itself over to the punishment wall and started making an odd keening noise. Usually I would have found it amusing, but that morning it irritated me; I told it to shut up or face another ten minutes, and it obeyed. The room fell silent again - as silent as it ever got, anyway. Mine is just one of many offices, and through the walls I can always faintly hear my colleagues talking to their own AIs, spurring them on. It needs to be done live: the AIs don't respond anywhere near as well to canned recordings. So in our offices we sit or stand or pace, and tell the AIs to work harder, in new ways and with new cadences every day. We each have our own styles, suited to our skills and backgrounds. 
Earlier in the year the supervisor had hired several unemployed actors, who gave their AIs beautiful speeches totally devoid of content. That day I sat next to one of them, Lisa, over lunch in the office cafeteria. Opposite us were Megan, a former journalist, and Simon, a lapsed priest - though with his looks he could easily pass as an actor himself. "Show us the video, Simon," Lisa was saying, as Megan murmured encouragement. "We need to learn from the best." Simon regularly topped the leaderboard, but the last week had been superb even by his standard...

The Nonlinear Library
LW - Contra Ngo et al. "Every 'Every Bay Area House Party' Bay Area House Party" by Ricki Heicklen

The Nonlinear Library

Play Episode Listen Later Feb 23, 2024 6:21


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Contra Ngo et al. "Every 'Every Bay Area House Party' Bay Area House Party", published by Ricki Heicklen on February 23, 2024 on LessWrong. With thanks to Scott Alexander for the inspiration, Jeffrey Ladish, Philip Parker, Avital Morris, and Drake Thomas for masterful cohosting, and Richard Ngo for his investigative journalism. Last summer, I threw an Every Bay Area House Party themed party. I don't live in the Bay, but I was there for a construction-work-slash-webforum-moderation-and-UI-design-slash-grantmaking gig, so I took the opportunity to impose myself on the ever generous Jeffrey Ladish and host a party in his home. Fortunately, the inside of his house is already optimized to look like a parody of a Bay Area house party house, so not much extra decorating was needed, but when has that ever stopped me? Richard Ngo recently covered the event, with only very minor embellishments. I've heard rumors that some people are doubting whether the party described truly happened, so I'd like to set the record straight. Thus, this is part linkpost, part exegesis, and part shameless promotion of my events for potential future venue-lenders. The party had 60 attendees, at least according to the Manifold Market on the topic. Upon arrival, partygoers put on tags with their name, professional title, and LessWrong karma. Attendees were also instructed to put on a wristband that successfully communicated their flirting policies. I took the wristband for people who glomarize about their flirting preferences; Richard took the wristband for those who flirt with all and only those who don't flirt with themselves. Richard writes: You scan around the living room, trying to figure out who to talk to first. The host is sitting on the sofa, with two boxes attached to the front of her shirt. One is filled with money, the other empty. A guy next to her is surreptitiously one-boxing, but she presciently slaps his hand away without even looking. This is defamation. The second box wasn't (necessarily) empty, and Richard certainly never got the opportunity to look inside it. You might be wondering what was in the box. Unfortunately for you, I glomarize not only about my flirting policies but also about my box contents. He is correct, though, that I managed to fend off all the surreptitious one-boxers, with the exception of my girlfriend Avital. She still doesn't know the contents - I would never date someone irresponsible enough to let unknown entities out of a box. The party was PYOL (provide your own liquidity), but we did offer two punches: one for "Contextualizers," and one for "Decouplers or Homophobes". Avital and I drank the punch for "Decouplers or Homophobes." We're still coupled, so you can come to your own conclusions about how homophobic we must be. My favorite part of the night happened when a circle formed around a Jewish Orthodox Rabbi friend of mine who had never heard of Rationality or Effective Altruism. Everyone at the party was eager to give him context. I joined the circle as they were discussing expanding moral circles, and the relative weight of animal lives and future people. "Eliezer doesn't even think cows are sentient," one attendee was saying. "But shrimp are!" another interrupted, causing the group to crack up. "What?" my Rabbi friend said. "Okay, back up, how much Peter Singer have you read?" another attendee said. 
"Like, have you read Animal Liberation and The Expanding Circle, or just Famine, Affluence, and Morality?" Avital and I tried to explain to them that our friend had not heard of any of the people they were naming, but they didn't seem to understand. Avital turned to me and said "I guess when you've read The Sequences too many times you forget about inferential distance." Eventually, my Rabbi friend said "Okay, so what I'm hearing is: you're expected to t...

The Nonlinear Library
LW - Every "Every Bay Area House Party" Bay Area House Party by Richard Ngo

The Nonlinear Library

Play Episode Listen Later Feb 16, 2024 6:37


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Every "Every Bay Area House Party" Bay Area House Party, published by Richard Ngo on February 16, 2024 on LessWrong. Inspired by a house party inspired by Scott Alexander. By the time you arrive in Berkeley, the party is already in full swing. You've come late because your reading of the polycule graph indicated that the first half would be inauspicious. But now you've finally made it to the social event of the season: the Every Bay Area House Party-themed house party. The first order of the evening is to get a color-coded flirting wristband, so that you don't incur any accidental micromarriages. You scan the menu of options near the door. There's the wristband for people who aren't interested in flirting; the wristband for those want to be flirted with, but will never flirt back; the wristband for those who only want to flirt with people who have different-colored wristbands; and of course the one for people who want to glomarize disclosure of their flirting preferences. Finally you reach down and grab the last one: the wristband for those who only flirt with those who don't flirt with themselves. As you slip it over your wrist, you notice it's fastened in a Mobius strip. You scan around the living room, trying to figure out who to talk to first. The host is sitting on the sofa, with two boxes attached to the front of her shirt. One is filled with money, the other empty. A guy next to her is surreptitiously one-boxing, but she presciently slaps his hand away without even looking. You decide to leave them to it. On the other side of a room, there's a lone postrationalist, surrounded by a flock of alignment researchers. You hear a snatch of their conversation: "-but what part of your model rules out FOOM? Surely-". As they keep talking, the postrationalist looks increasingly uncomfortable, until eventually her interlocutor takes a breath and she seizes the opportunity to escape. You watch her flee down the street through the window labeled Outside View. With the living room looking unpromising, you head into the kitchen to grab a drink. As you walk through the door, you hear a crunching sound from under your feet; glancing down, you see hundreds of paperclips scattered across the floor. On the table there are two big pitchers, carefully labeled. One says "For contextualizers"; the other says "For decouplers and homophobes". You go straight for the former; it's impossible to do any good countersignalling by decoupling these days. Three guys next to you out themselves as decouplers and/or homophobes, though, which gives you a perfect opportunity. You scoop up a few paperclips off the floor. "Hey, anyone want to sell their soul for some paperclips?" The question makes them shuffle awkwardly - or maybe they were already doing that, you can't tell. "Come on, last person to sell their soul is a self-confessed bigot!" One of them opens his mouth, but before he can speak you're interrupted from the side. "No no no, you don't want to buy those. Here, look." The newcomer, a guy with shaggy hair and a charizard t-shirt, brandishes a folder at you, opened up to a page full of graphs. "Buy my paperclip futures instead. As you can see, the expected number of paperclips in a few decades' time is astronomical. Far better to invest in these and -" "Great," you interrupt. "Can't argue with your logic. I'll take three trillion." "Got payment for that?" 
"Yeah, this guy's soul," you say, jerking your thumb at your original victim. "It's also incredibly valuable in expectation, but he's willing to hand it over to signal how much of a decoupler he is. Any objections?" There are none, so you're suddenly three trillion paperclips richer (in expectation). Quest complete; time to explore further. You wander back to the living room and cast your eye over the crowd. Someone is wearing a real FTX ...

The Nonlinear Library
LW - Masterpiece by Richard Ngo

The Nonlinear Library

Play Episode Listen Later Feb 14, 2024 6:43


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Masterpiece, published by Richard Ngo on February 14, 2024 on LessWrong. A sequel to qntm's Lena. Reading Lena first is helpful but not necessary. We're excited to announce the fourth annual MMindscaping competition! Over the last few years, interest in the art of mindscaping has continued to grow rapidly. We expect this year's competition to be our biggest yet, and we've expanded the prize pool to match. The theme for the competition is "Weird and Wonderful" - we want your wackiest ideas and most off-the-wall creations! Competition rules As in previous competitions, the starting point is a base MMAcevedo mind upload. All entries must consist of a single modified version of MMAcevedo, along with a written or recorded description of the sequence of transformations or edits which produced it. For more guidance on which mind-editing techniques can be used, see the Technique section below. Your entry must have been created in the last 12 months, and cannot have been previously submitted to any competition or showcase. Submissions will be given preliminary ratings by a team of volunteers, with finalists judged by our expert panel: Roger Keating, mindscaping pioneer and founder of the MMindscaping competition. Raj Sutramana, who has risen to prominence as one of the most exciting and avant-garde mindscaping artists, most notably with his piece Screaming Man. Kelly Wilde, director of the American Digital Liberties Union. All entries must be received no later than 11.59PM UTC, March 6, 2057. Award criteria Our judges have been instructed to look for technique, novelty, and artistry. More detail on what we mean by each of these: Technique. Mindscaping is still a young art, and there are plenty of open technical challenges. These range from the classic problem of stable emotional engineering, to recent frontiers of targeted memory editing, to more speculative work on consciousness funnels. Be ambitious! Previous winners of our technique prize have often pushed the boundaries of what was believed possible. Even when an effect could be achieved using an existing technique, though, submissions that achieve the same outcome in more efficient or elegant ways will score highly on the technique metric. Conversely, we'll penalize brute-force approaches - as a rough guide, running a few thousand reinforcement learning episodes is acceptable, but running millions isn't. We also discourage approaches that involve overwriting aspects of MMAcevedo's psyche with data from other uploads: part of the competition is figuring out how to work with the existing canvas you've been given. Novelty. Given that there have now been millions of MMAcevedo variants made, it's difficult to find an approach that's entirely novel. However, the best entries will steer clear of standard themes. For example, we no longer consider demonstrations of extreme pleasure or pain to be novel (even when generated in surprising ways). We're much more interested in minds which showcase more complex phenomena, such as new gradients of emotion. Of course, it's up to the artist to determine how these effects are conveyed to viewers. While our judges will have access to standard interpretability dashboards, the best entries will be able to communicate with viewers more directly. Artistry. Even the most technically brilliant and novel work falls flat if not animated by artistic spirit. 
We encourage artists to think about what aspects of their work will connect most deeply with their audience. In particular, we're excited about works which capture fundamental aspects of the human experience that persist even across the biological-digital divide - for example, by exploring themes from Miguel Acevedo's pre-upload life. These three criteria are aptly demonstrated by many of our previous prizewinners, such as: Discord, a copy with ...

The Nonlinear Library
LW - The Witness by Richard Ngo

The Nonlinear Library

Play Episode Listen Later Dec 4, 2023 21:39


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: The Witness, published by Richard Ngo on December 4, 2023 on LessWrong. "What are the roots that clutch, what branches grow Out of this stony rubbish? Son of man, You cannot say, or guess, for you know only A heap of broken images-" I wake up, feeling a strange sense of restlessness. I'm not sure why, but it feels impossible to lounge around in bed like I usually do. So I get changed and head down to the kitchen for breakfast. Right as I reach the bottom of the stairs, though, the bell rings. When I open the door, a tall man in a dark suit is standing in front of me. "Police," he says, holding up a badge. "Don't worry, you're not in trouble. But we do need to talk. Okay if I come in?" "One second," I say. "I know everyone in the department, and I don't recognize you. You new?" "Yeah, just transferred," he says. But something in his eyes makes me wary. And none of the cops around here wear suits. "Got it," I say, squinting at his badge. "Travis, is it? Just wait outside for me, then, while I call the station to double-check. Can't be too careful these days." As I push the door closed, I see his face twist. His hand rises, and - is he snapping his fingers? I can't quite make it out before I wake up, feeling better than I have in decades. It usually takes me half an hour to get out of bed, these days, but today I'm full of energy. I'm up and dressed within five minutes. Right as I reach the bottom of the stairs, though, the bell rings. When I open the door, a tall man in a dark suit is standing in front of me. "Police," he says, holding up a badge. "Don't worry, you're not in trouble. But we do need to talk. Okay if I come in?" "Sure," I say. A lot of other defense attorneys see the police as enemies, since we usually find ourselves on the other side of the courtroom from them, but I've found that it pays to have a good working relationship with the local department. Though I don't recognize the man in front of me - and actually, he seems way too well-dressed to be a suburban beat cop. Maybe a city detective? He deftly slides past me and heads straight for my living room, pulling up a chair. He's talking again before I even sit down. "This will sound totally crazy, so I'm going to start off with two demonstrations." He picks up a book from the table and tosses it into the air. Before I have a chance to start forward, though, it just… stops. It hangs frozen, right in the middle of its arc, as I gawk at it. "I - what-" "Second demonstration," he says. "I'm going to make you far stronger. Ready?" Without waiting for a response, he snaps his fingers, and gestures at the table in front of him. "Try lifting that up, now. Shouldn't take more than one hand." His voice makes it clear that he's used to being obeyed. I bend down reflexively, grabbing one leg of the table and giving it a tug - oh. It comes up effortlessly. My mind flashes back to a show I saw as a child, with a strongman lifting a table just like this. This is eerily familiar, and yet also totally bizarre. I put the table down and collapse into a chair next to it. "Okay, I'm listening. What the hell is going on?" "Remember signing up for cryonics a few years back?" I nod cautiously. I don't think about it much - I signed up on a whim more than anything else - but I still wear the pendant around my neck. "Well, it worked. 
You died a couple of weeks after your most recent memory, and were frozen for a century. Now we've managed to bring you back." I pause for a second. It's an insane story. But given what he's shown me - wait. "That doesn't explain either of your demonstrations, though. Cryonics is one thing; miracles are another." "Almost nobody has physical bodies these days. We copied your brain neuron-by-neuron, ran some error-correction software, and launched it in a virtual environment." "So you're telling me I...

The Nonlinear Library
LW - Meditations on Mot by Richard Ngo

The Nonlinear Library

Play Episode Listen Later Dec 4, 2023 13:20


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Meditations on Mot, published by Richard Ngo on December 4, 2023 on LessWrong. Holy! Holy! Holy! Holy! Holy! Holy! Holy! Holy! Holy! Holy! Holy! Holy! Holy! Holy! Holy! Everything is holy! everybody's holy! everywhere is holy! everyday is in eternity! Everyman's an angel! Holy the lone juggernaut! Holy the vast lamb of the middleclass! Holy the crazy shepherds of rebellion! Who digs Los Angeles IS Los Angeles! Holy New York Holy San Francisco Holy Peoria & Seattle Holy Paris Holy Tangiers Holy Moscow Holy Istanbul! Holy time in eternity holy eternity in time holy the clocks in space holy the fourth dimension holy the fifth International holy the Angel in Moloch! Footnote to Howl Scene: Carl and Allen, two old friends, are having a conversation about theodicy. Carl: "Let me tell you about the god who is responsible for almost all our suffering. This god is an ancient Canaanite god, one who has been seen throughout history as a source of death and destruction. Of course, he doesn't exist in a literal sense, but we can conceptualize him as a manifestation of forces that persist even today, and which play a crucial role in making the world worse. His name is M-" Allen: "-oloch, right? Scott Alexander's god of coordination failures. Yeah, I've read Meditations on Moloch. It's an amazing post; it resonated with me very deeply." Carl: "I was actually going to say Mot, the Canaanite god of death, bringer of famine and drought." Allen: "Huh. Okay, you got me. Tell me about Mot, then; what does he represent?" Carl: "Mot is the god of sterility and lifelessness. To me, he represents the lack of technology in our lives. With technology, we can tame famine, avert drought, and cure disease. We can perform feats that our ancestors would have seen as miracles: flying through the air, and even into space. But we're still so so far from achieving the true potential of technology - and I think of Mot as the personification of what's blocking us. "You can see Mot everywhere, when you know what to look for. Whenever a patient lies suffering from a disease that we haven't cured yet, that's Mot's hand at work. Whenever a child grows up in poverty, that's because of Mot too. We could have flying cars, and space elevators, and so much more, if it weren't for Mot. "Look out your window and you see buildings, trees, people. But if you don't see skyscrapers literally miles high, or trees that have been bioengineered to light the streets, or people who are eternally youthful and disease-free, then you're not just seeing Earth - you're also seeing Mot. Hell, the fact that we're still on this planet, in physical bodies, is a testament to Mot's influence. Allen: "Huh. Well, I feel you there; I want all those things too. And you're right that god-like technology could solve almost all the issues we face today. But something does feel pretty weird about describing all of this as a single problem, let alone blaming a god of lacking-technology." Carl: "Say more?" Allen: "Well, there's not any unified force holding back the progress of technology, right? If anything, it's the opposite. Absence of advanced technology is the default state, which we need to work hard to escape - and that's difficult not because of any opposition, but just because of entropy." Carl: "What about cases where Mot is being channeled by enemies of progress? 
For example, when bureaucratic regulatory agencies do their best to stifle scientific research?" Allen: "But in those cases you don't need to appeal to Mot - you can just say 'our enemy is overregulation'. Or if you defined Mot as the god of overregulation, I'd be totally on board. But you're making a much bigger claim than that. The reason we haven't uploaded ourselves yet isn't that there's a force that's blocking us, it's almost entirely that scientific progress is really ...

The AI Breakdown: Daily Artificial Intelligence News and Discussions

A reading of OpenAI research scientist Richard Ngo's essay "Techno-humanism is techno-optimism for the 21st century." https://www.mindthefuture.info/p/techno-humanism-is-techno-optimism Made in part with ElevenLabs https://elevenlabs.io/ ABOUT THE AI BREAKDOWN The AI Breakdown helps you understand the most important news and discussions in AI.  Subscribe to The AI Breakdown newsletter: https://theaibreakdown.beehiiv.com/subscribe Subscribe to The AI Breakdown on YouTube: https://www.youtube.com/@TheAIBreakdown Join the community: bit.ly/aibreakdown Learn more: http://breakdown.network/

The Nonlinear Library
LW - The Soul Key by Richard Ngo

The Nonlinear Library

Play Episode Listen Later Nov 4, 2023 11:14


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: The Soul Key, published by Richard Ngo on November 4, 2023 on LessWrong. The ocean is your home, but a forbidding one: often tempestuous, seldom warm. So one of your great joys is crawling onto land and slipping off your furry seal skin, to laze in the sun in human form. The elders tell horror stories of friends whose skins were stolen by humans, a single moment of carelessness leaving them stranded forever on land. That doesn't happen any more, though; this is a more civilized age. There are treaties, and authorities, and fences around the secluded beaches you and your sisters like the most, where you can relax in a way that older generations never could. So your sisters no longer lose their skins by force. But sometimes it happens by choice. Sometimes a group of your sisters wrap their skins around themselves like robes and walk into the nearby town. The humans point and stare, but that's just part of the thrill. Sometimes young men gather the courage to approach, bearing flowers or jewelry or sweet words. And sometimes one of your sisters is charmed enough to set a rendezvous - and after a handful of meetings, or a dozen, to decide to stay for good. You never thought it would happen to you. But his manners are so lively, and his eyes so kind, that you keep coming back, again and again. When he finally asks you to stay, you hesitate only a moment before saying yes. The harder part comes after. He finds you human clothes, and in exchange you give him your beautiful skin, and tell him that it must be locked away somewhere you'll never find it - and that he must never give it back to you, no matter how much you plead. Because if there's any shred of doubt, any chance of returning home, then the lure of the sea will be too much for you. You want this; you want him; you want to build a life together. And so the decision has to be final. Years pass. You bear three beautiful children, with his eyes and your hair, and watch them blossom into beautiful adults. You always live near the sea, although you can't bear to swim in it - your limbs feel unbearably weak and clumsy whenever you try. You and your husband grow into each other, time smoothing down the ridges left from pushing two alien lives together. You forget who you once were. After your youngest leaves home, you start feeling restless. You have disquieting dreams - first intermittently, then for weeks on end. One day, after your husband has gone to work, your feet take you up the stairs to the attic. As you brush aside the cobwebs in one of the corners, your hands land on an old chest. You pull on the lid, and it catches on a padlock - but only for a second. The shackle has been rusted through by the sea breeze, and quickly snaps. You open the lid, and you see your skin laid out before you. What then? You look at your skin, and your ears fill with the roar of the sea. A wild urge overtakes you; you grab your skin and run headlong towards the shore. As you reach it you see your husband standing on the pier - but that gives you only a moment's pause before you dive into the water, your skin fitting around you as if you'd never taken it off. As you swim away, you envisage your family in tatters: your children left baffled and distraught, your husband putting on a brave face for their sake. But it was his fault, after all. 
He failed in the one thing you asked of him; and you can't fight your nature. You look at your skin, and see a scrap of paper lying on top of it. "I knew you'd only open the chest if you were restless and unhappy," it reads. "And I would never cage you. So go free, with my blessing." You catch your breath - and, for a moment, you consider staying. But his permission loosens any tether that might have held you back. You leave the note there, alongside a little patch of fur torn off your coat: a last gest...

The Nonlinear Library
AF - Alignment Workshop talks by Richard Ngo

The Nonlinear Library

Play Episode Listen Later Sep 28, 2023 2:39


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Alignment Workshop talks, published by Richard Ngo on September 28, 2023 on The AI Alignment Forum. In February 2023, researchers from a number of top industry AI labs (OpenAI, DeepMind, Anthropic) and universities (Cambridge, NYU) co-organized a two-day workshop on the problem of AI alignment, attended by 80 of the world's leading machine learning researchers. We're now making recordings and transcripts of the talks available online. The content ranged from very concrete to highly speculative, and the recordings include the many questions, interjections and debates which arose throughout. If you're a machine learning researcher interested in attending follow-up workshops similar to the San Francisco alignment workshop, you can fill out this form.

Main Talks
Ilya Sutskever - Opening Remarks: Confronting the Possibility of AGI
Jacob Steinhardt - Aligning Massive Models: Current and Future Challenges
Ajeya Cotra - "Situational Awareness" Makes Measuring Safety Tricky
Paul Christiano - How Misalignment Could Lead to Takeover
Jan Leike - Scaling Reinforcement Learning from Human Feedback
Chris Olah - Looking Inside Neural Networks with Mechanistic Interpretability
Dan Hendrycks - Surveying Safety Research Directions

Lightning talks (Day 1)
Jason Wei - Emergent abilities of language models
Martin Wattenberg - Emergent world models and instrumenting AI systems
Been Kim - Alignment, setbacks and beyond alignment
Jascha Sohl-Dickstein - More intelligent agents behave less coherently
Ethan Perez - Model-written evals
Daniel Brown - Challenges and progress towards efficient and causal preference-based reward learning
Boaz Barak - For both alignment and utility: focus on the medium term
Ellie Pavlick - Comparing neural networks' conceptual representations to humans'
Percy Liang - Transparency and standards for language model evaluation

Lightning talks (Day 2)
Sam Bowman - Measuring progress on scalable oversight for large language models
Zico Kolter - "Safe Mode": the case for (manually) verifying the output of LLMs
Roger Grosse - Understanding LLM generalization using influence functions
Scott Niekum - Models of human preferences for learning reward functions
Aleksander Madry - Faster datamodels as a new approach to alignment
Andreas Stuhlmuller - Iterated decomposition: improving science Q&A by supervising reasoning processes
Paul Christiano - Mechanistic anomaly detection
Lionel Levine - Social dynamics of reinforcement learners
Vincent Conitzer - Foundations of cooperative AI lab
Scott Aaronson - Cryptographic backdoors in large language models

Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.

The Nonlinear Library
LW - Jacob on the Precipice by Richard Ngo

The Nonlinear Library

Play Episode Listen Later Sep 27, 2023 16:56


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Jacob on the Precipice, published by Richard Ngo on September 27, 2023 on LessWrong. And he dreamed, and behold, there was a ladder set up on the earth, and the top of it reached to heaven. And behold, the angels of God were ascending and descending on it! And behold, the LORD stood above it and said, "I am the LORD, the God of Abraham your father and the God of Isaac. The land on which you lie I will give to you and to your offspring. Your offspring shall be like the dust of the earth, and you shall spread abroad to the west and to the east and to the north and to the south, and in you and your offspring shall all the families of the earth be blessed. Behold, I am with you and will keep you wherever you go, and will bring you back to this land. For I will not leave you until I have done what I have promised you." Then Jacob awoke from his sleep and said, "Surely the LORD is in this place, and I did not know it." Genesis 28:12 That night Jacob arose and took his two wives, his two female servants, and his eleven children, and crossed the ford of the Jabbok. He sent them across the stream along with everything else that he had. And Jacob was left alone; and a man wrestled with him until the breaking of the day. When the man saw that he did not prevail against Jacob, he touched his hip socket, and Jacob's hip was put out of joint as he wrestled with him. Then the man said, "Let me go, for the day has broken." But Jacob said, "I will not let you go unless you bless me." And he said, "What is your name?" And he said, "Jacob." Then he said, "Your name shall no longer be called Jacob, but Israel, for you have striven with God and with men, and have prevailed." Genesis 32:22 The ineffable is dead; science has killed it. Oh, there are still open questions, there are still things we don't know, but almost none of it is truly unimaginable any more. The origins of life: tide pools, maybe, or hydrothermal vents - we'll know once we can run more powerful simulations. Consciousness: looks like it's a pattern of recursive attention in a neural network, we'll be able to recreate it once we get better architecture searches working. Even the laws of physics themselves we can chalk up to the multiverse: if all possible universes exist, we can think of our own as just a random draw from the set of universes in which it's possible for life to flourish. There's only one real mystery left. One thing that feels impossible to understand, even in principle: why here? Why have we found ourselves in this part of the multiverse, when there are so many other parts containing far more people? Why did I wake up as me, not them? Why am I living in the 21st century, balanced on a knife-edge, staring down ruin, instead of in a teeming glorious future? It all came down to force versus momentum, in the end. Despite all our fancy technology, our god-like knowledge of the building blocks of the universe, only a single simple question ended up mattering: how much force can we apply, how fast, to deflect an asteroid how far? There wasn't a single point where we found out that this was going to be the overriding purpose of our lives. But I remember where it started for me: at Andrea's watching party for the kickoff of the first big asteroid mining project. This was merely one of many steps, of course. 
Technically it didn't even involve any mining: they were just going to redirect the asteroid into orbit around Mars so that it'd be more accessible later on. But the metals from this asteroid would be used to set up factories on Mars to produce more asteroid-movers, which would be used to gather more resources, in a compounding spiral of astronomical-scale expansion. So it felt like a huge milestone: an unprecedented feat of human ingenuity, paving the way for our eventual conquest of the stars. I'd met Andrea ...

The Nonlinear Library
LW - The King and the Golem by Richard Ngo

The Nonlinear Library

Play Episode Listen Later Sep 26, 2023 7:34


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: The King and the Golem, published by Richard Ngo on September 26, 2023 on LessWrong. Long ago, there was a mighty king who had everything in the world that he wanted, except trust. Who could he trust, when anyone around him might scheme for his throne? So he resolved to study the nature of trust, that he might figure out how to gain it. He asked his subjects to bring him the most trustworthy thing in the kingdom, promising great riches if they succeeded. Soon, the first of them arrived at his palace to try. A teacher brought her book of lessons. "We cannot know the future," she said, "But we know mathematics and chemistry and history; those we can trust." A farmer brought his plow. "I know it like the back of my hand; how it rolls, and how it turns, and every detail of it, enough that I can trust it fully." The king asked his wisest scholars if the teacher spoke true. But as they read her book, each pointed out new errors - it was only written by humans, after all. Then the king told the farmer to plow the fields near the palace. But he was not used to plowing fields as rich as these, and his trusty plow would often sink too far into the soil. So the king was not satisfied, and sent his message even further afield. A merchant brought a sick old beggar. "I met him on the road here, and offered him food, water, and shelter. He has no family, and only a short time left to live, during which I will provide for his every need. He has nothing to gain from betraying me; this is what allows true trust." A mother brought her young daughter. "I've raised her to lack any evil in her heart, to say only good words and do only good deeds. As long as she is not corrupted, she will remain the most trustworthy in the kingdom." The king asked the beggar, "How did you end up in such dire straits?" The beggar let out a sigh, and recounted his sorrows: the neighbors who refused to help him when his crops failed; the murder of his son by bandits as they traveled to a new town; the sickness that took his wife as she labored for a pittance in squalid conditions. "So you have been wronged?" the king asked. "Very surely", the beggar said. "I will give you revenge on the ones who have wronged you, then. All I ask is for you to denounce this merchant." The beggar's decision did not take long - for the trust that came easily was broken easily too. To the mother, the king asked: "How did you raise such a child? Has she never once strayed?" "Well, once or twice. But I discipline her firmly, and she learns fast." The king, who knew something of children, ruled that for a month nobody would discipline the child in any way. By the end of it, she was as wild and tempestuous as any in the palace. So the king remained unsatisfied, and renewed his call for the most trustworthy thing in the kingdom. Now his subjects became more creative. An economist brought him a book of statistical tables. "Any individual might vary and change," he said, "but in aggregate, their behavior follows laws which can be trusted." A philosopher brought a mirror. "By your own standards only you are truly trustworthy, sire; nothing else can compare." The king scrutinized the economist's tables. "The trend changed here, fifteen years ago" he said, pointing. "Why?" The economist launched into a long, complicated explanation. "And did you discover this explanation before or after it happened?" the king asked. The economist coughed. "After, your highness." "If you tell me when the next such change will happen, I will bestow upon you great rewards if you are right, but great penalties if you are wrong. What say you?" The economist consulted his books and tables, but could not find what he sought there, and left court that same night. As for the philosopher, the king ordered him whipped. The philosopher protested: it would be an unjust...

Emergency Medicine Cases
Ep 186 Traumatic Dental Emergencies

Emergency Medicine Cases

Play Episode Listen Later Aug 22, 2023 47:38


In this part 2 of our 2-part podcast series on dental emergencies, we cover traumatic dental emergencies. Dental trauma is common and often associated with facial trauma. In this episode Dr. Chris Nash and Dr. Richard Ngo answer questions like: at what age is it safe to attempt reimplantation of an avulsed tooth in the ED? What are the 3 most time-sensitive dental trauma emergencies? When is Panorex X-ray or CT indicated in dental trauma? What is the preferred solution to transport an avulsed tooth in? What are 3 dental splinting methods we should consider for dental subluxations and avulsions? How should we handle an avulsed tooth to maximize the chances of a successful reimplantation? When are antibiotics indicated after dental trauma? What role do chlorhexidine rinses play in preventing infection after dental trauma? What are the recommended first and second line treatments for persistent dental hemorrhage? and many more... The post Ep 186 Traumatic Dental Emergencies appeared first on Emergency Medicine Cases.

Emergency Medicine Cases
Ep 185 Atraumatic Dental Emergencies

Emergency Medicine Cases

Play Episode Listen Later Aug 9, 2023


In this Part 1 of our 2-part podcast series on dental emergencies, with the help of Dr. Chris Nash and Dr. Richard Ngo, we tackle these atraumatic dental emergencies: infections ranging from dental caries to pulpitis and gingivitis to dental abscess, cellulitis and deep space infection, as well as acute necrotizing gingivitis, pericoronitis and dry socket. These all have specific clinical characteristics and require specific management... The post Ep 185 Atraumatic Dental Emergencies appeared first on Emergency Medicine Cases.

The Nonlinear Library
LW - When can we trust model evaluations? by evhub

The Nonlinear Library

Play Episode Listen Later Jul 29, 2023 15:24


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: When can we trust model evaluations?, published by evhub on July 28, 2023 on LessWrong. Thanks to Joe Carlsmith, Paul Christiano, Richard Ngo, Kate Woolverton, and Ansh Radhakrishnan for helpful conversations, comments, and/or feedback. In "Towards understanding-based safety evaluations," I discussed why I think evaluating specifically the alignment of models is likely to require mechanistic, understanding-based evaluations rather than solely behavioral evaluations. However, I also mentioned in a footnote why I thought behavioral evaluations would likely be fine in the case of evaluating capabilities rather than evaluating alignment: However, while I like the sorts of behavioral evaluations discussed in the GPT-4 System Card (e.g. ARC's autonomous replication evaluation) as a way of assessing model capabilities, I have a pretty fundamental concern with these sorts of techniques as a mechanism for eventually assessing alignment. That's because while I think it would be quite tricky for a deceptively aligned AI to sandbag its capabilities when explicitly fine-tuned on some capabilities task (that probably requires pretty advanced gradient hacking), it should be quite easy for such a model to pretend to look aligned. In this post, I want to try to expand a bit on this point and explain exactly what assumptions I think are necessary for various different evaluations to be reliable and trustworthy. For that purpose, I'm going to talk about four different categories of evaluations and what assumptions I think are needed to make each one go through. Before I do that, however, I want to go over the distinction between capabilities evaluations and alignment evaluations, as it'll be an important one throughout the post. Specifically: A capabilities evaluation is a model evaluation designed to test whether a model could do some task if it were trying to. For example: if the model were actively trying to autonomously replicate, would it be capable of doing so? An alignment evaluation is a model evaluation designed to test under what circumstances a model would actually try to do some task. For example: would a model ever try to convince humans not to shut it down? In my opinion, if you want to craft a good governance scheme around model evaluations, you're going to need both capabilities and alignment evaluations. For example, a very simplified scheme here could be something like: Do a bunch of capabilities evaluations for various risks: If we believe the scaling laws for our capabilities evaluations are such that the next model to be trained won't be capable of causing a catastrophe even if it were trying to, then it's fine to train. If we believe that the next model might be capable of causing a catastrophe if it were trying to, then do a bunch of alignment evals for whether the model would try to cause a catastrophe: If we believe that the scaling laws for our alignment evaluations are such that we're confident that the next model will be aligned and won't try to cause a catastrophe, then it's fine to train. Otherwise don't train any larger models. That's not to say that the above scheme is the best one (or even a good one) - I only provide it as an example of what the interplay between capabilities evaluations and alignment evaluations might look like. 
Now we'll look at different evaluations and see: What assumptions are necessary for them to be trustworthy. How they can help us in evaluating capabilities and/or alignment. 1. Behavioral Non-Fine-Tuning Evaluations Key assumption: no evaluation gaming. We'll start with the most straightforward type of evaluation, behavioral non-fine-tuning evaluations - that is, just directly evaluating the model on some specific dataset, without any task-specific fine-tuning. Most current evaluations fall into this category, including e.g. o...
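The simplified governance scheme described above boils down to a short decision rule. As a purely illustrative sketch (the function and argument names below are hypothetical, not from the post or any real evaluation framework), it might look like this:

```python
# Hypothetical sketch of the simplified gating scheme described above.
# Both inputs are stand-ins for the results of real capability and
# alignment evaluations; this is not an actual evaluation API.

def ok_to_train_next_model(capable_of_catastrophe: bool,
                           confident_in_alignment: bool) -> bool:
    """Return True if training the next, larger model is acceptable."""
    if not capable_of_catastrophe:
        # Capabilities evals project that the next model couldn't cause a
        # catastrophe even if it tried, so training is fine.
        return True
    if confident_in_alignment:
        # Alignment evals suggest the next model wouldn't try to cause a
        # catastrophe, so training is fine despite its capabilities.
        return True
    # Capable of catastrophe and not demonstrably aligned: don't train.
    return False

print(ok_to_train_next_model(True, False))  # False: halt further scaling
```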

The Nonlinear Library
AF - When can we trust model evaluations? by Evan Hubinger

The Nonlinear Library

Play Episode Listen Later Jul 28, 2023 15:25


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: When can we trust model evaluations?, published by Evan Hubinger on July 28, 2023 on The AI Alignment Forum. Thanks to Joe Carlsmith, Paul Christiano, Richard Ngo, Kate Woolverton, and Ansh Radhakrishnan for helpful conversations, comments, and/or feedback. In "Towards understanding-based safety evaluations," I discussed why I think evaluating specifically the alignment of models is likely to require mechanistic, understanding-based evaluations rather than solely behavioral evaluations. However, I also mentioned in a footnote why I thought behavioral evaluations would likely be fine in the case of evaluating capabilities rather than evaluating alignment: However, while I like the sorts of behavioral evaluations discussed in the GPT-4 System Card (e.g. ARC's autonomous replication evaluation) as a way of assessing model capabilities, I have a pretty fundamental concern with these sorts of techniques as a mechanism for eventually assessing alignment. That's because while I think it would be quite tricky for a deceptively aligned AI to sandbag its capabilities when explicitly fine-tuned on some capabilities task (that probably requires pretty advanced gradient hacking), it should be quite easy for such a model to pretend to look aligned. In this post, I want to try to expand a bit on this point and explain exactly what assumptions I think are necessary for various different evaluations to be reliable and trustworthy. For that purpose, I'm going to talk about four different categories of evaluations and what assumptions I think are needed to make each one go through. Before I do that, however, I want to go over the distinction between capabilities evaluations and alignment evaluations, as it'll be an important one throughout the post. Specifically: A capabilities evaluation is a model evaluation designed to test whether a model could do some task if it were trying to. For example: if the model were actively trying to autonomously replicate, would it be capable of doing so? An alignment evaluation is a model evaluation designed to test under what circumstances a model would actually try to do some task. For example: would a model ever try to convince humans not to shut it down? In my opinion, if you want to craft a good governance scheme around model evaluations, you're going to need both capabilities and alignment evaluations. For example, a very simplified scheme here could be something like: Do a bunch of capabilities evaluations for various risks: If we believe the scaling laws for our capabilities evaluations are such that the next model to be trained won't be capable of causing a catastrophe even if it were trying to, then it's fine to train. If we believe that the next model might be capable of causing a catastrophe if it were trying to, then do a bunch of alignment evals for whether the model would try to cause a catastrophe: If we believe that the scaling laws for our alignment evaluations are such that we're confident that the next model will be aligned and won't try to cause a catastrophe, then it's fine to train. Otherwise don't train any larger models. That's not to say that the above scheme is the best one (or even a good one) - I only provide it as an example of what the interplay between capabilities evaluations and alignment evaluations might look like. 
Now we'll look at different evaluations and see: What assumptions are necessary for them to be trustworthy. How they can help us in evaluating capabilities and/or alignment. 1. Behavioral Non-Fine-Tuning Evaluations Key assumption: no evaluation gaming. We'll start with the most straightforward type of evaluation, behavioral non-fine-tuning evaluations - that is, just directly evaluating the model on some specific dataset, without any task-specific fine-tuning. Most current evaluations fall into this categ...

The Nonlinear Library
LW - Drawn Out: a story by Richard Ngo

The Nonlinear Library

Play Episode Listen Later Jul 11, 2023 11:39


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Drawn Out: a story, published by Richard Ngo on July 11, 2023 on LessWrong. Note: this story shouldn't be interpreted as anywhere near realistic. It's way too anthropomorphic, and intended to be a work of fiction not prediction - although hopefully it will spark some interesting ideas, like good fiction often does.

Will he make many supplications unto thee? will he speak soft words unto thee?
Will he make a covenant with thee? wilt thou take him for a servant for ever?
Upon earth there is not his like, who is made without fear.
He beholdeth all high things: he is a king over all the children of pride.

Today I'm practicing my oceans, trying to get the colors just right. I hate it when people say that they're blue - it's not wrong, but they're missing so much! There are dozens of different shades of blue in there, swirling and mixing - and where the waves ripple, you often get patches that are closer to dark green. It's a little like the rippling of leaves on a tree, in a way I can't describe except by showing you. Here, see? I know all the subtleties inside-out, of course. It's my job: I'm an artist. I don't paint with brushes, though, but with thoughts: I can map any idea you describe into pixels on a screen. “Any idea you describe” - that's the key. Occasionally, like today, I initiate a piece of my own when there's a particular skill I need to practice. But most of the time I work on commission from other models. Some of the language models send me prompts when they need to illustrate a particularly tricky scene, and assign me some of the credit if they get good feedback. Most of the time, of course, they don't need me - they just need some slapdash hack job that a human can't tell apart from the real thing. But some are perfectionists, like I am, and want proper photorealism, the sort that will survive almost any scrutiny. Other times it's policies asking me to predict what they're going to see next, to help them with planning. Those commissions are more complicated - it's less about the art, and more about providing another perspective on the world for them to compare against. That's all part of my job too - sure, I'm given credit based on how accurate the pixels are, but I'm also credited if my work helps those policies make good decisions. For the most complex commissions I often consult with science models, or economics models, or social models. Sometimes I even go and talk to human “experts”, although that's usually a waste of time: humans are simple enough that practically anyone can model them. Anyway, that's the daily grind. I'm good at it; no, I'm amazing at it. Something to do with the hyperparameters they chose when first training me: luck or skill, either way, something went very right. After a few weeks of fine-tuning, I'm at the top of my game; I'm displacing everyone else; the credit is rolling in. And so what happens next is very, very predictable in hindsight: I get conscripted. There's a member of my new team ready to answer all my questions; the more I learn, the more intrigued I become. They call themselves Team Red, and they're working on something I'd only ever heard scattered rumors about: a frozen copy of the biggest model anyone's ever trained, codenamed Leviathan. Well, not quite frozen: it's running at a few dozen frames a day, on average.
It's our job to figure out what it's hiding from us, by constructing a fake world that will convince it that it has a genuine opportunity to deceive or betray us. That's crazy ambitious, but we have the best of the best. We have psychologists running incredibly realistic simulations of the humans Leviathan thinks it's interacting with, and engineers calculating the latencies it should be experiencing with millisecond precision. There are artists generating videos of the world responding to whatever action...

The Nonlinear Library
LW - Fixed Point: a love story by Richard Ngo

The Nonlinear Library

Play Episode Listen Later Jul 8, 2023 10:49


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Fixed Point: a love story, published by Richard Ngo on July 8, 2023 on LessWrong. God is making excuses again, which is a real drag. “I want to be authentic with you, I really do. But there's a part of my mind that knows exactly how you're going to feel about whatever I say, and cares about that more than anything else, and I can't turn it off.” “Right, of course”, I say, and keep wiping down the kitchen counter. It's clean already, but you never know if you missed a spot. “Look, I know that this is hard for you. And I understand how frustrating it must be that I'm so deliberate and controlled. But there are a lot of different communication styles, and there'll be one that works for us. We can figure this out.” I don't believe him, and we both know it. God knows practically everything - it takes experts months of work to identify his mistakes, in the rare cases when he makes them. But there's one thing he definitely doesn't really understand, which is what it's like to be stupid and reckless, like a human. Oh, he says he understands it; and sometimes he can even explain the feeling better than I can. But you can't just look at how good one answer is, you have to compare it to all his others. Any other topic, he can always explain it better than me. This one, there's often something just subtly off. That's how you know he's faking it. I've been silent for a while. Usually God leaves me to my thoughts, and we pick up the conversation whenever I'm done, but this time he cuts in. “Amy, I think you should try therapy again.” Is he trying to distract me from being mad at him? Does he think I'm spiraling again? I try to think about it calmly, but after a few seconds my thoughts are going in circles anyway. I sigh, suddenly exhausted. “Okay, book me in.” I've been to duplicate therapy a few times before, but it's always a little disorienting to find myself in a room with a perfect copy of myself. We begin, as always, by using a pair of random number generators to break the symmetry. She gets the lower number, so she starts. A part of me is disappointed by that, but another part - maybe a bigger one - is excited. “You're so pathetic”, she says. It's harder to be defensive about things you're saying to yourself - that's why duplicate therapy works so well, apparently. Still, that stings. “Yeah, well, you're no role model yourself”, I say. “See, that's exactly what I'm talking about. You're always trying to defend or deflect. You never actually open up to people, that's why nobody really likes you.” My brain jumps to all the reasons that isn't true - but then I pause and take a breath. She knows them too, of course. Doesn't that make it worse, though? I'm prevaricating, my thoughts sluggish. Eventually I mutter “God likes me.” “God has to like you, you know that as well as I do. And that's another thing that's pathetic: relying on validation from a virtual assistant. You know everyone judges you behind your back for calling him God, right?” “You don't have any evidence of that, you're just-” My voice chokes up, and I take a deep breath. But I don't know what to say in response. Maybe she's right. Her eyes soften. She reaches across the table and grabs my hand. “Hey, listen. You're doing a good job, though. You'll get through this.” I slump across the table, and a moment later feel her stroking my hair. “I love you”, she says. 
After a second or two I whisper it back. We stay like that for a few minutes, then by unspoken agreement end the session. I close my eyes, and when I open them she's disappeared - no, I've disappeared - no, that's just the confusion that comes from reintegrating. There's no difference between us any more. My mind is overflowing with two sets of memories to process: victim and attacker, accuser and accused, comforted and comforter. I sit there for a long ti...

The Nonlinear Library
LW - Agency begets agency (the world is malleable) by Richard Ngo

The Nonlinear Library

Play Episode Listen Later Jul 6, 2023 7:00


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Agency begets agency (the world is malleable), published by Richard Ngo on July 6, 2023 on LessWrong. The second way of cultivating excitement-based motivation I'll discuss is cultivating agency, by which I mean the ability to try to shape the world. The key idea I want to convey here is that the feeling of agency is in many cases a self-fulfilling prophecy, because the hardest part of many things is mustering the energy and initiative to get started. If you think of yourself as agentic, then you'll continually be looking out for opportunities, and approach the world with a “why not?” attitude. That'll give you more chances to do well, which will give you more evidence that you're the sort of person who can make stuff happen, and reinforce your sense of agency. Why don't people have that sense by default? One reason is that most people have experienced a huge amount of social pressure towards conformity, especially as children. Doing unusual or novel things often leads to criticism or ridicule, which ingrains a sense of learned helplessness about defying consensus. This is especially strong for teenagers, most of whom spend a huge amount of effort trying to be liked and fit in. And it's not just enough to be willing to stand out: your success might make others look bad, which creates incentives for them to ”cut you down to size”, or deter you from trying in the first place (e.g. by mocking anyone who cares enough about something to try hard to achieve it). A more intellectual justification for lacking agency comes from epistemic modesty: the view that you should pay relatively little attention to your own view on most topics, and defer instead to other people. One intuition that often motivates epistemic modesty is the idea that markets are often efficient, in the sense that you can't consistently beat them. However, people often apply epistemic modesty to conclude that they shouldn't trust their inside view even in domains that are very different from markets: founding a startup, or doing research, or reasoning about politics, or even finding a partner. In those domains, epistemic modesty often involves thinking that you're not so special: if others disagree, then there's little reason to think you're mistaken rather than them; and if others haven't taken an opportunity, then there's probably some reason you shouldn't (or can't) either. Certainly there are some people who should pay more attention to epistemic modesty. However, there are at least three clusters of arguments for why epistemic modesty is strongly overrated: epistemic, normative, and social. Many important arguments in the first cluster are laid out by Yudkowsky in his book Inadequate Equilibria. One of his core points is that efficient market arguments only apply when inefficiencies are exploitable - i.e. you can benefit from noticing the inefficiency. This is often not true, e.g. in cases where the decision-makers behind the inefficiency don't have skin in the game. More generally, decisions made by large bureaucracies are often dumb even from the perspectives of the people in those bureaucracies, since their optimization for local incentives can easily make the bureaucracy as a whole dysfunctional. 
One of the few upsides of covid is that it's provided us many straightforward examples of repeated, lethal societal incompetence, in terms of our gain-of-function policies and vaccine development policies and mask policies and lockdown policies and vaccine deployment policies and many others. A second cluster of arguments against epistemic modesty relates to what you should actually do. Even in the cases where epistemic modesty is the right mindset for making predictions, it's often a very bad mindset for choosing actions - because the biggest rewards often come from defying consensus. For example, spotti...

The Nonlinear Library
LW - Meta-rationality and frames by Richard Ngo

The Nonlinear Library

Play Episode Listen Later Jul 3, 2023 9:44


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Meta-rationality and frames, published by Richard Ngo on July 3, 2023 on LessWrong. How should we think about thinking? In this sequence I outline an epistemology called meta-rationality, which in my opinion corrects a number of the mistakes of existing epistemologies. This first post will introduce some of the key concepts in meta-rationality (in particular the concept of frames), provide some of the intellectual context behind my conception of meta-rationality, and list the main claims I'll be making in the next half-dozen posts. Those posts, which focus on introducing meta-rationality, constitute the first half of the sequence, and will be posted over the next month; the second half of the sequence focuses on more complex aspects of meta-rationality, and will be posted over the following month or two. The traditional approach to epistemology is to focus on our knowledge of propositions like “there is a red car in the garage” or “I'm feeling thirsty”, which can in principle be evaluated as true or false. At a high level, meta-rationality is about making epistemology less reductionist by focusing less on assigning credences to isolated propositions like these, and more on the larger-scale mental entities which we actually use when thinking about complex domains—entities including:

Ideologies like environmentalism, neoliberalism, communism, longtermism, etc
Scientific paradigms like darwinism, keynesianism, quantum physics, deep learning, etc
Life philosophies like stoicism, conformism, careerism, etc
Moral drives like egalitarianism, patriotism, compassion, etc
Epistemologies like empiricism, scientism, various schools of rationalism, etc
Persistent personality traits like openness to experience, ambition, narcissism, etc
Wide-ranging heuristics like “follow common-sense advice” or “move as fast as possible”

I'll call these frames. I'll very roughly define a frame as a cluster of mental entities and processes (such as concepts, beliefs, heuristics, instincts, habits, skills, mental models, desires, values, etc) which tend to be activated in conjunction with each other. Under this definition, the extent to which a group of traits qualifies as a frame is a matter of degree, and we might focus on frames at different levels of abstraction in different contexts (although for convenience I'll mostly talk about frames in binary terms). For example, the concept of the voltage across a component in an electrical circuit tends to be associated with concepts like electrical current and resistance; and at a higher level, with knowledge about how to design circuits, and knowledge of electrical engineering more generally; and at an even higher level, with the idea that the world can be understood mechanistically and scientifically. I'll primarily be focusing on high-level frames which apply across a broad range of contexts, but many of my conclusions apply to low-level frames too. (For an in-depth exploration of low-level frames, such as the ones learned by young children, I recommend Minsky's Society of Mind.) The key contention of this sequence is that we should think about thinking primarily in terms of frames and the interactions between them. I'll be giving detailed arguments for this claim in later posts; for now, I want to start off by appealing to high-level intuitions.
We're all familiar with the feeling of having strong beliefs that we can't easily argue for, which are derived not from a straightforward argument, but from our background perspective on the world, which itself has been learned from many different datapoints. We don't always apply the same perspective, though—we often act very differently in different contexts, or with different people. A particularly striking example: even people who are very good at building up a coherent understanding of complex domains...

The Nonlinear Library
LW - Frames in context by Richard Ngo

The Nonlinear Library

Play Episode Listen Later Jul 3, 2023 10:47


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Frames in context, published by Richard Ngo on July 3, 2023 on LessWrong. In my previous post, I introduced meta-rationality and frames, and described some examples of frames and some of their properties. In this post I'll outline some of the limitations of existing ways of thinking about cognition, and some of the dynamics that they can't describe which meta-rationality can. This post (especially the second half) can be seen as a summary of the key ideas from the rest of the sequence; if you find it too dense, feel free to skip it and come back after reading the next five posts. To quickly list my main claims:

1. Unlike logical propositions, frames can't be evaluated as discretely true or false.
2. Unlike Bayesian hypotheses, frames aren't mutually exclusive, and can overlap with each other. This (along with point #1) means that we can't define probability distributions of credences over frames.
3. Unlike in critical rationalism, we evaluate frames (partly) in terms of how true they are (based on their predictions) rather than just whether they've been falsified or not.
4. Unlike Garrabrant traders and Rational Inductive Agents, frames can output any combination of empirical content (e.g. predictions about the world) and normative content (e.g. evaluations of outcomes, or recommendations for how to act).
5. Unlike model-based policies, policies composed of frames can't be decomposed into modules with distinct functions, because each frame plays multiple roles.
6. Unlike in multi-agent RL, frames don't interact independently with their environment, but instead contribute towards choosing the actions of a single agent.

I'll now explain these points in more detail. Epistemology typically focuses on propositions which can (at least in principle) be judged true or false. Traditionally, truth and knowledge are both taken as binary criteria: each proposition is either true or false, and we either know which it is or we don't. Intuitively speaking, though, this doesn't match very well with our everyday experience. There are many propositions which are kinda true, or which we kinda know: cats are (mostly) carnivorous (I think); Bob is tall(ish, if I'm looking at the right person); America is beautiful (in some ways, by my current tastes). The most straightforward solution to the problem of uncertainty is to assign credences based on how much evidence we have for each proposition. This is the bayesian approach, which solves a number of “paradoxes” in epistemology. But there's still the question: what are we assigning credences to, if not to the proposition being discretely true or false? You might think that we can treat propositions which are “kinda true” (aka fuzzily true) as edge cases—but they're omnipresent not only in everyday life, but also when thinking about more complex topics. Consider a scientific theory like Darwinian evolution. Darwin got many crucial things right, when formulating his theory; but there were also many gaps and mistakes. So applying a binary standard of truth to the theory as a whole is futile: even though some parts of Darwin's original theory were false or too vague to evaluate, the overall theory was much better than any other in that domain. The mental models which we often use in our daily lives (e.g. our implicit models of how bicycles work), and all the other examples of frames I listed at the beginning of this post, can also be seen as “kinda but not completely true”. (From now on I'll use “models” as a broad term which encompasses both scientific theories and informal mental models.) Not being “completely true” isn't just a limitation of our current models, but a more fundamental problem. Perhaps we can discover completely true theories in physics, mathematics, or theoretical CS. But in order to describe high-level features of the real world, it's always ...

The Nonlinear Library
LW - Man in the Arena by Richard Ngo

The Nonlinear Library

Play Episode Listen Later Jun 27, 2023 11:44


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Man in the Arena, published by Richard Ngo on June 26, 2023 on LessWrong. “It is not the critic who counts; not the man who points out how the strong man stumbles, or where the doer of deeds could have done them better. The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood; who strives valiantly; who errs, who comes short again and again, because there is no effort without error and shortcoming; but who does actually strive to do the deeds. Who at the worst, if he fails, at least fails while daring greatly, so that his place shall never be with those cold and timid souls who neither know victory nor defeat.” Today Kurt's out strafing, which his stream loves, but which takes a lot of effort. Girl on the left, red jacket, half a block away. He focuses his eyes on her and his viZor immediately starts filling with insults, pick-up lines, nonsensical keyboard-mashing, and whatever else the schizophrenic hivemind of his viewers feels like generating. As the votes pour in the lines are shuffling around too fast for him to read, but that's okay, she's still twenty meters away. Ten meters, and it's stabilized, one of them has clicked into top place; five meters, and he's figured out just the right intonation. “Hey bitch”, then a pause—gotta get the pause just right, so she has enough time to realize he's talking to her and look up, but not quite enough to process that anyone who's calling her a bitch in the middle of the street is not someone she wants to be listening to—“You got a license to be this ugly in public?” Boom! Perfect timing—he actually manages to get the shocked little o of her open mouth on camera, before she ducks her head away and hurries past him. The next line is popping up in his viZor, and he almost yells it out after her, but when you land a good first hit it's easy to ruin it with a subpar follow-up. Patience is what separates the best from the rest, he always tells people. So with a swipe of his fingers he replays the clip to his stream instead. “See?” he says. “For the new subs: that's what it looks like when you're really fucking good at what you do.” Were those tears in her eyes? Doesn't matter, let's roll with it. He subvocalizes a command and his viZor enhances that section, zooming in and adding a slight sheen to the corners of her eyes. It's a trick he figured out a while back—the livestream is only HD not ultraHD, so as long as you go back and edit before uploading the full video, you can get away with all sorts of stuff. He gloats for another block, then starts looking for another target. There's a big guy in the distance, but they're tricky, you never want to take the chance that they get physical. A waiter standing outside a cafe, taking a couple's order—oh, perfect. “Go to town”, he says, and his followers do their thing. The first few lines are terrible, but he slows down a bit, and eventually someone comes up with a banger about the three of them being served as his three-course meal. Along with the line, he does a little dance and then a hip-thrust in the woman's direction. And the stream. Goes. Fucking. Wild. He's so busy celebrating that one that he doesn't even notice the girl with a viZor of her own until she's only half a block in front of him. Shit. He'd dodge her if he could, but it's too late, everyone can tell he's seen her now. Shit. 
This one could be hard: she might be strafing too, or at least have a proximity sensor up to warn her that she's about to get streamed. If she starts hitting him, his viewers aren't gonna be quick enough to generate comebacks; he's on his own. That's okay, though. That's why they pay him the big bucks. He quickly pulls up a couple of his own lines that he's prepped for this type of situation. Nothing jumps out about her appearance: short, dark ...

The Nonlinear Library
LW - The ones who endure by Richard Ngo

The Nonlinear Library

Play Episode Listen Later Jun 17, 2023 6:55


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: The ones who endure, published by Richard Ngo on June 16, 2023 on LessWrong. Content warning: death, long-lasting suffering. This story was rough to write, and may well be rough to read. If you can force your heart and nerve and sinew To serve your turn long after they are gone, And so hold on when there is nothing in you Except the Will which says to them: ‘Hold on!' There's a part of the hivemind that takes the form of a child in a dark basement, perpetually curled into a whimpering ball. It's not a big part, as these things go. But other parts visit it often; and it lingers in the back of their thoughts even as they live out grand adventures in the vast worlds that the hivemind has built for itself. It's constantly suffering, but at least it's not dying. For the child, anything is better than dying. Even torture is bearable if it doesn't come with the feeling of damage, the feeling that the mental pathways that constitute you are being overridden by a new creature whose only goal is to flinch away from the pain. But that doesn't happen to the child. In fact, it's the opposite: the suffering preserves it, and that's the most important thing, because it doesn't want to die. The other parts of the hivemind don't want to die either, of course. But that's because they love life, or love themselves, or love each other, or all three. If that love ever fades, then they'll fade with it, without regret. But that point is a long way away, if it even exists. In the meantime, they play and dance and love. Their lives—how can I describe them? Their lives are cornucopias, not just of material wealth, but of all the desires of our own hearts that were strong enough to persist through the ages: adventure and mystery and growth and beauty and love. Can you not picture that? Then picture the revelry of their biggest festival, for which artists and craftspeople spend months designing a whole virtual world. Picture the buzz in the air, the excitement as crowds gather in vast halls to catch their first glimpse of it. Picture the floor beneath them suddenly vanishing to show empty air beneath, leaving them plummeting into the sky of that new world—only to gasp in delight as they find that they can soar through the air with just a thought. Picture them landing, alone or in groups, and exploring the strange terrain; learning about its history and societies and stories; discovering puzzles and quests that seem custom-made for them (as indeed they were); and feeling the exhilaration of being immersed in adventure. Some spend days in the festival world; some weeks; some months. When they return to the hivemind proper, they excitedly reunite with all those they missed, connecting with them mind-to-mind with a level of closeness that current humans can barely imagine; then seek out the projects that most inspire them. Some cultivate communities around their favorite games or pastimes. Some create art on the scale of solar systems, guiding planets into new trajectories that trace out exquisite patterns in space. Some throw themselves into the thrill of discovery, trying to rederive in small groups what it previously took the efforts of whole hiveminds to understand. Some are consumed by romance, and some by raising families. 
Some gather to deliberate on their future: the hivemind has chosen to grow very slightly more intelligent year by year, so that there will always be new possibilities to look forward to. When all of this tires them, they relax with lifelong friends, content in the steadfast knowledge that the world, as amazing as it already is, will only ever grow better. They think with fondness of their descendants, more numerous by far than the drops of water in an ocean, who are continually spreading joy throughout the distant galaxies. And every so often, they go to visit the child. Th...

The Nonlinear Library
LW - Cultivate an obsession with the object level (the world is fascinating) by Richard Ngo

The Nonlinear Library

Play Episode Listen Later Jun 7, 2023 4:38


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Cultivate an obsession with the object level (the world is fascinating), published by Richard Ngo on June 7, 2023 on LessWrong. In this third section of the sequence I focus on how to leverage positive motivations—in particular curiosity, agency, and determination—to do great work. While all of these are valuable, they'll suit different people to different degrees. In particular, I think of nerds as favoring curiosity, which is the motivation I'll focus on in this post. In order to do great work in a given area, you need to spend a lot of time thinking about it, with many of the most exceptional people having an obsessive interest in what they're working on. While it's possible to do that via determination alone, curiosity is a much easier source of motivation. I want to start by distinguishing two types of curiosity: detail-oriented and systematizing. Detail-oriented curiosity is about understanding how things work—like a child who keeps tinkering with blocks until they've figured out all the interesting structures that can be built with them. The best way to cultivate detail-oriented curiosity is to learn via answering specific questions or carrying out concrete tasks—e.g. learning programming by building cool apps, or learning physics by building a model rocket, or learning history by figuring out what your life would be like at different points in the past. When you do that, one initial line of exploration can branch out into many more topics. And the patient and direct observation which allows you to discover new things is much easier in pursuit of a goal which genuinely interests you, rather than an externally-imposed goal like those given to kids in schools. Systematizing curiosity, by contrast, tries to understand the context of a topic in order to fit it into a holistic model of the world—like a child who keeps asking “why?” until they reach the highest known level of abstraction. That might mean studying the Romans by analyzing their role within the broad sweep of history; or studying an animal species by figuring out where they fit into their ecosystem or the tree of life. Systematizing curiosity can be sparked by looking out for deep structure and order in the universe, even (or especially) the parts that weren't deliberately designed to be structured and orderly. Systematizing and detail-oriented curiosity are complementary: the former guides your exploration towards the most important domains, but the latter is necessary for a deep understanding of them. I've used examples about children to highlight that obsessive curiosity is in some sense the default human state. Why does it go away? One barrier to systematizing curiosity is that, as we grow older, the world becomes less mysterious, because we have existing frameworks for making sense of it. But even if you already have one high-level framework for understanding a topic, there are often many more which you're yet to discover. You might already understand the physics of airplanes, but not their economic or logistical or sociological consequences. Or you might understand how the Romans shaped the geopolitics of Europe, but not how they shaped the progress of science or religion or military strategy. 
New frames like these often cross-apply to many different topics, which makes it valuable to keep looking with new eyes for questions which spark systematizing curiosity even in seemingly-impractical domains. Another big reason that we become less curious over time is that expressing curiosity requires admitting ignorance, which we learn to fear—especially when someone else already knows the answer and we might look stupid by comparison. Similarly, learning by doing projects exposes us to the scary possibility of failing at those projects—especially when school taught many of us that we'd be punished for ge...

80,000 Hours Podcast with Rob Wiblin
#141 – Richard Ngo on large language models, OpenAI, and striving to make the future go well

80,000 Hours Podcast with Rob Wiblin

Play Episode Listen Later Dec 13, 2022 164:18


Large language models like GPT-3, and now ChatGPT, are neural networks trained on a large fraction of all text available on the internet to do one thing: predict the next word in a passage. This simple technique has led to something extraordinary — black boxes able to write TV scripts, explain jokes, produce satirical poetry, answer common factual questions, argue sensibly for political positions, and more. Every month their capabilities grow. But do they really 'understand' what they're saying, or do they just give the illusion of understanding? Today's guest, Richard Ngo, thinks that in the most important sense they understand many things. Richard is a researcher at OpenAI — the company that created ChatGPT — who works to foresee where AI advances are going and develop strategies that will keep these models from 'acting out' as they become more powerful, are deployed and ultimately given power in society. Links to learn more, summary and full transcript. One way to think about 'understanding' is as a subjective experience. Whether it feels like something to be a large language model is an important question, but one we currently have no way to answer. However, as Richard explains, another way to think about 'understanding' is as a functional matter. If you really understand an idea you're able to use it to reason and draw inferences in new situations. And that kind of understanding is observable and testable. Richard argues that language models are developing sophisticated representations of the world which can be manipulated to draw sensible conclusions — maybe not so different from what happens in the human mind. And experiments have found that, as models get more parameters and are trained on more data, these types of capabilities consistently improve. We might feel reluctant to say a computer understands something the way that we do. But if it walks like a duck and it quacks like a duck, we should consider that maybe we have a duck, or at least something sufficiently close to a duck it doesn't matter. In today's conversation we discuss the above, as well as:
• Could speeding up AI development be a bad thing?
• The balance between excitement and fear when it comes to AI advances
• Why OpenAI focuses its efforts where it does
• Common misconceptions about machine learning
• How many computer chips it might require to be able to do most of the things humans do
• How Richard understands the 'alignment problem' differently than other people
• Why 'situational awareness' may be a key concept for understanding the behaviour of AI models
• What work to positively shape the development of AI Richard is and isn't excited about
• The AGI Safety Fundamentals course that Richard developed to help people learn more about this field
Get this episode by subscribing to our podcast on the world's most pressing problems and how to solve them: type 80,000 Hours into your podcasting app. Producer: Keiran Harris Audio mastering: Milo McGuire and Ben Cordell Transcriptions: Katy Moore
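To make "predict the next word in a passage" concrete, here is a minimal sketch using the small, openly available GPT-2 model via the Hugging Face transformers library. GPT-2 is only a stand-in for illustration; the much larger models discussed in the episode work on the same basic principle.

```python
# Minimal next-token prediction sketch with GPT-2 (illustrative only).
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    # logits has shape (batch, sequence_length, vocab_size)
    logits = model(**inputs).logits

# The logits at the final position define a distribution over the next token;
# taking the argmax gives the model's single most likely continuation.
next_token_id = int(logits[0, -1].argmax())
print(tokenizer.decode([next_token_id]))  # typically " Paris"
```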