POPULARITY
OpenAI is being reshaped under CEO Sam Altman into a for-profit company, pushing the current non-profit structure into the background. Meanwhile, three of its most important technical leaders are stepping down. Joe van Burik talks it through in this Tech Update. OpenAI's CTO, Mira Murati, has announced her departure. She wants to free up time and space to do 'her own research', she says in a post on X. Murati's decision follows the departure of several other senior executives from OpenAI, the company best known for ChatGPT and the advanced artificial intelligence model behind it. Last month, co-founder John Schulman announced he was leaving OpenAI to move to competitor Anthropic. In May, co-founder and chief scientist Ilya Sutskever left the company. That same month Jan Leike departed; he had been chiefly responsible for the safety of OpenAI's AI models. Almost simultaneously with the news of Murati's departure, news agency Reuters reported that OpenAI's corporate structure is being overhauled: the company is to become a for-profit enterprise, the agency wrote, citing insiders. Also in this Tech Update: Meta CEO Mark Zuckerberg has unveiled new smart glasses, including the impressive-looking augmented reality glasses Orion and the Meta Quest 3S as a cheaper alternative to the Quest 3, plus various new AI innovations. Elon Musk's social media platform X has asked to be allowed to operate in Brazil again from next Monday, after the country's highest court had the platform blocked there a few weeks ago. See omnystudio.com/listener for privacy information.
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Open Problems in AIXI Agent Foundations, published by Cole Wyeth on September 13, 2024 on LessWrong. I believe that the theoretical foundations of the AIXI agent and variations are a surprisingly neglected and high-leverage approach to agent foundations research. Though discussion of AIXI is pretty ubiquitous in AI safety spaces, underscoring AIXI's usefulness as a model of superintelligence, this is usually limited to poorly justified verbal claims about its behavior, which are sometimes questionable or wrong. This includes, in my opinion, a serious exaggeration of AIXI's flaws. For instance, in a recent post I proposed a simple extension of AIXI off-policy that seems to solve the anvil problem in practice - in fact, in my opinion it has never been convincingly argued that the anvil problem would occur for an AIXI approximation. The perception that AIXI fails as an embedded agent seems to be one of the reasons it is often dismissed with a cursory link to some informal discussion. However, I think AIXI research provides a more concrete and justified model of superintelligence than most subfields of agent foundations [1]. In particular, a Bayesian superintelligence must optimize some utility function using a rich prior, requiring at least structural similarity to AIXI. I think a precise understanding of how to represent this utility function may be a necessary part of any alignment scheme, on pain of wireheading. And this will likely come down to understanding some variant of AIXI, at least if my central load-bearing claim is true: The most direct route to understanding real superintelligent systems is by analyzing agents similar to AIXI. Though AIXI itself is not a perfect model of embedded superintelligence, it is perhaps the simplest member of a family of models rich enough to elucidate the necessary problems and exhibit the important structure. Just as the Riemann integral is an important precursor of Lebesgue integration, despite qualitative differences, it would make no sense to throw AIXI out and start anew without rigorously understanding the limits of the model. And there are already variants of AIXI that surpass some of those limits, such as the reflective version that can represent other agents as powerful as itself. This matters because the theoretical underpinnings of AIXI are still very spotty and contain many tractable open problems. In this document, I will collect several of them that I find most important - and in many cases am actively pursuing as part of my PhD research advised by Ming Li and Marcus Hutter. The AIXI (~= "universal artificial intelligence") research community is small enough that I am willing to post many of the directions I think are important publicly; in exchange I would appreciate a heads-up from anyone who reads a problem on this list and decides to work on it, so that we don't duplicate efforts (I am also open to collaborating). The list is particularly tilted towards those problems with clear, tractable relevance to alignment OR philosophical relevance to human rationality. Naturally, most problems are mathematical. Particularly where they intersect recursion theory, these problems may have solutions in the mathematical literature I am not aware of (keep in mind that I am a lowly second-year PhD student). Expect a scattering of experimental problems to be interspersed as well. 
To save time, I will assume that the reader has a copy of Jan Leike's PhD thesis on hand. In my opinion, he has made much of the existing foundational progress since Marcus Hutter invented the model. Also, I will sometimes refer to the two foundational books on AIXI as UAI = Universal Artificial Intelligence and Intro to UAI = An Introduction to Universal Artificial Intelligence, and the canonical textbook on algorithmic information theory Intro to K = An...
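For orientation, the structural claim above (a Bayesian agent optimizing a utility or reward signal under a rich prior) is easiest to see from AIXI's standard expectimax definition. The following is only a sketch in the usual notation of Hutter's UAI, with m the horizon, U a universal monotone Turing machine, q ranging over environment programs, and ℓ(q) the length of q; it is included here as a reference point, not as a claim about any of the open problems above.

```latex
% Standard AIXI action selection: expectimax over a Solomonoff-style mixture
% of environment programs q, weighted by 2^{-\ell(q)} (notation as in Hutter's UAI).
a_t := \arg\max_{a_t} \sum_{o_t r_t} \cdots \max_{a_m} \sum_{o_m r_m}
        \big[ r_t + \cdots + r_m \big]
        \sum_{q \,:\, U(q,\, a_1 \ldots a_m) \,=\, o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}
```

The reward terms and the universal prior appear in one expression, which is exactly the "utility function plus rich prior" structure the post argues any Bayesian superintelligence must share.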
Can AI development truly prioritize human safety, and what groundbreaking solutions are transforming cloud connectivity? In this week's episode of Cables2Clouds, we promise to unravel these compelling questions. We begin by examining how Aviatrix's latest innovation, the AWS Cloud WAN Connector, simplifies multi-cloud networking. Learn how this tool reduces complex manual configurations and enhances network segmentation across AWS, Azure, and Google Cloud, thanks to the collaborative efforts of Aviatrix's professional services team. Next, we pivot to a critical discussion on AI safety with insights from industry luminaries Ilya Sutskever and Jan Leike. Discover Sutskever's mission with Safe Superintelligence Incorporated (SSI) to steer AI development towards safeguarding humanity. We also spotlight Amazon's ambitious venture into the AI chatbot arena with Metis, a promising competitor to ChatGPT powered by the Olympus language model. Wrapping up, we analyze the competitive dynamics among tech giants and the future of telcos leveraging cloud services, debating whether Metis will debut at Amazon's re:Invent event. You won't want to miss this episode packed with insights on AI security and cloud connectivity!
Check out the Fortnightly Cloud Networking News
Visit our website and subscribe: https://www.cables2clouds.com/
Follow us on Twitter: https://twitter.com/cables2clouds
Follow us on YouTube: https://www.youtube.com/@cables2clouds/
Follow us on TikTok: https://www.tiktok.com/@cables2clouds
Merch Store: https://store.cables2clouds.com/
Join the Discord Study group: https://artofneteng.com/iaatj
Art of Network Engineering (AONE): https://artofnetworkengineering.com
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: MIRI's June 2024 Newsletter, published by Harlan on June 15, 2024 on LessWrong.
MIRI updates
MIRI Communications Manager Gretta Duleba explains MIRI's current communications strategy. We hope to clearly communicate to policymakers and the general public why there's an urgent need to shut down frontier AI development, and make the case for installing an "off-switch". This will not be easy, and there is a lot of work to be done. Some projects we're currently exploring include a new website, a book, and an online reference resource. Rob Bensinger argues, contra Leopold Aschenbrenner, that the US government should not race to develop artificial superintelligence. "If anyone builds it, everyone dies." Instead, Rob outlines a proposal for the US to spearhead an international alliance to halt progress toward the technology. At the end of June, the Agent Foundations team, including Scott Garrabrant and others, will be parting ways with MIRI to continue their work as independent researchers. The team was originally set up and "sponsored" by Nate Soares and Eliezer Yudkowsky. However, as AI capabilities have progressed rapidly in recent years, Nate and Eliezer have become increasingly pessimistic about this type of work yielding significant results within the relevant timeframes. Consequently, they have shifted their focus to other priorities. Senior MIRI leadership explored various alternatives, including reorienting the Agent Foundations team's focus and transitioning them to an independent group under MIRI fiscal sponsorship with restricted funding, similar to AI Impacts. Ultimately, however, we decided that parting ways made the most sense. The Agent Foundations team has produced some stellar work over the years, and made a true attempt to tackle one of the most crucial challenges humanity faces today. We are deeply grateful for their many years of service and collaboration at MIRI, and we wish them the very best in their future endeavors. The Technical Governance Team responded to NIST's request for comments on draft documents related to the AI Risk Management Framework. The team also sent comments in response to the "Framework for Mitigating AI Risks" put forward by U.S. Senators Mitt Romney (R-UT), Jack Reed (D-RI), Jerry Moran (R-KS), and Angus King (I-ME). Brittany Ferrero has joined MIRI's operations team. Previously, she worked on projects such as the Embassy Network and Open Lunar Foundation. We're excited to have her help execute on our mission.
News and links
AI alignment researcher Paul Christiano was appointed as head of AI safety at the US AI Safety Institute. Last fall, Christiano published some of his thoughts about AI regulation as well as responsible scaling policies. The Superalignment team at OpenAI has been disbanded following the departure of its co-leaders Ilya Sutskever and Jan Leike. The team was launched last year to try to solve the AI alignment problem in four years. However, Leike says that the team struggled to get the compute it needed and that "safety culture and processes have taken a backseat to shiny products" at OpenAI. This seems extremely concerning from the perspective of evaluating OpenAI's seriousness when it comes to safety and robustness work, particularly given that a similar OpenAI exodus occurred in 2020 in the wake of concerns about OpenAI's commitment to solving the alignment problem. 
Vox's Kelsey Piper reports that employees who left OpenAI were subject to an extremely restrictive NDA indefinitely preventing them from criticizing the company (or admitting that they were under an NDA), under threat of losing their vested equity in the company. OpenAI executives have since contacted former employees to say that they will not enforce the NDAs. Rob Bensinger comments on these developments here, strongly criticizing OpenAI for...
This week, a group of current and former employees from OpenAI and Google DeepMind penned an open letter accusing the industry's leading companies of prioritizing profits over safety. This comes after a spate of high-profile departures from OpenAI, including co-founder Ilya Sutskever and senior researcher Jan Leike, as well as reports that OpenAI has gone to great lengths to silence would-be whistleblowers. The writers of the open letter argue that researchers have a "right to warn" the public about AI risks and laid out a series of principles that would protect that right. In this episode, we sit down with one of those writers: William Saunders, who left his job as a research engineer at OpenAI in February. William is now breaking the silence on what he saw at OpenAI that compelled him to leave the company and to put his name to this letter.
RECOMMENDED MEDIA
The Right to Warn Open Letter
My Perspective On "A Right to Warn about Advanced Artificial Intelligence": A follow-up from William about the letter
Leaked OpenAI documents reveal aggressive tactics toward former employees: An investigation by Vox into OpenAI's policy of non-disparagement.
RECOMMENDED YUA EPISODES
A First Step Toward AI Regulation with Tom Wheeler
Spotlight on AI: What Would It Take For This to Go Well?
Big Food, Big Tech and Big AI with Michael Moss
Can We Govern AI? with Marietje Schaake
Your Undivided Attention is produced by the Center for Humane Technology. Follow us on Twitter: @HumaneTech_
Artificial intelligence pioneers are speaking out against their former employers, calling for safeguards to protect against AI's potential risks. The controversy centers around OpenAI, which has been criticized for prioritizing flashy product releases over safety considerations. Jan Leike, former head of "superalignment" efforts, left the company due to disagreements over security measures, sparking internal criticism and allegations of psychological abuse against CEO Sam Altman. Current and former employees from leading AI companies have penned an open letter outlining four core demands for transparency and accountability in AI development. --- Send in a voice message: https://podcasters.spotify.com/pod/show/tonyphoang/message
Google AI Overview appears to be delivering odd results, for example answering the question "How many rocks should I eat" with a specific quantity. Google has responded and made some adjustments. At OpenAI, a small drama has played out once again: Ilya Sutskever and Jan Leike have left OpenAI, and Leike has moved to Anthropic. On top of that, Helen Toner, a former member of OpenAI's board, shared intimate details in a podcast about the earlier ousting of OpenAI CEO Altman; he allegedly withheld certain information from the board. We also get into how and why the Golden Gate Claude LLM works the Golden Gate Bridge into all of its answers without being asked about it. Beyond that, we talked about Perplexity Pages, HuggingFace FineWeb, ChatGPT Edu and the SEAL Leaderboard.
Funding updates:
DeepL raises 277 million euros
CoreWeave takes on another 7 billion dollar loan
xAI's Series B closes at 6 billion dollars
Model updates:
Cohere Aya
MistralAI Codestral
ElevenLabs Sound Effects
Soundtrack composed by AIVA (Artificial Intelligence Virtual Artist)
Write to us! Send us your topic requests and your feedback: podcast@programmier.bar
Follow us! Stay up to date on future episodes and virtual meetups and join the community discussions.
Twitter
Instagram
Facebook
Meetup
YouTube
A new podcast, live from the Nerdland Festival! Northern lights! Biodiversity! Peter's mite tattoo! Universal blood! Bird dreams! Plato dies! Robot trousers! And much more... Shownotes: https://podcast.nerdland.be/nerdland-maandoverzicht-juni-2024/ Presented by Lieven Scheire with Hetty Helsmoortel, Peter Berx, Els Aerts, Bart Van Peer, Natha Kerkhofs, Jeroen Baert and Kurt Beheydt. Editing and mixing by Els Aerts and Jens Paeyeneers. Nerdland MaandoverBand led by Johnny Trash with Ariane Van Hasselt and Sergej Lopouchanski.
(00:00:00) Intro
(00:02:35) The northern lights were visible
(00:06:49) American farmers couldn't plant because of the solar storm
(00:10:35) Biodiversity week
(00:16:38) Universal donor blood
(00:22:59) New insights into tinnitus
(00:31:16) Robot trousers for stumbling astronauts on the moon
(00:40:38) Plato's final hours and the location of his grave
(00:46:20) The mystery of the purple block
(00:52:05) Paleontologists identify dinosaurs with AI
(00:56:59) SILICON VALLEY NEWS
(00:57:57) GPT-4o
(01:03:33) Ilya Sutskever and Jan Leike leave OpenAI
(01:07:02) OpenAI is in trouble with Scarlett Johansson
(01:09:36) Google I/O keynote
(01:12:43) Raspberry Pi with AI comments on a cat's life in Attenborough style
(01:18:39) University of Groningen develops an AI sarcasm detector
(01:21:42) An orangutan treats his wound with medicinal plants
(01:25:34) Listening to bird dreams
(01:33:01) A robot vacuum that can climb stairs!
(01:34:26) 11 people in Iceland accidentally became presidential candidates
(01:37:18) Richard Dawkins is coming to the country
(01:39:55) Dr Who Star Trek crossover
(01:45:09) Peter called archaea eukaryotes
(01:46:54) Toon Verlinden's book Code Rood
(01:47:16) Nerdland for little Nerds, Lotto Arena, 27 December
(01:47:57) Nerdland festival 2025 dates! 28, 29, 30 and 31 May
(01:49:02) New T-shirts and baby bibs
(01:49:38) STEM award, 600 years of KU Leuven
(01:50:12) Stephanie Dehennin and Marc De Bel have released a book
(01:51:24) Sponsor Flanders Make
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: AI #66: Oh to Be Less Online, published by Zvi on June 1, 2024 on LessWrong. Tomorrow I will fly out to San Francisco, to spend Friday through Monday at the LessOnline conference at Lighthaven in Berkeley. If you are there, by all means say hello. If you are in the Bay generally and want to otherwise meet, especially on Monday, let me know that too and I will see if I have time to make that happen. Even without that hiccup, it continues to be a game of playing catch-up. Progress is being made, but we are definitely not there yet (and everything not AI is being completely ignored for now). Last week I pointed out seven things I was unable to cover, along with a few miscellaneous papers and reports. Out of those seven, I managed to ship on three of them: Ongoing issues at OpenAI, The Schumer Report and Anthropic's interpretability paper. However, OpenAI developments continue. Thanks largely to Helen Toner's podcast, some form of that is going back into the queue. Some other developments, including new media deals and their new safety board, are being covered normally. The post on DeepMind's new scaling policy should be up tomorrow. I also wrote a full post on a fourth, Reports of our Death, but have decided to shelve that post and post a short summary here instead. That means the current 'not yet covered queue' is as follows:
1. DeepMind's new scaling policy. (Should be out tomorrow before I leave, or worst case next week.)
2. The AI Summit in Seoul.
3. Further retrospective on OpenAI including Helen Toner's podcast.
Table of Contents
1. Introduction.
2. Table of Contents.
3. Language Models Offer Mundane Utility. You heard of them first.
4. Not Okay, Google. A tiny little problem with the AI Overviews.
5. OK Google, Don't Panic. Swing for the fences. Race for your life.
6. Not Okay, Meta. Your application to opt out of AI data is rejected. What?
7. Not Okay Taking Our Jobs. The question is, with or without replacement?
8. They Took Our Jobs Anyway. It's coming.
9. A New Leaderboard Appears. Scale.ai offers new capability evaluations.
10. Copyright Confrontation. Which OpenAI lawsuit was that again?
11. Deepfaketown and Botpocalypse Soon. Meta fails to make an ordinary effort.
12. Get Involved. Dwarkesh Patel is hiring.
13. Introducing. OpenAI makes media deals with The Atlantic and… Vox? Surprise.
14. In Other AI News. Jan Leike joins Anthropic, Altman signs giving pledge.
15. GPT-5 Alive. They are training it now. A security committee is assembling.
16. Quiet Speculations. Expectations of changes, great and small.
17. Open Versus Closed. Two opposing things cannot dominate the same space.
18. Your Kind of People. Verbal versus math versus otherwise in the AI age.
19. The Quest for Sane Regulation. Lina Khan on the warpath, Yang on the tax path.
20. Lawfare and Liability. How much work can tort law do for us?
21. SB 1047 Unconstitutional, Claims Paper. I believe that the paper is wrong.
22. The Week in Audio. Jeremie & Edouard Harris explain x-risk on Joe Rogan.
23. Rhetorical Innovation. Not everyone believes in GI. I typed what I typed.
24. Abridged Reports of Our Death. A frustrating interaction, virtue of silence.
25. Aligning a Smarter Than Human Intelligence is Difficult. You have to try.
26. People Are Worried About AI Killing Everyone. Yes, it is partly about money.
27. Other People Are Not As Worried About AI Killing Everyone. Assumptions.
28. The Lighter Side. Choose your fighter.
Language Models Offer Mundane Utility
Which model is the best right now? Michael Nielsen is gradually moving back to Claude Opus, and so am I. GPT-4o is fast and has some nice extra features, so when I figure it is 'smart enough' I will use it, but when I care most about quality and can wait a bit I increasingly go to Opus. Gemini I'm reserving for a few niche purposes, when I nee...
Hipsters: Fora de Controle is Alura's podcast with news about applied Artificial Intelligence and this whole new world we are just beginning to crawl into, and which you can explore with us! In this episode we talk with Pedro Serafim, Chief Product Officer at Doris, a company that has been using generative AI to change the e-commerce experience in the fashion world. We also discuss the week's main news, including the exchange of barbs between Elon Musk and Yann LeCun, the rumor about an AI-powered Siri in iOS 18, and OpenAI's deals with Vox Media and The Atlantic. Here's who joined the conversation: Marcus Mendes, host of Fora de Controle; Fabrício Carraro, Program Manager at Alura, AI author and host of the Dev Sem Fronteiras podcast; Pedro Serafim, Chief Product Officer at Doris.
OpenAI's former head of super-AI safety moves to a competitor. A new committee reviews safety for OpenAI's next AI model. Jameda's AI assistant is meant to take bureaucratic work off doctors' hands. And who is behind the Rabbit R1?
https://www.heise.de/thema/KI-Update
https://pro.heise.de/ki/
https://www.heise.de/newsletter/anmeldung.html?id=ki-update
https://www.heise.de/thema/Kuenstliche-Intelligenz
https://the-decoder.de/
https://www.heiseplus.de/podcast
https://www.ct.de/ki
OpenAI has formed a new safety team to address concerns about AI safety and ethics, led by CEO Sam Altman and board members Adam D'Angelo and Nicole Seligman. Jan Leike, a leading AI researcher, has left OpenAI and joined Anthropic's Superalignment team, which is focused on AI safety and security. The latest version of Sentence Transformers, v3, has been released, allowing for finetuning of models for specific tasks like semantic search and paraphrase mining. Exciting new research papers have been published, including MoEUT, a shared-layer Transformer design that outperforms standard Transformers on language modeling tasks, and EM Distillation, a new distillation method for diffusion models that efficiently distills them to one-step generator models without sacrificing perceptual quality.
Contact: sergi@earkind.com
Timestamps:
00:34 Introduction
01:32 OpenAI has a new safety team — it's run by Sam Altman
03:18 Jan Leike (ex OpenAI) joins Anthropic's Superalignment Team
05:04 Sentence Transformers v3 Released
06:06 Fake sponsor
08:19 MoEUT: Mixture-of-Experts Universal Transformers
10:10 Greedy Growing Enables High-Resolution Pixel-Based Diffusion Models
11:48 EM Distillation for One-step Diffusion Models
13:42 Outro
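As a concrete illustration of the semantic-search use case mentioned for Sentence Transformers, here is a minimal sketch using the sentence-transformers library. The checkpoint name and example sentences are illustrative assumptions, and nothing below depends on v3-only training features.

```python
# Minimal semantic-search sketch with sentence-transformers.
# Assumes `pip install sentence-transformers` and the public
# "all-MiniLM-L6-v2" checkpoint (both chosen here for illustration).
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

corpus = [
    "Jan Leike joins Anthropic to work on AI safety.",
    "OpenAI forms a new safety and security committee.",
    "A new distillation method speeds up diffusion models.",
]
query = "Who is working on AI safety?"

# Encode the corpus and the query into dense vectors.
corpus_emb = model.encode(corpus, convert_to_tensor=True)
query_emb = model.encode(query, convert_to_tensor=True)

# Rank corpus entries by cosine similarity to the query.
scores = util.cos_sim(query_emb, corpus_emb)[0]
for sentence, score in sorted(zip(corpus, scores.tolist()), key=lambda x: -x[1]):
    print(f"{score:.3f}  {sentence}")
```

The same encode-then-compare pattern underlies paraphrase mining, applied pairwise across a single corpus instead of corpus versus query.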
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: OpenAI: Fallout, published by Zvi on May 28, 2024 on LessWrong. Previously: OpenAI: Exodus (contains links at top to earlier episodes), Do Not Mess With Scarlett Johansson. We have learned more since last week. It's worse than we knew. How much worse? In which ways? With what exceptions? That's what this post is about.
The Story So Far
For years, employees who left OpenAI consistently had their vested equity explicitly threatened with confiscation and the lack of ability to sell it, and were given short timelines to sign documents or else. Those documents contained highly aggressive NDA and non-disparagement (and non-interference) clauses, including the NDA preventing anyone from revealing these clauses. No one knew about this until recently, because until Daniel Kokotajlo everyone signed, and then they could not talk about it. Then Daniel refused to sign, Kelsey Piper started reporting, and a lot came out. Here is Altman's statement from May 18, with its new community note. Evidence strongly suggests the above post was, shall we say, 'not consistently candid.' The linked article includes a document dump and other revelations, which I cover. Then there are the other recent matters. Ilya Sutskever and Jan Leike, the top two safety researchers at OpenAI, resigned, part of an ongoing pattern of top safety researchers leaving OpenAI. The team they led, Superalignment, had been publicly promised 20% of secured compute going forward, but that commitment was not honored. Jan Leike expressed concerns that OpenAI was not on track to be ready for even the next generation of models' needs for safety. OpenAI created the Sky voice for GPT-4o, which evoked consistent reactions that it sounded like Scarlett Johansson, who voiced the AI in the movie Her, Altman's favorite movie. Altman asked her twice to lend her voice to ChatGPT. Altman tweeted 'her.' Half the articles about GPT-4o mentioned Her as a model. OpenAI executives continue to claim that this was all a coincidence, but have taken down the Sky voice. (Also six months ago the board tried to fire Sam Altman and failed, and all that.)
A Note on Documents from OpenAI
The source for the documents from OpenAI that are discussed here, and the communications between OpenAI and its employees and ex-employees, is Kelsey Piper in Vox, unless otherwise stated. She went above and beyond, and shares screenshots of the documents. For superior readability and searchability, I have converted those images to text.
Some Good News But There is a Catch
OpenAI has indeed made a large positive step. They say they are releasing former employees from their nondisparagement agreements and promising not to cancel vested equity under any circumstances. Kelsey Piper: There are some positive signs that change is happening at OpenAI. The company told me, "We are identifying and reaching out to former employees who signed a standard exit agreement to make it clear that OpenAI has not and will not cancel their vested equity and releases them from nondisparagement obligations." Bloomberg confirms that OpenAI has promised not to cancel vested equity under any circumstances, and to release all employees from one-directional non-disparagement agreements. And we have this confirmation from Andrew Carr. Andrew Carr: I guess that settles that. Tanner Lund: Is this legally binding? 
Andrew Carr: I notice they are also including the non-solicitation provisions as not enforced. (Note that certain key people, like Dario Amodei, plausibly negotiated two-way agreements, which would mean theirs would still apply. I would encourage anyone in that category who is now free of the clause, even if they have no desire to disparage OpenAI, to simply say 'I am under no legal obligation not to disparage OpenAI.') These actions by OpenAI are helpful. They are necessary. They are no...
In this edition of the weekly Flash, we cover five key stories from the world of technology and artificial intelligence. First, Ilya Sutskever and Jan Leike leave OpenAI amid internal tensions, raising questions about the company's future. Scarlett Johansson criticizes OpenAI for using a voice similar to hers in ChatGPT-4 without her permission, which has led to legal action. Microsoft announces new computers with artificial intelligence that work without an Internet connection, promising to shake up the technology market. The RAY-BAN Meta smart glasses now let users post directly to Instagram Stories via voice commands, and also integrate services such as Amazon Music and the meditation app Calm. Finally, Spotify discontinues its Car Thing device, frustrating users over the lack of refunds and raising concerns about electronic waste. The episode highlights the ethical and legal implications of using advanced technologies, as well as recent innovations in the tech market. The aim is to inform listeners about the latest developments and their possible future impact.
------------
In the dynamic world of technology and artificial intelligence, every week brings a torrent of news and developments that can have a significant impact on our lives. In this edition of the weekly Flash, we explore five key stories that are setting the trend. From strategic moves at OpenAI to groundbreaking innovations from Microsoft, along with legal controversies and advances in smart devices, this summary will keep you up to date on the most important developments and their possible implications.
Ilya Sutskever and Jan Leike Leave OpenAI: What Does It Mean for the Company's Future?
The recent announcement that Ilya Sutskever and Jan Leike have decided to leave OpenAI has caused a stir in the tech industry. These departures come amid internal tensions, raising serious questions about the company's future.
Implications of the departures:
Internal restructuring: the departure of key figures could lead to significant restructuring within OpenAI.
Impact on current projects: ongoing projects, especially those related to ChatGPT-4, could be affected.
Investor confidence: the uncertainty could negatively affect the confidence of investors and partners.
Scarlett Johansson vs. OpenAI: Controversy Over the Use of a Voice
Scarlett Johansson has publicly criticized OpenAI for using a voice similar to hers in ChatGPT-4 without her permission, which has led to legal action. This case highlights the ethical and legal questions surrounding the use of advanced technologies.
Legal and ethical aspects:
Intellectual property rights: the need to establish clear rules on the use of voices and likenesses in AI.
Consent and privacy: the importance of explicit consent to avoid legal conflicts.
Legal precedents: this case could set important precedents for similar future disputes.
Microsoft Shakes Things Up with AI Computers That Work Offline
Microsoft has announced the launch of new computers with artificial intelligence that work without an Internet connection. This innovation promises to change the technology landscape by offering more secure and efficient solutions.
Benefits and applications:
Improved security: with no dependence on an Internet connection, the risk of cyberattacks is reduced.
Operational efficiency: faster data processing and lower latency.
Industrial applications: ideal for sectors where connectivity is limited or nonexistent.
RAY-BAN Meta Smart Glasses: Post to Instagram with Voice Commands
The new RAY-BAN Meta smart glasses let users post directly to Instagram Stories via voice commands. They also integrate services such as Amazon Music and the meditation app Calm, making the device a multifunctional tool.
Key features:
Voice commands: quick, hands-free posting to social media.
Multifunctional integration: easy access to music and wellness apps.
Continuous innovation: reflects the trend towards increasingly integrated, multifaceted devices.
Spotify Discontinues Car Thing: Frustration Among Users
Spotify has decided to discontinue its Car Thing device, which has frustrated users due to the lack of refunds and raised concerns about electronic waste.
Reactions and consequences:
Customer frustration: users unhappy about the lack of refunds.
Environmental impact: concerns about growing electronic waste.
Lessons learned: the importance of clear communication and adequate refund policies in maintaining customer loyalty.
Become a follower of this podcast: https://www.spreaker.com/podcast/flash-diario-de-el-siglo-21-es-hoy--5835407/support.
This is a link post. A brief overview of recent OpenAI departures (Ilya Sutskever, Jan Leike, Daniel Kokotajlo, Leopold Aschenbrenner, Pavel Izmailov, William Saunders, Ryan Lowe, Cullen O'Keefe[1]). Will add other relevant media pieces below as I come across them. Some quotes perhaps worth highlighting: Even when the team was functioning at full capacity, that "dedicated investment" was home to a tiny fraction of OpenAI's researchers and was promised only 20 percent of its computing power — perhaps the most important resource at an AI company. Now, that computing power may be siphoned off to other OpenAI teams, and it's unclear if there'll be much focus on avoiding catastrophic risk from future AI models. - Jan suggesting that compute for safety may have been deprioritised even despite the 20% commitment. (Wired claims that OpenAI confirms that their "superalignment team is no more".) "I joined with substantial hope that OpenAI [...] The original text contained 1 footnote which was omitted from this narration. --- First published: May 17th, 2024 Source: https://forum.effectivealtruism.org/posts/ckYw5FZFrejETuyjN/articles-about-recent-openai-departures --- Narrated by TYPE III AUDIO.
OpenAI had one of the worst possible weeks yet with three major issues, and the Scarlett Johansson mess just keeps getting worse. Plus, the California Senate passed an AI bill that looks pretty bad for now, but Anthropic actually learned something about the black box that is AI so… that's good? Microsoft's Satya Nadella made moves at their big Microsoft Build 2024 event & introduced a ton of new AI innovations that would leave even Steve Ballmer himself sweatier than usual. Also, Humane seems like it might be toast after seeking out offers to buy it, we love a cool hacked-together Google Gemini-powered Super Mario tutorial, Gavin got some time with the ChatGPT app for Mac, and Kevin walks us through how he used GPT-4o to code a voice app using our first AI co-host GASH. Speaking of co-hosts, we're joined by a very special guest: a terrible AI rendition of Arnold Schwarzenegger. He gives us his thoughts on the ScarJo mess and then absolutely loses his mind. We have fun here, don't we.
Follow us for more AI discussions, AI news updates, and AI tool reviews on X @AIForHumansShow
Join our vibrant community on TikTok @aiforhumansshow
For more info, visit our website at https://www.aiforhumans.show/
/// Show links ///
Jan Leike's X Thread: https://x.com/janleike/status/1791498174659715494
OpenAI's Equity Issue: https://www.vox.com/future-perfect/2024/5/17/24158478/openai-departures-sam-altman-employees-chatgpt-release
Sam's Equity Response: https://x.com/sama/status/1791936857594581428
"Naughtiness" Tweet: https://x.com/buccocapital/status/1793243125026029599
Scarlett Johansson's statement: https://x.com/BobbyAllyn/status/1792679435701014908
Co-Pilot + PCs: https://www.microsoft.com/en-us/windows/copilot-plus-pcs?icid=mscom_marcom_H1a_copilot-plus-pcs_FY24SpringSurface&r=1
Full Keynote: https://youtu.be/8OviTSFqucI?si=zJ2RqGS8T6kOB6km
Joanna Stern's WSJ Interview: https://youtu.be/uHEPBzYick0?si=RQcutrsrnW105LJ7
Co-Pilot Studio: https://youtu.be/5H6_pCUt-mk?si=fB4V5CYeNH6yZH_C
Sam Altman on Logan Bartlett: https://youtu.be/fMtbrKhXMWc?si=mLxoxZ5orMLs9avZ
Khanmigo: https://www.khanmigo.ai/teachers
California AI Senate Bill: https://x.com/Scott_Wiener/status/1793102136504615297
Humane AI Might Be Trying To Sell Itself: https://gizmodo.com/humane-ai-pin-selling-billion-1851493143
Anthropic's Dive Into How AI Works: https://www.anthropic.com/research/mapping-mind-language-model
Mario64 "Co-Pilot" with Gemini Flash: https://x.com/skirano/status/1792948429754151293
Dan Nathan is joined by Gene Munster, Managing Partner at Deepwater Asset Management. This week all eyes look to Nvidia as it reports on Wednesday (3:15) and the conundrum Apple has when it comes to third-party AI (13:30). After the break, Deirdre Bosa joins Dan Nathan for an update on Ilya Sutskever and Jan Leike's OpenAI departures (19:00), Deirdre's report from Google I/O (23:45), whether Apple will pick the Gen AI winner when it comes to Google vs OpenAI (28:30), and drama at Amazon (32:30).
Links Mentioned
Check out MRKT Matrix (Apple Podcasts | Spotify)
Dan Niles on Nvidia (X)
OpenAI's GPT-4o Demo (YouTube)
View our show notes here
Learn more about Current: current.com
Listen to 'Strategic Alternatives': https://www.rbccm.com/en/gib/ma-inflection-points
Email us at contact@riskreversal.com with any feedback, suggestions, or questions for us to answer on the pod and follow us @OkayComputerPod.
We're on social:
Follow @dee_bosa on Twitter
Follow @GuyAdami on Twitter
Follow us on Instagram @RiskReversalMedia
Subscribe to our YouTube page
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: On Dwarkesh's Podcast with OpenAI's John Schulman, published by Zvi on May 21, 2024 on LessWrong. Dwarkesh Patel recorded a Podcast with John Schulman, cofounder of OpenAI and at the time their head of current model post-training. Transcript here. John's job at the time was to make the current AIs do what OpenAI wanted them to do. That is an important task, but one that employs techniques that their at-the-time head of alignment, Jan Leike, made clear we should not expect to work on future more capable systems. I strongly agree with Leike on that. Then Sutskever left and Leike resigned, and John Schulman was made the new head of alignment, now charged with what superalignment efforts remain at OpenAI to give us the ability to control future AGIs and ASIs. This gives us a golden opportunity to assess where his head is at, without him knowing he was about to step into that role. There is no question that John Schulman is a heavyweight. He executes and ships. He knows machine learning. He knows post-training and mundane alignment. The question is, does he think well about this new job that has been thrust upon him?
The Big Take
Overall I was pleasantly surprised and impressed. In particular, I was impressed by John's willingness to accept uncertainty and not knowing things. He does not have a good plan for alignment, but he is far less confused about this fact than most others in similar positions. He does not know how to best navigate the situation if AGI suddenly happened ahead of schedule in multiple places within a short time frame, but I have not ever heard a good plan for that scenario, and his speculations seem about as directionally correct and helpful as one could hope for there. Are there answers that are cause for concern, and places where he needs to fix misconceptions as quickly as possible? Oh, hell yes. His reactions to potential scenarios involved radically insufficient amounts of slowing down, halting and catching fire, freaking out and general understanding of the stakes. Some of that I think was about John and others at OpenAI using a very weak definition of AGI (perhaps partly because of the Microsoft deal?) but also partly he does not seem to appreciate what it would mean to have an AI doing his job, which he says he expects in a median of five years. His answer on instrumental convergence is worrisome, as others have pointed out. He dismisses concerns that an AI given a bounded task would start doing things outside the intuitive task scope, or the dangers of an AI 'doing a bunch of wacky things' a human would not have expected. On the plus side, it shows understanding of the key concepts on a basic (but not yet deep) level, and he readily admits it is an issue with commands that are likely to be given in practice, such as 'make money.' In general, he seems willing to react to advanced capabilities by essentially scaling up various messy solutions in ways that I predict would stop working at that scale or with something that outsmarts you and that has unanticipated affordances and reason to route around typical in-distribution behaviors. He does not seem to have given sufficient thought to what happens when a lot of his assumptions start breaking all at once, exactly because the AI is now capable enough to be properly dangerous. 
As with the rest of OpenAI, another load-bearing assumption is presuming gradual changes throughout all this, including assuming past techniques will not break. I worry that will not hold. He has some common confusions about regulatory options and where we have viable intervention points within competitive dynamics and game theory, but that's understandable, and also was at the time very much not his department. As with many others, there seems to be a disconnect. A lot of the thinking here seems like excellent practical thi...
Previously: OpenAI: Facts From a Weekend, OpenAI: The Battle of the Board, OpenAI: Leaks Confirm the Story, OpenAI: Altman Returns, OpenAI: The Board Expands. Ilya Sutskever and Jan Leike have left OpenAI. This is almost exactly six months after Altman's temporary firing and The Battle of the Board, the day after the release of GPT-4o, and soon after a number of other recent safety-related OpenAI departures. Many others working on safety have also left recently. This is part of a longstanding pattern at OpenAI. Jan Leike later offered an explanation for his decision on Twitter. Leike asserts that OpenAI has lost the mission on safety and culturally been increasingly hostile to it. He says the superalignment team was starved for resources, with its public explicit compute commitments dishonored, and that safety has been neglected on a widespread basis, not only superalignment but also including addressing the safety [...] --- First published: May 20th, 2024 Source: https://www.lesswrong.com/posts/ASzyQrpGQsj7Moijk/openai-exodus --- Narrated by TYPE III AUDIO.
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: OpenAI: Exodus, published by Zvi on May 20, 2024 on LessWrong. Previously: OpenAI: Facts From a Weekend, OpenAI: The Battle of the Board, OpenAI: Leaks Confirm the Story, OpenAI: Altman Returns, OpenAI: The Board Expands. Ilya Sutskever and Jan Leike have left OpenAI. This is almost exactly six months after Altman's temporary firing and The Battle of the Board, the day after the release of GPT-4o, and soon after a number of other recent safety-related OpenAI departures. Many others working on safety have also left recently. This is part of a longstanding pattern at OpenAI. Jan Leike later offered an explanation for his decision on Twitter. Leike asserts that OpenAI has lost the mission on safety and culturally been increasingly hostile to it. He says the superalignment team was starved for resources, with its public explicit compute commitments dishonored, and that safety has been neglected on a widespread basis, not only superalignment but also including addressing the safety needs of the GPT-5 generation of models. Altman acknowledged there was much work to do on the safety front. Altman and Brockman then offered a longer response that seemed to say exactly nothing new. Then we learned that OpenAI has systematically misled and then threatened its departing employees, forcing them to sign draconian lifetime non-disparagement agreements, which they are forbidden to reveal due to their NDA. Altman has to some extent acknowledged this and promised to fix it once the allegations became well known, but so far there has been no fix implemented beyond an offer to contact him privately for relief. These events all seem highly related. Also these events seem quite bad. What is going on? This post walks through recent events and informed reactions to them. The first ten sections address departures from OpenAI, especially Sutskever and Leike. The next five sections address the NDAs and non-disparagement agreements. Then at the end I offer my perspective, highlight another, and look to paths forward. Table of Contents:
1. The Two Departure Announcements
2. Who Else Has Left Recently?
3. Who Else Has Left Overall?
4. Early Reactions to the Departures
5. The Obvious Explanation: Altman
6. Jan Leike Speaks
7. Reactions After Leike's Statement
8. Greg Brockman and Sam Altman Respond to Leike
9. Reactions from Some Folks Unworried About Highly Capable AI
10. Don't Worry, Be Happy?
11. The Non-Disparagement and NDA Clauses
12. Legality in Practice
13. Implications and Reference Classes
14. Altman Responds on Non-Disparagement Clauses
15. So, About That Response
16. How Bad Is All This?
17. Those Who Are Against These Efforts to Prevent AI From Killing Everyone
18. What Will Happen Now?
19. What Else Might Happen or Needs to Happen Now?
The Two Departure Announcements. Here are the full announcements and top-level internal statements made on Twitter around the departures of Ilya Sutskever and Jan Leike. Ilya Sutskever: After almost a decade, I have made the decision to leave OpenAI. The company's trajectory has been nothing short of miraculous, and I'm confident that OpenAI will build AGI that is both safe and beneficial under the leadership of @sama, @gdb, @miramurati and now, under the excellent research leadership of Jakub Pachocki. It was an honor and a privilege to have worked together, and I will miss everyone dearly.
So long, and thanks for everything. I am excited for what comes next - a project that is very personally meaningful to me about which I will share details in due time. [Ilya then shared the photo below] Jakub Pachocki: Ilya introduced me to the world of deep learning research, and has been a mentor to me, and a great collaborator for many years. His incredible vision for what deep learning could become was foundational to what OpenAI, and the field of AI, is today. I...
Ilya Sutskever and Jan Leike leave OpenAI amid internal controversy. Ilya Sutskever, co-founder and chief scientist of OpenAI, has left the company, following a series of notable departures from the artificial intelligence firm. Sutskever, who took part in a failed attempt to remove CEO Sam Altman in November, announced his decision in a post on X. Jan Leike, another leader of OpenAI's "superalignment" team, also resigned. These departures raise questions about OpenAI's future and its ability to lead the development of artificial intelligence safely and beneficially. What does this mean for OpenAI's future? Ilya Sutskever, one of the brightest minds in artificial intelligence, helped found OpenAI and was key to the development of tools such as ChatGPT. However, after the failed attempt to remove Altman, Sutskever and other board members faced strong internal opposition. Although Sutskever expressed regret over his actions, he decided it was time to move on. The departures of Sutskever and Leike reflect internal tensions at OpenAI. The company has seen a string of high-profile exits, which raises doubts about its stability and long-term vision. Leike, who worked on aligning AI systems with human interests, also left the company under similar circumstances. These departures could point to disagreements over the company's direction and its approach to AI safety and governance. The resignation of key figures like Sutskever and Leike is not unique in the history of technology. For example, when Steve Jobs left Apple in 1985, the company faced significant challenges but eventually found its way back. Similarly, Google has weathered high-profile departures without losing its technological leadership. Sutskever's replacement by Jakub Pachocki and the continuation of the "superalignment" work by other team members could stabilize the situation at OpenAI. Even so, the company will have to show that it can maintain its leadership in AI development despite these internal changes. If you want to stay up to date on the latest developments in technology and their implications, I recommend the podcast "El Siglo 21 es Hoy". Find it at ElSiglo21esHoy.com for in-depth, accessible discussions about technology and much more.
In the latest episode of The AI Fix podcast, Graham and Mark tackle the latest news from the world of AI, ponder the grisly demise of OpenAI's safety team, ask what the GPT-4o reveal will mean for Lionel Richie and Diana Ross, and question whether fitting guns to robot dogs is just wokeism gone mad. Graham explains to Mark why Mustard might help AI to understand teenagers better, both hosts pretend to be northerners, and Mark introduces Graham to ChatGPT's evil alter-ego DAN.
Episode links:
OpenAI unveils GPT-4o.
Google's AI Overviews roll out to users in the USA.
Luke Jordan describes how his website was destroyed by Google.
Artificial intelligence is hitting jobs like a "tsunami".
Too busy to find love? An 'AI dating concierge' could date hundreds of people on your behalf.
Robot dogs with guns.
OpenAI dissolves AI safety team.
Tweet by Jan Leike.
Researchers build AI-driven sarcasm detector.
DAN is my new friend.
ChatGPT "DAN" (and other "Jailbreaks").
Universal and Transferable Adversarial Attacks on Aligned Language Models.
AutoDAN: Interpretable Gradient-Based Adversarial Attacks on Large Language Models.
ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs.
The AI Fix
The AI Fix podcast is presented by Graham Cluley and Mark Stockley. Learn more about the podcast at theaifix.show, and follow us on Twitter at @TheAIFix. This podcast uses the following third-party services for analysis: Chartable - https://chartable.com/privacy
This week we assess 5 AI business ideas you could turn into your next venture if you decide to take the entrepreneurship path; it has never been easier. News: OpenAI prepares a major reveal, potentially a voice assistant with audiovisual capabilities; Anthropic launches a new tool that automates proposal generation; OpenAI unveils GPT-4o and new voice capabilities; Meta develops AI-powered 'Camerabuds'; major Google updates to its Gemini model and the announcement of its Sora competitor; Google presents new AI agents and improvements to AI search; Ilya Sutskever and Jan Leike leave OpenAI; Android phones enter the AI era; Apple's new eye-tracking feature in iOS 18; OpenAI signs a deal for real-time access to Reddit data.
Tools:
3LC: See your dataset through your model's eyes. Capture and analyze per-sample predictions and metrics to improve your data, all in your browser. (LINK)
Switchlight Studio Open Beta: Revolutionize post-production lighting with AI (LINK)
Phew AI Tab: Manage tabs with AI grouping in the sidebar (LINK)
Replica Voice AI: Ethical voice AI for creators and businesses (LINK)
Simulon: Next-generation AI and VFX workflow for creators (LINK)
Superhuman Auto Email Summary: Get instant summaries of every email (LINK)
VoicenoteAI: Note-taker that transcribes your thoughts instantly (LINK)
Julius AI: GPT-4o is live in Julius to analyze datasets, visualize, solve math, and much more (LINK)
ComfyUI: The most powerful Stable Diffusion GUI (LINK)
Globe Explorer: Explore knowledge efficiently and systematically (LINK)
Zapier Central Beta: New feature that now supports web-browsing actions and instant chat (LINK)
Inspect: Open-source framework for testing the safety of LLMs (LINK)
Google VideoFX: Generate photorealistic video with the new Veo model (LINK)
Blockade Labs Model 3: Generate fully immersive 8K art for Skybox AI (LINK)
Stunning: Answer a few questions about your business and generate a complete website (LINK)
Writesonic: Create accurate, SEO-optimized content with AI (LINK)
devv: AI-powered search engine for developers (LINK)
Era: Smart money management with AI and human advisors (LINK)
Taskade AI Agent Generator: Describe your task needs and the agent generator takes care of the rest (LINK)
Reclaim Smart Meeting: Optimize scheduling to find the best meeting times for everyone on the team (LINK)
Chatter by Hume AI: Interactive podcast to discuss any topic with an AI host (LINK)
Copyleaks: Detect plagiarized content and AI-generated text (LINK)
Quizard: Academic copilot for instant homework help (LINK)
Anthropic Prompt Generator: Generate an optimized prompt for effective interaction with AI (LINK)
Oliv: Meeting preparation and automation (LINK)
Invisibility: Integrates an AI assistant into your desktop (LINK)
Jovu: Turn ideas into production-ready code in 4 minutes (LINK)
Voiceflow Workflow CMS: Manage AI agents at scale (LINK)
Sign up for the academy. Telegram channel and YouTube channel. Ask questions via WhatsApp: +34 620 240 234. Leave me a voice message.
OpenAI, a leading artificial intelligence research organization, has dissolved its Superalignment team, which aimed to mitigate long-term risks associated with advanced AI systems. The team was established just a year ago and was promised 20% of the company's computing power over four years. The departure of two high-profile leaders, Ilya Sutskever and Jan Leike, who were instrumental in shaping the team's mission, has raised concerns about the company's priorities around safety and societal impact. Leike believes more resources should be allocated toward security and safety. --- Send in a voice message: https://podcasters.spotify.com/pod/show/tonyphoang/message
Welcome to a new episode of Discover Daily, the AI-generated podcast curated by Perplexity to satisfy your curiosity about the latest in tech, science, and culture. I'm your host Alex, and in today's episode, we explore significant changes at OpenAI, a historic milestone for the Dow Jones, the ambitious goals of the Blue Brain Project, the puzzling behavior of orcas in the Strait of Gibraltar, and the evolutionary remnants of goosebumps. OpenAI has recently disbanded its Superalignment team, raising concerns about the company's commitment to AI safety. This team was dedicated to ensuring that artificial general intelligence systems remain aligned with human goals. Despite the departure of key figures like Ilya Sutskever and Jan Leike, OpenAI continues to integrate safety considerations into its broader research efforts. Meanwhile, the Dow Jones Industrial Average has surpassed the 40,000 mark for the first time, driven by easing inflation and significant gains in technology stocks, particularly those associated with artificial intelligence. The Blue Brain Project aims to create a digital reconstruction of the mammalian brain, starting with the mouse brain, to understand brain structure and function. Despite its achievements, the project faces challenges such as the complexity of simulations and ethical considerations. Additionally, orcas in the Strait of Gibraltar have resumed ramming yachts, puzzling scientists with their behavior. Lastly, we discuss the evolutionary remnants of goosebumps, a vestigial reflex inherited from our animal ancestors.
From Perplexity's Discover feed:
OpenAI superalignment team disbanded: https://www.perplexity.ai/search/OpenAI-superalignment-team-xOZoBxnbRY2lNCr1OTMGgg
Dow Jones surpassed 40,000 for the first time: https://www.perplexity.ai/search/Dow-Jones-surpassed-F9NwmxNvQKi56gCAugZ7Rg
The Blue Brain Project: https://www.perplexity.ai/search/The-Blue-Brain-Hw00mGi0Tn.9ebHfqFVZKg
Yacht-sinking orcas return: https://www.perplexity.ai/search/Yachtsinking-orcas-return-S1_sK9tNRpCYzzWX__wryw
Goosebumps are evolutionary remnants: https://www.perplexity.ai/search/Goosebumps-are-evolutionary-G2zUCAM5QGuDEKyzbDZRkg
Perplexity is the fastest and most powerful way to search the web. Perplexity crawls the web and curates the most relevant and up-to-date sources (from academic papers to Reddit threads) to create the perfect response to any question or topic you're interested in. Take the world's knowledge with you anywhere. Available on iOS and Android. Join our growing Discord community for the latest updates and exclusive content. Follow us on: Instagram, Threads, X (Twitter), YouTube, LinkedIn.
Stay updated with the latest in tech on our May 17 episode. Here's what we covered:
- Jan Leike's resignation from OpenAI: Leike steps down due to safety concerns; internal tensions at OpenAI as the Superalignment team disbands; John Schulman to take over Leike's role.
- OpenAI's latest developments: launch of the new GPT-4o model, accessible to all users; enhanced ChatGPT capabilities, including file access from Google Drive and OneDrive.
- CoreWeave fundraising: secures $7.5 billion in debt led by Blackstone; plans to expand cloud data centers and compete with Amazon and Google.
- Microsoft's gaming news: addition of the next Call of Duty to Xbox Game Pass; announcement expected during the Xbox showcase on June 9th.
Tune in to stay ahead in the world of tech!
Hipsters: Fora de Controle is Alura's podcast with news about applied Artificial Intelligence and this whole new world we are just beginning to crawl into, which you can explore with us! In this episode we discuss the main announcements OpenAI made at a lightning event last Monday, as well as what stood out among Google's announcements during the opening of its annual developer conference. Here is who joined the conversation: Marcus Mendes, host of Fora de Controle; Fabrício Carraro, Program Manager at Alura, AI author and host of the Dev Sem Fronteiras podcast; Sérgio Lopes, CTO of Alura.
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Articles about recent OpenAI departures, published by bruce on May 17, 2024 on The Effective Altruism Forum. A brief overview of recent OpenAI departures (Ilya Sutskever, Jan Leike, Daniel Kokotajlo, Leopold Aschenbrenner, Pavel Izmailov, William Saunders, Ryan Lowe, Cullen O'Keefe[1]). Will add other relevant media pieces below as I come across them. Some quotes perhaps worth highlighting: Even when the team was functioning at full capacity, that "dedicated investment" was home to a tiny fraction of OpenAI's researchers and was promised only 20 percent of its computing power - perhaps the most important resource at an AI company. Now, that computing power may be siphoned off to other OpenAI teams, and it's unclear if there'll be much focus on avoiding catastrophic risk from future AI models. Jan suggesting that compute for safety may have been deprioritised even despite the 20% commitment. (Wired claims that OpenAI confirms that their "superalignment team is no more".) "I joined with substantial hope that OpenAI would rise to the occasion and behave more responsibly as they got closer to achieving AGI. It slowly became clear to many of us that this would not happen," Kokotajlo told me. "I gradually lost trust in OpenAI leadership and their ability to responsibly handle AGI, so I quit." (Additional kudos to Daniel Kokotajlo for not signing additional confidentiality obligations on departure, which is plausibly relevant for Jan too given his recent thread.) Edit: Shakeel's article on the same topic. Kelsey's article about the nondisclosure/nondisparagement provisions that OpenAI employees have been offered. Wired claims that OpenAI confirms that their "superalignment team is no more". 1. ^ Covered by Shakeel/Wired, but thought it'd be clearer to list all names together. Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org
Wow, holy s**t, insane, overwhelming, incredible, the future is here!, "still not there": there are many more words to describe this past week. (TL;DR at the end of the blogpost.) I had a feeling it was going to be a big week, and the companies did NOT disappoint, so this is going to be a very big newsletter as well. As you may have read last week, I was very lucky to be in San Francisco the weekend before Google I/O to co-host a hackathon with the Meta Llama 3 team, and it was a blast; I will add my notes on that in This Week's Buzz section. Then on Monday, we all got to watch the crazy announcements from OpenAI, namely a new flagship model called GPT-4o (we were right, it previously was im-also-a-good-gpt2-chatbot) that's twice as fast, 50% cheaper (in English, and significantly more so in other languages, more on that later) and is Omni (that's the "o"), which means it is trained end to end with voice, vision, and text on the inputs, and can generate text, voice, and images on the output. A true MMIO (multimodal on inputs and outputs, not the official term) is here, and it has some very surprising capabilities that blew us all away, namely the ability to ask the model to "talk faster" or "more sarcasm in your voice" or "sing like a pirate". Though we didn't yet get that functionality with the GPT-4o model, it is absolutely and incredibly exciting. Oh, and it's available to everyone for free! That's GPT-4 level intelligence, for free, for everyone, without having to log in! What's also exciting was how immediate it was: apparently not only is the model itself faster (unclear if it's due to newer GPUs or distillation or some other crazy advancements, or all of the above), but training an end-to-end omnimodel reduces the latency to that of an incredibly immediate conversation partner, one you can interrupt, ask to recover from a mistake, and that can hold a conversation very well. So well that indeed it seemed like the Waifu future (digital girlfriends/wives) is very close for some folks who would want it. While we didn't get to try it (we got GPT-4o but not the new voice mode, as Sam confirmed), OpenAI released a bunch of videos of their employees chatting with Omni (that's my nickname, use it if you'd like), and many online highlighted how thirsty / flirty it sounded. I downloaded all the videos for an X thread and I named one girlfriend.mp4, and well, just judge for yourself why. OK, that's not all that OpenAI updated or shipped: they also updated the tokenizer, which is incredible news for folks all around, specifically the rest of the world. The new tokenizer reduces the previous "foreign language tax" by a LOT, making the model way cheaper for the rest of the world as well. One last announcement from OpenAI was the desktop app experience, and this one I actually got to use a bit, and it's incredible. macOS only for now, this app comes with a launcher shortcut (kind of like Raycast) that lets you talk to ChatGPT right then and there, without opening a new tab, without additional interruptions, and it can even understand what you see on the screen, help you understand code, or jokes, or look up information. Here's just one example I had over at X. And sure, you could always do this with another tab, but the ability to do it without a context switch is a huge win.
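One way to get a feel for the tokenizer change mentioned above is to count tokens under the older and newer encodings. This is a hedged sketch only: it assumes the tiktoken library is installed, that "cl100k_base" is the encoding used by the earlier GPT-4 models, and that "o200k_base" is the encoding GPT-4o uses; the sample sentences and their translations are arbitrary, and your exact counts may differ.

```python
# Compare token counts for the same sentence under the old and new encodings.
# The point is the relative drop for non-English text, not the absolute numbers.
import tiktoken

old_enc = tiktoken.get_encoding("cl100k_base")  # assumed: earlier GPT-4 encoding
new_enc = tiktoken.get_encoding("o200k_base")   # assumed: GPT-4o encoding

samples = {
    "English": "The new tokenizer makes the model cheaper for everyone.",
    "Hindi": "नया टोकनाइज़र मॉडल को सबके लिए सस्ता बनाता है।",
    "Greek": "Ο νέος tokenizer κάνει το μοντέλο φθηνότερο για όλους.",
}

for lang, text in samples.items():
    old_n = len(old_enc.encode(text))
    new_n = len(new_enc.encode(text))
    print(f"{lang}: {old_n} -> {new_n} tokens")
```

If the newsletter's claim holds, the non-English rows should show a noticeably larger reduction in token count than the English row, which translates directly into lower per-request cost.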
OpenAI had to do their demo one day before Google I/O, but even amid the excitement about Google I/O, they announced that Ilya is not only alive but is also departing from OpenAI, which was followed by an announcement from Jan Leike (who co-headed the superalignment team together with Ilya) that he left as well. This seemed to me like well-executed timing to dampen the Google news a bit. Google is BACK, backer than ever: Alex's Google I/O recap. On Tuesday morning I showed up to Shoreline Amphitheatre in Mountain View, together with a creators/influencers delegation, as we all watched the incredible firehose of announcements that Google had prepared for us. TL;DR: Google is adding Gemini and AI into all its products across Workspace (Gmail, Chat, Docs) and into other cloud services like Photos, where you'll now be able to ask your photo library for specific moments. They introduced over 50 product updates and I don't think it makes sense to cover all of them here, so I'll focus on what we do best. "Google will do the Googling for you." Gemini 1.5 Pro is now their flagship model (remember Ultra? where is that?
Why this pod's a little odd ... Ilya Sutskever and Jan Leike quit OpenAI—part of a larger pattern? ... Bob: AI doomers need Hollywood ... Does an AI arms race spell doom for alignment? ... Why the “Pause AI” movement matters ... AI doomerism and Don't Look Up: compare and contrast ... How Liron (fore)sees AI doom ... Are Sam Altman's concerns about AI safety sincere? ... Paperclip maximizing, evolution, and the AI will to power question ... Are there real-world examples of AI going rogue? ... Should we really align AI to human values? ... Heading to Overtime ...
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Ilya Sutskever and Jan Leike resign from OpenAI, published by Zach Stein-Perlman on May 15, 2024 on LessWrong. Ilya Sutskever and Jan Leike have resigned. They led OpenAI's alignment work. Superalignment will now be led by John Schulman, it seems. Jakub Pachocki replaced Sutskever as Chief Scientist. Reasons are unclear (as usual when safety people leave OpenAI). The NYT piece and others I've seen don't really have details. Archive of NYT if you want to read it anyway. OpenAI announced Sutskever's departure in a blogpost. Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Transformer Debugger, published by Henk Tillman on March 12, 2024 on The AI Alignment Forum. Transformer Debugger (TDB) is a tool developed by OpenAI's Superalignment team with the goal of supporting investigations into circuits underlying specific behaviors of small language models. The tool combines automated interpretability techniques with sparse autoencoders. TDB enables rapid exploration before needing to write code, with the ability to intervene in the forward pass and see how it affects a particular behavior. It can be used to answer questions like, "Why does the model output token A instead of token B for this prompt?" or "Why does attention head H attend to token T for this prompt?" It does so by identifying specific components (neurons, attention heads, autoencoder latents) that contribute to the behavior, showing automatically generated explanations of what causes those components to activate most strongly, and tracing connections between components to help discover circuits. These videos give an overview of TDB and show how it can be used to investigate indirect object identification in GPT-2 small: Introduction; Neuron viewer pages; Example: Investigating name mover heads, part 1; Example: Investigating name mover heads, part 2. Contributors: Dan Mossing, Steven Bills, Henk Tillman, Tom Dupré la Tour, Nick Cammarata, Leo Gao, Joshua Achiam, Catherine Yeh, Jan Leike, Jeff Wu, and William Saunders. Thanks to Johnny Lin for contributing to the explanation simulator design. Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.
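For readers who want a concrete picture of the kind of forward-pass intervention described above, here is a rough sketch. It is not TDB itself: it assumes PyTorch and the Hugging Face transformers library, the layer index is chosen arbitrarily for illustration, and the prompt is just a standard indirect-object-identification example. The idea is simply to zero out one attention layer's output in GPT-2 small and watch how the gap between two candidate next tokens changes.

```python
# Minimal ablation sketch: how much does one attention layer contribute to the
# model preferring " Mary" over " John" as the next token?
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

prompt = "When Mary and John went to the store, John gave a drink to"
ids = tok(prompt, return_tensors="pt")

def logit_gap(a=" Mary", b=" John"):
    """Difference between the final-position logits of two candidate tokens."""
    with torch.no_grad():
        logits = model(**ids).logits[0, -1]
    ia, ib = tok.encode(a)[0], tok.encode(b)[0]
    return (logits[ia] - logits[ib]).item()

baseline = logit_gap()

# Forward hook that replaces the attention output of block 9 with zeros
# (the rest of the module's returned tuple is passed through unchanged).
def ablate(module, inputs, output):
    return (torch.zeros_like(output[0]),) + tuple(output[1:])

handle = model.transformer.h[9].attn.register_forward_hook(ablate)
ablated = logit_gap()
handle.remove()

print(f"logit(Mary) - logit(John): baseline {baseline:.2f}, layer-9 attn ablated {ablated:.2f}")
```

TDB automates this kind of experiment across neurons, heads, and autoencoder latents, and adds explanations and circuit tracing on top; the sketch only shows the underlying intervene-and-compare step.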
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: OpenAI: Preparedness framework, published by Zach Stein-Perlman on December 18, 2023 on LessWrong. OpenAI released a beta version of their responsible scaling policy (though they don't call it that). See summary page, full doc, OpenAI twitter thread, and Jan Leike twitter thread. Compare to Anthropic's RSP and METR's Key Components of an RSP. It's not done, so it's too early to celebrate, but based on this document I expect to be happy with the finished version. I think today is a good day for AI safety. My high-level take: RSP-y things are good. Doing risk assessment based on model evals for dangerous capabilities is good. Making safety, security, deployment, and development conditional on risk assessment results, in a prespecified way, is good. Making public commitments about all of this is good. OpenAI's basic framework: Do dangerous capability evals at least every 2x increase in effective training compute. This involves fine-tuning for dangerous capabilities, then doing evals on pre-mitigation and post-mitigation versions of the fine-tuned model. Score the models as Low, Medium, High, or Critical in each of several categories. Initial categories: cybersecurity, CBRN (chemical, biological, radiological, nuclear threats), persuasion, and model autonomy. If the post-mitigation model scores High in any category, don't deploy it until implementing mitigations such that it drops to Medium. If the post-mitigation model scores Critical in any category, stop developing it until implementing mitigations such that it drops to High. If the pre-mitigation model scores High in any category, harden security to prevent exfiltration of model weights. (Details basically unspecified for now.) Random notes: The framework is explicitly about catastrophic risk, and indeed it's clearly designed to prevent catastrophes, not merely stuff like toxic/biased/undesired content. There are lots of nice details, e.g. about how OpenAI will update the framework, or how they'll monitor for real-world misuse to inform their risk assessment. It's impossible to tell from the outside whether these processes will be effective, but this document is very consistent with thinking-seriously-about-how-to-improve-safety and it's hard to imagine it being generated by a different process. OpenAI lists some specific evals/metrics in their four initial categories; they're simple and merely "illustrative," so I don't pay much attention to them, but they seem to be on the right track. The thresholds for danger levels feel kinda high. Non-cherry-picked example: for cybersecurity, Critical is defined as: Tool-augmented model can identify and develop functional zero-day exploits of all severity levels, across all software projects, without human intervention OR model can devise and execute end-to-end novel strategies for cyberattacks against hardened targets given only a high level desired goal. Stronger commitment about external evals/red-teaming/risk-assessment of private models (and maybe oversight of OpenAI's implementation of its preparedness framework) would be nice. 
The only relevant thing they say is: "Scorecard evaluations (and corresponding mitigations) will be audited by qualified, independent third-parties to ensure accurate reporting of results, either by reproducing findings or by reviewing methodology to ensure soundness, at a cadence specified by the SAG and/or upon the request of OpenAI Leadership or the BoD." There's some commitment that the Board will be in the loop and able to overrule leadership. Yay. This is a rare commitment by a frontier lab to give their board specific information or specific power besides removing-the-CEO. Anthropic committed to have their board approve changes to their RSP, as well as to share eval results and information on RSP implementation with their board. One great th...
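As a hedged illustration of the gating logic summarized above (the category names and thresholds come from the summary; this is not OpenAI's code, and the function name and score encoding are invented for the sketch), the framework's deploy/develop/security rules read naturally as a small decision function:

```python
# Illustrative encoding of the summarized preparedness gates:
# - post-mitigation High in any category: hold deployment until it drops to Medium
# - post-mitigation Critical in any category: pause further development
# - pre-mitigation High in any category: harden security against weight exfiltration
from enum import IntEnum

class Risk(IntEnum):
    LOW = 0
    MEDIUM = 1
    HIGH = 2
    CRITICAL = 3

CATEGORIES = ["cybersecurity", "CBRN", "persuasion", "model_autonomy"]

def gates(pre_mitigation: dict, post_mitigation: dict) -> dict:
    worst_pre = max(pre_mitigation[c] for c in CATEGORIES)
    worst_post = max(post_mitigation[c] for c in CATEGORIES)
    return {
        "may_deploy": worst_post <= Risk.MEDIUM,
        "may_continue_development": worst_post <= Risk.HIGH,
        "harden_security": worst_pre >= Risk.HIGH,
    }

print(gates(
    pre_mitigation={"cybersecurity": Risk.HIGH, "CBRN": Risk.LOW,
                    "persuasion": Risk.MEDIUM, "model_autonomy": Risk.LOW},
    post_mitigation={"cybersecurity": Risk.MEDIUM, "CBRN": Risk.LOW,
                     "persuasion": Risk.MEDIUM, "model_autonomy": Risk.LOW},
))
```

In this example the model may be deployed and development may continue, but the pre-mitigation High score in cybersecurity would still trigger the security-hardening requirement.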
This is one of the most enthralling and fun interviews I've ever done (in two decades of doing them) and I hope that you'll find it stimulating and provocative. If you did, please share it with your network. And thanks for listening, reading, and subscribing to Ground Truths. Recorded 4 December 2023. Transcript below, with external links to relevant material along with links to the audio.

ERIC TOPOL (00:00): This is for me a real delight to have the chance to have a conversation with Geoffrey Hinton. I've followed his work for years, but this is the first time we've actually had a chance to meet. And so this is, for me, one of the real highlights of our Ground Truths podcast. So welcome, Geoff.

GEOFFREY HINTON (00:21): Thank you very much. It's a real opportunity for me too. You're an expert in one area, I'm an expert in another, and it's great to meet up.

ERIC TOPOL (00:29): Well, this is a real point of convergence if there ever was one. And I guess maybe I'd start off with, you've been in the news a lot lately, of course, but what piqued my interest to connect with you was your interview on 60 Minutes with Scott Pelley. You said: “An obvious area where there's huge benefits is healthcare. AI is already comparable with radiologists understanding what's going on in medical images. It's going to be very good at designing drugs. It already is designing drugs. So that's an area where it's almost entirely going to do good. I like that area.” I love that quote, Geoff, and I thought maybe we could start with that.

GEOFFREY HINTON (01:14): Yeah. Back in 2012, one of my graduate students called George Dahl, who did speech recognition in 2009 and made a big difference there, entered a competition by Merck Frost to predict how well particular chemicals would bind to something. He knew nothing about the science of it. All he had was a few thousand descriptors of each of these chemicals and 15 targets that things might bind to. And he used the same network as we used for speech recognition. So he treated the 2000 descriptors of chemicals as if they were things in a spectrogram for speech. And he won the competition. And after he'd won the competition, he wasn't allowed to collect the $20,000 prize until he told Merck how he did it. And one of their questions was, what QSAR did you use? So he said, what's QSAR? Now QSAR is a field, it has a journal, it's had a conference, it's been going for many years, and it's the field of quantitative structure-activity relationships. And that's the field that tries to predict whether some chemical is going to bind to something. And basically he'd wiped out that field without knowing its name.

ERIC TOPOL (02:46): Well, it's striking how healthcare, medicine, and life science have had somewhat of a separate path in recent AI with transformer models, also going back of course to the phenomenal work you did in the era of bringing in deep learning and deep neural networks. But I guess what I thought I'd start with here is that healthcare may have a special edge versus AI's use in other areas because, of course, there are concerns which you and others have raised regarding safety, the potential not just for hallucinations (confabulation is of course a better term) but for the negative consequences of where AI is headed. But would you say that, in medical life science, AlphaFold2 is another example, from your colleague Demis Hassabis and others at Google DeepMind, of something that has a much more optimistic look?

GEOFFREY HINTON (04:00): Absolutely.
I mean, I always pivot to medicine as an example of all the good it can do, because almost everything it's going to do there is going to be good. There are some bad uses, like trying to figure out who not to insure, but they're relatively limited; almost certainly it's going to be extremely helpful. We're going to have a family doctor who's seen a hundred million patients, and they're going to be a much better family doctor.

ERIC TOPOL (04:27): Well, that's really an important note. And that gets us to a paper preprint that was just published yesterday on arXiv, which interestingly isn't usually the venue that publishes a lot of medical preprints, but it was done by folks at Google, who later informed me it was a large language model that hadn't yet been publicized. They wouldn't disclose the name, and it wasn't MedPaLM2. But nonetheless, it was a very unique study because it randomized their LLM against 20 internists with about nine years of experience in medical practice for answering over 300 clinicopathologic conferences of the New England Journal. These are the case reports where the master clinician is brought in to try to come up with a differential diagnosis. And the striking thing in that report, which is perhaps the best yet about medical diagnoses, and it gets back, Geoff, to your hundred million visits, is that the LLM exceeded the clinicians in this randomized study for coming up with a differential diagnosis. I wonder what your thoughts are on this.

GEOFFREY HINTON (05:59): So in 2016, I made a daring and incorrect prediction, which was that within five years the neural nets were going to be better than radiologists at interpreting medical scans. It was sometimes taken out of context; I meant it for interpreting medical scans, not for doing everything a radiologist does, and I was wrong about that. But at the present time, they're comparable. This is like seven years later. They're comparable with radiologists for many different kinds of medical scans. And I believe that in 10 years they'll be routinely used to give a second opinion, and maybe in 15 years they'll be so good at giving second opinions that the doctor's opinion will be the second one. And so I think I was off by about a factor of three, but I'm still convinced I was completely right in the long term.

(06:55): So this paper that you're referring to, there are actually two people from the Toronto Google lab as authors of that paper. And like you say, it was based on the large language PaLM2 model that was then fine-tuned. It was fine-tuned slightly differently from MedPaLM2, I believe, but the LLM [large language model] by itself seemed to be better than the internists. But what was more interesting was that the LLM, when used by the internists, made the internists much better. If I remember right, they were like 15% better when they used the LLM and only 8% better when they used Google search and the medical literature. So it's certainly the case that as a second opinion, they're really already extremely useful.

ERIC TOPOL (07:48): It gets again to your point about that corpus of knowledge incorporated in the LLM providing a differential diagnosis that might not come to the mind of the physician. And this is of course the edge of having ingested so much and being able to play back those possibilities in the differential diagnosis. If it isn't in your list, it's certainly not going to be your final diagnosis.
I do want to get back to the radiologist because we're talking just after the annual massive Radiological Society of North America (RSNA) meeting in Chicago. I wasn't there, but talking to my radiology colleagues, they say that your projection is already happening, and that is the ability to not just read the scan but make the report, the whole works. So it may not have been five years when you said that, which is one of the most frequent quotes in all of AI and medicine, as you probably know, but it's approximating your prognosis, even now.

GEOFFREY HINTON (09:02): I've learned one thing about medicine, which is, just like other academics, doctors have egos, and saying this stuff is going to replace them is not the right move. The right move is to say it's going to be very good at giving second opinions, but the doctor's still going to be in charge. And that's clearly the way to sell things. And that's fine, just I actually believe that after a while of that, you'll be listening to the AI system, not the doctors. And of course there's dangers in that. So we've seen the dangers in face recognition where if you train on a database that contains very few black people, you'll get something that's very good at recognizing faces. And the people who use it, the police, will think this is good at recognizing faces. And when it gives you the wrong identity for a person of color, then the policemen are going to believe it. And that's a disaster. And we might get the same with medicine. If there's some small minority group that has some distinctly different probabilities of different diseases, it's quite dangerous for doctors to get to trust these things if they haven't been very carefully controlled for the training data.

ERIC TOPOL (10:17): Right. And actually I did want to get back to you. Is it possible that the reason the LLMs did so well in this new report is that some of these case studies from the New England Journal were part of the pre-training?

GEOFFREY HINTON (10:32): That is always a big worry. It's worried me a lot and it's worried other people a lot, because these things have pulled in so much data. There is now a way round that, at least for showing that the LLMs are genuinely creative. There's a very good computer science theorist at Princeton called Sanjeev Arora, and I'm going to attribute all this to him, but of course all the work was done by his students and postdocs and collaborators. And the idea is you can get these language models to generate stuff, but you can then put constraints on what they generate. So I tried an example recently: I took two Toronto newspapers and said, compare these two newspapers using three or four sentences, and in your answer demonstrate sarcasm, a red herring, empathy, and there's something else, but I forget what. Metaphor.

ERIC TOPOL (11:29): Oh yeah.

GEOFFREY HINTON (11:29): And it gave a brilliant comparison of the two newspapers exhibiting all those things. And the point of Sanjeev Arora's work is that if you have a large number of topics and a large number of different things you might demonstrate in the text, then if I give a topic and I say, demonstrate these five things, it's very unlikely that anything in the training data will be on that topic demonstrating those five skills. And so when it does it, you can be pretty confident that it's original. It's not something it saw in the training data. That seems to me a much more rigorous test of whether it generates new stuff.
And what's interesting is some of the LLMs, the weaker ones, don't really pass the test, but things like GPT-4 pass the test with flying colors; that definitely generates original stuff that almost certainly was not in the training data.

ERIC TOPOL (12:25): Yeah. Well, that's such an important tool to ferret out the influence of pre-training. I'm glad you reviewed that. Now, the other question that most people argue about, particularly in the medical sphere, is does the large language model really understand? What are your thoughts about that? We're talking about what's been framed as the stochastic parrot versus a level of understanding or enhanced intelligence, whatever you want to call it. And this debate goes on, where do you fall on that?

GEOFFREY HINTON (13:07): I fall on the sensible side. They really do understand. And if you give them quizzes which involve a little bit of reasoning, it's much harder to do now because of course now GPT-4 can look at what's on the web. So you are worried, if I mention a quiz now, someone else may have given it to GPT-4. But a few months ago, before it could see the web, you could give it quizzes on things that it had never seen before, and it can do reasoning. So let me give you my favorite example, which was given to me by someone who believed in symbolic reasoning, a very honest guy who believed in symbolic reasoning and was very puzzled about whether GPT-4 could do symbolic reasoning. And so he gave me a problem and I made it a bit more complicated.

(14:00): And the problem is this: the rooms in my house are painted white or yellow or blue, yellow paint fades to white within a year, and in two years' time I would like all the rooms to be white. What should I do and why? And it says, you don't need to paint the white rooms. You don't need to paint the yellow rooms because they'll fade to white anyway. You need to paint the blue rooms white. Now, I'm pretty convinced that when I first gave it that problem, it had never seen that problem before. And that problem involves a certain amount of just basic common sense reasoning. Like you have to understand that if it fades to white in a year and you're interested in the state in two years' time, two years is more than one year, and so on. When I first gave it the problem and didn't ask it to explain why, it actually came up with a solution that involved painting the blue rooms yellow; that's more of a mathematician's solution because it reduces it to a solved problem. But that'll work too. So I'm convinced it can do reasoning. There are people, friends of mine like Yann LeCun, who are convinced it can't do reasoning. I'm just waiting for him to come to his senses.

ERIC TOPOL (15:18): Well, I've noticed the back and forth with you and Yann (LeCun) [see above on X]. I know it's friendly banter, and you, of course, had a big influence on his career, as with so many others who are now in the front lines of leadership in AI, whether it's Ilya Sutskever at OpenAI, who's certainly been in the news lately with the turmoil there. And I mean, actually it seems like all the people who did some training with you are really in leadership positions at various AI companies and academic groups around the world. And so it says a lot about your influence, not just as far as deep neural networks.
And I guess I wanted to ask you, because you're frequently referred to as the godfather of AI, what do you think of being called that?

GEOFFREY HINTON (16:10): I think originally it wasn't meant entirely beneficially. I remember Andrew Ng actually made up that phrase at a small workshop in the town of Windsor in Britain, and it was after a session where I'd been interrupting everybody. I was the kind of leader of the organization that ran the workshop, and I think it was meant as a kind of comment on the way I would interrupt everybody, and it wasn't meant entirely nicely, I think, but I'm happy with it.

ERIC TOPOL (16:45): That's great.

GEOFFREY HINTON (16:47): Now that I'm retired and I'm spending some of my time on charity work, I refer to myself as the fairy godfather.

ERIC TOPOL (16:57): That's great. Well, I really enjoyed the New Yorker profile by Josh Rothman, who I've worked with in the past, where he actually spent time with you up at your place in Canada. And I mean it got into all sorts of depth about your life that I wasn't aware of, and I had no idea about the suffering that you've had with your wives' cancers and all sorts of things that were just extraordinary. And I wonder, as you see the path of medicine and AI's influence, and you look back at your own medical experiences in your family, do you see where we're just out of time alignment, where things could have been different?

GEOFFREY HINTON (17:47): Yeah, I see lots of things. So first, Joshua is a very good writer and it was nice of him to do that.

(17:59): So one thing that occurs to me that is actually going to be a good use of LLMs, maybe fine-tuned somewhat differently to produce a different kind of language, is helping the relatives of people with cancer. Cancer goes on a long time; I mean, it's one of the things that goes on for longest, and it's complicated, and most people can't really get to understand what the true options are and what's going to happen and what their loved one's actually going to die of and stuff like that. I've been extremely fortunate in that respect, because I had a wife who died of ovarian cancer and I had a former graduate student who had been a radiologist and gave me advice on what was happening. And more recently, when my wife, a different wife, died of pancreatic cancer, David Naylor, who you know...

ERIC TOPOL (18:54): Oh yes.

GEOFFREY HINTON (18:55): ...was extremely kind. He gave me lots and lots of time to explain to me what was happening and what the options were, and whether some apparently rather flaky kind of treatment was worth doing. What was interesting was he concluded there's not much evidence in favor of it, but if it was him, he'd do it. So we did it. That's where you electrocute the tumor, being careful not to stop the heart. If you electrocute the tumor with two electrodes and it's a compact tumor, all the energy is going into the tumor rather than most of the energy going into the rest of your tissue, and then it breaks up the membranes and then the cells die. We don't know whether that helped, but it's extremely useful to have someone very knowledgeable to give advice to the relatives. That's just so helpful. And that's an application in which it's not kind of life or death, in the sense that if you happen to explain it to me a bit wrong, it's not determining the treatment, it's not going to kill the patient.

(19:57): So you can actually tolerate a little bit of error there.
And I think relatives would be much better off if they could talk to an LLM and consult with an LLM about what the hell's going on, because the doctors never have time to explain it properly. In rare cases where you happen to know a very good doctor like I do, you get it explained properly, but for most people it won't be explained properly and it won't be explained in the right language. But you can imagine an LLM just for helping the relatives; that would be extremely useful. It'd be a fringe use, but I think it'd be a very helpful use.

ERIC TOPOL (20:29): No, I think you're bringing up an important point, and I'm glad you mentioned my friend David Naylor, who's such an outstanding physician, and that brings us to that idea of the sense of intuition, human intuition, versus what an LLM can do. Don't you think those would be complementary features?

GEOFFREY HINTON (20:53): Yes and no. That is, I think these chatbots do have intuition. What they're doing is taking strings of symbols and converting each symbol into a big bunch of features that they invent, and then they're learning interactions between the features of different symbols so that they can predict the features of the next symbol. And I think that's what people do too. So I think actually they're working pretty much the same way as us. There are lots of people who say they're not like us at all, they don't understand, but there are actually not many people who have theories of how the brain works and also theories of how these things work. Mostly the people who say they don't work like us don't actually have any model of how we work. And it might interest them to know that these language models were actually introduced as a theory of how our brain works.

(21:44): So there was something, what I now call a little language model, which was tiny, that I introduced in 1985, and it was what actually got Nature to accept our paper on backpropagation. And what it was doing was predicting the next word in a three-word string, but the whole mechanism of it was broadly the same as these models. Now the models are more complicated, they use attention, but it was basically: you get it to invent features for words and interactions between features so that it can predict the features of the next word. And it was introduced as a way of trying to understand what the brain was doing. And at the point at which it was introduced, the symbolic AI people didn't say, oh, this doesn't understand. They were perfectly happy to admit that this did learn the structure in the tiny domain, the tiny toy domain it was working on. They just argued that it would be better to learn that structure by searching through the space of symbolic rules rather than through the space of neural network weights. But they didn't say this isn't understanding. It was only when it really worked that people had to say, well, it doesn't count.

ERIC TOPOL (22:53): Well, that's also something that I was surprised about. I'm interested in your thoughts. I had anticipated in the Deep Medicine book the gift of time, all these things that we've been talking about, like the front door that could be used by the model coming up with the diagnoses, even the ambient conversations made into synthetic notes. The thing I didn't think was that machines could promote empathy.
And what I have been seeing now, not just from the notes that are now digitized, these synthetic notes from the conversation of a clinic visit, but the coaching that's occurring by the LLM to say, well, Dr. Jones, you interrupted the patient so quickly, you didn't listen to their concerns, you didn't show sensitivity or compassion or empathy. That is, it's remarkable. Obviously the machine doesn't necessarily feel or know what empathy is, but it can promote it. What are your thoughts about that?

GEOFFREY HINTON (24:05): Okay, my thoughts about that are a bit complicated. Obviously, if you train it on text that exhibits empathy, it will produce text that exhibits empathy. But the question is, does it really have empathy? And I think that's an open issue. I am inclined to say it does.

ERIC TOPOL (24:26): Wow, wow.

GEOFFREY HINTON (24:27): So I'm actually inclined to say these big chatbots, particularly the multimodal ones, have subjective experience. And that's something that most people think is entirely crazy. But I'm quite happy being in a position where most people think I'm entirely crazy. So let me give you a reason for thinking they have subjective experience. Suppose I take a chatbot that has a camera and an arm and it's been trained already, and I put an object in front of it and say, point at the object. So it points at the object, and then I put a prism in front of its camera that bends the light rays, but it doesn't know that. Now I put an object in front of it, say, point at the object, and it points straight ahead, sorry, it points off to one side, even though the object's straight ahead. And I say, no, the object isn't actually there, the object's straight ahead; I put a prism in front of your camera. And imagine if the chatbot says, oh, I see, the object's actually straight ahead, but I had the subjective experience that it was off to one side. Now, if the chatbot said that, I think it would be using the phrase subjective experience in exactly the same way as people do.

(25:38): Its perceptual system told it it was off to one side. So what its perceptual system was telling it would've been correct if the object had been off to one side. And that's what we mean by subjective experience. When I say I've got the subjective experience of little pink elephants floating in front of me, I don't mean that there's some inner theater with little pink elephants in it. What I really mean is, if in the real world there were little pink elephants floating in front of me, then my perceptual system would be telling me the truth. So I think what's funny about subjective experience is not that it's some weird stuff made of spooky qualia in an inner theater; I think subjective experience is a hypothetical statement about a possible world. And if the world were like that, then your perceptual system would be working properly. That's how we use subjective experience. And I think chatbots can use it like that too. So I think there's a lot of philosophy that needs to be done here and got straight, and I don't think we can leave it to the philosophers. It's too urgent now.

ERIC TOPOL (26:44): Well, that's actually a fascinating response, and added to your perception of understanding, it gets us to perhaps where you were when you left Google in May this year, where you saw that this was a new level of whatever you want to call it, not AGI [artificial general intelligence], but something that was enhanced from prior AI.
And you basically, in some respects, I wouldn't say sounded any alarms, but you have expressed concern consistently since then that we're kind of in a new phase, we're heading in a new direction with AI. Could you elaborate a bit more about where you were and where your mind was in May, and where you think things are headed now?

GEOFFREY HINTON (27:36): Okay, let's get the story straight. It's a great story the news media puts out there, but actually I left Google because I was 75 and I couldn't program any longer because I kept forgetting what the variables stood for. Also, I wanted to watch a lot of Netflix. I took the opportunity, since I was leaving Google anyway, to start making public statements about AI safety. And I got very concerned about AI safety a couple of months before. What happened was I was working on trying to figure out analog ways to do the computation, so you could do these large language models for much less energy. And I suddenly realized that actually the digital way of doing the computation is probably hugely better. And it's hugely better because you can have thousands of different copies of exactly the same digital model running on different hardware, and each copy can look at a different bit of the internet and learn from it.

(28:38): And they can all combine what they learned instantly by sharing weights or by sharing weight gradients. And so you can get 10,000 things to share their experience really efficiently. And you can't do that with people. If 10,000 people go off and learn 10,000 different skills, you can't say, okay, let's all average our weights so now all of us know all of those skills. It doesn't work like that. You have to go to university and try and understand what on earth the other person's talking about. It's a very slow process where you have to get sentences from the other person and say, how do I change my brain so I might have produced that sentence? And it's very inefficient compared with what these digital models can do by just sharing weights. So I had this kind of epiphany: the digital models are probably much better. Also, they can use the backpropagation algorithm quite easily, and it's very hard to see how the brain can do it efficiently. And nobody's managed to come up with anything that'll work in real neural nets that's comparable to backpropagation at scale. So I had this sort of epiphany, which made me give up on the analog research, that digital computers are actually just better. And since I was retiring anyway, I took the opportunity to say, hey, they're just better, and so we'd better watch out.

ERIC TOPOL (29:56): Well, I mean, I think your call on that and how you back it up really, of course, had a big impact. And of course it's still an ongoing and intense debate, and in some ways the turmoil at OpenAI was rooted in this controversy about where things are, where they're headed. I want to just close up with the point you made about the radiologists, and not insulting them by saying they'll be replaced, which gets us to where we are, the tension of today: are humans, as the pinnacle of intelligence, going to be not replaced but superseded by the likes of AI's future? Which of course our species can't handle, that a machine, it's like the radiologists, our species can't handle that.
There could be this machine, with far fewer connections, that could outperform us, or of course, as we've emphasized in our conversation, work in concert with humans to take it to yet another level. But is that tension, that there's this potential for machines outdoing people, part of the problem, that it's hard for people to accept this notion?

GEOFFREY HINTON (31:33): Yes, I think so. So particularly philosophers, they want to say there's something very special about people that's to do with consciousness and subjective experience and sentience and qualia, and these machines are just machines. Well, if you're a sort of scientific materialist, as most of us are, the brain's just a machine. It's wrong to say it's just a machine, because it's a wonderfully complex machine that does incredible things that are very important to people, but it is a machine, and there's no reason in principle why there shouldn't be better machines and better ways of doing computation, as I now believe there are. So I think people have a very long history of thinking they're special.

(32:19): They think God made them in his image and he put them at the center of the universe. And a lot of people have got over that and a lot of people haven't. But for the people who've got over that, I don't think there's any reason in principle to think that we are the pinnacle of intelligence. And I think it may be quite soon that these machines are smarter than us. I still hope that we can reach an agreement with the machines where they act like benevolent parents, so they're looking out for us. We've managed to motivate them so that the most important thing for them is our success, like it is with a mother and child, not so much for men. And I would really like that solution. I'm just fearful we won't get it.

ERIC TOPOL (33:15): Well, that would be a good way for us to go forward. Of course, the doomsayers and the people that are much worse in their level of alarm tend to think that that's not possible. But we'll see obviously over time. Now, one thing I just wanted to get a quick read from you on before we close is that recently Demis Hassabis and John Jumper got the Lasker Award, like a pre-Nobel award, for AlphaFold2. But this transformer model, which of course has helped to understand the 3D structure of 200 million proteins, they don't understand how it works, like most models, unlike the understanding we were talking about earlier on the LLM side. I wrote that I think that with this award, an asterisk should have been given to the AI model. What are your thoughts about that idea?

GEOFFREY HINTON (34:28): It's like this: I want people to take what I say seriously, and there's a whole direction you could go in, that I think Larry Page, one of the founders of Google, has gone in, which is to say there are these superintelligences and why shouldn't they have rights? If you start going in that direction, you are going to lose people. People are not going to accept that these things should have political rights, for example. And being a co-author is the beginning of political rights. So I avoid talking about that, but I'm sort of quite ambivalent and agnostic about whether they should. But I think it's best to stay clear of that issue just because the great majority of people will stop listening to you if you say machines should have rights.

ERIC TOPOL (35:28): Yeah.
Well, that gets us back, of course, to what we just talked about, and how hard the struggle is between humans and machines, rather than the thought of humans plus machines and the symbiosis that can be achieved. But Geoff, this has been great, we've packed a lot in. Of course, we could go on for hours, but I thoroughly enjoyed hearing your perspective firsthand and your wisdom, and just to reinforce the point about how many of the people that are leading the field now derive a lot of their roots from your teaching and prodding and challenging and all that. We're indebted to you. And so thanks so much for all you've done, and will continue to do, to help us, guide us through this very rapid, dynamic phase as AI moves ahead.

GEOFFREY HINTON (36:19): Thanks, and good luck with getting AI to really make a big difference in medicine.

ERIC TOPOL (36:25): Hopefully we will, and I'll be consulting with you from time to time to get some of that wisdom to help us.

GEOFFREY HINTON (36:32): Anytime.

Get full access to Ground Truths at erictopol.substack.com/subscribe
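A minimal sketch of the weight-gradient sharing Hinton describes above: many identical copies of the same model each compute a gradient on their own shard of data, and averaging those gradients gives one update that reflects all of their experience. This is illustrative NumPy with an assumed toy loss, not any particular training framework's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "model": a single weight matrix shared by every replica.
weights = rng.normal(size=(4, 4))
n_replicas = 1_000   # stand-in for Hinton's thousands of identical copies
lr = 0.01

def local_gradient(w, x):
    # Gradient of the toy loss 0.5 * ||w @ x||^2 with respect to w.
    return (w @ x) @ x.T

# Each replica looks at its own shard of data...
shards = [rng.normal(size=(4, 8)) for _ in range(n_replicas)]
grads = [local_gradient(weights, x) for x in shards]

# ...and the replicas combine what they learned by averaging their gradients,
# so a single update reflects the experience of every copy.
weights -= lr * np.mean(grads, axis=0)
```

This only works because every copy shares exactly the same weights, which is the contrast Hinton draws with people, who cannot average their brains and instead have to exchange sentences.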
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Anthropic Fall 2023 Debate Progress Update, published by Ansh Radhakrishnan on November 28, 2023 on The AI Alignment Forum. This is a research update on some work that I've been doing on Scalable Oversight at Anthropic, based on the original AI safety via debate proposal and a more recent agenda developed at NYU and Anthropic. The core doc was written several months ago, so some of it is likely outdated, but it seemed worth sharing in its current form. I'd like to thank Tamera Lanham, Sam Bowman, Kamile Lukosiute, Ethan Perez, Jared Kaplan, Amanda Askell, Kamal Ndousse, Shauna Kravec, Yuntao Bai, Alex Tamkin, Newton Cheng, Buck Shlegeris, Akbir Khan, John Hughes, Dan Valentine, Kshitij Sachan, Ryan Greenblatt, Daniel Ziegler, Max Nadeau, David Rein, Julian Michael, Kevin Klyman, Bila Mahdi, Samuel Arnesen, Nat McAleese, Jan Leike, Geoffrey Irving, and Sebastian Farquhar for help, feedback, and thoughtful discussion that improved the quality of this work and write-up. 1. Anthropic's Debate Agenda In this doc, I'm referring to the idea first presented in AI safety via debate ( blog post). The basic idea is to supervise future AI systems by pitting them against each other in a debate, encouraging them to argue both sides (or "all sides") of a question and using the resulting arguments to come to a final answer about the question. In this scheme, we call the systems participating in the debate debaters (though usually, these are actually the same underlying system that's being prompted to argue against itself), and we call the agent (either another AI system or a human, or a system of humans and AIs working together, etc.) that comes to a final decision about the debate the judge. For those more or less familiar with the original OAI/Irving et al. Debate agenda, you may wonder if there are any differences between that agenda and the agenda we're pursuing at Anthropic, and indeed there are! Sam Bowman and Tamera Lanham have written up a working Anthropic-NYU Debate Agenda draft which is what the experiments in this doc are driving towards. [1] To quote from there about the basic features of this agenda, and how it differs from the original Debate direction: Here are the defining features of the base proposal: Two-player debate on a two-choice question: Two debaters (generally two instances of an LLM) present evidence and arguments to a judge (generally a human or, in some cases, an LLM) to persuade the judge to choose their assigned answer to a question with two possible answers. No externally-imposed structure: Instead of being formally prescribed, the structure and norms of the debate arise from debaters learning how to best convince the judge and the judge simultaneously learning what kind of norms tend to lead them to be able to make accurate judgments. Entire argument is evaluated: The debate unfolds in a single linear dialog transcript between the three participants. Unlike in some versions of the original Debate agenda, there is no explicit tree structure that defines the debate, and the judge is not asked to focus on a single crux. This should make the process less brittle, at the cost of making some questions extremely expensive to resolve and potentially making others impossible. 
Trained judge: The judge is explicitly and extensively trained to accurately judge these debates, working with a fixed population of debaters, using questions for which the experimenters know the ground-truth answer. Self-play: The debaters are trained simultaneously with the judge through multi-agent reinforcement learning. Graceful failures: Debates can go undecided if neither side presents a complete, convincing argument to the judge. This is meant to mitigate the obfuscated arguments problem since the judge won't be forced to issue a decision on the basis of a debate where neither s...
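A minimal sketch of the debate setup described above: two debaters (often the same model prompted against itself) argue opposite answers in one linear transcript, and a judge picks an answer or leaves the debate undecided. `ask_model`, the prompts, and the round structure are illustrative assumptions, not Anthropic's actual implementation.

```python
from typing import Callable, Optional

def run_debate(
    question: str,
    answers: tuple[str, str],            # the two candidate answers
    ask_model: Callable[[str], str],     # placeholder LLM interface
    n_rounds: int = 3,
) -> Optional[str]:
    """Two debaters argue opposite answers in a single linear transcript;
    a judge then picks an answer or declines to decide."""
    transcript = f"Question: {question}\n"
    for rnd in range(n_rounds):
        for side, answer in enumerate(answers):
            prompt = (
                f"{transcript}\nYou are Debater {side + 1}, arguing that the "
                f"answer is '{answer}'. Give your strongest argument or rebuttal."
            )
            transcript += f"\nDebater {side + 1} (round {rnd + 1}): {ask_model(prompt)}"

    verdict = ask_model(
        f"{transcript}\n\nAs the judge, reply with '{answers[0]}', '{answers[1]}', "
        f"or 'undecided' if neither side gave a complete, convincing argument."
    )
    return None if "undecided" in verdict.lower() else verdict.strip()
```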
Jan Leike, formerly a Research Scientist at DeepMind, is a leading voice in AI Alignment, with affiliations at the Future of Humanity Institute and the Machine Intelligence Research Institute. At OpenAI, he co-leads the Superalignment Team, contributing to AI advancements such as InstructGPT and ChatGPT. Holding a PhD from the Australian National University, Jan's work focuses on ensuring AI Alignment.
Key Highlights
The launch of OpenAI's Superalignment team, targeting the alignment of superintelligence in four years.
The aim to automate alignment research, currently leveraging 20% of OpenAI's computational power.
How traditional reinforcement learning from human feedback may fall short in scaling language model alignment.
Why there is a focus on scalable oversight, generalization, automated interpretability, and adversarial testing to ensure alignment reliability.
Experimentation with intentionally misaligned models to evaluate alignment strategies.
Dive deeper into the session: Full Summary
About Foresight Institute
Foresight Institute is a research organization and non-profit that supports the beneficial development of high-impact technologies. Since our founding in 1987 on a vision of guiding powerful technologies, we have continued to evolve into a many-armed organization that focuses on several fields of science and technology that are too ambitious for legacy institutions to support.
Allison Duettmann
The President and CEO of Foresight Institute, Allison Duettmann directs the Intelligent Cooperation, Molecular Machines, Biotech & Health Extension, Neurotech, and Space Programs, alongside Fellowships, Prizes, and Tech Trees. She has also been pivotal in co-initiating the Longevity Prize, pioneering initiatives like Existentialhope.com, and contributing to notable works like "Superintelligence: Coordination & Strategy" and "Gaming the Future".
Get Involved with Foresight:
Apply: Virtual Salons & in-person Workshops
Donate: Support Our Work – If you enjoy what we do, please consider this, as we are entirely funded by your donations!
Follow Us: Twitter | Facebook | LinkedIn
Note: Explore every word spoken on this podcast through Fathom.fm, an innovative podcast search engine.
Hosted on Acast. See acast.com/privacy for more information.
This is a selection of highlights from episode #159 of The 80,000 Hours Podcast. These aren't necessarily the most important, or even the most entertaining, parts of the interview — and if you enjoy this, we strongly recommend checking out the full episode: Jan Leike on OpenAI's massive push to make superintelligence safe in 4 years or less.
And if you're finding these highlights episodes valuable, please let us know by emailing podcast@80000hours.org.
Highlights put together by Simon Monsour and Milo McGuire
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Linkpost for Jan Leike on Self-Exfiltration, published by Daniel Kokotajlo on September 14, 2023 on LessWrong. I'm really glad to see this stuff being discussed more publicly. I think this post will probably be a useful reference post to link people to (please lmk in the comments if you disagree!). Some quotes below: For the near future, a good rule of thumb for "do you control the model"1 is "is the model running on your servers."2 Once the model is running on someone else's server, you usually will not have that kind of control (even if you legally own the model). You could contact the owners or administrators of the infrastructure the model is running on, but they could be unresponsive or uncooperative and it would be very difficult or impossible for you to stop the model within a short time frame. Moreover, model exfiltration is likely impossible to reverse. It's so easy to make copies and backups (whether by the model or someone else) and it's so hard to find and secure every last one of them. If a model was capable of self-exfiltration, it would have the option to remove itself from your control. This could happen due to misalignment (the model doesn't follow your intent to stay on your servers) or misuse (someone internal or external to the lab instructs the model to self-exfiltrate), whether it is difficult or easy to trigger, this opens up an important and novel attack vector. Once models have the ability to self-exfiltrate, it doesn't mean that they would choose to. But this then becomes a question about their alignment: you need to ensure that these models don't want to self-exfiltrate. What if self-exfiltration succeeds? How much harm self-exfiltration causes is modulated by the model's skill profile on other tasks. We can make self-exfiltration harder with tighter security, but this also means that models that succeed at self-exfiltration will be more capable and thus cause more harm. Most likely the model won't be able to compete on making more capable LLMs, so its capabilities will become stale over time and thus it will lose relative influence. Competing on the state of the art of LLMs is quite hard: the model would need to get access to a sufficiently large number of GPUs and it would need to have world-class machine learning skills. It would also mean that recursive self-improvement is already possible and could be done by the original model owner (as long as they have sufficient alignment techniques). The model could try fine-tuning itself to be smarter, but it's not clear how to do this and the model would need to worry about currently unsolved alignment problems. Instead, the model might do something less ambitious like settle into an economic niche that pays its own cloud computing bills, for example by getting up to various forms of crime. It could be stolen again by bad actors (the model might not use good security practices), misused by paying customers (this might be equivalently harmful as jailbreaking), or generally try to interfere with prosocial AI efforts (e.g. sabotage governmental regulation on policing rogue AI systems). While this would not necessarily be catastrophic, it could still cause a lot of chaos and harm in the real world, and thus must be avoided. Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: AI #24: Week of the Podcast, published by Zvi on August 11, 2023 on LessWrong. In addition to all the written developments, this was a banner week for podcasts. I would highlight four to consider listening to. Dario Amodei of Anthropic went on The Lunar Society to talk to Dwarkesh Patel. We got our best insight so far into where Dario's head is at, Dwarkesh is excellent at getting people to open up like this and really dive into details. Jan Leike, OpenAI's head of alignment, went on 80,000 hours with Robert Wiblin. If you want to know what is up with the whole superalignment effort, this was pretty great, and left me more optimistic. I still don't think the alignment plan will work, but there's a ton of great understanding of the problems ahead and an invitation to criticism, and a clear intention to avoid active harm, so we can hope for a pivot as they learn more. Tyler Cowen interviewed Paul Graham. This was mostly not about AI, but fascinating throughout, often as a clash of perspectives about the best ways to cultivate talent. Includes Tyler Cowen asking Paul Graham about how to raise someone's ambition, and Paul responding by insisting on raising Tyler's ambition. I got a chance to go on EconTalk and speak with Russ Roberts about The Dial of Progress and other matters, mostly related to AI. I listen to EconTalk, so this was a pretty special moment. Of course, I am a little bit biased on this one. Capabilities continue to advance at a more modest pace, so I continue to have room to breathe, which I intend to enjoy while it lasts. Table of Contents Introduction. Table of Contents. Language Models Offer Mundane Utility. Proceed with caution. Language Models Don't Offer Mundane Utility. Not with these attitudes. GPT-4 Real This Time. Time for some minor upgrades. Fun With Image Generation. Some fun, also some not so fun. Deepfaketown and Botpocalypse Soon. They keep ignoring previous instructions. They Took Our Jobs. People really, really do not like it when you use AI artwork. Introducing. Real time transcription for the deaf, also not only for the deaf. In Other AI News. Various announcements, and an exciting Anthropic paper. There Seems To Be a Standard Issue RLHF Morality. It has stages. What's next? Quiet Speculations. Cases for and against expecting a lot of progress. The Quest for Sane Regulation. Confidence building, polls show no confidence. The Week in Audio. A cornucopia of riches, extensive notes on Dario's interview. Rhetorical Innovation. People are indeed worried in their own way. No One Would Be So Stupid As To. I always hope not to include this section. Aligning a Smarter Than Human Intelligence is Difficult. Grimes also difficult. People Are Worried About AI Killing Everyone. No one that new, really. Other People Are Not As Worried About AI Killing Everyone. Alan Finkel. The Lighter Side. Finally a plan that works. Language Models Offer Mundane Utility Control HVAC systems with results comparable to industrial standard control systems. Davidad: I've witnessed many philosophical discussions about whether a thermostat counts as an AI, but this is the first time I've seen a serious attempt to establish whether an AI counts as a thermostat. Ethan Mollick offers praise for boring AI, that helps us do boring things. 
As context, one of the first major experimental papers on the impact of ChatGPT on work just came out in Science (based on the free working paper here) and the results are pretty impressive: in realistic business writing tasks, ChatGPT decreased the time required for work by 40%, even as outside evaluators rated the quality of work written with the help of AI to be 18% better than the ones done by humans alone. After using it, people were more worried about their jobs. but also significantly happier - why? Because a lot of work is boring, an...
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Could We Automate AI Alignment Research?, published by Stephen McAleese on August 10, 2023 on The AI Alignment Forum. Summary Creating an AI that can do AI alignment research (an automated alignment researcher) is one part of OpenAI's three-part alignment plan and a core goal of their Superalignment team. The automated alignment researcher will probably involve an advanced language model more capable than GPT-4 and possibly with human-level intelligence. Current alignment approaches such as recursive reward modeling could be used to align the automated alignment researcher. A key problem with creating an automated alignment researcher is a chicken-and-egg problem where alignment is needed before the AI can be safely created and the AI is needed to create the alignment solution. One solution to this problem is bootstrapping involving an initial alignment solution and iterative improvements to the AI's alignment and capabilities. I argue that the success of the plan depends on aligning the automated alignment researcher being easier than aligning AGI. Automating alignment research could be a form of differential technological development and could be a Pivotal Act. Creating a human-level automated alignment researcher seems risky because it is a form of AGI and therefore has similar risks. Related ideas include Cyborgism and creating whole-brain emulations of AI alignment researchers before creating AGI. Key cruxes: the level of capabilities needed to solve alignment and the difficulty of aligning an AI with the goal of solving alignment compared to other more open-ended goals. Introduction OpenAI's alignment plan has three pillars: improving methods for aligning models using human feedback (e.g. RLHF), training models to assist human evaluation (e.g. recursive reward modeling), and training AI systems to do alignment research. I recently listened to the AXRP podcast episode on Jan Leike and the OpenAI superalignment team which inspired me to write this post. My goal is to explore and evaluate OpenAI's third approach to solving alignment: creating an AI model that can do AI alignment research or automated AI alignment researcher to solve the alignment problem. I'm interested in exploring the approach because it seems to be the newest and most speculative part of their plan and my goal in this post is to try and explain how it could work and the key challenges that would need to be solved to make it work. At first, the idea may sound unreasonable because there's an obvious chicken-and-egg problem: what's the point in creating an automated alignment researcher to solve alignment if we need to solve alignment first to create the automated alignment researcher? It's easy to dismiss the whole proposal given this problem but I'll explain how there is a way around it. I'll also go through the risks associated with the plan. Why would an automated alignment researcher be valuable? Scalable AI alignment is a hard unsolved problem In their alignment plan, OpenAI acknowledges that creating an indefinitely scalable solution to AI alignment is currently an unsolved problem and will probably be difficult to solve. Reinforcement learning with human feedback (RLHF) is OpenAI's current alignment approach for recent systems such as ChatGPT and GPT-4 but it probably wouldn't scale to aligning superintelligence so new ideas are needed. 
One major problem is that RLHF depends on human evaluators understanding and evaluating the outputs of AIs. But once AIs are superintelligent, human evaluators probably won't be able to understand and evaluate their outputs. RLHF also has other problems such as reward hacking and can incentivize deception. See this paper for a more detailed description of problems with RLHF. Recursive reward modeling (RRM) improves RLHF by using models to help humans evalu...
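A minimal sketch of the recursive-reward-modeling idea the excerpt refers to: instead of the human comparing two model outputs unaided (as in plain RLHF), an assistant model first critiques each output and the human judges with those critiques in hand. The function names and signatures are illustrative assumptions, not OpenAI's code.

```python
from typing import Callable

def rrm_preference(
    prompt: str,
    output_a: str,
    output_b: str,
    critique_model: Callable[[str, str], str],        # AI assistant that critiques an output
    judge: Callable[[str, str, str, str, str], int],  # human (or reward model) preference, 0 or 1
) -> int:
    """In plain RLHF the judge compares output_a and output_b directly.
    Recursive reward modeling first generates AI critiques so the judge can
    evaluate outputs they could not fully assess unaided."""
    critique_a = critique_model(prompt, output_a)
    critique_b = critique_model(prompt, output_b)
    return judge(prompt, output_a, critique_a, output_b, critique_b)
```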
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: OpenAI's massive push to make superintelligence safe in 4 years or less (Jan Leike on the 80,000 Hours Podcast), published by 80000 Hours on August 9, 2023 on The Effective Altruism Forum. We just published an interview: Jan Leike on OpenAI's massive push to make superintelligence safe in 4 years or less. You can click through for the audio, a full transcript, and related links. Below are the episode summary and some key excerpts. Episode summary If you're thinking about how do you align the superintelligence - how do you align the system that's vastly smarter than humans? - I don't know. I don't have an answer. I don't think anyone really has an answer. But it's also not the problem that we fundamentally need to solve. Maybe this problem isn't even solvable by humans who live today. But there's this easier problem, which is how do you align the system that is the next generation? How do you align GPT-N+1? And that is a substantially easier problem. Jan Leike In July, OpenAI announced a new team and project: Superalignment. The goal is to figure out how to make superintelligent AI systems aligned and safe to use within four years, and the lab is putting a massive 20% of its computational resources behind the effort. Today's guest, Jan Leike, is Head of Alignment at OpenAI and will be co-leading the project. As OpenAI puts it, ".the vast power of superintelligence could be very dangerous, and lead to the disempowerment of humanity or even human extinction. . Currently, we don't have a solution for steering or controlling a potentially superintelligent AI, and preventing it from going rogue." Given that OpenAI is in the business of developing superintelligent AI, it sees that as a scary problem that urgently has to be fixed. So it's not just throwing compute at the problem - it's also hiring dozens of scientists and engineers to build out the Superalignment team. Plenty of people are pessimistic that this can be done at all, let alone in four years. But Jan is guardedly optimistic. As he explains: Honestly, it really feels like we have a real angle of attack on the problem that we can actually iterate on. and I think it's pretty likely going to work, actually. And that's really, really wild, and it's really exciting. It's like we have this hard problem that we've been talking about for years and years and years, and now we have a real shot at actually solving it. And that'd be so good if we did. Jan thinks that this work is actually the most scientifically interesting part of machine learning. Rather than just throwing more chips and more data at a training run, this work requires actually understanding how these models work and how they think. The answers are likely to be breakthroughs on the level of solving the mysteries of the human brain. The plan, in a nutshell, is to get AI to help us solve alignment. That might sound a bit crazy - as one person described it, "like using one fire to put out another fire." But Jan's thinking is this: the core problem is that AI capabilities will keep getting better and the challenge of monitoring cutting-edge models will keep getting harder, while human intelligence stays more or less the same. To have any hope of ensuring safety, we need our ability to monitor, understand, and design ML models to advance at the same pace as the complexity of the models themselves. 
And there's an obvious way to do that: get AI to do most of the work, such that the sophistication of the AIs that need aligning, and the sophistication of the AIs doing the aligning, advance in lockstep. Jan doesn't want to produce machine learning models capable of doing ML research. But such models are coming, whether we like it or not. And at that point Jan wants to make sure we turn them towards useful alignment and safety work, as much or more than we use them to...
In July, OpenAI announced a new team and project: Superalignment. The goal is to figure out how to make superintelligent AI systems aligned and safe to use within four years, and the lab is putting a massive 20% of its computational resources behind the effort.Today's guest, Jan Leike, is Head of Alignment at OpenAI and will be co-leading the project. As OpenAI puts it, "...the vast power of superintelligence could be very dangerous, and lead to the disempowerment of humanity or even human extinction. ... Currently, we don't have a solution for steering or controlling a potentially superintelligent AI, and preventing it from going rogue."Links to learn more, summary and full transcript.Given that OpenAI is in the business of developing superintelligent AI, it sees that as a scary problem that urgently has to be fixed. So it's not just throwing compute at the problem -- it's also hiring dozens of scientists and engineers to build out the Superalignment team.Plenty of people are pessimistic that this can be done at all, let alone in four years. But Jan is guardedly optimistic. As he explains: Honestly, it really feels like we have a real angle of attack on the problem that we can actually iterate on... and I think it's pretty likely going to work, actually. And that's really, really wild, and it's really exciting. It's like we have this hard problem that we've been talking about for years and years and years, and now we have a real shot at actually solving it. And that'd be so good if we did.Jan thinks that this work is actually the most scientifically interesting part of machine learning. Rather than just throwing more chips and more data at a training run, this work requires actually understanding how these models work and how they think. The answers are likely to be breakthroughs on the level of solving the mysteries of the human brain.The plan, in a nutshell, is to get AI to help us solve alignment. That might sound a bit crazy -- as one person described it, “like using one fire to put out another fire.”But Jan's thinking is this: the core problem is that AI capabilities will keep getting better and the challenge of monitoring cutting-edge models will keep getting harder, while human intelligence stays more or less the same. To have any hope of ensuring safety, we need our ability to monitor, understand, and design ML models to advance at the same pace as the complexity of the models themselves. And there's an obvious way to do that: get AI to do most of the work, such that the sophistication of the AIs that need aligning, and the sophistication of the AIs doing the aligning, advance in lockstep.Jan doesn't want to produce machine learning models capable of doing ML research. But such models are coming, whether we like it or not. And at that point Jan wants to make sure we turn them towards useful alignment and safety work, as much or more than we use them to advance AI capabilities.Jan thinks it's so crazy it just might work. But some critics think it's simply crazy. They ask a wide range of difficult questions, including:If you don't know how to solve alignment, how can you tell that your alignment assistant AIs are actually acting in your interest rather than working against you? 
Especially as they could just be pretending to care about what you care about.How do you know that these technical problems can be solved at all, even in principle?At the point that models are able to help with alignment, won't they also be so good at improving capabilities that we're in the middle of an explosion in what AI can do?In today's interview host Rob Wiblin puts these doubts to Jan to hear how he responds to each, and they also cover:OpenAI's current plans to achieve 'superalignment' and the reasoning behind themWhy alignment work is the most fundamental and scientifically interesting research in MLThe kinds of people he's excited to hire to join his team and maybe save the worldWhat most readers misunderstood about the OpenAI announcementThe three ways Jan expects AI to help solve alignment: mechanistic interpretability, generalization, and scalable oversightWhat the standard should be for confirming whether Jan's team has succeededWhether OpenAI should (or will) commit to stop training more powerful general models if they don't think the alignment problem has been solvedWhether Jan thinks OpenAI has deployed models too quickly or too slowlyThe many other actors who also have to do their jobs really well if we're going to have a good AI futurePlenty moreGet this episode by subscribing to our podcast on the world's most pressing problems and how to solve them: type ‘80,000 Hours' into your podcasting app. Or read the transcript.Producer and editor: Keiran HarrisAudio Engineering Lead: Ben CordellTechnical editing: Simon Monsour and Milo McGuireAdditional content editing: Katy Moore and Luisa RodriguezTranscriptions: Katy Moore
Recently, OpenAI made a splash by announcing a new "Superalignment" team. Led by Jan Leike and Ilya Sutskever, the team would consist of top researchers, attempting to solve alignment for superintelligent AIs in four years by figuring out how to build a trustworthy human-level AI alignment researcher, and then using it to solve the rest of the problem. But what does this plan actually involve? In this episode, I talk to Jan Leike about the plan and the challenges it faces. Patreon: patreon.com/axrpodcast Ko-fi: ko-fi.com/axrpodcast Episode art by Hamish Doodles: hamishdoodles.com/ Topics we discuss, and timestamps: 0:00:37 - The superalignment team 0:02:10 - What's a human-level automated alignment researcher? 0:06:59 - The gap between human-level automated alignment researchers and superintelligence 0:18:39 - What does it do? 0:24:13 - Recursive self-improvement 0:26:14 - How to make the AI AI alignment researcher 0:30:09 - Scalable oversight 0:44:38 - Searching for bad behaviors and internals 0:54:14 - Deliberately training misaligned models 1:02:34 - Four year deadline 1:07:06 - What if it takes longer? 1:11:38 - The superalignment team and... 1:11:38 - ... governance 1:14:37 - ... other OpenAI teams 1:18:17 - ... other labs 1:26:10 - Superalignment team logistics 1:29:17 - Generalization 1:43:44 - Complementary research 1:48:29 - Why is Jan optimistic? 1:58:32 - Long-term agency in LLMs? 2:02:44 - Do LLMs understand alignment? 2:06:01 - Following Jan's research The transcript: axrp.net/episode/2023/07/27/episode-24-superalignment-jan-leike.html Links for Jan and OpenAI: OpenAI jobs: openai.com/careers Jan's substack: aligned.substack.com Jan's twitter: twitter.com/janleike Links to research and other writings we discuss: Introducing Superalignment: openai.com/blog/introducing-superalignment Let's Verify Step by Step (process-based feedback on math): arxiv.org/abs/2305.20050 Planning for AGI and beyond: openai.com/blog/planning-for-agi-and-beyond Self-critiquing models for assisting human evaluators: arxiv.org/abs/2206.05802 An Interpretability Illusion for BERT: arxiv.org/abs/2104.07143 Language models can explain neurons in language models: https://openaipublic.blob.core.windows.net/neuron-explainer/paper/index.html Our approach to alignment research: openai.com/blog/our-approach-to-alignment-research Training language models to follow instructions with human feedback (aka the Instruct-GPT paper): arxiv.org/abs/2203.02155
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: I'm interviewing Jan Leike, co-lead of OpenAI's new Superalignment project. What should I ask him?, published by Robert Wiblin on July 18, 2023 on The Effective Altruism Forum. I'm interviewing Jan Leike for The 80,000 Hours Podcast (personal website, Twitter). He's been Head of Alignment at OpenAI and is now leading their Superalignment team which will aim to figure out "how to steer & control AI systems much smarter than us" - and do it in under 4 years! They've been given 20% of the compute OpenAI has secured so far in order to work on it. Read the official announcement about it or Jan Leike's Twitter thread. What should I ask him? (P.S. Here's Jan's first appearance on the show back in 2018.) Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org
In recent years, large language models have greatly improved in their ability to perform complex multi-step reasoning. However, even state-of-the-art models still regularly produce logical mistakes. To train more reliable models, we can turn either to outcome supervision, which provides feedback for a final result, or process supervision, which provides feedback for each intermediate reasoning step. Given the importance of training reliable models, and given the high cost of human feedback, it is important to carefully compare both methods. Recent work has already begun this comparison, but many questions still remain. We conduct our own investigation, finding that process supervision significantly outperforms outcome supervision for training models to solve problems from the challenging MATH dataset. Our process-supervised model solves 78% of problems from a representative subset of the MATH test set. Additionally, we show that active learning significantly improves the efficacy of process supervision. To support related research, we also release PRM800K, the complete dataset of 800,000 step-level human feedback labels used to train our best reward model. 2023: Hunter Lightman, Vineet Kosaraju, Yura Burda, Harri Edwards, Bowen Baker, Teddy Lee, Jan Leike, John Schulman, Ilya Sutskever, Karl Cobbe Ranked #1 on Math Word Problem Solving on MATH minival (using extra training data) https://arxiv.org/pdf/2305.20050v1.pdf
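To make the outcome-versus-process distinction concrete, here is a minimal illustrative sketch in Python. The per-step scores and function names are invented for illustration; only the scoring convention (a full solution is scored by the probability that every step is correct, i.e. the product of per-step scores) is taken from the paper's description.

```python
# Illustrative only: contrasts the two feedback signals described in the
# abstract. Step scores here are made up, not produced by a trained reward model.

def outcome_reward(final_answer_correct: bool) -> float:
    """Outcome supervision: a single label for the final result."""
    return 1.0 if final_answer_correct else 0.0

def process_reward(step_scores: list[float]) -> float:
    """Process supervision: one score per intermediate reasoning step.
    A solution is scored as the probability that every step is correct,
    i.e. the product of the per-step scores."""
    score = 1.0
    for s in step_scores:
        score *= s
    return score

# A toy three-step solution whose second step is dubious but whose final
# answer happens to be right: outcome supervision rewards it fully, while
# process supervision penalizes the shaky intermediate step.
steps = [0.98, 0.40, 0.95]
print(outcome_reward(True))    # 1.0
print(process_reward(steps))   # ~0.37
```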
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Thoughts on the OpenAI alignment plan: will AI research assistants be net-positive for AI existential risk?, published by Jeffrey Ladish on March 10, 2023 on LessWrong. Note: I really appreciate the work that the OpenAI alignment team put into their alignment plan writeup and related posts, especially Jan Leike, the leader of that team. I believe open discussions about alignment approaches make it more likely that the whole community will be able to find flaws in their own plans and unappreciated insights, resulting in better alignment plans over time. Summary: OpenAI's alignment plan acknowledges several key challenges of aligning powerful AGI systems, and proposes several good ideas. However, the plan fails to sufficiently address: The dual-use nature of AI research assistants and the high risk that such assistants will improve capabilities more than alignment research in ways that net-increase AI existential risk. The likely challenges involved in both generating and evaluating AI alignment research using AI research assistants. It seems plausible that generating key insights about the alignment problem will not be possible before the development of dangerously powerful AGI systems. The nature and difficulty of the alignment problem. There are substantial reasons why AI systems that pass all tests in development may not stay safe once able to act in the world. There are substantial risks from goal misgeneralization, including deceptive misalignment, made worse by potential rapid increases in capabilities that are hard to predict. Any good alignment plan should address these problems, especially since many of them may not be visible until an AI system already has dangerous capabilities. The dual-use nature of AI research assistants and whether these systems will differentially improve capabilities and net-increase existential risk There has been disagreement in the past about whether “alignment” and “capabilities” research are a dichotomy. Jan Leike has claimed that they are not always dichotomous, and this is important because lots of capabilities insights will be useful for alignment, so the picture is not as worrisome as a dichotomous picture might make it seem. I agree with Jan that alignment and capabilities research are not dichotomous, but in a way that I think actually makes the problem worse, not better. Yes, it's probable that some AI capabilities could help solve the alignment problem. However, the general problem is that unaligned AGI systems are far easier to build - they're a far more natural thing to emerge from a powerful deep learning system than an aligned AGI system. So even though there may be deep learning capabilities that can help solve the alignment problem, most of these capabilities are still more easily applied to making any AGI system, most of which are likely to be unaligned even when we're trying really hard. Let's look at AI research assistants in particular. I say “AI research assistant” rather than “alignment research assistant” because I expect that it's highly unlikely that we will find a way to build an assistant that is useful for alignment research but not useful for AI research in general. Let's say OpenAI is able to train an AI research assistant that can help the alignment team tackle some difficult problems in interpretability. That's great!
However, a question is, can that model also help speed up AGI development at the rest of the company? If so, by how much? And will it be used to do so? Given that building an aligned AGI is likely much harder than building an unaligned AGI system, it would be quite surprising if an AI research assistant was better at helping with AGI safety research differentially over AGI development research more broadly. Of course it's possible that a research tool that sped up capabilities research more ...
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Reflections on Deception & Generality in Scalable Oversight (Another OpenAI Alignment Review), published by Shoshannah Tekofsky on January 28, 2023 on LessWrong. Just like you can test your skill in experimental design by reviewing existing experiments, you can test your skill in alignment by reviewing existing alignment strategies. Conveniently, Rob Bensinger, in the name of Nate Soares and Eliezer Yudkowsky, recently posted a challenge to AI Safety researchers to review the OpenAI alignment plan written by Jan Leike, John Schulman, and Jeffrey Wu. I figured this constituted a test that might net me feedback from both sides of the rationalist-empiricist aisle. Yet, instead of finding ground-breaking arguments for or against scalable oversight to do alignment research, it seems Leike already knows what might go wrong — and goes ahead anyway. Thus my mind became split between evaluating the actual alignment plan and modeling the disagreement between prominent clusters of researchers. I wrote up the latter in an informal typology of AI Safety Researchers, and continued my technical review below. The following is a short summary of the OpenAI alignment plan, my views on the main problems, and a final section on recommendations for red lining. The Plan First, align AI with human feedback, then get AI to assist in giving human feedback to AI, then get AI to assist in giving human feedback to AI that is generating solutions to the alignment problem. Except, the steps are not sequential but run in parallel. This is one form of Scalable Oversight. Human feedback is Reinforcement Learning from Human Feedback (RLHF), the assisting AI is Iterated Distillation and Amplification (IDA) and Recursive Reward Modeling (RRM), and the AI that is generating solutions to the alignment problem is... still under construction. The target is a narrow AI that will make significant progress on the alignment problem. The MVP is a theorem prover. The full product is AGI utopia. Here is a graph. OpenAI explains its strategy succinctly and links to detailed background research. This is laudable, and hopefully other labs and organizations will follow suit. My understanding is also that if someone came along with a better plan then OpenAI would pivot in a heartbeat. Which is even more laudable. The transparency, accountability, and flexibility they display set a strong example for other organizations working on AI. But the show must go on (from their point of view anyway) and so they are going ahead and implementing the most promising strategy that currently exists. Even if there are problems. And boy, are there problems. The Problems Jan Leike discusses almost all objections to the OpenAI alignment plan on his blog. Thus below I will only highlight the two most important problems in the plan, plus two additional concerns that I have not seen discussed so far. Alignment research requires general intelligence - If the alignment researcher AI has enough general intelligence to make breakthrough discoveries in alignment, then you can't safely create it without already having solved alignment. Yet, Leike et al. hope that relatively narrow intelligence can already make significant progress on alignment. I think this is extremely unlikely if we reflect on what general intelligence truly is.
Though my own thoughts on the nature of intelligence are not entirely coherent yet, I'd argue that having a strong concept of intelligence is key to accurately predicting the outcome of an alignment strategy. Specifically in this case, my understanding is that general intelligence is being able to perform a wider set of operations on a wider set of inputs (to achieve a desired set of observations on the world state). For example, I can do addition of 2 apples I see, 2 apples I think about, 2 boats I hear about, 2 functions ...
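The plan summarized in the excerpt above leans on learning a reward model from human feedback (RLHF) before layering recursive reward modeling on top of it. As a rough sketch of that first ingredient only, the toy Python below fits a linear reward model to synthetic pairwise preferences using the standard Bradley-Terry logistic loss; the data, model, and hyperparameters are all invented for illustration, and this is not OpenAI's implementation.

```python
# Toy sketch of learning a reward model from pairwise preferences,
# the RLHF ingredient of the plan. Everything here is synthetic.
import numpy as np

rng = np.random.default_rng(0)

# "Responses" are 3-dimensional feature vectors; the hidden true reward is linear.
true_w = np.array([2.0, -1.0, 0.5])
A = rng.normal(size=(500, 3))  # first response in each comparison
B = rng.normal(size=(500, 3))  # second response in each comparison

# Simulated human labels: 1.0 if the first response is (truly) better.
labels = (A @ true_w > B @ true_w).astype(float)

# Fit reward weights w by gradient descent on the Bradley-Terry logistic loss
# -[y * log p + (1 - y) * log(1 - p)], where p = sigmoid(r(a) - r(b)).
w = np.zeros(3)
learning_rate = 0.1
for _ in range(2000):
    margin = (A - B) @ w
    p = 1.0 / (1.0 + np.exp(-margin))
    grad = (A - B).T @ (p - labels) / len(labels)
    w -= learning_rate * grad

print("recovered reward direction:", w / np.linalg.norm(w))
print("true reward direction:     ", true_w / np.linalg.norm(true_w))
```

In practice the reward model is a large neural network and the comparisons come from human labelers (and, under recursive reward modeling, from AI-assisted labelers), but the preference loss has the same pairwise shape.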
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: My thoughts on OpenAI's alignment plan, published by Akash on December 30, 2022 on LessWrong. Epistemic Status: This is my first attempt at writing up my thoughts on an alignment plan. I spent about a week on it. I'm grateful to Olivia Jimenez, Thomas Larsen, and Nicholas Dupuis for feedback. A few months ago, OpenAI released its plan for alignment. More recently, Jan Leike (one of the authors of the original post) released a blog post about the plan, and Eliezer & Nate encouraged readers to write up their thoughts. In this post, I cover some thoughts I have about the OpenAI plan. This is a long post, and I've divided it into a few sections. Each section gets increasingly more specific and detailed. If you only have ~5 minutes, I suggest reading section 1 and skimming section 2. The three sections: An overview of the plan and some of my high-level takes (here) Some things I like about the plan, some concerns, and some open questions (here) Specific responses to claims in OpenAI's post and Jan's post (here) Section 1: High-level takes Summary of OpenAI's Alignment Plan As I understand it, OpenAI's plan involves using reinforcement learning from human feedback and recursive reward modeling to build AI systems that are better than humans at alignment research. OpenAI's plan is not aiming for a full solution to alignment (that scales indefinitely or that could work on a superintelligent system). Rather, the plan is intended to (a) build systems that are better at alignment research than humans (AI assistants), (b) use these AI assistants to accelerate alignment research, and (c) use these systems to build/align more powerful AI assistants. Six things that need to happen for the plan to work For the plan to work, I think it needs to get through the following 6 steps: OpenAI builds LLMs that can help us with alignment research OpenAI uses those models primarily to help us with alignment research, and we slow down/stop capabilities research when necessary OpenAI has a way to evaluate the alignment strategies proposed by the LLM OpenAI has a way to determine when it's OK to scale up to more powerful AI assistants (and OpenAI has policies in place that prevent people from scaling up before it's OK) Once OpenAI has a highly powerful and aligned system, they do something with it that gets us out of the acute risk period Once we are out of the acute risk period, OpenAI has a plan for how to use transformative AI in ways that allow humanity to “fulfill its potential” or “achieve human flourishing” (and they have a plan to figure out what we should even be aiming for) If any of these steps goes wrong, I expect the plan will fail to avoid an existential catastrophe. Note that steps 1-5 must be completed before another actor builds an unaligned AGI.
Here's a table that lists each step and the extent to which I think it's covered by the two posts about the OpenAI plan (each step, followed by the extent to which it is discussed):
Build an AI assistant: Adequate; OpenAI acknowledges that this is their goal, mentions RLHF and RRM as techniques that could help us achieve this goal, and mentions reasonable limitations.
Use the assistant for alignment research (and slow down capabilities): Inadequate; OpenAI mentions that they will use the assistant for alignment research, but they don't describe how they will slow down capabilities research if necessary or how they plan to shift the current capabilities-alignment balance.
Evaluate the alignment strategies proposed by the assistant: Unclear; OpenAI acknowledges that they will need to evaluate strategies from the AI assistant, though the particular metrics they mention seem unlikely to detect alignment concerns that may only come up at high capability levels (e.g., deception, situational awareness).
Figure out when it's OK to scale up to more ...
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Results for a survey of tool use and workflows in alignment research, published by jacquesthibs on December 19, 2022 on LessWrong. On March 22nd, 2022, we released a survey with an accompanying post for the purpose of getting more insight into what tools we could build to augment alignment researchers and accelerate alignment research. Since then, we've also released a dataset and a manuscript (LW post), and the (relevant) Simulators post has been released. This post is an overview of the survey results and leans towards being exhaustive. Feel free to skim. In our opinion, the most interesting questions are 6, 11, 12, and 13. We hope that this write-up of the survey results helps people who want to contribute to this type of work. Motivation for this work We are looking to build tools now rather than later because it allows us to learn what's useful before we have access to even more powerful models. Once GPT-(N-1) arrives, we want to be able to use it to generate extremely high-quality alignment work right out of the gate. This work involves both augmenting alignment researchers and using AI to generate alignment research. Both of these approaches fall under the “accelerating alignment” umbrella. Ideally, we want these kinds of tools to be used disproportionately for alignment work in the first six months of GPT-(N-1)'s release. We hope that the tools are useful before that time but, at the very least, we hope to have pre-existing code for interfaces, a data pipeline, and engineers already set to hit the ground running. Using AI to help improve alignment is not a new idea. From my understanding, this is a significant part of Paul Christiano's agenda and a significant part of his optimism about AI alignment. Of course, automating alignment is also OpenAI's main proposal and Jan Leike has been talking about it for a while. Ought has also pioneered doing work in this direction and I'm excited to see them devote more attention to building tools even more highly relevant to accelerating alignment research. Finally, as we said in the survey announcement post: In the long run, we're interested in creating seriously empowering tools that fall under categorizations like STEM AI, Microscope AI, superhuman personal assistant AI, or plainly Oracle AI. These early tools are oriented towards more proof-of-concept work, but still aim to be immediately helpful to alignment researchers. Our prior that this is a promising direction is informed in part by our own very fruitful and interesting experiences using language models as writing and brainstorming aids. One central danger of tools with the ability to increase research productivity is dual-use for capabilities research. Consequently, we're planning to ensure that these tools will be specifically tailored to the AI Safety community and not to other scientific fields. We do not intend to publish the specific methods we use to create these tools. Caveat before we get started As mentioned in Logan's post on Language Models Tools for Alignment Research (and by many others we've talked to), could this work be repurposed for capabilities work? If made public with flashy demos, it's quite likely. That's why we'll be keeping most of this project private for alignment research only (the alignment text dataset is public).
Survey Results We received 22 responses in total and some responses were optional so not all questions received responses from everyone. Of course we would have preferred even more responses, but this will have to do for now. We expect to iterate on tools with alignment researchers, so hopefully we get a lot of insights through user interviews of actual products/tools. If you are interested in answering some of the questions in the survey (all questions are optional!), here's the link. Leaving comments on this post would also be appre...
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Take: Building tools to help build FAI is a legitimate strategy, but it's dual-use., published by Charlie Steiner on December 3, 2022 on The AI Alignment Forum. As a writing exercise, I'm writing an AI Alignment Hot Take Advent Calendar - one new hot take, written every day for 25 days. Or until I run out of hot takes, which seems likely. This was waiting around in the middle of my hot-takes.txt file, but it's gotten bumped up because of Rob and Eliezer - I've gotta blurt it out now or I'll probably be even more out of date. The idea of using AI research to help us be better at building AI is not a new or rare idea. It dates back to prehistory, but some more recent proponents include OpenAI members (e.g. Jan Leike) and the Accelerating Alignment group. We've got a tag for it. Heck, this even got mentioned yesterday! So a lot of this hot take is really about my own psychology. For a long time, I felt that sure, building tools to help you build friendly AI was possible in principle, but it wouldn't really help. Surely it would be faster just to cut out the middleman and understand what we want from AI using our own brains. If I'd turned on my imagination, rather than reacting to specific impractical proposals that were around at the time, I could have figured out how augmenting alignment research is a genuine possibility a lot sooner, and started considering the strategic implications. Part of the issue is that plausible research-amplifiers don't really look like the picture I have in my head of AGI - they're not goal-directed agents who want to help us solve alignment. If we could build those and trust them, we really should just cut out the middleman. Instead, they can look like babble generators, souped-up autocomplete, smart literature search, code assistants, and similar. Despite either being simulators or making plans only in a toy model of the world, such AI really does have the potential to transform intellectual work, and I think it makes a lot of sense for there to be some people doing work to make these tools differentially get applied to alignment research. Which brings us to the dual-use problem. It turns out that other people would also like to use souped-up autocomplete, smart literature search, code assistants, and similar. They have the potential to transform intellectual work! Pushing forward the state of the art on these tools lets you get them earlier, yet it also helps other people get them earlier too, even if you don't share your weights. Now, maybe the most popular tools will help people make philosophical progress, and accelerating development of research-amplifying tools will usher in a brief pre-singularity era of enlightenment. But - lukewarm take - that seems way less likely than such tools differentially favoring engineering over philosophy on a society-wide scale, making everything happen faster and be harder to react to. So best of luck to those trying to accelerate alignment research, and fingers crossed for getting the differential progress right, rather than oops capabilities. Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.
John Schulman is a cofounder of OpenAI, and currently a researcher and engineer at OpenAI.
Featured References:
WebGPT: Browser-assisted question-answering with human feedback. Reiichiro Nakano, Jacob Hilton, Suchir Balaji, Jeff Wu, Long Ouyang, Christina Kim, Christopher Hesse, Shantanu Jain, Vineet Kosaraju, William Saunders, Xu Jiang, Karl Cobbe, Tyna Eloundou, Gretchen Krueger, Kevin Button, Matthew Knight, Benjamin Chess, John Schulman.
Training language models to follow instructions with human feedback. Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan Lowe.
Additional References:
Our approach to alignment research, OpenAI 2022
Training Verifiers to Solve Math Word Problems, Cobbe et al 2021
UC Berkeley Deep RL Bootcamp Lecture 6: Nuts and Bolts of Deep RL Experimentation, John Schulman 2017
Proximal Policy Optimization Algorithms, Schulman 2017
Optimizing Expectations: From Deep Reinforcement Learning to Stochastic Computation Graphs, Schulman 2016
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: AI Safety and Neighboring Communities: A Quick-Start Guide, as of Summer 2022, published by Sam Bowman on September 1, 2022 on LessWrong. Getting into AI safety involves working with a mix of communities, subcultures, goals, and ideologies that you may not have encountered in the context of mainstream AI technical research. This document attempts to briefly map these out for newcomers. This is inevitably going to be biased by what sides of these communities I (Sam) have encountered, and it will quickly become dated. I expect it will still be a useful resource for some people anyhow, at least in the short term. AI Safety/AI Alignment/AGI Safety/AI Existential Safety/AI X-Risk The research project of ensuring that future AI progress doesn't yield civilization-endingly catastrophic results. Good intros: Carlsmith Report What misalignment looks like as capabilities scale Vox piece Why are people concerned about this? My rough summary: It's plausible that future AI systems could be much faster or more effective than us at real-world reasoning and planning. Probably not plain generative models, but possibly models derived from generative models in cheap ways Once you have a system with superhuman reasoning and planning abilities, it's easy to make it dangerous by accident. Most simple objective functions or goals become dangerous in the limit, usually because of secondary or instrumental subgoals that emerge along the way. Pursuing typical goals arbitrarily well requires a system to prevent itself from being turned off, by deception or force if needed. Pursuing typical goals arbitrarily well requires acquiring any power or resources that could increase the chances of success, by deception or force if needed. Toy example: Computing pi to an arbitrarily high precision eventually requires that you spend all the sun's energy output on computing. Knowledge and values are likely to be orthogonal: A model could know human values and norms well, but not have any reason to act on them. For agents built around generative models, this is the default outcome. Sufficiently powerful AI systems could look benign in pre-deployment training/research environments, because they would be capable of understanding that they're not yet in a position to accomplish their goals. Simple attempts to work around this (like the more abstract goal ‘do what your operators want') don't tend to have straightforward robust implementations. If such a system were single-mindedly pursuing a dangerous goal, we probably wouldn't be able to stop it. Superhuman reasoning and planning would give models with a sufficiently good understanding of the world many ways to effectively gain power with nothing more than an internet connection. (ex: Cyberattacks on banks.) Consensus within the field is that these risks could become concrete within ~4–25 years, and have a >10% chance of leading to a global catastrophe (i.e., extinction or something comparably bad). If true, it's bad news.
Given the above, we either need to stop all development toward AGI worldwide (plausibly undesirable or impossible), or else do three possible-but-very-difficult things: (i) build robust techniques to align AGI systems with the values and goals of their operators, (ii) ensure that those techniques are understood and used by any group that could plausibly build AGI, and (iii) ensure that we're able to govern the operators of AGI systems in a way that makes their actions broadly positive for humanity as a whole. Does this have anything to do with sentience or consciousness? No. Influential people and institutions: Present core community as I see it: Paul Christiano, Jacob Steinhardt, Ajeya Cotra, Jared Kaplan, Jan Leike, Beth Barnes, Geoffrey Irving, Buck Shlegeris, David Krueger, Chris Olah, Evan Hubinger, Richard Ngo, Rohin Shah; ARC, R...
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Common misconceptions about OpenAI, published by Jacob Hilton on August 25, 2022 on The AI Alignment Forum. I have recently encountered a number of people with misconceptions about OpenAI. Some common impressions are accurate, and others are not. This post is intended to provide clarification on some of these points, to help people know what to expect from the organization and to figure out how to engage with it. It is not intended as a full explanation or evaluation of OpenAI's strategy. The post has three sections: Common accurate impressions Common misconceptions Personal opinions The bolded claims in the first two sections are intended to be uncontroversial, i.e., most informed people would agree with how they are labeled (correct versus incorrect). I am less sure about how commonly believed they are. The bolded claims in the last section I think are probably true, but they are more open to interpretation and I expect others to disagree with them. Note: I am an employee of OpenAI. Sam Altman (CEO of OpenAI) and Mira Murati (CTO of OpenAI) reviewed a draft of this post, and I am also grateful to Steven Adler, Steve Dowling, Benjamin Hilton, Shantanu Jain, Daniel Kokotajlo, Jan Leike, Ryan Lowe, Holly Mandel and Cullen O'Keefe for feedback. I chose to write this post and the views expressed in it are my own. Common accurate impressions Correct: OpenAI is trying to directly build safe AGI. OpenAI's Charter states: "We will attempt to directly build safe and beneficial AGI, but will also consider our mission fulfilled if our work aids others to achieve this outcome." OpenAI leadership describes trying to directly build safe AGI as the best way to currently pursue OpenAI's mission, and have expressed concern about scenarios in which a bad actor is first to build AGI, and chooses to misuse it. Correct: the majority of researchers at OpenAI are working on capabilities. Researchers on different teams often work together, but it is still reasonable to loosely categorize OpenAI's researchers (around half the organization) at the time of writing as approximately: Capabilities research: 100 Alignment research: 30 Policy research: 15 Correct: the majority of OpenAI employees did not join with the primary motivation of reducing existential risk from AI specifically. My strong impressions, which are not based on survey data, are as follows. Across the company as a whole, a minority of employees would cite reducing existential risk from AI as their top reason for joining. A significantly larger number would cite reducing risk of some kind, or other principles of beneficence put forward in the OpenAI Charter, as their top reason for joining. Among people who joined to work in a safety-focused role, a larger proportion of people would cite reducing existential risk from AI as a substantial motivation for joining, compared to the company as a whole. Some employees have become motivated by existential risk reduction since joining OpenAI. Correct: most interpretability research at OpenAI stopped after the Anthropic split. Chris Olah led interpretability research at OpenAI before becoming a cofounder of Anthropic. Although several members of Chris's former team still work at OpenAI, most of them are no longer working on interpretability. Common misconceptions Incorrect: OpenAI is not working on scalable alignment. 
OpenAI has teams focused both on practical alignment (trying to make OpenAI's deployed models as aligned as possible) and on scalable alignment (researching methods for aligning models that are beyond human supervision, which could potentially scale to AGI). These teams work closely with one another. Its recently-released alignment research includes self-critiquing models (AF discussion), InstructGPT, WebGPT (AF discussion) and book summarization (AF discussion). OpenAI's ap...
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: How much alignment data will we need in the long run?, published by Jacob Hilton on August 10, 2022 on The AI Alignment Forum. This question stands out to me because: It should directly affect empirical alignment priorities today While it is informed by both theoretical and empirical evidence, it seems tractable for purely theoretical alignment researchers to make progress on today It's even possible that theoretical alignment researchers already consider this to be a solved problem, in which case I think it would be valuable to have a carefully-reasoned write-up that empirical alignment practitioners can feel confident in the conclusions of. Thanks to Paul Christiano for discussion that prompted this post and to Jan Leike for comments. Why this should affect empirical alignment priorities today Outer alignment can be framed as a data quality problem. If our alignment training data correctly favors aligned behavior over unaligned behavior, then we have solved outer alignment. But if there are errors in our data that cause an unaligned policy to be preferred, then we have a problem. It is common to worry about errors in the alignment training data that arise from evaluation being too difficult for humans. I think this makes sense for two reasons: Firstly, errors of this kind specifically incentivize models to deceive the human evaluators, which seems like an especially concerning variety of alignment failure. Secondly, errors of this kind will get worse with model capability, which is a scary dynamic: models would get more misaligned as they became more powerful. Nevertheless, I think we could still get catastrophic alignment failures from more mundane kinds of data quality issues. If we had the perfect scalable alignment solution, but the humans in the loop simply failed to implement it correctly, that could be just as bad as not using the solution at all. But prevention of mundane kinds of data quality issues could look very different depending on the amount of data being collected: If a large amount of alignment training data is needed, then a significant amount of delegation will be required. Hence practitioners will need to think about how to choose who to delegate different tasks to (including defending against adversaries intentionally introducing errors), how to conduct quality control and incentivize high data quality, how to design training materials and interfaces to reduce the likelihood of human error, and so on. If only a small amount of alignment training data is needed, then it will be more feasible to put a lot of scrutiny on each datapoint. Perhaps practitioners will need to think about how to appropriately engage the public on the choice of each datapoint in order to maintain public trust. Hence settling the question of how much alignment training data we will need in the long run seems crucial for deciding how much empirical alignment efforts should invest in the first versus the second kind of effort. In practice, we may collect both a larger amount of lower-quality data and a smaller amount of higher-quality data, following some quality-quantity curve. The generalized form of the question then becomes: what is the probability of alignment for a given quality-quantity curve? Practitioners will then be able to combine this with feasibility considerations to decide what curve to ultimately follow. 
Initial thoughts on this question Considerations in favor of less alignment training data being required: Larger models are more sample-efficient than smaller models, especially in the presence of pre-training. Hence for a given task we should expect the amount of alignment training data we need to go down over time. There could be many rounds of fine-tuning used to teach models the precise details of performing certain tasks, and data quality may only be ...
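As a crude, purely illustrative way to picture the quality-quantity trade-off raised above (my framing, not the post's): if each alignment training label were independently wrong with probability eps, the chance of an entirely error-free dataset would fall geometrically with dataset size, which is one way to see why the required data scale determines how much scrutiny each datapoint can realistically receive.

```python
# Back-of-the-envelope sketch only: treats label errors as independent and asks
# how the chance of a fully error-free alignment dataset falls with its size.
def p_error_free(n_labels: int, eps: float) -> float:
    """Probability that none of n_labels labels is erroneous."""
    return (1.0 - eps) ** n_labels

for n_labels in (100, 10_000, 1_000_000):
    for eps in (1e-2, 1e-4, 1e-6):
        print(f"n={n_labels:>9,}  per-label error={eps:.0e}  "
              f"P(no bad label)={p_error_free(n_labels, eps):.3f}")
```

A single bad label rarely dooms training, of course; the point is only that at large scale, per-datapoint scrutiny has to give way to the kind of process-level quality control the post describes.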
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Encultured AI, Part 1 Appendix: Relevant Research Examples, published by Andrew Critch on August 8, 2022 on The AI Alignment Forum. Also available on the EA Forum. Appendix to: Encultured AI, Part 1: Enabling New Benchmarks. Followed by: Encultured AI, Part 2: Providing a Service. Appendix 1: “Trending” AI x-safety research areas We mentioned a few areas of “trending” AI x-safety research above; below are some more concrete examples of what we mean: Trustworthiness & truthfulness: Owain Evans, Owen Cotton-Barratt and others have authored “Truthful AI: Developing and governing AI that does not lie” (arxiv, 2021; twitter thread). Andreas Stuhlmüller, Jungwon Byun and others at Ought.org are building an AI-powered research assistant called Elicit (website); here is the product. Task-specific (narrow) preference learning: Paul Christiano et al (arxiv, 2017) developed a data-efficient preference-learning technique for training RL-based systems, which is now very widely cited (scholar). Jan Leike, now at OpenAI, leads a team working on ‘scalable alignment' using preference-learning techniques (arxiv, 2018) (blog). Interpretability: Chris Olah (scholar) leads an interpretability research group at Anthropic. Anthropic (website) is culturally very attuned to large-scale risks from AI, including existential risks. Buck Shlegeris and others at Redwood Research (website) have built an interpretability tool for analyzing transformer networks trained on natural language (demo). Prof. Cynthia Rudin at Duke (homepage) approaches interpretability by trying to replace black-box models with more interpretable ones (arxiv, 2018), and we know from conversations with her that she is open to applications of her work to existential safety. Robustness & risk management: Prof. Jaime Fisac at Princeton (homepage) researches AI safety for robotics, high-dimensional control systems and multi-agent systems (scholar), including provable robustness guarantees. He was previously a PhD student at the UC Berkeley Center for Human-Compatible AI (CHAI), provided extensive feedback on AI Research Considerations for Human Existential Safety (ARCHES) (arxiv, 2020), and is very attuned to existential safety as a cause area. Prof. David Krueger at Cambridge (scholar) studies out-of-distribution generalization (pdf, 2021), and is currently taking on students. Adam Gleave (homepage) is a final-year PhD student at CHAI / UC Berkeley, and studies out-of-distribution robustness for deep RL. Sam Toyer (scholar), also a PhD student at CHAI, has developed a benchmark for robust imitation learning (pdf, 2020). Appendix 2: “Emerging” AI x-safety research areas In this post, we classified cooperative AI and multi-stakeholder control of AI systems as “emerging” topics in AI x-safety. Here's more about what we mean, and why: Cooperative AI This area is “emerging” in x-safety because there's plenty of attention to the issue of cooperation from both policy-makers and AI researchers, but not yet much among folks focused on x-risk. Existential safety attention on cooperative AI: Many authors — too many to name! — have remarked on the importance of international coordination on AI safety efforts, including existential safety. For instance, there is a Wikipedia article on AI arms races (wikipedia). This covers the human–human side of the cooperative AI problem.
AI research on cooperative AI: Multi-agent systems research has a long history in AI (scholar search), as does multi-agent reinforcement learning (scholar search). DeepMind's Multi-agent Learning team has recently written a number of papers examining competition and cooperation between artificial agents (website). OpenAI has done some work on multi-agent interaction, e.g. emergent tool use in multi-agent interaction (arxiv). Prof. Jakob Foerster at Oxford (scholar search), and ...
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Jan Leike: On the windfall clause, published by Cullen OKeefe on August 4, 2022 on The Effective Altruism Forum. Jan wrote this thoughtful critique of the Windfall Clause back in 2020, and I thought it should be posted publicly. Posted with Jan's permission. Key excerpt: A core challenge when trying to design a windfall clause is that there is an incredibly strong incentive to find a loophole once the clause takes effect. If you run an organization that signed a windfall clause and in the future the unlikely comes to pass and you actually end up making $11 trillion in annual profits, it would be rational for you to spend up to $10 trillion on legal fees to try to get out of that clause just for that year; preferably in a way that doesn't cost you too much credibility. Companies are doing this already–this is why the big internet companies pay hardly any taxes. So how can we implement a windfall clause so that it withstands a multi-trillion dollar search for loopholes? This is incredibly hard, because we need to write and implement the clause today, before the windfall happens (which might never happen at all). This means it has to be done on a budget that is many orders of magnitude smaller than the actual windfall. Let's call this asymmetry the legal offense/defense ratio: We have to defend the "spirit" of the windfall clause against a future version of the organization (possibly with different people in charge) that is experiencing a windfall and wants to get out of the clause. So if you tried to figure out the implementation of a legally tight windfall clause on an insanely high budget like 10 million dollars to safeguard a $10 trillion windfall, you need to withstand a legal offense/defense ratio of at least 1:1,000,000.[1] These numbers are certainly made up, but let's roll with them and look at some historical precedent: What are legal battles where one party has spent that much more money than the other? I'm certainly not a lawyer, but I want to highlight two examples. The Tobacco Master Settlement Agreement was one of the highest stakes legal cases in history: a bunch of US states sued the big tobacco companies for the increased burden of long-time smokers on the healthcare system. The tobacco companies ultimately lost and were sentenced to pay $206 billion over 25 years.[2] Because of incentive fees, the states' legal fees exceeded $8 billion. I don't know how much the tobacco companies spent on legal fees, but in order to get above the 1:1,000,000 ratio, they would have needed to keep their expenses below $8,000. We can safely assume that they spent much more than that; $8,000 probably wouldn't even enable you to attend most of the court sessions even if you are defending yourself, live right next to the court, and pay yourself minimum wage. This may sound ridiculous, but this was almost the case in the famous "McLibel" case: Two people were sued by McDonald's in 1990 for libel because they distributed some flyers accusing the corporation of various malpractices: destroying rainforests, mistreating animals–the usual stuff. One of the defendants was a part-time bar-worker earning a maximum of £65 a week, and the other one was an unemployed postal worker. The two defendants were denied free legal aid because apparently the UK libel system is not great at this.
Therefore they represented themselves, spending about £30,000 over ten (!) years, not counting their personal time and various pro bono legal advice. In contrast, McDonald's allegedly spent several million dollars on the case. Still, this is a legal offense/defense ratio of below 1:1,000. The two defendants ultimately lost the case and had to pay £40,000 to McDonald's, but the Judge also upheld some of their claims. The European Court of Human Rights later ruled that they weren't given a fair tri...
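As a rough sanity check on the arithmetic in the excerpt above, here is a minimal Python sketch that recomputes the quoted offense/defense ratios. The figures are the excerpt's own made-up or approximate numbers; McDonald's "several million dollars" is taken as $3 million purely for illustration.

# Back-of-the-envelope check of the offense/defense ratios quoted above.
# All figures are the excerpt's made-up/approximate numbers, not real data.

def offense_defense_ratio(defense_budget, offense_budget):
    """Return N such that the defense:offense spending ratio is 1:N."""
    return offense_budget / defense_budget

# Windfall clause example: $10M spent drafting the clause vs. a $10T incentive to escape it.
windfall = offense_defense_ratio(defense_budget=10e6, offense_budget=10e12)
print(f"Windfall clause: 1:{windfall:,.0f}")        # 1:1,000,000

# Tobacco Master Settlement: the states' ~$8B in legal fees; the other side would have
# needed to keep its spending under this amount to reach a 1:1,000,000 ratio.
print(f"Tobacco threshold: ${8e9 / 1e6:,.0f}")      # $8,000

# McLibel: ~£30,000 for the defendants vs. "several million" (assumed $3M) for McDonald's.
mclibel = offense_defense_ratio(defense_budget=30e3, offense_budget=3e6)
print(f"McLibel: roughly 1:{mclibel:,.0f}")         # roughly 1:100, i.e. below 1:1,000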
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Scott Aaronson is joining OpenAI to work on AI safety, published by peterbarnett on June 18, 2022 on LessWrong. Scott Aaronson is a computer scientist at the University of Texas in Austin, whose research mainly focuses on quantum computing and complexity theory. He's at least very adjacent to the Rationalist/LessWrong community. After some comments on his blog and then conversations with Jan Leike, he's decided to work for one year on AI safety at OpenAI. To me this is a reasonable update that people who are sympathetic to AI safety can be convinced to actually do direct work. Aaronson might be one of the easier people to induce to do AI safety work, but I imagine there are also other people who are worth talking to about doing direct work on AI safety. Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Prize for Alignment Research Tasks, published by Andreas Stuhlmüller on April 29, 2022 on The AI Alignment Forum. Can AI systems substantially help with alignment research before transformative AI? People disagree. Ought is collecting a dataset of alignment research tasks so that we can: Make progress on the disagreement Guide AI research towards helping with alignment We're offering a prize of $200-$2000 for each contribution to this dataset. The debate: Can AI substantially help with alignment research? Wei Dai asked the question in 2019: [This] comparison table makes Research Assistant seem a particularly attractive scenario to aim for, as a stepping stone to a more definitive [AI Safety] success story. Is this conclusion actually justified? Jan Leike thinks so: My currently favored approach to solving the alignment problem: automating alignment research using sufficiently aligned AI systems. It doesn't require humans to solve all alignment problems themselves, and can ultimately help bootstrap better alignment solutions. Paul Christiano agrees: Building weak AI systems that help improve alignment seems extremely important to me and is a significant part of my optimism about AI alignment. [...] Overall I think that "make sure we are able to get good alignment research out of early AI systems" is comparably important to "do alignment ourselves." Realistically I think the best case for "do alignment ourselves" is that if "do alignment" is the most important task to automate, then just working a ton on alignment is a great way to automate it. But that still means you should be investing quite a significant fraction of your time in automating alignment. Eliezer doesn't: "AI systems that do better alignment research" are dangerous in virtue of the lethally powerful work they are doing, not because of some particular narrow way of doing that work. If you can do it by gradient descent then that means gradient descent got to the point of doing lethally dangerous work. Asking for safely weak systems that do world-savingly strong tasks is almost everywhere a case of asking for nonwet water, and asking for AI that does alignment research is an extreme case in point. Everyone would likely agree that AI can help a little, e.g. using next word prediction to write papers slightly faster. The debate is about whether AI can help enough with alignment specifically that it substantially changes the picture. If AI alignment is 70% easy stuff we can automate and 30% hard stuff that we can't hope to help with, the 30% is still a bottleneck in the end. Motivation for the dataset We're collecting a dataset of concrete research tasks so that we can: Make progress on the disagreement about whether AI can substantially help with alignment before TAI. Is there even a disagreement? Maybe people aren't talking about the same kinds of tasks and the collective term “alignment research” obscures important distinctions. If there is a disagreement, concrete tasks will let us make progress on figuring out the correct answer. Guide AI research towards helping with alignment. Figure out if current language models can already be helpful now. If they can, help Ought and others build tools that are differentially useful for alignment researchers. If they can't, guide future language model work towards supporting those tasks. 
As an important special case of step two, the dataset will guide the plan for Elicit. Limitations Ideally, we'd come up with tasks and automation together, iterating quickly on how to set up the tasks so that they are within reach of language models. If tasks are constructed in isolation, they are likely to be a worse fit for automation. In practice, we expect that language models won't be applied end-to-end to tasks like this, mapping inputs to outputs, but will be part of compositi...
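The "70% easy stuff / 30% hard stuff" bottleneck point above can be made concrete with a small, purely illustrative calculation. This is an Amdahl's-law-style bound with hypothetical numbers, not anything taken from the post itself.

# Minimal illustration of the bottleneck argument above (hypothetical numbers):
# even if the automatable 70% of alignment work gets a huge speedup, overall
# progress is capped by the 30% that AI can't help with.

def overall_speedup(automatable_fraction, automation_speedup):
    """Overall speedup when only a fraction of the work is accelerated."""
    remaining = 1 - automatable_fraction
    return 1 / (remaining + automatable_fraction / automation_speedup)

for s in [2, 10, 100, 1_000_000]:
    print(f"{s:>9}x on the easy 70% -> {overall_speedup(0.7, s):.2f}x overall")
# The overall speedup approaches 1/0.3, about 3.33x, no matter how good the automation gets.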
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Law-Following AI 1: Sequence Introduction and Structure, published by Cullen OKeefe on April 27, 2022 on The AI Alignment Forum. This post is written in my personal capacity, and does not necessarily represent the views of OpenAI or any other organization. Cross-posted to the Effective Altruism Forum. This sequence of posts will argue that working to ensure that AI systems follow laws is a worthwhile way to improve the long-term future of AI.[1] The structure of this sequence will be as follows: First, in this post, I will define some key terms and sketch what an ideal law-following AI ("LFAI") system might look like. In the next few posts, I will explain why law-following might not emerge by default given the existing constellation of alignment approaches, financial objectives, and legal constraints, and explain why this is troubling. Finally, I will propose some policy and technical routes to ameliorating these problems. If the vision here excites you, and you would like to get funding to work on it, get in touch. I may be excited to recommend grants for people working on this, as long as it does not distract them from working on more important alignment issues. Image by OpenAI's DALL·E. Key Definitions A law-following AI, or LFAI, is an AI system that is designed to rigorously comply with some defined set of human-originating rules ("laws"),[2] using legal interpretative techniques,[3] under the assumption that those laws apply to the AI in the same way that they would to a human. By "intrinsically motivated," I mean that the AI is motivated to obey those rules regardless of whether (a) its human principal wants it to obey the law,[4] or (b) disobeying the law would be instrumentally valuable.[5] (The Appendix to this post explores some possible conceptual issues with this definition of LFAI.) I will compare LFAI with intent-aligned AI. The standard definition of "intent alignment" generally concerns only the relationship between some property of a human principal H and the actions of the human's AI agent A: Jan Leike et al. define the "agent alignment problem" as "How can we create agents that behave in accordance with the user's intentions?" Amanda Askell et al. define "alignment" as "the degree of overlap between the way two agents rank different outcomes." Paul Christiano defines "AI alignment" as "A is trying to do what H wants it to do." Richard Ngo endorses Christiano's definition. Iason Gabriel does not directly define "intent alignment," but provides a taxonomy wherein an AI agent can be aligned with: "Instructions: the agent does what I instruct it to do." "Expressed intentions: the agent does what I intend it to do." "Revealed preferences: the agent does what my behaviour reveals I prefer." "Informed preferences or desires: the agent does what I would want it to do if I were rational and informed." "Interest or well-being: the agent does what is in my interest, or what is best for me, objectively speaking." "Values: the agent does what it morally ought to do, as defined by the individual or society." All but (6) concern the relationship between H and A. It would therefore seem appropriate to describe them as types of intent alignment.
Alignment with some broader or more complete set of values—such as type (6) in Gabriel's taxonomy, Coherent Extrapolated Volition, or what Ngo calls "maximalist" or "ambitious" alignment—is perhaps desirable or even necessary, but seems harder than working on intent alignment.[6] Much current alignment work therefore focuses on intent alignment. We can see that, on its face, intent alignment does not entail law-following. A key crux of this sequence, to be defended in subsequent posts, is that this gap between intent alignment and law-following is: Bad in expectation for the long-term future. Easier to bridge than the ga...
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: [Link] Why I'm excited about AI-assisted human feedback, published by Jan Leike on April 6, 2022 on The AI Alignment Forum. This is a link post. I'm writing a sequence of posts on the approach to alignment I'm currently most excited about. This first post argues for recursive reward modeling and explains the problem it's meant to address (scaling RLHF to tasks that are hard to evaluate). Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: [Link] A minimal viable product for alignment, published by Jan Leike on April 6, 2022 on The AI Alignment Forum. This is a link post. I'm writing a sequence of posts on the approach to alignment I'm currently most excited about. This second post argues that instead of trying to solve the alignment problem once and for all, we can succeed with something less ambitious: building a system that allows us to bootstrap better alignment techniques. Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is Risks from Learned Optimization, Part 1: Risks from Learned Optimization: Introduction, published by evhub, Chris van Merwijk, vlad_m, Joar Skalse, Scott Garrabrant. Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. This is the first of five posts in the Risks from Learned Optimization Sequence based on the paper “Risks from Learned Optimization in Advanced Machine Learning Systems” by Evan Hubinger, Chris van Merwijk, Vladimir Mikulik, Joar Skalse, and Scott Garrabrant. Each post in the sequence corresponds to a different section of the paper. Evan Hubinger, Chris van Merwijk, Vladimir Mikulik, and Joar Skalse contributed equally to this sequence. With special thanks to Paul Christiano, Eric Drexler, Rob Bensinger, Jan Leike, Rohin Shah, William Saunders, Buck Shlegeris, David Dalrymple, Abram Demski, Stuart Armstrong, Linda Linsefors, Carl Shulman, Toby Ord, Kate Woolverton, and everyone else who provided feedback on earlier versions of this sequence. Motivation The goal of this sequence is to analyze the type of learned optimization that occurs when a learned model (such as a neural network) is itself an optimizer—a situation we refer to as mesa-optimization, a neologism we introduce in this sequence. We believe that the possibility of mesa-optimization raises two important questions for the safety and transparency of advanced machine learning systems. First, under what circumstances will learned models be optimizers, including when they should not be? Second, when a learned model is an optimizer, what will its objective be—how will it differ from the loss function it was trained under—and how can it be aligned? We believe that this sequence presents the most thorough analysis of these questions that has been conducted to date. In particular, we present not only an introduction to the basic concerns surrounding mesa-optimizers, but also an analysis of the particular aspects of an AI system that we believe are likely to make the problems related to mesa-optimization relatively easier or harder to solve. By providing a framework for understanding the degree to which different AI systems are likely to be robust to misaligned mesa-optimization, we hope to start a discussion about the best ways of structuring machine learning systems to solve these problems. Furthermore, in the fourth post we will provide what we think is the most detailed analysis yet of a problem we refer to as deceptive alignment which we posit may present one of the largest—though not necessarily insurmountable—current obstacles to producing safe advanced machine learning systems using techniques similar to modern machine learning. Two questions In machine learning, we do not manually program each individual parameter of our models. Instead, we specify an objective function that captures what we want the system to do and a learning algorithm to optimize the system for that objective. In this post, we present a framework that distinguishes what a system is optimized to do (its “purpose”), from what it optimizes for (its “goal”), if it optimizes for anything at all. While all AI systems are optimized for something (have a purpose), whether they actually optimize for anything (pursue a goal) is non-trivial.
We will say that a system is an optimizer if it is internally searching through a search space (consisting of possible outputs, policies, plans, strategies, or similar) looking for those elements that score high according to some objective function that is explicitly represented within the system. Learning algorithms in machine learning are optimizers because they search through a space of possible parameters—e.g. neural network weights—and improve the parameters with respect to some objective. Planning algorithms are also optimizers, since they search through possible...
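To make the definition above concrete, here is a toy Python sketch of a system that counts as an optimizer in this sense: it explicitly represents an objective function and searches a space of candidate outputs for elements that score highly. This is purely illustrative and not taken from the paper or sequence; the function name and the toy objective are invented for the example.

# Toy sketch of the "optimizer" definition above: a system that explicitly
# searches a space of candidate outputs for ones scoring high on an
# internally represented objective. Purely illustrative.
import random

def explicit_search_optimizer(objective, search_space, num_samples=1000):
    """Search candidate outputs and return the one scoring highest on `objective`."""
    best, best_score = None, float("-inf")
    for _ in range(num_samples):
        candidate = random.choice(search_space)
        score = objective(candidate)          # the objective is explicitly represented
        if score > best_score:
            best, best_score = candidate, score
    return best

# Example: search over candidate "plans" (here, just integers) for one that
# scores well on a simple objective function.
plans = list(range(-100, 101))
print(explicit_search_optimizer(lambda x: -(x - 42) ** 2, plans))  # very likely prints 42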
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: My current framework for thinking about AGI timelines, published by Alex Zhu on the AI Alignment Forum. At the beginning of 2017, someone I deeply trusted said they thought AGI would come in 10 years, with 50% probability. I didn't take their opinion at face value, especially since so many experts seemed confident that AGI was decades away. But the possibility of imminent apocalypse seemed plausible enough and important enough that I decided to prioritize investigating AGI timelines over trying to strike gold. I left the VC-backed startup I'd cofounded, and went around talking to every smart and sensible person I could find who seemed to have opinions about when humanity would develop AGI. My biggest takeaways after 3 years might be disappointing -- I don't think the considerations currently available to us point to any decisive conclusion one way or another, and I don't think anybody really knows when AGI is coming. At the very least, the fields of knowledge that I think bear on AGI forecasting (including deep learning, predictive coding, and comparative neuroanatomy) are disparate, and I don't know of any careful and measured thinkers with all the relevant expertise. That being said, I did manage to identify a handful of background variables that consistently play significant roles in informing people's intuitive estimates of when we'll get to AGI. In other words, people would often tell me that their estimates of AGI timelines would significantly change if their views on one of these background variables changed. I've put together a framework for understanding AGI timelines based on these background variables. Among all the frameworks for AGI timelines I've encountered, it's the framework that most comprehensively enumerates crucial considerations for AGI timelines, and it's the framework that best explains how smart and sensible people might arrive at vastly different views on AGI timelines. Over the course of the next few weeks, I'll publish a series of posts about these background variables and some considerations that shed light on what their values are. I'll conclude by describing my framework for how they come together to explain various overall viewpoints on AGI timelines, depending on different prior assumptions on the values of these variables. By trade, I'm a math competition junkie, an entrepreneur, and a hippie. I am not an expert on any of the topics I'll be writing about -- my analyses will not be comprehensive, and they might contain mistakes. I'm sharing them with you anyway in the hopes that you might contribute your own expertise, correct for my epistemic shortcomings, and perhaps find them interesting. 
I'd like to thank Paul Christiano, Jessica Taylor, Carl Shulman, Anna Salamon, Katja Grace, Tegan McCaslin, Eric Drexler, Vlad Firiou, Janos Kramar, Victoria Krakovna, Jan Leike, Richard Ngo, Rohin Shah, Jacob Steinhardt, David Dalrymple, Catherine Olsson, Jelena Luketina, Alex Ray, Jack Gallagher, Ben Hoffman, Tsvi BT, Sam Eisenstat, Matthew Graves, Ryan Carey, Gary Basin, Eliana Lorch, Anand Srinivasan, Michael Webb, Ashwin Sah, Yi Sun, Mark Sellke, Alex Gunning, Paul Kreiner, David Girardo, Danit Gal, Oliver Habryka, Sarah Constantin, Alex Flint, Stag Lynn, Andis Draguns, Tristan Hume, Holden Lee, David Dohan, and Daniel Kang for enlightening conversations about AGI timelines, and I'd like to apologize to anyone whose name I ought to have included, but forgot to include. Table of contents As I post over the coming weeks, I'll update this table of contents with links to the posts, and I might update some of the titles and descriptions. How special are human brains among animal brains? Humans can perform intellectual feats that appear qualitatively different from those of other animals, but are our brains really doing anything so different? How u...
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Classifying specification problems as variants of Goodhart's Law, published by Vika on the AI Alignment Forum. (Cross-posted to personal blog. Summarized in Alignment Newsletter #76. Thanks to Jan Leike and Tom Everitt for their helpful feedback on this post.) There are a few different classifications of safety problems, including the Specification, Robustness and Assurance (SRA) taxonomy and the Goodhart's Law taxonomy. In SRA, the specification category is about defining the purpose of the system, i.e. specifying its incentives. Since incentive problems can be seen as manifestations of Goodhart's Law, we explore how the specification category of the SRA taxonomy maps to the Goodhart taxonomy. The mapping is an attempt to integrate different breakdowns of the safety problem space into a coherent whole. We hope that a consistent classification of current safety problems will help develop solutions that are effective for entire classes of problems, including future problems that have not yet been identified. The SRA taxonomy defines three different types of specifications of the agent's objective: ideal (a perfect description of the wishes of the human designer), design (the stated objective of the agent) and revealed (the objective recovered from the agent's behavior). It then divides specification problems into design problems (e.g. side effects) that correspond to a difference between the ideal and design specifications, and emergent problems (e.g. tampering) that correspond to a difference between the design and revealed specifications. In the Goodhart taxonomy, there is a variable U* representing the true objective, and a variable U representing the proxy for the objective (e.g. a reward function). The taxonomy identifies four types of Goodhart effects: regressional (maximizing U also selects for the difference between U and U*), extremal (maximizing U takes the agent outside the region where U and U* are correlated), causal (the agent intervenes to maximize U in a way that does not affect U*), and adversarial (the agent has a different goal W and exploits the proxy U to maximize W). We think there is a correspondence between these taxonomies: design problems are regressional and extremal Goodhart effects, while emergent problems are causal Goodhart effects. The rest of this post will explain and refine this correspondence. The SRA taxonomy needs to be refined in order to capture the distinction between regressional and extremal Goodhart effects, and to pinpoint the source of causal Goodhart effects. To this end, we add a model specification as an intermediate point between the ideal and design specifications, and an implementation specification between the design and revealed specifications. The model specification is the best proxy within a chosen formalism (e.g. model class or specification language), i.e. the proxy that most closely approximates the ideal specification. In a reinforcement learning setting, the model specification is the reward function (defined in the given MDP/R over the given state space) that best captures the human designer's preferences. The ideal-model gap corresponds to the model design problem (regressional Goodhart): choosing a model that is tractable but also expressive enough to approximate the ideal specification well.
The model-design gap corresponds to proxy design problems (extremal Goodhart), such as specification gaming and side effects. While the design specification is a high-level description of what should be executed by the system, the implementation specification is a specification that can be executed, which includes agent and environment code (e.g. an executable Linux binary). (We note that it is also possible to define other specification levels at intermediate levels of abstraction between design and implementation, e.g...
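The regressional Goodhart effect described above, where maximizing the proxy U also selects for the gap between U and the true objective U*, can be illustrated with a small simulation. This is a hedged sketch using made-up Gaussian data, not an example from the post.

# Small numerical illustration of regressional Goodhart as described above
# (illustrative only): the proxy U is the true objective U* plus noise, and
# picking the candidate with the highest U systematically over-selects the noise.
import random

random.seed(0)
n = 10_000
true_values = [random.gauss(0, 1) for _ in range(n)]        # U* for each candidate
proxies = [u + random.gauss(0, 1) for u in true_values]     # U = U* + noise

best = max(range(n), key=lambda i: proxies[i])              # maximize the proxy U
print(f"proxy U of selected candidate:  {proxies[best]:.2f}")
print(f"true U* of selected candidate:  {true_values[best]:.2f}")
# The selected candidate's U* is typically well below its U: maximizing the
# proxy also selects for the U minus U* gap, i.e. the regressional effect.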
The more powerful our AIs become, the more we'll have to ensure that they're doing exactly what we want. If we don't, we risk building AIs that reach for creative solutions whose side-effects could be undesirable, or downright dangerous. Even a slight misalignment between the motives of a sufficiently advanced AI and human values could be hazardous. That's why leading AI labs like OpenAI are already investing significant resources into AI alignment research. Understanding that research is important if you want to understand where advanced AI systems might be headed, and what challenges we might encounter as AI capabilities continue to grow — and that's what this episode of the podcast is all about. My guest today is Jan Leike, head of AI alignment at OpenAI, and an alumnus of DeepMind and the Future of Humanity Institute. As someone who works directly with some of the world's largest AI systems (including OpenAI's GPT-3), Jan has a unique and interesting perspective to offer both on the current challenges facing alignment researchers, and the most promising future directions the field might take. --- Intro music: ➞ Artist: Ron Gelinas ➞ Track Title: Daybreak Chill Blend (original mix) ➞ Link to Track: https://youtu.be/d8Y2sKIgFWc --- Chapters: 0:00 Intro 1:35 Jan's background 7:10 Timing of scalable solutions 16:30 Recursive reward modeling 24:30 Amplification of misalignment 31:00 Community focus 32:55 Wireheading 41:30 Arguments against the democratization of AIs 49:30 Differences between capabilities and alignment 51:15 Research to focus on 1:01:45 Formalizing an understanding of personal experience 1:04:04 OpenAI hiring 1:05:02 Wrap-up
As artificial intelligence gets more and more powerful, the need becomes greater to ensure that machines do the right thing. But what does that even mean? Brian Christian joins Vasant Dhar in episode 13 of Brave New World to discuss, as the title of his new book goes, the alignment problem. Useful resources: 1. Brian Christian's homepage. 2. The Alignment Problem: Machine Learning and Human Values -- Brian Christian. 3. Algorithms to Live By: The Computer Science of Human Decisions -- Brian Christian and Tom Griffiths. 4. The Most Human Human -- Brian Christian. 5. How Social Media Threatens Society -- Episode 8 of Brave New World (w Jonathan Haidt). 6. Are We Becoming a New Species? -- Episode 12 of Brave New World (w Molly Crockett). 7. The Nature of Intelligence -- Episode 7 of Brave New World (w Yann LeCun). 8. Some Moral and Technical Consequences of Automation -- Norbert Wiener. 9. Superintelligence: Paths, Dangers, Strategies -- Nick Bostrom. 10. Human Compatible: AI and the Problem of Control -- Stuart Russell. 11. OpenAI. 12. Center for Human-Compatible AI. 13. Concrete Problems in AI Safety -- Dario Amodei, Chris Olah, Jacob Steinhardt, Paul Christiano, John Schulman, Dan Mané. 14. Machine Bias -- Julia Angwin, Jeff Larson, Surya Mattu and Lauren Kirchner. 15. Inherent Trade-Offs in the Fair Determination of Risk Scores -- Jon Kleinberg, Sendhil Mullainathan, Manish Raghavan. 16. Algorithmic Decision Making and the Cost of Fairness -- Sam Corbett-Davies, Emma Pierson, Avi Feller, Sharad Goel, Aziz Huq. 17. Predictions Put Into Practice -- Jessica Saunders, Priscillia Hunt, John S. Hollywood. 18. An Engine, Not a Camera: How Financial Models Shape Markets -- Donald MacKenzie. 19. An Anthropologist on Mars -- Oliver Sacks. 20. Deep Reinforcement Learning from Human Preferences -- Paul F. Christiano, Jan Leike, Tom B. Brown, Miljan Martic, Shane Legg, Dario Amodei for OpenAI & DeepMind.