American computer scientist
How can we ensure technology evolves ethically in a rapidly advancing world? Neil deGrasse Tyson, Chuck Nice & Gary O'Reilly explore the challenges of designing a future where human values and AI coexist with The Future of Life Award's 2024 recipients, Batya Friedman & Steve Omohundro.

Thanks to our friends at Future of Life Institute for supporting today's episode. To learn more about FOL and this year's winners, make sure to visit FutureofLife.org.

NOTE: StarTalk+ Patrons can listen to this entire episode commercial-free here: https://startalkmedia.com/show/the-ethics-of-ai-with-batya-friedman-steve-omohundro/

Thanks to our Patrons Meech, Sara Hubbard, Jesse Thilo, Myles Stanton, Francisco Meza, Edvardas Žemaitis, Ronny Waingort, Cyrelle Sumner, Ticheal Murner, Chase, Karen Morlatt, Brian Kelley, Kevin Vail, Rob, Razey9, Mark Th, Kyle M, Zygmunt Wasik, Hulk, Jon McPeak, smiggy93, Tolulope Oladimeji Oyeniyi, Bernie Cat, David Conradt, Ian Mercado, Daniel Bielawski, Indika, and Aris for supporting us this week.

Subscribe to SiriusXM Podcasts+ on Apple Podcasts to listen to new episodes ad-free and a whole week early.
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Limitations on Formal Verification for AI Safety, published by Andrew Dickson on August 20, 2024 on LessWrong.

In the past two years there has been increased interest in formal verification-based approaches to AI safety. Formal verification is a sub-field of computer science that studies how guarantees may be derived by deduction on fully-specified rule-sets and symbol systems. By contrast, the real world is a messy place that can rarely be straightforwardly represented in a reductionist way. In particular, physics, chemistry and biology are all complex sciences which do not have anything like complete symbolic rule sets. Additionally, even if we had such rules for the natural sciences, it would be very difficult for any software system to obtain sufficiently accurate models and data about initial conditions for a prover to succeed in deriving strong guarantees for AI systems operating in the real world.

Practical limitations like these on formal verification have been well understood for decades by engineers and applied mathematicians building real-world software systems, which makes it puzzling that they have mostly been dismissed by leading researchers advocating for the use of formal verification in AI safety so far. This paper will focus in on several such limitations and use them to argue that we should be extremely skeptical of claims that formal verification-based approaches will provide strong guarantees against major AI threats in the near term.

What do we Mean by Formal Verification for AI Safety?

Some examples of the kinds of threats researchers hope formal verification will help with come from the paper "Provably Safe Systems: The Only Path to Controllable AGI" [1] by Max Tegmark and Steve Omohundro (emphasis mine):

Several groups are working to identify the greatest human existential risks from AGI. For example, the Center for AI Safety recently published 'An Overview of Catastrophic AI Risks' which discusses a wide range of risks including bioterrorism, automated warfare, rogue power seeking AI, etc. Provably safe systems could counteract each of the risks they describe. These authors describe a concrete bioterrorism scenario in section 2.4: a terrorist group wants to use AGI to release a deadly virus over a highly populated area. They use an AGI to design the DNA and shell of a pathogenic virus and the steps to manufacture it. They hire a chemistry lab to synthesize the DNA and integrate it into the protein shell. They use AGI controlled drones to disperse the virus and social media AGIs to spread their message after the attack. Today, groups are working on mechanisms to prevent the synthesis of dangerous DNA. But provably safe infrastructure could stop this kind of attack at every stage: biochemical design AI would not synthesize designs unless they were provably safe for humans, data center GPUs would not execute AI programs unless they were certified safe, chip manufacturing plants would not sell GPUs without provable safety checks, DNA synthesis machines would not operate without a proof of safety, drone control systems would not allow drones to fly without proofs of safety, and armies of persuasive bots would not be able to manipulate media without proof of humanness.
[1] The above quote contains a number of very strong claims about the possibility of formally or mathematically provable guarantees around software systems deployed in the physical world - for example, the claim that we could have safety proofs about the real-world good behavior of DNA synthesis machines or drones. From a practical standpoint, our default stance towards such claims should be skepticism, since we do not have proofs of this sort for any of the technologies we interact with in the real world today. For example, DNA synthesis machines exist today and do not come with f...
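To see what "deduction on fully-specified rule-sets and symbol systems" looks like in practice, here is a minimal, hypothetical sketch using the Z3 SMT solver's Python bindings (the z3-solver package); the property and the input envelope are invented for illustration. The proof goes through only because every symbol is fully specified, which is exactly what is missing from a claim like "this DNA synthesis run is safe for humans."

```python
# A minimal sketch of formal verification over a fully-specified rule set,
# using the Z3 SMT solver (pip install z3-solver). The property is trivial
# by design: the point is that deductive guarantees require a completely
# specified symbolic model, unlike physical or biological systems.
from z3 import Int, Solver, And, Not, unsat

x = Int("x")
envelope = And(x >= 0, x <= 100)       # fully-specified operating envelope
safety_property = 3 * x + 7 <= 307     # claim to verify over that envelope

s = Solver()
s.add(envelope, Not(safety_property))  # search for a counterexample
assert s.check() == unsat              # none exists: the property is proved
print("Property holds for every input in the specified envelope.")
```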
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Provably Safe AI: Worldview and Projects, published by bgold on August 10, 2024 on LessWrong.

In September 2023, Max Tegmark and Steve Omohundro proposed "Provably Safe AI" as a strategy for AI Safety. In May 2024, a larger group delineated the broader concept of "Guaranteed Safe AI" which includes Provably Safe AI and other related strategies. In July 2024, Ben Goldhaber and Steve discussed Provably Safe AI and its future possibilities, as summarized in this document.

Background

In June 2024, ex-OpenAI AI Safety Researcher Leopold Aschenbrenner wrote a 165-page document entitled "Situational Awareness: The Decade Ahead" summarizing AI timeline evidence and beliefs which are shared by many frontier AI researchers. He argued that human-level AI is likely by 2027 and will likely lead to superhuman AI in 2028 or 2029. "Transformative AI" was coined by Open Philanthropy to describe AI which can "precipitate a transition comparable to the agricultural or industrial revolution". There appears to be a significant probability that Transformative AI may be created by 2030. If this probability is, say, greater than 10%, then humanity must immediately begin to prepare for it. The social changes and upheaval caused by Transformative AI are likely to be enormous. There will likely be many benefits but also many risks and dangers, perhaps even existential risks for humanity.

Today's technological infrastructure is riddled with flaws and security holes. Power grids, cell service, and internet services have all been very vulnerable to accidents and attacks. Terrorists have attacked critical infrastructure as a political statement. Today's cybersecurity and physical security barely keep human attackers at bay. When these groups obtain access to powerful cyberattack AIs, they will likely be able to cause enormous social damage and upheaval.

Humanity has known how to write provably correct and secure software since Alan Turing's 1949 paper. Unfortunately, proving program correctness requires mathematical sophistication and it is rare in current software development practice. Fortunately, modern deep learning systems are becoming proficient at proving mathematical theorems and generating provably correct code. When combined with techniques like "autoformalization," this should enable powerful AI to rapidly replace today's flawed and insecure codebase with optimized, secure, and provably correct replacements. Many researchers working in these areas believe that AI theorem-proving at the level of human PhDs is likely about two years away. Similar issues plague hardware correctness and security, and it will be a much larger project to replace today's flawed and insecure hardware. Max and Steve propose formal methods grounded in mathematical physics to produce provably safe physical designs. The same AI techniques which are revolutionizing theorem proving and provable software synthesis are also applicable to provable hardware design.

Finally, today's social mechanisms like money, contracts, voting, and the structures of governance will also need to be updated for the new realities of an AI-driven society. Here too, the underlying rules of social interaction can be formalized, provably effective social protocols can be designed, and secure hardware implementing the new rules synthesized using powerful theorem proving AIs.

What's next?
Given the huge potential risk of uncontrolled powerful AI, many have argued for a pause in Frontier AI development. Unfortunately, that does not appear to be a stable solution. Even if the US paused its AI development, China or other countries could gain an advantage by accelerating their own work. There have been similar calls to limit the power of open source AI models. But, again, any group anywhere in the world can release their powerful AI model weig...
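As a concrete, deliberately tiny illustration of what "provably correct code" means, here is a hypothetical sketch using the Z3 SMT solver's Python bindings: a one-line routine is modeled symbolically and checked against its specification for all inputs. The open question raised above is whether AI theorem provers can scale this style of guarantee from toy routines to entire codebases and hardware designs.

```python
# A sketch of "provably correct code" in the small, using the Z3 SMT solver
# (pip install z3-solver): the routine is modeled as a symbolic expression
# and its specification is proved for every integer input by showing that
# no counterexample exists.
from z3 import Int, If, Solver, And, Or, Not, unsat

x = Int("x")
abs_impl = If(x < 0, -x, x)   # the "program" being verified

# Specification: the result is non-negative and equals either x or -x.
spec = And(abs_impl >= 0, Or(abs_impl == x, abs_impl == -x))

s = Solver()
s.add(Not(spec))              # look for any input that violates the spec
assert s.check() == unsat     # unsatisfiable: the routine meets its spec
print("abs_impl provably satisfies its specification for all integer inputs.")
```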
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems, published by Joar Skalse on May 17, 2024 on The AI Alignment Forum.

I want to draw attention to a new paper, written by myself, David "davidad" Dalrymple, Yoshua Bengio, Stuart Russell, Max Tegmark, Sanjit Seshia, Steve Omohundro, Christian Szegedy, Ben Goldhaber, Nora Ammann, Alessandro Abate, Joe Halpern, Clark Barrett, Ding Zhao, Tan Zhi-Xuan, Jeannette Wing, and Joshua Tenenbaum. In this paper we introduce the concept of "guaranteed safe (GS) AI", which is a broad research strategy for obtaining safe AI systems with provable quantitative safety guarantees. Moreover, with a sufficient push, this strategy could plausibly be implemented on a moderately short time scale.

The key components of GS AI are:
1. A formal safety specification that mathematically describes what effects or behaviors are considered safe or acceptable.
2. A world model that provides a mathematical description of the environment of the AI system.
3. A verifier that provides a formal proof (or some other comparable auditable assurance) that the AI system satisfies the safety specification with respect to the world model.

The first thing to note is that a safety specification in general is not the same thing as a reward function, utility function, or loss function (though they include these objects as special cases). For example, it may specify that the AI system should not communicate outside of certain channels, copy itself to external computers, modify its own source code, or obtain information about certain classes of things in the external world, etc. The safety specifications may be specified manually, generated by a learning algorithm, written by an AI system, or obtained through other means. Further detail is provided in the main paper.

The next thing to note is that most useful safety specifications must be given relative to a world model. Without a world model, we can only use specifications defined directly over input-output relations. However, we want to define specifications over input-outcome relations instead. This is why a world model is a core component of GS AI. Also note that:
1. The world model need not be a "complete" model of the world. Rather, the required amount of detail and the appropriate level of abstraction depends on both the safety specification(s) and the AI system's context of use.
2. The world model should of course account for uncertainty, which may include both stochasticity and nondeterminism.
3. The AI system whose safety is being verified may or may not use a world model, and if it does, we may or may not be able to extract it. However, the world model that is used for the verification of the safety properties need not be the same as the world model of the AI system whose safety is being verified (if it has one). The world model would likely have to be AI-generated, and should ideally be interpretable. In the main paper, we outline a few potential strategies for producing such a world model.

Finally, the verifier produces a quantitative assurance that the base-level AI controller satisfies the safety specification(s) relative to the world model(s). In the most straightforward form, this could simply take the shape of a formal proof.
However, if a direct formal proof cannot be obtained, then there are weaker alternatives that would still produce a quantitative guarantee. For example, the assurance may take the form of a proof that bounds the probability of failing to satisfy the safety specification, or a proof that the AI system will converge towards satisfying the safety specification (with increasing amounts of data or computational resources, for example). Such proofs are of course often very hard to obtain. However, further progress in automated theorem proving (and relat...
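The interplay of the three components can be pictured with a toy sketch (my own construction using numpy, not the machinery proposed in the paper): a small Markov chain stands in for the world model, a bound on the probability of reaching an unsafe state is the safety specification, and the "verifier" simply propagates probabilities exactly instead of producing a proof certificate.

```python
# A toy sketch of the three GS AI components: world model, safety
# specification, and verifier. Here the "verifier" is exact probability
# propagation on a tiny Markov chain rather than a formal proof; the state
# space, transition numbers, and threshold are all invented for illustration.
import numpy as np

# World model: states 0 = nominal, 1 = degraded, 2 = unsafe (absorbing).
# Row i gives the next-state distribution from state i under the controller.
P = np.array([
    [0.995, 0.004, 0.001],
    [0.050, 0.948, 0.002],
    [0.000, 0.000, 1.000],
])

def prob_unsafe(world_model, horizon, start=0, unsafe=2):
    """Exact probability of having entered `unsafe` within `horizon` steps."""
    dist = np.zeros(world_model.shape[0])
    dist[start] = 1.0
    for _ in range(horizon):
        dist = dist @ world_model
    return dist[unsafe]

# Safety specification: P(unsafe within 50 steps) <= 0.2.
p = prob_unsafe(P, horizon=50)
print(f"P(unsafe within 50 steps) = {p:.3f}; specification satisfied: {p <= 0.2}")
```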
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems, published by Gunnar Zarncke on May 16, 2024 on LessWrong. Authors: David "davidad" Dalrymple, Joar Skalse, Yoshua Bengio, Stuart Russell, Max Tegmark, Sanjit Seshia, Steve Omohundro, Christian Szegedy, Ben Goldhaber, Nora Ammann, Alessandro Abate, Joe Halpern, Clark Barrett, Ding Zhao, Tan Zhi-Xuan, Jeannette Wing, Joshua Tenenbaum Abstract: Ensuring that AI systems reliably and robustly avoid harmful or dangerous behaviours is a crucial challenge, especially for AI systems with a high degree of autonomy and general intelligence, or systems used in safety-critical contexts. In this paper, we will introduce and define a family of approaches to AI safety, which we will refer to as guaranteed safe (GS) AI. The core feature of these approaches is that they aim to produce AI systems which are equipped with high-assurance quantitative safety guarantees. This is achieved by the interplay of three core components: a world model (which provides a mathematical description of how the AI system affects the outside world), a safety specification (which is a mathematical description of what effects are acceptable), and a verifier (which provides an auditable proof certificate that the AI satisfies the safety specification relative to the world model). We outline a number of approaches for creating each of these three core components, describe the main technical challenges, and suggest a number of potential solutions to them. We also argue for the necessity of this approach to AI safety, and for the inadequacy of the main alternative approaches. Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org
In Episode #16, AI Risk Denier Down, things get weird. This show did not have to be like this. Our guest in Episode #16 is Timothy Lee, a computer scientist and journalist who founded and runs understandingai.org. Tim has written about AI risk many times, including these two recent essays: https://www.understandingai.org/p/why... https://www.understandingai.org/p/why... Tim was not prepared to discuss this work, which is when things started to get off the rails.

For Humanity: An AI Safety Podcast is the accessible AI Safety Podcast for all humans, no tech background required. Our show focuses solely on the threat of human extinction from AI.

MY QUESTIONS FOR TIM (We didn't even get halfway through lol, YouTube won't let me put all of them so I'm just putting the second essay questions)

OK, let's get into your second essay "Why I'm not afraid of superintelligent AI taking over the world" from 11/15/23
-You cite chess as a striking example of how AI will not take over the world
-But I'd like to talk about AI safety researcher Steve Omohundro's take on chess
-He says if you had an unaligned AGI you asked to get better at chess, it would first break into other servers to steal computing power so it would be better at chess. Then when you discover this and try to stop it by turning it off, it sees your turning it off as a threat to its improving at chess, so it murders you.
-Where is he wrong?
-You wrote: “Think about a hypothetical graduate student. Let's say that she was able to reach the frontiers of physics knowledge after reading 20 textbooks. Could she have achieved a superhuman understanding of physics by reading 200 textbooks? Obviously not. Those extra 180 textbooks contain a lot of words, they don't contain very much knowledge she doesn't already have. So too with AI systems. I suspect that on many tasks, their performance will start to plateau around human-level performance. Not because they “run out of data,” but because they reached the frontiers of human knowledge.”
-In this you seem to assume that any one human is capable of mastering all of the knowledge in a subject area better than any AI, because you seem to believe that one human is capable of holding ALL of the knowledge available on a given subject.
-This is ludicrous to me. You think humans are far too special.
-AN AGI WILL HAVE READ EVERY BOOK EVER WRITTEN. MILLIONS OF BOOKS. ACTIVELY CROSS-REFERENCING ACROSS EVERY DISCIPLINE.
-How could any human possibly compete with an AGI system that never sleeps and can read every word ever written in any language? No human could ever do this.
-Are you saying humans are the most perfect vessels of knowledge consumption possible in the universe?
-A human who has read 1000 books on one area can compete with an AGI who has read millions of books in thousands of areas for knowledge? Really?
-You wrote: “AI safetyists assume that all problems can be solved with the application of enough brainpower. But for many problems, having the right knowledge matters more. And a lot of economically significant knowledge is not contained in any public data set. It's locked up in the brains and private databases of millions of individuals and organizations spread across the economy and around the world.”
-Why do you assume an unaligned AGI would not raid every private database on earth in a very short time and take in all this knowledge you find so special?
-Does this claim rest on the security protocols of the big AI companies?
-Security protocols, even at OpenAI, are seen to be highly vulnerable to large-scale nation-state hacking. If China could hack into OpenAI, an AGI could surely hack into it or anything else. An AGI's ability to spot and exploit vulnerabilities in human-written code is widely predicted.
-Let's see if we can leave this conversation with a note of agreement. Is there anything you think we can agree on?
AI systems have become more powerful in the last few years, and are expected to become even more powerful in the years ahead. The question naturally arises: what, if anything, should humanity be doing to increase the likelihood that these forthcoming powerful systems will be safe, rather than destructive?

Our guest in this episode has a long and distinguished history of analysing that question, and he has some new proposals to share with us. He is Steve Omohundro, the CEO of Beneficial AI Research, an organisation which is working to ensure that artificial intelligence is safe and beneficial for humanity.

Steve has degrees in Physics and Mathematics from Stanford and a Ph.D. in Physics from U.C. Berkeley. He went on to be an award-winning computer science professor at the University of Illinois. At that time, he developed the notion of basic AI drives, which we talk about shortly, as well as a number of potential key AI safety mechanisms.

Among many other roles which are too numerous to mention here, Steve served as a Research Scientist at Meta, the parent company of Facebook, where he worked on generative models and AI-based simulation, and he is an advisor to MIRI, the Machine Intelligence Research Institute.

Selected follow-ups:
Steve Omohundro: Innovative ideas for a better world
Metaculus forecast for the date of weak AGI
"The Basic AI Drives" (PDF, 2008)
TED Talk by Max Tegmark: How to Keep AI Under Control
Apple Secure Enclave
Meta Research: Teaching AI advanced mathematical reasoning
DeepMind AlphaGeometry
Microsoft Lean theorem prover
Terence Tao (Wikipedia)
NeurIPS Tutorial on Machine Learning for Theorem Proving (2023)
The team at MIRI

Music: Spike Protein, by Koi Discovery, available under CC0 1.0 Public Domain Declaration
Gaming the Future: Technologies for Intelligent Voluntary Cooperation

Trent McConaghy: I'm working on Ocean Protocol. Ocean is a public utility network to safely & securely share AI data while preserving privacy. It has applications to data marketplaces and digital sovereignty.

Steve Omohundro: Steve has been a scientist, professor, author, software architect, and entrepreneur. He is a Research Scientist at Facebook working on AI-based simulation. He has degrees in Physics and Mathematics from Stanford and a Ph.D. in Physics from U.C. Berkeley. He was an award-winning computer science professor at the University of Illinois at Champaign-Urbana and cofounded the Center for Complex Systems Research. He was the Chief Scientist of AIBrain and served on its Board of Directors. He was the Founder and CEO of Possibility Research, which worked to develop new foundations for Artificial Intelligence based on precise mathematical semantics, and of Self-Aware Systems, which worked to make intelligent technologies beneficial for humanity.

This episode is a discussion on chapter 9 of the book: WELCOME NEW PLAYERS | Artificial Intelligences
Session summary: T. McConaghy & S. Omohundro: WELCOME NEW PLAYERS | Gaming the Future Book Club Chapter 9 - Foresight Institute

The Foresight Institute is a research organization and non-profit that supports the beneficial development of high-impact technologies. Since our founding in 1987 on a vision of guiding powerful technologies, we have continued to evolve into a many-armed organization that focuses on several fields of science and technology that are too ambitious for legacy institutions to support.

Allison Duettmann is the president and CEO of Foresight Institute. She directs the Intelligent Cooperation, Molecular Machines, Biotech & Health Extension, Neurotech, and Space Programs, Fellowships, Prizes, and Tech Trees, and shares this work with the public. She founded Existentialhope.com, co-edited Superintelligence: Coordination & Strategy, co-authored Gaming the Future, and co-initiated The Longevity Prize.

Apply to Foresight's virtual salons and in-person workshops here! We are entirely funded by your donations. If you enjoy what we do please consider donating through our donation page. Visit our website for more content, or join us here: Twitter, Facebook, LinkedIn.

Every word ever spoken on this podcast is now AI-searchable using Fathom.fm, a search engine for podcasts. Hosted on Acast. See acast.com/privacy for more information.
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Money: The Unit of Caring, published by Eliezer Yudkowsky on LessWrong.

Steve Omohundro has suggested a folk theorem to the effect that, within the interior of any approximately rational, self-modifying agent, the marginal benefit of investing additional resources in anything ought to be about equal. Or, to put it a bit more exactly, shifting a unit of resource between any two tasks should produce no increase in expected utility, relative to the agent's utility function and its probabilistic expectations about its own algorithms. This resource balance principle implies that—over a very wide range of approximately rational systems, including even the interior of a self-modifying mind—there will exist some common currency of expected utilons, by which everything worth doing can be measured.

In our society, this common currency of expected utilons is called "money". It is the measure of how much society cares about something. This is a brutal yet obvious point, which many are motivated to deny. With this audience, I hope, I can simply state it and move on. It's not as if you thought "society" was intelligent, benevolent, and sane up until this point, right? I say this to make a certain point held in common across many good causes. Any charitable institution you've ever had a kind word for, certainly wishes you would appreciate this point, whether or not they've ever said anything out loud. For I have listened to others in the nonprofit world, and I know that I am not speaking only for myself here...

Many people, when they see something that they think is worth doing, would like to volunteer a few hours of spare time, or maybe mail in a five-year-old laptop and some canned goods, or walk in a march somewhere, but at any rate, not spend money. Believe me, I understand the feeling. Every time I spend money I feel like I'm losing hit points. That's the problem with having a unified quantity describing your net worth: Seeing that number go down is not a pleasant feeling, even though it has to fluctuate in the ordinary course of your existence. There ought to be a fun-theoretic principle against it. But, well...

There is this very, very old puzzle/observation in economics about the lawyer who spends an hour volunteering at the soup kitchen, instead of working an extra hour and donating the money to hire someone to work for five hours at the soup kitchen. There's this thing called "Ricardo's Law of Comparative Advantage". There's this idea called "professional specialization". There's this notion of "economies of scale". There's this concept of "gains from trade". The whole reason why we have money is to realize the tremendous gains possible from each of us doing what we do best. This is what grownups do. This is what you do when you want something to actually get done. You use money to employ full-time specialists.

Yes, people are sometimes limited in their ability to trade time for money (underemployed), so that it is better for them if they can directly donate that which they would usually trade for money. If the soup kitchen needed a lawyer, and the lawyer donated a large contiguous high-priority block of lawyering, then that sort of volunteering makes sense—that's the same specialized capability the lawyer ordinarily trades for money.
But "volunteering" just one hour of legal work, constantly delayed, spread across three weeks in casual minutes between other jobs? This is not the way something gets done when anyone actually cares about it, or to state it near-equivalently, when money is involved. To the extent that individuals fail to grasp this principle on a gut level, they may think that the use of money is somehow optional in the pursuit of things that merely seem morally desirable—as opposed to tasks like feeding ourselves, whose desirability seems to be treated ...
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Seeking Power is Often Convergently Instrumental in MDPs, published by TurnTrout and elriggs on LessWrong. Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. This is a linkpost for

In 2008, Steve Omohundro's foundational paper The Basic AI Drives conjectured that superintelligent goal-directed AIs might be incentivized to gain significant amounts of power in order to better achieve their goals. Omohundro's conjecture bears out in toy models, and the supporting philosophical arguments are intuitive. In 2019, the conjecture was even debated by well-known AI researchers. Power-seeking behavior has been heuristically understood as an anticipated risk, but not as a formal phenomenon with a well-understood cause. The goal of this post (and the accompanying paper, Optimal Policies Tend to Seek Power) is to change that.

Motivation

It's 2008, the ancient wild west of AI alignment. A few people have started thinking about questions like "if we gave an AI a utility function over world states, and it actually maximized that utility... what would it do?" In particular, you might notice that wildly different utility functions seem to encourage similar strategies.

| Utility function | Resist shutdown? | Gain computational resources? | Prevent modification of utility function? |
| Paperclip utility | ✔️ | ✔️ | ✔️ |
| Blue webcam pixel utility | ✔️ | ✔️ | ✔️ |
| People-look-happy utility | ✔️ | ✔️ | ✔️ |

These strategies are unrelated to terminal preferences: the above utility functions do not award utility to e.g. resource gain in and of itself. Instead, these strategies are instrumental: they help the agent optimize its terminal utility. In particular, a wide range of utility functions incentivize these instrumental strategies. These strategies seem to be convergently instrumental. But why?

I'm going to informally explain a formal theory which makes significant progress in answering this question. I don't want this post to be Optimal Policies Tend to Seek Power with cuter illustrations, so please refer to the paper for the math. You can read the two concurrently. We can formalize questions like "do 'most' utility maximizers resist shutdown?" as "Given some prior beliefs about the agent's utility function, knowledge of the environment, and the fact that the agent acts optimally, with what probability do we expect it to be optimal to avoid shutdown?"

The table's convergently instrumental strategies are about maintaining, gaining, and exercising power over the future, in some sense. Therefore, this post will help answer:
What does it mean for an agent to "seek power"?
In what situations should we expect seeking power to be more probable under optimality, than not seeking power?
This post won't tell you when you should seek power for your own goals; this post illustrates a regularity in optimal action across different goals one might pursue.

Formalizing Convergent Instrumental Goals suggests that the vast majority of utility functions incentivize the agent to exert a lot of control over the future, assuming that these utility functions depend on "resources." This is a big assumption: what are "resources", and why must the AI's utility function depend on them? We drop this assumption, assuming only unstructured reward functions over a finite Markov decision process (MDP), and show from first principles how power-seeking can often be optimal.
Formalizing the Environment

My theorems apply to finite MDPs; for the unfamiliar, I'll illustrate with Pac-Man.
Full observability: You can see everything that's going on; this information is packaged in the state s. In Pac-Man, the state is the game screen.
Markov transition function: the next state depends only on the choice of action a and the current state s. It doesn't matter how we got into a situation.
Discounted reward: future rewards get geometrically discoun...
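To get a feel for the claim without the paper's formalism, here is a toy numerical sketch (my own construction, assuming numpy, not the paper's actual setup): value iteration on a six-state deterministic MDP where one action at a fork keeps three absorbing states reachable and the other leads straight to an absorbing "shutdown" state. Sampling reward functions at random, avoiding shutdown comes out optimal for a clear majority of them, which is the informal sense in which optimal policies tend to "seek power" by preserving options.

```python
import numpy as np

# A toy illustration (my own construction, not the paper's formal setup) of
# the post's claim: for a large fraction of randomly drawn reward functions,
# the optimal policy steers toward the part of the state space that keeps
# more options reachable, and away from an absorbing "shutdown" state.
rng = np.random.default_rng(0)
gamma = 0.9

# Deterministic transitions: state 0 is a fork, state 1 is a hub from which
# three absorbing states (2, 3, 4) remain reachable, state 5 is absorbing shutdown.
next_state = np.array([
    [1, 5, 1, 5],   # fork: keep options open (go to hub) or accept shutdown
    [2, 3, 4, 1],   # hub: commit to one of three absorbing states, or wait
    [2, 2, 2, 2],   # absorbing
    [3, 3, 3, 3],   # absorbing
    [4, 4, 4, 4],   # absorbing
    [5, 5, 5, 5],   # absorbing shutdown
])

def optimal_values(reward, iters=300):
    """Plain value iteration with state-based rewards."""
    v = np.zeros(6)
    for _ in range(iters):
        v = reward + gamma * v[next_state].max(axis=1)
    return v

trials, kept_options = 2000, 0
for _ in range(trials):
    reward = np.zeros(6)
    reward[2:] = rng.uniform(size=4)                 # random rewards on the absorbing states
    v = optimal_values(reward)
    best_action = int(np.argmax(v[next_state[0]]))   # optimal choice at the fork
    if next_state[0, best_action] == 1:              # it leads to the option-rich hub
        kept_options += 1

print(f"Avoiding shutdown was optimal for {kept_options / trials:.0%} of sampled reward functions")
```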
Artificial Intelligence is transforming every aspect of business and society. The usual narrative focuses on monolithic AIs owned by large corporations and governments that promote the interests of the powerful. But imagine a world in which each person has their own "personal AI," which deeply models their beliefs, desires, and values and promotes those interests. Such agents enable much richer and more frequent "semantic voting," improving feedback for governance. They dramatically change the incentives for advertisers and news sources. When personal agents filter manipulative and malicious content, it incentivizes the creation of content aligned with a person's values. Personal AI agents will dramatically transform economic transactions, social interactions, personal transformation, and the ability to contribute to the greater good. But there are also many challenges, and new ideas are needed. Join this fireside chat to discuss the possibilities and perils of personal AIs and how they relate to the RadicalXChange movement.

SPEAKERS

Steve Omohundro has been a scientist, professor, author, software architect, and entrepreneur and is developing the next generation of artificial intelligence. He has degrees in Physics and Mathematics from Stanford and a Ph.D. in Physics from U.C. Berkeley. He was an award-winning computer science professor at the University of Illinois at Champaign-Urbana and cofounded the Center for Complex Systems Research. He is the Chief Scientist of AIBrain and serves on its Board of Directors. AIBrain is creating new AI technologies for learning, conversation, robotics, simulation, and music and has offices in Menlo Park, Seoul, Berlin, and Shenzhen. It is creating Turingworld, a powerful AI learning social media platform based on AI-optimized learning, AI-powered gamification, and AI-enhanced social interaction. He is also Founder and CEO of Possibility Research, which is working to develop new foundations for Artificial Intelligence based on precise mathematical semantics, and of Self-Aware Systems, which is working to ensure that intelligent technologies have a positive impact. Steve published the book “Geometric Perturbation Theory in Physics”, designed the first data parallel language StarLisp, wrote the 3D graphics for Mathematica, developed fast neural data structures like balltrees, designed the fastest and safest object-oriented language Sather, invented manifold learning, co-created the first neural focus of attention systems, co-designed the best lip reading system, invented model merging for fast one-shot learning, co-designed the best stochastic grammar learning system, co-created the first Bayesian image search engine PicHunter, invented self-improving AI, discovered the Basic AI Drives, and proposed many of the basic AI safety mechanisms including AI smart contracts. Steve is an award-winning teacher and has given hundreds of talks around the world. Some of his talks and scientific papers are available here. He holds the vision that new technologies can help humanity create a more compassionate, peaceful, and life-serving world.

Puja Ohlhaver is a technologist and lawyer who explores the intersection of technology, democracy, and markets. She is an advocate of digital social innovation, as a path to rebooting democracy and testing regulatory innovations. She is an inventor and founder of ClearPath Surgical, a company that seeks to improve health outcomes in minimally invasive surgery.
She holds a law degree from Stanford Law School and was previously an investment management attorney.
Today's guest is the great and brilliant Dr. Steve Omohundro, Chief Scientist at AIBrain. AIBrain is creating Turingworld, a powerful AI learning social media platform based on AI-optimized learning, AI-powered gamification, and AI-enhanced social interaction. Dr. Steve Omohundro received his Ph.D. in Physics from U.C. Berkeley. He also founded an organization to support AI safety and another organization to advance new intelligence architectures based on the programming language Omda, the specification language Omex, and the semantics language Omai. Episode topics include: how humans can build safe AI, what facets of AI development might/might-not require global governance, how the international community might best collaborate to prioritize AGI development efforts, and how AI may influence our lives as consumers.
Steve Omohundro is an AI researcher, and currently Chief Scientist for AIBrain, a company creating new technologies for learning, conversation, robotics, simulation, and music. Previous to this, he was an award-winning computer science professor at the University of Illinois at Champaign-Urbana and cofounded the Center for Complex Systems Research there. He has degrees in Physics and Mathematics from Stanford University and a Ph.D. in Physics from University of California, Berkeley.

This podcast focuses on Steve's research paper: 'Autonomous technology and the greater human good'. In the paper he discusses the ethical implications of artificial intelligence, and how we can ensure that we design machines that are safe and positively benefit humankind.

Find out more about this episode, and our 12-week learning program for researchers at: www.howresearchers.com

Share your thoughts on the episode on Facebook, Twitter, and LinkedIn @howresearchers or use #howresearchers
Prof. (Dr.) Steve Omohundro, President at Possibility Research, based in the United States, participates in Risk Roundup to discuss the Rise of Algorithms in Decision-Making.

The Rise of Algorithms in Decision-Making
In pursuit of automation-driven efficiencies, the rapidly evolving Artificial Intelligence (AI) techniques, such as neural networks, machine-learning systems, predictive analytics, speech recognition, natural language […]
In this episode, Artificial Intelligence scientist and Buddhist teacher Nikki Mirghafori and computer scientist Steve Omohundro discuss how the concept of karma can guide us as we push forward towards creating non-human intelligence.
Ep. 47 - Dr. Steve Omohundro on AI, Robots, and Empathy
Steve’s Homepage
Reinforcement and Inverse Reinforcement Learning
Kate Darling on robots and empathy Video
NPR Interview
Video: Boston Dynamics robots getting knocked down, but getting up again!
Eliza, first chatbot
Replika
Julie Twitter conversation about Replika
Steve Omohundro shares plans for creating provably correct protections against AI superintelligence, and thoughts on how human values can be imbued into AI. Resource allocation, decentralized cooperation, and discussions on how Blockchain Proofs of Work/Stake can possibly be compatible with basic needs.
Steve Omohundro RoboPsych Podcast Show Notes
This episode includes spoilers for AMC's Humans.
AMC Humans website, including free episodes
Steve Omohundro website
Mickey Mouse in “The Sorcerer’s Apprentice”
Jibo
Pepper
Eliza computerized psychotherapist
Julie Carpenter, Ph.D. interview on RoboPsych Podcast
School of Visual Arts, Master’s in Branding program
The Hidden Persuaders, Vance Packard, 1957
Kevin Kelly podcast on Tim Ferriss Show
Population dynamics in 21st Century
Future shock
Psychodynamics of approach-avoidance conflicts
South Korean autonomous turret weapon
Faraday cage
Isaac Asimov’s three laws of robotics
Dr. Steve Omohundro earned degrees in physics and mathematics from Stanford University and a Ph.D. in physics from the University of California, Berkeley. Today, most of his best-known work focuses on artificial intelligence and its safety. In this interview, Dr. Omohundro shares his thoughts about how we might “scaffold” our security measures to match the intelligence of our machines in order to prevent them from getting out of hand.
Google, IBM, Microsoft, Apple, Facebook, Baidu, Foxconn, and others have recently made multi-billion dollar investments in artificial intelligence and robotics. Some of these investments are aimed at increasing productivity and enhancing coordination and cooperation. Others are aimed at creating strategic gains in competitive interactions. This is creating “arms races” in high-frequency trading, cyber warfare, drone warfare, stealth technology, surveillance systems, and missile warfare. Cynthia Brian finds out what ASI is and how it will affect our world with scientist, author, entrepreneur, professor, and expert on artificial super intelligence, Dr. Steve Omohundro. Fat shaming has been a #hashtag topic of conversation with people posting negative comments on social media about people's bodies. Now the new trend is skinny shaming. In Health Matters, Heather Brittany tackles ways to stop the cyber-bullying and harmful judging.
Steve Omohundro is a scientist, professor, author, software architect, and entrepreneur doing research on natural and artificial intelligence. He has degrees in Physics and Mathematics from Stanford and a Ph.D. in Physics from U.C. Berkeley. On this episode of Bulletproof Radio, tune in to discover the secret to Steve’s weight loss breakthrough, why Dave fired his doctor, and how you should define artificial intelligence. You’ll also hear about mycotoxins, transhumanism, and non-violent communication. There are a ton of resources below, be sure to check them out!
Steve Omohundro is a scientist, professor, author, and entrepreneur who has a Ph.D. in physics but has spent decades studying intelligent systems and artificial intelligence. His research into the basic “AI Drives” was featured in James Barrat’s recent book Our Final Invention and has been generating international interest. And so I was very happy to […]