POPULARITY
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Contra "Strong Coherence", published by DragonGod on March 4, 2023 on LessWrong. Polished from my shortform. See also: Is "Strong Coherence" Anti-Natural?

Introduction

Many AI risk failure modes imagine strong coherence/goal directedness (e.g. [expected] utility maximisers). Such strong coherence is not represented in humans (or any other animal), seems unlikely to emerge from deep learning, and may be "anti-natural" to general intelligence in our universe. I suspect the focus on strongly coherent systems was a mistake that set the field back a bit, and it's not yet fully recovered from that error. I think most of the AI safety work for strongly coherent agents (e.g. decision theory) will end up inapplicable/useless for aligning powerful systems, because powerful systems in the real world are "of an importantly different type".

Ontological Error?

I don't think it nails everything, but on a purely ontological level, @Quintin Pope and @TurnTrout's shard theory feels a lot more right to me than e.g. HRAD. HRAD is based on an ontology that seems to me to be mistaken/flawed in important respects. The shard theory account of value formation (while lacking) seems much more plausible as an account of how intelligent systems develop values (where values are "contextual influences on decision making") than the immutable terminal goals of strong coherence ontologies. I currently believe that (immutable) terminal goals are just a wrong frame for reasoning about generally intelligent systems in our world (e.g. humans, animals and future powerful AI systems).

Theoretical Justification and Empirical Investigation Needed

I'd be interested in more investigation into what environments/objective functions select for coherence, and to what degree said selection occurs. And empirical demonstrations of systems that actually become more coherent as they are trained for longer/"scaled up" or otherwise amplified. I want advocates of strong coherence to explain why agents operating in rich environments (e.g. animals, humans) or sophisticated ML systems (e.g. foundation models) aren't strongly coherent. And mechanistic interpretability analysis of sophisticated RL agents (e.g. AlphaStar, OpenAI Five [or replications thereof]) to investigate their degree of coherence.

Conclusions

Currently, I think strong coherence is unlikely (plausibly "anti-natural") and am unenthusiastic about research agendas and threat models predicated on strong coherence.

Disclaimer

The above is all low-confidence speculation, and I may well be speaking out of my ass.

By "strong coherence/goal directedness" I mean something like:

Informally: a system has immutable terminal goals.

Semi-formally: a system's decision making is well described as (an approximation of) argmax over actions (or higher-level mappings thereof) to maximise the expected value of a single fixed utility function over states.

You cannot well predict the behaviour/revealed preferences of humans or other animals by the assumption that they have immutable terminal goals or are expected utility maximisers. The ontology that intelligent systems in the real world instead have "values" (contextual influences on decision making) seems to explain their observed behaviour (and purported "incoherencies") better. Many observed values in humans and other mammals (e.g. fear, play/boredom, friendship/altruism, love, etc.) seem to be values that were instrumental for increasing inclusive genetic fitness (promoting survival, exploration, cooperation and sexual reproduction/survival of progeny respectively). Yet, humans and mammals seem to value these terminally and not because of their instrumental value for inclusive genetic fitness. That the instrumentally convergent goals of evolution's fitness criterion manifested as "terminal" values in mammals is IMO strong empiric...
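To make the semi-formal definition above concrete, here is a minimal sketch (the names and structure are my own illustration, not from the post) contrasting a strongly coherent agent, which takes an argmax over actions against a single fixed utility function, with a shard-theory-style agent whose "values" are contextual influences on decision making:

```python
def coherent_choice(actions, outcome_model, utility):
    """Strong coherence: (approximate) argmax over actions to maximise the
    expected value of one fixed utility function over states."""
    def expected_utility(action):
        # outcome_model(action) -> list of (probability, resulting_state) pairs
        return sum(p * utility(state) for p, state in outcome_model(action))
    return max(actions, key=expected_utility)

def shard_choice(actions, context, shards):
    """Shard-theory-style values: each shard is an (activation, preference)
    pair of functions; its influence on the decision depends on the context,
    so there is no single fixed utility function being maximised."""
    def contextual_score(action):
        return sum(activation(context) * preference(action, context)
                   for activation, preference in shards)
    return max(actions, key=contextual_score)
```

The only point of the contrast is that the second agent's revealed preferences shift with context, which is the sense in which the post claims humans and other animals are not "strongly coherent".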
A brief segue into Artificial Intelligence and possible applications in conjunction with IAP. Our guest Jousef Murad discusses with us his thoughts about AI and its applications in the general engineering industry.

Topics Covered:
(01:14) - Introducing Jousef Murad
(04:01) - Intro to Artificial Intelligence, Machine Learning vs. Human Learning
(08:35) - AI in the movies
(12:09) - ChatGPT and how AI can benefit and harm society
(14:33) - AI in 2001: A Space Odyssey, Apollo 13, and ethical decision making
(17:47) - The book Life 3.0, machines tricking and manipulating humans
(19:17) - Isaac Asimov's Three Laws of Robotics, and how to design machines to "learn" more
(25:18) - Stable Diffusion, text-to-image AI
(27:28) - AI in games and other situations, and how AI can "bluff"
(31:28) - Cybernetics and transhumanism
(34:05) - Envisioning practical use of AI with IAP

Resources Mentioned:
Jousef's website: https://www.jousefmurad.com/
AlphaGo Documentary: https://www.youtube.com/watch?v=WXuK6gekU1Y
Stable Diffusion: https://stablediffusionweb.com/
OpenAI Five: https://openai.com/blog/openai-five/
ChatGPT: https://openai.com/blog/chatgpt/

Podcast Production Services by EveryWord Media
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Fun with +12 OOMs of Compute, published by Daniel Kokotajlo on LessWrong. Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

Or: Big Timelines Crux Operationalized

What fun things could one build with +12 orders of magnitude of compute? By ‘fun' I mean ‘powerful.' This hypothetical is highly relevant to AI timelines, for reasons I'll explain later.

Summary (Spoilers):

I describe a hypothetical scenario that concretizes the question “what could be built with 2020's algorithms/ideas/etc. but a trillion times more compute?” Then I give some answers to that question. Then I ask: How likely is it that some sort of TAI would happen in this scenario? This second question is a useful operationalization of the (IMO) most important, most-commonly-discussed timelines crux: “Can we get TAI just by throwing more compute at the problem?” I consider this operationalization to be the main contribution of this post; it directly plugs into Ajeya's timelines model and is quantitatively more cruxy than anything else I know of. The secondary contribution of this post is my set of answers to the first question: they serve as intuition pumps for my answer to the second, which strongly supports my views on timelines.

The hypothetical

In 2016 the Compute Fairy visits Earth and bestows a blessing: Computers are magically 12 orders of magnitude faster! Over the next five years, what happens? The Deep Learning AI Boom still happens, only much crazier: Instead of making AlphaStar for 10^23 floating point operations, DeepMind makes something for 10^35. Instead of making GPT-3 for 10^23 FLOPs, OpenAI makes something for 10^35. Instead of industry and academia making a cornucopia of things for 10^20 FLOPs or so, they make a cornucopia of things for 10^32 FLOPs or so. When random grad students and hackers spin up neural nets on their laptops, they have a trillion times more compute to work with. [EDIT: Also assume magic +12 OOMs of memory, bandwidth, etc. All the ingredients of compute.]

For context on how big a deal +12 OOMs is, consider the graph below, from ARK. It's measuring petaflop-days, which are about 10^20 FLOP each. So 10^35 FLOP is 1e+15 on this graph. GPT-3 and AlphaStar are not on this graph, but if they were they would be in the very top-right corner.

Question One: In this hypothetical, what sorts of things could AI projects build? I encourage you to stop reading, set a five-minute timer, and think about fun things that could be built in this scenario. I'd love it if you wrote up your answers in the comments!

My tentative answers:

Below are my answers, listed in rough order of how ‘fun' they seem to me. I'm not an AI scientist so I expect my answers to overestimate what could be done in some ways, and underestimate in other ways. Imagine that each entry is the best version of itself, since it is built by experts (who have experience with smaller-scale versions) rather than by me.

OmegaStar: In our timeline, it cost about 10^23 FLOP to train AlphaStar. (OpenAI Five, which is in some ways more impressive, took less!) Let's make OmegaStar like AlphaStar only +7 OOMs bigger: the size of a human brain.[1] [EDIT: You may be surprised to learn, as I was, that AlphaStar has about 10% as many parameters as a honeybee has synapses! Playing against it is like playing against a tiny game-playing insect.]
Larger models seem to take less data to reach the same level of performance, so it would probably take at most 10^30 FLOP to reach the same level of Starcraft performance as AlphaStar, and indeed we should expect it to be qualitatively better.[2] So let's do that, but also train it on lots of other games too.[3] There are 30,000 games in the Steam Library. We train OmegaStar long enough that it has as much time on each game as AlphaStar had on Starcraft. With a brain so big, maybe it'll start to do some transfer learning, acquiring g...
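As a quick sanity check on the compute figures quoted above, here is a back-of-the-envelope calculation (my own arithmetic sketch, not part of the original post):

```python
# Back-of-the-envelope check of the compute numbers quoted above.
PFLOPS = 1e15                              # one petaflop = 1e15 FLOP per second
SECONDS_PER_DAY = 86_400
PETAFLOP_DAY = PFLOPS * SECONDS_PER_DAY    # ~8.64e19, i.e. roughly 10^20 FLOP

alphastar_flop = 1e23                      # quoted training compute for AlphaStar (and GPT-3)
hypothetical_flop = 1e35                   # the same project with +12 OOMs of compute

print(hypothetical_flop / alphastar_flop)  # 1e12: a trillion times more compute
print(hypothetical_flop / PETAFLOP_DAY)    # ~1.2e15 petaflop-days, i.e. ~1e+15 on the graph
```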
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is an issue of the Alignment Newsletter, published on the AI Alignment Forum.

Highlights

OpenAI Five (Many people at OpenAI): OpenAI has trained a team of five neural networks to play a particular set of Dota heroes in a mirror match (playing against the same set of heroes) with a few restrictions, and have started to beat amateur human players. They are aiming to beat a team of top professionals at The International in August, with the same set of five heroes, but without any other restrictions.

Salient points:
- The method is remarkably simple -- it's a scaled-up version of PPO with training data coming from self-play, with reward shaping and some heuristics for exploration, where each agent is implemented by an LSTM. There's no human data apart from the reward shaping and exploration heuristics.
- Contrary to most expectations, they didn't need anything fundamentally new in order to get long-term strategic planning. I was particularly surprised by this. Some interesting thoughts from OpenAI researchers in this thread -- in particular, assuming good exploration, the variance of the gradient should scale linearly with the duration, and so you might expect you only need linearly more samples to counteract this.
- They used 256 dedicated GPUs and 128,000 preemptible CPUs. A Hacker News comment estimates the cost at $2500 per hour, which would put the likely total cost in the millions of dollars.
- They simulate 900 years of Dota every day, which is a ratio of ~330,000:1, suggesting that each CPU is running Dota ~2.6x faster than real time. In reality, it's probably running many times faster than that, but preemptions, communication costs, synchronization etc. all lead to inefficiency.
- There was no explicit communication mechanism between agents, but they all get to observe the full Dota 2 state (not pixels) that any of the agents could observe, so communication is not really necessary.
- A version of the code with a serious bug was still able to train to beat humans. Not encouraging for safety.

Alex Irpan covers some of these points in more depth in Quick Opinions on OpenAI Five. Gwern comments as well.

My opinion: I might be more excited by an approach that was able to learn from human games (which are plentiful), and perhaps finetune with RL, in order to develop an approach that could generalize to more tasks in the future, where human data is available but a simulator is not. (Given the ridiculous sample complexity, pure RL with PPO can only be used in tasks with a simulator.) On the other hand, an approach that leveraged human data would necessarily be at least somewhat specific to Dota. A dependence on human data is unlikely to get us to general intelligence, whereas this result suggests that we can solve tasks that have a simulator, exploration strategy, and a dense reward function, which really is pushing the boundary on generality. This seems to be gdb's take: "We are very encouraged by the algorithmic implication of this result — in fact, it mirrors closely the story of deep learning (existing algorithms at large scale solve otherwise unsolvable problems). If you have a very hard problem for which you have a simulator, our results imply there is a real, practical path towards solving it. This still needs to be proven out in real-world domains, but it will be very interesting to see the full ramifications of this finding."
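A quick back-of-the-envelope check of the simulation-throughput figures quoted above (my own arithmetic, not from the newsletter):

```python
# Rough check of the OpenAI Five throughput numbers quoted above.
years_of_dota_per_day = 900
cpus = 128_000

days_simulated_per_real_day = years_of_dota_per_day * 365   # 328,500
ratio = days_simulated_per_real_day                         # ~330,000:1 overall
per_cpu_speedup = ratio / cpus                               # ~2.6x real time per CPU

print(f"{ratio:,}:1 overall, ~{per_cpu_speedup:.1f}x real time per CPU")
```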
Paul's research agenda FAQ (zhukeepa): Exactly what it sounds like. I'm not going to summarize it because it's long and covers a lot of stuff, but I do recommend it. Technical AI alignment Technical agendas and prioritization Conceptual issues in AI safety: the paradigmatic gap (Jon Gauthier): Lots of current work on AI safety focuses on what we can call "mid-term safety" -- the safety of AI systems that are more powerful and more broadly deployed than the ones we have t...
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Eight claims about multi-agent AGI safety, published by Richard Ngo on the AI Alignment Forum.

There are quite a few arguments about how interactions between multiple AGIs affect risks from AGI development. I've identified at least eight distinct but closely-related claims which it seems worthwhile to disambiguate. I've split them up into four claims about the process of training AGIs, and four claims about the process of deploying AGIs; after listing them, I go on to explain each in more detail. Note that while I believe that all of these ideas are interesting enough to warrant further investigation, I don't currently believe that all of them are true as stated. In particular, I think that so far there's been little compelling explanation of why interactions between many aligned AIs might have catastrophic effects on the world (as is discussed in point 7).

Claims about training
1. Multi-agent training is one of the most likely ways we might build AGI.
2. Multi-agent training is one of the most dangerous ways we might build AGI.
3. Multi-agent training is a regime in which standard safety techniques won't work.
4. Multi-agent training allows us to implement important new safety techniques.

Claims about deployment
5. We should expect the first AGIs to be deployed in a world which already contains many nearly-as-good AIs.
6. We should expect AGIs to be deployed as multi-agent collectives.
7. Lack of coordination between multiple deployed AGIs is a major source of existential risk.
8. Conflict between multiple deployed AGIs risks causing large-scale suffering.

Details and arguments

1. Multi-agent training is one of the most likely ways we might build AGI. The core argument for this thesis is that multi-agent interaction was a key feature of the evolution of human intelligence, by promoting both competition and cooperation. Competition between humans provides a series of challenges which are always at roughly the right level of difficulty; Leibo et al. (2019) call this an autocurriculum. Autocurricula were crucial for training sophisticated reinforcement learning agents like AlphaGo and OpenAI Five; it seems plausible that they will also play an important role in training AGIs. Meanwhile, the usefulness of cooperation led to the development of language, which plays a core role in human cognition; and the benefits of cooperatively sharing ideas allowed the accumulation of human cultural skills and knowledge more generally.

2. Multi-agent training is one of the most dangerous ways we might build AGI. Humans have skills and motivations (such as deception, manipulation and power-hungriness) which would be dangerous in AGIs. It seems plausible that the development of many of these traits was driven by competition with other humans, and that AGIs trained to answer questions or do other limited-scope tasks would be safer and less goal-directed. I briefly make this argument here.

3. Multi-agent training is a regime in which standard safety techniques won't work. Most approaches to safety rely on constructing safe reward functions. But Ecoffet et al. (2020) argue that “open-ended” environments give rise to incentives which depend on reward functions in complex and hard-to-predict ways. Open-endedness is closely related to self-play (which was used to train AlphaGo) and multi-agent environments more generally.
When a task involves multiple agents, those agents might learn many skills that are not directly related to the task itself, but instead related to competing or cooperating with each other. E.g. compare a language model like GPT-3, which was directly trained to output language, to the evolution of language in humans - where evolution only selected us for increased genetic fitness, but we developed language skills because they were (indirectly) helpful for that. Furthermore,...
You may find games less fun to play after this episode, because you'll learn how opponents' logic works under the hood and, most importantly, that the game lets you win! We talked about game AI with Alexander Balakshin, a developer on Rainbow 6 Siege and an instructor at the XYZ game-dev school. We also dug into the technical details of gameplay development, because it differs a lot from enterprise development due to the large volumes of data and resources. And this time we really put effort into the links, so be sure to check them out!

Support the best podcast about IT: www.patreon.com/podlodka
We also look forward to your likes, reposts and comments in messengers and on social networks!
Telegram chat: https://t.me/podlodka
Telegram channel: https://t.me/podlodkanews
Facebook page: www.facebook.com/podlodkacast/
Twitter account: https://twitter.com/PodlodkaPodcast

Useful links:
- Game logic programming course: https://www.school-xyz.com/gamecode
- Introduction to AI with Blueprints on the Unreal learning portal: https://learn.unrealengine.coma/course/3318392
- Overwatch gameplay architecture and code: https://www.youtube.com/watch?v=W3aieHjyNvw
- AI in F.E.A.R.: https://www.rockpapershotgun.com/2017/04/03/why-fears-ai-is-still-the-best-in-firstperson-shooters/
- Mark Cerny: https://www.youtube.com/watch?v=QOAW9ioWAvE
- Using an AI bot for testing: https://www.polygon.com/2020/10/16/21519815/baldurs-gate-3-testing-ai-super-gamerlarian-studios-development-pc-steam
- OpenAI plays hide and seek: https://www.youtube.com/watch?v=Lu56xVlZ40M
- OpenAI's team for Dota 2: https://en.m.wikipedia.org/wiki/OpenAI_Five
- Using machine learning for static code analysis: https://techcrunch.com/2019/02/12/ubisoft-and-mozilla-team-up-to-develop-clevercommit-an-ai-coding-assistant/
- The story of the Aliens: Colonial Marines "bugfix": https://www.polygon.com/2018/7/15/17574248/aliens-colonial-marines-fixing-code-typoai-xenomorphs
Recorded by Robert Miles.
OpenAI’s Dactyl is an AI system that can manipulate objects with a human-like robot hand. OpenAI Five is an AI system that can defeat humans at the video game Dota 2. The strange thing is they were both developed using the same general-purpose reinforcement learning algorithm. How is this possible and what does it show?

In today's interview Jack Clark, Policy Director at OpenAI, explains that from a computational perspective using a hand and playing Dota 2 are remarkably similar problems. A robot hand needs to hold an object, move its fingers, and rotate it to the desired position. In Dota 2 you control a team of several different people, moving them around a map to attack an enemy. Your hand has 20 or 30 different joints to move. The number of main actions in Dota 2 is 10 to 20, as you move your characters around a map. When you’re rotating an object in your hand, you sense its friction, but you don’t directly perceive the entire shape of the object. In Dota 2, you're unable to see the entire map and perceive what's there by moving around – metaphorically 'touching' the space.

Read our new in-depth article on becoming an AI policy specialist: The case for building expertise to work on US AI policy, and how to do it

Links to learn more, summary and full transcript

This is true of many apparently distinct problems in life. Compressing different sensory inputs down to a fundamental computational problem which we know how to solve only requires the right general-purpose software. The creation of such increasingly 'broad-spectrum' learning algorithms has been a key story of the last few years, and this development is likely to have unpredictable consequences, heightening the huge challenges that already exist in AI policy.

Today’s interview is a mega-AI-policy-quad episode; Jack is joined by his colleagues Amanda Askell and Miles Brundage, on the day they released their fascinating and controversial large general language model GPT-2.

We discuss:
• What are the most significant changes in the AI policy world over the last year or two?
• What capabilities are likely to develop over the next five, 10, 15, 20 years?
• How much should we focus on the next couple of years, versus the next couple of decades?
• How should we approach possible malicious uses of AI?
• What are some of the potential ways OpenAI could make things worse, and how can they be avoided?
• Publication norms for AI research
• Where do we stand in terms of arms races between countries or different AI labs?
• The case for creating newsletters
• Should the AI community have a closer relationship to the military?
• Working at OpenAI vs. working in the US government
• How valuable is Twitter in the AI policy world?

Rob is then joined by two of his colleagues – Niel Bowerman & Michelle Hutchinson – to quickly discuss:
• The reaction to OpenAI's release of GPT-2
• Jack’s critique of our US AI policy article
• How valuable are roles in government?
• Where do you start if you want to write content for a specific audience?

Get this episode by subscribing to our podcast on the world’s most pressing problems and how to solve them: type 80,000 Hours into your podcasting app. Or read the transcript below.

The 80,000 Hours Podcast is produced by Keiran Harris.
Where I continue my discussion on Artificial Intelligence and talk about OpenAI Five. OpenAI Five is the Dota 2-dominating force that will change video games as we know them. The blog post I reference on the OpenAI website is https://blog.openai.com/openai-five/ --- Support this podcast: https://anchor.fm/tebbstalks/support
This Week in Machine Learning & Artificial Intelligence (AI) Podcast
Today we’re joined by Christy Dennison, Machine Learning Engineer at OpenAI. Since joining OpenAI earlier this year, Christy has been working on OpenAI’s efforts to build an AI-powered agent to play the DOTA 2 video game. Our conversation begins with an overview of DOTA 2 gameplay and the recent OpenAI Five benchmark which put the OpenAI agent up against a team of professional human players. We then dig into the underlying technology used to create OpenAI Five, including their use of deep reinforcement learning and LSTM recurrent neural networks, and their liberal use of entity embeddings, plus some of the tricks and techniques they use to train the model on 256 GPUs and 128,000 CPU cores. The complete show notes for this episode can be found at twimlai.com/talk/176.
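The episode above describes the agent's policy in terms of deep reinforcement learning with an LSTM recurrent core and entity embeddings. As a rough illustration of what that combination looks like in code, here is a schematic PyTorch sketch; the class name, layer sizes, and the crude pooling step are invented for illustration and are not OpenAI Five's actual architecture or training setup:

```python
import torch
import torch.nn as nn

class EntityEmbeddingPolicy(nn.Module):
    """Schematic policy: embed game entities, pool them, feed an LSTM, output action logits.
    Sizes and structure are illustrative only, not OpenAI Five's real architecture."""
    def __init__(self, num_entity_types=100, embed_dim=64, hidden_dim=256, num_actions=20):
        super().__init__()
        self.entity_embed = nn.Embedding(num_entity_types, embed_dim)  # entity embeddings
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)   # recurrent core
        self.action_head = nn.Linear(hidden_dim, num_actions)          # policy logits

    def forward(self, entity_ids, hidden=None):
        # entity_ids: (batch, num_entities) integer IDs of the visible game entities
        embedded = self.entity_embed(entity_ids)         # (batch, entities, embed_dim)
        pooled = embedded.mean(dim=1, keepdim=True)      # crude pooling into one timestep
        out, hidden = self.lstm(pooled, hidden)          # carry state across game ticks
        return self.action_head(out[:, -1]), hidden

# One forward pass over a dummy observation of 32 entities.
policy = EntityEmbeddingPolicy()
logits, state = policy(torch.randint(0, 100, (1, 32)))
```

In a real training run, a policy of this general shape would be optimized with an RL algorithm such as PPO over many parallel game simulations, which is where hardware on the scale of 256 GPUs and 128,000 CPU cores comes in.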
Google DeepMind and OpenAI recently released some new results that give a tantalizing glimpse of what AI can become in the future. We discuss DeepMind's results on capture the flag and human-bot cooperation, as well as OpenAI's recent achievements (and upcoming challenge) in Dota 2 with the OpenAI Five.

Links: DeepMind Capture the Flag; OpenAI Five

Episodes Mentioned:
Episode 23: AI Competes in DOTA2 Tournament
Episode 51: Adapting to AI in Industry with Jeff Kavanaugh

Follow us and leave us a rating! iTunes | Homepage | Twitter @artlyintelly | Facebook | artificiallyintelligent1@gmail.com
In breaking news, Andy and Dave discuss a potentially groundbreaking paper on the scalable training of artificial neural nets with adaptive sparse connectivity; MIT researchers unveil the Navion chip, which is only 20 square millimeters in size, consumes 24 milliwatts of power, can process real-time camera images at up to 171 frames per second, and can be integrated into drones the size of a fingernail; the Chair of the Armed Services Subcommittee on Emerging Threats and Capabilities convened a roundtable on AI with subject matter experts and industry leaders; the IEEE Standards Association and MIT Media Lab launched the Council on Extended Intelligence (CXI) to build a “new narrative” on autonomous technologies, including three pilot programs, one of which seeks to help individuals “reclaim their digital identity;” and the Foundation for Responsible Robotics, which wants to shape the responsible design and use of robotics, releases a report on Drones in the Service of Society.

Then, Andy and Dave discuss IBM’s Project Debater, the follow-on to Watson that engaged in a live, public debate with humans on 18 June. IBM spent six years developing Project Debater's capabilities, producing over 30 technical papers and benchmark datasets, and Debater can now debate nearly 100 topics. It uses three pioneering capabilities: data-driven speech writing and delivery, listening comprehension, and the ability to model human dilemmas.

Next up, OpenAI announces OpenAI Five, a team of five AI algorithms trained to take on a human team in the multiplayer online battle arena game Dota 2; Andy and Dave discuss the reasons for the impressive achievement, including that the five AI networks do not communicate with each other, and that coordination and collaboration naturally emerge from their incentive structures. The system uses 256 Nvidia graphics cards and 128,000 processor cores; it has taken on (and beaten) a variety of human teams, but OpenAI plans to stream a match against a top Dota 2 team in late July.