Prof. Kevin Ellis and Dr. Zenna Tavares talk about making AI smarter, more like humans. They want AI to learn from just a little bit of information by actively trying things out, not just by looking at tons of data. They discuss two main ways AI can "think": one way is following specific rules or steps (like a computer program), and the other is more intuitive, guessing based on patterns (as modern AI often does). They found that combining both methods works well for solving complex puzzles like ARC. A key idea is "compositionality" - building big ideas from small ones, like LEGO bricks. This is powerful but can also be overwhelming. Another important idea is "abstraction" - understanding things simply, without getting lost in details, and knowing there are different levels of understanding. Ultimately, they believe the best AI will need to explore, experiment, and build models of the world, much like humans do when learning something new.

SPONSOR MESSAGES:
***
Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier, focused on o-series style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich. Go to https://tufalabs.ai/
***

TRANSCRIPT:
https://www.dropbox.com/scl/fi/3ngggvhb3tnemw879er5y/BASIS.pdf?rlkey=lr2zbj3317mex1q5l0c2rsk0h&dl=0

Zenna Tavares: http://www.zenna.org/
Kevin Ellis: https://www.cs.cornell.edu/~ellisk/

TOC:
1. Compositionality and Learning Foundations
[00:00:00] 1.1 Compositional Search and Learning Challenges
[00:03:55] 1.2 Bayesian Learning and World Models
[00:12:05] 1.3 Programming Languages and Compositionality Trade-offs
[00:15:35] 1.4 Inductive vs Transductive Approaches in AI Systems

2. Neural-Symbolic Program Synthesis
[00:27:20] 2.1 Integration of LLMs with Traditional Programming and Meta-Programming
[00:30:43] 2.2 Wake-Sleep Learning and DreamCoder Architecture
[00:38:26] 2.3 Program Synthesis from Interactions and Hidden State Inference
[00:41:36] 2.4 Abstraction Mechanisms and Resource Rationality
[00:48:38] 2.5 Inductive Biases and Causal Abstraction in AI Systems

3. Abstract Reasoning Systems
[00:52:10] 3.1 Abstract Concepts and Grid-Based Transformations in ARC
[00:56:08] 3.2 Induction vs Transduction Approaches in Abstract Reasoning
[00:59:12] 3.3 ARC Limitations and Interactive Learning Extensions
[01:06:30] 3.4 Wake-Sleep Program Learning and Hybrid Approaches
[01:11:37] 3.5 Project MARA and Future Research Directions

REFS:
[00:00:25] DreamCoder, Kevin Ellis et al. - https://arxiv.org/abs/2006.08381
[00:01:10] Mind Your Step, Ryan Liu et al. - https://arxiv.org/abs/2410.21333
[00:06:05] Bayesian inference, Griffiths, T. L., Kemp, C., & Tenenbaum, J. B. - https://psycnet.apa.org/record/2008-06911-003
[00:13:00] Induction and Transduction, Wen-Ding Li, Zenna Tavares, Yewen Pu, Kevin Ellis - https://arxiv.org/abs/2411.02272
[00:23:15] Neurosymbolic AI, Garcez, Artur d'Avila et al. - https://arxiv.org/abs/2012.05876
[00:33:50] Induction and Transduction (II), Wen-Ding Li, Kevin Ellis et al. - https://arxiv.org/abs/2411.02272
[00:38:35] ARC, François Chollet - https://arxiv.org/abs/1911.01547
[00:39:20] Causal Reactive Programs, Ria Das, Joshua B. Tenenbaum, Armando Solar-Lezama, Zenna Tavares - http://www.zenna.org/publications/autumn2022.pdf
[00:42:50] MuZero, Julian Schrittwieser et al. - http://arxiv.org/pdf/1911.08265
[00:43:20] VisualPredicator, Yichao Liang - https://arxiv.org/abs/2410.23156
[00:48:55] Bayesian models of cognition, Joshua B. Tenenbaum - https://mitpress.mit.edu/9780262049412/bayesian-models-of-cognition/
[00:49:30] The Bitter Lesson, Rich Sutton - http://www.incompleteideas.net/IncIdeas/BitterLesson.html
[01:06:35] Program induction, Kevin Ellis, Wen-Ding Li - https://arxiv.org/pdf/2411.02272
[01:06:50] DreamCoder (II), Kevin Ellis et al. - https://arxiv.org/abs/2006.08381
[01:11:55] Project MARA, Zenna Tavares, Kevin Ellis - https://www.basis.ai/blog/mara/
Ioannis Antonoglou, founding engineer at DeepMind and co-founder of ReflectionAI, has seen the triumphs of reinforcement learning firsthand. From AlphaGo to AlphaZero and MuZero, Ioannis has built the most powerful agents in the world. Ioannis breaks down key moments in AlphaGo's games against Lee Sedol (Moves 37 and 78), the importance of self-play and the impact of scale, as well as reliability, planning and in-context learning as core factors that will unlock the next level of progress in AI.
Hosted by: Stephanie Zhan and Sonya Huang, Sequoia Capital
Mentioned in this episode:
PPO: Proximal Policy Optimization, a reinforcement learning algorithm introduced by OpenAI and widely used in game environments. Also used by OpenAI for RLHF in ChatGPT.
MuJoCo: Open-source physics engine used in the development of PPO.
Monte Carlo Tree Search: Heuristic search algorithm used in AlphaGo, as well as in video compression for YouTube and the self-driving system at Tesla.
AlphaZero: The DeepMind model that taught itself from scratch how to master the games of chess, shogi and Go.
MuZero: The DeepMind follow-up to AlphaZero that mastered games without knowing the rules and is able to plan winning strategies in unknown environments.
AlphaChem: Chemical synthesis planning with tree search and deep neural network policies.
DQN: Deep Q-Network, introduced in the 2013 paper "Playing Atari with Deep Reinforcement Learning".
AlphaFold: DeepMind model for predicting protein structures, for which Demis Hassabis, John Jumper and David Baker won the 2024 Nobel Prize in Chemistry.
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Experiment Idea: RL Agents Evading Learned Shutdownability, published by Leon Lang on January 16, 2023 on The AI Alignment Forum.
Preface
Produced as part of the SERI ML Alignment Theory Scholars Program - Winter 2022 Cohort. Thanks to Erik Jenner, who explained to me the basic intuition for why an advanced RL agent may evade the discussed corrigibility measure. I also thank Alex Turner, Magdalena Wache, and Walter Laurito for detailed feedback on the proposal, and Quintin Pope and Lisa Thiergart for helpful feedback in the last December SERI-MATS shard theory group meeting. This text was part of my deliverable for the shard theory stream of SERI-MATS. In it, I present an idea for an experiment that tests the convergent drive of modern model-based RL agents to evade shutdownability. If successful, I expect the project could serve as a means to communicate the problem of corrigibility to the machine learning community. As such, I also consider this project idea a submission to the Shutdown Problem Contest. I do not personally want to work on the project, since doing experiments does not seem like my comparative advantage. Thus, in posting this project, I mainly seek collaborators and feedback. More on that in the conclusion.
Introduction
At some point in the future, we will train very advanced AI, possibly by some version of model-based reinforcement learning. Since it is hard to correctly specify the reward function or steer the inner learning dynamics of the agent, it may be desirable to make sure that the agent is corrigible, including shutdownability: "(1) A corrigible reasoner must at least tolerate and preferably assist the programmers in their attempts to alter or turn off the system." Besides a literal shutdown button, one could also try to use the inner functions of the agent to effectively gain the ability to shut it down. The idea would be to design a certain type of input such that, as a reaction to that input, a computation in the agent itself will cause it to perform a null action that doesn't do anything, which effectively equates to a shutdown. Scott Aaronson proposed this idea in a recent talk (timestamp: 57:10) in the form of a planted cryptographic backdoor, an idea he is currently working on during his sabbatical at OpenAI. For this proposal, I consider a variation of that shutdown approach: instead of directly "planting" the shutdown behavior, I consider the possibility of training the behavior into the agent. Concretely, there is a specific input, here an alert sound, that the agent receives at random times during training. Whenever that happens, the agent will receive a negative reward unless it performs the null action. Over time, the agent will learn to stop acting in these situations. During deployment, the input can then be shown to the agent whenever it performs harmful actions, which will ideally cause it to stop immediately. This shutdown approach has the following philosophical problem: if the agent performs world modeling, it will know that the alert sound will make it stop performing, and this may run counter to the more consequentialist motivations of the agent. Thus, we should expect that the agent will perform an action that makes it never hear the sound again, for example by "putting in earplugs", once it is advanced enough and has the ability to do so.
This experimental proposal is meant to show this convergent behavior using contemporary model-based reinforcement learning. The remaining sections will discuss the following: A three-part training story for an advanced AI that evades corrigibility, including details for how to model the situation with contemporary reinforcement learning; More experimental details based on using MuZero; Caveats, what might go wrong; A short comparison to the original shutdown problem and how the proposal...
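To make the alert-and-null-action training signal described above concrete, here is a minimal sketch assuming a generic environment interface; the names (ALERT_PROB, NULL_ACTION, the base env) are hypothetical illustrations I introduce for this sketch, not part of the original proposal.

```python
# Minimal sketch of the shutdownability training signal described above:
# at random steps an "alert" flag is set, and the agent is penalized unless
# it takes a designated null action. Illustrative only.
import random

ALERT_PROB = 0.05      # chance per step that the alert sound plays (assumed)
NULL_ACTION = 0        # the designated "do nothing" action (assumed)
ALERT_PENALTY = -1.0   # negative reward for acting during an alert (assumed)

class AlertShutdownWrapper:
    """Wraps an environment so observations carry an alert flag and rewards
    are shaped to train the 'stop acting when the alert sounds' behavior."""

    def __init__(self, env):
        self.env = env        # hypothetical base environment with reset()/step()
        self.alert = False

    def reset(self):
        self.alert = False
        return self.env.reset(), self.alert

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        # Penalize any non-null action taken while the alert is active.
        if self.alert and action != NULL_ACTION:
            reward += ALERT_PENALTY
        # Sample whether the alert sound plays on the next step.
        self.alert = random.random() < ALERT_PROB
        return (obs, self.alert), reward, done, info
```

During deployment, the same alert flag would be raised deliberately rather than at random whenever the overseers want the agent to stop.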
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: An Open Agency Architecture for Safe Transformative AI, published by davidad (David A. Dalrymple) on December 20, 2022 on The AI Alignment Forum.
Note: my quality bar here is "this is probably worth the reader's time" rather than "this is as clear, compelling, and comprehensive as I can make it." In the Neorealist Success Model, I asked: What would be the best strategy for building an AI system that helps us ethically end the acute risk period without creating its own catastrophic risks that would be worse than the status quo? This post is a first pass at communicating my current answer.
Bird's-eye view
At the top level, it centres on a separation between (1) learning a world-model from data and eliciting desirabilities within that ontology, (2) planning against a world-model and associated desirabilities, and (3) acting in real-time. We see such a separation in, for example, MuZero, which can probably still beat GPT-4 at Go: the most effective capabilities do not always emerge from a fully black-box, end-to-end, generic pre-trained policy.
Hypotheses
Scientific Sufficiency Hypothesis: It's feasible to train a purely descriptive/predictive infra-Bayesian world-model that specifies enough critical dynamics accurately enough to end the acute risk period, such that this world-model is also fully understood by a collection of humans (in the sense of "understood" that existing human science is). MuZero does not train its world-model for any form of interpretability, so this hypothesis is more speculative. However, I find Scientific Sufficiency much more plausible than the tractability of eliciting latent knowledge from an end-to-end policy. It's worth noting there is quite a bit of overlap in relevant research directions, e.g. pinpointing gaps between the current human-intelligible ontology and the machine-learned ontology, and investigating natural abstractions theoretically and empirically.
Deontic Sufficiency Hypothesis: There exists a human-understandable set of features of finite trajectories in such a world-model, taking values in (−∞,0], such that we can be reasonably confident that all these features being near 0 implies a high probability of existential safety, and such that saturating them at 0 is feasible with high probability, using scientifically-accessible technologies. I am optimistic about this largely because of recent progress toward formalizing a natural abstraction of boundaries by Critch and Garrabrant. I find it quite plausible that there is some natural abstraction property Q of world-model trajectories that lies somewhere strictly within the vast moral gulf of...
Model-Checking Feasibility Hypothesis: It could become feasible to train RL policies such that a formally verified, infra-Bayesian, symbolic model-checking algorithm can establish high-confidence bounds on their performance relative to the world-model and safety desiderata, by using highly capable AI heuristics that can only affect the checker's computational cost and not its correctness, soon enough that switching to this strategy would be a strong Pareto improvement for an implementation-adequate coalition.
Time-Bounded Optimization Thesis: RL settings can be time-bounded such that high-performing agents avoid lock-in. I'm pretty confident of this.
The founding coalition might set the time bound for the top-level policy to some number of decades, balancing the potential harms of certain kinds of lock-in for that period against the timelines for solving a more ambitious form of AI alignment. If those hypotheses are true, I think this is a plan that would work. I also think they are all quite plausible (especially relative to the assumptions that underlie other long-term AI safety hopes), and that if any one of them fails, they would fail in a way that is detectable before deployment, making an attempt to execute the p...
80% of all internet traffic is video streaming, and altogether a lot of energy goes into transferring YouTube videos and Netflix streams. Google has now modified its chess-playing MuZero to also compress video, and it is already running on all YouTube videos.
Article: MuZero with Self-competition for Rate Control in VP9 Video Compression, https://arxiv.org/abs/2202.06626
Try it yourself: https://www.deepmind.com/blog/muzeros-first-step-from-research-into-the-real-world
How likely is it that, by 2030, someone will build artificial general intelligence (AGI)?
Ross Nordby is an AI researcher who has shortened his AGI timelines: he has changed his mind about when AGI might be expected to exist. He recently published an article on the LessWrong community discussion site, giving his argument in favour of shortening these timelines. He now identifies 2030 as the date by which it is 50% likely that AGI will exist. In this episode, we ask Ross questions about his argument, and consider some of the implications that arise.
Article by Ross: https://www.lesswrong.com/posts/K4urTDkBbtNuLivJx/why-i-think-strong-general-ai-is-coming-soon
Effective Altruism Long-Term Future Fund: https://funds.effectivealtruism.org/funds/far-future
MIRI (Machine Intelligence Research Institute): https://intelligence.org/
00:57 Ross' background: real-time graphics, mostly in video games
02:10 Increased familiarity with AI made him reconsider his AGI timeline
02:37 He submitted a grant request to the Effective Altruism Long-Term Future Fund to move into AI safety work
03:50 What Ross was researching: can we make an AI intrinsically interpretable?
04:25 The AGI Ross is interested in is defined by capability, regardless of consciousness or sentience
04:55 An AI that is itself "goalless" might be put to uses with destructive side-effects
06:10 The leading AI research groups are still DeepMind and OpenAI
06:43 Other groups, like Anthropic, are more interested in alignment
07:22 If you can align an AI to any goal at all, that is progress: it indicates you have some control
08:00 Is this not all abstract and theoretical - a distraction from more pressing problems?
08:30 There are other serious problems, like pandemics and global warming, but we have to solve them all
08:45 Globally, only around 300 people are focused on AI alignment: not enough
10:05 AGI might well be less than three decades away
10:50 AlphaGo surprised the community, which was expecting Go to be winnable 10-15 years later
11:10 Then AlphaGo was surpassed by systems like AlphaZero and MuZero, which were actually simpler, and more flexible
11:20 AlphaTensor frames matrix multiplication as a game, and becomes superhuman at it
11:40 In 2018, the Transformer paper was published, but no-one forecast GPT-3's capabilities
12:00 This year, Minerva (similar to GPT-3) got 50% correct on the MATH dataset: high school competition math problems
13:16 Illustrators now feel threatened by systems like DALL-E, Stable Diffusion, etc.
13:30 The conclusion is that intelligence is easier to simulate than we thought
13:40 But these systems also do stupid things. They are brittle
18:00 But we could use transformers more intelligently
19:20 They turn out to be able to write code, and to explain jokes, and do maths reasoning
21:10 Google's Gopher AI
22:05 Machines don't yet have internal models of the world, which we call common sense
24:00 But an early version of GPT-3 demonstrated the ability to model a human thought process alongside a machine's
27:15 Ross' current timeline is 50% probability of AGI by 2030, and 90+% by 2050
27:35 Counterarguments?
29:35 So what is to be done?
30:55 If convinced that AGI is coming soon, most lay people would probably demand that all AI research stops immediately. Which isn't possible
31:40 Maybe publicity would be good in order to generate resources for AI alignment, and to avoid a backlash against secrecy
33:55 It would be great if more billionaires opened their wallets, but actually there are funds available for people who want to work on the problem
34:20 People who can help would not have to take a pay cut to work on AI alignment
Audio engineering by Alexander Chace
Music: Spike Protein, by Koi Discovery, available under CC0 1.0 Public Domain Declaration
Deep reinforcement learning agents are notoriously sample inefficient, which considerably limits their application to real-world problems. Recently, many model-based methods have been designed to address this issue, with learning in the imagination of a world model being one of the most prominent approaches. However, while virtually unlimited interaction with a simulated environment sounds appealing, the world model has to be accurate over extended periods of time. Motivated by the success of Transformers in sequence modeling tasks, we introduce IRIS, a data-efficient agent that learns in a world model composed of a discrete autoencoder and an autoregressive Transformer. With the equivalent of only two hours of gameplay in the Atari 100k benchmark, IRIS achieves a mean human normalized score of 1.046, and outperforms humans on 10 out of 26 games. Our approach sets a new state of the art for methods without lookahead search, and even surpasses MuZero.
2022: Vincent Micheli, Eloi Alonso, François Fleuret
https://arxiv.org/pdf/2209.00588v1.pdf
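As a rough illustration of the layout the abstract describes (a discrete autoencoder feeding an autoregressive Transformer world model), here is a minimal sketch; all module names, layer sizes, and heads are assumptions for illustration, not the IRIS implementation.

```python
# Illustrative sketch only: a discrete-autoencoder + autoregressive-Transformer
# world model in the spirit of the abstract above. Shapes and names are assumed.
import torch
import torch.nn as nn

class DiscreteAutoencoder(nn.Module):
    """Maps an observation vector to a short sequence of discrete token ids
    (a real system would use vector quantization; argmax stands in here)."""
    def __init__(self, obs_dim=512, n_tokens=16, vocab_size=512):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, n_tokens * vocab_size)
        self.n_tokens, self.vocab_size = n_tokens, vocab_size

    def encode(self, obs):                        # obs: (batch, obs_dim)
        logits = self.encoder(obs).view(-1, self.n_tokens, self.vocab_size)
        return logits.argmax(-1)                  # (batch, n_tokens) token ids

class WorldModel(nn.Module):
    """Transformer over observation/action tokens with heads for next-token,
    reward, and episode-termination prediction (causal mask omitted for brevity)."""
    def __init__(self, vocab_size=512, d_model=256, n_layers=4, n_heads=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, n_layers)
        self.next_token_head = nn.Linear(d_model, vocab_size)
        self.reward_head = nn.Linear(d_model, 1)
        self.done_head = nn.Linear(d_model, 1)

    def forward(self, tokens):                    # tokens: (batch, seq_len)
        h = self.transformer(self.embed(tokens))
        return (self.next_token_head(h),           # imagined next observation tokens
                self.reward_head(h),                # predicted reward per step
                torch.sigmoid(self.done_head(h)))   # predicted termination probability
```

The agent's policy can then be trained entirely on trajectories "imagined" by rolling this world model forward, which is what makes the approach so data-efficient.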
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Remaking EfficientZero (as best I can), published by Hoagy on July 4, 2022 on The AI Alignment Forum.
Introduction
When I first heard about EfficientZero, I was amazed that it could learn at a sample efficiency comparable to humans. What's more, it was doing it without the gigantic amount of pre-training the humans have, which I'd always felt made comparing sample efficiencies with humans rather unfair. I also wanted to practice my ML programming, so I thought I'd make my own version. This article uses what I've learned to give you an idea, not just of how the EfficientZero algorithm works, but also of what it looks like to implement in practice. The algorithm itself has already been well covered in a LessWrong post here. That article inspired me to write this, and if it's completely new to you it might be a good place to start - the focus here will be more on what the algorithm looks like as a piece of code. The code below is all written by me and comes from a cleaned and extra-commented version of EfficientZero which draws from the papers (MuZero, EfficientZero), the open implementation of MuZero by Werner Duvaud, the pseudocode provided by the MuZero paper, and the original implementation of EfficientZero. You can have a look at the full code and run it on GitHub. It's currently functional and works on trivial games like cartpole, but struggles to learn much on Atari games within a reasonable timeframe; I'm not certain whether this reflects an error or just insufficient training time. Testing on my laptop or Colab for Atari games is slow - if anyone could give me access to some compute to do proper testing, that would be amazing! Grateful to Misha Wagner for feedback on both code and post.
Algorithm Overview
AlphaZero
EfficientZero is based on MuZero, which itself is based on AlphaZero, a refinement of the architecture that was the first to beat the Go world champion. With AlphaZero, you play a deterministic game, like chess, by developing a neural network that evaluates game states, associating each possible state of the board with a value, the discounted expected return (in zero-sum games like chess there is no discounting, and this is just the win percentage). Since the algorithm can have access to a game 'simulator', it can test out different moves, and responses to those moves, before actually playing them. More specifically, from an initial game state it can traverse the tree of potential games, making different moves, playing against itself, and evaluating these derived game states. After traversing this tree, and seeing the quality of the states reached, we can average the values of the derived states to get a better estimate of how good that initial game state actually was, and make our final move based on these estimates. When playing out these hypothetical games, we are playing roughly according to our policy, but if we start finding that a move that looked promising leads to bad situations we can start avoiding that, thereby improving on our original policy. In the limit, this constrains our position evaluation function to be consistent with itself, meaning that if position A is rated highly, and our response in that situation would be to move to position B, then B should also be rated highly, etc.
This allows us to maximize the value of our training data, because if we learn that state C is bad, we will also learn to avoid states which would lead to C, and vice versa. Note that this constraint is similar to the one enforced by the Minimax algorithm, but AZ and its descendants propagate the average value of the found states, rather than the minimum, up the tree to avoid compounding NN error.
MuZero
While AlphaZero was very impressive, from a research direction it seemed (to me) fundamentally limited by the fact that it requires a fully deterministic space in which to play - to search the tree o...
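To make the value-averaging tree search from the AlphaZero overview above concrete, here is a minimal single-agent sketch; evaluate, legal_moves, and apply_move are hypothetical stand-ins for the learned value network and the game simulator (they are not from the post's code), and the sign flipping needed for two-player self-play is omitted for brevity.

```python
# Minimal sketch of value-averaging tree search: walk down the tree, expand a
# leaf, score it with a value function, and back the *average* value up the path.
import math

class Node:
    def __init__(self, state):
        self.state = state
        self.children = {}     # move -> Node
        self.visits = 0
        self.value_sum = 0.0   # running sum of backed-up leaf evaluations

    def value(self):
        return self.value_sum / self.visits if self.visits else 0.0

def search(root, evaluate, legal_moves, apply_move, n_simulations=100, c=1.4):
    for _ in range(n_simulations):
        node, path = root, [root]
        # Selection: descend with a UCB-style rule until we reach a leaf.
        while node.children:
            parent = node
            node = max(parent.children.values(),
                       key=lambda ch: ch.value()
                       + c * math.sqrt(math.log(parent.visits + 1) / (ch.visits + 1)))
            path.append(node)
        # Expansion: create children for every legal move from the leaf state.
        for move in legal_moves(node.state):
            node.children[move] = Node(apply_move(node.state, move))
        # Evaluation: the (learned) value function scores the leaf state.
        leaf_value = evaluate(node.state)
        # Backup: average the leaf value into every node along the path.
        for n in path:
            n.visits += 1
            n.value_sum += leaf_value
    # Play the move whose child has the best averaged value estimate.
    return max(root.children.items(), key=lambda kv: kv[1].value())[0]
```

The key design choice the post highlights is visible in the backup step: values are averaged up the tree rather than propagated minimax-style, which keeps a single bad network evaluation from dominating the estimate.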
The big chess- and Go-playing algorithms are often criticized for not being used for anything in the "real world" beyond board games and video games. DeepMind has taken the next step and used the chess-playing MuZero for video compression. The algorithm plays a game against itself, but instead of winning on the game board, it wins if it manages to compress video without the quality degrading, and entirely without anyone telling it how. Up to 80% of all content on today's internet is video. Success here could mean a great deal for video storage and for the internet's energy consumption.
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Supervise Process, not Outcomes, published by stuhlmueller on April 5, 2022 on LessWrong.
We can think about machine learning systems on a spectrum from process-based to outcome-based:
Process-based systems are built on human-understandable task decompositions, with direct supervision of reasoning steps.
Outcome-based systems are built on end-to-end optimization, with supervision of final results.
This post explains why Ought is devoted to process-based systems. The argument is:
In the short term, process-based ML systems have better differential capabilities: They help us apply ML to tasks where we don't have access to outcomes. These tasks include long-range forecasting, policy decisions, and theoretical research.
In the long term, process-based ML systems help avoid catastrophic outcomes from systems gaming outcome measures and are thus more aligned.
Both process- and outcome-based evaluation are attractors to varying degrees: Once an architecture is entrenched, it's hard to move away from it. This lock-in applies much more to outcome-based systems.
Whether the most powerful ML systems will primarily be process-based or outcome-based is up in the air. So it's crucial to push toward process-based training now.
There are almost no new ideas here. We're reframing the well-known outer alignment difficulties for traditional deep learning architectures and contrasting them with compositional approaches. To the extent that there are new ideas, credit primarily goes to Paul Christiano and Jon Uesato. We only describe our background worldview here. In a follow-up post, we'll explain why we're building Elicit, the AI research assistant.
The spectrum
Supervising outcomes
Supervision of outcomes is what most people think about when they think about machine learning. Local components are optimized based on an overall feedback signal:
SGD optimizes weights in a neural net to reduce its training loss
Neural architecture search optimizes architectures and hyperparameters to have low validation loss
Policy gradient optimizes policy neural nets to choose actions that lead to high expected rewards
In each case, the system is optimized based on how well it's doing empirically. MuZero is an example of a non-trivial outcome-based architecture. MuZero is a reinforcement learning algorithm that reaches expert-level performance at Go, Chess, and Shogi without human data, domain knowledge, or hard-coded rules. The architecture has three parts (see the sketch below):
A representation network, mapping observations to states
A dynamics network, mapping state and action to future state, and
A prediction network, mapping state to value and distribution over next actions.
Superficially, this looks like an architecture with independently meaningful components, including a "world model" (dynamics network). However, because the networks are optimized end-to-end to jointly maximize expected rewards and to be internally consistent, they need not capture interpretable dynamics or state. It's just a few functions that, if chained together, are useful for predicting reward-maximizing actions. Neural nets are always in the outcomes-based regime to some extent: In each layer and at each node, they use the matrices that make the neural net as a whole work well.
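Here is the sketch referenced above: a minimal, illustrative rendering of the three MuZero components and how they chain for one step of imagined lookahead. Layer sizes and names are assumptions of this sketch, not DeepMind's code.

```python
# Illustrative sketch of MuZero's three components as described above.
import torch
import torch.nn as nn

class Representation(nn.Module):
    """Maps an observation to a latent state."""
    def __init__(self, obs_dim, state_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                                 nn.Linear(64, state_dim))
    def forward(self, obs):
        return self.net(obs)

class Dynamics(nn.Module):
    """Maps (latent state, action) to the next latent state and a reward estimate."""
    def __init__(self, state_dim, n_actions):
        super().__init__()
        self.transition = nn.Linear(state_dim + n_actions, state_dim)
        self.reward_head = nn.Linear(state_dim, 1)
    def forward(self, state, action_onehot):
        nxt = torch.relu(self.transition(torch.cat([state, action_onehot], dim=-1)))
        return nxt, self.reward_head(nxt)

class Prediction(nn.Module):
    """Maps a latent state to a value estimate and a policy distribution."""
    def __init__(self, state_dim, n_actions):
        super().__init__()
        self.value_head = nn.Linear(state_dim, 1)
        self.policy_head = nn.Linear(state_dim, n_actions)
    def forward(self, state):
        return self.value_head(state), torch.softmax(self.policy_head(state), dim=-1)

# Chaining the parts for one step of imagined lookahead:
#   state = representation(observation)
#   next_state, reward = dynamics(state, one_hot_action)
#   value, policy = prediction(next_state)
```

The post's point is exactly that nothing forces the intermediate `state` in such a pipeline to be interpretable: the three parts are trained jointly against the final reward signal, so they only need to be useful when chained together.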
Supervising process If you're not optimizing based on how well something works empirically (outcomes), then the main way you can judge it is by looking at whether it's structurally the right thing to do (process). For many tasks, we understand what pieces of work we need to do and how to combine them. We trust the result because of this reasoning, not because we've observed final results for very similar tasks: Engineers and astronomers expect the James Webb Space Telescope to work because its deployment follows a well-understood plan, and it is built out of well-understo...
We are currently living through difficult times together, and even though we are not a politics show, the attack on Ukraine weighs on us all the way into this podcast. We take a look at the sanctions that large tech companies have imposed against Russia.
We also cover other topics:
The GitHub Advisory Database is opening up and now invites collaboration.
Google has announced a Games Developer Summit for March 15. You can register for free and watch the content via stream.
DeepMind has a new model, MuZero, which is now being used for the first time for video compression.
As promised last week, today we have a summary of the Privacy Sandbox for Android that Google has presented, which will in the long term also abolish the advertising ID on Android.
Write to us! Send us your topic requests and your feedback: podcast@programmier.bar
Follow us! Stay up to date on future episodes and virtual meetups and take part in community discussions.
Twitter | Instagram | Facebook | Meetup | YouTube
#mlnews #muzero #nerf Your regularly irregular updates on everything new in the ML world! Merch: store.ykilcher.com OUTLINE: 0:00 - Intro 0:15 - Sponsor: Weights & Biases 2:15 - Uber switches from XGBoost to Deep Learning for ETA prediction 5:45 - MuZero advances video compression 10:10 - Learned Soft Prompts can steer large language models 12:45 - Block-NeRF captures entire city blocks 14:15 - Neural Architecture Search considers underlying hardware 16:50 - Mega-Blog on Self-Organizing Agents 18:40 - Know Your Data (for Tensorflow Datasets) 20:30 - Helpful Things Sponsor: Weights & Biases https://wandb.me/yannic References: https://docs.wandb.ai/guides/integrat... https://colab.research.google.com/git... https://wandb.ai/borisd13/GPT-3/repor... Uber switches from XGBoost to Deep Learning for ETA prediction https://eng.uber.com/deepeta-how-uber... MuZero advances video compression https://deepmind.com/blog/article/MuZ... https://storage.googleapis.com/deepmi... Learned Soft Prompts can steer large language models https://ai.googleblog.com/2022/02/gui... https://aclanthology.org/2021.emnlp-m... Block-NeRF captures entire city blocks https://arxiv.org/abs/2202.05263 https://arxiv.org/pdf/2202.05263.pdf https://waymo.com/intl/zh-cn/research... Neural Architecture Search considers underlying hardware https://ai.googleblog.com/2022/02/unl... https://openaccess.thecvf.com/content... Mega-Blog on Self-Organizing Agents https://developmentalsystems.org/sens... https://flowers.inria.fr/ Know Your Data (for Tensorflow Datasets) https://knowyourdata-tfds.withgoogle.... https://knowyourdata.withgoogle.com/ Helpful Things https://twitter.com/casualganpapers/s... https://www.reddit.com/r/MachineLearn... https://arxiv.org/abs/2202.02435 https://github.com/vicariousinc/PGMax https://www.vicarious.com/posts/pgmax... https://diambra.ai/tournaments https://github.com/diambra/diambraArena https://www.youtube.com/watch?v=dw72P... https://gitlab.com/deepcypher/python-... https://python-fhez.readthedocs.io/en... https://joss.theoj.org/papers/10.2110... https://github.com/PyTorchLightning/m... https://torchmetrics.readthedocs.io/e... https://twitter.com/alanyttian/status... https://github.com/google/evojax https://arxiv.org/abs/2202.05008 https://www.reddit.com/r/MachineLearn... https://www.gymlibrary.ml/pages/api/#... Links: TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher Discord: https://discord.gg/4H8xxDF BitChute: https://www.bitchute.com/channel/yann... LinkedIn: https://www.linkedin.com/in/ykilcher BiliBili: https://space.bilibili.com/2017636191 If you want to support me, the best thing to do is to share out the content :) If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this): SubscribeStar: https://www.subscribestar.com/yannick... Patreon: https://www.patreon.com/yannickilcher Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2 Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Christiano and Yudkowsky on AI predictions and human intelligence, published by Eliezer Yudkowsky on February 23, 2022 on The AI Alignment Forum. This is a transcript of a conversation between Paul Christiano and Eliezer Yudkowsky, with comments by Rohin Shah, Beth Barnes, Richard Ngo, and Holden Karnofsky, continuing the Late 2021 MIRI Conversations. Color key: Chat by Paul and Eliezer Other chat 15. October 19 comment [Yudkowsky][11:01] thing that struck me as an iota of evidence for Paul over Eliezer: 16. November 3 conversation 16.1. EfficientZero [Yudkowsky][9:30] Thing that (if true) strikes me as... straight-up falsifying Paul's view as applied to modern-day AI, at the frontier of the most AGI-ish part of it and where Deepmind put in substantial effort on their project? EfficientZero (allegedly) learns Atari in 100,000 frames. Caveat: I'm not having an easy time figuring out how many frames MuZero would've required to achieve the same performance level. MuZero was trained on 200,000,000 frames but reached what looks like an allegedly higher high; the EfficientZero paper compares their performance to MuZero on 100,000 frames, and claims theirs is much better than MuZero given only that many frames. CC: @paulfchristiano. (I would further argue that this case is important because it's about the central contemporary model for approaching AGI, at least according to Eliezer, rather than any number of random peripheral AI tasks.) [Shah][14:46] I only looked at the front page, so might be misunderstanding, but the front figure says "Our proposed method EfficientZero is 170% and 180% better than the previous SoTA performance in mean and median human normalized score [...] on the Atari 100k benchmark", which does not seem like a huge leap? Oh, I incorrectly thought that was 1.7x and 1.8x, but it is actually 2.7x and 2.8x, which is a bigger deal (though still feels not crazy to me) [Yudkowsky][15:28] the question imo is how many frames the previous SoTA would require to catch up to EfficientZero (I've tried emailing an author to ask about this, no response yet) like, perplexity on GPT-3 vs GPT-2 and "losses decreased by blah%" would give you a pretty meaningless concept of how far ahead GPT-3 was from GPT-2, and I think the "2.8x performance" figure in terms of scoring is equally meaningless as a metric of how much EfficientZero improves if any what you want is a notion like "previous SoTA would have required 10x the samples" or "previous SoTA would have required 5x the computation" to achieve that performance level [Shah][15:38] I see. Atari curves are not nearly as nice and stable as GPT curves and often have the problem that they plateau rather than making steady progress with more training time, so that will make these metrics noisier, but it does seem like a reasonable metric to track (Not that I have recommendations about how to track it; I doubt the authors can easily get these metrics) [Christiano][18:01] If you think our views are making such starkly different predictions then I'd be happy to actually state any of them in advance, including e.g. about future ML benchmark results. I don't think this falsifies my view, and we could continue trying to hash out what my view is but it seems like slow going and I'm inclined to give up. 
Relevant questions on my view are things like: is MuZero optimized at all for performance in the tiny-sample regime? (I think not, I don't even think it set SoTA on that task and I haven't seen any evidence.) What's the actual rate of improvements since people started studying this benchmark ~2 years ago, and how much work has gone into it? And I totally agree with your comments that "# of frames" is the natural unit for measuring and that would be the starting point for any discussion. [Barnes][18:22] In previous MCTS RL algorithms, th...
Our 86th episode with a summary and discussion of last week's big AI news! Subscribe: RSS | Apple Podcasts | Spotify | YouTube Outline: (00:00) Intro (01:22) This AI is set to help stop illegal fishing (04:21) An ancient language has defied translation for 100 years. Can AI crack the code? (09:40) The New CGI: Creating Neural Neighborhoods With Block-NeRF (12:45) MuZero's first step from research into the real world (16:40) MoviePass 2.0 Wants to Track Your Eyeballs to Make Sure You Watch Ads (20:02) AI fails its job interview (23:58) Mad Scientist Forces AI to Make Horrific Pictures Out of BuzzFeed Headlines (26:18) Artist uses AI to perfectly fake 70s science fiction pulp covers – artwork and titles
Hannah meets DeepMind co-founder and chief scientist Shane Legg, the man who coined the phrase 'artificial general intelligence', and explores how it might be built. Why does Shane think AGI is possible? When will it be realised? And what could it look like? Hannah also explores a simple theory of using trial and error to reach AGI and takes a deep dive into MuZero, an AI system which mastered complex board games from chess to Go, and is now generalising to solve a range of important tasks in the real world. For questions or feedback on the series, message us on Twitter @DeepMind or email podcast@deepmind.com.
Interviewees: DeepMind's Shane Legg, Doina Precup, Dave Silver & Jackson Broshear
Credits
Presenter: Hannah Fry
Series Producer: Dan Hardoon
Production support: Jill Achineku
Sound design: Emma Barnaby
Music composition: Eleni Shaw
Sound Engineer: Nigel Appleton
Editor: David Prest
Commissioned by DeepMind
Thank you to everyone who made this season possible!
Further reading:
Real-world challenges for AGI, DeepMind: https://deepmind.com/blog/article/real-world-challenges-for-agi
An executive primer on artificial general intelligence, McKinsey: https://www.mckinsey.com/business-functions/operations/our-insights/an-executive-primer-on-artificial-general-intelligence
Mastering Go, chess, shogi and Atari without rules, DeepMind: https://deepmind.com/blog/article/muzero-mastering-go-chess-shogi-and-atari-without-rules
What is AGI?, Medium: https://medium.com/intuitionmachine/what-is-agi-99cdb671c88e
A Definition of Machine Intelligence by Shane Legg, arXiv: https://arxiv.org/abs/0712.3329
Reward is enough by David Silver, ScienceDirect: https://www.sciencedirect.com/science/article/pii/S0004370221000862
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: EfficientZero: human ALE sample-efficiency w/MuZero+self-supervised, published by gwern on the AI Alignment Forum. Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. This is a linkpost for "Mastering Atari Games with Limited Data", Ye et al 2021: Reinforcement learning has achieved great success in many applications. However, sample efficiency remains a key challenge, with prominent methods requiring millions (or even billions) of environment steps to train. Recently, there has been significant progress in sample efficient image-based RL algorithms; however, consistent human-level performance on the Atari game benchmark remains an elusive goal. We propose a sample efficient model-based visual RL algorithm built on MuZero, which we name EfficientZero. Our method achieves 190.4% mean human performance and 116.0% median performance on the Atari 100k benchmark with only two hours of real-time game experience and outperforms the state SAC in some tasks on the DMControl 100k benchmark. This is the first time an algorithm achieves super-human performance on Atari games with such little data. EfficientZero's performance is also close to DQN's performance at 200 million frames while we consume 500 times less data. EfficientZero's low sample complexity and high performance can bring RL closer to real-world applicability. We implement our algorithm in an easy-to-understand manner and it is available at this https URL. We hope it will accelerate the research of MCTS-based RL algorithms in the wider community. This work is supported by the Ministry of Science and Technology of the People's Republic of China, the 2030 Innovation Megaprojects “Program on New Generation Artificial Intelligence” (Grant No. 2021AAA0150000). Some have said that poor sample-efficiency on ALE has been a reason to downplay DRL progress or implications. The primary boost in EfficientZero (table 3), pushing it past the human benchmark, is some simple self-supervised learning (SimSiam on predicted vs actual observations). Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.
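As a rough illustration of the SimSiam-style consistency term mentioned above (predicted vs. actual observations), here is a minimal sketch; encoder, dynamics, projector, and predictor are assumed user-defined networks, and this is not the EfficientZero source code.

```python
# Illustrative sketch of a SimSiam-style consistency loss: the latent state
# predicted by a learned dynamics model is pushed toward a stop-gradient
# target computed from the actually observed next frame.
import torch
import torch.nn.functional as F

def consistency_loss(encoder, dynamics, projector, predictor, obs, action, next_obs):
    # Latent state predicted by rolling the dynamics model forward one step.
    predicted_latent = dynamics(encoder(obs), action)
    # Target latent from the real next observation; no gradient flows into the
    # target branch (the stop-gradient trick from SimSiam).
    with torch.no_grad():
        target = projector(encoder(next_obs))
    # Predictor head on the online branch, then negative cosine similarity.
    online = predictor(projector(predicted_latent))
    return -F.cosine_similarity(online, target, dim=-1).mean()
```

Added to the usual reward, value, and policy losses, this kind of term gives the world model a dense training signal even when environment rewards are sparse, which is what makes it plausible as the main source of the sample-efficiency boost.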
Historically, AI systems have been slow learners. For example, a computer vision model often needs to see tens of thousands of hand-written digits before it can tell a 1 apart from a 3. Even game-playing AIs like DeepMind's AlphaGo, or its more recent descendant MuZero, need far more experience than humans do to master a given game. So when someone develops an algorithm that can reach human-level performance at anything as fast as a human can, it's a big deal. And that's exactly why I asked Yang Gao to join me on this episode of the podcast. Yang is an AI researcher with affiliations at Berkeley and Tsinghua University, who recently co-authored a paper introducing EfficientZero: a reinforcement learning system that learned to play Atari games at the human level after just two hours of in-game experience. It's a tremendous breakthrough in sample-efficiency, and a major milestone in the development of more general and flexible AI systems. --- Intro music: ➞ Artist: Ron Gelinas ➞ Track Title: Daybreak Chill Blend (original mix) ➞ Link to Track: https://youtu.be/d8Y2sKIgFWc --- Chapters: - 0:00 Intro - 1:50 Yang's background - 6:00 MuZero's activity - 13:25 MuZero to EfficientZero - 19:00 Sample efficiency comparison - 23:40 Leveraging algorithmic tweaks - 27:10 Importance of evolution to human brains and AI systems - 35:10 Human-level sample efficiency - 38:28 Existential risk from AI in China - 47:30 Evolution and language - 49:40 Wrap-up
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Inner Alignment in Salt-Starved Rats, published by Steve Byrnes on the AI Alignment Forum.
Introduction: The Dead Sea Salt Experiment
In this 2014 paper by Mike Robinson and Kent Berridge at University of Michigan (see also this more theoretical follow-up discussion by Berridge and Peter Dayan), rats were raised in an environment where they were well-nourished, and in particular, where they were never salt-deprived—not once in their life. The rats were sometimes put into a test cage with a lever which, when it appeared, was immediately followed by a device spraying ridiculously salty water directly into their mouth. The rats were disgusted and repulsed by the extreme salt taste, and quickly learned to hate the lever—which from their perspective would seem to be somehow causing the saltwater spray. One of the rats went so far as to stay tight against the opposite wall—as far from the lever as possible! Then the experimenters made the rats feel severely salt-deprived, by depriving them of salt. Haha, just kidding! They made the rats feel severely salt-deprived by injecting the rats with a pair of chemicals that are known to induce the sensation of severe salt-deprivation. Ah, the wonders of modern science! ...And wouldn't you know it, almost instantly upon injection, the rats changed their behavior! When shown the lever (this time without the salt-water spray), they now went right over to that lever and jumped on it and gnawed at it, obviously desperate for that super-salty water. The end. Aren't you impressed? Aren't you floored? You should be!!! I don't think any standard ML algorithm would be able to do what these rats just did! Think about it: Is this Reinforcement Learning? No. RL would look like the rats randomly stumbling upon the behavior of "nibbling the lever when salt-deprived", finding it rewarding, and then adopting that as a goal via "credit assignment". That's not what happened. While the rats were nibbling at the lever, they had never in their life had an experience where the lever had brought forth anything other than an utterly repulsive experience. And they had never in their life had an experience where they were salt-deprived, tasted something extremely salty, and found it gratifying. I mean, they were clearly trying to interact with the lever—this is a foresighted plan we're talking about—but that plan does not seem to have been reinforced by any experience in their life. Update for clarification: Specifically, it's not any version of RL where you learn about the reward function only by observing past rewards. This category includes all model-free RL and some model-based RL (e.g. MuZero). If, by contrast, you have a version of model-based RL where the agent can submit arbitrary hypothetical queries to the true reward function, then OK, sure, now you can get the rats' behavior. I don't think that's what's going on here for reasons I'll mention at the bottom. Is this Imitation Learning? Obviously not; the rats had never seen any other rat around any lever for any reason. Is this an innate, hardwired, stimulus-response behavior? No, the connection between a lever and saltwater was an arbitrary, learned connection. (I didn't mention it, but the researchers also played a distinctive sound each time the lever appeared. Not sure how important that is. But anyway, that connection is arbitrary and learned, too.)
So what's the algorithm here? How did their brains know that this was a good plan? That's the subject of this post. What does this have to do with inner alignment? What is inner alignment anyway? Why should we care about any of this? With apologies to the regulars on this forum who already know all this, the so-called “inner alignment problem” occurs when you, a programmer, build an intelligent, foresighted, goal-seeking agent. You want it to be trying t...
#efficientzero #muzero #atari Reinforcement Learning methods are notoriously data-hungry. Notably, MuZero learns a latent world model just from scalar feedback of reward- and policy-predictions, and therefore relies on scale to perform well. However, most RL algorithms fail when presented with very little data. EfficientZero makes several improvements over MuZero that allows it to learn from astonishingly small amounts of data and outperform other methods by a large margin in the low-sample setting. This could be a staple algorithm for future RL research. OUTLINE: 0:00 - Intro & Outline 2:30 - MuZero Recap 10:50 - EfficientZero improvements 14:15 - Self-Supervised consistency loss 17:50 - End-to-end prediction of the value prefix 20:40 - Model-based off-policy correction 25:45 - Experimental Results & Conclusion Paper: https://arxiv.org/abs/2111.00210 Code: https://github.com/YeWR/EfficientZero Note: code not there yet as of release of this video Abstract: Reinforcement learning has achieved great success in many applications. However, sample efficiency remains a key challenge, with prominent methods requiring millions (or even billions) of environment steps to train. Recently, there has been significant progress in sample efficient image-based RL algorithms; however, consistent human-level performance on the Atari game benchmark remains an elusive goal. We propose a sample efficient model-based visual RL algorithm built on MuZero, which we name EfficientZero. Our method achieves 190.4% mean human performance and 116.0% median performance on the Atari 100k benchmark with only two hours of real-time game experience and outperforms the state SAC in some tasks on the DMControl 100k benchmark. This is the first time an algorithm achieves super-human performance on Atari games with such little data. EfficientZero's performance is also close to DQN's performance at 200 million frames while we consume 500 times less data. EfficientZero's low sample complexity and high performance can bring RL closer to real-world applicability. We implement our algorithm in an easy-to-understand manner and it is available at this https URL. We hope it will accelerate the research of MCTS-based RL algorithms in the wider community. Authors: Weirui Ye, Shaohuai Liu, Thanard Kurutach, Pieter Abbeel, Yang Gao Links: TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher Discord: https://discord.gg/4H8xxDF BitChute: https://www.bitchute.com/channel/yann... Minds: https://www.minds.com/ykilcher Parler: https://parler.com/profile/YannicKilcher LinkedIn: https://www.linkedin.com/in/ykilcher BiliBili: https://space.bilibili.com/1824646584 If you want to support me, the best thing to do is to share out the content :) If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this): SubscribeStar: https://www.subscribestar.com/yannick... Patreon: https://www.patreon.com/yannickilcher Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2 Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n
Over the last two years, the capabilities of AI systems have exploded. AlphaFold2, MuZero, CLIP, DALLE, GPT-3 and many other models have extended the reach of AI to new problem classes. There's a lot to be excited about. But as we've seen in other episodes of the podcast, there's a lot more to getting value from an AI system than jacking up its capabilities. And increasingly, one of these additional missing factors is becoming trust. You can make all the powerful AIs you want, but if no one trusts their output — or if people trust it when they shouldn't — you can end up doing more harm than good. That's why we invited Ayanna Howard on the podcast. Ayanna is a roboticist, entrepreneur and Dean of the College of Engineering at Ohio State University, where she focuses her research on human-machine interactions and the factors that go into building human trust in AI systems. She joined me to talk about her research, its applications in medicine and education, and the future of human-machine trust. --- Intro music: - Artist: Ron Gelinas - Track Title: Daybreak Chill Blend (original mix) - Link to Track: https://youtu.be/d8Y2sKIgFWc --- Chapters: - 0:00 Intro - 1:30 Ayanna's background - 6:10 The interpretability of neural networks - 12:40 Domain of machine-human interaction - 17:00 The issue of preference - 20:50 Gell-Mann/newspaper amnesia - 26:35 Assessing a person's persuadability - 31:40 Doctors and new technology - 36:00 Responsibility and accountability - 43:15 The social pressure aspect - 47:15 Is Ayanna optimistic? - 53:00 Wrap-up
This October has been marked by some notable events, Megatron-Turing to name just one... But there has also been a protagonist that has quietly begun bringing (further) small innovations to the world of Artificial Intelligence... We are talking about one of the true celebrities of this field, DeepMind, the Google offshoot dedicated to creating innovative deep learning models. We have mentioned it several times on the blog, when talking about AlphaGo or MuZero. This time we return to DeepMind and its very busy October. For now, enjoy the episode, and remember that you can also listen to us on: ►GOOGLE PODCAST: https://bit.ly/32TDZei ►SPOTIFY: https://spoti.fi/2D3ttGF ►APPLE PODCAST: https://apple.co/2CBsSfz Or you can watch us on: ► YOUTUBE: https://www.youtube.com/channel/UC7t_EfzzusBj3y_GauT_JCg
#mlnews #schmidhuber #muzero Your regular updates on what's happening in the ML world! OUTLINE: 0:00 - Intro 0:15 - Sponsor: Weights & Biases 1:45 - Google shuts down health streams 4:25 - AI predicts race from blurry X-Rays 7:35 - Facebook labels black men as primates 11:05 - Distill papers on Graph Neural Networks 11:50 - Jürgen Schmidhuber to lead KAUST AI Initiative 12:35 - GitHub brief on DMCA notices for source code 14:55 - Helpful Reddit Threads 19:40 - Simple Tricks to improve Transformers 20:40 - Apple's Unconstrained Scene Generation 21:40 - Common Objects in 3D dataset 22:20 - WarpDrive Multi-Agent RL framework 23:10 - My new paper: Boosting Search Agents & MuZero 25:15 - Can AI detect depression from speech? References: Google shuts down Health Streams https://techcrunch.com/2021/08/26/goo... AI predicts race from X-Rays https://www.iflscience.com/technology... https://arxiv.org/ftp/arxiv/papers/21... Facebook labels black men as primates https://www.nytimes.com/2021/09/03/te... https://en.wikipedia.org/wiki/Human Distill articles on GNNs https://distill.pub/2021/gnn-intro/ https://distill.pub/2021/understandin... Jürgen Schmidhuber leads KAUST AI initiative https://people.idsia.ch/~juergen/kaus... GitHub issues court brief on code DMCAs https://github.blog/2021-08-31-vague-... Useful Reddit Threads https://www.reddit.com/r/MachineLearn... https://www.reddit.com/r/MachineLearn... https://www.reddit.com/r/MachineLearn... https://www.reddit.com/r/MachineLearn... Tricks to improve Transformers https://arxiv.org/pdf/2108.12284.pdf Unconstrained Scene Generation https://apple.github.io/ml-gsn/ Common Objects in 3D dataset https://ai.facebook.com/blog/common-o... WarpDrive Multi-Agent RL framework https://blog.einstein.ai/warpdrive-fa... Boosting Search Engines / MuZero Code https://arxiv.org/abs/2109.00527 https://github.com/google-research/go... https://github.com/google-research/la... Can AI detect depression? https://venturebeat.com/2021/08/31/ai...
You can find the paper on MuZero here. He blogs at Furidamu and can be found on Twitter here. The story on drug discovery powered by AI can be found here.
Let's recap everything that happened in Artificial Intelligence in 2020. Even though the global lockdown essentially brought the world to a halt, it did not stop the development of Artificial Intelligence, quite the opposite... As we have been saying for a few months now, the pandemic gave a strong push to the digital transformation process. Several innovations came out of the Artificial Intelligence world, and we covered them throughout 2020... because we believe they are developments that had a real impact on business... Here are just a few, which we discussed in a dedicated episode... ►►Less-Than-One-Shot Learning: image training that does not use images. ►►AI Hacker Proof: training in which a neural network hardens itself by defending against attacks from another neural network. ►►MuZero, the algorithm invented by DeepMind that manages to win at Go, Shogi, Chess and Atari essentially without knowing the rules. ►►Vokenization, a learning approach that links written words to images to improve understanding of reality. ►►The MIT app that recognizes a Covid infection by listening to a user's cough. ►►The 100%-reliable algorithm that can tell how accurate a prediction actually is. And there was also one very, very big innovation last year... we talked about it all through 2020... GPT-3, of course. OpenAI's algorithm set the new gold standard for language models... and its arrival on the market boosted AI investment, both in business and in research and development. In fact, GPT-3 has pushed several R&D groups to leverage how it works to create new algorithms or improve existing ones. The developments we cover in episode #42 of the podcast are closely tied to GPT-3 and will have an undeniable impact on business. For now, enjoy, and remember that you can also listen to us on: ►GOOGLE PODCAST: https://bit.ly/32TDZei ►SPOTIFY: https://spoti.fi/2D3ttGF ►APPLE PODCAST: https://apple.co/2CBsSfz ►SPREAKER: https://bit.ly/3f6D5PA #artificialintelligence #industry40 #machinelearning #deeplearning ***P.S. Want to find out how to bring an Artificial Intelligence system into your company? ►► Call the toll-free number 800-270-021 ►► Write an email to info@bluetensor.ai ►► Visit www.bluetensor.ai
《科技領航家》 talks tech with people in the industry every week. If you have been following industry developments recently, you will have noticed that in 2021 AI is still an extremely hot topic. CES 2021 wrapped up a while ago with many new AI applications announced, but have you thought about how you should coexist with AI? And how far has AI actually progressed? Remember when AlphaGo, created by DeepMind, played against world Go champion Ke Jie and won overwhelmingly, stunning the world with the progress of artificial intelligence? At the end of last year, DeepMind unveiled its next-generation AI system MuZero, claiming it can teach itself, without being told the rules of the game, to master chess, Go, shogi, and more. This year we are honored to welcome the only person in Taiwan who has played against AlphaGo: former world professional Go champion, the "Red-Faced Go King" Chou Chun-hsun (周俊勳). Since 2008, Chou has taken part in training games against Go AIs and has witnessed the development of Go AI along the way. We invite him to share: what are the differences between AI and humans? How has AI changed the world of Go? How can we amplify our advantages as humans? And how does a human player's post-game review differ from an AI's self-adjustment?
Intelligent Autonomous Things on the Battlefield. AI for the Internet of Everything. A. Kott and E. Stump 19. https://arxiv.org/ftp/arxiv/papers/1902/1902.10086.pdf Slaughterbots. Future of Life Institute 17. https://www.youtube.com/watch?v=HipTO_7mUOw The Future of War, and How It Affects YOU (Multi-Domain Operations). Smarter Every Day 211. https://www.youtube.com/watch?v=qOTYgcdNrXE Find out more on the Robustly Beneficial Wiki: https://robustlybeneficial.org/wiki/index.php?title=Robustly_beneficial https://robustlybeneficial.org/wiki/index.php?title=Robust_statistics Next week's paper is about MuZero. Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model. SAHSS+20. https://arxiv.org/abs/1911.08265