Podcasts about deep reinforcement learning

  • 56 podcasts
  • 87 episodes
  • 46m average duration
  • Infrequent episodes
  • Latest episode: Oct 16, 2025

POPULARITY (chart covering 2017 to 2024)


Latest podcast episodes about deep reinforcement learning

The MAD Podcast with Matt Turck
How GPT-5 Thinks — OpenAI VP of Research Jerry Tworek

Oct 16, 2025 · 76:04


What does it really mean when GPT-5 “thinks”? In this conversation, OpenAI's VP of Research Jerry Tworek explains how modern reasoning models work in practice: why pretraining and reinforcement learning (RL/RLHF) are both essential, what that on-screen “thinking” actually does, and when extra test-time compute helps (or doesn't). We trace the evolution from O1 (a tech demo good at puzzles) to O3 (the tool-use shift) to GPT-5 (Jerry calls it “O3.1-ish”), and talk through verifiers, reward design, and the real trade-offs behind “auto” reasoning modes.

We also go inside OpenAI: how research is organized, why collaboration is unusually transparent, and how the company ships fast without losing rigor. Jerry shares the backstory on competitive-programming results like ICPC, what they signal (and what they don't), and where agents and tool use are genuinely useful today. Finally, we zoom out: could pretraining + RL be the path to AGI?

This is the MAD Podcast: AI for the 99%. If you're curious about how these systems actually work (without needing a PhD), this episode is your map to the current AI frontier.

OpenAI
Website - https://openai.com
X/Twitter - https://x.com/OpenAI

Jerry Tworek
LinkedIn - https://www.linkedin.com/in/jerry-tworek-b5b9aa56
X/Twitter - https://x.com/millionint

FIRSTMARK
Website - https://firstmark.com
X/Twitter - https://twitter.com/FirstMarkCap

Matt Turck (Managing Director)
LinkedIn - https://www.linkedin.com/in/turck/
X/Twitter - https://twitter.com/mattturck

(00:00) Intro
(01:01) What Reasoning Actually Means in AI
(02:32) Chain of Thought: Models Thinking in Words
(05:25) How Models Decide Thinking Time
(07:24) Evolution from O1 to O3 to GPT-5
(11:00) Before OpenAI: Growing up in Poland, Dropping out of School, Trading
(20:32) Working on Robotics and Rubik's Cube Solving
(23:02) A Day in the Life: Talking to Researchers
(24:06) How Research Priorities Are Determined
(26:53) Collaboration vs IP Protection at OpenAI
(29:32) Shipping Fast While Doing Deep Research
(31:52) Using OpenAI's Own Tools Daily
(32:43) Pre-Training Plus RL: The Modern AI Stack
(35:10) Reinforcement Learning 101: Training Dogs
(40:17) The Evolution of Deep Reinforcement Learning
(42:09) When GPT-4 Seemed Underwhelming at First
(45:39) How RLHF Made GPT-4 Actually Useful
(48:02) Unsupervised vs Supervised Learning
(49:59) GRPO and How DeepSeek Accelerated US Research
(53:05) What It Takes to Scale Reinforcement Learning
(55:36) Agentic AI and Long-Horizon Thinking
(59:19) Alignment as an RL Problem
(1:01:11) Winning ICPC World Finals Without Specific Training
(1:05:53) Applying RL Beyond Math and Coding
(1:09:15) The Path from Here to AGI
(1:12:23) Pure RL vs Language Models

Mediterranean Sustainability Partners
Sustainable AI : Closing the gap in the circular economy

Jan 24, 2025 · 40:05


In this interview with Dr. Andreas Windisch, we discover who he is in the first segment, his work in the second, and, in the third, how to close the gap in the circular economy.

A brief bio: Andreas is a highly skilled theoretical physicist, manager, lecturer and researcher in AI. His profile includes expertise in AI, machine learning and deep (reinforcement) learning, plus experience with IBM's quantum computing package Qiskit; profound IT knowledge and programming experience in many languages; repeated participation in working groups for Austrian ministries on various subjects regarding artificial intelligence; and experience as a public speaker and, currently, as a lecturer. He worked for five years in the USA, where he supervised students and won research grants. He also holds a US private pilot license.

ITSPmagazine | Technology. Cybersecurity. Society
Generative AI and Large Language Model (LLM) Prompt Hacking: Exposing Systemic Vulnerabilities of LLMs to Enhance AI Security Through Innovative Red Teaming Competitions | A Conversation with Sander Schulhoff | Redefining CyberSecurity with Sean Martin

Sep 11, 2024 · 35:14


Guest: Sander Schulhoff, CEO and Co-Founder, Learn Prompting [@learnprompting]
On LinkedIn | https://www.linkedin.com/in/sander-schulhoff/
____________________________
Host: Sean Martin, Co-Founder at ITSPmagazine [@ITSPmagazine] and Host of Redefining CyberSecurity Podcast [@RedefiningCyber]
On ITSPmagazine | https://www.itspmagazine.com/sean-martin
___________________________
Episode Notes

In this episode of Redefining CyberSecurity, host Sean Martin engages with Sander Schulhoff, CEO and Co-Founder of Learn Prompting and a researcher at the University of Maryland. The discussion focuses on the critical intersection of artificial intelligence (AI) and cybersecurity, particularly the role of prompt engineering in the evolving AI landscape. Schulhoff's extensive work in natural language processing (NLP) and deep reinforcement learning provides a robust foundation for this insightful conversation.

Prompt engineering, a vital part of AI research and development, involves creating effective input prompts that guide AI models to produce desired outputs. Schulhoff explains that the diversity of prompt techniques is vast and includes methods like the chain of thought, which helps AI articulate its reasoning steps to solve complex problems. However, the conversation highlights that there are significant security concerns that accompany these techniques.

One such concern is the vulnerability of systems when they integrate user-generated prompts with AI models, especially those prompts that can execute code or interact with external databases. Security flaws can arise when these systems are not adequately sandboxed or otherwise protected, as demonstrated by Schulhoff through real-world examples like MathGPT, a tool that was exploited to run arbitrary code by injecting malicious prompts into the AI's input.

Schulhoff's insights into the AI Village at DEF CON underline the community's nascent but growing focus on AI security. He notes an intriguing pattern: many participants in AI-specific red teaming events were beginners, which suggests a gap in traditional red teamer familiarity with AI systems. This gap necessitates targeted education and training, something Schulhoff is actively pursuing through initiatives at Learn Prompting.

The discussion also covers the importance of studying and understanding the potential risks posed by AI models in business applications. With AI increasingly integrated into various sectors, including security, the stakes for anticipating and mitigating risks are high. Schulhoff mentions that his team is working on Hack A Prompt, a global prompt injection competition aimed at crowdsourcing diverse attack strategies. This initiative not only helps model developers understand potential vulnerabilities but also furthers the collective knowledge base necessary for building more secure AI systems.

As AI continues to intersect with various business processes and applications, the role of security becomes paramount. This episode underscores the need for collaboration between prompt engineers, security professionals, and organizations at large to ensure that AI advancements are accompanied by robust, proactive security measures. By fostering awareness and education, and through collaborative competitions like Hack A Prompt, the community can better prepare for the multifaceted challenges that AI security presents.

Top Questions Addressed
  • What are the key security concerns associated with prompt engineering?
  • How can organizations ensure the security of AI systems that integrate user-generated prompts?
  • What steps can be taken to bridge the knowledge gap in AI security among traditional security professionals?
___________________________
Sponsors
Imperva: https://itspm.ag/imperva277117988
LevelBlue: https://itspm.ag/attcybersecurity-3jdk3

ITSPmagazine | Technology. Cybersecurity. Society
Deep Backdoors in Deep Reinforcement Learning Agents | A Black Hat USA 2024 Conversation with Vas Mavroudis and Jamie Gawith | On Location Coverage with Sean Martin and Marco Ciappelli

Aug 1, 2024 · 24:11


Guests:
Vas Mavroudis, Principal Research Scientist, The Alan Turing Institute
Website | https://mavroud.is/
At BlackHat | https://www.blackhat.com/us-24/briefings/schedule/speakers.html#vasilios-mavroudis-34757
Jamie Gawith, Assistant Professor of Electrical Engineering, University of Bath
On LinkedIn | https://www.linkedin.com/in/jamie-gawith-63560b60/
At BlackHat | https://www.blackhat.com/us-24/briefings/schedule/speakers.html#jamie-gawith-48261
____________________________
Hosts:
Sean Martin, Co-Founder at ITSPmagazine [@ITSPmagazine] and Host of Redefining CyberSecurity Podcast [@RedefiningCyber]
On ITSPmagazine | https://www.itspmagazine.com/sean-martin
Marco Ciappelli, Co-Founder at ITSPmagazine [@ITSPmagazine] and Host of Redefining Society Podcast
On ITSPmagazine | https://www.itspmagazine.com/itspmagazine-podcast-radio-hosts/marco-ciappelli
____________________________
Episode Notes

As Black Hat Conference 2024 approaches, Sean Martin and Marco Ciappelli are gearing up for a conversation about the complexities of deep reinforcement learning and the potential cybersecurity threats posed by backdoors in these systems. They will be joined by Vas Mavroudis from the Alan Turing Institute and Jamie Gawith from the University of Bath, who will be presenting their cutting-edge research at the event.

Setting the Stage: The discussion begins with Sean and Marco sharing their excitement about the upcoming conference. They set a professional and engaging tone, seamlessly leading into the introduction of their guests, Jamie and Vas.

The Core Discussion: Sean introduces the main focus of their upcoming session, titled "Deep Backdoors in Deep Reinforcement Learning Agents." Expressing curiosity and anticipation, he invites Jamie and Vas to share more about their backgrounds and the significance of their work in this area.

Expert Introductions: Jamie Gawith explains his journey from working in power electronics and nuclear fusion to focusing on cybersecurity. His collaboration with Vas arose from a shared interest in using reinforcement learning agents for controlling nuclear fusion reactors. He describes the crucial role these agents play and the potential risks associated with their deployment in critical environments.

Vas Mavroudis introduces himself as a principal research scientist at the Alan Turing Institute, leading a team focused on autonomous cyber defense. His work involves developing and securing autonomous agents tasked with defending networks and systems from cyber threats. The conversation highlights the vulnerabilities of these agents to backdoors and the need for robust security measures.

Deep Dive into Reinforcement Learning: Vas offers an overview of reinforcement learning, highlighting its differences from supervised and unsupervised learning. He emphasizes the importance of real-world experiences in training these agents to make optimal decisions through trial and error. The conversation also touches on the use of deep neural networks, which enhance the capabilities of reinforcement learning models but also introduce complexities that can be exploited.

Security Concerns: The discussion then shifts to the security challenges associated with reinforcement learning models. Vas explains the concept of backdoors in machine learning and the unique challenges they present. Unlike traditional software backdoors, these are hidden within the neural network layers, making detection difficult.

Real-World Implications: Jamie discusses the practical implications of these security issues, particularly in high-stakes scenarios like nuclear fusion reactors. He outlines the potential catastrophic consequences of a backdoor-triggered failure, underscoring the importance of securing these models to prevent malicious exploitation.

Looking Ahead: Sean and Marco express their anticipation for the upcoming session, highlighting the collaborative efforts of Vas, Jamie, and their teams in tackling these critical issues. They emphasize the significance of this research and its implications for the future of autonomous systems.

Conclusion: This pre-event conversation sets the stage for a compelling session at Black Hat Conference 2024. It offers attendees a preview of the insights and discussions they can expect about the intersection of deep reinforcement learning and cybersecurity. The session promises to provide valuable knowledge on protecting advanced technologies from emerging threats.

Be sure to follow our Coverage Journey and subscribe to our podcasts!
____________________________
This Episode's Sponsors
LevelBlue: https://itspm.ag/levelblue266f6c
Coro: https://itspm.ag/coronet-30de
SquareX: https://itspm.ag/sqrx-l91
____________________________
Follow our Black Hat USA 2024 coverage: https://www.itspmagazine.com/black-hat-usa-2024-hacker-summer-camp-2024-event-coverage-in-las-vegas

Many Minds
From the archive: What does ChatGPT really know?

Jul 24, 2024 · 55:10


Hi friends, we're on a brief summer break at the moment. We'll have a new episode for you in August. In the meanwhile, enjoy this pick from our archives!

----

[originally aired January 25, 2023]

By now you've probably heard about the new chatbot called ChatGPT. There's no question it's something of a marvel. It distills complex information into clear prose; it offers instructions and suggestions; it reasons its way through problems. With the right prompting, it can even mimic famous writers. And it does all this with an air of cool competence, of intelligence. But, if you're like me, you've probably also been wondering: What's really going on here? What are ChatGPT, and other large language models like it, actually doing? How much of their apparent competence is just smoke and mirrors? In what sense, if any, do they have human-like capacities?

My guest today is Dr. Murray Shanahan. Murray is Professor of Cognitive Robotics at Imperial College London and Senior Research Scientist at DeepMind. He's the author of numerous articles and several books at the lively intersections of artificial intelligence, neuroscience, and philosophy. Very recently, Murray put out a paper titled 'Talking about Large Language Models', and it's the focus of our conversation today. In the paper, Murray argues that, tempting as it may be, it's not appropriate to talk about large language models in anthropomorphic terms. Not yet, anyway.

Here, we chat about the rapid rise of large language models and the basics of how they work. We discuss how a model that, at its base, simply does “next-word prediction” can be engineered into a savvy chatbot like ChatGPT. We talk about why ChatGPT lacks genuine “knowledge” and “understanding”, at least as we currently use those terms. And we discuss what it might take for these models to eventually possess richer, more human-like capacities. Along the way, we touch on: emergence, prompt engineering, embodiment and grounding, image generation models, Wittgenstein, the intentional stance, soft robots, and “exotic mind-like entities.”

Before we get to it, just a friendly reminder: applications are now open for the Diverse Intelligences Summer Institute (or DISI). DISI will be held this June/July in St Andrews, Scotland. The program consists of three weeks of intense interdisciplinary engagement with exactly the kinds of ideas and questions we like to wrestle with here on this show. If you're intrigued (and I hope you are!), check out disi.org for more info.

Alright friends, on to my decidedly human chat, with Dr. Murray Shanahan. Enjoy!

The paper we discuss is here. A transcript of this episode is here.

Notes and links
6:30 – The 2017 “breakthrough” article by Vaswani and colleagues.
8:00 – A popular article about GPT-3.
10:00 – A popular article about some of the impressive (and not so impressive) behaviors of ChatGPT. For more discussion of ChatGPT and other large language models, see another interview with Dr. Shanahan, as well as interviews with Emily Bender and Margaret Mitchell, with Gary Marcus, and with Sam Altman (CEO of OpenAI, which created ChatGPT).
14:00 – A widely discussed paper by Emily Bender and colleagues on the “dangers of stochastic parrots.”
19:00 – A blog post about “prompt engineering”. Another blog post about the concept of Reinforcement Learning from Human Feedback, in the context of ChatGPT.
30:00 – One of Dr. Shanahan's books is titled Embodiment and the Inner Life.
39:00 – An example of a robotic agent, SayCan, which is connected to a language model.
40:30 – On the notion of embodiment in the cognitive sciences, see the classic book by Francisco Varela and colleagues, The Embodied Mind.
44:00 – For a detailed primer on the philosophy of Ludwig Wittgenstein, see here.
45:00 – See Dr. Shanahan's general audience essay on “conscious exotica” and the space of possible minds.
49:00 – See Dennett's book, The Intentional Stance.

Dr. Shanahan recommends:
Artificial Intelligence: A Guide for Thinking Humans, by Melanie Mitchell (see also our earlier episode with Dr. Mitchell)
‘Abstraction for Deep Reinforcement Learning', by M. Shanahan and M. Mitchell

You can read more about Murray's work on his website and follow him on Twitter.

Many Minds is a project of the Diverse Intelligences Summer Institute (DISI) (https://disi.org), which is made possible by a generous grant from the Templeton World Charity Foundation to UCLA. It is hosted and produced by Kensy Cooperrider, with help from Assistant Producer Urte Laukaityte and with creative support from DISI Directors Erica Cartmill and Jacob Foster. Our artwork is by Ben Oldroyd (https://www.mayhilldesigns.co.uk/). Our transcripts are created by Sarah Dopierala (https://sarahdopierala.wordpress.com/).

You can subscribe to Many Minds on Apple, Stitcher, Spotify, Pocket Casts, Google Play, or wherever you like to listen to podcasts.

**You can now subscribe to the Many Minds newsletter here!**

We welcome your comments, questions, and suggestions. Feel free to email us at: manymindspodcast@gmail.com. For updates about the show, visit our website (https://disi.org/manyminds/), or follow us on Twitter: @ManyMindsPod.

This Week in Machine Learning & Artificial Intelligence (AI) Podcast
Controlling Fusion Reactor Instability with Deep Reinforcement Learning with Aza Jalalvand - #682

Apr 29, 2024 · 42:09


Today we're joined by Azarakhsh (Aza) Jalalvand, a research scholar at Princeton University, to discuss his work using deep reinforcement learning to control plasma instabilities in nuclear fusion reactors. Aza explains his team developed a model to detect and avoid a fatal plasma instability called ‘tearing mode'. Aza walks us through the process of collecting and pre-processing the complex diagnostic data from fusion experiments, training the models, and deploying the controller algorithm on the DIII-D fusion research reactor. He shares insights from developing the controller and discusses the future challenges and opportunities for AI in enabling stable and efficient fusion energy production. The complete show notes for this episode can be found at twimlai.com/go/682.

Digital Marketing Legend Leaks
EP916: Deep Reinforcement Learning Teaching AI to Learn from Experience

Apr 20, 2024 · 4:01


SPREAKER, PODCAST, PODCASTING, AI, ARTIFICIALINTELLIGENCE, DIGITALMARKETING, marketing, FutureMarketing, AIFuture, FutureofAI

Spreaker Top Podcast of the Year in Artificial Intelligence Digital Marketing - AI DigitalMarketing is Digital Marketing Legend Leaks, Srinidhi Ranganathan - the human AI. THE BEST IN CREATIVE FICTION AND NON-FICTION!

Become a supporter of this podcast: https://www.spreaker.com/podcast/digital-marketing-legend-leaks--4375666/support.

SuperDataScience
773: Deep Reinforcement Learning for Maximizing Profits, with Prof. Barrett Thomas

Apr 9, 2024 · 67:40


Dr. Barrett Thomas, an award-winning Research Professor at the University of Iowa, explores the intricacies of Markov decision processes and their connection to Deep Reinforcement Learning. Discover how these concepts are applied in operations research to enhance business efficiency and drive innovations in same-day delivery and autonomous transportation systems.

This episode is brought to you by Ready Tensor, where innovation meets reproducibility (https://www.readytensor.ai/). Interested in sponsoring a SuperDataScience Podcast episode? Visit passionfroot.me/superdatascience for sponsorship information.

In this episode you will learn:
  • Barrett's start in operations logistics [02:27]
  • Concorde Solver and the traveling salesperson problem [09:59]
  • Cross-function approximation explained [19:13]
  • How Markov decision processes relate to deep reinforcement learning [26:08]
  • Understanding policy in decision-making contexts [33:40]
  • Revolutionizing supply chains and transportation with aerial drones [46:47]
  • Barrett's career evolution: past changes and future prospects [52:19]

Additional materials: www.superdatascience.com/773

TalkRL: The Reinforcement Learning Podcast

Max Schwarzer is a PhD student at Mila, with Aaron Courville and Marc Bellemare, interested in RL scaling, representation learning for RL, and RL for science. Max spent the last 1.5 years at Google Brain/DeepMind, and is now at Apple Machine Learning Research.

Featured References

Bigger, Better, Faster: Human-level Atari with human-level efficiency
Max Schwarzer, Johan Obando-Ceron, Aaron Courville, Marc Bellemare, Rishabh Agarwal, Pablo Samuel Castro

Sample-Efficient Reinforcement Learning by Breaking the Replay Ratio Barrier
Pierluca D'Oro, Max Schwarzer, Evgenii Nikishin, Pierre-Luc Bacon, Marc G Bellemare, Aaron Courville

The Primacy Bias in Deep Reinforcement Learning
Evgenii Nikishin, Max Schwarzer, Pierluca D'Oro, Pierre-Luc Bacon, Aaron Courville

Additional References
  • Rainbow: Combining Improvements in Deep Reinforcement Learning, Hessel et al 2017
  • When to use parametric models in reinforcement learning? van Hasselt et al 2019
  • Data-Efficient Reinforcement Learning with Self-Predictive Representations, Schwarzer et al 2020
  • Pretraining Representations for Data-Efficient Reinforcement Learning, Schwarzer et al 2021

TalkRL: The Reinforcement Learning Podcast

Julian Togelius is an Associate Professor of Computer Science and Engineering at NYU, and cofounder and research director at modl.ai.
Featured References
Choose Your Weapon: Survival Strategies for Depressed AI Academics. Julian Togelius, Georgios N. Yannakakis
Learning Controllable 3D Level Generators. Zehua Jiang, Sam Earle, Michael Cerny Green, Julian Togelius
PCGRL: Procedural Content Generation via Reinforcement Learning. Ahmed Khalifa, Philip Bontrager, Sam Earle, Julian Togelius
Illuminating Generalization in Deep Reinforcement Learning through Procedural Level Generation. Niels Justesen, Ruben Rodriguez Torrado, Philip Bontrager, Ahmed Khalifa, Julian Togelius, Sebastian Risi

Game over?
Mess-AI: Masterful Goal-Scoring in the World of Robot Football

Jul 7, 2023 27:36


In this episode of Game Over we dive deep into the world of robot football. We take a closer look at the paper "Learning Agile Soccer Skills for a Bipedal Robot with Deep Reinforcement Learning" and explore how simulation, deep reinforcement learning, and zero-shot transfer can teach robots advanced football skills. From virtual training to the real world, we explore the successes, challenges, and possible applications of these AI-driven robots. Paper and blog: https://sites.google.com/view/op3-soccer

Machine Learning Street Talk
#114 - Secrets of Deep Reinforcement Learning (Minqi Jiang)

Apr 16, 2023 167:15


Patreon: https://www.patreon.com/mlst Discord: https://discord.gg/ESrGqhf5CB Twitter: https://twitter.com/MLStreetTalk In this exclusive interview, Dr. Tim Scarfe sits down with Minqi Jiang, a leading PhD student at University College London and Meta AI, as they delve into the fascinating world of deep reinforcement learning (RL) and its impact on technology, startups, and research. Discover how Minqi made the crucial decision to pursue a PhD in this exciting field, and learn from his valuable startup experiences and lessons. Minqi shares his insights into balancing serendipity and planning in life and research, and explains the role of objectives and Goodhart's Law in decision-making. Get ready to explore the depths of robustness in RL, two-player zero-sum games, and the differences between RL and supervised learning. As they discuss the role of environment in intelligence, emergence, and abstraction, prepare to be blown away by the possibilities of open-endedness and the intelligence explosion. Learn how language models generate their own training data, the limitations of RL, and the future of software 2.0 with interpretability concerns. From robotics and open-ended learning applications to learning potential metrics and MDPs, this interview is a goldmine of information for anyone interested in AI, RL, and the cutting edge of technology. Don't miss out on this incredible opportunity to learn from a rising star in the AI world! 
TOC
Tech & Startup Background [00:00:00]
Pursuing PhD in Deep RL [00:03:59]
Startup Lessons [00:11:33]
Serendipity vs Planning [00:12:30]
Objectives & Decision Making [00:19:19]
Minimax Regret & Uncertainty [00:22:57]
Robustness in RL & Zero-Sum Games [00:26:14]
RL vs Supervised Learning [00:34:04]
Exploration & Intelligence [00:41:27]
Environment, Emergence, Abstraction [00:46:31]
Open-endedness & Intelligence Explosion [00:54:28]
Language Models & Training Data [01:04:59]
RLHF & Language Models [01:16:37]
Creativity in Language Models [01:27:25]
Limitations of RL [01:40:58]
Software 2.0 & Interpretability [01:45:11]
Language Models & Code Reliability [01:48:23]
Robust Prioritized Level Replay [01:51:42]
Open-ended Learning [01:55:57]
Auto-curriculum & Deep RL [02:08:48]
Robotics & Open-ended Learning [02:31:05]
Learning Potential & MDPs [02:36:20]
Universal Function Space [02:42:02]
Goal-Directed Learning & Auto-Curricula [02:42:48]
Advice & Closing Thoughts [02:44:47]
References:
- Why Greatness Cannot Be Planned: The Myth of the Objective by Kenneth O. Stanley and Joel Lehman https://www.springer.com/gp/book/9783319155234
- Rethinking Exploration: General Intelligence Requires Rethinking Exploration https://arxiv.org/abs/2106.06860
- The Case for Strong Emergence (Sabine Hossenfelder) https://arxiv.org/abs/2102.07740
- The Game of Life (Conway) https://www.conwaylife.com/
- Toolformer: Language Models Can Teach Themselves to Use Tools (Meta AI) https://arxiv.org/abs/2302.04761
- POET: Paired Open-Ended Trailblazer (Uber AI Labs) https://arxiv.org/abs/1901.01753
- Schmidhuber's Artificial Curiosity https://people.idsia.ch/~juergen/interest.html
- Gödel Machines https://people.idsia.ch/~juergen/goedelmachine.html
- PowerPlay https://arxiv.org/abs/1112.5309
- Robust Prioritized Level Replay: https://openreview.net/forum?id=NfZ6g2OmXEk
- Unsupervised Environment Design: https://arxiv.org/abs/2012.02096
- Excel: Evolving Curriculum Learning for Deep Reinforcement Learning https://arxiv.org/abs/1901.05431
- Go-Explore: A New Approach for Hard-Exploration Problems https://arxiv.org/abs/1901.10995
- Learning with AMIGo: Adversarially Motivated Intrinsic Goals https://www.researchgate.net/publication/342377312_Learning_with_AMIGo_Adversarially_Motivated_Intrinsic_Goals
- PRML https://www.microsoft.com/en-us/research/uploads/prod/2006/01/Bishop-Pattern-Recognition-and-Machine-Learning-2006.pdf
- Sutton and Barto https://web.stanford.edu/class/psych209/Readings/SuttonBartoIPRLBook2ndEd.pdf

Many Minds
What does ChatGPT really know?

Jan 25, 2023 55:10


By now you've probably heard about the new chatbot called ChatGPT. There's no question it's something of a marvel. It distills complex information into clear prose; it offers instructions and suggestions; it reasons its way through problems. With the right prompting, it can even mimic famous writers. And it does all this with an air of cool competence, of intelligence. But, if you're like me, you've probably also been wondering: What's really going on here? What are ChatGPT (and other large language models like it) actually doing? How much of their apparent competence is just smoke and mirrors? In what sense, if any, do they have human-like capacities? My guest today is Dr. Murray Shanahan. Murray is Professor of Cognitive Robotics at Imperial College London and Senior Research Scientist at DeepMind. He's the author of numerous articles and several books at the lively intersections of artificial intelligence, neuroscience, and philosophy. Very recently, Murray put out a paper titled 'Talking about Large Language Models', and it's the focus of our conversation today. In the paper, Murray argues that, tempting as it may be, it's not appropriate to talk about large language models in anthropomorphic terms. Not yet, anyway. Here, we chat about the rapid rise of large language models and the basics of how they work. We discuss how a model that, at its base, simply does "next-word prediction" can be engineered into a savvy chatbot like ChatGPT. We talk about why ChatGPT lacks genuine "knowledge" and "understanding", at least as we currently use those terms. And we discuss what it might take for these models to eventually possess richer, more human-like capacities. Along the way, we touch on: emergence, prompt engineering, embodiment and grounding, image generation models, Wittgenstein, the intentional stance, soft robots, and "exotic mind-like entities."
Before we get to it, just a friendly reminder: applications are now open for the Diverse Intelligences Summer Institute (or DISI). DISI will be held this June/July in St Andrews, Scotland. The program consists of three weeks of intense interdisciplinary engagement with exactly the kinds of ideas and questions we like to wrestle with here on this show. If you're intrigued (and I hope you are!), check out disi.org for more info. Alright friends, on to my decidedly human chat, with Dr. Murray Shanahan. Enjoy!
The paper we discuss is here. A transcript of this episode will be available soon.
Notes and links
6:30 – The 2017 "breakthrough" article by Vaswani and colleagues.
8:00 – A popular article about GPT-3.
10:00 – A popular article about some of the impressive (and not so impressive) behaviors of ChatGPT. For more discussion of ChatGPT and other large language models, see another interview with Dr. Shanahan, as well as interviews with Emily Bender and Margaret Mitchell, with Gary Marcus, and with Sam Altman (CEO of OpenAI, which created ChatGPT).
14:00 – A widely discussed paper by Emily Bender and colleagues on the "dangers of stochastic parrots."
19:00 – A blog post about "prompt engineering". Another blog post about the concept of Reinforcement Learning from Human Feedback, in the context of ChatGPT.
30:00 – One of Dr. Shanahan's books is titled Embodiment and the Inner Life.
39:00 – An example of a robotic agent, SayCan, which is connected to a language model.
40:30 – On the notion of embodiment in the cognitive sciences, see the classic book by Francisco Varela and colleagues, The Embodied Mind.
44:00 – For a detailed primer on the philosophy of Ludwig Wittgenstein, see here.
45:00 – See Dr. Shanahan's general audience essay on "conscious exotica" and the space of possible minds.
49:00 – See Dennett's book, The Intentional Stance.
Dr. Shanahan recommends:
Artificial Intelligence: A Guide for Thinking Humans, by Melanie Mitchell (see also our earlier episode with Dr. Mitchell)
'Abstraction for Deep Reinforcement Learning', by M. Shanahan and M. Mitchell
You can read more about Murray's work on his website and follow him on Twitter.
Many Minds is a project of the Diverse Intelligences Summer Institute (DISI) (https://disi.org), which is made possible by a generous grant from the Templeton World Charity Foundation to UCLA. It is hosted and produced by Kensy Cooperrider, with help from Assistant Producer Urte Laukaityte and with creative support from DISI Directors Erica Cartmill and Jacob Foster. Our artwork is by Ben Oldroyd (https://www.mayhilldesigns.co.uk/). Our transcripts are created by Sarah Dopierala (https://sarahdopierala.wordpress.com/). You can subscribe to Many Minds on Apple, Stitcher, Spotify, Pocket Casts, Google Play, or wherever you like to listen to podcasts. You can now subscribe to the Many Minds newsletter here! We welcome your comments, questions, and suggestions. Feel free to email us at: manymindspodcast@gmail.com. For updates about the show, visit our website (https://disi.org/manyminds/), or follow us on Twitter: @ManyMindsPod.

ITmedia NEWS
What happens when a "bipedal human" intuitively controls a "quadrupedal robot" with whole-body motion

Dec 4, 2022 0:35


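What happens when a "bipedal human" intuitively controls a "quadrupedal robot" with whole-body motion. The paper "Human Motion Control of Quadrupedal Robots using Deep Reinforcement Learning", published by researchers at Seoul National University in South Korea and the Georgia Institute of Technology in the US, proposes a method that lets a person control a quadrupedal robot intuitively. Using a motion-capture system, the user steers the quadrupedal robot with whole-body movements such as raising the front legs or standing on the hind legs.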

Infinite Loops
Julia Bonafede — A Deep Dive into Deep Reinforcement Learning (EP.135)

Dec 1, 2022 65:15


Julia is the co-founder of Rosetta Analytics Inc, “an alternative asset manager that is pioneering the use of advanced artificial intelligence to build and actively manage liquid investment strategies.” Prior to co-founding Rosetta, Julia served as President of Wilshire Consulting and was a member of Wilshire's Board of Directors and Consulting Investment Committee. Julia joins the show to take a deep dive into deep reinforcement learning and Rosetta's pioneering work using AI as the basis of its investment strategies.
Important Links:
Rosetta Analytics
Julia's LinkedIn
Show Notes:
Julia's journey from Wilshire to Rosetta
Defining deep reinforcement learning
AI and non-linear thinking
Using adaptive models
Overcoming the human need for ‘why'
Pitching deep reinforcement learning models to new investors
Telling positive stories about AI; improving our discourse
“Wake up and look for the joy”; “overcoming fear is the biggest barrier to success”
Books Mentioned:
What Works on Wall Street: A Guide to the Best-Performing Investment Strategies of All Time; by Jim O'Shaughnessy
The Beginning of Infinity: Explanations That Transform the World; by David Deutsch
The Fourth Turning: An American Prophecy - What the Cycles of History Tell Us About America's Next Rendezvous with Destiny; by William Strauss and Neil Howe

Yannic Kilcher Videos (Audio Only)
This is a game changer! (AlphaTensor by DeepMind explained)

Oct 23, 2022 55:06


#alphatensor #deepmind #ai Matrix multiplication is the most used mathematical operation in all of science and engineering. Speeding this up has massive consequences. Thus, over the years, this operation has become more and more optimized. A fascinating discovery was made when it was shown that one actually needs fewer than N^3 multiplication operations to multiply two NxN matrices. DeepMind goes a step further and creates AlphaTensor, a Deep Reinforcement Learning algorithm that plays a single-player game, TensorGame, in order to find even more optimized algorithms for matrix multiplication. And it turns out, there exists a plethora of undiscovered matrix multiplication algorithms, which not only will make everything from computers to smart toasters faster, but also bring new insights into fundamental math and complexity theory.
Sponsor: Assembly AI Link: https://www.assemblyai.com/?utm_source=youtube&utm_medium=social&utm_campaign=yannic_sentiment
OUTLINE:
0:00 - Intro
1:50 - Sponsor: Assembly AI (link in description)
3:25 - What even is Matrix Multiplication?
6:10 - A very astounding fact
8:45 - Trading multiplications for additions
12:35 - Matrix Multiplication as a Tensor
17:30 - Tensor Decompositions
20:30 - A formal way of finding multiplication algorithms
31:00 - How to formulate this as a game?
39:30 - A brief primer on AlphaZero / MCTS
45:40 - The Results
48:15 - Optimizing for different hardware
52:40 - Expanding fundamental math
53:45 - Summary & Final Comments
Paper: https://www.nature.com/articles/s41586-022-05172-4
Title: Discovering faster matrix multiplication algorithms with reinforcement learning
Abstract: Improving the efficiency of algorithms for fundamental computations can have a widespread impact, as it can affect the overall speed of a large amount of computations. Matrix multiplication is one such primitive task, occurring in many systems, from neural networks to scientific computing routines. The automatic discovery of algorithms using machine learning offers the prospect of reaching beyond human intuition and outperforming the current best human-designed algorithms. However, automating the algorithm discovery procedure is intricate, as the space of possible algorithms is enormous. Here we report a deep reinforcement learning approach based on AlphaZero for discovering efficient and provably correct algorithms for the multiplication of arbitrary matrices. Our agent, AlphaTensor, is trained to play a single-player game where the objective is finding tensor decompositions within a finite factor space. AlphaTensor discovered algorithms that outperform the state-of-the-art complexity for many matrix sizes. Particularly relevant is the case of 4 × 4 matrices in a finite field, where AlphaTensor's algorithm improves on Strassen's two-level algorithm for the first time, to our knowledge, since its discovery 50 years ago. We further showcase the flexibility of AlphaTensor through different use-cases: algorithms with state-of-the-art complexity for structured matrix multiplication and improved practical efficiency by optimizing matrix multiplication for runtime on specific hardware. Our results highlight AlphaTensor's ability to accelerate the process of algorithmic discovery on a range of problems, and to optimize for different criteria.
Authors: Alhussein Fawzi, Matej Balog, Aja Huang, Thomas Hubert, Bernardino Romera-Paredes, Mohammadamin Barekatain, Alexander Novikov, Francisco J. R. Ruiz, Julian Schrittwieser, Grzegorz Swirszcz, David Silver, Demis Hassabis & Pushmeet Kohli
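The "fewer than N^3 multiplications" fact mentioned in this episode can be checked by hand in the 2x2 case. The following is a minimal sketch of Strassen's classic seven-multiplication scheme, not one of the algorithms AlphaTensor discovered; the function names `strassen_2x2` and `naive_2x2` are illustrative choices, and matrices are assumed to be plain nested lists.

```python
def strassen_2x2(a, b):
    """Multiply two 2x2 matrices using 7 scalar multiplications instead of 8."""
    (a11, a12), (a21, a22) = a
    (b11, b12), (b21, b22) = b
    # The seven products; each one is a single scalar multiplication.
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)
    # The result entries are recovered with additions and subtractions only.
    return [
        [m1 + m4 - m5 + m7, m3 + m5],
        [m2 + m4, m1 - m2 + m3 + m6],
    ]


def naive_2x2(a, b):
    """Reference implementation: the schoolbook method with 8 multiplications."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
        for i in range(2)
    ]


if __name__ == "__main__":
    x = [[1, 2], [3, 4]]
    y = [[5, 6], [7, 8]]
    print(strassen_2x2(x, y))
    print(naive_2x2(x, y))
```

Applied recursively to block matrices, saving one multiplication out of eight at every level is what pushes the exponent below 3; the extra additions are the price paid for it.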

PaperPlayer biorxiv neuroscience
Integrating artificial and biological neural networks to improve animal task performance using deep reinforcement learning

Sep 19, 2022


Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2022.09.19.508590v1?rss=1 Authors: Li, C., Kreiman, G., Ramanathan, S. Abstract: Artificial neural networks have performed remarkable feats in a wide variety of domains. However, artificial intelligence algorithms lack the flexibility, robustness, and generalization power of biological neural networks. Given the different capabilities of artificial and biological neural networks, it would be advantageous to build systems where the two types of networks are directly connected and can synergistically interact. As proof of principle, here we show how to create such a hybrid system and how it can be harnessed to improve animal performance on biologically relevant tasks. Using optogenetics, we interfaced the nervous system of the nematode Caenorhabditis elegans with a deep reinforcement learning agent, enabling the animal to navigate to targets and enhancing its natural ability to search for food. Agents adapted to strikingly different sites of neural integration and learned site-specific activation patterns to improve performance on a target-finding task. The combined animal and agent displayed cooperative computation between artificial and biological neural networks by generalizing target-finding to novel environments. This work constitutes an initial demonstration of how to robustly improve task performance in animals using artificial intelligence interfaced with a living nervous system. Copyright belongs to the original authors. Visit the link for more info. Podcast created by PaperPlayer.

London Futurists
AI overview: 2. The Big Bang and the years that followed

Sep 7, 2022 31:50


In this episode, co-hosts Calum Chace and David Wood continue their review of progress in AI, taking up the story at the 2012 "Big Bang".
00.05: Introduction: exponential impact, big bangs, jolts, and jerks
00.45: What enabled the Big Bang
01.25: Moore's Law
02.05: Moore's Law has always evolved since its inception in 1965
03.08: Intel's tick tock becomes tic tac toe
03.49: GPUs - Graphic Processing Units
04.29: TPUs - Tensor Processing Units
04.46: Moore's Law is not dead or dying
05.10: 3D chips
05.32: Memristors
05.54: Neuromorphic chips
06.48: Quantum computing
08.18: The astonishing effect of exponential growth
09.08: We have seen this effect in computing already. The cost of an iPhone in the 1950s.
09.42: Exponential growth can't continue forever, but Moore's Law hasn't reached any theoretical limits
10.33: Reasons why Moore's Law might end: too small, too expensive, not worthwhile
11.20: Counter-arguments
12.01: "Plenty more room at the bottom"
12.56: Software and algorithms can help keep Moore's Law going
14.15: Using AI to improve chip design
14.40: Data is critical
15.00: ImageNet, Fei-Fei Li, Amazon Turk
16.10: AIs labelling data
16.35: The Big Bang
17.00: Jürgen Schmidhuber challenges the narrative
17.41: The Big Bang enabled AI to make money
18.24: 2015 and the Great Robot Freak-Out
18.43: Progress in many domains, especially natural language processing
19.44: Machine Learning and Deep Learning
20.25: Boiling the ocean vs the scientific method's hypothesis-driven approach
21.15: Deep Learning: levels
21.57: How Deep Learning systems recognise faces
22.48: Supervised, Unsupervised, and Reinforcement Learning
24.00: Variants, including Deep Reinforcement Learning and Self-Supervised Learning
24.30: Yann LeCun's camera metaphor for Deep Learning
26.05: Lack of transparency is a concern
27.45: Explainable AI. Is it achievable?
29.00: Other AI problems
29.17: Has another Big Bang taken place? Large Language Models like GPT-3
30.08: Few-shot learning and transfer learning
30.40: Escaping Uncanny Valley
31.50: Gato and partially general AI
Music: Spike Protein, by Koi Discovery, available under CC0 1.0 Public Domain Declaration
For more about the podcast hosts, see https://calumchace.com/ and https://dw2blog.com/

AXRP - the AI X-risk Research Podcast
18 - Concept Extrapolation with Stuart Armstrong

Sep 3, 2022 106:19


Concept extrapolation is the idea of taking concepts an AI has about the world - say, "mass" or "does this picture contain a hot dog" - and extending them sensibly to situations where things are different - like learning that the world works via special relativity, or seeing a picture of a novel sausage-bread combination. For a while, Stuart Armstrong has been thinking about concept extrapolation and how it relates to AI alignment. In this episode, we discuss where his thoughts are at on this topic, what the relationship to AI alignment is, and what the open questions are.
Topics we discuss, and timestamps:
00:00:44 - What is concept extrapolation
00:15:25 - When is concept extrapolation possible
00:30:44 - A toy formalism
00:37:25 - Uniqueness of extrapolations
00:48:34 - Unity of concept extrapolation methods
00:53:25 - Concept extrapolation and corrigibility
00:59:51 - Is concept extrapolation possible?
01:37:05 - Misunderstandings of Stuart's approach
01:44:13 - Following Stuart's work
The transcript
Stuart's startup, Aligned AI
Research we discuss:
The Concept Extrapolation sequence
The HappyFaces benchmark
Goal Misgeneralization in Deep Reinforcement Learning

Argmax
10: Outracing champion Gran Turismo drivers with deep reinforcement learning

Aug 23, 2022 54:50


We discuss Sony AI's accomplishment of creating a novel AI agent that can beat professional racers in Gran Turismo. Some topics include:
- The crafting of rewards to make the agent behave nicely
- What is QR-SAC?
- How to deal with "rare" experiences in the replay buffer
Link to paper: https://www.nature.com/articles/s41586-021-04357-7

The Embodied AI Podcast
#6 Alex Lascarides: Linguistics from Frege to Settlers of Catan

Jul 14, 2022 96:56


Alex is a professor and the director of the Institute for Language, Cognition and Computation at Edinburgh. She is interested in discourse coherence, gestures, complex games and interactive task learning. After we find out about Alex's background and geek out over Ludwig Wittgenstein, she tells us about Dynamic Semantics and Segmented Discourse Representation Theory (SDRT). SDRT considers discourse as actions that change the state space of the world and requires agents to infer coherence in the discourse. Then, I initiate a discussion between Felix Hill and Alex by asking her about her opinion on compositionality and playing a clip where Felix gives his "spicy take" on theoretical linguistics. Next, we talk about gestures and how they could be analysed using logic or a deep learning classifier. Then, we talk about non-linguistic events and the conceptualization problem. Later, we discuss Alex's work on Settlers of Catan, and how this links to deep reinforcement learning, Monte Carlo tree search, and neurosymbolic AI. Next, we briefly bring up game theory and then talk about interactive task learning, which is about agents learning and adapting in unknown domains. Finally, there are some career questions on whether to do a PhD and what makes a good supervisee & supervisor.
Timestamps:
(00:00) - Intro
(02:00) - Alex's background & Wittgenstein geekiness
(05:15) - Discourse Coherence & Segmented Discourse Representation Theory (SDRT)
(12:56) - Compositionality, Responding to Felix Hill's "spicy take"
(23:50) - Analysing gestures with logic and deep learning
(38:54) - Pointing and evolution
(42:28) - Non-linguistic events in Settlers of Catan, conceptualization problem
(54:15) - 3D simulations and supermarket stocktaking
(59:19) - Settlers of Catan, Monte Carlo tree search, neurosymbolic AI
(01:11:08) - Persuasion & Game Theory
(01:17:23) - Interactive Task Learning, symbol grounding, unknown domain
(01:25:28) - Career advice
Alex Webpage (All articles are open access) https://homepages.inf.ed.ac.uk/alex/index.html
Talk on Discourse Coherence and Segmented Discourse Representation Theory https://www.youtube.com/watch?v=3HfKq9E3syM
A Formal Semantic Analysis of Gesture paper with Matthew Stone https://homepages.inf.ed.ac.uk/alex/papers/gesture_jos.pdf
A formal semantics for situated conversation paper with Julie Hunter & Nicholas Asher https://semprag.org/index.php/sp/article/view/sp.11.10
Game strategies for The Settlers of Catan paper with Markus Guhe https://homepages.inf.ed.ac.uk/alex/papers/cig2014_gs.pdf
Evaluating Persuasion Strategies and Deep Reinforcement Learning methods for Negotiation Dialogue agents paper with Simon Keizer, Markus Guhe, & Oliver Lemon https://homepages.inf.ed.ac.uk/alex/papers/eacl_2017.pdf
Learning Language Games through Interaction paper with Sida Wang, Percy Liang, Christopher Manning https://arxiv.org/abs/1606.02447
Interactive Task Learning Paper with Mattias Appelgren https://homepages.inf.ed.ac.uk/alex/papers/aamas_grounding.pdf
My Twitter https://twitter.com/Embodied_AI

Argmax
6: Deep Reinforcement Learning at the Edge of the Statistical Precipice

Jun 6, 2022 61:08


We discuss a paper that won a NeurIPS Outstanding Paper Award, covering important topics around metrics and reproducibility.

QuantSpeak
Deep Reinforcement Learning for Asset Allocation in US Equities

May 19, 2022 26:17


QuantSpeak host, Dan Tudball, is joined by Sonam Srivastava, Founder of Wright Research, to discuss the application of reinforcement learning within asset allocation, the results of her recent research, and her career journey as a quant.

The Theory of Anything
Episode 43: Deep Reinforcement Learning

Apr 18, 2022 81:40


In this video upload available on Spotify (we'll try this once and see how it's received), we revisit Reinforcement Learning (from way back in episode 28) and this time discuss how to turn it into Deep Reinforcement Learning by swapping out the Q-Table and putting a neural network in its place. The end result is a sort of 'bootstrapping intelligence' where you let the neural net train itself. We also discuss: How does this relate, if at all, to animal intelligence? Is RL a general-purpose learner? Is it a path to AGI? Links: Github Code Base, Presentation Slide Pack, Youtube version --- Support this podcast: https://anchor.fm/four-strands/support
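The idea of swapping the Q-table for a learned function can be sketched in a few lines. The example below is an illustration under stated assumptions, not the episode's linked code base: `q_table_update` is ordinary tabular Q-learning, and `linear_q_update` applies the same bootstrapped target to a linear model that stands in for the neural network to keep the sketch dependency-free. The function names, the learning rate `alpha`, and the discount `gamma` are hypothetical choices.

```python
def q_table_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step: nudge Q(s, a) toward the bootstrapped target."""
    target = reward + gamma * max(q[next_state])
    q[state][action] += alpha * (target - q[state][action])
    return q


def linear_q_update(weights, features, action, reward, next_features,
                    alpha=0.1, gamma=0.99):
    """Same update with a function approximator: Q(s, a) = weights[a] . features(s).

    A deep RL agent would use a neural network here; a linear model is
    swapped in so the sketch runs with the standard library only.
    """
    def q_values(feats):
        return [sum(w * f for w, f in zip(row, feats)) for row in weights]

    # The target is built from the model's own estimate of the next state,
    # which is the 'bootstrapping' referred to in the episode description.
    target = reward + gamma * max(q_values(next_features))
    td_error = target - q_values(features)[action]
    # Semi-gradient step: only the weights of the taken action change.
    weights[action] = [w + alpha * td_error * f
                       for w, f in zip(weights[action], features)]
    return weights


if __name__ == "__main__":
    table = [[0.0, 0.0], [0.0, 1.0]]           # indexed as table[state][action]
    q_table_update(table, state=0, action=1, reward=1.0, next_state=1)
    print(table[0][1])

    w = [[0.0, 0.0], [0.0, 1.0]]               # indexed as w[action][feature]
    linear_q_update(w, [1.0, 0.0], 1, 1.0, [0.0, 1.0])
    print(w[1][0])
```

With one-hot state features the linear version reduces to the table, which is why both calls above change the same entry by the same amount; the approximator only starts to matter when states share features and the table would be too large to store.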

Papers Read on AI
FinRL: A Deep Reinforcement Learning Library for Automated Stock Trading in Quantitative Finance

Apr 3, 2022 24:14


As deep reinforcement learning (DRL) has been recognized as an effective approach in quantitative finance, getting hands-on experiences is attractive to beginners. However, to train a practical DRL trading agent that decides where to trade, at what price, and what quantity involves error-prone and arduous development and debugging. In this paper, we introduce a DRL library FinRL that facilitates beginners to expose themselves to quantitative finance and to develop their own stock trading strategies. Along with easily-reproducible tutorials, FinRL library allows users to streamline their own developments and to compare with existing schemes easily. Within FinRL, virtual environments are configured with stock market datasets, trading agents are trained with neural networks, and extensive backtesting is analyzed via trading performance. Moreover, it incorporates important trading constraints such as transaction cost, market liquidity and the investor's degree of risk-aversion. FinRL is featured with completeness, hands-on tutorial and reproducibility that favors beginners: (i) at multiple levels of time granularity, FinRL simulates trading environments across various stock markets, including NASDAQ-100, DJIA, SP (ii) organized in a layered architecture with modular structure, FinRL provides fine-tuned state-of-the-art DRL algorithms (DQN, DDPG, PPO, SAC, A2C, TD3, etc.), commonly-used reward functions and standard evaluation baselines to alleviate the debugging workloads and promote the reproducibility, and (iii) being highly extendable, FinRL reserves a complete set of user-import interfaces. Furthermore, we incorporated three application demonstrations, namely single stock trading, multiple stock trading, and portfolio allocation. The FinRL library will be available on GitHub. 2020: Xiao-Yang Liu, Hongyang Yang, Qian Chen, Runjia Zhang, Liuqing Yang, Bowen Xiao, Chris Wang https://arxiv.org/pdf/2011.09607v2.pdf

DEEP MINDS - KI-Podcast
Can AI Learn to Learn? With Robert Lange | DEEP MINDS #6

Mar 17, 2022 103:23


Artificial intelligence can be trained to perform tasks. But can it also be trained to learn how to perform tasks? For AI research, that could bring major progress and even a paradigm shift. AI researcher Robert Lange tells us where things stand with meta-learning for AI.
:// About DEEP MINDS - KI-Podcast
DEEP MINDS is a video podcast with people who work on artificial intelligence and science. Max and Matthias ask them easy and hard questions about technology, research, development, and our future. More: https://mixed.de/deep-minds/
00:00:00 Hello everyone!
00:01:52 This is Robert Lange
00:05:49 Sponsors: thanks to BWI and Borlabs!
00:06:37 How does an economist become an AI researcher?
00:10:23 Artificial intelligence, or rather machine learning?
00:13:18 Is intelligence statistics, and what role do our genes play?
00:19:46 Do humans learn differently from machines?
00:22:22 Nature vs. nurture. Where is the boundary?
00:26:26 "Nature" and artificial intelligence
00:29:55 Can meta-learning improve deep learning?
00:32:35 Is meta-learning research making progress?
00:35:20 The difference between learning, meta-learning, and transfer learning
00:38:04 Who sets the priorities in meta-learning?
00:40:48 How does the meta-learning algorithm decide what it learns?
00:41:28 Differences between meta-learning and other machine learning methods
00:44:12 What role does lifetime play in meta-learning?
00:51:20 Max asks around and steers the conversation back to AI
00:53:30 NetHack and meta-learning = epic win?
00:57:00 Does meta-learning need introspection?
01:02:00 Complexity and the problem of scaling
01:05:15 The connection between self-supervised learning and meta-learning
01:12:00 Introspection, part 2
01:16:43 Theory of Mind
01:26:00 Max doesn't know what he wants (as always)
01:33:40 Does technology improve human life, or does it divide humanity?
01:36:56 What does real AI require?
01:40:05 Goodbye
:// About Robert Lange
Robert Lange is pursuing his doctorate at Technische Universität Berlin and is a member of the Sprekeler lab. There he studies the mechanisms underlying intelligent collective systems and does research on deep reinforcement learning with a focus on meta-learning.
:// The DEEP MINDS AI podcast is available here:
Spotify: https://open.spotify.com/show/6rmXt98jRHNziyG1ev3sAT
Apple: https://podcasts.apple.com/us/podcast/deep-minds/id1598920439
Soundcloud: https://soundcloud.com/deep-minds-podcast
Amazon Music: https://music.amazon.de/podcasts/ca667db4-4dfb-4cc0-b1b5-9f3292fff112/deep-minds
Google Podcasts: https://bit.ly/3q7CQda
:// Our sponsors
Borlabs Cookie: WordPress plugin made in Hamburg. Buy Borlabs Cookie now at https://borlabs.io/mixed with the discount code MIXED and get five percent off.
BWI is the IT systems house of the Bundeswehr. As its reliable partner, BWI supports and drives the digitalization of the armed forces with innovations and its IT expertise. AI as a future technology also plays an important role here, for example in generating situation pictures or in server management.
News from BWI's work: https://www.bwi.de/news-blog/blog
AI at BWI, from the software life cycle to server anomalies: https://www.bwi.de/news-blog/blog/artikel/vom-software-lebenszyklus-bis-zur-server-anomalie-ki-und-ihr-praktischer-nutzen-fuer-die-bwi
How AI can protect Germany from attacks: https://www.bwi.de/news-blog/blog/artikel/hybride-bedrohungen-wie-kuenstliche-intelligenz-deutschland-vor-angriffen-schuetzen-kann
BWI is looking for dedicated IT professionals: https://www.bwi.de/karriere
----------
SUPPORT MIXED.de
Subscribe: https://mixed.de/mixed-plus

SuperDataScience
SDS 551: Deep Reinforcement Learning — with Wah Loon Keng

SuperDataScience

Play Episode Listen Later Feb 22, 2022 81:04


In this episode, gifted author and software engineer Wah Loon Keng joins the podcast to dive deep into reinforcement learning. From its history to its limitations, modern industrial applications, and future developments, there's no better expert to learn from if you want to know more about this complex topic. In this episode you will learn: • What is reinforcement learning? [4:50] • Deep reinforcement learning vs reinforcement learning [13:17] • A timeline of reinforcement learning breakthroughs [16:17] • The limitations of deep RL today [39:53] • Deep RL applications [53:10] • Keng's open-source SLM-Lab framework [57:51] • Keng's responsibilities as an AI engineer [1:02:17] • What is the future of RL? [1:08:05] Additional materials: www.superdatascience.com/551

The Nonlinear Library
EA - We're Aligned AI, we're aiming to align AI by Stuart Armstrong

The Nonlinear Library

Play Episode Listen Later Feb 21, 2022 6:46


Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: We're Aligned AI, we're aiming to align AI, published by Stuart Armstrong on February 21, 2022 on The Effective Altruism Forum. Aligned AI is an Oxford-based startup focused on applied alignment research. Our goal is to implement scalable solutions to the alignment problem, and distribute these solutions to actors developing powerful transformative artificial intelligence (related Alignment Forum post here). We are led by Stuart Armstrong and Rebecca Gorman, and advised by Dylan Hadfield-Menell, Adam Gleave, Justin Shovelain, Charles Pattison, and Anders Sandberg. Our Premises We think AI poses an existential risk to humanity, and that reducing the chance of this risk is one of the most impactful things we can do with our lives. Here we focus not on the premises behind that claim, but rather on why we're particularly excited about Aligned AI's approach to reducing AI existential risk. We believe AI Safety research is bottlenecked by a core problem: how to extrapolate values from one context to another. We believe solving value extrapolation is necessary and almost sufficient for alignment. Value extrapolation research is neglected, both in the mainstream AI community and the AI safety community. Note that there is a lot of overlap between value extrapolation and many fields of research (e.g. out-of-distribution detection, robustness, transfer learning, multi-objective reinforcement learning, active reward learning, reward modelling...) which provide useful research resources. However, we've found that we've had to generate most of the key concepts ourselves. We believe value extrapolation research is tractable (and we've had success generating the key concepts). We believe distributing (not just creating) alignment solutions is critical for aligning powerful AIs.
How we'll do this Solving value extrapolation will involve solving multiple subproblems. Therefore the research groups will iterate through sub-projects, like the ones presented here. The aim is to generate sub-projects that are close to current published research in machine learning, but whose solutions are designed to generalise. Our groups will take these projects, implement them in code, and build solutions for the relevant problem. At that point, we will either extend the project to investigate it in more depth, or write up the results and move on - passing the results to the library development team as needed. Research methodology At a high level, our research is structured around a linear pipeline, starting from theory and becoming progressively more applied. Each stage of the pipeline has tight feedback loops, and also informs the other stages of the pipeline (e.g. theory leads to experiments leads to revised theory). The following sections describe how such a process might go. Sub-project generation Once a sub-component is deemed sufficiently promising, we will want to test it in code. To do so, we will generate "sub-project" ideas designed to be simple to implement but scalable to larger environments and models. Minimum viable (sub-)project We will start a sub-project with an "MVP", implementing the simplest project that captures the core of our approach. Test sub-projects in higher dimensional settings After implementing a successful MVP, we will iteratively experiment in increasingly high dimensional settings (think Deep Reinforcement Learning from Human Feedback to Learning to Summarize from Human Feedback). Red-teaming We will employ a "red-teaming" methodology similar to that of the Alignment Research Center, considering worst case scenarios and how a given approach handles them.
What we plan to produce Software library If we believe we can commercialize a successful sub-project responsibly (without differentially enhancing AI capabilities), it will be incorporated into our product and mar...

This Week in Machine Learning & Artificial Intelligence (AI) Podcast
Trends in Deep Reinforcement Learning with Kamyar Azizzadenesheli - #560

This Week in Machine Learning & Artificial Intelligence (AI) Podcast

Play Episode Listen Later Feb 21, 2022 77:57


Today we're joined by Kamyar Azizzadenesheli, an assistant professor at Purdue University, to close out our AI Rewind 2021 series! In this conversation, we focused on all things deep reinforcement learning, starting with a general overview of the direction of the field, and though it might seem to be slowing, that's just a product of the light being shined constantly on the CV and NLP spaces. We dig into themes like the convergence of RL methodology with both robotics and control theory, as well as a few trends that Kamyar sees over the horizon, such as self-supervised learning approaches in RL. We also talk through Kamyar's predictions for RL in 2022 and beyond. This was a fun conversation, and I encourage you to look through all the great resources that Kamyar shared on the show notes page at twimlai.com/go/560!

This Week in Machine Learning & Artificial Intelligence (AI) Podcast
Deep Reinforcement Learning at the Edge of the Statistical Precipice with Rishabh Agarwal - #559

This Week in Machine Learning & Artificial Intelligence (AI) Podcast

Play Episode Listen Later Feb 14, 2022 51:51


Today we're joined by Rishabh Agarwal, a research scientist at Google Brain in Montreal. In our conversation with Rishabh, we discuss his recent paper Deep Reinforcement Learning at the Edge of the Statistical Precipice, which won an outstanding paper award at the most recent NeurIPS conference. In this paper, Rishabh and his coauthors call for a change in how deep RL performance is reported on benchmarks when using only a few runs, acknowledging that typically, DeepRL algorithms are evaluated by the performance on a large suite of tasks. Using the Atari 100k benchmark, they found substantial disparities in the conclusions from point estimates alone versus statistical analysis. We explore the reception of this paper from the research community, some of the more surprising results, what incentives researchers have to implement these types of changes in self-reporting when publishing, and much more. The complete show notes for this episode can be found at twimlai.com/go/559
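The gap between point estimates and statistical analysis described above can be illustrated with a small sketch. This is not taken from the paper: the scores are made up, and the percentile bootstrap on the mean is just one generic way to attach an interval to a handful of runs, chosen here for illustration.

```python
import random
import statistics


def bootstrap_mean_ci(scores, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for the mean of a few runs."""
    rng = random.Random(seed)
    n = len(scores)
    # Resample the runs with replacement and record the mean of each resample.
    means = sorted(
        statistics.fmean(rng.choices(scores, k=n)) for _ in range(n_resamples)
    )
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


# Hypothetical normalized scores from five runs each of two algorithms.
algo_a = [0.42, 0.55, 0.31, 0.60, 0.47]
algo_b = [0.50, 0.38, 0.66, 0.41, 0.52]

point_a, point_b = statistics.fmean(algo_a), statistics.fmean(algo_b)
ci_a, ci_b = bootstrap_mean_ci(algo_a), bootstrap_mean_ci(algo_b)

print(f"A: mean={point_a:.3f}, 95% CI=({ci_a[0]:.3f}, {ci_a[1]:.3f})")
print(f"B: mean={point_b:.3f}, 95% CI=({ci_b[0]:.3f}, {ci_b[1]:.3f})")

# Two intervals overlap when each one starts before the other ends.
intervals_overlap = ci_a[0] <= ci_b[1] and ci_b[0] <= ci_a[1]
print("intervals overlap:", intervals_overlap)
```

With so few runs, the point estimates alone suggest a ranking that the intervals may not support, which is the kind of disparity the episode discusses.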

Changelog Master Feed
Exploring deep reinforcement learning (Practical AI #166)

Changelog Master Feed

Play Episode Listen Later Feb 1, 2022 41:21 Transcription Available


In addition to being a Developer Advocate at Hugging Face, Thomas Simonini is building next-gen AI in games that can talk and have smart interactions with the player using Deep Reinforcement Learning (DRL) and Natural Language Processing (NLP). He also created a Deep Reinforcement Learning course that takes a DRL beginner from zero to hero. Natalie and Chris explore what's involved, and what the implications are, with a focus on the development path of the new AI data scientist.

Practical AI
Exploring deep reinforcement learning

Practical AI

Play Episode Listen Later Feb 1, 2022 41:21 Transcription Available


In addition to being a Developer Advocate at Hugging Face, Thomas Simonini is building next-gen AI in games that can talk and have smart interactions with the player using Deep Reinforcement Learning (DRL) and Natural Language Processing (NLP). He also created a Deep Reinforcement Learning course that takes a DRL beginner from zero to hero. Natalie and Chris explore what's involved, and what the implications are, with a focus on the development path of the new AI data scientist.

The Gradient Podcast
Peter Henderson on RL Benchmarking, Climate Impacts of AI, and AI for Law

The Gradient Podcast

Play Episode Listen Later Oct 28, 2021 88:42


In episode 14 of The Gradient Podcast, we interview Stanford PhD Candidate Peter Henderson. Subscribe to The Gradient Podcast: Apple Podcasts | Spotify | Pocket Casts | RSS. Peter is a joint JD-PhD student at Stanford University advised by Dan Jurafsky. He is also an OpenPhilanthropy AI Fellow and a Graduate Student Fellow at the Regulation, Evaluation, and Governance Lab. His research focuses on creating robust decision-making systems, with three main goals: (1) use AI to make governments more efficient and fair; (2) ensure that AI isn't deployed in ways that can harm people; (3) create new ML methods for applications that are beneficial to society. Links: Reproducibility and Reusability in Deep Reinforcement Learning; Benchmark Environments for Multitask Learning in Continuous Domains; Reproducibility of Benchmarked Deep Reinforcement Learning Tasks for Continuous Control; Deep Reinforcement Learning that Matters; Reproducibility and Replicability in Deep Reinforcement Learning (and Other Deep Learning Methods); Towards the Systematic Reporting of the Energy and Carbon Footprints of Machine Learning; How blockers can turn into a paper: A retrospective on 'Towards The Systematic Reporting of the Energy and Carbon Footprints of Machine Learning'; When Does Pretraining Help? Assessing Self-Supervised Learning for Law and the CaseHOLD Dataset; How US law will evaluate artificial intelligence for Covid-19. Podcast Theme: "MusicVAE: Trio 16-bar Sample #2" from "MusicVAE: A Hierarchical Latent Vector Model for Learning Long-Term Structure in Music". Get full access to The Gradient at thegradientpub.substack.com/subscribe

This Week in Machine Learning & Artificial Intelligence (AI) Podcast
Advancing Deep Reinforcement Learning with NetHack, w/ Tim Rocktäschel - #527

This Week in Machine Learning & Artificial Intelligence (AI) Podcast

Play Episode Listen Later Oct 14, 2021 42:57


Take our survey at twimlai.com/survey21! Today we're joined by Tim Rocktäschel, a research scientist at Facebook AI Research and an associate professor at University College London (UCL). Tim's work focuses on training RL agents in simulated environments, with the goal of these agents being able to generalize to novel situations. Typically, this is done in environments like OpenAI Gym, MuJoCo, or even using Atari games, but these all come with constraints. In Tim's approach, he utilizes a game called NetHack, which is much richer and more complex than the aforementioned environments. In our conversation with Tim, we explore the ins and outs of using NetHack as a training environment, including how much control a user has when generating each individual game and the challenges he's faced when deploying the agents. We also discuss his work on MiniHack, an environment creation framework and suite of tasks that are based on NetHack, and future directions for this research. The complete show notes for this episode can be found at twimlai.com/go/527.

SuperDataScience
SDS 510: Deep Reinforcement Learning

SuperDataScience

Play Episode Listen Later Oct 1, 2021 7:13


In this episode, I dive into the world of reinforcement learning and deep reinforcement learning and the benefits of both. Additional materials: www.superdatascience.com/510

This Week in Machine Learning & Artificial Intelligence (AI) Podcast
Deep Reinforcement Learning for Game Testing at EA with Konrad Tollmar - #517

This Week in Machine Learning & Artificial Intelligence (AI) Podcast

Play Episode Listen Later Sep 9, 2021 40:21


Today we're joined by Konrad Tollmar, research director at Electronic Arts and an associate professor at KTH. In our conversation, we explore his role as the lead of EA's applied research team SEED and the ways that they're applying ML/AI across popular franchises like Apex Legends, Madden, and FIFA. We break down a few papers focused on the application of ML to game testing, discussing why deep reinforcement learning is at the top of their research agenda, the differences between training on Atari games and modern 3D games, using CNNs to detect glitches in games, and of course, Konrad gives us his outlook on the future of ML for games training. The complete show notes for this episode can be found at twimlai.com/go/517.

SuperDataScience
SDS 503: Deep Reinforcement Learning for Robotics

SuperDataScience

Play Episode Listen Later Sep 7, 2021 78:06


Pieter Abbeel joins us to discuss his work as an academic and entrepreneur in the field of AI robotics and what the future of the industry holds. In this episode you will learn: • How does Pieter do it all? [5:45] • Pieter's exciting areas of research [12:30] • Research application at Covariant [32:27] • Getting into AI robotics [42:18] • Traits of good AI robotics apprentices [49:38] • Valuable skills [56:40] • What Pieter hopes to look back on [1:04:30] • LinkedIn Q&A [1:06:51] Additional materials: www.superdatascience.com/503

Macro Musings with David Beckworth
Arthur Turrell on Economic Data, Modeling, and the Future of Nuclear Energy

Macro Musings with David Beckworth

Play Episode Listen Later Aug 16, 2021 52:13


Arthur Turrell is the deputy director at the Data Science Campus of the UK Office for National Statistics (ONS). Arthur is also a former researcher at the Bank of England and a nuclear fusion scientist. He joins Macro Musings to talk about his work at the Bank of England, the future of economic data, and his new book on nuclear fusion titled, *The Star Builders: Nuclear Fusion and the Race to Power the Planet*. Transcript for the episode can be found here: https://www.mercatus.org/bridge/tags/macro-musings Arthur's Twitter: @arthurturrell Arthur's website: http://aeturrell.com/ Arthur's Bank of England profile: https://www.bankofengland.co.uk/research/researchers/arthur-turrell Related Links: *The Star Builders: Nuclear Fusion and the Race to Power the Planet* by Arthur Turrell https://www.simonandschuster.com/books/The-Star-Builders/Arthur-Turrell/9781982130664 *Coding for Economists* by Arthur Turrell https://aeturrell.github.io/coding-for-economists/intro.html *Why Software Is Eating The World* by Marc Andreessen https://www.wsj.com/articles/SB10001424053111903480904576512250915629460 *Solving Heterogeneous General Equilibrium Economic Models with Deep Reinforcement Learning* by Edward Hill, Marco Bardoscia, and Arthur Turrell https://arxiv.org/pdf/2103.16977.pdf Princeton's *Net-Zero America* Project: https://netzeroamerica.princeton.edu/?explorer=year&state=national&table=2020&limit=200 David's blog: macromarketmusings.blogspot.com David's Twitter: @DavidBeckworth

Machine Learning Podcast - Jay Shah
Intuition for research in Social Reinforcement Learning | Natasha Jaques

Machine Learning Podcast - Jay Shah

Play Episode Listen Later Jul 19, 2021 6:52


How can we build intuition for interdisciplinary fields in order to tackle challenges in social reinforcement learning? Natasha Jaques is currently a Research Scientist at Google Brain and a post-doc fellow at UC Berkeley, where her research interests are in designing multi-agent RL algorithms while focusing on social reinforcement learning. She received her Ph.D. from MIT and has also received multiple awards for her research works submitted to venues like ICML and NeurIPS. She has interned at DeepMind, Google Brain, and is an OpenAI Scholars mentor. About the Host: Jay is a Ph.D. student at Arizona State University, doing research on building Interpretable AI models for Medical Diagnosis. Jay Shah: https://www.linkedin.com/in/shahjay22/ You can reach out to https://www.public.asu.edu/~jgshah1/ for any queries. Stay tuned for upcoming webinars! ***Disclaimer: The information contained in this video represents the views and opinions of the speaker and does not necessarily represent the views or opinions of any institution. It does not constitute an endorsement by any Institution or its affiliates of such video content.***

The Digital Supply Chain podcast
Deep Reinforcement Learning - What Is It, And What Are Its Uses In Supply Chain? A Chat with Pathmind CEO Chris Nicholson

The Digital Supply Chain podcast

Play Episode Play 37 sec Highlight Listen Later Jul 2, 2021 23:31 Transcription Available


A relatively new field of AI called Deep Reinforcement Learning is starting to open up and shows a lot of promise. One company making use of Deep RL in the supply chain area is Pathmind. I invited the founder and CEO of Pathmind, Chris Nicholson, to come on the podcast to explain what Deep RL is, why it is better than other forms of AI/ML, and how it can be used in the supply chain context. We had an excellent conversation and, as is often the case, I learned loads. I hope you do too... If you have any comments/suggestions or questions for the podcast - feel free to leave me a voice message over on my SpeakPipe page or just send it to me as a direct message on Twitter/LinkedIn. Audio messages will get played (unless you specifically ask me not to). To learn more about how Industry 4.0 technologies can help your organisation, read the 2020 global research study 'The Power of change from Industry 4.0 in manufacturing' (https://www.sap.com/cmp/dg/industry4-manufacturing/index.html). And if you want to know more about any of SAP's Digital Supply Chain solutions, head on over to www.sap.com/digitalsupplychain and if you liked this show, please don't forget to rate and/or review it. It makes a big difference to help new people discover it. Thanks. And remember, stay healthy, stay safe, stay sane!

TalkRL: The Reinforcement Learning Podcast

Thomas Krendl Gilbert is a PhD student at UC Berkeley’s Center for Human-Compatible AI, specializing in Machine Ethics and Epistemology. Featured References: Hard Choices in Artificial Intelligence: Addressing Normative Uncertainty through Sociotechnical Commitments (Roel Dobbe, Thomas Krendl Gilbert, Yonatan Mintz); Mapping the Political Economy of Reinforcement Learning Systems: The Case of Autonomous Vehicles (Thomas Krendl Gilbert); AI Development for the Public Interest: From Abstraction Traps to Sociotechnical Risks (McKane Andrus, Sarah Dean, Thomas Krendl Gilbert, Nathan Lambert, and Tom Zick). Additional References: Political Economy of Reinforcement Learning Systems (PERLS); The Law and Political Economy (LPE) Project; The Societal Implications of Deep Reinforcement Learning, Jess Whittlestone, Kai Arulkumaran, Matthew Crosby; Robot Brains Podcast: Yann LeCun explains why Facebook would crumble without AI

The Artificial Intelligence Podcast
What is deep reinforcement learning: The next step in AI and deep learning

The Artificial Intelligence Podcast

Play Episode Listen Later May 10, 2021 7:23


Reinforcement learning is well-suited for autonomous decision-making where supervised learning or unsupervised learning techniques alone can't do the job --- Send in a voice message: https://anchor.fm/tonyphoang/message

UpTech Report
Rewarding the Robots | Chris Nicholson from Pathmind

UpTech Report

Play Episode Listen Later Apr 26, 2021 26:56


Anyone who's seen the film Moneyball understands how computer simulations and statistical analysis have totally transformed the world of sports. Well, it's not just sports. The same technology that's used to assess how a batter gets on base is used to analyze how a factory worker retrieves a box. The vast complexity of supply chain and manufacturing systems makes these industries perfectly positioned for assistance from AI and machine learning. But these technologies are not simple, and even once simulations are designed, there are opportunities to take them further. Our guest on this edition of UpTech Report is doing that with his company, Pathmind, which offers AI and Deep Reinforcement Learning technologies to bring simulations in supply chain and manufacturing sectors to their fullest potential. Chris Nicholson is Pathmind's founder and CEO, and he joins us to explain how Deep Reinforcement Learning uses a reward-based approach to train AI, and he discusses the numerous ways it can help companies increase worker efficiency, save energy, and make smart recommendations for better decision making.

Adventures in Machine Learning
Episode 39: ML 027: Staying Current in Machine Learning

Adventures in Machine Learning

Play Episode Listen Later Apr 22, 2021 41:47


Miguel and Chuck discuss how to stay current in the rapidly changing world of Machine Learning and Artificial Intelligence. They go over how to pick books, newsletters, podcasts, and other resources to up your Machine Learning knowledge and skills. Panel: Charles Max Wood, Miguel Morales. Sponsors: Dev Influences Accelerator. Picks: Charles - The Umbrella Academy; Charles - The Sales Development Playbook; Miguel - Deep Reinforcement Learning

Machine Learning Podcast - Jay Shah
Working at DeepMind vs Google-Brain

Machine Learning Podcast - Jay Shah

Play Episode Listen Later Mar 21, 2021 4:02


Watch the full podcast with Natasha here: https://youtu.be/8XpCnmvq49s Also check out these talks on all available podcast platforms: https://jayshah.buzzsprout.com About the Host: Jay is a PhD student at Arizona State University, doing research on building Interpretable AI models for Medical Diagnosis. Jay Shah: https://www.linkedin.com/in/shahjay22/ You can reach out to https://www.public.asu.edu/~jgshah1/ for any queries. Stay tuned for upcoming webinars! ***Disclaimer: The information contained in this video represents the views and opinions of the speaker and does not necessarily represent the views or opinions of any institution. It does not constitute an endorsement by any Institution or its affiliates of such video content.***

Machine Learning Podcast - Jay Shah
Doing a PhD and Deciding thesis topic | Natasha Jaques, Research Scientist @Google

Machine Learning Podcast - Jay Shah

Play Episode Listen Later Mar 12, 2021 8:52


Watch the full podcast here: https://youtu.be/8XpCnmvq49s Natasha Jaques is currently a Research Scientist at @Google Brain and a post-doc fellow at @UC Berkeley, where her research interests are in designing multi-agent RL algorithms while focusing on social reinforcement learning, that can improve generalization, coordination between agents, and collaboration between human and AI agents. She received her PhD from the @Massachusetts Institute of Technology (MIT) where she focused on Affective Computing and other techniques for deep/reinforcement learning. She has also received multiple awards for her research works submitted to venues like ICML and NeurIPS. She has interned at @DeepMind, Google Brain, and is an @OpenAI Scholars mentor. Also check out these talks on all available podcast platforms: https://jayshah.buzzsprout.com About the Host: Jay is a PhD student at Arizona State University, doing research on building Interpretable AI models for Medical Diagnosis. Jay Shah: https://www.linkedin.com/in/shahjay22/ You can reach out to https://www.public.asu.edu/~jgshah1/ for any queries. Stay tuned for upcoming webinars! ***Disclaimer: The information contained in this video represents the views and opinions of the speaker and does not necessarily represent the views or opinions of any institution. It does not constitute an endorsement by any Institution or its affiliates of such video content.***

Machine Learning Podcast - Jay Shah
Deep Reinforcement Learning for Social Learning & Fun Chat | Natasha Jaques, @Google

Machine Learning Podcast - Jay Shah

Play Episode Listen Later Mar 6, 2021 53:00


Natasha Jaques is currently a Research Scientist at Google Brain and a post-doc fellow at UC Berkeley, where her research interests are in designing multi-agent RL algorithms while focusing on social reinforcement learning, that can improve generalization, coordination between agents, and collaboration between human and AI agents. She received her Ph.D. from MIT where she focused on Affective Computing and other techniques for deep/reinforcement learning. She has also received multiple awards for her research works submitted to venues like ICML and NeurIPS. She has interned at DeepMind, Google Brain, and is an OpenAI Scholars mentor. 00:00 Introductions 01:25 Can you tell us a bit about what projects you are working on at Google currently? And what does the work routine look like as a Research Scientist? 06:25 You have worked as a researcher at many diverse institutions that are leading in the domain of machine learning: MIT, Google Brain, DeepMind - what are the key differences you have noticed while doing research in academia vs industry vs research lab? 10:00 About your paper, social influence as intrinsic motivation for multi-agent deep reinforcement learning, can you tell us more about how you are trying to leverage intrinsic rewards for better coordination? 12:00 Game Theory and Reinforcement Learning: discussion 16:00 What was the intuition behind that approach - did you resort to cognitive psychology to get this idea and later on model it using standard DRL principles or something else? 20:00 Crackpot-y motivation behind the intuition of modeling social influence in MARL 24:00 What applications did you have in mind while working on that approach? What could be the potential domains you see people can use that approach? 25:35 Do you think generalization in RL is close enough to have an ImageNet moment? 28:35 Inspiration from social animals for better architectures - Yay/Nay? 30:20 How far are we in terms of using systems with DeepRL in day-to-day use?
Or are there any such applications already in use? 34:40 Do you think these DRL systems can be made interpretable to some extent? 39:00 What really intrigued you to pursue a Ph.D. after your master's and not a job? 40:30 How did you go about deciding the topic for your Ph.D. thesis? 47:40 How do you typically go about segmenting a research topic into smaller segments, from the initial stage when it's more of an abstract idea with no connections to theory, to something much more implementable? 50:00 What are you currently exploring and optimistic about? Also check out these talks on all available podcast platforms: https://jayshah.buzzsprout.com About the Host: Jay is a Ph.D. student at Arizona State University, doing research on building Interpretable AI models for Medical Diagnosis. Jay Shah: https://www.linkedin.com/in/shahjay22/ You can reach out to https://www.public.asu.edu/~jgshah1/ for any queries. Stay tuned for upcoming webinars! ***Disclaimer: The information contained in this video represents the views and opinions of the speaker and does not necessarily represent the views or opinions of any institution. It does not constitute an endorsement by any Institution or its affiliates of such video content.***

Adventures in Machine Learning
Episode 25: ML 021: Grokking Deep Reinforcement Learning with Miguel Morales

Adventures in Machine Learning

Play Episode Listen Later Feb 16, 2021 65:21


Miguel Morales is a Machine Learning engineer at Lockheed Martin and teaches at Georgia Institute of Technology. This episode starts with a basic explanation of Reinforcement Learning. Miguel then talks through the various methods of implementing and training systems through Reinforcement Learning. We talk algorithms and models and much more… Panel: Charles Max Wood. Guest: Miguel Morales. Sponsors: Dev Heroes Accelerator. Links: GitHub | mimoralea/gdrl; Manning | Grokking Deep Reinforcement Learning; Twitter: mimoralea; Email: mimoralea@gmail.com. Picks: Charles - The Expanse by James S. A. Corey; Charles - Dev Heroes Accelerator; Miguel - Deep Reinforcement Learning; Miguel - YouTube: Dimitri Bertsekas; Miguel - RL Course by David Silver - Lecture 1: Introduction to Reinforcement Learning; Miguel - Thor (2011)

Applied AI Pod
AI in game development & UX for gamers, w/ Unity ML-Agents' PM, Jeffrey Shih, E23

Applied AI Pod

Play Episode Listen Later Feb 2, 2021 41:32


02:10 - Brief history of game development in relation to AI advancements 10:15 - Games driving advances in AI research: PR or reality? 15:50 - Latest AI technique popular in game development 20:55 - The role of Unity Game Simulation to reduce time & cost with games pre-launch testing 26:45 - What's fancy in the games world 31:35 - Streaming a game vs. traditional edge processing, gamer's lens 37:55 - What's next for games & AI. References: Unity ML-Agents Toolkit GitHub; Jeff's Twitter handle @shihzy; Jeff's LinkedIn. Host's notes: 2021 Update for AI advancements through game examples; History of games at DeepMind; Top AI Labs worldwide and AI's potential; Facebook, Carnegie Mellon build first AI that beats pros in 6-player poker

Machine Learning Podcast - Jay Shah
Using Deep Reinforcement Learning for System Optimization & more | Dr. Azalia Mirhoseini, Google

Machine Learning Podcast - Jay Shah

Play Episode Listen Later Feb 1, 2021 46:16


Azalia is a Research Scientist on the Google Brain team, where she leads machine learning for systems moonshot projects. Her research interests include, but are not limited to, exploring deep reinforcement learning for optimizing computer systems. She has a Ph.D. in Electrical and Computer Engineering from Rice University and has received many awards for her contributions, including the MIT Technology Review 35 under 35. About the Host: Jay is a Ph.D. student at Arizona State University, doing research on building Interpretable AI models for Medical Diagnosis. Jay Shah: https://www.linkedin.com/in/shahjay22/ You can reach out to https://www.public.asu.edu/~jgshah1/ for any queries. Stay tuned for upcoming webinars! ***Disclaimer: The information contained in this video represents the views and opinions of the speaker and does not necessarily represent the views or opinions of any institution. It does not constitute an endorsement by any Institution or its affiliates of such video content.***

The Real Python Podcast
Deep Reinforcement Learning in a Notebook With Jupylet + Gaming and Synthesis

The Real Python Podcast

Play Episode Listen Later Jan 15, 2021 62:02


What is it like to design a Python library for three different audiences? This week on the show, we have Nir Aides, creator of Jupylet. His new library is designed for deep reinforcement learning researchers, musicians interested in live music coding, and kids interested in learning to program. Everything is designed to run inside of a Jupyter notebook.

PaperPlayer biorxiv neuroscience
Track-To-Learn: A general framework for tractography with deep reinforcement learning

PaperPlayer biorxiv neuroscience

Play Episode Listen Later Nov 17, 2020


Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.11.16.385229v1?rss=1 Authors: Theberge, A., Desrosiers, C., Descoteaux, M., Jodoin, P.-M. Abstract: Diffusion MRI tractography is currently the only non-invasive tool able to assess the white-matter structural connectivity of a brain. Since its inception, it has been widely documented that tractography is prone to producing erroneous tracks while missing true positive connections. Anatomical priors have been conceived and implemented in classical algorithms to try and tackle these issues, yet problems still remain and the conception and validation of these priors is very challenging. Recently, supervised learning algorithms have been proposed to learn the tracking procedure implicitly from data, without relying on anatomical priors. However, these methods rely on labelled data that is very hard to obtain. To remove the need for such data but still leverage the expressiveness of neural networks, we introduce Track-To-Learn: A general framework to pose tractography as a deep reinforcement learning problem. Deep reinforcement learning is a type of machine learning that does not depend on ground-truth data but rather on the concept of "reward". We implement and train algorithms to maximize returns from a reward function based on the alignment of streamlines with principal directions extracted from diffusion data. We show that competitive results can be obtained on known data and that the algorithms are able to generalize far better to new, unseen data, than prior machine learning-based tractography algorithms. To the best of our knowledge, this is the first successful use of deep reinforcement learning for tractography. Copyrights belong to original authors. Visit the link for more info
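To make the idea of an alignment-based reward concrete, here is a minimal sketch. The abstract only says the reward is based on the alignment of streamlines with principal directions; the absolute cosine, the discount factor `gamma`, and the function names `alignment_reward` and `episode_return` are assumptions introduced for illustration, not the paper's actual definitions.

```python
import math


def alignment_reward(step, peak):
    """Absolute cosine between a tracking step and a local principal direction."""
    dot = sum(s * p for s, p in zip(step, peak))
    norm = math.sqrt(sum(s * s for s in step)) * math.sqrt(sum(p * p for p in peak))
    if norm == 0.0:
        return 0.0  # undefined direction: give no reward
    # Assumption: directions are treated as unsigned axes (v and -v are the
    # same), so the absolute value of the cosine is used.
    return abs(dot) / norm


def episode_return(steps, peaks, gamma=0.95):
    """Discounted sum of per-step alignment rewards along one streamline."""
    return sum(
        (gamma ** t) * alignment_reward(s, p)
        for t, (s, p) in enumerate(zip(steps, peaks))
    )


# Hypothetical three-step streamline and the peak direction at each visited voxel.
steps = [(1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
peaks = [(1.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]

rewards = [alignment_reward(s, p) for s, p in zip(steps, peaks)]
total = episode_return(steps, peaks)
print(rewards, total)
```

A step parallel to the peak earns the full reward, a 45-degree step earns less, and a perpendicular step earns none, so an agent maximizing the return is pushed to follow the local fiber direction.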

Applied AI Pod
Reinforcement Learning, Intelligent vehicles & Acquiring Data, with Praveen Palanisamy - AI Engineer Microsoft AI + Research, E15

Applied AI Pod

Play Episode Listen Later Oct 26, 2020 41:40


Notes:
- Deep Reinforcement Learning (DRL or DeepRL) applied to the automotive industry
- Simulation platforms and the role of simulators in training agents
- Obtaining data to prepare the autonomous vehicle
- Methods to evaluate robustness of the solution
- Deploying in the real world
- Startups to use DL or be at the forefront of DL
- TechCrunch Disrupt Hackathon win & engineers at hackathons as a practice

Towards Data Science
54. Tim Rocktäschel - Deep reinforcement learning, symbolic learning and the road to AGI

Towards Data Science

Play Episode Listen Later Oct 15, 2020 53:52


Reinforcement learning can do some pretty impressive things. It can optimize ad targeting, help run self-driving cars, and even win StarCraft games. But current RL systems are still highly task-specific. Tesla’s self-driving car algorithm can’t win at StarCraft, and DeepMind’s AlphaZero algorithm can win Go matches against grandmasters, but can’t optimize your company’s ad spend. So how do we make the leap from narrow AI systems that leverage reinforcement learning to solve specific problems, to more general systems that can orient themselves in the world? Enter Tim Rocktäschel, a Research Scientist at Facebook AI Research London and a Lecturer in the Department of Computer Science at University College London. Much of Tim’s work has been focused on ways to make RL agents learn with relatively little data, using strategies known as sample-efficient learning, in the hopes of improving their ability to solve more general problems. Tim joined me for this episode of the podcast.

Machine Learning Papers
An Intro to Deep Reinforcement Learning

Machine Learning Papers

Play Episode Listen Later Jul 15, 2020 10:08


Explains the basic concepts behind Deep Reinforcement Learning.

Journal Club
Chip Design, Teaching Google, and Fooling LIME and SHAP

Journal Club

Play Episode Listen Later Jun 16, 2020 32:42


In this week's episode we have the regular panel back together! George brought us the blog post from Google AI, "Chip Design with Deep Reinforcement Learning." Kyle brings us a news item from CNET, "How People with Down Syndrome are Improving Google Assistant." Lan brings us the paper this week! She discusses the paper "Fooling LIME and SHAP: Adversarial Attacks on Post hoc Explanation Methods." All works mentioned will be linked in the show notes.

Data Science et al.
Deep Reinforcement Learning

Data Science et al.

Play Episode Listen Later Jun 10, 2020 0:15


Challenges of deep reinforcement learning. Support the show (http://paypal.me/SachinPanicker)

Engineered-Mind Podcast | Engineering, AI & Neuroscience
Miguel Morales - Deep Reinforcement Learning | Podcast #4

Engineered-Mind Podcast | Engineering, AI & Neuroscience

Play Episode Listen Later Apr 23, 2020 49:07


Miguel is a full-time Deep RL researcher at Lockheed Martin, a part-time Instructional Associate at Georgia Institute of Technology for the course on Reinforcement Learning and Decision Making, a Content Developer at Udacity for the Deep Reinforcement Learning Nanodegree, and the author of the great book “Grokking Deep Reinforcement Learning”. Connect with me here: my weekly email newsletter: jousef.substack.com

Bekk Open Podcast
34: The One About AlphaZero

Bekk Open Podcast

Play Episode Listen Later Apr 14, 2020 22:16


In this episode, Lars Magnus Øksnes and Ole-Martin Mørk are joined by Asbjørn Steinskog. Asbjørn is very fond of chess and artificial intelligence, and has followed AlphaZero closely in recent years. AlphaZero combines chess and artificial intelligence by using the technique of deep reinforcement learning. The developers gave it the rules, and it then managed on its own to become the world's best chess computer. It did so by playing many millions of chess games against itself, learning from its mistakes all the while. We go through the history of AlphaZero, what it has brought to the chess world, and, not least, what else this technique can be used for.

Lex Fridman Podcast
#86 – David Silver: AlphaGo, AlphaZero, and Deep Reinforcement Learning

Lex Fridman Podcast

Play Episode Listen Later Apr 3, 2020 108:28


David Silver leads the reinforcement learning research group at DeepMind and was lead researcher on AlphaGo and AlphaZero, co-lead on AlphaStar and MuZero, and has done a lot of important work in reinforcement learning. Support this podcast by signing up with these sponsors: – MasterClass: https://masterclass.com/lex – Cash App – use code “LexPodcast” and download: – Cash App (App Store): https://apple.co/2sPrUHe – Cash App (Google Play): https://bit.ly/2MlvP5w EPISODE LINKS: Reinforcement learning (book): https://amzn.to/2Jwp5zG This conversation is part of the Artificial Intelligence podcast. If you would like to get more information about this podcast go to https://lexfridman.com/ai or connect with @lexfridman on Twitter, LinkedIn, Facebook,

The Data Exchange with Ben Lorica
Next-generation simulation software will incorporate deep reinforcement learning

The Data Exchange with Ben Lorica

Play Episode Listen Later Apr 2, 2020 39:55


In this episode of the Data Exchange I speak with Chris Nicholson, founder and CEO of Pathmind, a startup applying deep reinforcement learning (DRL) to simulation problems. In a recent post I highlighted two areas where companies can begin to add DRL to their suite of tools: personalization and recommendation engines, and simulation software. My interest in the interplay between DRL and simulation software began when I came across the work of Pathmind in this area.
Our conversation focused on deep reinforcement learning and its applications:
- We began with the basics: what is reinforcement learning and why should businesses pay attention to it?
- We discussed enterprise applications of DRL, with particular emphasis on areas where Chris and Pathmind have been focused of late: Business Process Simulation and Optimization.
- Pathmind have been early adopters of Ray and of RLlib, a popular open-source library for reinforcement learning built on top of Ray. I asked Chris why they chose to build on top of RLlib.
Detailed show notes can be found on The Data Exchange web site.

This Week in Machine Learning & Artificial Intelligence (AI) Podcast
Music & AI Plus a Geometric Perspective on Reinforcement Learning with Pablo Samuel Castro - #339

This Week in Machine Learning & Artificial Intelligence (AI) Podcast

Play Episode Listen Later Jan 16, 2020 43:49


Today we’re joined by Pablo Samuel Castro, Staff Research Software Developer at Google. Pablo, whose research is mainly focused on reinforcement learning, and I caught up at NeurIPS last month. We cover a lot of ground in our conversation, including his love for music, and how that has guided his work on the Lyric AI project, and a few of his other NeurIPS submissions, including “A Geometric Perspective on Optimal Representations for Reinforcement Learning” and “Estimating Policy Functions in Payments Systems using Deep Reinforcement Learning.”  Check out the complete show notes at twimlai.com/talk/339.

Data Science at Home
What is wrong with reinforcement learning? (Ep. 82)

Data Science at Home

Play Episode Listen Later Oct 15, 2019 21:48


Join the discussion on our Discord server.
Reinforcement learning agents have done great at playing Atari video games and Go (AlphaGo), at financial trading, and at language modeling. Now let me tell you the real story. In this episode I want to shine some light on reinforcement learning (RL) and the limitations that every practitioner should consider before taking certain directions. RL seems to work so well! What is wrong with it?
Are you a listener of Data Science at Home podcast? A reader of the Amethix Blog? Or did you subscribe to the Artificial Intelligence at your fingertips newsletter? In any case let's stay in touch! https://amethix.com/survey/
References
- Emergence of Locomotion Behaviours in Rich Environments https://arxiv.org/abs/1707.02286
- Rainbow: Combining Improvements in Deep Reinforcement Learning https://arxiv.org/abs/1710.02298
- AlphaGo Zero: Starting from scratch https://deepmind.com/blog/article/alphago-zero-starting-scratch

This Week in Machine Learning & Artificial Intelligence (AI) Podcast
Deep Reinforcement Learning for Logistics at Instadeep with Karim Beguir - #302

This Week in Machine Learning & Artificial Intelligence (AI) Podcast

Play Episode Listen Later Sep 25, 2019 43:45


Today we are joined by Karim Beguir, Co-Founder and CEO of InstaDeep, a company in Tunisia, Africa focusing on building advanced decision-making systems for the enterprise. In this episode, we discuss where his and InstaDeep’s journey began in Tunisia, Africa (00:27), the challenges that enterprise companies are seeing in logistics that can be solved by deep learning and machine learning (05:43), how InstaDeep is applying DL and RL to real world problems (09:45), and what are the data sets used to train these models and the application of transfer learning between similar data sets (13:00). Additionally, we go over ‘Rank Rewards’, a paper Karim published last year, in which adversarial self-play in two-player games has delivered impressive results when used with reinforcement learning algorithms (22:40), the overall efficiency of RL for logistical problems (23:05), and details on the InstaDeep process (35:37). The complete show notes for this episode can be found at twimlai.com/talk/302. 

OpenLayer
OpenLayer - #4 | Mathieu Godbout

OpenLayer

Play Episode Listen Later Aug 17, 2019 68:16


In this episode, I talk with Mathieu Godbout about his path toward machine learning and reinforcement learning (RL). The books Mathieu mentions in the podcast:
- Reinforcement Learning: An Introduction http://incompleteideas.net/book/the-book-2nd.html
- An Introduction to Deep Reinforcement Learning https://arxiv.org/pdf/1811.12560.pdf
Thanks to my sponsors: AÉLIES (https://www.aelies.ulaval.ca), SPLA (https://www.spla.ulaval.ca), AGIL (http://www2.ift.ulaval.ca/~agil/) and .Layer (https://www.dotlayer.org/). Subscribe to the podcast!

OpenLayer
OpenLayer - #4 | Mathieu Godbout

OpenLayer

Play Episode Listen Later Aug 16, 2019 68:17


In this episode, I talk with Mathieu Godbout about his path toward machine learning and reinforcement learning (RL). The books Mathieu mentions in the podcast:
- Reinforcement Learning: An Introduction http://incompleteideas.net/book/the-book-2nd.html
- An Introduction to Deep Reinforcement Learning https://arxiv.org/pdf/1811.12560.pdf
Thanks to my sponsors: AÉLIES (https://www.aelies.ulaval.ca), SPLA (https://www.spla.ulaval.ca), AGIL (http://www2.ift.ulaval.ca/~agil/) and .Layer (https://www.dotlayer.org/). Subscribe to the podcast!
[Facebook] - https://www.facebook.com/OpenLayerPodcast/
[Spotify] - https://open.spotify.com/show/6LWUHrtNrRioE7Ggxkpcno?si=-3tZo88XSnW0Mwcp7sZkLA
[Balado Québec] - https://baladoquebec.ca/#!/openlayer
[iTunes Podcast] - https://podcasts.apple.com/ca/podcast/openlayer/id1477641065
[Google Play Music] - https://playmusic.app.goo.gl/?ibi=com.google.PlayMusic&isi=691797987&ius=googleplaymusic&apn=com.google.android.music&link=https://play.google.com/music/m/Iytw3gkywmyegzfe45u2y5ciqfm?t%3DOpenLayer%26pcampaignid%3DMKT-na-all-co-pr-mu-pod-16
[Google Podcast] - OpenLayer Podcast

Changelog Master Feed
Deep Reinforcement Learning (Practical AI #40)

Changelog Master Feed

Play Episode Listen Later Apr 23, 2019 45:35 Transcription Available


While attending the NVIDIA GPU Technology Conference in Silicon Valley, Chris met up with Adam Stooke, a speaker and PhD student at UC Berkeley who is doing groundbreaking work in large-scale deep reinforcement learning and robotics. Adam took Chris on a tour of deep reinforcement learning - explaining what it is, how it works, and why it’s one of the hottest technologies in artificial intelligence!

Practical AI
Deep Reinforcement Learning

Practical AI

Play Episode Listen Later Apr 23, 2019 45:35 Transcription Available


While attending the NVIDIA GPU Technology Conference in Silicon Valley, Chris met up with Adam Stooke, a speaker and PhD student at UC Berkeley who is doing groundbreaking work in large-scale deep reinforcement learning and robotics. Adam took Chris on a tour of deep reinforcement learning - explaining what it is, how it works, and why it’s one of the hottest technologies in artificial intelligence!

AI Mentors
E13 Dr. Tristan Behrens, Senior Data Scientist and AI Guru

AI Mentors

Play Episode Listen Later Apr 3, 2019


Today’s guest on the show is the AI Guru Dr Tristan Behrens. Tristan is a Senior Data Scientist with a strong focus on AI, Machine Learning, Deep Learning and Deep Reinforcement Learning. After five years of solving many digitalization challenges and creating many products, he decided to shift towards transformation and enablement. Today, Tristan helps individuals, …

This Week in Machine Learning & Artificial Intelligence (AI) Podcast
Safer Exploration in Deep Reinforcement Learning using Action Priors with Sicelukwanda Zwane - TWiML Talk #235

This Week in Machine Learning & Artificial Intelligence (AI) Podcast

Play Episode Listen Later Mar 1, 2019 54:01


Today we conclude our Black in AI series with Sicelukwanda Zwane, a master's student at the University of the Witwatersrand and graduate research assistant at the CSIR. At the workshop, he presented on “Safer Exploration in Deep Reinforcement Learning using Action Priors,” which explores transferring action priors between robotic tasks to reduce the exploration space in reinforcement learning, which in turn reduces sample complexity. In our conversation, we discuss what “safer exploration” means in this sense, the difference between this work and other techniques like imitation learning, and how this fits in with the goal of “lifelong learning.” The complete show notes for this episode can be found at https://twimlai.com/talk/235. To follow along with the Black in AI series, visit https://twimlai.com/blackinai19.

AI with AI
Undecidable: They Called Me Mr. GAN

AI with AI

Play Episode Listen Later Jan 25, 2019 44:41


Andy and Dave discuss Microsoft’s $1.76B five-year service deal with the Department of Defense, US Coast Guard, and the intelligence communities; the US Defense Innovation Board announces its first “public listening session” on AI principles; Finland announces an AI experiment to teach 1% of its population the basics of AI; a report from the Center for the Governance of AI and the Future of Humanity Institute examines American attitudes and trends toward AI; and the Reuters Institute for the Study of Journalism examines UK media coverage of AI. In research news, MIT and IBM Watson AI Lab dissect a GAN to visualize and understand its inner workings, and they identify clusters of neurons that represent concepts; they also created GAN Paint, which lets a user add or subtract elements from a photo. Researchers from NYU and Columbia trained a single network model to perform 20 cognitive tasks, and discovered that this learning gives rise to compositionality of task representations, where one task can be performed by recombining representations from other tasks. Researchers at the University of Waterloo, Princeton University, and Tel Aviv University demonstrate that a type of machine learning can be undecidable, that is, unsolvable. Jeff Huang at Brown University has compiled a list of the best papers at computer science conferences since 1996; McGill and Google Brain offer a condensed Introduction to Deep Reinforcement Learning; Nature launches the inaugural issue of Nature Machine Intelligence; and a paper explores designing neural networks through neuroevolution. Major General Mick Ryan debuts a sci-fi story, “AugoStrat Awakenings”; NeurIPS 2018 makes all videos and slides available; and USNI’s Proceedings publishes an essay from CAPT Sharif Calfee on The Navy Needs an Autonomy Project Office.

This Week in Machine Learning & Artificial Intelligence (AI) Podcast
Trends in Reinforcement Learning with Simon Osindero - TWiML Talk #217

This Week in Machine Learning & Artificial Intelligence (AI) Podcast

Play Episode Listen Later Jan 3, 2019 52:46


In this episode of our AI Rewind series, we introduce a new friend of the show, Simon Osindero, Staff Research Scientist at DeepMind. We discuss trends in Deep Reinforcement Learning in 2018 and beyond. We’ve packed a bunch into this show, as Simon walks us through many of the important papers and developments seen last year in areas like Imitation Learning, Unsupervised RL, Meta-learning, and more. The complete show notes for this episode can be found at https://twimlai.com/talk/217. For more information on our 2018 AI Rewind series, visit https://twimlai.com/rewind2018.    

Reversim Podcast
Summit 2018: Winning 2048 Game Using Deep Reinforcement Learning / Eyal Altshuler

Reversim Podcast

Play Episode Listen Later Dec 22, 2018


Lex Fridman Podcast
Pieter Abbeel: Deep Reinforcement Learning

Lex Fridman Podcast

Play Episode Listen Later Dec 16, 2018 42:56


Pieter Abbeel is a professor at UC Berkeley, director of the Berkeley Robot Learning Lab, and is one of the top researchers in the world working on how to make robots understand and interact with the world around them, especially through imitation and deep reinforcement learning. Video version is available on YouTube. If you would like to get more information about this podcast go to https://lexfridman.com/ai or connect with @lexfridman on Twitter, LinkedIn, Facebook, or YouTube where you can watch the video versions of these conversations.

NLP Highlights
74 - Deep Reinforcement Learning Doesn't Work Yet, with Alex Irpan

NLP Highlights

Play Episode Listen Later Nov 16, 2018 40:43


Blog post by Alex Irpan titled "Deep Reinforcement Learning Doesn't Work Yet": https://www.alexirpan.com/2018/02/14/rl-hard.html In this episode, Alex Irpan talks about limitations of current deep reinforcement learning methods and why we have a long way to go before they go mainstream. We discuss sample inefficiency, instability, the difficulty of designing reward functions, and overfitting to the environment. Alex concludes with a list of recommendations he found useful when training models with deep reinforcement learning.

Google Cloud Platform Podcast
AI Corporations and Communities in Africa with Karim Beguir & Muthoni Wanyoike

Google Cloud Platform Podcast

Play Episode Listen Later Oct 23, 2018 37:26


On the podcast today, we have two more fascinating interviews from Melanie’s time at Deep Learning Indaba! Mark helps host this episode as we speak with Karim Beguir and Muthoni Wanyoike about their company, Instadeep, the wonderful Indaba conference, and the growing AI community in Africa. Instadeep helps large enterprises understand how AI can benefit them. Karim stresses that it is possible to build advanced AI and machine learning programs in Africa because of the growing community of passionate developers and mentors for the new generation. Muthoni tells us about Nairobi Women in Machine Learning and Data Science, a community she is heavily involved with in Nairobi. The group runs workshops and classes for AI developers and encourages volunteers to participate by sharing their knowledge and skills.

Karim Beguir
Karim Beguir helps companies get a grip on the latest AI advancements and how to implement them. A graduate of France’s Ecole Polytechnique and former Program Fellow at NYU’s Courant Institute, Karim has a passion for teaching and using applied mathematics. This led him to co-found InstaDeep, an AI startup that was nominated at the MWC17 for the Top 20 global startup list made by PCMAG. Karim uses TensorFlow to develop Deep Learning and Reinforcement Learning products. Karim is also the founder of the TensorFlow Tunis Meetup. He regularly organises educational events and workshops to share his experience with the community. Karim is on a mission to democratize AI and make it accessible to a wide audience.

Muthoni Wanyoike
Muthoni Wanyoike is the team lead at Instadeep in Kenya. She is passionate about bridging the skills gap in AI in Africa and does this by co-organizing the Nairobi Women in Machine Learning community. The community enables learning, mentorship, networking, and job opportunities for people interested in working in AI. She is experienced in research, data analytics, community and project management, and community growth hacking.

Cool things of the week
- Is there life on other planets? Google Cloud is working with NASA’s Frontier Development Lab to find out (blog)
- In this Codelab, you will learn about the StarCraft II Learning Environment project and train your first Deep Reinforcement Learning agent. You will also get familiar with some of the concepts and frameworks used to train a machine learning agent. (site)
- A new course to teach people about fairness in ML (blog)
- Serverless from the ground up: Building a simple microservice with Cloud Functions (Part 1) (blog)
- Superposition Podcast from Deep Learning Indaba with Omoju Miller and Nando de Freitas (tweet and video)

Interview
- Instadeep (site)
- Nairobi Women in Machine Learning and Data Science (site)
- Neural Information Processing Systems (site)
- Google Launchpad Accelerator (site)
- TensorFlow (site)
- Google Assistant (site)
- Cloud AutoML (site)
- Hackathon Lagos (site)
- Deep Learning Book (book)
- Ranked Reward: Enabling Self-Play Reinforcement Learning for Combinatorial Optimization (research paper)
- Lessons learned on building a tech community (blog)
- Kenya Open Data Initiative (site)
- R for Data Science (GitHub site and book)
- TWIML Presents Deep Learning Indaba (site)

Question of the week
If I want to create a GKE cluster with a specific major Kubernetes version (or even just the latest) using the command line tools, how do I do that?
- GCloud container clusters create (site)
- Specifying cluster version (site)

Where can you find us next?
Our guests will be at Indaba 2019 in Kenya. Mark will be at KubeCon in December. Melanie will be at SOCML in November.

Women in AI
Episode 38: Deep Reinforcement Learning in Complex Environments

Women in AI

Play Episode Listen Later Sep 6, 2018 13:30


Women in AI is a biweekly podcast from RE•WORK, meeting with leading female minds in AI, Deep Learning and Machine Learning. We will speak to CEOs, CTOs, Data Scientists, Engineers, Researchers and Industry Professionals to learn about their cutting edge work and technological advancements, as well as their impact on AI for social good and diversity in the workplace.

This Week in Machine Learning & Artificial Intelligence (AI) Podcast
Deep Reinforcement Learning Primer and Research Frontiers with Kamyar Azizzadenesheli - TWiML Talk #177

This Week in Machine Learning & Artificial Intelligence (AI) Podcast

Play Episode Listen Later Aug 30, 2018 95:25


Today we’re joined by Kamyar Azizzadenesheli, PhD student at the University of California, Irvine, and visiting researcher at Caltech where he works with Anima Anandkumar, who you might remember from TWiML Talk 142. We begin with a reinforcement learning primer of sorts, in which we review the core elements of RL, along with quite a few examples to help get you up to speed. We then discuss a pair of Kamyar’s RL-related papers: “Efficient Exploration through Bayesian Deep Q-Networks” and “Sample-Efficient Deep RL with Generative Adversarial Tree Search.” In addition to discussing Kamyar’s work, we also chat a bit about the general landscape of RL research today. So whether you’re new to the field or want to dive into cutting-edge reinforcement learning research with us, this podcast is here for you! If you'd like to skip the Deep Reinforcement Learning primer portion of this and jump to the research discussion, skip ahead to the 34:30 mark of the episode.

Scaling Ambition
#12 David Hunter of Optimal Labs on Revolutionising Human Nutrition

Scaling Ambition

Play Episode Listen Later May 29, 2018 28:43


David Hunter is the CEO and Co-Founder of Optimal Labs, a company that applies cutting-edge deep reinforcement learning to create intelligent autopilots for farms, improving the efficiency, reliability and quality of food production. High-tech greenhouses can produce 10-40x the yield of traditional farming, and they’re universally scalable and deployable anywhere. And the implications of AI-controlled farms are potentially huge: with higher quality food, produced faster and more reliably, people will be able to live better for longer. David is one of the famous cases of someone who did EF twice. He was on EF4, and after he met his Co-Founder Joao on EF7 the idea for Optimal Labs became a reality; before they knew it, they were following farmers around greenhouses in Holland and building solutions to their problems. He studied Aeronautical Engineering at Imperial and trained as a pilot in the Royal Air Force before going into the banking industry, where he eventually ended up running Deutsche Bank’s quantitative strategies team, whose algorithms traded more than 5% of the European stock market volume. After going through EF4 he took some time off to earn a PhD in Deep Reinforcement Learning at Oxford before returning to join EF7.
In this episode David and I discuss:
- Why he left his well-paid job at Deutsche Bank to start a company
- What he learned from doing EF and how he applied this knowledge the second time around
- The challenges he’s overcome along the way and his tips on hiring and seeking advice
This was a fascinating conversation that will not only give you an insight into the future of nutrition but will also show you the resilience and continuous learning that entrepreneurs have to apply in their lives.

This Week in Machine Learning & Artificial Intelligence (AI) Podcast
Learning Active Learning with Ksenia Konyushkova - TWiML Talk #116

This Week in Machine Learning & Artificial Intelligence (AI) Podcast

Play Episode Listen Later Mar 5, 2018 33:01


In this episode, I speak with Ksenia Konyushkova, Ph.D. student in the CVLab at Ecole Polytechnique Federale de Lausanne in Switzerland. Ksenia and I connected at NIPS in December to discuss her interesting research into ways we might apply machine learning to ease the challenge of creating labeled datasets for machine learning. The first paper we discuss is “Learning Active Learning from Data,” which suggests a data-driven approach to active learning that trains a secondary model to identify the unlabeled data points which, when labeled, would likely have the greatest impact on our primary model’s performance. We also discuss her paper “Learning Intelligent Dialogs for Bounding Box Annotation,” in which she trains an agent to guide the actions of a human annotator to more quickly produce bounding boxes.

TWiML Online Meetup Update
Join us Tuesday, March 13th for the March edition of the Online Meetup! Sean Devlin will be doing an in-depth review of reinforcement learning and presenting the Google DeepMind paper, "Playing Atari with Deep Reinforcement Learning." Head over to twimlai.com/meetup to learn more or register.

Conference Update
Be sure to check out some of the great names that will be at the AI Conference in New York, Apr 29–May 2, where you'll join the leading minds in AI, Peter Norvig, George Church, Olga Russakovsky, Manuela Veloso, and Zoubin Ghahramani. Explore AI's latest developments, separate what's hype and what's really game-changing, and learn how to apply AI in your organization right now. Save 20% on most passes with discount code PCTWIML. Early price ends February 2!

The notes for this show can be found at https://twimlai.com/talk/116.

T3chfest
Introduction to Deep Reinforcement Learning - Jorge Del Val

T3chfest

Play Episode Listen Later Feb 27, 2018 43:21


If you want to watch the video with slides: https://www.youtube.com/watch?v=FDm500K20dU Jorge del Val Santos (BEEVA) Reinforcement learning is an area of machine learning and artificial intelligence that deals with agents that learn and adapt dynamically to an uncertain environment based on their experience. Have you heard about the recent successes of Google DeepMind? Programs that automatically learn to play Atari using only the pixels, or that beat the world Go champion several times. Reinforcement learning sits at the frontier of applied mathematics and artificial intelligence, and is an extremely active and deep field of research. In this talk we will briefly review, in an accessible way, the mathematical and algorithmic foundations needed to understand how and why these techniques work. We will also look at some implementations and examples in Python and briefly discuss function approximation by means of deep neural networks.

QUT Institute for Future Environments
Deep Reinforcement Learning In The Real World - Raia Hadsell (Google DeepMind)

QUT Institute for Future Environments

Play Episode Listen Later Jan 14, 2018 50:21



IFE Distinguished Visitor Lecture, recorded 10 August 2017 at QUT

NLP Highlights
37 - On Statistical Significance, Training Variance, and Why Reporting Score Distributions Matters

NLP Highlights

Play Episode Listen Later Oct 24, 2017 12:47


In this episode we talk about a couple of recent papers that get at the issue of training variance, and why we should not just take the max from a training distribution when reporting results. Sadly, our current focus on performance in leaderboards only exacerbates these issues, and (in my opinion) encourages bad science.
Papers:
- https://www.semanticscholar.org/paper/Reporting-Score-Distributions-Makes-a-Difference-P-Reimers-Gurevych/0eae432f7edacb262f3434ecdb2af707b5b06481
- https://www.semanticscholar.org/paper/Deep-Reinforcement-Learning-that-Matters-Henderson-Islam/90dad036ab47d683080c6be63b00415492b48506

Learning Machines 101
LM101-044: What happened at the Deep Reinforcement Learning Tutorial at the 2015 Neural Information Processing Systems Conference?

Learning Machines 101

Play Episode Listen Later Jan 25, 2016 31:38


This is the third of a short subsequence of podcasts providing a summary of events associated with Dr. Golden’s recent visit to the 2015 Neural Information Processing Systems Conference. This is one of the top conferences in the field of Machine Learning. This episode reviews and discusses topics associated with the Introduction to Reinforcement Learning with Function Approximation Tutorial presented by Professor Richard Sutton on the first day of the conference. Check out: www.learningmachines101.com to learn more!! Also follow us at: "lm101talk" on twitter!