Podcasts about Reinforcement learning

  • 325 PODCASTS
  • 726 EPISODES
  • 46m AVG DURATION
  • 5 WEEKLY NEW EPISODES
  • Nov 20, 2025 LATEST

POPULARITY (2017–2024)


Best podcasts about Reinforcement learning

Latest podcast episodes about Reinforcement learning

Female TechTalk
Your micro-public sphere and the power behind it

Female TechTalk

Nov 20, 2025 • 41:35


What actually happens in our feeds, and why do we all see something different? In the new episode of Female TechTalk we talk about how our public sphere is shifting ever further into the digital realm, and why social media has become one of the most important public spaces of our democracy. We explain how reinforcement learning works in the background, why algorithms act like little agents testing every second what keeps us hooked, and what role state transitions and transition probabilities play in all of this. We also look at how to cope with the flood of information, and why in the end we all live in our own micro-public sphere anyway. And yes: this episode even introduces new holidays, established personally by FTT. So tune in. If you see your feed with different eyes afterwards, we accept no responsibility; that was probably the agent.
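To make the episode's framing concrete, here is a minimal tabular Q-learning sketch of a feed recommender, where the state is the category the user just saw and showing the next item is the state transition the hosts mention. The categories, engagement probabilities, and hyperparameters are invented for illustration; they are not from the episode.

    import random

    # Toy feed: states are the category the user just saw,
    # actions are the category the recommender shows next.
    CATEGORIES = ["news", "memes", "sports"]

    # Hypothetical engagement probabilities for showing `a` right after `s`.
    ENGAGEMENT = {
        ("news", "news"): 0.3, ("news", "memes"): 0.7, ("news", "sports"): 0.4,
        ("memes", "news"): 0.2, ("memes", "memes"): 0.8, ("memes", "sports"): 0.3,
        ("sports", "news"): 0.3, ("sports", "memes"): 0.6, ("sports", "sports"): 0.5,
    }

    Q = {(s, a): 0.0 for s in CATEGORIES for a in CATEGORIES}
    alpha, gamma, epsilon = 0.1, 0.9, 0.1

    state = random.choice(CATEGORIES)
    for _ in range(10_000):
        # Epsilon-greedy: mostly exploit the best-known next category, sometimes explore.
        if random.random() < epsilon:
            action = random.choice(CATEGORIES)
        else:
            action = max(CATEGORIES, key=lambda a: Q[(state, a)])
        # Reward 1 if the simulated user engages; the shown item becomes
        # the next state -- the "state transition" in the episode's terms.
        reward = 1.0 if random.random() < ENGAGEMENT[(state, action)] else 0.0
        next_best = max(Q[(action, a)] for a in CATEGORIES)
        Q[(state, action)] += alpha * (reward + gamma * next_best - Q[(state, action)])
        state = action

    print(max(Q, key=Q.get))  # the (just-saw, show-next) pair the agent ends up favoring

Run long enough, the agent drifts toward whatever maximizes engagement, which is exactly the dynamic the episode warns about.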

Crazy Wisdom
Episode #506: How AI Turns Podcasts into Knowledge Engines

Crazy Wisdom

Nov 14, 2025 • 49:38


In this episode of Crazy Wisdom, host Stewart Alsop talks with Kevin Smith, co-founder of Snipd, about how AI is reshaping the way we listen, learn, and interact with podcasts. They explore Snipd's vision of transforming podcasts into living knowledge systems, the evolution of machine learning from finance to large language models, and the broader connection between AI, robotics, and energy as the foundation for the next technological era. Kevin also touches on ideas like the bitter lesson, reinforcement learning, and the growing energy demands of AI. Listeners can try Snipd's premium version free for a month using this promo link. Check out this GPT we trained on the conversation.

Timestamps
00:00 – Stewart Alsop welcomes Kevin Smith, co-founder of Snipd, to discuss AI, podcasting, and curiosity-driven learning.
05:00 – Kevin explains Snipd's snipping feature, chatting with episodes, and future plans for voice interaction with podcasts.
10:00 – They discuss vector search, embeddings, and context windows, comparing full-episode context to chunked transcripts.
15:00 – Kevin shares his background in mathematics and economics, his shift from finance to machine learning, and early startup work in AI.
20:00 – They explore early quant models versus modern machine learning, statistical modeling, and data limitations in finance.
25:00 – Conversation turns to transformer models, pretraining, and the bitter lesson: how compute-based methods outperform human-crafted systems.
30:00 – Stewart connects this to RLHF, Scale AI, and data scarcity; Kevin reflects on reinforcement learning's future.
35:00 – They pivot to Snipd's podcast ecosystem, hidden gems like Founders Podcast, and how stories shape entrepreneurial insight.
40:00 – ETH Zurich, robotics, and startup culture come up, linking academia to real-world innovation.
45:00 – They close on AI, robotics, and energy as the pillars of the future, debating nuclear and solar power's role in sustaining progress.

Key Insights
• Podcasts as dynamic knowledge systems: Kevin Smith presents Snipd as an AI-powered tool that transforms podcasts into interactive learning environments. By allowing listeners to "snip" and summarize meaningful moments, Snipd turns passive listening into active knowledge management, bridging curiosity, memory, and technology in a way that reframes podcasts as living knowledge capsules rather than static media.
• AI transforming how we engage with information: The discussion highlights how AI enables entirely new modes of interaction, such as chatting directly with podcast episodes, asking follow-up questions, and contextualizing information across an author's full body of work. This evolution points toward a future where knowledge consumption becomes conversational and personalized rather than linear and one-size-fits-all.
• Vectorization and context windows matter: Kevin explains that Snipd currently avoids heavy use of vector databases, opting instead to feed entire episodes into large models. This choice enhances coherence and comprehension, reflecting how advances in context windows have reshaped how AI understands complex audio content.
• Machine learning's roots in finance shaped early AI thinking: Kevin's journey from quantitative finance to AI reveals how statistical modeling laid the groundwork for modern learning systems. While finance once relied on rigid, theory-based models, the machine learning paradigm replaced those priors with flexible, data-driven discovery, an essential philosophical shift in how intelligence is approached.
• The Bitter Lesson and the rise of compute: Together they unpack Richard Sutton's "bitter lesson," the idea that methods leveraging computation and data inevitably surpass those built from human intuition. This insight serves as a compass for understanding why transformers, pretraining, and scaling have driven recent AI breakthroughs.
• Reinforcement learning and data scarcity define AI's next phase: Stewart links RLHF and the work of companies like Scale AI and Surge AI to the broader question of data limits. Kevin agrees that the next wave of AI will depend on reinforcement learning and simulated environments that generate new, high-quality data beyond what humans can label.
• The future hinges on AI, robotics, and energy: Kevin closes with a framework for the next decade: AI provides intelligence, robotics applies it to the physical world, and energy sustains it all. He warns that society must shift from fearing energy use to innovating in production, especially through nuclear and solar power, to meet the demands of an increasingly intelligent, interconnected world.
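The "vectorization and context windows" insight lends itself to a tiny sketch of the trade-off. This is not Snipd's actual pipeline; the 4-characters-per-token estimate, chunk size, and keyword scoring (a crude stand-in for embedding search) are all illustrative assumptions.

    def build_prompt(transcript: str, question: str, context_limit_tokens: int = 128_000) -> str:
        # Rough heuristic: ~4 characters per token.
        if len(transcript) / 4 <= context_limit_tokens:
            # Full-episode context: the model sees everything, so answers can
            # use long-range structure instead of isolated passages.
            return f"Transcript:\n{transcript}\n\nQuestion: {question}"
        # Otherwise fall back to retrieval over fixed-size chunks.
        chunks = [transcript[i:i + 2000] for i in range(0, len(transcript), 2000)]
        terms = question.lower().split()
        scored = sorted(chunks, key=lambda c: sum(c.lower().count(t) for t in terms), reverse=True)
        excerpts = "\n---\n".join(scored[:5])
        return f"Excerpts:\n{excerpts}\n\nQuestion: {question}"

    print(build_prompt("Kevin: we feed whole episodes into the model...", "What does Snipd do?")[:60])

The design point Kevin makes is the first branch: when the context window is big enough, skipping retrieval entirely preserves coherence across the whole episode.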

Vida com IA
#136- Reinforcement Learning.

Vida com IA

Nov 13, 2025 • 15:59


Hey folks, in this episode I talk about the foundations of reinforcement learning, one of the most important areas in AI today, since it is the basis of LLM post-training!

The 50% discount coupon for the course for the first 2 months, valid until 30/11, is: BLACK50
Here is the link to the sales page, to learn more about me and about the course: https://www.cursovidacomia.com.br/
Here is the sign-up link: https://pay.hotmart.com/W98240617U
WhatsApp group link: https://chat.whatsapp.com/GNLhf8aCurbHQc9ayX5oCP
Podcast Instagram: https://www.instagram.com/podcast.lifewithai
My LinkedIn: https://www.linkedin.com/in/filipe-lauar/

Outgrow's Marketer of the Month
Snippet: Beyond Language Models: Why Reinforcement Learning Still Matters: Martin Riedmiller, Research Scientist & Controls Team Lead, Google DeepMind

Outgrow's Marketer of the Month

Nov 12, 2025 • 1:31


ExplAInable
A real robot in a few lines of Python, with MAIK-Education

ExplAInable

Nov 11, 2025 • 42:21


In this episode Mike interviews Tamir, who talked about the app his company, maik-education.com, is developing. It is a unique web app: a reinforcement learning environment that can be run physically with real robots anyone can build at home or at the office. In the environment you can create agents and define their behaviors in code, or with a deep model that can be trained to maximize some reward function. Once the project runs and works virtually, you can connect each agent to a robot over Bluetooth (kits are available for this), set up a camera to capture the robot arena, and then everything you programmed or trained in simulation happens in the physical world. In the episode Tamir showed projects such as robots that arrange themselves into a triangle, a (football) robot trying to reach a line while another robot tries to block it (AI vs AI), a robot reaching a target point without hitting an obstacle, or alternatively passing through a point that earns it a partial reward, and more. The environment lets anyone build a creative robotics project for learning purposes and fun.
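As a concrete illustration of the reward shaping described above (reach a target, avoid an obstacle, optionally collect a partial reward at a waypoint), here is a minimal sketch; the coordinates, radii, and weights are invented for illustration and are not from MAIK's actual environment.

    import math

    def distance(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])

    # Hypothetical arena layout (positions in arbitrary units).
    GOAL, OBSTACLE, WAYPOINT = (10.0, 10.0), (5.0, 5.0), (3.0, 8.0)

    def reward(robot_pos, visited_waypoint):
        r = -0.01  # small step penalty so the agent prefers short paths
        if distance(robot_pos, OBSTACLE) < 1.0:
            r -= 1.0  # collision penalty
        if not visited_waypoint and distance(robot_pos, WAYPOINT) < 0.5:
            r += 0.3  # partial reward for passing through the waypoint
        if distance(robot_pos, GOAL) < 0.5:
            r += 1.0  # terminal reward for reaching the target
        return r

    print(reward((3.1, 8.2), visited_waypoint=False))  # waypoint bonus minus step cost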

TalkRL: The Reinforcement Learning Podcast
Danijar Hafner on Dreamer v4

TalkRL: The Reinforcement Learning Podcast

Nov 10, 2025 • 100:52 • Transcription Available


Danijar Hafner was a Research Scientist at Google DeepMind until recently.

Featured References
• Training Agents Inside of Scalable World Models [blog], Danijar Hafner, Wilson Yan, Timothy Lillicrap
• One Step Diffusion via Shortcut Models, Kevin Frans, Danijar Hafner, Sergey Levine, Pieter Abbeel
• Action and Perception as Divergence Minimization [blog], Danijar Hafner, Pedro A. Ortega, Jimmy Ba, Thomas Parr, Karl Friston, Nicolas Heess

Additional References
• Mastering Diverse Domains through World Models [blog] (DreamerV3), Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, Timothy Lillicrap
• Mastering Atari with Discrete World Models [blog] (DreamerV2), Danijar Hafner, Timothy Lillicrap, Mohammad Norouzi, Jimmy Ba
• Dream to Control: Learning Behaviors by Latent Imagination [blog] (Dreamer), Danijar Hafner, Timothy Lillicrap, Jimmy Ba, Mohammad Norouzi
• Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos [blog post], Baker et al.

This Week in Google (MP3)
IM 844: Poob Has It For You - Spiky Superintelligence vs. Generality

This Week in Google (MP3)

Nov 6, 2025 • 163:50


Is today's AI stuck as a "spiky superintelligence," brilliant at some things but clueless at others? This episode pulls back the curtain on a lunchroom full of AI researchers trading theories, strong opinions, and the next big risks on the path to real AGI.

Stories covered:
• Why "Everyone Dies" Gets AGI All Wrong
• The Nonprofit Feeding the Entire Internet to AI Companies
• Google's First AI Ad Avoids the Uncanny Valley by Casting a Turkey
• Coca-Cola Is Trying Another AI Holiday Ad. Executives Say This Time Is Different
• Sam Altman shuts down question about how OpenAI can commit to spending $1.4 trillion while earning billions: 'Enough'
• How OpenAI Uses Complex and Circular Deals to Fuel Its Multibillion-Dollar Rise
• Perplexity's new AI tool aims to simplify patent research
• Kids Turn Podcast Comments Into Secret Chat Rooms, Because Of Course They Do
• Amazon and Perplexity have kicked off the great AI web browser fight
• Neural network finds an enzyme that can break down polyurethane
• Dictionary.com names 6-7 as 2025's word of the year
• Tech companies don't care that students use their AI agents to cheat
• The Morning After: Musk talks flying Teslas on Joe Rogan's show
• The Hatred of Podcasting | Brace Belden
• TikTok announces its first awards show in the US
• Google wants to build solar-powered data centers — in space
• Anthropic Projects $70 Billion in Revenue, $17 Billion in Cash Flow in 2028
• American Museum of Tort Law
• Dog Chapel - Dog Mountain
• Nicvember masterlist
• Pornhub says UK visitors down 77% since age checks came in

Hosts: Leo Laporte, Jeff Jarvis, and Paris Martineau
Guest: Jeremy Berman

Download or subscribe to Intelligent Machines at https://twit.tv/shows/intelligent-machines. Join Club TWiT for Ad-Free Podcasts! Support what you love and get ad-free shows, a members-only Discord, and behind-the-scenes access. Join today: https://twit.tv/clubtwit

Sponsors: threatlocker.com/twit agntcy.org spaceship.com/twit monarch.com with code IM

The Twenty Minute VC: Venture Capital | Startup Funding | The Pitch
20VC: Cohere's Chief Scientist on Why Scaling Laws Will Continue | Whether You Can Buy Success in AI with Talent Acquisitions | The Future of Synthetic Data & What It Means for Models | Why AI Coding is Akin to Image Generation in 2015 with Joelle Pineau

The Twenty Minute VC: Venture Capital | Startup Funding | The Pitch

Nov 3, 2025 • 57:34


Joelle Pineau is the Chief Scientist at Cohere, where she leads research on advancing large language models and practical AI systems. Before joining Cohere, she was VP of AI Research at Meta, where she founded and led Meta AI's Montreal lab. A professor at McGill University, Joelle is renowned for her pioneering work in reinforcement learning, robotics, and responsible AI development.

AGENDA:
00:00 Introduction to AI Scaling Laws
03:00 How Meta Shaped How I Think About AI Research
04:36 Challenges in Reinforcement Learning
10:00 Is It Possible to Be Capital Efficient in AI?
15:52 AI in Enterprise: Efficiency and Adoption
22:15 Security Concerns with AI Agents
28:34 Can Zuck Win by Buying the Galacticos of AI?
32:15 The Rising Cost of Data
35:28 Synthetic Data and Model Degradation
37:22 Why AI Coding Is Akin to Image Generation in 2015
48:46 If Joelle Was a VC, Where Would She Invest?
52:17 Quickfire: Lessons from Zuck, Biggest Mindset Shift

Cambrian Fintech with Rex Salisbury
Why AI will NEVER Replace Your Sales Job! - Stevie Case CRO @Vanta

Cambrian Fintech with Rex Salisbury

Oct 21, 2025 • 46:49


My Fintech Newsletter for more interviews and the latest insights: https://rexsalisbury.substack.com/

In this episode, I sit down with Stevie Case from Vanta, a former pro gamer turned chief revenue officer, to discuss how AI is transforming the entire go-to-market function in B2B SaaS. Stevie shares insights on building agile sales organizations, how AI supercharges human roles rather than replacing them, and the evolving expectations for sales, customer success, and RevOps teams. The conversation covers AI tool adoption, hiring for an AI-native workforce, and why go-to-market roles are among the most exciting in tech today.

Stevie Case: https://www.linkedin.com/in/steviecase/

00:00:00 - AI's Impact on Go-To-Market Functions
00:02:06 - Building Scalable Sales Organizations
00:04:47 - Specialization and Segmentation in Sales
00:06:28 - AI Supercharging Customer Success
00:08:23 - Hiring and Onboarding with AI Support
00:10:07 - Building AI-Driven Products with Customers
00:12:08 - Selling New Products to Existing Customers
00:15:02 - Early Product Adoption and Iteration
00:17:25 - Operating at All Levels in Organizations
00:20:01 - Creating Intense, High-Velocity Teams
00:22:15 - Hiring AI-Native, Curious Builders
00:25:05 - Measuring Success by Team Pride and Feedback
00:26:07 - Developing Agent Platforms
00:28:02 - Monetization and Business Model Evolution
00:30:49 - AI-Enabled Competitive Advantages in Fintech
00:32:31 - Top-Down AI Automation Demand
00:34:11 - Reinforcement Learning in Fraud Detection
00:38:00 - International Go-To-Market Expansion
00:41:33 - Designing Global Sales Footprints
00:45:04 - Resourcing RevOps and Systems Teams

Rex Salisbury LinkedIn: https://www.linkedin.com/in/rexsalisbury
Twitter: https://twitter.com/rexsalisbury
TikTok: https://www.tiktok.com/@rex.salisbury
Instagram: https://www.instagram.com/rexsalisbury/

The top AI news from the past week, every ThursdAI

Hey folks, Alex here. Can you believe it's already the middle of October? This week's show was a special one, not just because of the mind-blowing news, but because we set a new ThursdAI record with four incredible interviews back-to-back! We had Jessica Gallegos from Google DeepMind walking us through the cinematic new features in VEO 3.1. Then we dove deep into the world of Reinforcement Learning with my new colleague Kyle Corbitt from OpenPipe. We got the scoop on Amp's wild new ad-supported free tier from CEO Quinn Slack. And just as we were wrapping up, Swyx (from Latent.Space, now with Cognition!) jumped on to break the news about their blazingly fast SWE-grep models. But the biggest story? An AI model from Google and Yale made a novel scientific discovery about cancer cells that was then validated in a lab. This is it, folks. This is the "let's f*****g go" moment we've been waiting for. So buckle up, because this week was an absolute monster. Let's dive in!

ThursdAI - Recaps of the most high signal AI weekly spaces is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.

Open Source: An AI Model Just Made a Real-World Cancer Discovery

We always start with open source, but this week felt different. This week, open source AI stepped out of the benchmarks and into the biology lab.

Our friends at Qwen kicked things off with new 3B and 8B parameter versions of their Qwen3-VL vision model. It's always great to see powerful models shrink down to sizes that can run on-device. What's wild is that these small models are outperforming last generation's giants, like the 72B Qwen2.5-VL, on a whole suite of benchmarks. The 8B model scores a 33.9 on OSWorld, which is incredible for an on-device agent that can actually see and click things on your screen. For comparison, that's getting close to what we saw from Sonnet 3.7 just a few months ago. The pace is just relentless.

But then, Google dropped a bombshell. A 27-billion parameter Gemma-based model they developed with Yale, called C2S-Scale, generated a completely novel hypothesis about how cancer cells behave. This wasn't a summary of existing research; it was a new idea, something no human scientist had documented before. And here's the kicker: researchers then took that hypothesis into a wet lab, tested it on living cells, and proved it was true.

This is a monumental deal. For years, AI skeptics like Gary Marcus have said that LLMs are just stochastic parrots, that they can't create genuinely new knowledge. This feels like the first, powerful counter-argument. Friend of the pod Dr. Derya Unutmaz has been on the show before saying AI is going to solve cancer, and this is the first real sign that he might be right. The researchers noted this was an "emergent capability of scale," proving once again that as these models get bigger and are trained on more complex data (in this case, turning single-cell RNA sequences into "sentences" for the model to learn from), they unlock completely new abilities. This is AI as a true scientific collaborator. Absolutely incredible.

Big Companies & APIs

The big companies weren't sleeping this week, either. The agentic AI race is heating up, and we're seeing huge updates across the board.

Claude Haiku 4.5: Fast, Cheap Model Rivals Sonnet 4 Accuracy (X, Official blog, X)

First up, Anthropic released Claude Haiku 4.5, and it is a beast. It's a fast, cheap model that's punching way above its weight. On the SWE-bench Verified benchmark for coding, it hit 73.3%, putting it right up there with giants like GPT-5 Codex, but at a fraction of the cost and twice the speed of previous Claude models. Nisten has already been putting it through its paces and loves it for agentic workflows because it just follows instructions without getting opinionated. It seems like Anthropic has specifically tuned this one to be a workhorse for agents, and it absolutely delivers. Also worth noting is the very impressive jump on OSWorld (50.7%), a computer-use benchmark; at this price and speed ($1/$5 per MTok input/output), computer-use agents are going to get much more streamlined and speedy!

ChatGPT will lose restrictions; age-gating enables "adult mode" with new personality features coming (X)

Sam Altman set X on fire with a thread announcing that ChatGPT will start loosening its restrictions. They're planning to roll out an "adult mode" in December for age-verified users, potentially allowing for things like erotica. More importantly, they're bringing back more customizable personalities, trying to recapture some of the magic of GPT-4o that so many people missed. It feels like they're finally ready to treat adults like adults, letting us opt in to R-rated conversations while keeping strong guardrails for minors. This is a welcome change we've been advocating for a while, and it's a notable contrast to the xAI approach I covered last week: opt-in for verified adults, with precautions, versus engagement bait in the form of a flirty animated waifu with engagement mechanics.

Microsoft is making every Windows 11 machine an AI PC with Copilot voice input and agentic powers (Blog, X)

And in breaking news from this morning, Microsoft announced that every Windows 11 machine is becoming an AI PC. They're building a new Copilot agent directly into the OS that can take over and complete tasks for you. The really clever part? It runs in a secure, sandboxed desktop environment that you can watch and interact with. This solves a huge problem with agents that take over your mouse and keyboard, locking you out of your own computer. Now you can give the agent a task and let it run in the background while you keep working. This is going to put agentic AI in front of hundreds of millions of users, and it's a massive step towards making AI a true collaborator at the OS level.

NVIDIA DGX - the tiny personal supercomputer at $4K (X, LMSYS Blog)

NVIDIA finally delivered their promised AI supercomputer, and the excitement was in the air as Jensen hand-delivered the DGX Spark to OpenAI and Elon (recreating the historic photo of Jensen hand-delivering a signed DGX workstation when Elon was still affiliated with OpenAI). The workstation sold out almost immediately. Folks from LMSYS did a great deep dive into the specs. All the while, folks on our feeds are saying that if you want the maximum possible open-source LLM inference speed, this machine is probably overpriced compared to an M3 Ultra MacBook with 128GB of RAM or the RTX 5090 GPU, which can get you similar if not better speeds at significantly lower price points.

Anthropic's "Claude Skills": Your AI Agent Finally Gets a Playbook (Blog)

Just when we thought the week couldn't get any more packed, Anthropic dropped "Claude Skills," a huge upgrade that lets you give your agent custom instructions and workflows. Think of them as expertise folders you can create for specific tasks. For example, you can teach Claude your personal coding style, how to format reports for your company, or even give it a script to follow for complex data analysis.

The best part is that Claude automatically detects which "Skill" is needed for a given task, so you don't have to manually load them. This is a massive step towards making agents more reliable and personalized, moving beyond a single custom instruction to a library of repeatable, expert processes. It's available now for all paid users, and it's a feature I've been waiting for. Our friend Simon Willison thinks Skills may be a bigger deal than MCPs!

The MAD Podcast with Matt Turck
How GPT-5 Thinks — OpenAI VP of Research Jerry Tworek

The MAD Podcast with Matt Turck

Oct 16, 2025 • 76:04


What does it really mean when GPT-5 "thinks"? In this conversation, OpenAI's VP of Research Jerry Tworek explains how modern reasoning models work in practice: why pretraining and reinforcement learning (RL/RLHF) are both essential, what that on-screen "thinking" actually does, and when extra test-time compute helps (or doesn't). We trace the evolution from O1 (a tech demo good at puzzles) to O3 (the tool-use shift) to GPT-5 (Jerry calls it "O3.1-ish"), and talk through verifiers, reward design, and the real trade-offs behind "auto" reasoning modes.

We also go inside OpenAI: how research is organized, why collaboration is unusually transparent, and how the company ships fast without losing rigor. Jerry shares the backstory on competitive-programming results like ICPC, what they signal (and what they don't), and where agents and tool use are genuinely useful today. Finally, we zoom out: could pretraining + RL be the path to AGI? This is the MAD Podcast, AI for the 99%. If you're curious about how these systems actually work (without needing a PhD), this episode is your map to the current AI frontier.

OpenAI
Website - https://openai.com
X/Twitter - https://x.com/OpenAI

Jerry Tworek
LinkedIn - https://www.linkedin.com/in/jerry-tworek-b5b9aa56
X/Twitter - https://x.com/millionint

FIRSTMARK
Website - https://firstmark.com
X/Twitter - https://twitter.com/FirstMarkCap

Matt Turck (Managing Director)
LinkedIn - https://www.linkedin.com/in/turck/
X/Twitter - https://twitter.com/mattturck

(00:00) Intro
(01:01) What Reasoning Actually Means in AI
(02:32) Chain of Thought: Models Thinking in Words
(05:25) How Models Decide Thinking Time
(07:24) Evolution from O1 to O3 to GPT-5
(11:00) Before OpenAI: Growing up in Poland, Dropping out of School, Trading
(20:32) Working on Robotics and Rubik's Cube Solving
(23:02) A Day in the Life: Talking to Researchers
(24:06) How Research Priorities Are Determined
(26:53) Collaboration vs IP Protection at OpenAI
(29:32) Shipping Fast While Doing Deep Research
(31:52) Using OpenAI's Own Tools Daily
(32:43) Pre-Training Plus RL: The Modern AI Stack
(35:10) Reinforcement Learning 101: Training Dogs
(40:17) The Evolution of Deep Reinforcement Learning
(42:09) When GPT-4 Seemed Underwhelming at First
(45:39) How RLHF Made GPT-4 Actually Useful
(48:02) Unsupervised vs Supervised Learning
(49:59) GRPO and How DeepSeek Accelerated US Research
(53:05) What It Takes to Scale Reinforcement Learning
(55:36) Agentic AI and Long-Horizon Thinking
(59:19) Alignment as an RL Problem
(1:01:11) Winning ICPC World Finals Without Specific Training
(1:05:53) Applying RL Beyond Math and Coding
(1:09:15) The Path from Here to AGI
(1:12:23) Pure RL vs Language Models
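The "verifier" idea discussed in the episode reduces to a few lines for checkable domains like math: the reward comes from programmatically verifying a sampled answer rather than from a learned preference model. A minimal sketch of that pattern; the '#### <answer>' convention and the sample strings are illustrative assumptions, not OpenAI's actual format.

    import re

    def verify(completion: str, ground_truth: str) -> float:
        # Reward 1.0 if the model's final answer matches, else 0.0.
        # Assumes (hypothetically) that answers are marked with '#### <answer>'.
        match = re.search(r"####\s*(.+)", completion)
        if match is None:
            return 0.0
        return 1.0 if match.group(1).strip() == ground_truth.strip() else 0.0

    # In an RL loop, sample several completions per problem and reinforce
    # the ones the verifier accepts (the essence of RL with verifiable rewards).
    samples = ["Let x = 3... #### 42", "Guessing. #### 41"]
    print([verify(s, "42") for s in samples])  # [1.0, 0.0]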

Podcast Notes Playlist: Latest Episodes
Dylan Patel - Inside the Trillion-Dollar AI Buildout - [Invest Like the Best, EP.442]

Podcast Notes Playlist: Latest Episodes

Oct 7, 2025


Invest Like the Best: Read the notes at podcastnotes.org. Don't forget to subscribe for free to our newsletter, the top 10 ideas of the week, every Monday.

---------

My guest today is Dylan Patel. Dylan is the founder and CEO of SemiAnalysis. At SemiAnalysis Dylan tracks the semiconductor supply chain and AI infrastructure buildout with unmatched granularity—literally watching data centers get built through satellite imagery and mapping hundreds of billions in capital flows. Our conversation explores the massive industrial buildout powering AI, from the strategic chess game between OpenAI, Nvidia, and Oracle to why we're still in the first innings of post-training and reinforcement learning. Dylan explains infrastructure realities like electrician wages doubling and companies using diesel truck engines for emergency power, while making a sobering case about US-China competition and why America needs AI to succeed. We discuss his framework for where value will accrue in the stack, why traditional SaaS economics are breaking down under AI's high cost of goods sold, and which hardware bottlenecks matter most. This is one of the most comprehensive views of the physical reality underlying the AI revolution you'll hear anywhere. Please enjoy my conversation with Dylan Patel.

For the full show notes, transcript, and links to mentioned content, check out the episode page here.

-----

This episode is brought to you by Ramp. Ramp's mission is to help companies manage their spend in a way that reduces expenses and frees up time for teams to work on more valuable projects. Go to Ramp.com/invest to sign up for free and get a $250 welcome bonus.

This episode is brought to you by Ridgeline. Ridgeline has built a complete, real-time, modern operating system for investment managers. It handles trading, portfolio management, compliance, customer reporting, and much more through an all-in-one real-time cloud platform. Head to ridgelineapps.com to learn more about the platform.

This episode is brought to you by AlphaSense. AlphaSense has completely transformed the research process with cutting-edge AI technology and a vast collection of top-tier, reliable business content. Invest Like the Best listeners can get a free trial now at Alpha-Sense.com/Invest and experience firsthand how AlphaSense and Tegus help you make smarter decisions faster.

-----

Editing and post-production work for this episode was provided by The Podcast Consultant (https://thepodcastconsultant.com).

Show Notes: (00:00:00) Welcome to Invest Like the Best (00:05:12) The AI Infrastructure Buildout (00:08:25) Scaling AI Models and Compute Needs (00:11:44) Reinforcement Learning and AI Training (00:14:07) The Future of AI and Compute (00:17:47) AI in Practical Applications (00:22:29) The Importance of Data and Environments in AI Training (00:29:45) Human Analogies in AI Development (00:40:34) The Challenge of Infinite Context in AI Models (00:44:08) The Bullish and Bearish Perspectives on AI (00:48:25) The Talent Wars in AI Research (00:56:54) The Power Dynamics in AI and Tech (01:13:29) The Future of AI and Its Economic Impact (01:18:55) The Gigawatt Data Center Boom (01:21:12) Supply Chain and Workforce Dynamics (01:24:23) US vs. China: AI and Power Dynamics (01:37:16) AI Startups and Innovations (01:52:44) The Changing Economics of Software (01:58:12) The Kindest Thing

Podcast Notes Playlist: Business
Dylan Patel - Inside the Trillion-Dollar AI Buildout - [Invest Like the Best, EP.442]

Podcast Notes Playlist: Business

Oct 7, 2025 • 118:15


Invest Like the Best Key Takeaways:

• Today, the challenge is not to make the model bigger; the problem is knowing how best to generate and create data in useful domains so that the model gets better at them
• AIs do not have to get to digital god mode for AI to have an enormous impact on productivity and society: even if AI does not become smarter than humans in the short term, the economic value creation boom will still be enormous
• "If we didn't have the AI boom, the US probably would be behind China and no longer the world hegemon by the end of the decade, if not sooner." – Dylan Patel
• The US is doing what China has done historically: dumping tons of capital into something, and then the market becomes
• If there is a sustained lag in model improvement, the US economy will go into a recession; this is the case for Korea and Taiwan, too
• On the AI talent wars: if these companies are willing to spend billions on training runs, it makes sense to spend a lot on talent to optimize those runs and potentially mitigate errors
• We actually are not dedicating that much power to AI yet; only 3-4% of total power is going to data centers
• He is more optimistic on Anthropic than OpenAI; their revenue is accelerating much faster because of their focus on the $2 trillion software market, whereas OpenAI's focus is split between many things
• While Meta "has the cards to potentially own it all", Google is better positioned to dominate the consumer and professional markets

Read the full notes @ podcastnotes.org

My guest today is Dylan Patel. Dylan is the founder and CEO of SemiAnalysis. At SemiAnalysis Dylan tracks the semiconductor supply chain and AI infrastructure buildout with unmatched granularity—literally watching data centers get built through satellite imagery and mapping hundreds of billions in capital flows. Our conversation explores the massive industrial buildout powering AI, from the strategic chess game between OpenAI, Nvidia, and Oracle to why we're still in the first innings of post-training and reinforcement learning. Dylan explains infrastructure realities like electrician wages doubling and companies using diesel truck engines for emergency power, while making a sobering case about US-China competition and why America needs AI to succeed. We discuss his framework for where value will accrue in the stack, why traditional SaaS economics are breaking down under AI's high cost of goods sold, and which hardware bottlenecks matter most. This is one of the most comprehensive views of the physical reality underlying the AI revolution you'll hear anywhere. Please enjoy my conversation with Dylan Patel.

For the full show notes, transcript, and links to mentioned content, check out the episode page here.

-----

This episode is brought to you by Ramp. Ramp's mission is to help companies manage their spend in a way that reduces expenses and frees up time for teams to work on more valuable projects. Go to Ramp.com/invest to sign up for free and get a $250 welcome bonus.

This episode is brought to you by Ridgeline. Ridgeline has built a complete, real-time, modern operating system for investment managers. It handles trading, portfolio management, compliance, customer reporting, and much more through an all-in-one real-time cloud platform. Head to ridgelineapps.com to learn more about the platform.

This episode is brought to you by AlphaSense. AlphaSense has completely transformed the research process with cutting-edge AI technology and a vast collection of top-tier, reliable business content. Invest Like the Best listeners can get a free trial now at Alpha-Sense.com/Invest and experience firsthand how AlphaSense and Tegus help you make smarter decisions faster.

-----

Editing and post-production work for this episode was provided by The Podcast Consultant (https://thepodcastconsultant.com).

Show Notes: (00:00:00) Welcome to Invest Like the Best (00:05:12) The AI Infrastructure Buildout (00:08:25) Scaling AI Models and Compute Needs (00:11:44) Reinforcement Learning and AI Training (00:14:07) The Future of AI and Compute (00:17:47) AI in Practical Applications (00:22:29) The Importance of Data and Environments in AI Training (00:29:45) Human Analogies in AI Development (00:40:34) The Challenge of Infinite Context in AI Models (00:44:08) The Bullish and Bearish Perspectives on AI (00:48:25) The Talent Wars in AI Research (00:56:54) The Power Dynamics in AI and Tech (01:13:29) The Future of AI and Its Economic Impact (01:18:55) The Gigawatt Data Center Boom (01:21:12) Supply Chain and Workforce Dynamics (01:24:23) US vs. China: AI and Power Dynamics (01:37:16) AI Startups and Innovations (01:52:44) The Changing Economics of Software (01:58:12) The Kindest Thing

Invest Like the Best with Patrick O'Shaughnessy
Dylan Patel - Inside the Trillion-Dollar AI Buildout - [Invest Like the Best, EP.442]

Invest Like the Best with Patrick O'Shaughnessy

Sep 30, 2025 • 118:15


My guest today is Dylan Patel. Dylan is the founder and CEO of SemiAnalysis. At SemiAnalysis Dylan tracks the semiconductor supply chain and AI infrastructure buildout with unmatched granularity—literally watching data centers get built through satellite imagery and mapping hundreds of billions in capital flows. Our conversation explores the massive industrial buildout powering AI, from the strategic chess game between OpenAI, Nvidia, and Oracle to why we're still in the first innings of post-training and reinforcement learning. Dylan explains infrastructure realities like electrician wages doubling and companies using diesel truck engines for emergency power, while making a sobering case about US-China competition and why America needs AI to succeed. We discuss his framework for where value will accrue in the stack, why traditional SaaS economics are breaking down under AI's high cost of goods sold, and which hardware bottlenecks matter most. This is one of the most comprehensive views of the physical reality underlying the AI revolution you'll hear anywhere. Please enjoy my conversation with Dylan Patel.

For the full show notes, transcript, and links to mentioned content, check out the episode page here.

-----

This episode is brought to you by Ramp. Ramp's mission is to help companies manage their spend in a way that reduces expenses and frees up time for teams to work on more valuable projects. Go to Ramp.com/invest to sign up for free and get a $250 welcome bonus.

This episode is brought to you by Ridgeline. Ridgeline has built a complete, real-time, modern operating system for investment managers. It handles trading, portfolio management, compliance, customer reporting, and much more through an all-in-one real-time cloud platform. Head to ridgelineapps.com to learn more about the platform.

This episode is brought to you by AlphaSense. AlphaSense has completely transformed the research process with cutting-edge AI technology and a vast collection of top-tier, reliable business content. Invest Like the Best listeners can get a free trial now at Alpha-Sense.com/Invest and experience firsthand how AlphaSense and Tegus help you make smarter decisions faster.

-----

Editing and post-production work for this episode was provided by The Podcast Consultant (https://thepodcastconsultant.com).

Show Notes: (00:00:00) Welcome to Invest Like the Best (00:05:12) The AI Infrastructure Buildout (00:08:25) Scaling AI Models and Compute Needs (00:11:44) Reinforcement Learning and AI Training (00:14:07) The Future of AI and Compute (00:17:47) AI in Practical Applications (00:22:29) The Importance of Data and Environments in AI Training (00:29:45) Human Analogies in AI Development (00:40:34) The Challenge of Infinite Context in AI Models (00:44:08) The Bullish and Bearish Perspectives on AI (00:48:25) The Talent Wars in AI Research (00:56:54) The Power Dynamics in AI and Tech (01:13:29) The Future of AI and Its Economic Impact (01:18:55) The Gigawatt Data Center Boom (01:21:12) Supply Chain and Workforce Dynamics (01:24:23) US vs. China: AI and Power Dynamics (01:37:16) AI Startups and Innovations (01:52:44) The Changing Economics of Software (01:58:12) The Kindest Thing

Eye On A.I.
#289 Eiso Kant: How Reinforcement Learning and Coding Could Unlock Human-Level AI

Eye On A.I.

Sep 24, 2025 • 54:06


How do we get from today's AI copilots to true human-level intelligence? In this episode of Eye on AI, Craig Smith sits down with Eiso Kant, Co-Founder of Poolside, to explore why reinforcement learning + software development might be the fastest path to human-level AI.

Eiso shares Poolside's mission to build AI that doesn't just autocomplete code, but learns like a real developer. You'll hear how Poolside uses reinforcement learning from code execution (RLCF), why software development is the perfect training ground for intelligence, and how agentic AI systems are about to transform the way we build and ship software. If you want to understand the future of AI, software engineering, and AGI, this conversation is packed with insights you won't want to miss.

Stay Updated:
Craig Smith on X: https://x.com/craigss
Eye on A.I. on X: https://x.com/EyeOn_AI

(00:00) The Missing Ingredient for Human-Level AI
(01:02) Eiso Kant's Journey
(05:30) Using Software Development to Reach AGI
(07:48) Why Coding Is the Perfect Training Ground for Intelligence
(10:11) Reinforcement Learning from Code Execution (RLCF) Explained
(13:14) How Poolside Builds and Trains Its Foundation Models
(17:35) The Rise of Agentic AI
(21:08) Making Software Creation Accessible to Everyone
(26:03) Overcoming Model Limitations
(32:08) Training Models to Think
(37:24) Building the Future of AI Agents
(42:11) Poolside's Full-Stack Approach to AI Deployment
(46:28) Enterprise Partnerships, Security & Customization Behind the Firewall
(50:48) Giving Enterprises Transparency to Drive Adoption
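The core of "reinforcement learning from code execution" can be sketched generically: run the model's generated code against tests and turn the pass/fail outcome into a reward. This is an illustration of the idea, not Poolside's implementation; real systems sandbox execution far more carefully than this subprocess timeout.

    import subprocess, sys, tempfile

    def execution_reward(candidate_code: str, test_code: str) -> float:
        # Run the model's code plus its tests in a subprocess; reward 1.0 on success.
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
            f.write(candidate_code + "\n" + test_code)
            path = f.name
        try:
            result = subprocess.run([sys.executable, path], timeout=10, capture_output=True)
            return 1.0 if result.returncode == 0 else 0.0
        except subprocess.TimeoutExpired:
            return 0.0

    candidate = "def add(a, b):\n    return a + b"
    tests = "assert add(2, 2) == 4"
    print(execution_reward(candidate, tests))  # 1.0 -> reinforce this sample

Unlike human preference labels, this reward signal is cheap and objective, which is why coding is such an attractive training ground for RL.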

Hub Dialogues
How Alberta could lead the AI revolution

Hub Dialogues

Sep 18, 2025 • 47:27


Artificial intelligence could fuel Alberta's next big tech boom. Three leaders in the field—Cam Linke, CEO of Amii; Nicole Janssen, co-founder of AltaML; and Danielle Gifford, managing director of AI with PwC—dig into how AI is transforming everything from energy to healthcare and even space. They share why Edmonton is a world leader in reinforcement learning, and why Alberta's natural advantages could make it a global hub for data centres and AI commercialization.

This podcast is generously supported by Don Archibald. The Hub thanks him for his ongoing support.

The Hub is Canada's fastest-growing independent digital news outlet. Subscribe to our YouTube channel to get our latest videos: https://www.youtube.com/@TheHubCanada

Subscribe to The Hub's podcast feed to get our best content when you are on the go:
https://tinyurl.com/3a7zpd7e (Apple)
https://tinyurl.com/y8akmfn7 (Spotify)

Want more Hub? Get a FREE 3-month trial membership on us: https://thehub.ca/free-trial/

Follow The Hub on X: https://x.com/thehubcanada?lang=en

CREDITS:
Falice Chin - Producer and Editor
Ryan Hastman - Host
Amal Attar-Guzman - Sound and Video Assistant

To contact us, sign up for updates, and access transcripts, email support@thehub.ca

Getting2Alpha
Hansohl Kim: What Is Reinforcement Learning?

Getting2Alpha

Sep 16, 2025 • 40:01 • Transcription Available


Hansohl Kim is an engineer at Anthropic, where he focuses on reinforcement learning & AI safety for models like Claude. With experience spanning computer science, biotech, & machine learning, he brings a unique perspective to the fast-changing world of artificial intelligence. Listen as Hansohl unpacks the challenges of alignment, the importance of guardrails, & what it takes to design AI systems we can truly trust.

TalkRL: The Reinforcement Learning Podcast
David Abel on the Science of Agency @ RLDM 2025

TalkRL: The Reinforcement Learning Podcast

Play Episode Listen Later Sep 8, 2025 59:42 Transcription Available


David Abel is a Senior Research Scientist at DeepMind on the Agency team, and an Honorary Fellow at the University of Edinburgh. His research blends computer science and philosophy, exploring foundational questions about reinforcement learning, definitions, and the nature of agency.

Featured References
Plasticity as the Mirror of Empowerment - David Abel, Michael Bowling, André Barreto, Will Dabney, Shi Dong, Steven Hansen, Anna Harutyunyan, Khimya Khetarpal, Clare Lyle, Razvan Pascanu, Georgios Piliouras, Doina Precup, Jonathan Richens, Mark Rowland, Tom Schaul, Satinder Singh
A Definition of Continual RL - David Abel, André Barreto, Benjamin Van Roy, Doina Precup, Hado van Hasselt, Satinder Singh
Agency is Frame-Dependent - David Abel, André Barreto, Michael Bowling, Will Dabney, Shi Dong, Steven Hansen, Anna Harutyunyan, Khimya Khetarpal, Clare Lyle, Razvan Pascanu, Georgios Piliouras, Doina Precup, Jonathan Richens, Mark Rowland, Tom Schaul, Satinder Singh
On the Expressivity of Markov Reward - David Abel, Will Dabney, Anna Harutyunyan, Mark Ho, Michael Littman, Doina Precup, Satinder Singh (Outstanding Paper Award, NeurIPS 2021)

Additional References
Bidirectional Communication Theory - Marko, 1973
Causality, Feedback and Directed Information - Massey, 1990
The Big World Hypothesis - Javed et al., 2024
Loss of plasticity in deep continual learning - Dohare et al., 2024
Three Dogmas of Reinforcement Learning - Abel, 2024
Explaining dopamine through prediction errors and beyond - Gershman et al., 2024
David Abel Google Scholar
David Abel personal website

TalkRL: The Reinforcement Learning Podcast
Jake Beck, Alex Goldie, & Cornelius Braun on Sutton's OaK, Metalearning, LLMs, Squirrels @ RLC 2025

TalkRL: The Reinforcement Learning Podcast

Play Episode Listen Later Aug 19, 2025 12:20 Transcription Available


Recorded at Reinforcement Learning Conference 2025 at the University of Alberta, Edmonton, Alberta, Canada.

Featured References
Lecture on the OaK Architecture - Rich Sutton
Alberta Plan - Rich Sutton with Mike Bowling and Patrick Pilarski

Additional References
Jacob Beck on Google Scholar
Alex Goldie on Google Scholar
Cornelius Braun on Google Scholar
Reinforcement Learning Conference

TalkRL: The Reinforcement Learning Podcast
Outstanding Paper Award Winners - 2/2 @ RLC 2025

TalkRL: The Reinforcement Learning Podcast

Play Episode Listen Later Aug 18, 2025 14:18 Transcription Available


We caught up with the RLC Outstanding Paper award winners for your listening pleasure. Recorded on location at Reinforcement Learning Conference 2025, at the University of Alberta, in Edmonton, Alberta, Canada in August 2025.

Featured References
Empirical Reinforcement Learning Research: Mitigating Suboptimality of Deterministic Policy Gradients in Complex Q-functions - Ayush Jain, Norio Kosaka, Xinhu Li, Kyung-Min Kim, Erdem Biyik, Joseph J Lim
Applications of Reinforcement Learning: WOFOSTGym: A Crop Simulator for Learning Annual and Perennial Crop Management Strategies - William Solow, Sandhya Saisubramanian, Alan Fern
Emerging Topics in Reinforcement Learning: Towards Improving Reward Design in RL: A Reward Alignment Metric for RL Practitioners - Calarina Muslimani, Kerrick Johnstonbaugh, Suyog Chandramouli, Serena Booth, W. Bradley Knox, Matthew E. Taylor
Scientific Understanding in Reinforcement Learning: Multi-Task Reinforcement Learning Enables Parameter Scaling - Reginald McLean, Evangelos Chatzaroulas, J K Terry, Isaac Woungang, Nariman Farsad, Pablo Samuel Castro

TalkRL: The Reinforcement Learning Podcast
Outstanding Paper Award Winners - 1/2 @ RLC 2025

TalkRL: The Reinforcement Learning Podcast

Play Episode Listen Later Aug 15, 2025 6:46 Transcription Available


We caught up with the RLC Outstanding Paper award winners for your listening pleasure. Recorded on location at Reinforcement Learning Conference 2025, at the University of Alberta, in Edmonton, Alberta, Canada in August 2025.

Featured References
Scientific Understanding in Reinforcement Learning: How Should We Meta-Learn Reinforcement Learning Algorithms? - Alexander David Goldie, Zilin Wang, Jakob Nicolaus Foerster, Shimon Whiteson
Tooling, Environments, and Evaluation for Reinforcement Learning: Syllabus: Portable Curricula for Reinforcement Learning Agents - Ryan Sullivan, Ryan Pégoud, Ameen Ur Rehman, Xinchen Yang, Junyun Huang, Aayush Verma, Nistha Mitra, John P Dickerson
Resourcefulness in Reinforcement Learning: PufferLib 2.0: Reinforcement Learning at 1M steps/s - Joseph Suarez
Theory of Reinforcement Learning: Deep Reinforcement Learning with Gradient Eligibility Traces - Esraa Elelimy, Brett Daley, Andrew Patterson, Marlos C. Machado, Adam White, Martha White

TalkRL: The Reinforcement Learning Podcast
Thomas Akam on Model-based RL in the Brain

TalkRL: The Reinforcement Learning Podcast

Play Episode Listen Later Aug 4, 2025 52:06 Transcription Available


Prof Thomas Akam is a Neuroscientist at the Oxford University Department of Experimental Psychology. He is a Wellcome Career Development Fellow and Associate Professor at the University of Oxford, and leads the Cognitive Circuits research group.

Featured References
Brain Architecture for Adaptive Behaviour - Thomas Akam, RLDM 2025 Tutorial

Additional References
Thomas Akam on Google Scholar
pyPhotometry: Open source, Python based, fiber photometry data acquisition
pyControl: Open source, Python based, behavioural experiment control
Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control - Nathaniel D Daw, Yael Niv, Peter Dayan, 2005
Further analysis of the hippocampal amnesic syndrome: 14-year follow-up study of H. M. - Milner, B., Corkin, S., & Teuber, H. L., 1968
Internally generated cell assembly sequences in the rat hippocampus - Pastalkova E, Itskov V, Amarasingham A, Buzsáki G., Science, 2008
Multi-disciplinary Conference on Reinforcement Learning and Decision Making 2025

Analyse Asia with Bernard Leong
How Microsoft Research Balances Exploration and Impact Globally with Doug Burger

Analyse Asia with Bernard Leong

Play Episode Listen Later Aug 3, 2025 43:46


"If you're going to be running a very elite research institution, you have to have the best people. To have the best people, you have to trust them and empower them. You can't hire a world expert in some area and then tell them what to do. They know more than you do. They're smarter than you are in their area. So you've got to trust your people. One of our really foundational commitments to our people is: we trust you. We're going to work to empower you. Go do the thing that you need to do. If somebody in the labs wants to spend 5, 10, 15 years working on something they think is really important, they're empowered to do that." - Doug Burger Fresh out of the studio, Doug Burger, Technical Fellow and Corporate Vice President at Microsoft Research, joins us to explore Microsoft's bold expansion into Southeast Asia with the recent launch of the Microsoft Research Asia lab in Singapore. From there, Doug shares his accidental journey from academia to leading global research operations, reflecting on how Microsoft Research's open collaboration model empowers over thousands of researchers worldwide to tackle humanity's biggest challenges. Following on, he highlights the recent breakthroughs from Microsoft Research for example, the quantum computing breakthrough with topological qubits, the evolution from lines of code to natural language programming, and how AI is accelerating innovation across multiple scaling dimensions beyond traditional data limits. Addressing the intersection of three computing paradigms—logic, probability, and quantum—he emphasizes that geographic diversity in research labs enables Microsoft to build AI that works for everyone, not just one region. Closing the conversation, Doug shares his vision of what great looks like for Microsoft Research with researchers driven by purpose and passion to create breakthroughs that advance both science and society. 
Episode Highlights:
[00:00] Quote of the Day by Doug Burger
[01:08] Doug Burger's journey from academia to Microsoft Research
[02:24] Career advice: Always seek challenges, move when feeling restless or comfortable
[03:07] Launch of Microsoft Research Asia in Singapore: Tapping local talent and culture for inclusive AI development
[04:13] Singapore lab focuses on foundational AI, embodied AI, and healthcare applications
[06:19] AI detecting seizures in children and assessing Parkinson's motor function
[08:24] Embedding Southeast Asian societal norms and values into foundational AI research
[10:26] Microsoft Research's open collaboration model
[12:42] Generative AI's rapid pace accelerating technological innovation and research tools
[14:36] AI revolutionizing computer architecture by creating completely new interfaces
[16:24] Open versus closed source AI models debate and Microsoft's platform approach
[18:08] Reasoning models enabling formal verification and correctness guarantees in AI
[19:35] Multiple scaling dimensions in AI beyond traditional data scaling laws
[21:01] Project Catapult and Brainwave: Building configurable hardware acceleration platforms
[23:29] Microsoft's 17-year quantum computing journey with topological qubits breakthrough
[26:26] Balancing blue-sky foundational research with application-driven initiatives at scale
[29:16] Three computing paradigms: logic, probability (AI), and quantum superposition
[32:26] Microsoft Research's exploration-to-exploitation playbook for breakthrough discoveries
[35:26] Research leadership secret: Curiosity across fields enables unexpected connections
[37:11] Hidden mathematical structures in the Transformer architecture of LLMs
[40:04] Microsoft Research's vision: Becoming Bell Labs for the AI era
[42:22] Steering AI models for mental health and critical thinking conversations

Profile: Doug Burger, Technical Fellow and Corporate Vice President, Microsoft Research
LinkedIn: https://www.linkedin.com/in/dcburger/
Microsoft Research Profile: https://www.microsoft.com/en-us/research/people/dburger/

Podcast Information: Bernard Leong hosts and produces the show. The proper credits for the intro and end music are "Energetic Sports Drive." G. Thomas Craig mixed and edited the episode in both video and audio format. Here are the links to watch or listen to our podcast.
Analyse Asia Main Site: https://analyse.asia
Analyse Asia Spotify: https://open.spotify.com/show/1kkRwzRZa4JCICr2vm0vGl
Analyse Asia Apple Podcasts: https://podcasts.apple.com/us/podcast/analyse-asia-with-bernard-leong/id914868245
Analyse Asia YouTube: https://www.youtube.com/@AnalyseAsia
Analyse Asia LinkedIn: https://www.linkedin.com/company/analyse-asia/
Analyse Asia X (formerly known as Twitter): https://twitter.com/analyseasia
Analyse Asia Threads: https://www.threads.net/@analyseasia
Sign Up for Our This Week in Asia Newsletter: https://www.analyse.asia/#/portal/signup
Subscribe Newsletter on LinkedIn: https://www.linkedin.com/build-relation/newsletter-follow?entityUrn=7149559878934540288

GOTO - Today, Tomorrow and the Future
Prompt Engineering for Generative AI • James Phoenix, Mike Taylor & Phil Winder

GOTO - Today, Tomorrow and the Future

Play Episode Listen Later Aug 1, 2025 53:33 Transcription Available


This interview was recorded for the GOTO Book Club.
http://gotopia.tech/bookclub
Read the full transcription of the interview here

James Phoenix - Co-Author of "Prompt Engineering for Generative AI"
Mike Taylor - Co-Author of "Prompt Engineering for Generative AI"
Phil Winder - Author of "Reinforcement Learning" & CEO of Winder.AI

RESOURCES
James: https://x.com/jamesaphoenix12 | https://www.linkedin.com/in/jamesphoenix | https://understandingdata.com
Mike: http://saxifrage.xyz | https://twitter.com/hammer_mt | https://www.linkedin.com/in/mjt145
Phil: https://twitter.com/DrPhilWinder | https://linkedin.com/in/drphilwinder | https://winder.ai

Links:
https://brightpool.dev
https://karpathy.ai
https://help.openai.com/en/articles/6654000
https://gemini.google.com
https://dreambooth.github.io
https://github.com/microsoft/LoRA
https://claude.ai
https://www.langchain.com/langgraph

DESCRIPTION
Large language models (LLMs) and diffusion models such as ChatGPT and Stable Diffusion have unprecedented potential. Because they have been trained on all the public text and images on the internet, they can make useful contributions to a wide variety of tasks. And with the barrier to entry greatly reduced today, practically any developer can harness LLMs and diffusion models to tackle problems previously unsuitable for automation. With this book, you'll gain a solid foundation in generative AI, including how to apply these models in practice. When first integrating LLMs and diffusion models into their workflows, most developers struggle to coax reliable enough results from them to use in automated systems.
* Book description: © O'Reilly

RECOMMENDED BOOKS
James Phoenix & Mike Taylor • Prompt Engineering for Generative AI
Phil Winder • Reinforcement Learning

CHANNEL MEMBERSHIP BONUS
Join this channel to get early access to videos & other perks:
https://www.youtube.com/channel/UCs_tLP3AiwYKwdUHpltJPuA/join

Looking for a unique learning experience? Attend the next GOTO conference near you! Get your ticket: gotopia.tech

SUBSCRIBE TO OUR YOUTUBE CHANNEL - new videos posted daily!

Not Another Politics Podcast
MechaHitler and The Political Bias of AI Chatbots

Not Another Politics Podcast

Play Episode Listen Later Jul 24, 2025 57:28


When you ask ChatGPT or Gemini a question about politics, whose opinions are you really hearing? In this episode, we dive into a provocative new study from political scientist Justin Grimmer and his colleagues, which finds that nearly every major large language model—from ChatGPT to Grok—is perceived by Americans as having a left-leaning bias. But why is that? Is it the training data? The guardrails? The Silicon Valley engineers? Or something deeper about the culture of the internet itself?

The hosts grapple with everything from "Mecha Hitler" incidents on Grok to the way terms like "unhoused" sneak into AI-generated text—and what that might mean for students, voters, and future regulation. Should the government step in to ensure "political neutrality"? Will AI reshape how people learn about history or policy? Or are we just projecting our own echo chambers onto machines?

TalkRL: The Reinforcement Learning Podcast
Stefano Albrecht on Multi-Agent RL @ RLDM 2025

TalkRL: The Reinforcement Learning Podcast

Play Episode Listen Later Jul 22, 2025 31:34 Transcription Available


Stefano V. Albrecht was previously Associate Professor at the University of Edinburgh, and is currently serving as Director of AI at startup Deepflow. He is a Program Chair of RLDM 2025 and is co-author of the MIT Press textbook "Multi-Agent Reinforcement Learning: Foundations and Modern Approaches".

Featured References
Multi-Agent Reinforcement Learning: Foundations and Modern Approaches - Stefano V. Albrecht, Filippos Christianos, Lukas Schäfer. MIT Press, 2024
RLDM 2025: Reinforcement Learning and Decision Making Conference - Dublin, Ireland
EPyMARL: Extended Python MARL framework - https://github.com/uoe-agents/epymarl
Benchmarking Multi-Agent Deep Reinforcement Learning Algorithms in Cooperative Tasks - Georgios Papoudakis, Filippos Christianos, Lukas Schäfer, Stefano V. Albrecht

The MAD Podcast with Matt Turck
Ex‑DeepMind Researcher Misha Laskin on Enterprise Super‑Intelligence | Reflection AI

The MAD Podcast with Matt Turck

Play Episode Listen Later Jul 17, 2025 66:29


What if your company had a digital brain that never forgot, always knew the answer, and could instantly tap the knowledge of your best engineers, even after they left? Superintelligence can feel like a hand-wavy pipe-dream, yet, as Misha Laskin argues, it becomes a tractable engineering problem once you scope it to the enterprise level. Former DeepMind researcher Laskin is betting on an oracle-like AI that grasps every repo, Jira ticket and hallway aside as deeply as your principal engineer, and he's building it at Reflection AI.

In this wide-ranging conversation, Misha explains why coding is the fastest on-ramp to superintelligence, how "organizational" beats "general" when real work is on the line, and why today's retrieval-augmented generation (RAG) feels like "exploring a jungle with a flashlight." He walks us through Asimov, Reflection's newly unveiled code-research agent that fuses long-context search, team-wide memory and multi-agent planning so developers spend less time spelunking for context and more time shipping.

We also rewind his unlikely journey—from physics prodigy in a Manhattan-Project desert town, to Berkeley's AI crucible, to leading RLHF for Google Gemini—before he left big-lab comfort to chase a sharper vision of enterprise superintelligence. Along the way: the four breakthroughs that unlocked modern AI, why capital efficiency still matters in the GPU arms-race, and how small teams can lure top talent away from nine-figure offers.

If you're curious about the next phase of AI agents, the future of developer tooling, or the gritty realities of scaling a frontier-level startup, this episode is your blueprint.

Reflection AI
Website - https://reflection.ai
LinkedIn - https://www.linkedin.com/company/reflectionai

Misha Laskin
LinkedIn - https://www.linkedin.com/in/mishalaskin
X/Twitter - https://x.com/mishalaskin

FIRSTMARK
Website - https://firstmark.com
X/Twitter - https://twitter.com/FirstMarkCap

Matt Turck (Managing Director)
LinkedIn - https://www.linkedin.com/in/turck/
X/Twitter - https://twitter.com/mattturck

(00:00) Intro
(01:42) Reflection AI: Company Origins and Mission
(04:14) Making Superintelligence Concrete
(06:04) Superintelligence vs. AGI: Why the Goalposts Moved
(07:55) Organizational Superintelligence as an Oracle
(12:05) Coding as the Shortcut: Hands, Legs & Brain for AI
(16:00) Building the Context Engine
(20:55) Capturing Tribal Knowledge in Organizations
(26:31) Introducing Asimov: A Deep Code Research Agent
(28:44) Team-Wide Memory: Preserving Institutional Knowledge
(33:07) Multi-Agent Design for Deep Code Understanding
(34:48) Data Retrieval and Integration in Asimov
(38:13) Enterprise-Ready: VPC and On-Prem Deployments
(39:41) Reinforcement Learning in Asimov's Development
(41:04) Misha's Journey: From Physics to AI
(42:06) Growing Up in a Science-Driven Desert Town
(53:03) Building General Agents at DeepMind
(56:57) Founding Reflection AI After DeepMind
(58:54) Product-Driven Superintelligence: Why It Matters
(01:02:22) The State of Autonomous Coding Agents
(01:04:26) What's Next for Reflection AI

KI in der Industrie
How to schedule the shopfloor with AI

KI in der Industrie

Play Episode Listen Later Jul 9, 2025 38:17 Transcription Available


Rico Knapper is the CEO of Pailot and loves PCB shopfloors. Why? Because his AI-based solution outperforms other approaches. We met him at Siemens.

alphalist.CTO Podcast - For CTOs and Technical Leaders
#124 - The Path to AGI: Inside poolside's AI Model Factory for Code with Eiso Kant

alphalist.CTO Podcast - For CTOs and Technical Leaders

Play Episode Listen Later Jun 27, 2025 63:56 Transcription Available


How do you build a foundation model that can write code at a human level? Eiso Kant (CTO & co-founder, Poolside) reveals the technical architecture, distributed team strategies, and reinforcement learning breakthroughs powering one of Europe's most ambitious AI startups. Learn how Poolside operates 10,000+ H200s, runs the world's largest code execution RL environment, and why CTOs must rethink engineering orgs for an agent-driven future.

TalkRL: The Reinforcement Learning Podcast
Satinder Singh: The Origin Story of RLDM @ RLDM 2025

TalkRL: The Reinforcement Learning Podcast

Play Episode Listen Later Jun 25, 2025 5:57 Transcription Available


Professor Satinder Singh of Google DeepMind and U of Michigan is co-founder of RLDM. Here he narrates the origin story of the Reinforcement Learning and Decision Making meeting (not conference). Recorded on location at Trinity College Dublin, Ireland during RLDM 2025.

Featured References
RLDM 2025: Multi-disciplinary Conference on Reinforcement Learning and Decision Making (RLDM), June 11-14, 2025 at Trinity College Dublin, Ireland
Satinder Singh on Google Scholar

Father Fessio in Five (by Ignatius Press)
109: Reinforcement Learning—The Final Step of Making A.I.

Father Fessio in Five (by Ignatius Press)

Play Episode Listen Later Jun 20, 2025 10:01


The final step of making A.I. involves giving the system some questions whose answers we know and others whose answers we do not, then checking its responses against reality. Fr. Fessio explains how A.I. ultimately depends entirely on humans and therefore cannot self-replicate.

Eye On A.I.
#261 Jonathan Frankle: How Databricks is Disrupting AI Model Training

Eye On A.I.

Play Episode Listen Later Jun 12, 2025 52:47


This episode is sponsored by Oracle. OCI is the next-generation cloud designed for every workload – where you can run any application, including any AI projects, faster and more securely for less. On average, OCI costs 50% less for compute, 70% less for storage, and 80% less for networking. Join Modal, Skydance Animation, and today's innovative AI tech companies who upgraded to OCI…and saved.   Try OCI for free at http://oracle.com/eyeonai   What if you could fine-tune an AI model without any labeled data—and still outperform traditional training methods?   In this episode of Eye on AI, we sit down with Jonathan Frankle, Chief Scientist at Databricks and co-founder of MosaicML, to explore TAO (Test-time Adaptive Optimization)—Databricks' breakthrough tuning method that's transforming how enterprises build and scale large language models (LLMs).   Jonathan explains how TAO uses reinforcement learning and synthetic data to train models without the need for expensive, time-consuming annotation. We dive into how TAO compares to supervised fine-tuning, why Databricks built their own reward model (DBRM), and how this system allows for continual improvement, lower inference costs, and faster enterprise AI deployment.   Whether you're an AI researcher, enterprise leader, or someone curious about the future of model customization, this episode will change how you think about training and deploying AI.   Explore the latest breakthroughs in data and AI from Databricks: https://www.databricks.com/events/dataaisummit-2025-announcements Stay Updated: Craig Smith on X: https://x.com/craigss Eye on A.I. on X: https://x.com/EyeOn_AI  

Crazy Wisdom
Episode #465: Proof of Aliveness: A Cryptographic Theater of the Real

Crazy Wisdom

Play Episode Listen Later Jun 2, 2025 61:11


I, Stewart Alsop, am thrilled to welcome Xathil of Poliebotics to this episode of Crazy Wisdom, for what is actually our second take, this time with a visual surprise involving a fascinating 3D-printed Bauta mask. Xathil is doing some truly groundbreaking work at the intersection of physical reality, cryptography, and AI, which we dive deep into, exploring everything from the philosophical implications of anonymity to the technical wizardry behind his "Truth Beam."

Check out this GPT we trained on the conversation

Timestamps
01:35 Xathil explains the 3D-printed Bauta Mask, its Venetian origins, and its role in enabling truth through anonymity via his project, Poliepals.
04:50 The crucial distinction between public identity and "real" identity, and how pseudonyms can foster truth-telling rather than just conceal.
10:15 Addressing the serious risks faced by crypto influencers due to public displays of wealth and the broader implications for online identity.
15:05 Xathil details the core Poliebotics technology: the "Truth Beam," a projector-camera system for cryptographically timestamping physical reality.
18:50 Clarifying the concept of "proof of aliveness"—verifying a person is currently live in a video call—versus the more complex "proof of liveness."
21:45 How the speed of light provides a fundamental advantage for Poliebotics in outmaneuvering AI-generated deepfakes.
32:10 The concern of an "inversion," where machine learning systems could become dominant over physical reality by using humans as their actuators.
45:00 Xathil's ambitious project to use Poliebotics for creating cryptographically verifiable records of biodiversity, beginning with an enhanced Malaise trap.

Key Insights
Anonymity as a Truth Catalyst: Drawing from Oscar Wilde, the Bauta mask symbolizes how anonymity or pseudonyms can empower individuals to reveal deeper, more authentic truths. This challenges the notion that masks only serve to hide, suggesting they can be tools for genuine self-expression.
The Bifurcation of Identity: In our digital age, distinguishing between one's core "real" identity and various public-facing personas is increasingly vital. This separation isn't merely about concealment but offers a space for truthful expression while navigating public life.
The Truth Beam: Anchoring Reality: Poliebotics' "Truth Beam" technology employs a projector-camera system to cast cryptographic hashes onto physical scenes, recording them and anchoring them to a blockchain. This aims to create immutable, verifiable records of reality to combat the rise of sophisticated deepfakes.
Harnessing Light Speed Against Deepfakes: The fundamental defense Poliebotics offers against AI-generated fakes is the speed of light. Real-world light reflection for capturing projected hashes is virtually instantaneous, whereas an AI must simulate this complex process, a task too slow to keep up with real-time verification.
The Specter of Humans as AI Actuators: A significant future concern is the "inversion," where AI systems might utilize humans as unwitting agents to achieve their objectives in the physical world. By manipulating incentives, AIs could effectively direct human actions, raising profound questions about agency.
Towards AI Symbiosis: The ideal future isn't a human-AI war or complete technological asceticism, but a cooperative coexistence between nature, humanity, and artificial systems. This involves developing AI responsibly, instilling human values, and creating systems that are non-threatening and beneficial.

Contact Information
*   Poliebotics' GitHub
*   Poliepals
*   Xathil: Xathil@ProtonMail.com
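The hash-chaining idea behind the Truth Beam can be pictured with a toy loop. This is a speculative sketch only: the projector and camera steps are stubs, the chaining scheme is an assumption, and none of it reflects Poliebotics' actual protocol.

```python
import hashlib
import time

def chained_hash(prev_hash: str, frame_bytes: bytes) -> str:
    """Bind a captured frame to the running hash chain (SHA-256)."""
    return hashlib.sha256(prev_hash.encode() + frame_bytes).hexdigest()

prev = hashlib.sha256(b"genesis").hexdigest()
for i in range(3):
    pattern = prev[:12]  # stand-in for the pattern a projector would cast onto the scene
    # Stub "camera capture": a real system records the physical scene showing the pattern.
    frame = f"frame-{i}-at-{time.time():.0f}-pattern-{pattern}".encode()
    prev = chained_hash(prev, frame)
    print(i, prev[:16], "...")  # each hash commits to every earlier frame
# Anchoring the final `prev` to a public blockchain would timestamp the whole chain.
```

The point of the chain is that faking any one frame after the fact would require recomputing every subsequent hash before the anchored timestamp, which the speed-of-light argument above makes impractical in real time.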

Crazy Wisdom
Episode #464: From Meme Coins to Mind Melds: Crypto Meets AI

Crazy Wisdom

Play Episode Listen Later May 26, 2025 48:22


I, Stewart Alsop, had a fascinating conversation on this episode of Crazy Wisdom with Mallory McGee, the founder of Chroma, who is doing some really interesting work at the intersection of AI and crypto. We dove deep into how these two powerful technologies might reshape the internet and our interactions with it, moving beyond the hype cycles to what's truly foundational.

Check out this GPT we trained on the conversation

Timestamps
00:00 The Intersection of AI and Crypto
01:28 Bitcoin's Origins and Austrian Economics
04:35 AI's Centralization Problem and the New Gatekeepers
09:58 Agent Interactions and Decentralized Databases for Trustless Transactions
11:11 AI as a Prosthetic Mind and the Interpretability Challenge
15:12 Deterministic Blockchains vs. Non-Deterministic AI Intents
18:44 The Demise of Traditional Apps in an Agent-Driven World
35:07 Property Rights, Agent Registries, and Blockchains as Backends

Key Insights
Crypto's Enduring Fundamentals: Mallory emphasized that while crypto prices are often noise, the underlying fundamentals point to a new, long-term cycle for the Internet itself. It's about decentralizing control, a core principle stemming from Bitcoin's original blend of economics and technology.
AI's Centralization Dilemma: We discussed the concerning trend of AI development consolidating power within a few major players. This, as Mallory pointed out, ironically mirrors the very centralization crypto aims to dismantle, potentially shifting control from governments to a new set of tech monopolies.
Agents are the Future of Interaction: Mallory envisions a future where most digital interactions aren't human-to-LLM, but agent-to-agent. These autonomous agents will require decentralized, trustless platforms like blockchains to transact, hold assets, and communicate confidentially.
Bridging Non-Deterministic AI with Deterministic Blockchains: A fascinating challenge Mallory highlighted is translating the non-deterministic "intents" of AI (e.g., an agent's goal to "get me a good return on spare cash") into the deterministic transactions required by blockchains. This translation layer is crucial for agents to operate effectively on-chain.
The Decline of Traditional Apps: Mallory made a bold claim that traditional apps and web interfaces are on their way out. As AI agents become capable of generating personalized interfaces on the fly, the need for standardized, pre-built apps will diminish, leading to a world where software is hyper-personalized and often ephemeral.
Blockchains as Agent Backbones: We explored the intriguing idea that blockchains might be inherently better suited for AI agents than for direct human use. Their deterministic nature, ability to handle assets, and potential for trustless reputation systems make them ideal backends for an agent-centric internet.
Trust and Reputation for Agents: In a world teeming with AI agents, establishing trust is paramount. Mallory suggested that on-chain mechanisms like reward and slashing systems can be used to build verifiable reputation scores for agents, helping us discern trustworthy actors from malicious ones without central oversight.
The Battle for an Open AI Future: The age-old battle between open and closed source is playing out again in the AI sphere. While centralized players currently seem to dominate, Mallory sees hope in the open-source AI movement, which could provide a crucial alternative to a future controlled by a few large entities.

Contact Information
*   Twitter: @McGee_noodle
*   Company: Chroma

Stanford Legal
AI, Liability, and Hallucinations in a Changing Tech and Law Environment

Stanford Legal

Play Episode Listen Later May 15, 2025 39:31


Since ChatGPT came on the scene, numerous incidents have surfaced involving attorneys submitting court filings riddled with AI-generated hallucinations—plausible-sounding case citations that purport to support key legal propositions but are, in fact, entirely fictitious. As sanctions against attorneys mount, it seems clear there are a few kinks in the tech. Even AI tools designed specifically for lawyers can be prone to hallucinations. In this episode, we look at the potential and risks of AI-assisted tech in law and policy with two Stanford Law researchers at the forefront of this issue: RegLab Director Professor Daniel Ho and JD/PhD student and computer science researcher Mirac Suzgun. Together with several co-authors, they examine the emerging risks in two recent papers, "Profiling Legal Hallucinations in Large Language Models" (Oxford Journal of Legal Analysis, 2024) and the forthcoming "Hallucination-Free?" in the Journal of Empirical Legal Studies. Ho and Suzgun offer new insights into how legal AI is working, where it's failing, and what's at stake.

Links:
Daniel Ho >>> Stanford Law page
Stanford Institute for Human-Centered Artificial Intelligence (HAI) >>> Stanford University page
Regulation, Evaluation, and Governance Lab (RegLab) >>> Stanford University page

Connect:
Episode Transcripts >>> Stanford Legal Podcast Website
Stanford Legal Podcast >>> LinkedIn Page
Rich Ford >>> Twitter/X
Pam Karlan >>> Stanford Law School Page
Stanford Law School >>> Twitter/X
Stanford Lawyer Magazine >>> Twitter/X

(00:00:00) Introduction to AI in Legal Education
(00:05:01) AI Tools in Legal Research and Writing
(00:12:01) Challenges of AI-Generated Content
(00:20:0) Reinforcement Learning with Human Feedback
(00:30:01) Audience Q&A

Let's Talk AI
#208 - Claude Integrations, ChatGPT Sycophancy, Leaderboard Cheats

Let's Talk AI

Play Episode Listen Later May 8, 2025 115:25 Transcription Available


Our 208th episode with a summary and discussion of last week's big AI news! Recorded on 05/02/2025. Hosted by Andrey Kurenkov and Jeremie Harris. Feel free to email us your questions and feedback at contact@lastweekinai.com and/or hello@gladstone.ai. Read out our text newsletter and comment on the podcast at https://lastweekin.ai/. Join our Discord here! https://discord.gg/nTyezGSKwP

In this episode:
OpenAI showcases new integration capabilities in their API, enhancing the performance of LLMs and image generators with updated functionalities and improved user interfaces.
Analysis of OpenAI's preparedness framework reveals updates focusing on biological and chemical risks, cybersecurity, and AI self-improvement, while toning down the emphasis on persuasion capabilities.
Anthropic's research highlights potential security vulnerabilities in AI models, demonstrating various malicious use cases such as influence operations and hacking tool creation.
A detailed examination of AI competition between the US and China reveals China's impending capability to match the US in AI advancement this year, emphasizing the impact of export controls and the importance of geopolitical strategy.

Timestamps + Links:
Tools & Apps
(00:02:57) Anthropic lets users connect more apps to Claude
(00:08:20) OpenAI undoes its glaze-heavy ChatGPT update
(00:15:16) Baidu ERNIE X1 and 4.5 Turbo boast high performance at low cost
(00:19:44) Adobe adds more image generators to its growing AI family
(00:24:35) OpenAI makes its upgraded image generator available to developers
(00:27:01) xAI's Grok chatbot can now 'see' the world around it

Applications & Business:
(00:28:41) Thinking Machines Lab CEO Has Unusual Control in Andreessen-Led Deal
(00:33:36) Chip war heats up: Huawei 910C emerges as China's answer to US export bans
(00:34:21) Huawei to Test New AI Chip
(00:40:17) ByteDance, Alibaba and Tencent stockpile billions worth of Nvidia chips
(00:43:59) Speculation mounts that Musk will raise tens of billions for AI supercomputer with 1 million GPUs: Report

Projects & Open Source:
(00:47:14) Alibaba unveils Qwen 3, a family of 'hybrid' AI reasoning models
(00:54:14) Intellect-2
(01:02:07) BitNet b1.58 2B4T Technical Report
(01:05:33) Meta AI Introduces Perception Encoder: A Large-Scale Vision Encoder that Excels Across Several Vision Tasks for Images and Video

Research & Advancements:
(01:06:42) The Leaderboard Illusion
(01:12:08) Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
(01:18:38) Reinforcement Learning for Reasoning in Large Language Models with One Training Example
(01:24:40) Sleep-time Compute: Beyond Inference Scaling at Test-time

Policy & Safety:
(01:28:23) Every AI Datacenter Is Vulnerable to Chinese Espionage, Report Says
(01:32:27) OpenAI preparedness framework update
(01:38:31) Detecting and Countering Malicious Uses of Claude: March 2025
(01:46:33) Chinese AI Will Match America's

Machine Learning Guide
MLG 034 Large Language Models 1

Machine Learning Guide

Play Episode Listen Later May 7, 2025 50:48


Explains advancements in large language models (LLMs): scaling laws - the relationships among model size, data size, and compute - and how emergent abilities such as in-context learning, multi-step reasoning, and instruction following arise once certain scaling thresholds are crossed. Covers the evolution of the transformer architecture with Mixture of Experts (MoE), the three-phase training process culminating in Reinforcement Learning from Human Feedback (RLHF) for model alignment, and advanced reasoning techniques such as chain-of-thought prompting, which significantly improve complex-task performance.

Links
Notes and resources at ocdevel.com/mlg/mlg34
Build the future of multi-agent software with AGNTCY
Try a walking desk - stay healthy & sharp while you learn & code

Transformer Foundations and Scaling Laws
Transformers: Introduced by the 2017 "Attention is All You Need" paper, transformers allow for parallel training and inference of sequences using self-attention, in contrast to the sequential nature of RNNs.
Scaling Laws: Empirical research revealed that LLM performance improves predictably as model size (parameters), data size (training tokens), and compute are increased together, with diminishing returns if only one variable is scaled disproportionately. The "Chinchilla scaling law" (DeepMind, 2022) established the optimal model/data/compute ratio for efficient model performance: earlier large models like GPT-3 were undertrained relative to their size, whereas right-sized models with more training data (e.g., Chinchilla, LLaMA series) proved more compute- and inference-efficient.

Emergent Abilities in LLMs
Emergence: When trained beyond a certain scale, LLMs display abilities not present in smaller models, including:
- In-Context Learning (ICL): Performing new tasks based solely on prompt examples at inference time.
- Instruction Following: Executing natural language tasks not seen during training.
- Multi-Step Reasoning & Chain of Thought (CoT): Solving arithmetic, logic, or symbolic reasoning by generating intermediate reasoning steps.
Discontinuity & Debate: These abilities appear abruptly in larger models, though recent research suggests this could result from non-linearities in evaluation metrics rather than innate model properties.

Architectural Evolutions: Mixture of Experts (MoE)
MoE Layers: Modern LLMs often replace standard feed-forward layers with MoE structures, composed of many independent "expert" networks specializing in different subdomains or latent structures. A gating network routes tokens to the most relevant experts per input, activating only a subset of parameters - so-called "sparse activation" (a toy gating example follows below). This enables much larger overall models without proportional increases in compute per inference, but requires the entire model in memory and introduces new challenges like load balancing and communication overhead.
Specialization & Efficiency: Experts learn different data/knowledge types, boosting model specialization and throughput, though care is needed to avoid overfitting and underutilization of specialists.

The Three-Phase Training Process
1. Unsupervised Pre-Training: Next-token prediction on massive datasets - builds a foundation model capturing general language patterns.
2. Supervised Fine-Tuning (SFT): Training on labeled prompt-response pairs to teach the model how to perform specific tasks (e.g., question answering, summarization, code generation). Overfitting and "catastrophic forgetting" are risks if not carefully managed.
3. Reinforcement Learning from Human Feedback (RLHF): Collects human preference data by generating multiple responses to prompts and having annotators rank them. A reward model is trained on these rankings, and the LLM is then updated (commonly with PPO) to maximize alignment with human preferences (helpfulness, harmlessness, truthfulness). This introduces complexity and the risk of reward hacking (specification gaming), where the model exploits the reward system in unanticipated ways.

Advanced Reasoning Techniques
Prompt Engineering: The art/science of crafting prompts that elicit better model responses, shown to dramatically affect output quality.
Chain of Thought (CoT) Prompting: Guides models to elaborate step-by-step reasoning before arriving at final answers - demonstrably improves results on complex tasks. Variants include zero-shot CoT ("let's think step by step"), few-shot CoT with worked examples, self-consistency (voting among multiple reasoning chains), and Tree of Thought (explores multiple reasoning branches in parallel).
Automated Reasoning Optimization: Frontier models selectively apply these advanced reasoning techniques, balancing compute costs with gains in accuracy and transparency.

Optimization for Training and Inference
Tradeoffs: The optimal balance between model size, data, and compute is determined not only for pretraining but also for inference efficiency, as lifetime inference costs may exceed initial training costs.
Current Trends: Efficient scaling, model specialization (MoE), careful fine-tuning, RLHF alignment, and automated reasoning techniques define state-of-the-art LLM development.
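To make the gating mechanic concrete, here is a minimal numpy sketch of the top-k routing described above. The dimensions, random linear "experts", and softmax-renormalized gate are toy stand-ins for real feed-forward experts and learned routers.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 16, 8, 2

# Each "expert" is a tiny linear map here; the gate is a linear layer + softmax.
experts = [rng.normal(size=(d_model, d_model)) / np.sqrt(d_model) for _ in range(n_experts)]
gate_w = rng.normal(size=(d_model, n_experts)) / np.sqrt(d_model)

def moe_forward(x):
    """Sparse MoE layer: route the token x through its top-k experts only."""
    logits = x @ gate_w
    top = np.argsort(logits)[-top_k:]                           # k best experts
    weights = np.exp(logits[top]) / np.exp(logits[top]).sum()   # renormalized gate scores
    # Only k of the n_experts matrices are touched: this is the "sparse activation".
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = rng.normal(size=d_model)
print(moe_forward(token).shape)  # (16,)
```

With top_k = 2 of 8 experts, each token pays roughly a quarter of the dense compute while the layer's total parameter count stays eight times larger, which is the tradeoff the episode describes.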

Chain Reaction
Sam Lehman: What the Reinforcement Learning Renaissance Means for Decentralized AI

Chain Reaction

Play Episode Listen Later Apr 30, 2025 68:02


Join Tommy Shaughnessy from Delphi Ventures as he hosts Sam Lehman, Principal at Symbolic Capital and AI researcher, for a deep dive into the Reinforcement Learning (RL) renaissance and its implications for decentralized AI. Sam recently authored a widely discussed post, "The World's RL Gym", exploring the evolution of AI scaling and the exciting potential of decentralized networks for training next-generation models.

The World's RL Gym: https://www.symbolic.capital/writing/the-worlds-rl-gym

BlockHash: Exploring the Blockchain
Ep. 507 Dr. Alexander Long | Decentralizing AI with Pluralis Research

BlockHash: Exploring the Blockchain

Play Episode Listen Later Apr 14, 2025 19:59


For episode 507, Brandon Zemp is joined by the Founder of Pluralis Research, Dr. Alexander Long. He was previously an AI Researcher at Amazon in a team of 14 Deep Learning PhDs. At Amazon, Dr. Long's research focus was retrieval augmentation and sample-efficient adaptation of large multi-modal foundation models. At UNSW, his PhD was on sample-efficient Reinforcement Learning and non-parametric memory in Deep Learning, where he was the School Nominee for the Malcolm Chaikin Prize (UNSW Best Thesis).

Pluralis Research is pioneering Protocol Learning, an alternative to today's closed AI models and economically unsustainable open-source initiatives. Protocol Learning enables collaborative model training by pooling computational resources across multiple participants, while ensuring no single entity can obtain the complete model.
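As a rough illustration of the "no single entity holds the whole model" idea, here is a toy pipeline split across participants. All names, sizes, and the bare numpy forward pass are assumptions for illustration; actual Protocol Learning adds incentives, verification, and training machinery not shown here.

```python
import numpy as np

rng = np.random.default_rng(1)

class Participant:
    """Holds one private shard of the model; only activations ever leave it."""
    def __init__(self, d_in, d_out):
        self.W = rng.normal(size=(d_in, d_out)) / np.sqrt(d_in)  # private weights

    def forward(self, x):
        return np.tanh(x @ self.W)

# The "model" is the composition of shards owned by different participants.
pipeline = [Participant(8, 16), Participant(16, 16), Participant(16, 4)]

x = rng.normal(size=8)
for p in pipeline:       # activations flow between participants; weights never do
    x = p.forward(x)
print(x.shape)           # (4,)
```

The structural point is that each participant contributes compute and sees only its own shard plus passing activations, so reconstructing the full weight set would require collusion across every shard holder.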

This Week in Machine Learning & Artificial Intelligence (AI) Podcast
Teaching LLMs to Self-Reflect with Reinforcement Learning with Maohao Shen - #726

This Week in Machine Learning & Artificial Intelligence (AI) Podcast

Play Episode Listen Later Apr 8, 2025 51:45


Today, we're joined by Maohao Shen, a PhD student at MIT, to discuss his paper, "Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search." We dig into how Satori leverages reinforcement learning to improve language model reasoning—enabling model self-reflection, self-correction, and exploration of alternative solutions. We explore the Chain-of-Action-Thought (COAT) approach, which uses special tokens—continue, reflect, and explore—to guide the model through distinct reasoning actions, allowing it to navigate complex reasoning tasks without external supervision. We also break down Satori's two-stage training process: format tuning, which teaches the model to understand and utilize the special action tokens, and reinforcement learning, which optimizes reasoning through trial-and-error self-improvement. We cover key techniques such as "restart and explore," which allows the model to self-correct and generalize beyond its training domain. Finally, Maohao reviews Satori's performance and how it compares to other models, the reward design, the benchmarks used, and the surprising observations made during the research. The complete show notes for this episode can be found at https://twimlai.com/go/726.
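To picture what the format-tuning stage might consume, here is a toy serialization of a COAT-style trajectory. The token strings and the (action, text) layout are illustrative assumptions based on the episode's description, not the paper's exact scheme.

```python
# Special meta-action tokens guiding distinct reasoning actions (names assumed).
CONTINUE, REFLECT, EXPLORE = "<|continue|>", "<|reflect|>", "<|explore|>"

def serialize(steps):
    """Join (action, text) reasoning steps into one training string."""
    return "".join(f"{action} {text.strip()} " for action, text in steps)

trajectory = [
    (CONTINUE, "48 = 6 x 8, so try the factors 6 and 8."),
    (REFLECT,  "Check: does 6 + 8 give the required sum of 14? Yes."),
    (EXPLORE,  "Alternative: 4 and 12 also multiply to 48 but sum to 16 - reject."),
    (CONTINUE, "Answer: 6 and 8."),
]
print(serialize(trajectory))
```

Format tuning would teach the model to emit sequences like this; the subsequent RL stage then rewards trajectories whose final answers check out, reinforcing useful reflect/explore behavior.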

Sales vs. Marketing
Lessons - Breaking Free From Bad Habits | Dr. Jud Brewer - Neuroscience of Addiction Expert

Sales vs. Marketing

Play Episode Listen Later Mar 25, 2025 13:58


➡️ Like The Podcast? Leave A Rating: https://ratethispodcast.com/successstory

In this "Lessons" episode, Dr. Jud Brewer, Neuroscience of Addiction Expert, reveals the science behind habits and addictions, explaining how our brains form automatic behaviors to conserve energy and how reinforcement learning reinforces unhealthy patterns. By learning to recognize the true rewards of our actions, Dr. Brewer shows us how to transform negative routines into opportunities for healthier change.

➡️ Show Links
https://successstorypodcast.com
YouTube: https://youtu.be/PpI2aFjA9FU
Apple: https://podcasts.apple.com/us/podcast/dr-judson-brewer-neuroscientist-addiction-psychiatrist/id1484783544
Spotify: https://open.spotify.com/episode/531cPamqo4H0Esq6Yp8RQ3

➡️ Watch the Podcast on YouTube
https://www.youtube.com/c/scottdclary

Tabaghe 16 طبقه
172 - Parinaz Sobhani | هوش مصنوعی، سرمایه‌گذاری و آینده‌ای که نزدیک‌تر از فکر ماست

Tabaghe 16 طبقه

Play Episode Listen Later Mar 24, 2025 84:02


Parinaz Sobhani is one of those truly special people in the world of AI: when you hear her professional story, you can't help but be impressed. She is currently Head of AI at the investment firm Sagard, a large international company with over $25 billion in assets that is focusing heavily on the future of investing with the help of AI.

Parinaz holds a PhD in AI from the University of Ottawa and has built up extensive experience over the years in both academia and industry, especially in areas such as natural language processing and deep learning. Earlier in her career she worked at Microsoft Research and the National Research Council Canada, on projects such as machine translation and deep learning models for text processing.

00:00 Introduction
08:01 Falling for machine learning and the AI master's program at Sharif University
11:27 Deep learning and the brain's neural networks
15:32 How traditional computational approaches differ from deep learning in AI
23:03 Evolving terminology: from machine learning to AI and data science
25:44 Other branches of AI you should know
31:43 Reinforcement Learning: challenges and simulation methods
47:23 AI's black box? The limits of human understanding against AI's complexity
57:32 Why she moved into venture capital
1:09:05 Entrepreneurial opportunities in AI and solving existing problems
1:16:44 Explaining AI's potential to non-specialists

Parinaz Sobhani is a distinguished figure in artificial intelligence, currently serving as the Head of AI at Sagard, a global alternative asset management firm with over $25 billion in assets under management. With a Ph.D. in AI from the University of Ottawa, specializing in natural language processing, she has amassed over 15 years of experience in both academic and industry settings.

This episode's sponsors:
Saadat Rent - luxury car rental in Dubai, with no deposit and full insurance, easy and fast. https://www.saadatrent.com?ref_id=Tabaghe16
Limoo Host - web hosting services. https://limoo.host

More about the Tabaghe 16 podcast and links to the audio episodes: https://linktr.ee/tabaghe16

Hosted on Acast. See acast.com/privacy for more information.

Programming Throwdown
180: Reinforcement Learning

Programming Throwdown

Play Episode Listen Later Mar 17, 2025 112:22


Intro topic: Grills

News/Links:
- You can't call yourself a senior until you've worked on a legacy project: https://www.infobip.com/developers/blog/seniors-working-on-a-legacy-project
- Recraft might be the most powerful AI image platform I've ever used — here's why: https://www.tomsguide.com/ai/ai-image-video/recraft-might-be-the-most-powerful-ai-image-platform-ive-ever-used-heres-why
- NASA has a list of 10 rules for software development: https://www.cs.otago.ac.nz/cosc345/resources/nasa-10-rules.htm
- AMD Radeon RX 9070 XT performance estimates leaked: 42% to 66% faster than Radeon RX 7900 GRE: https://www.tomshardware.com/tech-industry/amd-estimates-of-radeon-rx-9070-xt-performance-leaked-42-percent-66-percent-faster-than-radeon-rx-7900-gre

Book of the Show
- Patrick: The Player of Games (Ian M Banks) https://a.co/d/1ZpUhGl (non-affiliate)
- Jason: Basic Roleplaying Universal Game Engine https://amzn.to/3ES4p5i

Patreon Plug: https://www.patreon.com/programmingthrowdown?ty=h

Tool of the Show
- Patrick: Pokemon Sword and Shield
- Jason: Features and Labels ( https://fal.ai )

Topic: Reinforcement Learning (a minimal Q-learning sketch follows these notes)
- Three types of AI: Supervised Learning, Unsupervised Learning, Reinforcement Learning
- Online vs Offline RL
- Optimization algorithms:
  - Value optimization: SARSA, Q-Learning
  - Policy optimization: Policy Gradients, Actor-Critic, Proximal Policy Optimization
- Value vs Policy Optimization:
  - Value optimization is more intuitive (value loss)
  - Policy optimization is less intuitive at first (policy gradients)
  - Converting values to policies in deep learning is difficult
- Imitation Learning: supervised policy learning, often used to bootstrap reinforcement learning
- Policy Evaluation: propensity scoring versus model-based
- Challenges to training RL models:
  - Two optimization loops: collecting feedback vs updating the model
  - Difficult optimization target: policy evaluation
- RLHF & GRPO

★ Support this podcast on Patreon ★
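To ground the value-optimization topics above, here is a minimal tabular Q-learning sketch on a toy chain environment. The environment and hyperparameters are illustrative, and the contrast with SARSA is noted in a comment.

```python
import random

# Tabular Q-learning on a 5-state chain: move left/right, reward 1 at the right end.
n_states, actions = 5, [0, 1]          # 0 = left, 1 = right
Q = [[0.0, 0.0] for _ in range(n_states)]
alpha, gamma, eps = 0.1, 0.9, 0.1      # learning rate, discount, exploration

def step(s, a):
    s2 = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
    r = 1.0 if s2 == n_states - 1 else 0.0
    return s2, r, s2 == n_states - 1   # next state, reward, done

for _ in range(500):                   # episodes
    s, done = 0, False
    while not done:
        # Epsilon-greedy action selection balances exploration and exploitation.
        a = random.choice(actions) if random.random() < eps else max(actions, key=lambda act: Q[s][act])
        s2, r, done = step(s, a)
        # Off-policy target: bootstrap from the best next action. SARSA would
        # instead use the action actually taken in s2 (on-policy).
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

print([round(max(q), 2) for q in Q])   # state values increase toward the goal
```

This is the "value optimization" branch of the outline; the policy-gradient branch would instead parameterize the policy directly and follow the gradient of expected return.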