Podcasts about Reinforcement learning

  • 327 PODCASTS
  • 734 EPISODES
  • 45m AVG DURATION
  • 5 WEEKLY NEW EPISODES
  • Dec 29, 2025 LATEST



Best podcasts about Reinforcement learning

Latest podcast episodes about Reinforcement learning

Crazy Wisdom
Episode #518: Decentralization Without Romance: Incentives, Mesh Networks, and Practical Crypto

Crazy Wisdom

Play Episode Listen Later Dec 29, 2025 69:07


In this episode of the Crazy Wisdom Podcast, host Stewart Alsop sits down with Mike Bakon to explore the fascinating intersection of hardware hacking, blockchain technology, and decentralized systems. Their conversation spans from Mike's childhood fascination with taking apart electronics in 1980s Poland to his current work with ESP32 microcontrollers, LoRa mesh networks, and Cardano blockchain development. They discuss the technical differences between UTXO and account-based blockchains, the challenges of true decentralization versus hybrid systems, and how AI tools are changing the development landscape. Mike shares his vision for incentivizing mesh networks through blockchain technology and explains why he believes mass adoption of decentralized systems will come through abstraction rather than technical education. The discussion also touches on the potential for creating new internet infrastructure using ad hoc mesh networks and the importance of maintaining truly decentralized, permissionless systems in an increasingly surveilled world. You can find Mike on Twitter as @anothervariable. Check out this GPT we trained on the conversation.

Timestamps
00:00 Introduction to Hardware and Early Experiences
02:59 The Evolution of AI in Hardware Development
05:56 Decentralization and Blockchain Technology
09:02 Understanding UTXO vs Account-Based Blockchains
11:59 Smart Contracts and Their Functionality
14:58 The Importance of Decentralization in Blockchain
17:59 The Process of Data Verification in Blockchain
20:48 The Future of Blockchain and Its Applications
34:38 Decentralization and Trustless Systems
37:42 Mainstream Adoption of Blockchain
39:58 The Role of Currency in Blockchain
43:27 Interoperability vs Bridging in Blockchain
47:27 Exploring Mesh Networks and LoRa Technology
01:00:25 The Future of AI and Decentralization

Key Insights
1. Hardware curiosity drives innovation from childhood - Mike's journey into hardware began as a child in 1980s Poland, where he would disassemble toys like battery-powered cars to understand how they worked. This natural curiosity about taking things apart and understanding their inner workings laid the foundation for his later expertise in microcontrollers like the ESP32 and his deep understanding of both hardware and software integration.
2. AI as a research companion, not a replacement for coding - Mike uses AI and LLMs primarily as research tools and coding companions rather than letting them write entire applications. He finds them invaluable for getting quick answers to coding problems, analyzing Git repositories, and avoiding the need to search through Stack Overflow, but he remains uneasy when AI writes whole functions, preferring to understand and write his own code.
3. Blockchain decentralization requires trustless consensus verification - The fundamental difference between blockchain databases and traditional databases lies in the consensus process that data must go through before being recorded. Unlike centralized systems where one entity controls data validation, blockchains require hundreds of nodes to verify each block through trustless consensus mechanisms, ensuring data integrity without relying on any single authority.
4. UTXO vs account-based blockchains have fundamentally different architectures - Cardano uses an extended UTXO model (like Bitcoin but with smart contracts) where transactions consume existing UTXOs and create new ones, keeping the ledger lean. Ethereum uses account-based ledgers that store persistent state, leading to much larger data requirements over time and making it increasingly difficult for individuals to sync and maintain full nodes independently.
5. True interoperability differs fundamentally from bridging - Real blockchain interoperability means being able to send assets directly between different blockchains (like sending ADA to a Bitcoin wallet) without intermediaries. This is possible between UTXO-based chains like Cardano and Bitcoin. Bridges, in contrast, require centralized entities to listen for transactions on one chain and trigger corresponding actions on another, introducing centralization risks.
6. Mesh networks need economic incentives for sustainable infrastructure - While technologies like LoRa and Meshtastic enable impressive decentralized communication networks, the challenge lies in incentivizing people to maintain the hardware infrastructure. Mike sees potential in combining blockchain-based rewards (like earning ADA for running mesh network nodes) with existing decentralized communication protocols to create self-sustaining networks.
7. Mass adoption comes through abstraction, not education - Rather than trying to educate everyone about blockchain technology, mass adoption will happen when developers can build applications on decentralized infrastructure that users interact with seamlessly, without needing to understand the underlying blockchain mechanics. Users should be able to benefit from decentralization through well-designed interfaces that abstract away the complexity of wallets, addresses, and consensus mechanisms.
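The UTXO model contrasted with account-based ledgers in the insights above can be illustrated with a toy Python sketch. All transaction IDs, owners, and amounts here are invented; this is a schematic illustration of the idea, not Cardano's or Bitcoin's actual ledger logic. A transaction consumes existing unspent outputs and creates new ones, so the ledger only ever stores the current UTXO set rather than persistent account state:

```python
# Toy UTXO ledger: transactions consume existing unspent outputs and
# create new ones; only the current UTXO set is ever stored.
# All txids, owners, and amounts are illustrative.

ledger = {("tx0", 0): ("alice", 100)}  # (txid, output index) -> (owner, amount)

def spend(txid, inputs, outputs):
    """Consume the given input UTXOs and create new outputs."""
    total_in = sum(ledger[i][1] for i in inputs)
    total_out = sum(amount for _, amount in outputs)
    assert total_in >= total_out, "inputs must cover outputs"
    for i in inputs:  # spent outputs disappear from the ledger entirely
        del ledger[i]
    for index, (owner, amount) in enumerate(outputs):
        ledger[(txid, index)] = (owner, amount)

# Alice pays Bob 30 and sends 70 back to herself as change.
spend("tx1", [("tx0", 0)], [("bob", 30), ("alice", 70)])
```

After the spend, the original output `("tx0", 0)` is gone and only the two new outputs remain, which is the "lean ledger" property the episode attributes to UTXO chains; an account-based ledger would instead keep a growing map of account balances and contract state.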

The Dr. Jud Podcast
Habit Change Addiction - Reinforcement Learning and Addiction: A Digital Mindfulness Solution

The Dr. Jud Podcast

Play Episode Listen Later Dec 20, 2025 13:52


App-Based Mindfulness Training Predicts Reductions in Smoking Behavior by Engaging Reinforcement Learning Mechanisms: A Preliminary Naturalistic Single-Arm Study

In this episode, Dr. Jud Brewer and colleagues explore how mindfulness-based smoking cessation tools can target the brain's reinforcement learning mechanisms to disrupt addictive behaviors. The study highlights the use of the "Craving to Quit" app, which combines mindful awareness practices with real-time feedback on cravings and their outcomes. By recalibrating the reward value of smoking through mindfulness, the app achieved significant reductions in smoking frequency among participants. Discover how this research advances our understanding of habit loops and offers scalable, innovative solutions for smoking cessation.

Full Reference: Taylor, V. A., Smith, R., & Brewer, J. A. (2022). App-based mindfulness training predicts reductions in smoking behavior by engaging reinforcement learning mechanisms: A preliminary naturalistic single-arm study. Sensors, 22(14), 5131. https://doi.org/10.3390/s22145131

Let's connect on Instagram
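The "recalibrating the reward value" mechanism described above can be illustrated with a minimal Rescorla-Wagner-style update, a standard textbook model of reward learning. The learning rate and all numbers below are invented for illustration and are not taken from the study:

```python
# Illustrative Rescorla-Wagner-style update of a habit's learned reward value.
# alpha (learning rate) and all outcome values are hypothetical.

def update_reward_value(value, observed_outcome, alpha=0.3):
    """Move the learned reward value toward the actually observed outcome."""
    return value + alpha * (observed_outcome - value)

value = 1.0  # smoking starts out strongly positively valued
# Mindful attention repeatedly pairs the habit with its actual
# (disappointing) outcome, scored here as 0.0:
for _ in range(10):
    value = update_reward_value(value, observed_outcome=0.0)
print(value)  # the habit's learned reward value decays toward 0
```

The intuition matches the episode's framing: each mindful observation of a craving's real outcome nudges the stored reward value downward, weakening the habit loop.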

Unsupervised Learning
AI Vibe Check: The Actual Bottleneck In Research, SSI's Mystique, & Spicy 2026 Predictions

Unsupervised Learning

Play Episode Listen Later Dec 18, 2025 78:04


Ari Morcos and Rob Toews return for their spiciest conversation yet. Fresh from NeurIPS, they debate whether models are truly plateauing or if we're just myopically focused on LLMs while breakthroughs happen in other modalities. They reveal why infinite capital at labs may actually constrain innovation, explain the narrow "Goldilocks zone" where RL actually works, and argue why U.S. chip restrictions may have backfired catastrophically, accelerating China's path to self-sufficiency by a decade. The conversation covers OpenAI's code red moment and structural vulnerabilities, the mystique surrounding SSI and Ilya's "two words," and why the real bottleneck in AI research is compute, not ideas. The episode closes with bold 2026 predictions: Rob forecasts Sam Altman won't be OpenAI's CEO by year-end, while Ari gives 50%+ odds a Chinese open-source model will be the world's best at least once next year.

(0:00) Intro
(1:51) Reflections on NeurIPS Conference
(5:14) Are AI Models Plateauing?
(11:12) Reinforcement Learning and Enterprise Adoption
(16:16) Future Research Vectors in AI
(28:40) The Role of Neo Labs
(39:35) The Myth of the Great Man Theory in Science
(41:47) OpenAI's Code Red and Market Position
(47:19) Disney and OpenAI's Strategic Partnership
(51:28) Meta's Super Intelligence Team Challenges
(54:33) US-China AI Chip Dynamics
(1:00:54) Amazon's Nova Forge and Enterprise AI
(1:03:38) End of Year Reflections and Predictions

With your co-hosts:
@jacobeffron - Partner at Redpoint, Former PM Flatiron Health
@patrickachase - Partner at Redpoint, Former ML Engineer LinkedIn
@ericabrescia - Former COO GitHub, Founder Bitnami (acq'd by VMware)
@jordan_segall - Partner at Redpoint

AI Unraveled: Latest AI News & Trends, Master GPT, Gemini, Generative AI, LLMs, Prompting, GPT Store

Welcome back to AI Unraveled, your strategic briefing on the business impact of artificial intelligence. Today, we are doing something different. We are skipping the daily news cycle to focus on a single, massive piece of research that just dropped from a powerhouse team at Stanford, Princeton, Harvard, and the University of Washington that proposes the first proper taxonomy for Agentic AI Adaptation. If you are building or scaling agent-based systems, this is your new mental model. The researchers argue that almost all advanced agentic systems, despite their complexity, boil down to just four basic feedback loops. We explore the "4-Bucket" framework (A1, A2, T1, T2) and explain the critical trade-offs between changing the agent versus changing the tools.

Key Topics:
Intro: Why "learning from feedback" is the definition of adaptation.
The Definition: What actually counts as "Agentic AI"?
Bucket A1 (Agent + Tool Outcome): Updating the agent based on whether code ran or queries succeeded.
Bucket A2 (Agent + Output Eval): Updating the agent based on human feedback or automated scoring.
Bucket T1 (Frozen Agent + Trained Tools): Keeping the LLM fixed while optimizing retrievers and external models.
Bucket T2 (Frozen Agent + Agent-Supervised Tools): Using the agent's own signals to tune its toolkit.
Trade-offs: Cost vs. Flexibility in modern system design.

Links & Resources:
Read the Paper: Adaptation Strategies for Agentic AI Systems (GitHub): https://github.com/pat-jj/Awesome-Adaptation-of-Agentic-AI/blob/main/paper.pdf

Keywords: Agentic AI, AI Taxonomy, AI Research, Stanford AI, Princeton AI, Large Language Models, LLM Agents, Reinforcement Learning, Tool Use, RAG, A1 A2 T1 T2, AI Adaptation, Etienne Noumen, AI Unraveled.
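As a rough mental model, the four buckets described above can be sketched in code. The enum values paraphrase the framework as summarized in this episode; the `classify` helper and its signal labels are purely hypothetical illustrations, not part of the paper:

```python
from enum import Enum

class AdaptationBucket(Enum):
    """The four feedback loops of the 4-Bucket framework, paraphrased.
    Axis 1: is the agent itself updated, or kept frozen?
    Axis 2: what signal drives the update?"""
    A1 = "agent updated from tool-execution outcomes (did the code run?)"
    A2 = "agent updated from output evaluation (human or automated scoring)"
    T1 = "agent frozen; tools trained with external supervision (e.g. retrievers)"
    T2 = "agent frozen; tools tuned from the agent's own signals"

def classify(updates_agent: bool, signal: str) -> AdaptationBucket:
    """Hypothetical helper mapping the two axes onto a bucket."""
    if updates_agent:
        return AdaptationBucket.A1 if signal == "tool_outcome" else AdaptationBucket.A2
    return AdaptationBucket.T1 if signal == "external" else AdaptationBucket.T2

print(classify(True, "tool_outcome"))  # AdaptationBucket.A1
```

The two-axis reading makes the trade-off discussed in the episode concrete: updating the agent (A1/A2) is flexible but expensive, while keeping it frozen and training the tools (T1/T2) is cheaper but constrains what can change.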

Unsupervised Learning
Ep 79: OpenAI's Head of Product on How the Best Teams Build, Ship and Scale AI Products

Unsupervised Learning

Play Episode Listen Later Dec 10, 2025 56:16


This episode features Olivier Godement, Head of Product for Business Products at OpenAI, discussing the current state and future of AI adoption in enterprises, with a particular focus on the recent releases of GPT-5.1 and Codex. The conversation explores how these models are achieving meaningful automation in specific domains like coding, customer support, and life sciences, where companies like Amgen are using AI to accelerate drug development timelines from months to weeks through automated regulatory documentation. Olivier reveals that while complete job automation remains challenging and requires substantial scaffolding, harnesses, and evaluation frameworks, certain use cases like coding are reaching a tipping point where engineers would "riot" if AI tools were taken away. The discussion covers the importance of cost reduction in unlocking new use cases, the emerging significance of reinforcement fine-tuning (RFT) for frontier customers, and OpenAI's philosophy of providing not just models but reference architectures and harnesses to maximize developer success.

(0:00) Intro
(1:46) Discussing GPT-5.1
(2:57) Adoption and Impact of Codex
(4:09) Scientific Community's Use of GPT-5.1
(6:37) Challenges in AI Automation
(8:19) AI in Life Sciences and Pharma
(11:48) Enterprise AI Adoption and Ecosystem
(16:04) Future of AI Models and Continuous Learning
(24:20) Cost and Efficiency in AI Deployment
(27:10) Reinforcement Learning and Enterprise Use Cases
(31:17) Key Factors Influencing Model Choice
(34:21) Challenges in Model Deployment and Adaptation
(38:29) Voice Technology: The Next Frontier
(41:08) The Rise of AI in Software Engineering
(52:09) Quickfire

With your co-hosts:
@jacobeffron - Partner at Redpoint, Former PM Flatiron Health
@patrickachase - Partner at Redpoint, Former ML Engineer LinkedIn
@ericabrescia - Former COO GitHub, Founder Bitnami (acq'd by VMware)
@jordan_segall - Partner at Redpoint

Crypto Hipster Podcast
Designing a Next-Gen Planetary-Scale Network to Create a Global and Scalable AI-Driven Community, with Ivan Nikitin @ Fortytwo (Video)

Crypto Hipster Podcast

Play Episode Listen Later Dec 5, 2025 31:50


Ivan co-founded Fortytwo, a planetary-scale intelligence network designed for next-generation AI applications and services. Fortytwo connects nodes running on consumer hardware with small AI models that collectively can outperform the reasoning capabilities of frontier AI (e.g., centralized large models by OpenAI and Anthropic), creating a permissionless, scalable layer of community-driven intelligence free from intermediaries.

Ivan graduated from DePaul University and, early in his career, co-founded Temporal Games, where he served as CEO, focusing on AI projects in games and entertainment. He led the creation of autonomous, self-learning AI agents using Genetic Algorithms and Reinforcement Learning, which outperformed humans in classic games. In 2018, he developed a Conversational AI service leveraging some of the earliest LLMs to enhance game character interactivity. In 2023, he collaborated with NEOM to develop a state-of-the-art generative 3D graphics model and the industry's first LLM capable of spatial reasoning. His more recent work includes research on Animated Gaussian Splatting for capturing and reconstructing volumetric video content, followed by ongoing efforts in AI decentralization, which led to the research on swarm inference and the creation of the Fortytwo network.

Crypto Hipster Podcast
Designing a Next-Gen Planetary-Scale Network to Create a Global and Scalable AI-Driven Community, with Ivan Nikitin @ Fortytwo (Audio)

Crypto Hipster Podcast

Play Episode Listen Later Dec 5, 2025 31:50



MLOps.community
Hardening Agents for E-commerce Scale: From RL Alignment to Reliability // Panel 2

MLOps.community

Play Episode Listen Later Dec 2, 2025 29:16


Thanks to Prosus Group for collaborating on the Agents in Production Virtual Conference 2025.

Abstract // The discussion centers on highly technical yet practical themes, such as the use of advanced post-training techniques like Direct Preference Optimization (DPO) and Parameter-Efficient Fine-Tuning (PEFT) to ensure LLMs maintain stability while specializing for e-commerce domains. We compare the implementation challenges of Computer-Using Agents in automating legacy enterprise systems versus the stability issues faced by conversational agents when inputs become unpredictable in production. We will analyze the role of cloud infrastructure in supporting the continuous, iterative training loops required by Reinforcement Learning-based agents for e-commerce!

Bio // Paul van der Boor (Panel Host) // Paul van der Boor is a Senior Director of Data Science at Prosus and a member of its internal AI group.

Arushi Jain (Panelist) // Arushi is a Senior Applied Scientist at Microsoft, working on LLM post-training for Computer-Using Agents (CUA) through Reinforcement Learning. She previously completed Microsoft's competitive 2-year AI Rotational Program (MAIDAP), building and shipping AI-powered features across four product teams. She holds a Master's in Machine Learning from the University of Michigan, Ann Arbor, and a Dual Degree in Economics from IIT Kanpur. At Michigan, she led the NLG efforts for the Alexa Prize Team, securing a $250K research grant to develop a personalized, active-listening socialbot. Her research spans collaborations with Rutgers School of Information, Virginia Tech's Economics Department, and UCLA's Center for Digital Behavior. Beyond her technical work, Arushi is a passionate advocate for gender equity in AI. She leads the Women in Data Science (WiDS) Cambridge community, scaling participation in her ML workshops from 25 women in 2020 to 100+ in 2025, empowering women and non-binary technologists through education and mentorship.

Swati Bhatia // Passionate about building and investing in cutting-edge technology to drive positive impact. Currently shaping the future of AI/ML at Google Cloud. 10+ years of global experience across the U.S., EMEA, and India in product, strategy & venture capital (Google, Uber, BCG, Morpheus Ventures).

Audi Liu // I'm passionate about making AI more useful and safe. Why? Because AI will be ubiquitous in every workflow, powering our lives just like how electricity revolutionized our society. It's pivotal we get it right. At Inworld AI, we believe all future software will be powered by voice. As a Sr Product Manager at Inworld, I'm focused on building a real-time voice API that empowers developers to create engaging, human-like experiences. Inworld offers state-of-the-art voice AI at a radically accessible price: No. 1 on Hugging Face and Artificial Analysis, instant voice cloning, rich multilingual support, real-time streaming, and emotion plus non-verbal control, all for just $5 per million characters.

Isabella Piratininga // Experienced Product Leader with over 10 years in the tech industry, shaping impactful solutions across micro-mobility, e-commerce, and leading organizations in the new economy, such as OLX, iFood, and now Nubank. I began my journey as a Product Owner during the early days of modern product management, contributing to pivotal moments like scaling startups, mergers of major tech companies, and driving innovation in digital banking. My passion lies in solving complex challenges through user-centered product strategies. I believe in creating products that serve as a bridge between user needs and business goals, fostering value and driving growth. At Nubank, I focus on redefining financial experiences and empowering users with accessible and innovative solutions.
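For readers unfamiliar with the DPO technique mentioned in the abstract, the standard DPO loss for a single preference pair can be sketched as a scalar computation. This is a simplified illustration with invented log-probabilities, not the panel's code:

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Simplified Direct Preference Optimization loss for one preference pair.
    logp_w / logp_l: the policy's log-probs of the chosen (w) and rejected (l)
    responses; ref_logp_* : the frozen reference model's log-probs."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log(sigmoid(margin))

# If the policy prefers the chosen response more strongly than the
# reference model does, the margin is positive and the loss is small.
print(dpo_loss(-1.0, -3.0, -2.0, -2.0))  # ~ 0.598 for these toy numbers
```

The `beta`-scaled log-ratio terms are what give DPO the stability property the panel highlights: the frozen reference model anchors the policy, so specializing for a domain does not drift arbitrarily far from the base LLM.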

Female TechTalk
Deine Mikro-Öffentlichkeit und die Macht dahinter

Female TechTalk

Play Episode Listen Later Nov 20, 2025 41:35


What actually happens in our feeds, and why do we all see something different? In the new episode of Female TechTalk, we talk about how our public sphere is shifting ever more into the digital realm, and why social media has become one of the most important public spaces of our democracy. We explain how reinforcement learning works in the background, why algorithms act like little agents testing every second what keeps us hooked, and what role state transitions and transition probabilities play in the process. We also look at how we can cope with the flood of information, and why, in the end, we all live in our own micro-public sphere anyway. And yes: in this episode there are even new holidays, introduced personally by FTT. So tune in. If you see your feed through different eyes afterwards, we accept no responsibility; that was probably the agent.
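The state transitions and transition probabilities mentioned above can be made concrete with a toy Markov-chain model of a feed. The engagement states and all probabilities below are invented purely for illustration; a real recommender is vastly more complex:

```python
import random

# Toy model of a feed as a Markov chain: the user's engagement state
# transitions with probabilities a recommender implicitly learns.
# States and numbers are invented for illustration.
transitions = {
    "bored":   {"bored": 0.3, "curious": 0.6, "hooked": 0.1},
    "curious": {"bored": 0.2, "curious": 0.4, "hooked": 0.4},
    "hooked":  {"bored": 0.1, "curious": 0.2, "hooked": 0.7},
}

def step(state, rng):
    """Sample the next engagement state from the transition probabilities."""
    nxt = transitions[state]
    return rng.choices(list(nxt), weights=list(nxt.values()))[0]

rng = random.Random(0)
state = "bored"
for _ in range(5):
    state = step(state, rng)
print(state)  # one possible trajectory endpoint
```

An RL-driven recommender effectively chooses actions (which post to show) to steer these transition probabilities toward high-engagement states, which is why everyone's feed, and everyone's micro-public sphere, ends up different.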

Crazy Wisdom
Episode #506: How AI Turns Podcasts into Knowledge Engines

Crazy Wisdom

Play Episode Listen Later Nov 14, 2025 49:38


In this episode of Crazy Wisdom, host Stewart Alsop talks with Kevin Smith, co-founder of Snipd, about how AI is reshaping the way we listen, learn, and interact with podcasts. They explore Snipd's vision of transforming podcasts into living knowledge systems, the evolution of machine learning from finance to large language models, and the broader connection between AI, robotics, and energy as the foundation for the next technological era. Kevin also touches on ideas like the bitter lesson, reinforcement learning, and the growing energy demands of AI. Listeners can try Snipd's premium version free for a month using this promo link. Check out this GPT we trained on the conversation.

Timestamps
00:00 – Stewart Alsop welcomes Kevin Smith, co-founder of Snipd, to discuss AI, podcasting, and curiosity-driven learning.
05:00 – Kevin explains Snipd's snipping feature, chatting with episodes, and future plans for voice interaction with podcasts.
10:00 – They discuss vector search, embeddings, and context windows, comparing full-episode context to chunked transcripts.
15:00 – Kevin shares his background in mathematics and economics, his shift from finance to machine learning, and early startup work in AI.
20:00 – They explore early quant models versus modern machine learning, statistical modeling, and data limitations in finance.
25:00 – Conversation turns to transformer models, pretraining, and the bitter lesson: how compute-based methods outperform human-crafted systems.
30:00 – Stewart connects this to RLHF, Scale AI, and data scarcity; Kevin reflects on reinforcement learning's future.
35:00 – They pivot to Snipd's podcast ecosystem, hidden gems like Founders Podcast, and how stories shape entrepreneurial insight.
40:00 – ETH Zurich, robotics, and startup culture come up, linking academia to real-world innovation.
45:00 – They close on AI, robotics, and energy as the pillars of the future, debating nuclear and solar power's role in sustaining progress.

Key Insights
Podcasts as dynamic knowledge systems: Kevin Smith presents Snipd as an AI-powered tool that transforms podcasts into interactive learning environments. By allowing listeners to "snip" and summarize meaningful moments, Snipd turns passive listening into active knowledge management, bridging curiosity, memory, and technology in a way that reframes podcasts as living knowledge capsules rather than static media.
AI transforming how we engage with information: The discussion highlights how AI enables entirely new modes of interaction: chatting directly with podcast episodes, asking follow-up questions, and contextualizing information across an author's full body of work. This evolution points toward a future where knowledge consumption becomes conversational and personalized rather than linear and one-size-fits-all.
Vectorization and context windows matter: Kevin explains that Snipd currently avoids heavy use of vector databases, opting instead to feed entire episodes into large models. This choice enhances coherence and comprehension, reflecting how advances in context windows have reshaped how AI understands complex audio content.
Machine learning's roots in finance shaped early AI thinking: Kevin's journey from quantitative finance to AI reveals how statistical modeling laid the groundwork for modern learning systems. While finance once relied on rigid, theory-based models, the machine learning paradigm replaced those priors with flexible, data-driven discovery, an essential philosophical shift in how intelligence is approached.
The Bitter Lesson and the rise of compute: Together they unpack Richard Sutton's "bitter lesson": the idea that methods leveraging computation and data inevitably surpass those built from human intuition. This insight serves as a compass for understanding why transformers, pretraining, and scaling have driven recent AI breakthroughs.
Reinforcement learning and data scarcity define AI's next phase: Stewart links RLHF and the work of companies like Scale AI and Surge AI to the broader question of data limits. Kevin agrees that the next wave of AI will depend on reinforcement learning and simulated environments that generate new, high-quality data beyond what humans can label.
The future hinges on AI, robotics, and energy: Kevin closes with a framework for the next decade: AI provides intelligence, robotics applies it to the physical world, and energy sustains it all. He warns that society must shift from fearing energy use to innovating in production, especially through nuclear and solar power, to meet the demands of an increasingly intelligent, interconnected world.

Vida com IA
#136- Reinforcement Learning.

Vida com IA

Play Episode Listen Later Nov 13, 2025 15:59


Hey everyone, in this episode I talk about the foundations of reinforcement learning, which is one of the most important areas in AI today, since it is the basis of LLM post-training!

The 50% discount coupon for the first 2 months of the course, valid until 11/30, is: BLACK50
Here is the link to the sales page, to learn more about me and about the course: https://www.cursovidacomia.com.br/
Here is the sign-up link: https://pay.hotmart.com/W98240617U
WhatsApp group link: https://chat.whatsapp.com/GNLhf8aCurbHQc9ayX5oCP
Podcast Instagram: https://www.instagram.com/podcast.lifewithai
My LinkedIn: https://www.linkedin.com/in/filipe-lauar/

Outgrow's Marketer of the Month
Snippet: Beyond Language Models: Why Reinforcement Learning Still Matters: Martin Riedmiller, Research Scientist & Controls Team Lead, Google DeepMind

Outgrow's Marketer of the Month

Play Episode Listen Later Nov 12, 2025 1:31


ExplAInable
רובוט אמיתי בכמה שורות פייתון עם MAIK-Education

ExplAInable

Play Episode Listen Later Nov 11, 2025 42:21


In this episode Mike interviews Tamir, who talks about the application his company, maik-education.com, is developing. It is a unique web application: a Reinforcement Learning environment that can also be run physically with real robots that anyone can build at home or in the office. In the environment you can create agents and define their behaviors either in code or with a deep model that can be trained to maximize some reward function. Once the project runs and works virtually, each agent can be connected to a robot over Bluetooth (kits are available for this), and a camera is mounted to capture the robot arena; then everything you programmed or trained in simulation happens in the physical world. In the episode Tamir demonstrated projects such as robots that arrange themselves into a triangle shape, a (football) robot trying to reach a line while another robot tries to block it (AI vs AI), a robot reaching a target point without colliding with an obstacle, or alternatively passing through a point that earns it a partial reward, and more. The environment lets anyone create a creative robotics project for learning purposes and fun.
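A reward function of the kind described (reach a target, avoid an obstacle, optionally collect a partial reward at a waypoint) might look something like the sketch below. All positions, radii, and weights are invented for illustration and are not taken from the MAIK environment:

```python
import math

# Illustrative reward function for a point robot on a 2D arena.
# Target, obstacle, and waypoint positions, and all weights, are made up.
TARGET, OBSTACLE, WAYPOINT = (1.0, 1.0), (0.5, 0.5), (0.2, 0.8)

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def reward(pos, visited_waypoint):
    r = -dist(pos, TARGET)              # dense shaping: closer is better
    if dist(pos, OBSTACLE) < 0.1:
        r -= 10.0                       # collision penalty
    if not visited_waypoint and dist(pos, WAYPOINT) < 0.1:
        r += 1.0                        # one-time partial reward at the waypoint
    if dist(pos, TARGET) < 0.05:
        r += 10.0                       # terminal bonus for reaching the goal
    return r
```

An agent trained to maximize this signal learns to route around the obstacle and, if the partial reward is worth the detour, to pass through the waypoint on its way to the target, which mirrors the demos Tamir showed.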

TalkRL: The Reinforcement Learning Podcast
Danijar Hafner on Dreamer v4

TalkRL: The Reinforcement Learning Podcast

Play Episode Listen Later Nov 10, 2025 100:52 Transcription Available


Danijar Hafner was a Research Scientist at Google DeepMind until recently.

Featured References
Training Agents Inside of Scalable World Models [ blog ] - Danijar Hafner, Wilson Yan, Timothy Lillicrap
One Step Diffusion via Shortcut Models - Kevin Frans, Danijar Hafner, Sergey Levine, Pieter Abbeel
Action and Perception as Divergence Minimization [ blog ] - Danijar Hafner, Pedro A. Ortega, Jimmy Ba, Thomas Parr, Karl Friston, Nicolas Heess

Additional References
Mastering Diverse Domains through World Models [ blog ] (DreamerV3) - Danijar Hafner, Jurgis Pasukonis, Jimmy Ba, Timothy Lillicrap
Mastering Atari with Discrete World Models [ blog ] (DreamerV2) - Danijar Hafner, Timothy Lillicrap, Mohammad Norouzi, Jimmy Ba
Dream to Control: Learning Behaviors by Latent Imagination [ blog ] (Dreamer) - Danijar Hafner, Timothy Lillicrap, Jimmy Ba, Mohammad Norouzi
Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos [ blog ] - Baker et al.

This Week in Google (MP3)
IM 844: Poob Has It For You - Spiky Superintelligence vs. Generality

This Week in Google (MP3)

Play Episode Listen Later Nov 6, 2025 163:50


Is today's AI stuck as a "spiky superintelligence," brilliant at some things but clueless at others? This episode pulls back the curtain on a lunchroom full of AI researchers trading theories, strong opinions, and the next big risks on the path to real AGI.

  • Why "Everyone Dies" Gets AGI All Wrong
  • The Nonprofit Feeding the Entire Internet to AI Companies
  • Google's First AI Ad Avoids the Uncanny Valley by Casting a Turkey
  • Coca-Cola Is Trying Another AI Holiday Ad. Executives Say This Time Is Different
  • Sam Altman shuts down question about how OpenAI can commit to spending $1.4 trillion while earning billions: 'Enough'
  • How OpenAI Uses Complex and Circular Deals to Fuel Its Multibillion-Dollar Rise
  • Perplexity's new AI tool aims to simplify patent research
  • Kids Turn Podcast Comments Into Secret Chat Rooms, Because Of Course They Do
  • Amazon and Perplexity have kicked off the great AI web browser fight
  • Neural network finds an enzyme that can break down polyurethane
  • Dictionary.com names 6-7 as 2025's word of the year
  • Tech companies don't care that students use their AI agents to cheat
  • The Morning After: Musk talks flying Teslas on Joe Rogan's show
  • The Hatred of Podcasting | Brace Belden
  • TikTok announces its first awards show in the US
  • Google wants to build solar-powered data centers in space
  • Anthropic Projects $70 Billion in Revenue, $17 Billion in Cash Flow in 2028
  • American Museum of Tort Law
  • Dog Chapel - Dog Mountain
  • Nicvember masterlist
  • Pornhub says UK visitors down 77% since age checks came in

Hosts: Leo Laporte, Jeff Jarvis, and Paris Martineau
Guest: Jeremy Berman

Download or subscribe to Intelligent Machines at https://twit.tv/shows/intelligent-machines.

Join Club TWiT for Ad-Free Podcasts! Support what you love and get ad-free shows, a members-only Discord, and behind-the-scenes access. Join today: https://twit.tv/clubtwit

Sponsors: threatlocker.com/twit agntcy.org spaceship.com/twit monarch.com with code IM

All TWiT.tv Shows (MP3)
Intelligent Machines 844: Poob Has It For You

All TWiT.tv Shows (MP3)

Play Episode Listen Later Nov 6, 2025 163:20 Transcription Available




The Twenty Minute VC: Venture Capital | Startup Funding | The Pitch
20VC: Cohere's Chief Scientist on Why Scaling Laws Will Continue | Whether You Can Buy Success in AI with Talent Acquisitions | The Future of Synthetic Data & What It Means for Models | Why AI Coding is Akin to Image Generation in 2015 with Joelle Pineau

The Twenty Minute VC: Venture Capital | Startup Funding | The Pitch

Play Episode Listen Later Nov 3, 2025 57:34


Joelle Pineau is the Chief Scientist at Cohere, where she leads research on advancing large language models and practical AI systems. Before joining Cohere, she was VP of AI Research at Meta, where she founded and led Meta AI's Montreal lab. A professor at McGill University, Joelle is renowned for her pioneering work in reinforcement learning, robotics, and responsible AI development.
AGENDA:
00:00 Introduction to AI Scaling Laws
03:00 How Meta Shaped How I Think About AI Research
04:36 Challenges in Reinforcement Learning
10:00 Is It Possible to be Capital Efficient in AI
15:52 AI in Enterprise: Efficiency and Adoption
22:15 Security Concerns with AI Agents
28:34 Can Zuck Win By Buying the Galacticos of AI
32:15 The Rising Cost of Data
35:28 Synthetic Data and Model Degradation
37:22 Why AI Coding is Akin to Image Generation in 2015
48:46 If Joelle Was a VC Where Would She Invest?
52:17 Quickfire: Lessons from Zuck, Biggest Mindset Shift

Cambrian Fintech with Rex Salisbury
Why AI will NEVER Replace Your Sales Job! - Stevie Case CRO @Vanta

Cambrian Fintech with Rex Salisbury

Play Episode Listen Later Oct 21, 2025 46:49


My Fintech Newsletter for more interviews and the latest insights: https://rexsalisbury.substack.com/
In this episode, I sit down with Stevie Case from Vanta, a former pro gamer turned chief revenue officer, to discuss how AI is transforming the entire go-to-market function in B2B SaaS. Stevie shares insights on building agile sales organizations, how AI supercharges human roles rather than replacing them, and the evolving expectations for sales, customer success, and RevOps teams. The conversation covers AI tool adoption, hiring for an AI-native workforce, and why go-to-market roles are among the most exciting in tech today.
Stevie Case: https://www.linkedin.com/in/steviecase/
00:00:00 - AI's Impact on Go-To-Market Functions
00:02:06 - Building Scalable Sales Organizations
00:04:47 - Specialization and Segmentation in Sales
00:06:28 - AI Supercharging Customer Success
00:08:23 - Hiring and Onboarding with AI Support
00:10:07 - Building AI-Driven Products with Customers
00:12:08 - Selling New Products to Existing Customers
00:15:02 - Early Product Adoption and Iteration
00:17:25 - Operating at All Levels in Organizations
00:20:01 - Creating Intense, High-Velocity Teams
00:22:15 - Hiring AI-Native, Curious Builders
00:25:05 - Measuring Success by Team Pride and Feedback
00:26:07 - Developing Agent Platforms
00:28:02 - Monetization and Business Model Evolution
00:30:49 - AI-Enabled Competitive Advantages in Fintech
00:32:31 - Top-Down AI Automation Demand
00:34:11 - Reinforcement Learning in Fraud Detection
00:38:00 - International Go-To-Market Expansion
00:41:33 - Designing Global Sales Footprints
00:45:04 - Resourcing RevOps and Systems Teams
Rex Salisbury LinkedIn: https://www.linkedin.com/in/rexsalisbury
Twitter: https://twitter.com/rexsalisbury
TikTok: https://www.tiktok.com/@rex.salisbury
Instagram: https://www.instagram.com/rexsalisbury/

The MAD Podcast with Matt Turck
How GPT-5 Thinks — OpenAI VP of Research Jerry Tworek

The MAD Podcast with Matt Turck

Play Episode Listen Later Oct 16, 2025 76:04


What does it really mean when GPT-5 "thinks"? In this conversation, OpenAI's VP of Research Jerry Tworek explains how modern reasoning models work in practice—why pretraining and reinforcement learning (RL/RLHF) are both essential, what that on-screen "thinking" actually does, and when extra test-time compute helps (or doesn't). We trace the evolution from O1 (a tech demo good at puzzles) to O3 (the tool-use shift) to GPT-5 (Jerry calls it "03.1-ish"), and talk through verifiers, reward design, and the real trade-offs behind "auto" reasoning modes.
We also go inside OpenAI: how research is organized, why collaboration is unusually transparent, and how the company ships fast without losing rigor. Jerry shares the backstory on competitive-programming results like ICPC, what they signal (and what they don't), and where agents and tool use are genuinely useful today. Finally, we zoom out: could pretraining + RL be the path to AGI? This is the MAD Podcast — AI for the 99%. If you're curious about how these systems actually work (without needing a PhD), this episode is your map to the current AI frontier.
OpenAI
Website - https://openai.com
X/Twitter - https://x.com/OpenAI
Jerry Tworek
LinkedIn - https://www.linkedin.com/in/jerry-tworek-b5b9aa56
X/Twitter - https://x.com/millionint
FIRSTMARK
Website - https://firstmark.com
X/Twitter - https://twitter.com/FirstMarkCap
Matt Turck (Managing Director)
LinkedIn - https://www.linkedin.com/in/turck/
X/Twitter - https://twitter.com/mattturck
(00:00) Intro
(01:01) What Reasoning Actually Means in AI
(02:32) Chain of Thought: Models Thinking in Words
(05:25) How Models Decide Thinking Time
(07:24) Evolution from O1 to O3 to GPT-5
(11:00) Before OpenAI: Growing up in Poland, Dropping out of School, Trading
(20:32) Working on Robotics and Rubik's Cube Solving
(23:02) A Day in the Life: Talking to Researchers
(24:06) How Research Priorities Are Determined
(26:53) Collaboration vs IP Protection at OpenAI
(29:32) Shipping Fast While Doing Deep Research
(31:52) Using OpenAI's Own Tools Daily
(32:43) Pre-Training Plus RL: The Modern AI Stack
(35:10) Reinforcement Learning 101: Training Dogs
(40:17) The Evolution of Deep Reinforcement Learning
(42:09) When GPT-4 Seemed Underwhelming at First
(45:39) How RLHF Made GPT-4 Actually Useful
(48:02) Unsupervised vs Supervised Learning
(49:59) GRPO and How DeepSeek Accelerated US Research
(53:05) What It Takes to Scale Reinforcement Learning
(55:36) Agentic AI and Long-Horizon Thinking
(59:19) Alignment as an RL Problem
(1:01:11) Winning ICPC World Finals Without Specific Training
(1:05:53) Applying RL Beyond Math and Coding
(1:09:15) The Path from Here to AGI
(1:12:23) Pure RL vs Language Models
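The "pre-training plus RL" stack the episode outlines rests on one mechanical idea: sample an output, score it with a verifier, and nudge the policy toward rewarded samples. A minimal REINFORCE-style sketch of that loop on a three-armed toy problem (the verifier, learning rate, and "correct answer" here are invented for illustration, not anything from OpenAI's actual training setup):

```python
import math
import random

# Toy sketch, not OpenAI's method: a uniform "pretrained" policy over 3
# candidate answers is fine-tuned with REINFORCE against a verifier that
# rewards answer index 2.
random.seed(0)
logits = [0.0, 0.0, 0.0]
LR, CORRECT = 0.5, 2

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def sample(probs):
    r, acc = random.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1

for _ in range(500):
    probs = softmax(logits)
    a = sample(probs)
    reward = 1.0 if a == CORRECT else 0.0   # binary verifier signal
    # REINFORCE update: grad of log pi(a) w.r.t. logit i is (1[i==a] - p_i)
    for i in range(3):
        logits[i] += LR * reward * ((1.0 if i == a else 0.0) - probs[i])

final = softmax(logits)
print(final)  # probability mass concentrates on the rewarded answer
```

The same shape scales up in real systems: the "policy" becomes a language model, the "answer" a sampled chain of thought, and the verifier a test suite or reward model.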

Podcast Notes Playlist: Latest Episodes
Dylan Patel - Inside the Trillion-Dollar AI Buildout - [Invest Like the Best, EP.442]

Podcast Notes Playlist: Latest Episodes

Play Episode Listen Later Oct 7, 2025


Invest Like the Best: Read the notes at podcastnotes.org. Don't forget to subscribe for free to our newsletter, the top 10 ideas of the week, every Monday.
---------
My guest today is Dylan Patel. Dylan is the founder and CEO of SemiAnalysis. At SemiAnalysis Dylan tracks the semiconductor supply chain and AI infrastructure buildout with unmatched granularity—literally watching data centers get built through satellite imagery and mapping hundreds of billions in capital flows. Our conversation explores the massive industrial buildout powering AI, from the strategic chess game between OpenAI, Nvidia, and Oracle to why we're still in the first innings of post-training and reinforcement learning. Dylan explains infrastructure realities like electrician wages doubling and companies using diesel truck engines for emergency power, while making a sobering case about US-China competition and why America needs AI to succeed. We discuss his framework for where value will accrue in the stack, why traditional SaaS economics are breaking down under AI's high cost of goods sold, and which hardware bottlenecks matter most. This is one of the most comprehensive views of the physical reality underlying the AI revolution you'll hear anywhere. Please enjoy my conversation with Dylan Patel.
For the full show notes, transcript, and links to mentioned content, check out the episode page here.
-----
This episode is brought to you by Ramp. Ramp's mission is to help companies manage their spend in a way that reduces expenses and frees up time for teams to work on more valuable projects. Go to Ramp.com/invest to sign up for free and get a $250 welcome bonus.
This episode is brought to you by Ridgeline. Ridgeline has built a complete, real-time, modern operating system for investment managers. It handles trading, portfolio management, compliance, customer reporting, and much more through an all-in-one real-time cloud platform. Head to ridgelineapps.com to learn more about the platform.
This episode is brought to you by AlphaSense. AlphaSense has completely transformed the research process with cutting-edge AI technology and a vast collection of top-tier, reliable business content. Invest Like the Best listeners can get a free trial now at Alpha-Sense.com/Invest and experience firsthand how AlphaSense and Tegus help you make smarter decisions faster.
-----
Editing and post-production work for this episode was provided by The Podcast Consultant (https://thepodcastconsultant.com).
Show Notes:
(00:00:00) Welcome to Invest Like the Best
(00:05:12) The AI Infrastructure Buildout
(00:08:25) Scaling AI Models and Compute Needs
(00:11:44) Reinforcement Learning and AI Training
(00:14:07) The Future of AI and Compute
(00:17:47) AI in Practical Applications
(00:22:29) The Importance of Data and Environments in AI Training
(00:29:45) Human Analogies in AI Development
(00:40:34) The Challenge of Infinite Context in AI Models
(00:44:08) The Bullish and Bearish Perspectives on AI
(00:48:25) The Talent Wars in AI Research
(00:56:54) The Power Dynamics in AI and Tech
(01:13:29) The Future of AI and Its Economic Impact
(01:18:55) The Gigawatt Data Center Boom
(01:21:12) Supply Chain and Workforce Dynamics
(01:24:23) US vs. China: AI and Power Dynamics
(01:37:16) AI Startups and Innovations
(01:52:44) The Changing Economics of Software
(01:58:12) The Kindest Thing

Podcast Notes Playlist: Business
Dylan Patel - Inside the Trillion-Dollar AI Buildout - [Invest Like the Best, EP.442]

Podcast Notes Playlist: Business

Play Episode Listen Later Oct 7, 2025 118:15


Invest Like the Best Key Takeaways:
Today, the challenge is not to make the model bigger; the problem is knowing how best to generate and create data in useful domains so that the model gets better at them.
AI does not have to reach digital god mode to have an enormous impact on productivity and society: even if AI does not become smarter than humans in the short term, the economic value creation boom will still be enormous.
"If we didn't have the AI boom, the US probably would be behind China and no longer the world hegemon by the end of the decade, if not sooner." – Dylan Patel
The US is doing what China has done historically: dumping tons of capital into something, and then the market becomes
If there is a sustained lag in model improvement, the US economy will go into a recession; this is the case for Korea and Taiwan, too.
On the AI talent wars: if these companies are willing to spend billions on training runs, it makes sense to spend a lot on talent to optimize those runs and potentially mitigate errors.
We actually are not dedicating that much power to AI yet; only 3-4% of total power is going to data centers.
He is more optimistic on Anthropic than OpenAI; their revenue is accelerating much faster because of their focus on the $2 trillion software market, whereas OpenAI's focus is split between many things.
While Meta "has the cards to potentially own it all", Google is better positioned to dominate the consumer and professional markets.
Read the full notes @ podcastnotes.org

Invest Like the Best with Patrick O'Shaughnessy
Dylan Patel - Inside the Trillion-Dollar AI Buildout - [Invest Like the Best, EP.442]

Invest Like the Best with Patrick O'Shaughnessy

Play Episode Listen Later Sep 30, 2025 118:15



Eye On A.I.
#289 Eiso Kant: How Reinforcement Learning and Coding Could Unlock Human-Level AI

Eye On A.I.

Play Episode Listen Later Sep 24, 2025 54:06


How do we get from today's AI copilots to true human-level intelligence? In this episode of Eye on AI, Craig Smith sits down with Eiso Kant, Co-Founder of Poolside, to explore why reinforcement learning + software development might be the fastest path to human-level AI. Eiso shares Poolside's mission to build AI that doesn't just autocomplete code — but learns like a real developer. You'll hear how Poolside uses reinforcement learning from code execution (RLCF), why software development is the perfect training ground for intelligence, and how agentic AI systems are about to transform the way we build and ship software. If you want to understand the future of AI, software engineering, and AGI, this conversation is packed with insights you won't want to miss.
Stay Updated:
Craig Smith on X: https://x.com/craigss
Eye on A.I. on X: https://x.com/EyeOn_AI
(00:00) The Missing Ingredient for Human-Level AI
(01:02) Eiso Kant's Journey
(05:30) Using Software Development to Reach AGI
(07:48) Why Coding Is the Perfect Training Ground for Intelligence
(10:11) Reinforcement Learning from Code Execution (RLCF) Explained
(13:14) How Poolside Builds and Trains Its Foundation Models
(17:35) The Rise of Agentic AI
(21:08) Making Software Creation Accessible to Everyone
(26:03) Overcoming Model Limitations
(32:08) Training Models to Think
(37:24) Building the Future of AI Agents
(42:11) Poolside's Full-Stack Approach to AI Deployment
(46:28) Enterprise Partnerships, Security & Customization Behind the Firewall
(50:48) Giving Enterprises Transparency to Drive Adoption
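The appeal of reinforcement learning from code execution is that the reward comes from actually running generated code against tests, not from human preference labels. A hedged sketch of such a reward function (the `solution` entry point and the test format are assumptions made for illustration; this is not Poolside's implementation):

```python
# Illustrative sketch of an execution-based reward: score a candidate
# program by loading it and running it against unit tests.

def execution_reward(candidate_src: str, tests: list) -> float:
    """Reward = fraction of unit tests the generated code passes."""
    namespace = {}
    try:
        exec(candidate_src, namespace)   # load the model's generated code
        fn = namespace["solution"]       # assumed entry-point name
    except Exception:
        return 0.0                       # code that doesn't load earns nothing
    passed = 0
    for args, expected in tests:
        try:
            if fn(*args) == expected:
                passed += 1
        except Exception:
            pass                         # runtime errors count as failures
    return passed / len(tests)

tests = [((2, 3), 5), ((0, 0), 0), ((-1, 1), 0)]
good = "def solution(a, b):\n    return a + b\n"
buggy = "def solution(a, b):\n    return a - b\n"
print(execution_reward(good, tests), execution_reward(buggy, tests))
```

A training loop would then sample many candidate programs and reinforce the ones with higher execution reward, exactly the verifier pattern that makes coding such a convenient RL domain.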

Hub Dialogues
How Alberta could lead the AI revolution

Hub Dialogues

Play Episode Listen Later Sep 18, 2025 47:27


Artificial intelligence could fuel Alberta's next big tech boom. Three leaders in the field—Cam Linke, CEO of Amii; Nicole Janssen, co-founder of AltaML; and Danielle Gifford, managing director of AI with PwC—dig into how AI is transforming everything from energy to healthcare and even space. They share why Edmonton is a world leader in reinforcement learning, and why Alberta's natural advantages could make it a global hub for data centres and AI commercialization.    This podcast is generously supported by Don Archibald. The Hub thanks him for his ongoing support.   The Hub is Canada's fastest-growing independent digital news outlet. Subscribe to our YouTube channel to get our latest videos: https://www.youtube.com/@TheHubCanada Subscribe to The Hub's podcast feed to get our best content when you are on the go: https://tinyurl.com/3a7zpd7e (Apple)  https://tinyurl.com/y8akmfn7 (Spotify)   Want more Hub? Get a FREE 3-month trial membership on us: https://thehub.ca/free-trial/ Follow The Hub on X: https://x.com/thehubcanada?lang=en   CREDITS: Falice Chin - Producer and Editor  Ryan Hastman - Host Amal Attar-Guzman - Sound and Video Assistant   To contact us, sign up for updates, and access transcripts email support@thehub.ca

Getting2Alpha
Hansohl Kim: What Is Reinforcement Learning?

Getting2Alpha

Play Episode Listen Later Sep 16, 2025 40:01 Transcription Available


Hansohl Kim is an engineer at Anthropic, where he focuses on reinforcement learning & AI safety for models like Claude. With experience spanning computer science, biotech, & machine learning, he brings a unique perspective to the fast-changing world of artificial intelligence.
Listen as Hansohl unpacks the challenges of alignment, the importance of guardrails, & what it takes to design AI systems we can truly trust.
RELATED LINKS:
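For readers new to the question in the episode title: the essence of reinforcement learning is captured by the tabular Q-learning update — act, observe a reward, and move the value estimate toward reward plus discounted future value. A toy sketch on an invented five-cell corridor (the environment and hyperparameters are illustrative and unrelated to Anthropic's work):

```python
import random

# Toy Q-learning: an agent on a 5-cell corridor learns to walk right
# to reach a goal that pays reward 1.
random.seed(1)
N_STATES, GOAL = 5, 4
ACTIONS = (-1, +1)                       # 0 = left, 1 = right
Q = [[0.0, 0.0] for _ in range(N_STATES)]
alpha, gamma, eps = 0.5, 0.9, 0.1

for episode in range(200):
    s = 0
    while s != GOAL:
        # Epsilon-greedy action selection, breaking value ties randomly.
        if random.random() < eps or Q[s][0] == Q[s][1]:
            a = random.randrange(2)
        else:
            a = 0 if Q[s][0] > Q[s][1] else 1
        s2 = min(max(s + ACTIONS[a], 0), N_STATES - 1)
        r = 1.0 if s2 == GOAL else 0.0
        # Q-learning update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

# Greedy policy per non-goal state (1 means "move right").
policy = [max((0, 1), key=lambda i: Q[s][i]) for s in range(N_STATES - 1)]
print(policy)
```

The guardrail and alignment questions the episode raises start exactly here: the agent optimizes whatever reward it is given, so the hard part is specifying a reward whose optimum is the behavior you actually want.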

TalkRL: The Reinforcement Learning Podcast
David Abel on the Science of Agency @ RLDM 2025

TalkRL: The Reinforcement Learning Podcast

Play Episode Listen Later Sep 8, 2025 59:42 Transcription Available


David Abel is a Senior Research Scientist at DeepMind on the Agency team, and an Honorary Fellow at the University of Edinburgh. His research blends computer science and philosophy, exploring foundational questions about reinforcement learning, definitions, and the nature of agency.
Featured References:
Plasticity as the Mirror of Empowerment - David Abel, Michael Bowling, André Barreto, Will Dabney, Shi Dong, Steven Hansen, Anna Harutyunyan, Khimya Khetarpal, Clare Lyle, Razvan Pascanu, Georgios Piliouras, Doina Precup, Jonathan Richens, Mark Rowland, Tom Schaul, Satinder Singh
A Definition of Continual RL - David Abel, André Barreto, Benjamin Van Roy, Doina Precup, Hado van Hasselt, Satinder Singh
Agency is Frame-Dependent - David Abel, André Barreto, Michael Bowling, Will Dabney, Shi Dong, Steven Hansen, Anna Harutyunyan, Khimya Khetarpal, Clare Lyle, Razvan Pascanu, Georgios Piliouras, Doina Precup, Jonathan Richens, Mark Rowland, Tom Schaul, Satinder Singh
On the Expressivity of Markov Reward - David Abel, Will Dabney, Anna Harutyunyan, Mark Ho, Michael Littman, Doina Precup, Satinder Singh (Outstanding Paper Award, NeurIPS 2021)
Additional References:
Bidirectional Communication Theory - Marko, 1973
Causality, Feedback and Directed Information - Massey, 1990
The Big World Hypothesis - Javed et al., 2024
Loss of plasticity in deep continual learning - Dohare et al., 2024
Three Dogmas of Reinforcement Learning - Abel, 2024
Explaining dopamine through prediction errors and beyond - Gershman et al., 2024
David Abel Google Scholar
David Abel personal website

TalkRL: The Reinforcement Learning Podcast
Jake Beck, Alex Goldie, & Cornelius Braun on Sutton's OaK, Metalearning, LLMs, Squirrels @ RLC 2025

TalkRL: The Reinforcement Learning Podcast

Play Episode Listen Later Aug 19, 2025 12:20 Transcription Available


Recorded at Reinforcement Learning Conference 2025 at University of Alberta, Edmonton, Alberta, Canada.

Featured References
Lecture on the Oak Architecture — Rich Sutton
Alberta Plan — Rich Sutton with Mike Bowling and Patrick Pilarski

Additional References
Jacob Beck on Google Scholar
Alex Goldie on Google Scholar
Cornelius Braun on Google Scholar
Reinforcement Learning Conference

TalkRL: The Reinforcement Learning Podcast
Outstanding Paper Award Winners - 2/2 @ RLC 2025

TalkRL: The Reinforcement Learning Podcast

Play Episode Listen Later Aug 18, 2025 14:18 Transcription Available


We caught up with the RLC Outstanding Paper award winners for your listening pleasure. Recorded on location at Reinforcement Learning Conference 2025, at University of Alberta, in Edmonton, Alberta, Canada in August 2025.

Featured References
Empirical Reinforcement Learning Research: Mitigating Suboptimality of Deterministic Policy Gradients in Complex Q-functions — Ayush Jain, Norio Kosaka, Xinhu Li, Kyung-Min Kim, Erdem Biyik, Joseph J Lim
Applications of Reinforcement Learning: WOFOSTGym: A Crop Simulator for Learning Annual and Perennial Crop Management Strategies — William Solow, Sandhya Saisubramanian, Alan Fern
Emerging Topics in Reinforcement Learning: Towards Improving Reward Design in RL: A Reward Alignment Metric for RL Practitioners — Calarina Muslimani, Kerrick Johnstonbaugh, Suyog Chandramouli, Serena Booth, W. Bradley Knox, Matthew E. Taylor
Scientific Understanding in Reinforcement Learning: Multi-Task Reinforcement Learning Enables Parameter Scaling — Reginald McLean, Evangelos Chatzaroulas, J K Terry, Isaac Woungang, Nariman Farsad, Pablo Samuel Castro

TalkRL: The Reinforcement Learning Podcast
Outstanding Paper Award Winners - 1/2 @ RLC 2025

TalkRL: The Reinforcement Learning Podcast

Play Episode Listen Later Aug 15, 2025 6:46 Transcription Available


We caught up with the RLC Outstanding Paper award winners for your listening pleasure. Recorded on location at Reinforcement Learning Conference 2025, at University of Alberta, in Edmonton, Alberta, Canada in August 2025.

Featured References
Scientific Understanding in Reinforcement Learning: How Should We Meta-Learn Reinforcement Learning Algorithms? — Alexander David Goldie, Zilin Wang, Jakob Nicolaus Foerster, Shimon Whiteson
Tooling, Environments, and Evaluation for Reinforcement Learning: Syllabus: Portable Curricula for Reinforcement Learning Agents — Ryan Sullivan, Ryan Pégoud, Ameen Ur Rehman, Xinchen Yang, Junyun Huang, Aayush Verma, Nistha Mitra, John P Dickerson
Resourcefulness in Reinforcement Learning: PufferLib 2.0: Reinforcement Learning at 1M steps/s — Joseph Suarez
Theory of Reinforcement Learning: Deep Reinforcement Learning with Gradient Eligibility Traces — Esraa Elelimy, Brett Daley, Andrew Patterson, Marlos C. Machado, Adam White, Martha White

TalkRL: The Reinforcement Learning Podcast
Thomas Akam on Model-based RL in the Brain

TalkRL: The Reinforcement Learning Podcast

Play Episode Listen Later Aug 4, 2025 52:06 Transcription Available


Prof Thomas Akam is a Neuroscientist at the Oxford University Department of Experimental Psychology. He is a Wellcome Career Development Fellow and Associate Professor at the University of Oxford, and leads the Cognitive Circuits research group.

Featured References
Brain Architecture for Adaptive Behaviour — Thomas Akam, RLDM 2025 Tutorial

Additional References
Thomas Akam on Google Scholar
pyPhotometry: Open source, Python based, fiber photometry data acquisition
pyControl: Open source, Python based, behavioural experiment control
Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control — Nathaniel D Daw, Yael Niv, Peter Dayan, 2005
Further analysis of the hippocampal amnesic syndrome: 14-year follow-up study of H. M. — Milner, B., Corkin, S., & Teuber, H. L., 1968
Internally generated cell assembly sequences in the rat hippocampus — Pastalkova E, Itskov V, Amarasingham A, Buzsáki G., Science, 2008
Multi-disciplinary Conference on Reinforcement Learning and Decision Making 2025

Analyse Asia with Bernard Leong
How Microsoft Research Balances Exploration and Impact Globally with Doug Burger

Analyse Asia with Bernard Leong

Play Episode Listen Later Aug 3, 2025 43:46


"If you're going to be running a very elite research institution, you have to have the best people. To have the best people, you have to trust them and empower them. You can't hire a world expert in some area and then tell them what to do. They know more than you do. They're smarter than you are in their area. So you've got to trust your people. One of our really foundational commitments to our people is: we trust you. We're going to work to empower you. Go do the thing that you need to do. If somebody in the labs wants to spend 5, 10, 15 years working on something they think is really important, they're empowered to do that." - Doug Burger Fresh out of the studio, Doug Burger, Technical Fellow and Corporate Vice President at Microsoft Research, joins us to explore Microsoft's bold expansion into Southeast Asia with the recent launch of the Microsoft Research Asia lab in Singapore. From there, Doug shares his accidental journey from academia to leading global research operations, reflecting on how Microsoft Research's open collaboration model empowers thousands of researchers worldwide to tackle humanity's biggest challenges. Following on, he highlights recent breakthroughs from Microsoft Research, for example the quantum computing breakthrough with topological qubits, the evolution from lines of code to natural language programming, and how AI is accelerating innovation across multiple scaling dimensions beyond traditional data limits. Addressing the intersection of three computing paradigms—logic, probability, and quantum—he emphasizes that geographic diversity in research labs enables Microsoft to build AI that works for everyone, not just one region. Closing the conversation, Doug shares his vision of what great looks like for Microsoft Research: researchers driven by purpose and passion to create breakthroughs that advance both science and society. 
Episode Highlights: [00:00] Quote of the Day by Doug Burger [01:08] Doug Burger's journey from academia to Microsoft Research [02:24] Career advice: Always seek challenges, move when feeling restless or comfortable [03:07] Launch of Microsoft Research Asia in Singapore: Tapping local talent and culture for inclusive AI development [04:13] Singapore lab focuses on foundational AI, embodied AI, and healthcare applications [06:19] AI detecting seizures in children and assessing Parkinson's motor function [08:24] Embedding Southeast Asian societal norms and values into Foundational AI research [10:26] Microsoft Research's open collaboration model [12:42] Generative AI's rapid pace accelerating technological innovation and research tools [14:36] AI revolutionizing computer architecture by creating completely new interfaces [16:24] Open versus closed source AI models debate and Microsoft's platform approach [18:08] Reasoning models enabling formal verification and correctness guarantees in AI [19:35] Multiple scaling dimensions in AI beyond traditional data scaling laws [21:01] Project Catapult and Brainwave: Building configurable hardware acceleration platforms [23:29] Microsoft's 17-year quantum computing journey with topological qubits breakthrough [26:26] Balancing blue-sky foundational research with application-driven initiatives at scale [29:16] Three computing paradigms: logic, probability (AI), and quantum superposition [32:26] Microsoft Research's exploration-to-exploitation playbook for breakthrough discoveries [35:26] Research leadership secret: Curiosity across fields enables unexpected connections [37:11] Hidden Mathematical Structures Transformers Architecture in LLMs [40:04] Microsoft Research's vision: Becoming Bell Labs for AI era [42:22] Steering AI models for mental health and critical thinking conversations Profile: Doug Burger, Technical Fellow and Corporate Vice President, Microsoft Research LinkedIn: https://www.linkedin.com/in/dcburger/ Microsoft 
Research Profile: https://www.microsoft.com/en-us/research/people/dburger/ Podcast Information: Bernard Leong hosts and produces the show. The proper credits for the intro and end music are "Energetic Sports Drive." G. Thomas Craig mixed and edited the episode in both video and audio format. Here are the links to watch or listen to our podcast. Analyse Asia Main Site: https://analyse.asia Analyse Asia Spotify: https://open.spotify.com/show/1kkRwzRZa4JCICr2vm0vGl Analyse Asia Apple Podcasts: https://podcasts.apple.com/us/podcast/analyse-asia-with-bernard-leong/id914868245 Analyse Asia YouTube: https://www.youtube.com/@AnalyseAsia Analyse Asia LinkedIn: https://www.linkedin.com/company/analyse-asia/ Analyse Asia X (formerly known as Twitter): https://twitter.com/analyseasia Analyse Asia Threads: https://www.threads.net/@analyseasia Sign Up for Our This Week in Asia Newsletter: https://www.analyse.asia/#/portal/signup Subscribe Newsletter on LinkedIn https://www.linkedin.com/build-relation/newsletter-follow?entityUrn=7149559878934540288

GOTO - Today, Tomorrow and the Future
Prompt Engineering for Generative AI • James Phoenix, Mike Taylor & Phil Winder

GOTO - Today, Tomorrow and the Future

Play Episode Listen Later Aug 1, 2025 53:33 Transcription Available


This interview was recorded for the GOTO Book Club. http://gotopia.tech/bookclub
Read the full transcription of the interview here.

James Phoenix - Co-Author of "Prompt Engineering for Generative AI"
Mike Taylor - Co-Author of "Prompt Engineering for Generative AI"
Phil Winder - Author of "Reinforcement Learning" & CEO of Winder.AI

RESOURCES
James: https://x.com/jamesaphoenix12 | https://www.linkedin.com/in/jamesphoenix | https://understandingdata.com
Mike: http://saxifrage.xyz | https://twitter.com/hammer_mt | https://www.linkedin.com/in/mjt145
Phil: https://twitter.com/DrPhilWinder | https://linkedin.com/in/drphilwinder | https://winder.ai

Links
https://brightpool.dev
https://karpathy.ai
https://help.openai.com/en/articles/6654000
https://gemini.google.com
https://dreambooth.github.io
https://github.com/microsoft/LoRA
https://claude.ai
https://www.langchain.com/langgraph

DESCRIPTION
Large language models (LLMs) and diffusion models such as ChatGPT and Stable Diffusion have unprecedented potential. Because they have been trained on all the public text and images on the internet, they can make useful contributions to a wide variety of tasks. And with the barrier to entry greatly reduced today, practically any developer can harness LLMs and diffusion models to tackle problems previously unsuitable for automation. With this book, you'll gain a solid foundation in generative AI, including how to apply these models in practice. When first integrating LLMs and diffusion models into their workflows, most developers struggle to coax reliable enough results from them to use in automated systems.
Book description: © O'Reilly

RECOMMENDED BOOKS
James Phoenix & Mike Taylor • Prompt Engineering for Generative AI
Phil Winder • Reinforcement Learning

CHANNEL MEMBERSHIP BONUS
Join this channel to get early access to videos & other perks: https://www.youtube.com/channel/UCs_tLP3AiwYKwdUHpltJPuA/join
Looking for a unique learning experience? Attend the next GOTO conference near you! Get your ticket: gotopia.tech
SUBSCRIBE TO OUR YOUTUBE CHANNEL - new videos posted daily!

Not Another Politics Podcast
MechaHitler and The Political Bias of AI Chatbots

Not Another Politics Podcast

Play Episode Listen Later Jul 24, 2025 57:28


When you ask ChatGPT or Gemini a question about politics, whose opinions are you really hearing? In this episode, we dive into a provocative new study from political scientist Justin Grimmer and his colleagues, which finds that nearly every major large language model—from ChatGPT to Grok—is perceived by Americans as having a left-leaning bias. But why is that? Is it the training data? The guardrails? The Silicon Valley engineers? Or something deeper about the culture of the internet itself? The hosts grapple with everything from "Mecha Hitler" incidents on Grok to the way terms like "unhoused" sneak into AI-generated text—and what that might mean for students, voters, and future regulation. Should the government step in to ensure "political neutrality"? Will AI reshape how people learn about history or policy? Or are we just projecting our own echo chambers onto machines?

TalkRL: The Reinforcement Learning Podcast
Stefano Albrecht on Multi-Agent RL @ RLDM 2025

TalkRL: The Reinforcement Learning Podcast

Play Episode Listen Later Jul 22, 2025 31:34 Transcription Available


Stefano V. Albrecht was previously Associate Professor at the University of Edinburgh, and is currently serving as Director of AI at startup Deepflow. He is a Program Chair of RLDM 2025 and is co-author of the MIT Press textbook "Multi-Agent Reinforcement Learning: Foundations and Modern Approaches".

Featured References
Multi-Agent Reinforcement Learning: Foundations and Modern Approaches — Stefano V. Albrecht, Filippos Christianos, Lukas Schäfer. MIT Press, 2024
RLDM 2025: Reinforcement Learning and Decision Making Conference, Dublin, Ireland
EPyMARL: Extended Python MARL framework — https://github.com/uoe-agents/epymarl
Benchmarking Multi-Agent Deep Reinforcement Learning Algorithms in Cooperative Tasks — Georgios Papoudakis, Filippos Christianos, Lukas Schäfer, Stefano V. Albrecht

The MAD Podcast with Matt Turck
Ex‑DeepMind Researcher Misha Laskin on Enterprise Super‑Intelligence | Reflection AI

The MAD Podcast with Matt Turck

Play Episode Listen Later Jul 17, 2025 66:29


What if your company had a digital brain that never forgot, always knew the answer, and could instantly tap the knowledge of your best engineers, even after they left? Superintelligence can feel like a hand‑wavy pipe‑dream—yet, as Misha Laskin argues, it becomes a tractable engineering problem once you scope it to the enterprise level. Former DeepMind researcher Laskin is betting on an oracle‑like AI that grasps every repo, Jira ticket and hallway aside as deeply as your principal engineer—and he's building it at Reflection AI. In this wide‑ranging conversation, Misha explains why coding is the fastest on‑ramp to superintelligence, how "organizational" beats "general" when real work is on the line, and why today's retrieval‑augmented generation (RAG) feels like "exploring a jungle with a flashlight." He walks us through Asimov, Reflection's newly unveiled code‑research agent that fuses long‑context search, team‑wide memory and multi‑agent planning so developers spend less time spelunking for context and more time shipping. We also rewind his unlikely journey—from physics prodigy in a Manhattan‑Project desert town, to Berkeley's AI crucible, to leading RLHF for Google Gemini—before he left big‑lab comfort to chase a sharper vision of enterprise super‑intelligence. 
Along the way: the four breakthroughs that unlocked modern AI, why capital efficiency still matters in the GPU arms‑race, and how small teams can lure top talent away from nine‑figure offers. If you're curious about the next phase of AI agents, the future of developer tooling, or the gritty realities of scaling a frontier‑level startup, this episode is your blueprint.

Reflection AI
Website - https://reflection.ai
LinkedIn - https://www.linkedin.com/company/reflectionai

Misha Laskin
LinkedIn - https://www.linkedin.com/in/mishalaskin
X/Twitter - https://x.com/mishalaskin

FIRSTMARK
Website - https://firstmark.com
X/Twitter - https://twitter.com/FirstMarkCap

Matt Turck (Managing Director)
LinkedIn - https://www.linkedin.com/in/turck/
X/Twitter - https://twitter.com/mattturck

(00:00) Intro (01:42) Reflection AI: Company Origins and Mission (04:14) Making Superintelligence Concrete (06:04) Superintelligence vs. AGI: Why the Goalposts Moved (07:55) Organizational Superintelligence as an Oracle (12:05) Coding as the Shortcut: Hands, Legs & Brain for AI (16:00) Building the Context Engine (20:55) Capturing Tribal Knowledge in Organizations (26:31) Introducing Asimov: A Deep Code Research Agent (28:44) Team-Wide Memory: Preserving Institutional Knowledge (33:07) Multi-Agent Design for Deep Code Understanding (34:48) Data Retrieval and Integration in Asimov (38:13) Enterprise-Ready: VPC and On-Prem Deployments (39:41) Reinforcement Learning in Asimov's Development (41:04) Misha's Journey: From Physics to AI (42:06) Growing Up in a Science-Driven Desert Town (53:03) Building General Agents at DeepMind (56:57) Founding Reflection AI After DeepMind (58:54) Product-Driven Superintelligence: Why It Matters (01:02:22) The State of Autonomous Coding Agents (01:04:26) What's Next for Reflection AI

alphalist.CTO Podcast - For CTOs and Technical Leaders
#124 - The Path to AGI: Inside poolside's AI Model Factory for Code with Eiso Kant

alphalist.CTO Podcast - For CTOs and Technical Leaders

Play Episode Listen Later Jun 27, 2025 63:56 Transcription Available


How do you build a foundation model that can write code at a human level? Eiso Kant (CTO & co-founder, Poolside) reveals the technical architecture, distributed team strategies, and reinforcement learning breakthroughs powering one of Europe's most ambitious AI startups. Learn how Poolside operates 10,000+ H200s, runs the world's largest code execution RL environment, and why CTOs must rethink engineering orgs for an agent-driven future.

TalkRL: The Reinforcement Learning Podcast
Satinder Singh: The Origin Story of RLDM @ RLDM 2025

TalkRL: The Reinforcement Learning Podcast

Play Episode Listen Later Jun 25, 2025 5:57 Transcription Available


Professor Satinder Singh of Google DeepMind and U of Michigan is co-founder of RLDM. Here he narrates the origin story of the Reinforcement Learning and Decision Making meeting (not conference). Recorded on location at Trinity College Dublin, Ireland during RLDM 2025.

Featured References
RLDM 2025: Multi-disciplinary Conference on Reinforcement Learning and Decision Making (RLDM), June 11-14, 2025 at Trinity College Dublin, Ireland
Satinder Singh on Google Scholar

Father Fessio in Five (by Ignatius Press)
109: Reinforcement Learning—The Final Step of Making A.I.

Father Fessio in Five (by Ignatius Press)

Play Episode Listen Later Jun 20, 2025 10:01


The final step of making A.I. involves giving the system some questions we know the answers to and some questions we do not, then checking its answers against reality. Fr. Fessio explains how A.I. ultimately depends entirely on humans and thereby cannot self-replicate.

Eye On A.I.
#261 Jonathan Frankle: How Databricks is Disrupting AI Model Training

Eye On A.I.

Play Episode Listen Later Jun 12, 2025 52:47


This episode is sponsored by Oracle. OCI is the next-generation cloud designed for every workload – where you can run any application, including any AI projects, faster and more securely for less. On average, OCI costs 50% less for compute, 70% less for storage, and 80% less for networking. Join Modal, Skydance Animation, and today's innovative AI tech companies who upgraded to OCI…and saved.   Try OCI for free at http://oracle.com/eyeonai   What if you could fine-tune an AI model without any labeled data—and still outperform traditional training methods?   In this episode of Eye on AI, we sit down with Jonathan Frankle, Chief Scientist at Databricks and co-founder of MosaicML, to explore TAO (Test-time Adaptive Optimization)—Databricks' breakthrough tuning method that's transforming how enterprises build and scale large language models (LLMs).   Jonathan explains how TAO uses reinforcement learning and synthetic data to train models without the need for expensive, time-consuming annotation. We dive into how TAO compares to supervised fine-tuning, why Databricks built their own reward model (DBRM), and how this system allows for continual improvement, lower inference costs, and faster enterprise AI deployment.   Whether you're an AI researcher, enterprise leader, or someone curious about the future of model customization, this episode will change how you think about training and deploying AI.   Explore the latest breakthroughs in data and AI from Databricks: https://www.databricks.com/events/dataaisummit-2025-announcements Stay Updated: Craig Smith on X: https://x.com/craigss Eye on A.I. on X: https://x.com/EyeOn_AI  

Crazy Wisdom
Episode #465: Proof of Aliveness: A Cryptographic Theater of the Real

Crazy Wisdom

Play Episode Listen Later Jun 2, 2025 61:11


I, Stewart Alsop, am thrilled to welcome Xathil of Poliebotics to this episode of Crazy Wisdom, for what is actually our second take, this time with a visual surprise involving a fascinating 3D-printed Bauta mask. Xathil is doing some truly groundbreaking work at the intersection of physical reality, cryptography, and AI, which we dive deep into, exploring everything from the philosophical implications of anonymity to the technical wizardry behind his "Truth Beam."

Check out this GPT we trained on the conversation

Timestamps
01:35 Xathil explains the 3D-printed Bauta Mask, its Venetian origins, and its role in enabling truth through anonymity via his project, Poliepals.
04:50 The crucial distinction between public identity and "real" identity, and how pseudonyms can foster truth-telling rather than just conceal.
10:15 Addressing the serious risks faced by crypto influencers due to public displays of wealth and the broader implications for online identity.
15:05 Xathil details the core Poliebotics technology: the "Truth Beam," a projector-camera system for cryptographically timestamping physical reality.
18:50 Clarifying the concept of "proof of aliveness"—verifying a person is currently live in a video call—versus the more complex "proof of liveness."
21:45 How the speed of light provides a fundamental advantage for Poliebotics in outmaneuvering AI-generated deepfakes.
32:10 The concern of an "inversion," where machine learning systems could become dominant over physical reality by using humans as their actuators.
45:00 Xathil's ambitious project to use Poliebotics for creating cryptographically verifiable records of biodiversity, beginning with an enhanced Meles trap.

Key Insights
Anonymity as a Truth Catalyst: Drawing from Oscar Wilde, the Bauta mask symbolizes how anonymity or pseudonyms can empower individuals to reveal deeper, more authentic truths. This challenges the notion that masks only serve to hide, suggesting they can be tools for genuine self-expression.
The Bifurcation of Identity: In our digital age, distinguishing between one's core "real" identity and various public-facing personas is increasingly vital. This separation isn't merely about concealment but offers a space for truthful expression while navigating public life.
The Truth Beam: Anchoring Reality: Poliebotics' "Truth Beam" technology employs a projector-camera system to cast cryptographic hashes onto physical scenes, recording them and anchoring them to a blockchain. This aims to create immutable, verifiable records of reality to combat the rise of sophisticated deepfakes.
Harnessing Light Speed Against Deepfakes: The fundamental defense Poliebotics offers against AI-generated fakes is the speed of light. Real-world light reflection for capturing projected hashes is virtually instantaneous, whereas an AI must simulate this complex process, a task too slow to keep up with real-time verification.
The Specter of Humans as AI Actuators: A significant future concern is the "inversion," where AI systems might utilize humans as unwitting agents to achieve their objectives in the physical world. By manipulating incentives, AIs could effectively direct human actions, raising profound questions about agency.
Towards AI Symbiosis: The ideal future isn't a human-AI war or complete technological asceticism, but a cooperative coexistence between nature, humanity, and artificial systems. This involves developing AI responsibly, instilling human values, and creating systems that are non-threatening and beneficial.

Contact Information
* Polybotics' GitHub
* Poliepals
* Xathil: Xathil@ProtonMail.com

Crazy Wisdom
Episode #464: From Meme Coins to Mind Melds: Crypto Meets AI

Crazy Wisdom

Play Episode Listen Later May 26, 2025 48:22


I, Stewart Alsop, had a fascinating conversation on this episode of Crazy Wisdom with Mallory McGee, the founder of Chroma, who is doing some really interesting work at the intersection of AI and crypto. We dove deep into how these two powerful technologies might reshape the internet and our interactions with it, moving beyond the hype cycles to what's truly foundational.

Check out this GPT we trained on the conversation

Timestamps
00:00 The Intersection of AI and Crypto
01:28 Bitcoin's Origins and Austrian Economics
04:35 AI's Centralization Problem and the New Gatekeepers
09:58 Agent Interactions and Decentralized Databases for Trustless Transactions
11:11 AI as a Prosthetic Mind and the Interpretability Challenge
15:12 Deterministic Blockchains vs. Non-Deterministic AI Intents
18:44 The Demise of Traditional Apps in an Agent-Driven World
35:07 Property Rights, Agent Registries, and Blockchains as Backends

Key Insights
Crypto's Enduring Fundamentals: Mallory emphasized that while crypto prices are often noise, the underlying fundamentals point to a new, long-term cycle for the Internet itself. It's about decentralizing control, a core principle stemming from Bitcoin's original blend of economics and technology.
AI's Centralization Dilemma: We discussed the concerning trend of AI development consolidating power within a few major players. This, as Mallory pointed out, ironically mirrors the very centralization crypto aims to dismantle, potentially shifting control from governments to a new set of tech monopolies.
Agents are the Future of Interaction: Mallory envisions a future where most digital interactions aren't human-to-LLM, but agent-to-agent. These autonomous agents will require decentralized, trustless platforms like blockchains to transact, hold assets, and communicate confidentially.
Bridging Non-Deterministic AI with Deterministic Blockchains: A fascinating challenge Mallory highlighted is translating the non-deterministic "intents" of AI (e.g., an agent's goal to "get me a good return on spare cash") into the deterministic transactions required by blockchains. This translation layer is crucial for agents to operate effectively on-chain.
The Decline of Traditional Apps: Mallory made a bold claim that traditional apps and web interfaces are on their way out. As AI agents become capable of generating personalized interfaces on the fly, the need for standardized, pre-built apps will diminish, leading to a world where software is hyper-personalized and often ephemeral.
Blockchains as Agent Backbones: We explored the intriguing idea that blockchains might be inherently better suited for AI agents than for direct human use. Their deterministic nature, ability to handle assets, and potential for trustless reputation systems make them ideal backends for an agent-centric internet.
Trust and Reputation for Agents: In a world teeming with AI agents, establishing trust is paramount. Mallory suggested that on-chain mechanisms like reward and slashing systems can be used to build verifiable reputation scores for agents, helping us discern trustworthy actors from malicious ones without central oversight.
The Battle for an Open AI Future: The age-old battle between open and closed source is playing out again in the AI sphere. While centralized players currently seem to dominate, Mallory sees hope in the open-source AI movement, which could provide a crucial alternative to a future controlled by a few large entities.

Contact Information
* Twitter: @McGee_noodle
* Company: Chroma

Machine Learning Guide
MLG 034 Large Language Models 1

Machine Learning Guide

Play Episode Listen Later May 7, 2025 50:48


Explains advancements in large language models (LLMs). Covers scaling laws - the relationships among model size, data size, and compute - and how emergent abilities such as in-context learning, multi-step reasoning, and instruction following arise once certain scaling thresholds are crossed. Also covers the evolution of the transformer architecture with Mixture of Experts (MoE), the three-phase training process culminating in Reinforcement Learning from Human Feedback (RLHF) for model alignment, and advanced reasoning techniques such as chain-of-thought prompting, which significantly improve complex task performance.

Links
Notes and resources at ocdevel.com/mlg/mlg34
Build the future of multi-agent software with AGNTCY
Try a walking desk to stay healthy & sharp while you learn & code

Transformer Foundations and Scaling Laws
Transformers: Introduced by the 2017 "Attention is All You Need" paper, transformers allow for parallel training and inference of sequences using self-attention, in contrast to the sequential nature of RNNs.
Scaling Laws: Empirical research revealed that LLM performance improves predictably as model size (parameters), data size (training tokens), and compute are increased together, with diminishing returns if only one variable is scaled disproportionately. The "Chinchilla scaling law" (DeepMind, 2022) established the optimal model/data/compute ratio for efficient model performance: earlier large models like GPT-3 were undertrained relative to their size, whereas right-sized models with more training data (e.g., Chinchilla, LLaMA series) proved more compute and inference efficient.

Emergent Abilities in LLMs
Emergence: When trained beyond a certain scale, LLMs display abilities not present in smaller models, including:
In-Context Learning (ICL): Performing new tasks based solely on prompt examples at inference time.
Instruction Following: Executing natural language tasks not seen during training.
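The Chinchilla result above is often summarized by a rule of thumb of roughly 20 training tokens per parameter; a minimal sketch of that back-of-envelope calculation (the 20:1 ratio and the 6-FLOPs-per-parameter-per-token estimate are common approximations, not the paper's exact fitted constants):

```python
def chinchilla_optimal_tokens(n_params: float, tokens_per_param: float = 20.0) -> float:
    """Approximate compute-optimal training tokens for a given model size,
    using the ~20 tokens/parameter rule of thumb from the Chinchilla paper."""
    return n_params * tokens_per_param

def training_flops(n_params: float, n_tokens: float) -> float:
    """Common rough estimate: ~6 FLOPs per parameter per training token."""
    return 6.0 * n_params * n_tokens

# A 70B-parameter model comes out roughly compute-optimal at ~1.4T tokens.
tokens = chinchilla_optimal_tokens(70e9)
print(f"optimal tokens: {tokens:.2e}")                      # optimal tokens: 1.40e+12
print(f"training FLOPs: {training_flops(70e9, tokens):.2e}")
```

Under these rough constants, GPT-3 (175B parameters, ~300B tokens) lands far below its "optimal" token count, which is the undertraining point made above.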
Multi-Step Reasoning & Chain of Thought (CoT): Solving arithmetic, logic, or symbolic reasoning by generating intermediate reasoning steps.
Discontinuity & Debate: These abilities appear abruptly in larger models, though recent research suggests this could result from non-linearities in evaluation metrics rather than innate model properties.

Architectural Evolutions: Mixture of Experts (MoE)
MoE Layers: Modern LLMs often replace standard feed-forward layers with MoE structures, composed of many independent "expert" networks specializing in different subdomains or latent structures. A gating network routes tokens to the most relevant experts per input, activating only a subset of parameters—this is called "sparse activation." This enables much larger overall models without proportional increases in compute per inference, but requires the entire model in memory and introduces new challenges like load balancing and communication overhead.
Specialization & Efficiency: Experts learn different data/knowledge types, boosting model specialization and throughput, though care is needed to avoid overfitting and underutilization of specialists.

The Three-Phase Training Process
1. Unsupervised Pre-Training: Next-token prediction on massive datasets builds a foundation model capturing general language patterns.
2. Supervised Fine-Tuning (SFT): Training on labeled prompt-response pairs teaches the model how to perform specific tasks (e.g., question answering, summarization, code generation). Overfitting and "catastrophic forgetting" are risks if not carefully managed.
3. Reinforcement Learning from Human Feedback (RLHF): Collects human preference data by generating multiple responses to prompts and having annotators rank them, builds a reward model based on these rankings, then updates the LLM (typically with PPO) to maximize alignment with human preferences (helpfulness, harmlessness, truthfulness).
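The top-k gating described above can be sketched in a few lines of NumPy. This is a toy single-token forward pass to show the routing mechanics (sparse activation, softmax over the selected experts), not any production MoE implementation:

```python
import numpy as np

def moe_layer(x, experts_w, gate_w, top_k=2):
    """Toy sparse Mixture-of-Experts forward pass for a single token.

    x:         (d,) input activation
    experts_w: (n_experts, d, d) one weight matrix per expert
    gate_w:    (d, n_experts) gating network weights
    Only the top_k experts are evaluated ("sparse activation").
    """
    logits = x @ gate_w                      # (n_experts,) router scores
    top = np.argsort(logits)[-top_k:]        # indices of the k highest-scoring experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                 # softmax restricted to selected experts
    # Weighted sum of the chosen experts' outputs; all other experts are skipped.
    return sum(w * (x @ experts_w[i]) for w, i in zip(weights, top))

rng = np.random.default_rng(0)
d, n_experts = 8, 4
y = moe_layer(rng.normal(size=d),
              rng.normal(size=(n_experts, d, d)) * 0.1,
              rng.normal(size=(d, n_experts)))
print(y.shape)  # (8,)
```

With top_k=2 of 4 experts, only half the expert parameters are touched per token, which is the compute saving, while all four experts still have to live in memory, which is the cost noted above.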
RLHF introduces complexity and the risk of reward hacking (specification gaming), where the model exploits the reward system in unanticipated ways.

Advanced Reasoning Techniques
- Prompt engineering: the art and science of crafting prompts that elicit better model responses, shown to dramatically affect output quality.
- Chain-of-thought (CoT) prompting: guides models to elaborate step-by-step reasoning before arriving at final answers, demonstrably improving results on complex tasks. Variants include zero-shot CoT ("let's think step by step"), few-shot CoT with worked examples, self-consistency (voting among multiple reasoning chains), and Tree of Thought (exploring multiple reasoning branches in parallel).
- Automated reasoning optimization: frontier models selectively apply these advanced reasoning techniques, balancing compute costs against gains in accuracy and transparency.

Optimization for Training and Inference
- Tradeoffs: the optimal balance among model size, data, and compute is determined not only for pretraining but also for inference efficiency, as lifetime inference costs may exceed initial training costs.
- Current trends: efficient scaling, model specialization (MoE), careful fine-tuning, RLHF alignment, and automated reasoning techniques define state-of-the-art LLM development.
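The Chinchilla result discussed in these notes is often summarized by a rule of thumb: train on roughly 20 tokens per model parameter. A minimal sketch of the arithmetic (the 20:1 ratio and the 6·N·D FLOPs formula are standard approximations from the scaling-law literature, not exact figures from the episode):

```python
# Rough compute-optimal training estimates in the spirit of Chinchilla.
# Assumes the common approximations: ~20 training tokens per parameter,
# and total training compute C ~= 6 * N * D FLOPs (N params, D tokens).

def chinchilla_optimal_tokens(n_params: float) -> float:
    """Approximate compute-optimal number of training tokens."""
    return 20.0 * n_params

def training_flops(n_params: float, n_tokens: float) -> float:
    """Standard 6ND approximation for dense-transformer training FLOPs."""
    return 6.0 * n_params * n_tokens

if __name__ == "__main__":
    n = 70e9  # a 70B-parameter model, roughly Chinchilla-sized
    d = chinchilla_optimal_tokens(n)
    print(f"tokens: {d:.2e}")  # ~1.4e12 tokens
    print(f"FLOPs:  {training_flops(n, d):.2e}")
```

Under these approximations, a 70B-parameter model wants on the order of 1.4 trillion training tokens, which is why GPT-3 (175B parameters, ~300B tokens) counts as undertrained.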
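The sparse-activation idea behind MoE can be shown with a tiny top-k router. This is an illustrative sketch in NumPy, not the routing of any particular production model; the layer sizes and the linear "experts" are invented for demonstration (real experts are MLPs):

```python
import numpy as np

# Toy mixture-of-experts layer: a gating network scores each expert per
# token, only the top-k experts are activated ("sparse activation"), and
# their outputs are combined weighted by renormalized gate probabilities.

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 8, 4, 2

experts = [rng.normal(size=(d_model, d_model)) for _ in range(n_experts)]
gate_w = rng.normal(size=(d_model, n_experts))

def moe_forward(x: np.ndarray) -> np.ndarray:
    """x: (d_model,) token vector -> (d_model,) output, touching only top_k experts."""
    logits = x @ gate_w
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                          # softmax over experts
    chosen = np.argsort(probs)[-top_k:]           # indices of top-k experts
    weights = probs[chosen] / probs[chosen].sum() # renormalize over chosen
    return sum(w * (x @ experts[i]) for w, i in zip(weights, chosen))

y = moe_forward(rng.normal(size=d_model))
print(y.shape)  # (8,)
```

Only 2 of the 4 expert matrices are multiplied per token, which is the source of the compute savings; the load-balancing problem mentioned above arises because nothing here stops the gate from sending every token to the same two experts.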

This Week in Machine Learning & Artificial Intelligence (AI) Podcast
Teaching LLMs to Self-Reflect with Reinforcement Learning with Maohao Shen - #726

This Week in Machine Learning & Artificial Intelligence (AI) Podcast

Play Episode Listen Later Apr 8, 2025 51:45


Today, we're joined by Maohao Shen, a PhD student at MIT, to discuss his paper, “Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search.” We dig into how Satori leverages reinforcement learning to improve language model reasoning—enabling model self-reflection, self-correction, and exploration of alternative solutions. We explore the Chain-of-Action-Thought (COAT) approach, which uses special tokens—continue, reflect, and explore—to guide the model through distinct reasoning actions, allowing it to navigate complex reasoning tasks without external supervision. We also break down Satori's two-stage training process: format tuning, which teaches the model to understand and utilize the special action tokens, and reinforcement learning, which optimizes reasoning through trial-and-error self-improvement. We cover key techniques such as “restart and explore,” which allows the model to self-correct and generalize beyond its training domain. Finally, Maohao reviews Satori's performance and how it compares to other models, the reward design, the benchmarks used, and the surprising observations made during the research. The complete show notes for this episode can be found at https://twimlai.com/go/726.
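The COAT approach interleaves special action tokens with ordinary text. As a purely illustrative sketch (the token spellings `<|continue|>`, `<|reflect|>`, `<|explore|>` and the helper `split_actions` are hypothetical, not the paper's actual format), segmenting a rollout into (action, text) pieces might look like:

```python
import re

# Hypothetical COAT-style action tokens; the real Satori token format may
# differ. This only illustrates segmenting a generation into distinct
# reasoning actions (continue / reflect / explore).
TOKEN_RE = re.compile(r"<\|(continue|reflect|explore)\|>")

def split_actions(generation: str):
    """Split model output into (action, text) segments."""
    segments, action, start = [], "continue", 0
    for m in TOKEN_RE.finditer(generation):
        text = generation[start:m.start()].strip()
        if text:
            segments.append((action, text))
        action, start = m.group(1), m.end()
    tail = generation[start:].strip()
    if tail:
        segments.append((action, tail))
    return segments

demo = ("<|continue|>Compute 12*7."
        "<|reflect|>Check: 12*7 = 70 + 14 = 84."
        "<|explore|>Alternatively, use the distributive law.")
for act, text in split_actions(demo):
    print(act, "->", text)
```

During format tuning the model would learn to emit such tokens; the RL stage then rewards trajectories where a reflect or explore action leads to a corrected final answer.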

Sales vs. Marketing
Lessons - Breaking Free From Bad Habits | Dr. Jud Brewer - Neuroscience of Addiction Expert

Sales vs. Marketing

Play Episode Listen Later Mar 25, 2025 13:58


➡️ Like The Podcast? Leave A Rating: https://ratethispodcast.com/successstory

In this "Lessons" episode, Dr. Jud Brewer, neuroscience of addiction expert, reveals the science behind habits and addictions, explaining how our brains form automatic behaviors to conserve energy and how reinforcement learning entrenches unhealthy patterns. By learning to recognize the true rewards of our actions, Dr. Brewer shows us how to transform negative routines into opportunities for healthier change.

➡️ Show Links
https://successstorypodcast.com
YouTube: https://youtu.be/PpI2aFjA9FU
Apple: https://podcasts.apple.com/us/podcast/dr-judson-brewer-neuroscientist-addiction-psychiatrist/id1484783544
Spotify: https://open.spotify.com/episode/531cPamqo4H0Esq6Yp8RQ3

➡️ Watch the Podcast on YouTube
https://www.youtube.com/c/scottdclary

Programming Throwdown
180: Reinforcement Learning

Programming Throwdown

Play Episode Listen Later Mar 17, 2025 112:22


Intro topic: Grills

News/Links:
- You can't call yourself a senior until you've worked on a legacy project
  https://www.infobip.com/developers/blog/seniors-working-on-a-legacy-project
- Recraft might be the most powerful AI image platform I've ever used, here's why
  https://www.tomsguide.com/ai/ai-image-video/recraft-might-be-the-most-powerful-ai-image-platform-ive-ever-used-heres-why
- NASA has a list of 10 rules for software development
  https://www.cs.otago.ac.nz/cosc345/resources/nasa-10-rules.htm
- AMD Radeon RX 9070 XT performance estimates leaked: 42% to 66% faster than Radeon RX 7900 GRE
  https://www.tomshardware.com/tech-industry/amd-estimates-of-radeon-rx-9070-xt-performance-leaked-42-percent-66-percent-faster-than-radeon-rx-7900-gre

Book of the Show:
- Patrick: The Player of Games (Iain M. Banks) https://a.co/d/1ZpUhGl (non-affiliate)
- Jason: Basic Roleplaying Universal Game Engine https://amzn.to/3ES4p5i

Patreon Plug: https://www.patreon.com/programmingthrowdown?ty=h

Tool of the Show:
- Patrick: Pokemon Sword and Shield
- Jason: Features and Labels (https://fal.ai)

Topic: Reinforcement Learning
- Three types of AI: supervised learning, unsupervised learning, reinforcement learning
- Online vs. offline RL
- Optimization algorithms
  - Value optimization: SARSA, Q-learning
  - Policy optimization: policy gradients, actor-critic, Proximal Policy Optimization
- Value vs. policy optimization
  - Value optimization is more intuitive (value loss)
  - Policy optimization is less intuitive at first (policy gradients)
  - Converting values to policies in deep learning is difficult
- Imitation learning
  - Supervised policy learning, often used to bootstrap reinforcement learning
- Policy evaluation
  - Propensity scoring versus model-based
- Challenges to training RL models
  - Two optimization loops: collecting feedback vs. updating the model
  - Difficult optimization target
  - Policy evaluation
- RLHF & GRPO

★ Support this podcast on Patreon ★
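The value-optimization methods in the show topics (SARSA, Q-learning) are easy to sketch in tabular form. A minimal Q-learning loop on a toy corridor environment (the environment, states, and hyperparameters are invented for illustration):

```python
import random

# Tabular Q-learning on a toy corridor: states 0..4, start at state 0,
# reward 1.0 for reaching state 4. Actions: 0 = left, 1 = right.
N_STATES, GOAL = 5, 4
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1

def step(state, action):
    nxt = max(0, min(GOAL, state + (1 if action == 1 else -1)))
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

def train(episodes=500, seed=0):
    random.seed(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # epsilon-greedy action selection
            a = random.randrange(2) if random.random() < EPSILON \
                else max((0, 1), key=lambda x: q[s][x])
            s2, r, done = step(s, a)
            # Q-learning update: off-policy, bootstraps from max over next actions
            q[s][a] += ALPHA * (r + GAMMA * max(q[s2]) - q[s][a])
            s = s2
    return q

q = train()
policy = [max((0, 1), key=lambda a: q[s][a]) for s in range(N_STATES)]
print(policy)  # greedy policy should move right in non-terminal states
```

Swapping the `max(q[s2])` bootstrap for the value of the action actually taken next would turn this into SARSA, the on-policy counterpart mentioned in the topic list.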