POPULARITY
The Twenty Minute VC: Venture Capital | Startup Funding | The Pitch
Joelle Pineau is the Chief Scientist at Cohere, where she leads research on advancing large language models and practical AI systems. Before joining Cohere, she was VP of AI Research at Meta, where she founded and led Meta AI's Montreal lab. A professor at McGill University, Joelle is renowned for her pioneering work in reinforcement learning, robotics, and responsible AI development. AGENDA: 00:00 Introduction to AI Scaling Laws 03:00 How Meta Shaped How I Think About AI Research 04:36 Challenges in Reinforcement Learning 10:00 Is It Possible to be Capital Efficient in AI 15:52 AI in Enterprise: Efficiency and Adoption 22:15 Security Concerns with AI Agents 28:34 Can Zuck Win By Buying the Galacticos of AI 32:15 The Rising Cost of Data 35:28 Synthetic Data and Model Degradation 37:22 Why AI Coding is Akin to Image Generation in 2015 48:46 If Joelle Was a VC Where Would She Invest? 52:17 Quickfire: Lessons from Zuck, Biggest Mindset Shift
My Fintech Newsletter for more interviews and the latest insights: ↪︎ https://rexsalisbury.substack.com/
In this episode, I sit down with Stevie Case from Vanta, a former pro gamer turned chief revenue officer, to discuss how AI is transforming the entire go-to-market function in B2B SaaS. Stevie shares insights on building agile sales organizations, how AI supercharges human roles rather than replacing them, and the evolving expectations for sales, customer success, and RevOps teams. The conversation covers AI tool adoption, hiring for an AI-native workforce, and why go-to-market roles are among the most exciting in tech today.
Stevie Case: https://www.linkedin.com/in/steviecase/
00:00:00 - AI's Impact on Go-To-Market Functions
00:02:06 - Building Scalable Sales Organizations
00:04:47 - Specialization and Segmentation in Sales
00:06:28 - AI Supercharging Customer Success
00:08:23 - Hiring and Onboarding with AI Support
00:10:07 - Building AI-Driven Products with Customers
00:12:08 - Selling New Products to Existing Customers
00:15:02 - Early Product Adoption and Iteration
00:17:25 - Operating at All Levels in Organizations
00:20:01 - Creating Intense, High-Velocity Teams
00:22:15 - Hiring AI-Native, Curious Builders
00:25:05 - Measuring Success by Team Pride and Feedback
00:26:07 - Developing Agent Platforms
00:28:02 - Monetization and Business Model Evolution
00:30:49 - AI-Enabled Competitive Advantages in Fintech
00:32:31 - Top-Down AI Automation Demand
00:34:11 - Reinforcement Learning in Fraud Detection
00:38:00 - International Go-To-Market Expansion
00:41:33 - Designing Global Sales Footprints
00:45:04 - Resourcing RevOps and Systems Teams
___
Rex Salisbury LinkedIn: ↪︎ https://www.linkedin.com/in/rexsalisbury
Twitter: https://twitter.com/rexsalisbury
TikTok: https://www.tiktok.com/@rex.salisbury
Instagram: https://www.instagram.com/rexsalisbury/
Hey folks, Alex here. Can you believe it's already the middle of October? This week's show was a special one, not just because of the mind-blowing news, but because we set a new ThursdAI record with four incredible interviews back-to-back! We had Jessica Gallegos from Google DeepMind walking us through the cinematic new features in VEO 3.1. Then we dove deep into the world of Reinforcement Learning with my new colleague Kyle Corbitt from OpenPipe. We got the scoop on Amp's wild new ad-supported free tier from CEO Quinn Slack. And just as we were wrapping up, Swyx (from Latent.Space, now with Cognition!) jumped on to break the news about their blazingly fast SWE-grep models. But the biggest story? An AI model from Google and Yale made a novel scientific discovery about cancer cells that was then validated in a lab. This is it, folks. This is the “let's f*****g go” moment we've been waiting for. So buckle up, because this week was an absolute monster. Let's dive in!
ThursdAI - Recaps of the most high signal AI weekly spaces is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.
Open Source: An AI Model Just Made a Real-World Cancer Discovery
We always start with open source, but this week felt different. This week, open source AI stepped out of the benchmarks and into the biology lab.
Our friends at Qwen kicked things off with new 3B and 8B parameter versions of their Qwen3-VL vision model. It's always great to see powerful models shrink down to sizes that can run on-device. What's wild is that these small models are outperforming last generation's giants, like the 72B Qwen2.5-VL, on a whole suite of benchmarks. The 8B model scores a 33.9 on OS World, which is incredible for an on-device agent that can actually see and click things on your screen. For comparison, that's getting close to what we saw from Sonnet 3.7 just a few months ago. The pace is just relentless.
But then, Google dropped a bombshell. A 27-billion parameter Gemma-based model they developed with Yale, called C2S-Scale, generated a completely novel hypothesis about how cancer cells behave. This wasn't a summary of existing research; it was a new idea, something no human scientist had documented before. And here's the kicker: researchers then took that hypothesis into a wet lab, tested it on living cells, and proved it was true.
This is a monumental deal. For years, AI skeptics like Gary Marcus have said that LLMs are just stochastic parrots, that they can't create genuinely new knowledge. This feels like the first, powerful counter-argument. Friend of the pod, Dr. Derya Unutmaz, has been on the show before saying AI is going to solve cancer, and this is the first real sign that he might be right. The researchers noted this was an “emergent capability of scale,” proving once again that as these models get bigger and are trained on more complex data—in this case, turning single-cell RNA sequences into “sentences” for the model to learn from—they unlock completely new abilities. This is AI as a true scientific collaborator. Absolutely incredible.
Big Companies & APIs
The big companies weren't sleeping this week, either. The agentic AI race is heating up, and we're seeing huge updates across the board.
Claude Haiku 4.5: Fast, Cheap Model Rivals Sonnet 4 Accuracy (X, Official blog, X)
First up, Anthropic released Claude Haiku 4.5, and it is a beast. It's a fast, cheap model that's punching way above its weight.
On the SWE-bench verified benchmark for coding, it hit 73.3%, putting it right up there with giants like GPT-5 Codex, but at a fraction of the cost and twice the speed of previous Claude models. Nisten has already been putting it through its paces and loves it for agentic workflows because it just follows instructions without getting opinionated. It seems like Anthropic has specifically tuned this one to be a workhorse for agents, and it absolutely delivers. Also worth noting is the very impressive jump on OSWorld (50.7%), a computer-use benchmark; at this price and speed ($1/$5 MTok input/output), it's going to make computer agents much more streamlined and speedy!
ChatGPT will lose restrictions; age-gating enables “adult mode” with new personality features coming (X)
Sam Altman set X on fire with a thread announcing that ChatGPT will start loosening its restrictions. They're planning to roll out an “adult mode” in December for age-verified users, potentially allowing for things like erotica. More importantly, they're bringing back more customizable personalities, trying to recapture some of the magic of GPT-4o that so many people missed. It feels like they're finally ready to treat adults like adults, letting us opt in to R-rated conversations while keeping strong guardrails for minors. This is a welcome change; we've been advocating for it for a while, and it's a notable contrast with the xAI approach I covered last week: opt-in for adults with verification and precautions, versus engagement bait in the form of a flirty animated waifu with engagement mechanics.
Microsoft is making every Windows 11 PC an AI PC with Copilot voice input and agentic powers (Blog, X)
And in breaking news from this morning, Microsoft announced that every Windows 11 machine is becoming an AI PC. They're building a new Copilot agent directly into the OS that can take over and complete tasks for you. The really clever part? It runs in a secure, sandboxed desktop environment that you can watch and interact with. This solves a huge problem with agents that take over your mouse and keyboard, locking you out of your own computer. Now, you can give the agent a task and let it run in the background while you keep working. This is going to put agentic AI in front of hundreds of millions of users, and it's a massive step towards making AI a true collaborator at the OS level.
NVIDIA DGX - the tiny personal supercomputer at $4K (X, LMSYS Blog)
NVIDIA finally delivered their promised AI supercomputer, and the excitement was in the air as Jensen hand-delivered the DGX Spark to OpenAI and Elon (recreating that historic photo of Jensen hand-delivering a signed DGX workstation back when Elon was still affiliated with OpenAI). The workstation sold out almost immediately. Folks from LMSYS did a great deep dive into the specs. Meanwhile, folks on our feeds are saying that if you want the maximum possible open-source LLM inference speed, this machine is probably overpriced compared to what you can get with an M3 Ultra Mac with 128GB of RAM or an RTX 5090 GPU, which can get you similar if not better speeds at significantly lower price points.
Anthropic's “Claude Skills”: Your AI Agent Finally Gets a Playbook (Blog)
Just when we thought the week couldn't get any more packed, Anthropic dropped “Claude Skills,” a huge upgrade that lets you give your agent custom instructions and workflows. Think of them as expertise folders you can create for specific tasks.
For example, you can teach Claude your personal coding style, how to format reports for your company, or even give it a script to follow for complex data analysis. The best part is that Claude automatically detects which “Skill” is needed for a given task, so you don't have to manually load them. This is a massive step towards making agents more reliable and personalized, moving beyond just a single custom instruction and into a library of repeatable, expert processes. It's available now for all paid users, and it's a feature I've been waiting for. Our friend Simon Willison thinks Skills may be a bigger deal than MCPs!
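For readers who want a concrete picture of the "expertise folder" idea described above, here is a minimal sketch of creating one such folder programmatically. The directory layout, the SKILL.md filename, and the frontmatter fields are assumptions for illustration based on the description in this post, not Anthropic's documented schema.

```python
# A minimal sketch of what a Claude "Skill" folder might look like, per the
# description above: a directory with a SKILL.md file whose frontmatter tells
# the agent when to load it. The layout and field names here are assumptions
# for illustration, not Anthropic's documented schema.
from pathlib import Path

skill_dir = Path("skills/quarterly-report-format")
skill_dir.mkdir(parents=True, exist_ok=True)

skill_md = """\
---
name: quarterly-report-format
description: Use this skill when asked to format or draft quarterly reports for Acme Corp.
---

# Quarterly report formatting

1. Start with a one-paragraph executive summary.
2. Use the section order: Revenue, Costs, Risks, Outlook.
3. Express all figures in USD millions with one decimal place.
"""

(skill_dir / "SKILL.md").write_text(skill_md)
print(f"Wrote skill to {skill_dir}")
```

The point of the pattern is that the description in the frontmatter is what lets the agent decide on its own which playbook applies to a given task, instead of the user loading instructions manually.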
What does it really mean when GPT-5 “thinks”? In this conversation, OpenAI's VP of Research Jerry Tworek explains how modern reasoning models work in practice—why pretraining and reinforcement learning (RL/RLHF) are both essential, what that on-screen “thinking” actually does, and when extra test-time compute helps (or doesn't). We trace the evolution from O1 (a tech demo good at puzzles) to O3 (the tool-use shift) to GPT-5 (Jerry calls it “O3.1-ish”), and talk through verifiers, reward design, and the real trade-offs behind “auto” reasoning modes.
We also go inside OpenAI: how research is organized, why collaboration is unusually transparent, and how the company ships fast without losing rigor. Jerry shares the backstory on competitive-programming results like ICPC, what they signal (and what they don't), and where agents and tool use are genuinely useful today. Finally, we zoom out: could pretraining + RL be the path to AGI? This is the MAD Podcast — AI for the 99%. If you're curious about how these systems actually work (without needing a PhD), this episode is your map to the current AI frontier.
OpenAI
Website - https://openai.com
X/Twitter - https://x.com/OpenAI
Jerry Tworek
LinkedIn - https://www.linkedin.com/in/jerry-tworek-b5b9aa56
X/Twitter - https://x.com/millionint
FIRSTMARK
Website - https://firstmark.com
X/Twitter - https://twitter.com/FirstMarkCap
Matt Turck (Managing Director)
LinkedIn - https://www.linkedin.com/in/turck/
X/Twitter - https://twitter.com/mattturck
(00:00) Intro
(01:01) What Reasoning Actually Means in AI
(02:32) Chain of Thought: Models Thinking in Words
(05:25) How Models Decide Thinking Time
(07:24) Evolution from O1 to O3 to GPT-5
(11:00) Before OpenAI: Growing up in Poland, Dropping out of School, Trading
(20:32) Working on Robotics and Rubik's Cube Solving
(23:02) A Day in the Life: Talking to Researchers
(24:06) How Research Priorities Are Determined
(26:53) Collaboration vs IP Protection at OpenAI
(29:32) Shipping Fast While Doing Deep Research
(31:52) Using OpenAI's Own Tools Daily
(32:43) Pre-Training Plus RL: The Modern AI Stack
(35:10) Reinforcement Learning 101: Training Dogs
(40:17) The Evolution of Deep Reinforcement Learning
(42:09) When GPT-4 Seemed Underwhelming at First
(45:39) How RLHF Made GPT-4 Actually Useful
(48:02) Unsupervised vs Supervised Learning
(49:59) GRPO and How DeepSeek Accelerated US Research
(53:05) What It Takes to Scale Reinforcement Learning
(55:36) Agentic AI and Long-Horizon Thinking
(59:19) Alignment as an RL Problem
(1:01:11) Winning ICPC World Finals Without Specific Training
(1:05:53) Applying RL Beyond Math and Coding
(1:09:15) The Path from Here to AGI
(1:12:23) Pure RL vs Language Models
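As a concrete illustration of the test-time-compute point Jerry makes above, here is a minimal best-of-n sketch: sample several candidate answers and keep the one a verifier scores highest. The toy arithmetic task, the random "sampler," and the exact-check "verifier" are stand-ins chosen for illustration; this is the general pattern, not OpenAI's reasoning stack.

```python
# Minimal sketch of "spend more test-time compute": sample several candidate
# answers and keep the one a verifier scores highest. The toy task (checking
# arithmetic) and the random "model" are stand-ins; the point is the pattern
# of trading extra samples for accuracy.
import random

def sample_answer(question: str) -> int:
    # Stand-in for one stochastic model sample (one "chain of thought").
    return random.randint(0, 20)

def verifier(question: str, answer: int) -> float:
    # Stand-in for a learned verifier / reward model; here we can check exactly.
    a, b = map(int, question.split("+"))
    return 1.0 if answer == a + b else 0.0

def best_of_n(question: str, n: int) -> int:
    candidates = [sample_answer(question) for _ in range(n)]
    return max(candidates, key=lambda ans: verifier(question, ans))

question = "7+12"
for n in (1, 4, 32):
    hits = sum(best_of_n(question, n) == 19 for _ in range(200))
    print(f"n={n:>2}: solved {hits}/200 trials")
```

Running it shows accuracy climbing with n, which is the basic trade-off behind "auto" reasoning modes: more samples or longer thinking buys accuracy at the price of latency and cost.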
Invest Like the Best: Read the notes at podcastnotes.org. Don't forget to subscribe for free to our newsletter, the top 10 ideas of the week, every Monday --------- My guest today is Dylan Patel. Dylan is the founder and CEO of SemiAnalysis. At SemiAnalysis Dylan tracks the semiconductor supply chain and AI infrastructure buildout with unmatched granularity—literally watching data centers get built through satellite imagery and mapping hundreds of billions in capital flows. Our conversation explores the massive industrial buildout powering AI, from the strategic chess game between OpenAI, Nvidia, and Oracle to why we're still in the first innings of post-training and reinforcement learning. Dylan explains infrastructure realities like electrician wages doubling and companies using diesel truck engines for emergency power, while making a sobering case about US-China competition and why America needs AI to succeed. We discuss his framework for where value will accrue in the stack, why traditional SaaS economics are breaking down under AI's high cost of goods sold, and which hardware bottlenecks matter most. This is one of the most comprehensive views of the physical reality underlying the AI revolution you'll hear anywhere. Please enjoy my conversation with Dylan Patel. For the full show notes, transcript, and links to mentioned content, check out the episode page here. ----- This episode is brought to you by Ramp. Ramp's mission is to help companies manage their spend in a way that reduces expenses and frees up time for teams to work on more valuable projects. Go to Ramp.com/invest to sign up for free and get a $250 welcome bonus. – This episode is brought to you by Ridgeline. Ridgeline has built a complete, real-time, modern operating system for investment managers. It handles trading, portfolio management, compliance, customer reporting, and much more through an all-in-one real-time cloud platform. Head to ridgelineapps.com to learn more about the platform. – This episode is brought to you by AlphaSense. AlphaSense has completely transformed the research process with cutting-edge AI technology and a vast collection of top-tier, reliable business content. Invest Like the Best listeners can get a free trial now at Alpha-Sense.com/Invest and experience firsthand how AlphaSense and Tegus help you make smarter decisions faster. ----- Editing and post-production work for this episode was provided by The Podcast Consultant (https://thepodcastconsultant.com). Show Notes: (00:00:00) Welcome to Invest Like the Best (00:05:12) The AI Infrastructure Buildout (00:08:25) Scaling AI Models and Compute Needs (00:11:44) Reinforcement Learning and AI Training (00:14:07) The Future of AI and Compute (00:17:47) AI in Practical Applications (00:22:29) The Importance of Data and Environments in AI Training (00:29:45) Human Analogies in AI Development (00:40:34) The Challenge of Infinite Context in AI Models (00:44:08) The Bullish and Bearish Perspectives on AI (00:48:25) The Talent Wars in AI Research (00:56:54) The Power Dynamics in AI and Tech (01:13:29) The Future of AI and Its Economic Impact (01:18:55) The Gigawatt Data Center Boom (01:21:12) Supply Chain and Workforce Dynamics (01:24:23) US vs. China: AI and Power Dynamics (01:37:16) AI Startups and Innovations (01:52:44) The Changing Economics of Software (01:58:12) The Kindest Thing
Invest Like the Best Key Takeaways:
- Today, the challenge is not to make the model bigger; the problem is knowing how best to generate and create data in useful domains so that the model gets better at them.
- AIs do not have to get to digital god mode for AI to have an enormous impact on productivity and society: even if AI does not become smarter than humans in the short term, the economic value creation boom will still be enormous.
- “If we didn't have the AI boom, the US probably would be behind China and no longer the world hegemon by the end of the decade, if not sooner.” – Dylan Patel
- The US is doing what China has done historically: dumping tons of capital into something, and then the market becomes
- If there is a sustained lag in model improvement, the US economy will go into a recession; this is the case for Korea and Taiwan, too.
- On the AI talent wars: if these companies are willing to spend billions on training runs, it makes sense to spend a lot on talent to optimize those runs and potentially mitigate errors.
- We actually are not dedicating that much power to AI yet; only 3-4% of total power is going to data centers.
- He is more optimistic on Anthropic than OpenAI; their revenue is accelerating much faster because of their focus on the $2 trillion software market, whereas OpenAI's focus is split between many things.
- While Meta “has the cards to potentially own it all”, Google is better positioned to dominate the consumer and professional markets.
Read the full notes @ podcastnotes.org
How do we get from today's AI copilots to true human-level intelligence? In this episode of Eye on AI, Craig Smith sits down with Eiso Kant, Co-Founder of Poolside, to explore why reinforcement learning + software development might be the fastest path to human-level AI. Eiso shares Poolside's mission to build AI that doesn't just autocomplete code — but learns like a real developer. You'll hear how Poolside uses reinforcement learning from code execution (RLCF), why software development is the perfect training ground for intelligence, and how agentic AI systems are about to transform the way we build and ship software. If you want to understand the future of AI, software engineering, and AGI, this conversation is packed with insights you won't want to miss. Stay Updated: Craig Smith on X:https://x.com/craigss Eye on A.I. on X: https://x.com/EyeOn_AI (00:00) The Missing Ingredient for Human-Level AI(01:02) Eiso Kant's Journey(05:30) Using Software Development to Reach AGI(07:48) Why Coding Is the Perfect Training Ground for Intelligence(10:11) Reinforcement Learning from Code Execution (RLCF) Explained(13:14) How Poolside Builds and Trains Its Foundation Models(17:35) The Rise of Agentic AI(21:08) Making Software Creation Accessible to Everyone(26:03) Overcoming Model Limitations(32:08) Training Models to Think(37:24) Building the Future of AI Agents(42:11) Poolside's Full-Stack Approach to AI Deployment(46:28) Enterprise Partnerships, Security & Customization Behind the Firewall(50:48) Giving Enterprises Transparency to Drive Adoption
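To make the reinforcement-learning-from-code-execution (RLCF) idea above concrete, here is a minimal sketch of an execution-based reward: run a generated solution against unit tests and use the pass rate as the training signal. The subprocess harness and toy tests are illustrative assumptions, not Poolside's infrastructure.

```python
# Minimal sketch of the execution-feedback idea described above: instead of a
# human label, the reward for a generated solution comes from actually running
# it against tests. This toy harness (subprocess + pass count) is an
# illustrative assumption, not Poolside's training stack.
import subprocess, sys, tempfile, textwrap

def execution_reward(candidate_code: str, tests: list[str]) -> float:
    """Run each test against the candidate in a fresh interpreter; reward = pass rate."""
    passed = 0
    for test in tests:
        program = candidate_code + "\n" + textwrap.dedent(test)
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
            f.write(program)
            path = f.name
        result = subprocess.run([sys.executable, path], capture_output=True, timeout=5)
        passed += result.returncode == 0
    return passed / len(tests)

candidate = "def add(a, b):\n    return a + b\n"
tests = ["assert add(2, 3) == 5", "assert add(-1, 1) == 0"]
print(execution_reward(candidate, tests))  # 1.0, i.e. full reward for this sample
```

The appeal of code as a training ground, as Eiso argues, is exactly this: the environment can grade the model's output objectively and at scale, without waiting for human annotation.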
Artificial intelligence could fuel Alberta's next big tech boom. Three leaders in the field—Cam Linke, CEO of Amii; Nicole Janssen, co-founder of AltaML; and Danielle Gifford, managing director of AI with PwC—dig into how AI is transforming everything from energy to healthcare and even space. They share why Edmonton is a world leader in reinforcement learning, and why Alberta's natural advantages could make it a global hub for data centres and AI commercialization. This podcast is generously supported by Don Archibald. The Hub thanks him for his ongoing support. The Hub is Canada's fastest-growing independent digital news outlet. Subscribe to our YouTube channel to get our latest videos: https://www.youtube.com/@TheHubCanada Subscribe to The Hub's podcast feed to get our best content when you are on the go: https://tinyurl.com/3a7zpd7e (Apple) https://tinyurl.com/y8akmfn7 (Spotify) Want more Hub? Get a FREE 3-month trial membership on us: https://thehub.ca/free-trial/ Follow The Hub on X: https://x.com/thehubcanada?lang=en CREDITS: Falice Chin - Producer and Editor Ryan Hastman - Host Amal Attar-Guzman - Sound and Video Assistant To contact us, sign up for updates, and access transcripts email support@thehub.ca
Hansohl Kim is an engineer at Anthropic, where he focuses on reinforcement learning & AI safety for models like Claude. With experience spanning computer science, biotech, & machine learning, he brings a unique perspective to the fast-changing world of artificial intelligence. Listen as Hansohl unpacks the challenges of alignment, the importance of guardrails, & what it takes to design AI systems we can truly trust.
David Abel is a Senior Research Scientist at DeepMind on the Agency team, and an Honorary Fellow at the University of Edinburgh. His research blends computer science and philosophy, exploring foundational questions about reinforcement learning, definitions, and the nature of agency. Featured References Plasticity as the Mirror of Empowerment David Abel, Michael Bowling, André Barreto, Will Dabney, Shi Dong, Steven Hansen, Anna Harutyunyan, Khimya Khetarpal, Clare Lyle, Razvan Pascanu, Georgios Piliouras, Doina Precup, Jonathan Richens, Mark Rowland, Tom Schaul, Satinder Singh A Definition of Continual RL David Abel, André Barreto, Benjamin Van Roy, Doina Precup, Hado van Hasselt, Satinder Singh Agency is Frame-Dependent David Abel, André Barreto, Michael Bowling, Will Dabney, Shi Dong, Steven Hansen, Anna Harutyunyan, Khimya Khetarpal, Clare Lyle, Razvan Pascanu, Georgios Piliouras, Doina Precup, Jonathan Richens, Mark Rowland, Tom Schaul, Satinder Singh On the Expressivity of Markov Reward David Abel, Will Dabney, Anna Harutyunyan, Mark Ho, Michael Littman, Doina Precup, Satinder Singh — Outstanding Paper Award, NeurIPS 2021 Additional References Bidirectional Communication Theory — Marko 1973 Causality, Feedback and Directed Information — Massey 1990 The Big World Hypothesis — Javed et al. 2024 Loss of plasticity in deep continual learning — Dohare et al. 2024 Three Dogmas of Reinforcement Learning — Abel 2024 Explaining dopamine through prediction errors and beyond — Gershman et al. 2024 David Abel Google Scholar David Abel personal website
Recorded at Reinforcement Learning Conference 2025 at University of Alberta, Edmonton Alberta Canada.
Featured References
Lecture on the Oak Architecture, Rich Sutton
Alberta Plan, Rich Sutton with Mike Bowling and Patrick Pilarski
Additional References
Jacob Beck on Google Scholar
Alex Goldie on Google Scholar
Cornelius Braun on Google Scholar
Reinforcement Learning Conference
We caught up with the RLC Outstanding Paper award winners for your listening pleasure. Recorded on location at Reinforcement Learning Conference 2025, at University of Alberta, in Edmonton Alberta Canada in August 2025.
Featured References
Empirical Reinforcement Learning Research
Mitigating Suboptimality of Deterministic Policy Gradients in Complex Q-functions
Ayush Jain, Norio Kosaka, Xinhu Li, Kyung-Min Kim, Erdem Biyik, Joseph J Lim
Applications of Reinforcement Learning
WOFOSTGym: A Crop Simulator for Learning Annual and Perennial Crop Management Strategies
William Solow, Sandhya Saisubramanian, Alan Fern
Emerging Topics in Reinforcement Learning
Towards Improving Reward Design in RL: A Reward Alignment Metric for RL Practitioners
Calarina Muslimani, Kerrick Johnstonbaugh, Suyog Chandramouli, Serena Booth, W. Bradley Knox, Matthew E. Taylor
Scientific Understanding in Reinforcement Learning
Multi-Task Reinforcement Learning Enables Parameter Scaling
Reginald McLean, Evangelos Chatzaroulas, J K Terry, Isaac Woungang, Nariman Farsad, Pablo Samuel Castro
We caught up with the RLC Outstanding Paper award winners for your listening pleasure. Recorded on location at Reinforcement Learning Conference 2025, at University of Alberta, in Edmonton Alberta Canada in August 2025.
Featured References
Scientific Understanding in Reinforcement Learning
How Should We Meta-Learn Reinforcement Learning Algorithms?
Alexander David Goldie, Zilin Wang, Jakob Nicolaus Foerster, Shimon Whiteson
Tooling, Environments, and Evaluation for Reinforcement Learning
Syllabus: Portable Curricula for Reinforcement Learning Agents
Ryan Sullivan, Ryan Pégoud, Ameen Ur Rehman, Xinchen Yang, Junyun Huang, Aayush Verma, Nistha Mitra, John P Dickerson
Resourcefulness in Reinforcement Learning
PufferLib 2.0: Reinforcement Learning at 1M steps/s
Joseph Suarez
Theory of Reinforcement Learning
Deep Reinforcement Learning with Gradient Eligibility Traces
Esraa Elelimy, Brett Daley, Andrew Patterson, Marlos C. Machado, Adam White, Martha White
Prof Thomas Akam is a Neuroscientist at the Oxford University Department of Experimental Psychology. He is a Wellcome Career Development Fellow and Associate Professor at the University of Oxford, and leads the Cognitive Circuits research group.
Featured References
Brain Architecture for Adaptive Behaviour
Thomas Akam, RLDM 2025 Tutorial
Additional References
Thomas Akam on Google Scholar
pyPhotometry: Open source, Python based, fiber photometry data acquisition
pyControl: Open source, Python based, behavioural experiment control.
Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control, Nathaniel D Daw, Yael Niv, Peter Dayan, 2005
Further analysis of the hippocampal amnesic syndrome: 14-year follow-up study of H. M., Milner, B., Corkin, S., & Teuber, H. L., 1968
Internally generated cell assembly sequences in the rat hippocampus, Pastalkova E, Itskov V, Amarasingham A, Buzsáki G. Science. 2008
Multi-disciplinary Conference on Reinforcement Learning and Decision 2025
"If you're going to be running a very elite research institution, you have to have the best people. To have the best people, you have to trust them and empower them. You can't hire a world expert in some area and then tell them what to do. They know more than you do. They're smarter than you are in their area. So you've got to trust your people. One of our really foundational commitments to our people is: we trust you. We're going to work to empower you. Go do the thing that you need to do. If somebody in the labs wants to spend 5, 10, 15 years working on something they think is really important, they're empowered to do that." - Doug Burger Fresh out of the studio, Doug Burger, Technical Fellow and Corporate Vice President at Microsoft Research, joins us to explore Microsoft's bold expansion into Southeast Asia with the recent launch of the Microsoft Research Asia lab in Singapore. From there, Doug shares his accidental journey from academia to leading global research operations, reflecting on how Microsoft Research's open collaboration model empowers over thousands of researchers worldwide to tackle humanity's biggest challenges. Following on, he highlights the recent breakthroughs from Microsoft Research for example, the quantum computing breakthrough with topological qubits, the evolution from lines of code to natural language programming, and how AI is accelerating innovation across multiple scaling dimensions beyond traditional data limits. Addressing the intersection of three computing paradigms—logic, probability, and quantum—he emphasizes that geographic diversity in research labs enables Microsoft to build AI that works for everyone, not just one region. Closing the conversation, Doug shares his vision of what great looks like for Microsoft Research with researchers driven by purpose and passion to create breakthroughs that advance both science and society. 
Episode Highlights: [00:00] Quote of the Day by Doug Burger [01:08] Doug Burger's journey from academia to Microsoft Research [02:24] Career advice: Always seek challenges, move when feeling restless or comfortable [03:07] Launch of Microsoft Research Asia in Singapore: Tapping local talent and culture for inclusive AI development [04:13] Singapore lab focuses on foundational AI, embodied AI, and healthcare applications [06:19] AI detecting seizures in children and assessing Parkinson's motor function [08:24] Embedding Southeast Asian societal norms and values into Foundational AI research [10:26] Microsoft Research's open collaboration model [12:42] Generative AI's rapid pace accelerating technological innovation and research tools [14:36] AI revolutionizing computer architecture by creating completely new interfaces [16:24] Open versus closed source AI models debate and Microsoft's platform approach [18:08] Reasoning models enabling formal verification and correctness guarantees in AI [19:35] Multiple scaling dimensions in AI beyond traditional data scaling laws [21:01] Project Catapult and Brainwave: Building configurable hardware acceleration platforms [23:29] Microsoft's 17-year quantum computing journey with topological qubits breakthrough [26:26] Balancing blue-sky foundational research with application-driven initiatives at scale [29:16] Three computing paradigms: logic, probability (AI), and quantum superposition [32:26] Microsoft Research's exploration-to-exploitation playbook for breakthrough discoveries [35:26] Research leadership secret: Curiosity across fields enables unexpected connections [37:11] Hidden Mathematical Structures Transformers Architecture in LLMs [40:04] Microsoft Research's vision: Becoming Bell Labs for AI era [42:22] Steering AI models for mental health and critical thinking conversations Profile: Doug Burger, Technical Fellow and Corporate Vice President, Microsoft Research LinkedIn: https://www.linkedin.com/in/dcburger/ Microsoft Research Profile: https://www.microsoft.com/en-us/research/people/dburger/ Podcast Information: Bernard Leong hosts and produces the show. The proper credits for the intro and end music are "Energetic Sports Drive." G. Thomas Craig mixed and edited the episode in both video and audio format. Here are the links to watch or listen to our podcast. Analyse Asia Main Site: https://analyse.asia Analyse Asia Spotify: https://open.spotify.com/show/1kkRwzRZa4JCICr2vm0vGl Analyse Asia Apple Podcasts: https://podcasts.apple.com/us/podcast/analyse-asia-with-bernard-leong/id914868245 Analyse Asia YouTube: https://www.youtube.com/@AnalyseAsia Analyse Asia LinkedIn: https://www.linkedin.com/company/analyse-asia/ Analyse Asia X (formerly known as Twitter): https://twitter.com/analyseasia Analyse Asia Threads: https://www.threads.net/@analyseasia Sign Up for Our This Week in Asia Newsletter: https://www.analyse.asia/#/portal/signup Subscribe Newsletter on LinkedIn https://www.linkedin.com/build-relation/newsletter-follow?entityUrn=7149559878934540288
This interview was recorded for the GOTO Book Club. http://gotopia.tech/bookclub
Read the full transcription of the interview here
James Phoenix - Co-Author of "Prompt Engineering for Generative AI"
Mike Taylor - Co-Author of "Prompt Engineering for Generative AI"
Phil Winder - Author of "Reinforcement Learning" & CEO of Winder.AI
RESOURCES
James
https://x.com/jamesaphoenix12
https://www.linkedin.com/in/jamesphoenix
https://understandingdata.com
Mike
http://saxifrage.xyz
https://twitter.com/hammer_mt
https://www.linkedin.com/in/mjt145
Phil
https://twitter.com/DrPhilWinder
https://linkedin.com/in/drphilwinder
https://winder.ai
Links
https://brightpool.dev
https://karpathy.ai
https://help.openai.com/en/articles/6654000
https://gemini.google.com
https://dreambooth.github.io
https://github.com/microsoft/LoRA
https://claude.ai
https://www.langchain.com/langgraph
DESCRIPTION
Large language models (LLMs) and diffusion models such as ChatGPT and Stable Diffusion have unprecedented potential. Because they have been trained on all the public text and images on the internet, they can make useful contributions to a wide variety of tasks. And with the barrier to entry greatly reduced today, practically any developer can harness LLMs and diffusion models to tackle problems previously unsuitable for automation. With this book, you'll gain a solid foundation in generative AI, including how to apply these models in practice. When first integrating LLMs and diffusion models into their workflows, most developers struggle to coax reliable enough results from them to use in automated systems.
* Book description: © O'Reilly
RECOMMENDED BOOKS
James Phoenix & Mike Taylor • Prompt Engineering for Generative AI
Phil Winder • Reinforcement Learning
CHANNEL MEMBERSHIP BONUS
Join this channel to get early access to videos & other perks: https://www.youtube.com/channel/UCs_tLP3AiwYKwdUHpltJPuA/join
Looking for a unique learning experience? Attend the next GOTO conference near you! Get your ticket: gotopia.tech
SUBSCRIBE TO OUR YOUTUBE CHANNEL - new videos posted daily!
When you ask ChatGPT or Gemini a question about politics, whose opinions are you really hearing?In this episode, we dive into a provocative new study from political scientist Justin Grimmer and his colleagues, which finds that nearly every major large language model—from ChatGPT to Grok—is perceived by Americans as having a left-leaning bias. But why is that? Is it the training data? The guardrails? The Silicon Valley engineers? Or something deeper about the culture of the internet itself?The hosts grapple with everything from “Mecha Hitler” incidents on Grok to the way terms like “unhoused” sneak into AI-generated text—and what that might mean for students, voters, and future regulation. Should the government step in to ensure “political neutrality”? Will AI reshape how people learn about history or policy? Or are we just projecting our own echo chambers onto machines?
Stefano V. Albrecht was previously Associate Professor at the University of Edinburgh, and is currently serving as Director of AI at startup Deepflow. He is a Program Chair of RLDM 2025 and is co-author of the MIT Press textbook "Multi-Agent Reinforcement Learning: Foundations and Modern Approaches".
Featured References
Multi-Agent Reinforcement Learning: Foundations and Modern Approaches
Stefano V. Albrecht, Filippos Christianos, Lukas Schäfer
MIT Press, 2024
RLDM 2025: Reinforcement Learning and Decision Making Conference
Dublin, Ireland
EPyMARL: Extended Python MARL framework
https://github.com/uoe-agents/epymarl
Benchmarking Multi-Agent Deep Reinforcement Learning Algorithms in Cooperative Tasks
Georgios Papoudakis and Filippos Christianos and Lukas Schäfer and Stefano V. Albrecht
What if your company had a digital brain that never forgot, always knew the answer, and could instantly tap the knowledge of your best engineers, even after they left? Superintelligence can feel like a hand‑wavy pipe‑dream— yet, as Misha Laskin argues, it becomes a tractable engineering problem once you scope it to the enterprise level. Former DeepMind researcher Laskin is betting on an oracle‑like AI that grasps every repo, Jira ticket and hallway aside as deeply as your principal engineer—and he's building it at Reflection AI.
In this wide‑ranging conversation, Misha explains why coding is the fastest on‑ramp to superintelligence, how “organizational” beats “general” when real work is on the line, and why today's retrieval‑augmented generation (RAG) feels like “exploring a jungle with a flashlight.” He walks us through Asimov, Reflection's newly unveiled code‑research agent that fuses long‑context search, team‑wide memory and multi‑agent planning so developers spend less time spelunking for context and more time shipping.
We also rewind his unlikely journey—from physics prodigy in a Manhattan‑Project desert town, to Berkeley's AI crucible, to leading RLHF for Google Gemini—before he left big‑lab comfort to chase a sharper vision of enterprise super‑intelligence. Along the way: the four breakthroughs that unlocked modern AI, why capital efficiency still matters in the GPU arms‑race, and how small teams can lure top talent away from nine‑figure offers.
If you're curious about the next phase of AI agents, the future of developer tooling, or the gritty realities of scaling a frontier‑level startup—this episode is your blueprint.
Reflection AI
Website - https://reflection.ai
LinkedIn - https://www.linkedin.com/company/reflectionai
Misha Laskin
LinkedIn - https://www.linkedin.com/in/mishalaskin
X/Twitter - https://x.com/mishalaskin
FIRSTMARK
Website - https://firstmark.com
X/Twitter - https://twitter.com/FirstMarkCap
Matt Turck (Managing Director)
LinkedIn - https://www.linkedin.com/in/turck/
X/Twitter - https://twitter.com/mattturck
(00:00) Intro (01:42) Reflection AI: Company Origins and Mission (04:14) Making Superintelligence Concrete (06:04) Superintelligence vs. AGI: Why the Goalposts Moved (07:55) Organizational Superintelligence as an Oracle (12:05) Coding as the Shortcut: Hands, Legs & Brain for AI (16:00) Building the Context Engine (20:55) Capturing Tribal Knowledge in Organizations (26:31) Introducing Asimov: A Deep Code Research Agent (28:44) Team-Wide Memory: Preserving Institutional Knowledge (33:07) Multi-Agent Design for Deep Code Understanding (34:48) Data Retrieval and Integration in Asimov (38:13) Enterprise-Ready: VPC and On-Prem Deployments (39:41) Reinforcement Learning in Asimov's Development (41:04) Misha's Journey: From Physics to AI (42:06) Growing Up in a Science-Driven Desert Town (53:03) Building General Agents at DeepMind (56:57) Founding Reflection AI After DeepMind (58:54) Product-Driven Superintelligence: Why It Matters (01:02:22) The State of Autonomous Coding Agents (01:04:26) What's Next for Reflection AI
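As a point of reference for Misha's "flashlight in the jungle" critique of RAG, here is a minimal sketch of the naive retrieval baseline he is contrasting Asimov against: score repository chunks by crude keyword overlap with a question and return the top hits. The scoring function and toy corpus are illustrative assumptions, not Reflection AI's system.

```python
# Minimal sketch of the naive retrieval baseline discussed above: score code and
# doc chunks by keyword overlap with the query and surface the top few. Asimov
# is described as going well beyond this (long-context search, team memory,
# multi-agent planning); this toy scorer only illustrates the baseline.
def tokenize(text: str) -> set[str]:
    return {t.lower().strip("().,:?") for t in text.split() if t}

def retrieve(query: str, chunks: dict[str, str], k: int = 2) -> list[str]:
    q = tokenize(query)
    ranked = sorted(chunks, key=lambda name: -len(q & tokenize(chunks[name])))
    return ranked[:k]

repo_chunks = {
    "billing/invoice.py": "def create_invoice(customer, amount): retries on payment gateway timeout",
    "docs/oncall.md": "Runbook: what to do when the payment gateway times out in production",
    "ui/theme.ts": "export const darkTheme = { background: '#111', text: '#eee' }",
}
print(retrieve("why do invoices fail when the payment gateway times out?", repo_chunks))
```

The limitation is obvious from the sketch: overlap-based lookup only finds what the question already names, which is why a code-research agent needs memory and planning on top of retrieval.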
Rico Knapper is the CEO of Pailot and loves PCB shop floors. Why? Because his AI-based solution outperforms other approaches. We met him at Siemens.
How do you build a foundation model that can write code at a human level? Eiso Kant (CTO & co-founder, Poolside) reveals the technical architecture, distributed team strategies, and reinforcement learning breakthroughs powering one of Europe's most ambitious AI startups. Learn how Poolside operates 10,000+ H200s, runs the world's largest code execution RL environment, and why CTOs must rethink engineering orgs for an agent-driven future.
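A rough sketch of the throughput problem implied by "the world's largest code execution RL environment": candidate rollouts have to be executed and scored in parallel at very large scale. The toy worker and pool size below are assumptions for illustration, not Poolside's setup.

```python
# Minimal sketch of farming code-execution rollouts out to parallel workers and
# collecting rewards, the pattern an execution-based RL environment has to run
# at enormous scale. The worker below just fakes a score; in a real system it
# would run one generated program in a sandbox against tests.
from concurrent.futures import ProcessPoolExecutor

def evaluate_rollout(rollout_id: int) -> tuple[int, float]:
    # Stand-in for: execute one candidate program, return its test-pass reward.
    reward = 1.0 if rollout_id % 3 == 0 else 0.0
    return rollout_id, reward

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=8) as pool:
        rewards = dict(pool.map(evaluate_rollout, range(1000)))
    print(f"mean reward over {len(rewards)} rollouts: {sum(rewards.values()) / len(rewards):.3f}")
```

Scaling this loop from a laptop pool to fleets of machines is, in essence, the infrastructure problem the episode describes.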
Professor Satinder Singh of Google DeepMind and U of Michigan is co-founder of RLDM. Here he narrates the origin story of the Reinforcement Learning and Decision Making meeting (not conference).Recorded on location at Trinity College Dublin, Ireland during RLDM 2025.Featured ReferencesRLDM 2025: Multi-disciplinary Conference on Reinforcement Learning and Decision Making (RLDM)June 11-14, 2025 at Trinity College Dublin, IrelandSatinder Singh on Google Scholar
The final step of making A.I. involves giving the system some questions we know the answer to and some questions we do not know the answer to, then checking its answers against reality. Fr. Fessio explains how A.I. ultimately depends entirely on humans and thereby cannot self-replicate.
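A minimal sketch of the checking step described above, with a toy model and answer key standing in for the real system: grade the AI on questions whose answers humans already know, and let that track record calibrate how much to trust it on the open ones.

```python
# Minimal sketch of the check described above: grade the system on questions
# with known answers, and weight its answers to unknown questions by that track
# record. The toy "model" and answer key are assumptions for illustration only.
def model(question: str) -> str:
    canned = {"capital of France?": "Paris", "2 + 2?": "5"}  # deliberately imperfect
    return canned.get(question, "unknown")

answer_key = {"capital of France?": "Paris", "2 + 2?": "4"}   # questions we know
open_questions = ["Is there life on Europa?"]                  # questions we don't

known_accuracy = sum(model(q) == a for q, a in answer_key.items()) / len(answer_key)
print(f"accuracy on known answers: {known_accuracy:.0%}")
for q in open_questions:
    print(f"{q} -> {model(q)} (weighted by the {known_accuracy:.0%} track record)")
```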
This episode is sponsored by Oracle. OCI is the next-generation cloud designed for every workload – where you can run any application, including any AI projects, faster and more securely for less. On average, OCI costs 50% less for compute, 70% less for storage, and 80% less for networking. Join Modal, Skydance Animation, and today's innovative AI tech companies who upgraded to OCI…and saved. Try OCI for free at http://oracle.com/eyeonai What if you could fine-tune an AI model without any labeled data—and still outperform traditional training methods? In this episode of Eye on AI, we sit down with Jonathan Frankle, Chief Scientist at Databricks and co-founder of MosaicML, to explore TAO (Test-time Adaptive Optimization)—Databricks' breakthrough tuning method that's transforming how enterprises build and scale large language models (LLMs). Jonathan explains how TAO uses reinforcement learning and synthetic data to train models without the need for expensive, time-consuming annotation. We dive into how TAO compares to supervised fine-tuning, why Databricks built their own reward model (DBRM), and how this system allows for continual improvement, lower inference costs, and faster enterprise AI deployment. Whether you're an AI researcher, enterprise leader, or someone curious about the future of model customization, this episode will change how you think about training and deploying AI. Explore the latest breakthroughs in data and AI from Databricks: https://www.databricks.com/events/dataaisummit-2025-announcements Stay Updated: Craig Smith on X: https://x.com/craigss Eye on A.I. on X: https://x.com/EyeOn_AI
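To make the TAO loop described above concrete, here is a minimal sketch of the scoring-and-selection step: sample candidate responses for unlabeled prompts, score them with a reward model (the role DBRM plays at Databricks), and keep the best as fine-tuning targets. The generator and reward model below are random stand-ins for illustration, not Databricks' implementation or API.

```python
# Minimal sketch of the label-free tuning loop described above: for each
# unlabeled prompt, sample several candidate responses, score them with a
# reward model, and keep the best-scoring one as a fine-tuning target.
import random

def generate_candidates(prompt: str, n: int = 4) -> list[str]:
    # Stand-in for sampling n responses from the model being tuned.
    return [f"candidate {i} for: {prompt}" for i in range(n)]

def reward_model(prompt: str, response: str) -> float:
    # Stand-in for a learned reward model such as DBRM.
    return random.random()

def build_tuning_set(prompts: list[str]) -> list[tuple[str, str]]:
    pairs = []
    for prompt in prompts:
        candidates = generate_candidates(prompt)
        best = max(candidates, key=lambda r: reward_model(prompt, r))
        pairs.append((prompt, best))  # no human label anywhere in the loop
    return pairs

unlabeled_prompts = ["Summarize last quarter's support tickets", "Write SQL for churn by region"]
for prompt, best in build_tuning_set(unlabeled_prompts):
    print(prompt, "->", best)
```

The pairs produced this way would then feed a standard fine-tuning (or RL) step, which is how the approach sidesteps labeled data while still improving the deployed model over time.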
I, Stewart Alsop, am thrilled to welcome Xathil of Poliebotics to this episode of Crazy Wisdom, for what is actually our second take, this time with a visual surprise involving a fascinating 3D-printed Bauta mask. Xathil is doing some truly groundbreaking work at the intersection of physical reality, cryptography, and AI, which we dive deep into, exploring everything from the philosophical implications of anonymity to the technical wizardry behind his "Truth Beam."
Check out this GPT we trained on the conversation
Timestamps
01:35 Xathil explains the 3D-printed Bauta Mask, its Venetian origins, and its role in enabling truth through anonymity via his project, Poliepals.
04:50 The crucial distinction between public identity and "real" identity, and how pseudonyms can foster truth-telling rather than just conceal.
10:15 Addressing the serious risks faced by crypto influencers due to public displays of wealth and the broader implications for online identity.
15:05 Xathil details the core Poliebotics technology: the "Truth Beam," a projector-camera system for cryptographically timestamping physical reality.
18:50 Clarifying the concept of "proof of aliveness"—verifying a person is currently live in a video call—versus the more complex "proof of liveness."
21:45 How the speed of light provides a fundamental advantage for Poliebotics in outmaneuvering AI-generated deepfakes.
32:10 The concern of an "inversion," where machine learning systems could become dominant over physical reality by using humans as their actuators.
45:00 Xathil's ambitious project to use Poliebotics for creating cryptographically verifiable records of biodiversity, beginning with an enhanced Meles trap.
Key Insights
Anonymity as a Truth Catalyst: Drawing from Oscar Wilde, the Bauta mask symbolizes how anonymity or pseudonyms can empower individuals to reveal deeper, more authentic truths. This challenges the notion that masks only serve to hide, suggesting they can be tools for genuine self-expression.
The Bifurcation of Identity: In our digital age, distinguishing between one's core "real" identity and various public-facing personas is increasingly vital. This separation isn't merely about concealment but offers a space for truthful expression while navigating public life.
The Truth Beam: Anchoring Reality: Poliebotics' "Truth Beam" technology employs a projector-camera system to cast cryptographic hashes onto physical scenes, recording them and anchoring them to a blockchain. This aims to create immutable, verifiable records of reality to combat the rise of sophisticated deepfakes.
Harnessing Light Speed Against Deepfakes: The fundamental defense Poliebotics offers against AI-generated fakes is the speed of light. Real-world light reflection for capturing projected hashes is virtually instantaneous, whereas an AI must simulate this complex process, a task too slow to keep up with real-time verification.
The Specter of Humans as AI Actuators: A significant future concern is the "inversion," where AI systems might utilize humans as unwitting agents to achieve their objectives in the physical world. By manipulating incentives, AIs could effectively direct human actions, raising profound questions about agency.
Towards AI Symbiosis: The ideal future isn't a human-AI war or complete technological asceticism, but a cooperative coexistence between nature, humanity, and artificial systems.
This involves developing AI responsibly, instilling human values, and creating systems that are non-threatening and beneficial.
Contact Information
* Polybotics' GitHub
* Poliepals
* Xathil: Xathil@ProtonMail.com
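A minimal sketch of the hash-chain idea behind the "Truth Beam" described above: hash each captured frame together with the previous digest so any later tampering breaks the chain, with projection into the scene and blockchain anchoring left as comments. The byte strings stand in for real camera frames; this is an illustration, not Poliebotics' implementation.

```python
# Minimal sketch of the timestamp-chain idea behind the "Truth Beam" described
# above: each captured frame is hashed together with the previous hash, so any
# later alteration of a frame breaks the chain.
import hashlib, time

def chain_frames(frames: list[bytes]) -> list[dict]:
    chain, prev = [], b"genesis"
    for frame in frames:
        digest = hashlib.sha256(prev + frame).hexdigest()
        chain.append({"time": time.time(), "hash": digest})
        # In the scheme described above, this digest would be projected into the
        # physical scene before the next frame is captured, and periodically
        # written to a blockchain as an external timestamp anchor.
        prev = bytes.fromhex(digest)
    return chain

frames = [b"frame-0-pixels", b"frame-1-pixels", b"frame-2-pixels"]
for link in chain_frames(frames):
    print(link["hash"][:16], link["time"])
```

The speed-of-light argument in the episode rests on this loop being physically instantaneous for a real scene, while a fake would have to simulate the projection-and-capture cycle faster than it actually happens.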
I, Stewart Alsop, had a fascinating conversation on this episode of Crazy Wisdom with Mallory McGee, the founder of Chroma, who is doing some really interesting work at the intersection of AI and crypto. We dove deep into how these two powerful technologies might reshape the internet and our interactions with it, moving beyond the hype cycles to what's truly foundational.
Check out this GPT we trained on the conversation
Timestamps
00:00 The Intersection of AI and Crypto
01:28 Bitcoin's Origins and Austrian Economics
04:35 AI's Centralization Problem and the New Gatekeepers
09:58 Agent Interactions and Decentralized Databases for Trustless Transactions
11:11 AI as a Prosthetic Mind and the Interpretability Challenge
15:12 Deterministic Blockchains vs. Non-Deterministic AI Intents
18:44 The Demise of Traditional Apps in an Agent-Driven World
35:07 Property Rights, Agent Registries, and Blockchains as Backends
Key Insights
Crypto's Enduring Fundamentals: Mallory emphasized that while crypto prices are often noise, the underlying fundamentals point to a new, long-term cycle for the Internet itself. It's about decentralizing control, a core principle stemming from Bitcoin's original blend of economics and technology.
AI's Centralization Dilemma: We discussed the concerning trend of AI development consolidating power within a few major players. This, as Mallory pointed out, ironically mirrors the very centralization crypto aims to dismantle, potentially shifting control from governments to a new set of tech monopolies.
Agents are the Future of Interaction: Mallory envisions a future where most digital interactions aren't human-to-LLM, but agent-to-agent. These autonomous agents will require decentralized, trustless platforms like blockchains to transact, hold assets, and communicate confidentially.
Bridging Non-Deterministic AI with Deterministic Blockchains: A fascinating challenge Mallory highlighted is translating the non-deterministic "intents" of AI (e.g., an agent's goal to "get me a good return on spare cash") into the deterministic transactions required by blockchains. This translation layer is crucial for agents to operate effectively on-chain.
The Decline of Traditional Apps: Mallory made a bold claim that traditional apps and web interfaces are on their way out. As AI agents become capable of generating personalized interfaces on the fly, the need for standardized, pre-built apps will diminish, leading to a world where software is hyper-personalized and often ephemeral.
Blockchains as Agent Backbones: We explored the intriguing idea that blockchains might be inherently better suited for AI agents than for direct human use. Their deterministic nature, ability to handle assets, and potential for trustless reputation systems make them ideal backends for an agent-centric internet.
Trust and Reputation for Agents: In a world teeming with AI agents, establishing trust is paramount. Mallory suggested that on-chain mechanisms like reward and slashing systems can be used to build verifiable reputation scores for agents, helping us discern trustworthy actors from malicious ones without central oversight.
The Battle for an Open AI Future: The age-old battle between open and closed source is playing out again in the AI sphere. While centralized players currently seem to dominate, Mallory sees hope in the open-source AI movement, which could provide a crucial alternative to a future controlled by a few large entities.
Contact Information
* Twitter: @McGee_noodle
* Company: Chroma
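As a small illustration of the reward-and-slashing reputation idea Mallory raises above, here is a sketch of the bookkeeping an on-chain registry might do: agents post a stake, verified good outcomes raise reputation, and bad ones slash the stake. The rules and numbers are illustrative assumptions, not Chroma's design; on a real chain this logic would live in a smart contract.

```python
# Minimal sketch of reward/slashing bookkeeping for agent reputation, as
# discussed above: an agent posts a stake, verified good outcomes grow its
# reputation, verified bad ones slash its stake. Rules and numbers are
# illustrative assumptions only.
class AgentRegistry:
    def __init__(self):
        self.stake, self.reputation = {}, {}

    def register(self, agent: str, stake: float) -> None:
        self.stake[agent] = stake
        self.reputation[agent] = 0.0

    def record_outcome(self, agent: str, success: bool, slash_fraction: float = 0.2) -> None:
        if success:
            self.reputation[agent] += 1.0
        else:
            self.stake[agent] *= (1 - slash_fraction)  # slashing as the cost of misbehavior
            self.reputation[agent] -= 1.0

registry = AgentRegistry()
registry.register("yield-agent", stake=100.0)
for ok in (True, True, False):
    registry.record_outcome("yield-agent", ok)
print(registry.stake["yield-agent"], registry.reputation["yield-agent"])  # 80.0 1.0
```

The point is that trust between agents can be grounded in verifiable economic consequences rather than in a central authority vouching for them.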
Since ChatGPT came on the scene, numerous incidents have surfaced involving attorneys submitting court filings riddled with AI-generated hallucinations—plausible-sounding case citations that purport to support key legal propositions but are, in fact, entirely fictitious. As sanctions against attorneys mount, it seems clear there are a few kinks in the tech. Even AI tools designed specifically for lawyers can be prone to hallucinations. In this episode, we look at the potential and risks of AI-assisted tech in law and policy with two Stanford Law researchers at the forefront of this issue: RegLab Director Professor Daniel Ho and JD/PhD student and computer science researcher Mirac Suzgun. Together with several co-authors, they examine the emerging risks in two recent papers, “Profiling Legal Hallucinations in Large Language Models” (Oxford Journal of Legal Analysis, 2024) and the forthcoming “Hallucination-Free?” in the Journal of Empirical Legal Studies. Ho and Suzgun offer new insights into how legal AI is working, where it's failing, and what's at stake.Links:Daniel Ho >>> Stanford Law pageStanford Institute for Human-Centered Artificial Intelligence (HAI) >>> Stanford University pageRegulation, Evaluation, and Governance Lab (RegLab) >>> Stanford University pageConnect:Episode Transcripts >>> Stanford Legal Podcast WebsiteStanford Legal Podcast >>> LinkedIn PageRich Ford >>> Twitter/XPam Karlan >>> Stanford Law School PageStanford Law School >>> Twitter/XStanford Lawyer Magazine >>> Twitter/X (00:00:00) Introduction to AI in Legal Education (00:05:01) AI Tools in Legal Research and Writing(00:12:01) Challenges of AI-Generated Content (00:20:0) Reinforcement Learning with Human Feedback(00:30:01) Audience Q&A
Our 208th episode with a summary and discussion of last week's big AI news! Recorded on 05/02/2025 Hosted by Andrey Kurenkov and Jeremie Harris. Feel free to email us your questions and feedback at contact@lastweekinai.com and/or hello@gladstone.ai Read out our text newsletter and comment on the podcast at https://lastweekin.ai/. Join our Discord here! https://discord.gg/nTyezGSKwP In this episode: OpenAI showcases new integration capabilities in their API, enhancing the performance of LLMs and image generators with updated functionalities and improved user interfaces. Analysis of OpenAI's preparedness framework reveals updates focusing on biological and chemical risks, cybersecurity, and AI self-improvement, while toning down the emphasis on persuasion capabilities. Anthropic's research highlights potential security vulnerabilities in AI models, demonstrating various malicious use cases such as influence operations and hacking tool creation. A detailed examination of AI competition between the US and China reveals China's impending capability to match the US in AI advancement this year, emphasizing the impact of export controls and the importance of geopolitical strategy. Timestamps + Links: Tools & Apps (00:02:57) Anthropic lets users connect more apps to Claude (00:08:20) OpenAI undoes its glaze-heavy ChatGPT update (00:15:16) Baidu ERNIE X1 and 4.5 Turbo boast high performance at low cost (00:19:44) Adobe adds more image generators to its growing AI family (00:24:35) OpenAI makes its upgraded image generator available to developers (00:27:01) xAI's Grok chatbot can now ‘see' the world around it Applications & Business: (00:28:41) Thinking Machines Lab CEO Has Unusual Control in Andreessen-Led Deal (00:33:36) Chip war heats up: Huawei 910C emerges as China's answer to US export bans (00:34:21) Huawei to Test New AI Chip (00:40:17) ByteDance, Alibaba and Tencent stockpile billions worth of Nvidia chips (00:43:59) Speculation mounts that Musk will raise tens of billions for AI supercomputer with 1 million GPUs: Report Projects & Open Source: (00:47:14) Alibaba unveils Qwen 3, a family of ‘hybrid' AI reasoning models (00:54:14) Intellect-2 (01:02:07) BitNet b1.58 2B4T Technical Report (01:05:33) Meta AI Introduces Perception Encoder: A Large-Scale Vision Encoder that Excels Across Several Vision Tasks for Images and Video Research & Advancements: (01:06:42) The Leaderboard Illusion (01:12:08) Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model? (01:18:38) Reinforcement Learning for Reasoning in Large Language Models with One Training Example (01:24:40) Sleep-time Compute: Beyond Inference Scaling at Test-time Policy & Safety: (01:28:23) Every AI Datacenter Is Vulnerable to Chinese Espionage, Report Says (01:32:27) OpenAI preparedness framework update (01:38:31) Detecting and Countering Malicious Uses of Claude: March 2025 (01:46:33) Chinese AI Will Match America's
Explains advancements in large language models (LLMs): scaling laws - the relationships among model size, data size, and compute - and how emergent abilities such as in-context learning, multi-step reasoning, and instruction following arise once certain scaling thresholds are crossed. Covers the evolution of the transformer architecture with Mixture of Experts (MoE), describes the three-phase training process culminating in Reinforcement Learning from Human Feedback (RLHF) for model alignment, and explores advanced reasoning techniques such as chain-of-thought prompting, which significantly improve complex task performance. Links: Notes and resources at ocdevel.com/mlg/mlg34 Build the future of multi-agent software with AGNTCY. Try a walking desk to stay healthy & sharp while you learn & code Transformer Foundations and Scaling Laws Transformers: Introduced by the 2017 "Attention is All You Need" paper, transformers allow for parallel training and inference of sequences using self-attention, in contrast to the sequential nature of RNNs. Scaling Laws: Empirical research revealed that LLM performance improves predictably as model size (parameters), data size (training tokens), and compute are increased together, with diminishing returns if only one variable is scaled disproportionately. The "Chinchilla scaling law" (DeepMind, 2022) established the optimal model/data/compute ratio for efficient model performance: earlier large models like GPT-3 were undertrained relative to their size, whereas right-sized models with more training data (e.g., Chinchilla, LLaMA series) proved more compute and inference efficient. Emergent Abilities in LLMs Emergence: When trained beyond a certain scale, LLMs display abilities not present in smaller models, including: In-Context Learning (ICL): Performing new tasks based solely on prompt examples at inference time. Instruction Following: Executing natural language tasks not seen during training. Multi-Step Reasoning & Chain of Thought (CoT): Solving arithmetic, logic, or symbolic reasoning by generating intermediate reasoning steps. Discontinuity & Debate: These abilities appear abruptly in larger models, though recent research suggests that this could result from non-linearities in evaluation metrics rather than innate model properties. Architectural Evolutions: Mixture of Experts (MoE) MoE Layers: Modern LLMs often replace standard feed-forward layers with MoE structures. Composed of many independent "expert" networks specializing in different subdomains or latent structures. A gating network routes tokens to the most relevant experts per input, activating only a subset of parameters—this is called "sparse activation." Enables much larger overall models without proportional increases in compute per inference, but requires the entire model in memory and introduces new challenges like load balancing and communication overhead. Specialization & Efficiency: Experts learn different data/knowledge types, boosting model specialization and throughput, though care is needed to avoid overfitting and underutilization of specialists. The Three-Phase Training Process 1. Unsupervised Pre-Training: Next-token prediction on massive datasets—builds a foundation model capturing general language patterns. 2. Supervised Fine Tuning (SFT): Training on labeled prompt-response pairs to teach the model how to perform specific tasks (e.g., question answering, summarization, code generation). Overfitting and "catastrophic forgetting" are risks if not carefully managed. 3. 
Reinforcement Learning from Human Feedback (RLHF): Collects human preference data by generating multiple responses to prompts and then having annotators rank them. Builds a reward model from these rankings, then uses a policy-optimization algorithm (often PPO) to update the LLM to maximize alignment with human preferences (helpfulness, harmlessness, truthfulness). Introduces complexity and risk of reward hacking (specification gaming), where the model may exploit the reward system in unanticipated ways. Advanced Reasoning Techniques Prompt Engineering: The art/science of crafting prompts that elicit better model responses, shown to dramatically affect model output quality. Chain of Thought (CoT) Prompting: Guides models to elaborate step-by-step reasoning before arriving at final answers—demonstrably improves results on complex tasks. Variants include zero-shot CoT ("let's think step by step"), few-shot CoT with worked examples, self-consistency (voting among multiple reasoning chains), and Tree of Thought (explores multiple reasoning branches in parallel). Automated Reasoning Optimization: Frontier models selectively apply these advanced reasoning techniques, balancing compute costs with gains in accuracy and transparency. Optimization for Training and Inference Tradeoffs: The optimal balance between model size, data, and compute is determined not only for pretraining but also for inference efficiency, as lifetime inference costs may exceed initial training costs. Current Trends: Efficient scaling, model specialization (MoE), careful fine-tuning, RLHF alignment, and automated reasoning techniques define state-of-the-art LLM development.
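To make the "sparse activation" idea from this episode concrete, here is a minimal NumPy sketch of top-k gating in a Mixture of Experts layer. The four toy experts, random gate weights, and k=2 routing are illustrative assumptions for this example, not the configuration of any model discussed above.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def moe_forward(x, experts, gate_w, k=2):
    """Route one token vector x to its top-k experts (sparse activation)."""
    scores = x @ gate_w                    # one gating logit per expert
    top = np.argsort(scores)[-k:]          # indices of the k best-scoring experts
    weights = softmax(scores[top])         # renormalize over the selected experts only
    # Only the chosen experts run; the remaining experts' parameters stay idle.
    return sum(w * experts[i](x) for w, i in zip(weights, top))

# Toy setup: 4 "experts", each a random linear map over an 8-dim token embedding.
rng = np.random.default_rng(0)
d, n_experts = 8, 4
experts = [lambda v, W=rng.normal(size=(d, d)): v @ W for _ in range(n_experts)]
gate_w = rng.normal(size=(d, n_experts))
token = rng.normal(size=d)
print(moe_forward(token, experts, gate_w).shape)  # (8,)
```

In a real MoE layer the gate and experts are trained jointly and an auxiliary loss encourages balanced routing; the sketch only shows why compute per token grows with k rather than with the total number of experts.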
Join Tommy Shaughnessy from Delphi Ventures as he hosts Sam Lehman, Principal at Symbolic Capital and AI researcher, for a deep dive into the Reinforcement Learning (RL) renaissance and its implications for decentralized AI. Sam recently authored a widely discussed post, "The World's RL Gym", exploring the evolution of AI scaling and the exciting potential of decentralized networks for training next-generation models. The World's RL Gym: https://www.symbolic.capital/writing/the-worlds-rl-gym
After pioneering reinforcement learning breakthroughs at DeepMind with Capture the Flag and AlphaStar, Max Jaderberg aims to revolutionize drug discovery with AI as Chief AI Officer of Isomorphic Labs, which was spun out of DeepMind. He discusses how AlphaFold 3's diffusion-based architecture enables unprecedented understanding of molecular interactions, and why we're approaching a "Move 37 moment" in AI-powered drug design where models will surpass human intuition. Max shares his vision for general AI models that can solve all diseases, and the importance of developing agents that can learn to search through the whole potential design space. Hosted by Stephanie Zhan, Sequoia Capital. Mentioned in this episode: Playing Atari with Deep Reinforcement Learning: Seminal 2013 paper on Reinforcement Learning Capture the Flag: 2019 DeepMind paper on the emergence of cooperative agents AlphaStar: 2019 DeepMind paper on attaining grandmaster level in StarCraft II using multi-agent RL AlphaFold Server: Web interface for AlphaFold 3 model for non-commercial academic use
In this episode of Project Synapse, hosts Marcel Ganger, John Pinard, and Jim Love explore the transformative potential of AI in the contemporary world. They delve into the significance of the new paper 'Welcome to the Era of Experience' by AI pioneers David Silver and Richard Sutton. The discussion spans a range of topics including the evolution of AI training methods, the impact of AI on the workforce, and the concept of AI as autonomous co-workers. They also reflect on the broader implications of AI, such as changes in societal structures and the philosophical aspects of human intelligence versus AI. The hosts share insights on the rapid advancements in AI technology, the necessity of preparing for a non-linear future, and the importance of adapting corporate strategies to integrate AI effectively. 00:00 Introduction to Project Synapse 00:34 Discussing the New AI Paper 01:15 Mid-Conversation Banter 03:36 AI in the Workforce 11:52 Reinforcement Learning and AI Training 23:25 The Bitter Lesson and AI's Future 34:15 Mental Health Systems and Homelessness 35:43 AI and Human Intelligence 36:16 The Move 37 Phenomenon 37:46 Humility and Expertise 39:54 Dolphin Intelligence and AI 42:25 Human Achievements and AI 44:54 Job Displacement and AI 48:52 Transitioning to an AI-Driven Society 01:00:39 Experiential Learning in AI 01:05:12 Final Thoughts and Resources
Our 207th episode with a summary and discussion of last week's big AI news! Recorded on 04/14/2025 Hosted by Andrey Kurenkov and Jeremie Harris. Feel free to email us your questions and feedback at contact@lastweekinai.com and/or hello@gladstone.ai Read out our text newsletter and comment on the podcast at https://lastweekin.ai/. Join our Discord here! https://discord.gg/nTyezGSKwP In this episode: OpenAI introduces GPT-4.1 with optimized coding and instruction-following capabilities, featuring variants like GPT-4.1 Mini and Nano, and a million-token context window. Concerns arise as OpenAI reduces resources for safety testing, sparking internal and external criticisms. xAI's newly launched API for Grok 3 showcases significant capabilities comparable to other leading models. Meta faces allegations of aiding China in AI development for business advantages, with potential compliance issues and public scrutiny looming. Timestamps + Links: Tools & Apps (00:03:13) OpenAI's new GPT-4.1 AI models focus on coding (00:08:12) ChatGPT will now remember your old conversations (00:11:16) Google's newest Gemini AI model focuses on efficiency (00:14:27) Elon Musk's AI company, xAI, launches an API for Grok 3 (00:18:35) Canva is now in the coding and spreadsheet business (00:20:31) Meta's vanilla Maverick AI model ranks below rivals on a popular chat benchmark Applications & Business (00:25:46) Ironwood: The first Google TPU for the age of inference (00:34:15) Anthropic rolls out a $200-per-month Claude subscription (00:37:17) OpenAI co-founder Ilya Sutskever's Safe Superintelligence reportedly valued at $32B (00:40:20) Mira Murati's AI startup gains prominent ex-OpenAI advisers (00:42:52) Hugging Face buys a humanoid robotics startup (00:44:58) Stargate developer Crusoe could spend $3.5 billion on a Texas data center. Most of it will be tax-free. Projects & Open Source (00:48:14) OpenAI Open Sources BrowseComp: A New Benchmark for Measuring the Ability for AI Agents to Browse the Web Research & Advancements (00:56:09) Sample, Don't Search: Rethinking Test-Time Alignment for Language Models (01:03:32) Concise Reasoning via Reinforcement Learning (01:09:37) Going beyond open data – increasing transparency and trust in language models with OLMoTrace (01:15:34) Independent evaluations of Grok-3 and Grok-3 mini on our suite of benchmarks Policy & Safety (01:17:58) OpenAI countersues Elon Musk, calls for enjoinment from ‘further unlawful and unfair action' (01:24:33) OpenAI slashes AI model safety testing time (01:27:55) Ex-OpenAI staffers file amicus brief opposing the company's for-profit transition (01:32:25) Access to future AI models in OpenAI's API may require a verified ID (01:34:53) Meta whistleblower claims tech giant built $18 billion business by aiding China in AI race and undermining U.S. national security
For episode 507, Brandon Zemp is joined by the Founder of Pluralis Research, Dr. Alexander Long. He was previously an AI Researcher at Amazon in a team of 14 Deep Learning PhDs. At Amazon, Dr Long's research focus was in retrieval augmentation and sample-efficient adaptation of large multi-modal foundation models. At UNSW his PhD was on sample-efficient Reinforcement Learning and non-parametric memory in Deep Learning, where he was the School Nominee for the Malcolm Chaikin Prize (UNSW Best Thesis). Pluralis Research is pioneering Protocol Learning, an alternative to today's closed AI models and economically unsustainable open-source initiatives. Protocol Learning enables collaborative model training by pooling computational resources across multiple participants, while ensuring no single entity can obtain the complete model.
This Week in Machine Learning & Artificial Intelligence (AI) Podcast
Today, we're joined by Maohao Shen, a PhD student at MIT, to discuss his paper, "Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search." We dig into how Satori leverages reinforcement learning to improve language model reasoning—enabling model self-reflection, self-correction, and exploration of alternative solutions. We explore the Chain-of-Action-Thought (COAT) approach, which uses special tokens—continue, reflect, and explore—to guide the model through distinct reasoning actions, allowing it to navigate complex reasoning tasks without external supervision. We also break down Satori's two-stage training process: format tuning, which teaches the model to understand and utilize the special action tokens, and reinforcement learning, which optimizes reasoning through trial-and-error self-improvement. We cover key techniques such as "restart and explore," which allows the model to self-correct and generalize beyond its training domain. Finally, Maohao reviews Satori's performance and how it compares to other models, the reward design, the benchmarks used, and the surprising observations made during the research. The complete show notes for this episode can be found at https://twimlai.com/go/726.
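To illustrate what a chain-of-action-thought trace could look like, here is a small sketch built around the three action names mentioned above (continue, reflect, explore). The literal token strings, the trace format, and the helper function are assumptions made for this example; Satori's actual tokens and training setup are defined in the paper.

```python
# Illustrative only: token strings below are invented for this sketch.
CONTINUE, REFLECT, EXPLORE = "<|continue|>", "<|reflect|>", "<|explore|>"
ACTIONS = (CONTINUE, REFLECT, EXPLORE)

def last_action(trace: str) -> str:
    """Return the most recent meta-action token emitted in a reasoning trace."""
    positions = {tok: trace.rfind(tok) for tok in ACTIONS}
    best = max(positions, key=positions.get)
    return best if positions[best] >= 0 else CONTINUE

trace = (
    "Q: 17 * 24 = ? "
    f"{CONTINUE} 17 * 24 = 17 * 20 + 17 * 4 = 340 + 68 = 408 "
    f"{REFLECT} check: 408 / 24 = 17, so the step is consistent."
)
print(last_action(trace))  # <|reflect|>
```

The point of the format-tuning stage described above is exactly this kind of structure: the model learns to emit and respond to such action markers before reinforcement learning optimizes when to continue, reflect, or explore.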
Eiso Kant, CTO of poolside AI, discusses the company's approach to building frontier AI foundation models, particularly focused on software development. Their unique strategy is reinforcement learning from code execution feedback which is an important axis for scaling AI capabilities beyond just increasing model size or data volume. Kant predicts human-level AI in knowledge work could be achieved within 18-36 months, outlining poolside's vision to dramatically increase software development productivity and accessibility. SPONSOR MESSAGES:***Tufa AI Labs is a brand new research lab in Zurich started by Benjamin Crouzier focussed on o-series style reasoning and AGI. They are hiring a Chief Engineer and ML engineers. Events in Zurich. Goto https://tufalabs.ai/***Eiso Kant:https://x.com/eisokanthttps://poolside.ai/TRANSCRIPT:https://www.dropbox.com/scl/fi/szepl6taqziyqie9wgmk9/poolside.pdf?rlkey=iqar7dcwshyrpeoz0xa76k422&dl=0TOC:1. Foundation Models and AI Strategy [00:00:00] 1.1 Foundation Models and Timeline Predictions for AI Development [00:02:55] 1.2 Poolside AI's Corporate History and Strategic Vision [00:06:48] 1.3 Foundation Models vs Enterprise Customization Trade-offs2. Reinforcement Learning and Model Economics [00:15:42] 2.1 Reinforcement Learning and Code Execution Feedback Approaches [00:22:06] 2.2 Model Economics and Experimental Optimization3. Enterprise AI Implementation [00:25:20] 3.1 Poolside's Enterprise Deployment Strategy and Infrastructure [00:26:00] 3.2 Enterprise-First Business Model and Market Focus [00:27:05] 3.3 Foundation Models and AGI Development Approach [00:29:24] 3.4 DeepSeek Case Study and Infrastructure Requirements4. LLM Architecture and Performance [00:30:15] 4.1 Distributed Training and Hardware Architecture Optimization [00:33:01] 4.2 Model Scaling Strategies and Chinchilla Optimality Trade-offs [00:36:04] 4.3 Emergent Reasoning and Model Architecture Comparisons [00:43:26] 4.4 Balancing Creativity and Determinism in AI Models [00:50:01] 4.5 AI-Assisted Software Development Evolution5. AI Systems Engineering and Scalability [00:58:31] 5.1 Enterprise AI Productivity and Implementation Challenges [00:58:40] 5.2 Low-Code Solutions and Enterprise Hiring Trends [01:01:25] 5.3 Distributed Systems and Engineering Complexity [01:01:50] 5.4 GenAI Architecture and Scalability Patterns [01:01:55] 5.5 Scaling Limitations and Architectural Patterns in AI Code Generation6. AI Safety and Future Capabilities [01:06:23] 6.1 Semantic Understanding and Language Model Reasoning Approaches [01:12:42] 6.2 Model Interpretability and Safety Considerations in AI Systems [01:16:27] 6.3 AI vs Human Capabilities in Software Development [01:33:45] 6.4 Enterprise Deployment and Security ArchitectureCORE REFS (see shownotes for URLs/more refs):[00:15:45] Research demonstrating how training on model-generated content leads to distribution collapse in AI models, Ilia Shumailov et al. (Key finding on synthetic data risk)[00:20:05] Foundational paper introducing Word2Vec for computing word vector representations, Tomas Mikolov et al. 
(Seminal NLP technique)[00:22:15] OpenAI O3 model's breakthrough performance on ARC Prize Challenge, OpenAI (Significant AI reasoning benchmark achievement)[00:22:40] Seminal paper proposing a formal definition of intelligence as skill-acquisition efficiency, François Chollet (Influential AI definition/philosophy)[00:30:30] Technical documentation of DeepSeek's V3 model architecture and capabilities, DeepSeek AI (Details on a major new model)[00:34:30] Foundational paper establishing optimal scaling laws for LLM training, Jordan Hoffmann et al. (Key paper on LLM scaling)[00:45:45] Seminal essay arguing that scaling computation consistently trumps human-engineered solutions in AI, Richard S. Sutton (Influential "Bitter Lesson" perspective)
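As a rough illustration of the reinforcement learning from code execution feedback idea described in this poolside episode, the sketch below scores a generated program by the fraction of unit tests it passes, which could then serve as the scalar reward for a policy update. The `solution` entry-point name and the toy tests are assumptions for the example; this is not poolside's actual pipeline.

```python
def execution_reward(candidate_src: str, tests: list[tuple[tuple, object]]) -> float:
    """Reward a candidate program by the fraction of unit tests it passes."""
    namespace: dict = {}
    try:
        exec(candidate_src, namespace)      # define the candidate function
        fn = namespace["solution"]          # assumed entry-point name
    except Exception:
        return 0.0                          # code that does not even run earns zero reward
    passed = 0
    for args, expected in tests:
        try:
            if fn(*args) == expected:
                passed += 1
        except Exception:
            pass                            # runtime errors count as failed tests
    return passed / len(tests)

candidate = "def solution(a, b):\n    return a + b\n"
tests = [((1, 2), 3), ((0, 0), 0), ((-1, 1), 0)]
print(execution_reward(candidate, tests))   # 1.0 -> full reward for this sample
```

In practice a system like this would sandbox execution and aggregate rewards over many sampled programs per task, but the core signal is the same: the environment is the code runner, and passing tests is the reward.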
Candace and Frank are joined today by an extraordinary guest, Geordie Rose, a trailblazer in the quantum computing space. Once the CEO and CTO of D-Wave, Geordie is not only a quantum computing pioneer but also delves into the intersection of AI and quantum technology. In this fascinating episode, we dive into Geordie's experiences and insights, from his innovative work at D-Wave to his current projects in AI—touching upon reinforcement learning, quantum annealing, and even the enigma of consciousness itself. And if that isn't enough to take in, he's on a unique vision quest, quite literally running across Canada! Join us as we unravel these mind-blowing topics and explore a future where quantum computing and AI might redefine the very essence of our existence. So, tune in and prepare for a riveting ride into the quantum realm! Quotable Moments 00:00 "Quantum Insights with Geordie Rose" 06:51 Quantum Supremacy: Major Milestone 13:29 Quantum Computing: Theoretical Uncertainty Remains 19:16 Optimizing Noise Reduction in D-Wave 22:47 Thermal Annealing: Metal Transformation Process 26:55 Optimizing Qubit State Selection 33:28 "Reinforcement Learning's Simple Complexity" 41:53 Mechanistic View of Consciousness Explained 48:03 "Assigning Rights to Advanced AI" 49:39 Embrace Humility for Our Future 54:56 "Convincing Parents of Computer Science" 59:22 "Future Uncertainty and Quantum Role"
How did no one notice these AI Agents?
This episode is sponsored by the DFINITY Foundation. DFINITY Foundation's mission is to develop and contribute technology that enables the Internet Computer (ICP) blockchain and its ecosystem, aiming to shift cloud computing into a fully decentralized state. Find out more at https://internetcomputer.org/ In this episode of Eye on AI, Yoav Shoham, co-founder of AI21 Labs, shares his insights on the evolution of AI, touching on key advancements such as Jamba and Maestro. From the early days of his career to the latest developments in AI systems, Yoav offers a comprehensive look into the future of artificial intelligence. Yoav opens up about his journey in AI, beginning with his academic roots in game theory and logic, followed by his entrepreneurial ventures that led to the creation of AI21 Labs. He explains the founding of AI21 Labs and the company's mission to combine traditional AI approaches with modern deep learning methods, leading to innovations like Jamba—a highly efficient hybrid AI model that's disrupting the traditional transformer architecture. He also introduces Maestro, AI21's orchestrator that works with multiple large language models (LLMs) and AI tools to create more reliable, predictable, and efficient systems for enterprises. Yoav discusses how Maestro is tackling real-world challenges in enterprise AI, moving beyond flashy demos to practical, scalable solutions. Throughout the conversation, Yoav emphasizes the limitations of current large language models (LLMs), even those with reasoning capabilities, and explains how AI systems, rather than just pure language models, are becoming the future of AI. He also delves into the philosophical side of AI, discussing whether models truly "understand" and what that means for the future of artificial intelligence. Whether you're deeply invested in AI research or curious about its applications in business, this episode is filled with valuable insights into the current and future landscape of artificial intelligence. Stay Updated: Craig Smith Twitter: https://twitter.com/craigss Eye on A.I. Twitter: https://twitter.com/EyeOn_AI (00:00) Introduction: The Future of AI Systems (02:33) Yoav's Journey: From Academia to AI21 Labs (05:57) The Evolution of AI: Symbolic AI and Deep Learning (07:38) Jurassic One: AI21 Labs' First Language Model (10:39) Jamba: Revolutionizing AI Model Architecture (16:11) Benchmarking AI Models: Challenges and Criticisms (22:18) Reinforcement Learning in AI Models (24:33) The Future of AI: Is Jamba the End of Larger Models? (27:31) Applications of Jamba: Real-World Use Cases in Enterprise (29:56) The Transition to Mass AI Deployment in Enterprises (33:47) Maestro: The Orchestrator of AI Tools and Language Models (36:03) GPT-4.5 and Reasoning Models: Are They the Future of AI? (38:09) Yoav's Pet Project: The Philosophical Side of AI Understanding (41:27) The Philosophy of AI Understanding (45:32) Explanations and Competence in AI (48:59) Where to Access Jamba and Maestro
➡️ Like The Podcast? Leave A Rating: https://ratethispodcast.com/successstory In this "Lessons" episode, Dr. Jud Brewer, Neuroscience of Addiction Expert, reveals the science behind habits and addictions, explaining how our brains form automatic behaviors to conserve energy and how reinforcement learning reinforces unhealthy patterns. By learning to recognize the true rewards of our actions, Dr. Brewer shows us how to transform negative routines into opportunities for healthier change. ➡️ Show Links: https://successstorypodcast.com YouTube: https://youtu.be/PpI2aFjA9FU Apple: https://podcasts.apple.com/us/podcast/dr-judson-brewer-neuroscientist-addiction-psychiatrist/id1484783544 Spotify: https://open.spotify.com/episode/531cPamqo4H0Esq6Yp8RQ3 ➡️ Watch the Podcast On Youtube: https://www.youtube.com/c/scottdclary
Parinaz Sobhani is one of those truly special people in the AI world: when you hear her professional story, you can't help but be impressed. She is currently Head of AI at Sagard, a large international investment firm with more than $25 billion in assets under management that is focused heavily on the future of AI-assisted investing. Parinaz holds a PhD in AI from the University of Ottawa and has built deep experience over the years in both academia and industry, especially in areas such as natural language processing and deep learning. Notably, she previously worked at Microsoft Research and the National Research Council of Canada on projects such as machine translation and deep learning models for text processing. 00:00 Introduction 08:01 Getting into machine learning and the AI master's program at Sharif 11:27 Deep learning and the brain's neural networks 15:32 Differences between traditional computational approaches and deep learning in AI 23:03 The evolution of terminology: from machine learning to AI and data science 25:44 Other branches of AI you should know 31:43 Reinforcement Learning: challenges and simulation methods 47:23 The AI black box? The limits of human understanding of AI's complexity 57:32 Why she moved into venture capital 1:09:05 Entrepreneurial opportunities in AI and solving existing problems 1:16:44 Explaining AI's potential to non-experts Parinaz Sobhani is a distinguished figure in artificial intelligence, currently serving as the Head of AI at Sagard, a global alternative asset management firm with over $25 billion in assets under management. With a Ph.D. in AI from the University of Ottawa, specializing in natural language processing, she has amassed over 15 years of experience in both academic and industry settings. This episode's sponsors: Saadat Rent - luxury car rental in Dubai with no upfront deposit and full insurance, quick and easy. https://www.saadatrent.com?ref_id=Tabaghe16 Limoo Host - web hosting services provider https://limoo.host More information about the Tabaghe 16 podcast and links to the audio episodes: https://linktr.ee/tabaghe16 Hosted on Acast. See acast.com/privacy for more information.
Editor's Summary by JAMA Deputy Editors Linda Brubaker, MD, and Preeti Malani, MD, MSJ, for articles published from March 15-21, 2025.
In this episode, Brandon Cui, Research Scientist at MosaicML and Databricks, dives into cutting-edge advancements in AI model optimization, focusing on Reward Models and Reinforcement Learning from Human Feedback (RLHF). Highlights include: - How synthetic data and RLHF enable fine-tuning models to generate preferred outcomes. - Techniques like Proximal Policy Optimization (PPO) and Direct Preference Optimization (DPO) for enhancing response quality. - The role of reward models in improving coding, math, reasoning, and other NLP tasks. Connect with Brandon Cui: https://www.linkedin.com/in/bcui19/
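Since the episode contrasts PPO-style RLHF with DPO, here is a minimal NumPy sketch of the DPO objective for a single preference pair, computed from summed log-probabilities of each full response under the trained policy and a frozen reference model. The beta value and toy numbers are illustrative assumptions, not values from the episode.

```python
import numpy as np

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Direct Preference Optimization loss for one (chosen, rejected) response pair."""
    margin = beta * ((logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected))
    return -np.log(1.0 / (1.0 + np.exp(-margin)))   # -log(sigmoid(margin))

# Toy numbers: the policy prefers the chosen response slightly more than the
# reference model does, so the loss dips below log(2) ~= 0.693.
print(dpo_loss(-12.0, -15.0, -12.5, -14.8, beta=0.1))
```

The appeal of DPO is visible in the sketch: preference data updates the policy directly through this loss, without training a separate reward model or running an RL loop as PPO-based RLHF does.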
Intro topic: Grills
News/Links:
You can't call yourself a senior until you've worked on a legacy project https://www.infobip.com/developers/blog/seniors-working-on-a-legacy-project
Recraft might be the most powerful AI image platform I've ever used — here's why https://www.tomsguide.com/ai/ai-image-video/recraft-might-be-the-most-powerful-ai-image-platform-ive-ever-used-heres-why
NASA has a list of 10 rules for software development https://www.cs.otago.ac.nz/cosc345/resources/nasa-10-rules.htm
AMD Radeon RX 9070 XT performance estimates leaked: 42% to 66% faster than Radeon RX 7900 GRE https://www.tomshardware.com/tech-industry/amd-estimates-of-radeon-rx-9070-xt-performance-leaked-42-percent-66-percent-faster-than-radeon-rx-7900-gre
Book of the Show
Patrick: The Player of Games (Iain M. Banks) https://a.co/d/1ZpUhGl (non-affiliate)
Jason: Basic Roleplaying Universal Game Engine https://amzn.to/3ES4p5i
Patreon Plug https://www.patreon.com/programmingthrowdown?ty=h
Tool of the Show
Patrick: Pokemon Sword and Shield
Jason: Features and Labels ( https://fal.ai )
Topic: Reinforcement Learning
Three types of AI: Supervised Learning, Unsupervised Learning, Reinforcement Learning
Online vs Offline RL
Optimization algorithms: Value optimization (SARSA, Q-Learning) and Policy optimization (Policy Gradients, Actor-Critic, Proximal Policy Optimization)
Value vs Policy Optimization: Value optimization is more intuitive (value loss); policy optimization is less intuitive at first (policy gradients); converting values to policies in deep learning is difficult
Imitation Learning: Supervised policy learning, often used to bootstrap reinforcement learning
Policy Evaluation: Propensity scoring versus model-based
Challenges to training RL models: Two optimization loops (collecting feedback vs updating the model), difficult optimization target, policy evaluation
RLHF & GRPO ★ Support this podcast on Patreon ★
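Since the episode walks through value optimization and Q-Learning, here is a self-contained toy sketch of tabular Q-learning on a five-state corridor. The environment, reward, and hyperparameters are invented for illustration and are not from the show.

```python
import random

# Toy 1-D corridor: states 0..4, reward 1 for reaching state 4, actions 0=left, 1=right.
N_STATES, GOAL = 5, 4
alpha, gamma, epsilon = 0.5, 0.9, 0.2
Q = [[0.0, 0.0] for _ in range(N_STATES)]

def step(state, action):
    nxt = max(0, min(GOAL, state + (1 if action == 1 else -1)))
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

def choose(state):
    # Epsilon-greedy with random tie-breaking so early episodes still explore.
    if random.random() < epsilon or Q[state][0] == Q[state][1]:
        return random.randint(0, 1)
    return 0 if Q[state][0] > Q[state][1] else 1

for _ in range(300):                                   # training episodes
    state, done = 0, False
    for _ in range(100):                               # step cap per episode
        action = choose(state)
        nxt, reward, done = step(state, action)
        # Q-learning update: bootstrap from the best action in the next state.
        Q[state][action] += alpha * (reward + gamma * max(Q[nxt]) - Q[state][action])
        state = nxt
        if done:
            break

print([round(max(q), 2) for q in Q])                   # state values rise toward the goal
```

This is the value-optimization side of the episode's taxonomy; policy-gradient methods like PPO instead parameterize the policy directly and adjust it along the gradient of expected reward.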
Our 202nd episode with a summary and discussion of last week's big AI news! Recorded on 03/07/2025 Hosted by Andrey Kurenkov and Jeremie Harris. Feel free to email us your questions and feedback at contact@lastweekinai.com and/or hello@gladstone.ai Read out our text newsletter and comment on the podcast at https://lastweekin.ai/. Join our Discord here! https://discord.gg/nTyezGSKwP In this episode: Alibaba released QwQ-32B, their latest reasoning model, on par with leading models like DeepSeek's R1. Anthropic raised $3.5 billion in a funding round, valuing the company at $61.5 billion, solidifying its position as a key competitor to OpenAI. DeepMind introduced BigBench Extra Hard, a more challenging benchmark to evaluate the reasoning capabilities of large language models. Reinforcement Learning pioneers Andrew Barto and Rich Sutton were awarded the prestigious Turing Award for their contributions to the field. Timestamps + Links: (00:00:00) Intro / Banter (00:01:41) Episode Preview (00:02:50) GPT-4.5 Discussion (00:14:13) Alibaba's New QwQ 32B Model is as Good as DeepSeek-R1; Outperforms OpenAI's o1-mini (00:21:29) With Alexa Plus, Amazon finally reinvents its best product (00:26:08) Another DeepSeek moment? General AI agent Manus shows ability to handle complex tasks (00:29:14) Microsoft's new Dragon Copilot is an AI assistant for healthcare (00:32:24) Mistral's new OCR API turns any PDF document into an AI-ready Markdown file (00:33:19) A.I. Start-Up Anthropic Closes Deal That Values It at $61.5 Billion (00:35:49) Nvidia-Backed CoreWeave Files for IPO, Shows Growing Revenue (00:38:05) Waymo and Uber's Austin robotaxi expansion begins today (00:38:54) UK competition watchdog drops Microsoft-OpenAI probe (00:41:17) Scale AI announces multimillion-dollar defense deal, a major step in U.S. military automation (00:44:43) DeepSeek Open Source Week: A Complete Summary (00:45:25) DeepSeek AI Releases DualPipe: A Bidirectional Pipeline Parallelism Algorithm for Computation-Communication Overlap in V3/R1 Training (00:53:00) Physical Intelligence open-sources Pi0 robotics foundation model (00:54:23) BIG-Bench Extra Hard (00:56:10) Cognitive Behaviors that Enable Self-Improving Reasoners (01:01:49) The MASK Benchmark: Disentangling Honesty From Accuracy in AI Systems (01:05:32) Pioneers of Reinforcement Learning Win the Turing Award (01:06:56) OpenAI launches $50M grant program to help fund academic research (01:07:25) The Nuclear-Level Risk of Superintelligent AI (01:13:34) METR's GPT-4.5 pre-deployment evaluations (01:17:16) Chinese buyers are getting Nvidia Blackwell chips despite US export controls
Posters and Hallway episodes are short interviews and poster summaries. Recorded at NeurIPS 2024 in Vancouver BC Canada. Featuring Claire Bizon Monroc from Inria: WFCRL: A Multi-Agent Reinforcement Learning Benchmark for Wind Farm Control Andrew Wagenmaker from UC Berkeley: Overcoming the Sim-to-Real Gap: Leveraging Simulation to Learn to Explore for Real-World RL Harley Wiltzer from MILA: Foundations of Multivariate Distributional Reinforcement Learning Vinzenz Thoma from ETH AI Center: Contextual Bilevel Reinforcement Learning for Incentive Alignment Haozhe (Tony) Chen & Ang (Leon) Li from Columbia: QGym: Scalable Simulation and Benchmarking of Queuing Network Controllers
Posters and Hallway episodes are short interviews and poster summaries. Recorded at NeurIPS 2024 in Vancouver BC Canada. Featuring Jonathan Cook from University of Oxford: Artificial Generational Intelligence: Cultural Accumulation in Reinforcement Learning Yifei Zhou from Berkeley AI Research: DigiRL: Training In-The-Wild Device-Control Agents with Autonomous Reinforcement Learning Rory Young from University of Glasgow: Enhancing Robustness in Deep Reinforcement Learning: A Lyapunov Exponent Approach Glen Berseth from MILA: Improving Deep Reinforcement Learning by Reducing the Chain Effect of Value and Policy Churn Alexander Rutherford from University of Oxford: JaxMARL: Multi-Agent RL Environments and Algorithms in JAX
On this episode of Crazy Wisdom, host Stewart Alsop speaks with Ivan Vendrov for a deep and thought-provoking conversation covering AI, intelligence, societal shifts, and the future of human-machine interaction. They explore the "bitter lesson" of AI—that scale and compute ultimately win—while discussing whether progress is stalling and what bottlenecks remain. The conversation expands into technology's impact on democracy, the centralization of power, the shifting role of the state, and even the mythology needed to make sense of our accelerating world. You can find more of Ivan's work at nothinghuman.substack.com or follow him on Twitter at @IvanVendrov.Check out this GPT we trained on the conversation!Timestamps00:00 Introduction and Setting00:21 The Bitter Lesson in AI02:03 Challenges in AI Data and Infrastructure04:03 The Role of User Experience in AI Adoption08:47 Evaluating Intelligence and Divergent Thinking10:09 The Future of AI and Society18:01 The Role of Big Tech in AI Development24:59 Humanism and the Future of Intelligence29:27 Exploring Kafka and Tolkien's Relevance29:50 Tolkien's Insights on Machine Intelligence30:06 Samuel Butler and Machine Sovereignty31:03 Historical Fascism and Machine Intelligence31:44 The Future of AI and Biotech32:56 Voice as the Ultimate Human-Computer Interface36:39 Social Interfaces and Language Models39:53 Javier Malay and Political Shifts in Argentina50:16 The State of Society in the U.S.52:10 Concluding Thoughts on Future ProspectsKey InsightsThe Bitter Lesson Still Holds, but AI Faces Bottlenecks – Ivan Vendrov reinforces Rich Sutton's "bitter lesson" that AI progress is primarily driven by scaling compute and data rather than human-designed structures. While this principle still applies, AI progress has slowed due to bottlenecks in high-quality language data and GPU availability. This suggests that while AI remains on an exponential trajectory, the next major leaps may come from new forms of data, such as video and images, or advancements in hardware infrastructure.The Future of AI Is Centralization and Fragmentation at the Same Time – The conversation highlights how AI development is pulling in two opposing directions. On one hand, large-scale AI models require immense computational resources and vast amounts of data, leading to greater centralization in the hands of Big Tech and governments. On the other hand, open-source AI, encryption, and decentralized computing are creating new opportunities for individuals and small communities to harness AI for their own purposes. The long-term outcome is likely to be a complex blend of both centralized and decentralized AI ecosystems.User Interfaces Are a Major Limiting Factor for AI Adoption – Despite the power of AI models like GPT-4, their real-world impact is constrained by poor user experience and integration. Vendrov suggests that AI has created a "UX overhang," where the intelligence exists but is not yet effectively integrated into daily workflows. Historically, technological revolutions take time to diffuse, as seen with the dot-com boom, and the current AI moment may be similar—where the intelligence exists but society has yet to adapt to using it effectively.Machine Intelligence Will Radically Reshape Cities and Social Structures – Vendrov speculates that the future will see the rise of highly concentrated AI-powered hubs—akin to "mile by mile by mile" cubes of data centers—where the majority of economic activity and decision-making takes place. 
This could create a stark divide between AI-driven cities and rural or off-grid communities that choose to opt out. He draws a parallel to Robin Hanson's Age of Em and suggests that those who best serve AI systems will hold power, while others may be marginalized or reduced to mere spectators in an AI-driven world.The Enlightenment's Individualism Is Being Challenged by AI and Collective Intelligence – The discussion touches on how Western civilization's emphasis on the individual may no longer align with the realities of intelligence and decision-making in an AI-driven era. Vendrov argues that intelligence is inherently collective—what matters is not individual brilliance but the ability to recognize and leverage diverse perspectives. This contradicts the traditional idea of intelligence as a singular, personal trait and suggests a need for new frameworks that incorporate AI into human networks in more effective ways.Javier Milei's Libertarian Populism Reflects a Global Trend Toward Radical Experimentation – The rise of Argentina's President Javier Milei exemplifies how economic desperation can drive societies toward bold, unconventional leaders. Vendrov and Alsop discuss how Milei's appeal comes not just from his radical libertarianism but also from his blunt honesty and willingness to challenge entrenched power structures. His movement, however, raises deeper questions about whether libertarianism alone can provide a stable social foundation, or if voluntary cooperation and civil society must be explicitly cultivated to prevent libertarian ideals from collapsing into chaos.AI, Mythology, and the Need for New Narratives – The conversation closes with a reflection on the power of mythology in shaping human understanding of technological change. Vendrov suggests that as AI reshapes the world, new myths will be needed to make sense of it—perhaps similar to Tolkien's elves fading as the age of men begins. He sees AI as part of an inevitable progression, where human intelligence gives way to something greater, but argues that this transition must be handled with care. The stories we tell about AI will shape whether we resist, collaborate, or simply fade into irrelevance in the face of machine intelligence.