adjustment of parameters to fit data in theoretical physics
Read this Question of the Week Here: https://www.reasonablefaith.org/writings/question-answer/formulating-the-argument-from-fine-tuning
This week, we talk convention selling (3:30), comic show logistics (17:00), figuring out alternating character voice (33:20), getting books from the printer (40:00), and fine-tuning colors (47:50).
Show Notes
We started with Episode 1 in Feb 2018; today we celebrate our 100th episode with friend John Nelson, who is now the main host at Unbelievable?. We discussed John's PhD thesis book on what Jesus looked like, his work at Unbelievable?, his tour with Alex O'Connor and his blog “Behind the Gospels”. We then dove into a specific issue on the background to the gospels – the transmission of the traditions about Jesus's teaching and actions from his ministry up to the gospel documents.
And NEWS – Ed's discussion on evolution with John & Denis Alexander, which we raked over in episode 98, has been released by Unbelievable?. See links.
Links:
Unbelievable? show with Ed on evolution: “Can a Loving God Use Evolution?” Denis Alexander vs Ed Atkinson, hosted by John Nelson, 4 June 2026
John's book: “Jesus' Physical Appearance: Biography, Christology, Philosophy” https://www.amazon.co.uk/Jesus-Physical-Appearance-Christology-Philosophy/dp/0567723208
John's “Behind the Gospels” blog: https://www.behindthegospels.com/
The book: “Battle of the Big Bang: The New Tales of Our Cosmic Origins” by Niayesh Afshordi and Phil Halper: https://www.amazon.co.uk/Battle-Big-Bang-Cosmic-Origins-ebook/dp/B0DKBHH3YN?ref_=ast_author_mpb
More recent Unbelievable? shows:
Richmond Wandera – “My Father was murdered… a sponsor saved my life” Richmond Wandera interviewed by John Nelson, 18 March 2026
“What would it take for Alex O'Connor to believe Jesus rose from the dead?” Trent Horn v Alex O'Connor, 30 April 2026
Hiddenness: “Why Doesn't God Show Himself?” Dan Paterson vs Joe Schmid, hosted by John Nelson, 12 February 2026
Uncommon Ground podcast, Episode 7, hosted by Justin Brierley: “Stephen Meyer & Phil Halper: The Big Bang and Fine Tuning. Does the science of the Universe point to God?”
Bart D. Ehrman's latest book: “Love Thy Stranger: How Jesus Transformed Our Moral Conscience”, April 2026 https://www.amazon.co.uk/Love-Thy-Stranger-Transformed-Conscience-ebook/dp/B0FKMJZJ1V/ref=monarch_sidesheet_title
Doubts Aloud Links:
Please keep giving feedback and ask questions using: doubtsaloud@gmail.com
This week was pretty exciting: Microsoft unveiled its Frontier Fine Tuning along with a new hardware stack and developer tools, while NVIDIA launched its foray into PC-powered AI. Two big themes here: the first is reducing computing cost as data centers start driving up all our AI costs, and the second is making AI ever more personal for you and your company. You'll also see that we've optimized Galileo into the Microsoft Copilot and you can get early access below, with GA coming later this summer. Even if you're not an AI or PC geek, this information is important because the way you focus your attention on AI has to change. We launch HR 2030 and the Josh Bersin Institute next week, stay tuned!
Additional Information
AI Prices Are Going Up, Up, Up – And What This Means For Enterprise AI
Satya Nadella Keynote at Build (go to 1:45 for Frontier Fine Tuning announcement)
Jensen Huang DTC Keynote in Taiwan
More on Microsoft Frontier Fine Tuning for Copilot
Chapters
(00:00:00) - AI Token Maxing and the High Cost of AI
(00:05:03) - Microsoft's Edge computing and fine-tuning the
(00:09:11) - How Nvidia Went From Graphics to AI
With shifting dynamics in US equity positioning, is it time to adjust your portfolio? Daniel Lam explains why fine-tuning tech sector exposure makes sense right now. Listen now to discover key allocation insights.
Speaker:
- Daniel Lam, Head, Cross-asset Derivative Strategy, Standard Chartered Bank
For the latest market insights, visit our on-the-go Market Views or subscribe to Standard Chartered Wealth Insights on YouTube.
In this episode of the Crazy Wisdom Podcast, host Stewart Alsop sits down with returning guest Ekue Kpodar for their third conversation together, covering a wide range of topics at the intersection of technology, geopolitics, and the evolving information age. They dig into Ekue's unconventional setup of running local AI models across roughly 15 computers, the growing case for open source models over closed ones from companies like OpenAI and Anthropic, and how Chinese open source models may be positioned to outcompete Western alternatives on a global scale. The conversation also touches on vibe coding and the democratization of software development, the strategic use of small models for IoT and enterprise applications, the role of Israel and China as dominant players in the information age, and how smaller nations and even individuals may wield outsized power as AI continues to collapse the cost of knowledge work. You can find Ekue Kpodar on X @ekpodar and LinkedIn.
Timestamps
00:00 Stewart welcomes Ekue for their third episode, diving into vibe coding and AI-driven development changes.
05:00 Ekue explains using Claude on Chrome to auto-reply on Skool, burning tokens through screenshots, and Playwright as a more efficient alternative.
10:00 Stewart describes his Claude-dependent planning and coding agent system breaking after a model update, prompting him to build his own chatbot.
15:00 Small models discussed as critical for IoT, defense, and privacy-focused enterprises building internal APIs instead of routing traffic to OpenAI.
20:00 Open source versus closed source debated, with Chinese models gaining global traction while US foundational labs remain expensive and restrictive.
25:00 SaaS apocalypse explored as AI commoditizes knowledge work, with Linux and Terraform cited as proof open source still generates wealth.
30:00 OpenAI's sci-fi terminator fears explained as the reason they stayed closed source, ultimately handing China a strategic open source advantage.
35:00 China's economic dumping strategy applied to AI, potentially displacing US model dominance globally the same way manufacturing was disrupted.
40:00 Israel's signals intelligence dominance discussed alongside asymmetric warfare, drones defeating tanks, and information control replacing military muscle.
45:00 Global information age rankings debated, Israel leading, US and China tied, France and Poland emerging as sovereign tech players.
50:00 Qatar, NVIDIA, and Iran cited as proof that rare resources and technology matter more than population size in the 21st century power landscape.
Key Insights
1. Running local AI models on a network of affordable computers can be more cost-effective than relying entirely on third-party APIs. By using compressed or smaller open source models locally, developers can handle repetitive or lower-stakes tasks without burning through expensive tokens from providers like Anthropic or OpenAI.
2. Small AI models are becoming increasingly important for IoT, defense applications, and companies that do not want to send sensitive data to external providers. Organizations can download open source models, run them on internal servers, and build proprietary APIs around them, creating something like an intranet of specialized small models.
3. The value created by AI tools is being redistributed away from traditional SaaS companies toward foundational model providers and individual builders. People are canceling subscriptions to software they once paid hundreds per month for, because AI now allows a single person to build comparable tools themselves.
4. Open source technology does not eliminate the ability to profit. Linux and Terraform are both open source yet made their creators wealthy. People will still pay for installation, setup, troubleshooting, and customization even when the underlying software is free.
5. China is applying its longstanding manufacturing dumping strategy to artificial intelligence by releasing cheap open source models globally, which threatens to erode US dominance in AI the same way Chinese manufacturing undercut other countries for decades.
6. In the information age, the size of a country or institution matters far less than its access to rare resources or advanced technology. Qatar, Israel, and NVIDIA each demonstrate that small populations or headcounts can wield enormous global negotiating power through concentrated technological or resource advantages.
7. Asymmetric warfare is redefining military power, with inexpensive drones defeating tanks that cost millions to build. This shifts the advantage toward nations that excel at signals intelligence and information management rather than those with the largest conventional military forces.
Priyamvada Natarajan is the Joseph S. and Sophia S. Fruton Professor of Astronomy and Professor of Physics at Yale University, where she is also the Chair of Astronomy. Priya researches broadly across astrophysics and cosmology; some topics she has worked on include gravitational lensing, black hole physics, the philosophy of science, and dark matter. In this conversation, Priya and Robinson largely stick to the latter. They discuss her interest in cosmology writ large, as well as how the scientific community tackles the unknown. Priya's most recent book is Mapping the Heavens: The Radical Scientific Ideas that Reveal the Cosmos (Yale, 2016).
Mapping the Heavens: https://a.co/d/02HPcMB1
OUTLINE
00:00 A Paradox of Cosmology
06:16 Investigating Invisibilia
11:25 The Sociology of Astrophysics
16:52 Phenomenology in Physics
19:47 What Is the Mystery of Dark Matter?
29:07 The Problem of Dark Energy
36:38 Models and Simulations
46:17 Modifying the Standard Model to Explain Dark Matter
58:20 The Crisis in Dark Matter
01:12:22 Alternative Explanations of Dark Matter
01:19:51 Fine-Tuning and the Multiverse
01:25:24 Black Holes
Robinson Erhardt researches symbolic logic and the foundations of mathematics at Stanford University, where he is also a JD candidate in the Law School.
Stay informed on current events, visit www.NaturalNews.com - Power Over Personal Circumstances (0:12) - Financial Control and Knowledge (6:40) - Rejecting Conventional Wisdom (12:39) - Financial Independence and Self-Custody (19:13) - Social Engineering and Conformity (26:16) - Technological Innovation and Chinese Dominance (33:00) - Simulation Theory and the Nature of Reality (39:24) - The Role of Consciousness in the Simulation (44:56) - The Quest for Personal Growth (50:18) - The Role of Technology in Personal Empowerment (55:51) - Virtual Reality and Historical Context (1:01:02) - Philip K. Dick and the Simulation Hypothesis (1:06:04) - Timeline Pirates and Multiverse Interpretations (1:11:06) - Fine-Tuning and Digital Physics (1:16:22) - Near-Death Experiences and Simulation Theory (1:21:31) - Retro Causality and Quantum Computing (1:26:37) - Simulation Theory and Faith (1:31:51) - Practical Takeaways and Personal Reflections (1:37:00) - Prompt Theory and AI Advancements (1:42:30) - Final Thoughts and Future Plans (1:48:46) Watch more independent videos at http://www.brighteon.com/channel/hrreport ▶️ Support our mission by shopping at the Health Ranger Store - https://www.healthrangerstore.com ▶️ Check out exclusive deals and special offers at https://rangerdeals.com ▶️ Sign up for our newsletter to stay informed: https://www.naturalnews.com/Readerregistration.html Watch more exclusive videos here:
Stewart Alsop sat down with Michael Shackelford to discuss their experiences building applications through vibe coding, the practice of using AI to create software without traditional programming expertise. Stewart, who runs the AI Whispers community in Buenos Aires and hosts the Crazy Wisdom podcast (with over 660 interviews), shared how he went from teaching people prompt engineering to building his own video conferencing software as a Riverside.fm replacement, while Michael opened up about his year-long journey creating Genrupt Inc, an AI-powered content generation tool for e-commerce sellers. The conversation covered everything from the decline in quality of Claude's reasoning capabilities and how Chinese companies used distillation attacks to copy Anthropic's models, to the importance of spaced repetition systems for managing knowledge in the age of LLMs, with both sharing battle-tested prompting strategies like asking AI to "explain it to me in genius terms" and using deep research queries to reverse engineer how competitors build their products.
Show Notes:
- Dan Martell's book "Buy Back Your Time" was mentioned as one of the best business books for thinking about life and business
- Check out John Vervaeke's "Awakening from the Meaning Crisis" for understanding relevance realization and why AI fundamentally cannot determine what's relevant to humans without being told
Timestamps
00:00 Michael discusses being exhausted from getting his app ready for launch, working nonstop with AI to prepare landing page for podcast traffic driving beta signups
05:00 Stewart explains starting AI Whispers in Buenos Aires after leaving OpenAI vendor company, meeting early adopters like Torin who was building mind-reading EEG technology
10:00 Discussion of how corporations resist AI adoption due to political games and job security fears while some companies use AI as excuse for pandemic-era layoffs
15:00 Stewart describes teaching workshops on using LLMs as linguistic tools rather than coding tools, noting technical people often lack humanities background needed for prompting
20:00 Explaining chatbot wrappers, API calls, and how Anthropic's reasoning quality declined after Chinese distillation attacks copied their secret sauce developed with philosophers
25:00 Technical discussion of model training, fine-tuning versus RAG for new information, and different approaches to updating AI knowledge beyond initial training
30:00 Stewart describes building podcast recording software to replace expensive Riverside, struggling with syncing audio and video files across different computer clocks
35:00 Discussion of critical factors in vibe coding, discovering unknown technical requirements, and how AIs don't automatically reveal missing information
40:00 Stewart's reverse engineering process using deep research function to study competitors' hiring and technology stacks, separating planning agents from coding agents
45:00 Prompting techniques including "explain like I know everything" and using spaced repetition systems to capture valuable prompts and technical knowledge
50:00 Michael explains his Generux app for generating ecommerce content using Amazon review data analysis to inform high-converting listing images and videos
55:00 Discussion of founder mentality involving self-delusion about project timelines, Michael working nine-plus hours daily for nine months on app development
60:00 Comparing Amazon's expert software to prosumer software approach, discussing distribution challenges and future robotics applications for customized products
65:00 Stewart demonstrates spaced repetition app for memory improvement and knowledge retention, explaining relevance realization problem that AI agents cannot solve without embodiment
Key Insights
1. Stewart Alsop started AI Whisperers in Buenos Aires after leaving his role at Invisible Technologies, which was OpenAI's largest vendor for RLHF work. He noticed that machine learning engineers at tech companies lacked the humanities background needed to properly interact with large language models, which are fundamentally linguistic tools. This led him to create weekly workshops teaching non-technical people how to use AI effectively, running events every Thursday for two years straight. The group attracted intense geeks from the start and eventually led to Stewart speaking right after Vitalik Buterin at DevConnect, marking a significant milestone for the community.
2. Large corporations are resistant to AI adoption due to multiple factors including political dynamics within organizations and employees fearing job loss. Many companies that grew during the pandemic are now using AI as an excuse to downsize when the real issue is inefficiency from rapid expansion. Stewart observed that even technical people in machine learning often don't understand how to properly use AI tools because they lack linguistic and humanities training. The fundamental problem is educational, requiring companies to train people how to use these new tools while those same people resist learning them.
3. Vibe coding has evolved significantly with Claude Code being a game changer that reduced the technical barrier to entry. Before Claude Code, developers needed substantial technical knowledge to work through constant doom loops and debugging cycles. The success of coding AI tools stems from thirty years of testing infrastructure that provides clear yes or no feedback on whether code works. This infrastructure doesn't exist in the same way for manufacturing, science, and other fields, which is why software became the dominant area for AI assistance initially.
4. Claude's quality degradation over recent months resulted from multiple factors including distillation attacks by Chinese companies who reverse engineered Anthropic's reasoning capabilities. Anthropic had hired philosophers, sociologists, and psychologists to develop exceptional reasoning in Claude 4.5, but this was expensive to run. When Chinese models like Kimi copied these capabilities at one tenth the cost, and when mainstream users flooded the platform before Anthropic's planned IPO, the company had to reduce quality to manage computational costs. This represents a significant loss for power users who relied on Claude's superior reasoning abilities.
5. Stewart built a podcast recording application to replace Riverside because he needed API access to automate workflows, which Riverside wanted one thousand dollars monthly to provide. The technical challenge involves syncing audio and video from local recordings on multiple computers with different clocks through a server, then merging them so voices match lip movements. This problem requires understanding complex timing issues across different network conditions and file formats. Stewart has been working through AI psychosis for months on this FFMPEG pipeline problem, illustrating how vibe coding still requires building intuition about technical problems even without traditional coding knowledge.
6. The transition from expert software to prosumer software represents a major opportunity for AI-enabled tools. Expert software like Photoshop, Blender, and terminal interfaces have extreme complexity that intimidates beginners, but AI is making these capabilities accessible through natural language. The reign of specialists is ending as generalists with broad knowledge and curiosity can now build complete applications by leveraging AI to fill technical gaps. This shift particularly benefits entrepreneurs and founders who specialize in getting into difficult situations and figuring them out, even when they originally thought tasks would be easier than they turned out to be.
7. Building applications with AI requires accepting massive time investments beyond initial estimates and developing strategies for overcoming knowledge gaps. Michael estimated his ecommerce content generation app would take months but spent nearly a year working over nine hours daily, while Stewart spent months solving audio-video sync issues. Success requires using tools like deep research to understand how competitors solve problems, maintaining separate planning and coding agents, and learning to ask the right questions. The key insight is that vibe coders can achieve ninety percent of functionality independently, but the final ten percent often requires understanding specific technical concepts that AI cannot intuit without proper context and domain knowledge.
In this edition of the Predictive AI Quarterly, Till and Amit give an overview of the most important developments of the last quarter in predictive AI. Topics include Hyper-Agents from Meta, practical challenges in using coding agents, and new foundation models for tabular data such as TabImpute and TabICL v2. In the hands-on part, the two share their experience from an experiment on car price prediction in which GPT-4o with images and free text competes against TabPFN. The focus is on the added value of unstructured data, questions of generalizability, and the trade-off between explainability and predictive performance.
**Summary**
Hyper-Agents from Meta: self-evaluating agents with potential for faster progress, but also risks from missing control and amplified biases
Practical use of coding agents: subscriptions, sandboxing, audit logs, and exclusion of critical artifacts as prerequisites
Experience with the GitHub Cloud Agent, especially when reworking existing code
TabImpute as a new foundation model for imputation based on TabPFN, including its own benchmark
TabICL v2 as an openly licensed alternative to TabPFN with faster inference
Hands-on experiment on car price prediction: GPT-4o with images achieves the best results, clearly ahead of TabPFN
Generalizability confirmed by 30-fold cross-validation with a score feature generated from images
Trade-off between explainability (feature generation) and predictive performance (fine-tuning) as the central finding
**Links**
HyperAgents (Zhang et al., 2026): paper at https://arxiv.org/abs/2603.19461, code at https://github.com/facebookresearch/Hyperagents
Feitelberg, J., Saha, D., Choi, K., Ahmad, Z., Agarwal, A. & Dwivedi, R.: TabImpute: Universal Zero-Shot Imputation for Tabular Data. https://arxiv.org/pdf/2510.02625
TabICL GitHub repo: https://github.com/soda-inria/tabicl
OpenAI Developers: Vision fine-tuning https://developers.openai.com/api/docs/guides/vision-fine-tuning
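For readers unfamiliar with the 30-fold cross-validation mentioned in the summary, here is a minimal sketch of how such a split can be built. It is an illustration only, not the hosts' code: the function name `k_fold_indices`, the sample count of 300, and the seed are all hypothetical choices.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation.

    Hypothetical helper for illustration; every sample lands in the
    test set of exactly one fold.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds, round-robin.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test


# Assumed toy setup: 300 car listings, 30 folds of 10 listings each.
splits = list(k_fold_indices(n_samples=300, k=30))
```

A model would be fitted on each `train` list and scored on the matching `test` list; averaging the 30 scores gives the kind of generalization estimate the episode refers to.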
If a friend, family member, or colleague lodges an objection to the fine-tuning argument for intelligent design, are you ready to respond? On this installment of ID The Future, host Andrew McDiarmid concludes his two-part conversation with philosopher and intelligent design scholar Peter S. Williams. Williams reviews the most common objections to the fine-tuning arguments for intelligent design and explains why each proposal falls short scientifically, logically, and philosophically. Who knew there were over 20 objections to fine-tuning? Even host McDiarmid admits he didn't know about all of them! The more well-versed you are in responding to objections, the better you'll be able to stand your ground and offer substantive arguments when you hear them pop up. In Part 1, Williams and McDiarmid reviewed two groups of objections: the "fine-tuning isn't real" set and the "fine-tuning is real but no big deal" group. Today, Williams unpacks several objections related to the multiverse and shows why each one fails to adequately explain the fine-tuning evidence. This is Part 2 of a two-part conversation.
By now, you may be familiar with the fine-tuning argument for intelligent design. Scientists have discovered a whole suite of parameters and initial conditions that appear to be exquisitely tuned to allow for complex life to exist, and the argument is that intelligent design better explains that evidence than chance or necessity. But you may not know the most common objections to the fine-tuning argument, or how to respond to them. On this ID The Future, host Andrew McDiarmid welcomes philosopher and intelligent design scholar Peter S. Williams to the show to equip us to answer the most common objections to the fine-tuning argument. Objections to fine-tuning typically fall into three categories: the "fine-tuning isn't real" bunch, the "fine-tuning is no big deal" group, and objections that posit a type of multiverse proposal. Over two episodes, Peter teaches us how to respond to almost 20 objections! So buckle up! This is Part 1 of a two-part conversation!
Aliens, UFOs, UAPs, secret pastor meetings, and the Bible. Would the discovery of extraterrestrial life cause Christians to reject Scripture? Pastor Cole Phillips and Pastor Bobby Fraumann talk about recent UAP disclosures, why Christians should avoid panic and conspiracy-driven theology, how the Bible speaks about creation and spiritual beings, and why Jesus remains supreme over all creation.
Christian faith isn't fragile. Unidentified does not automatically mean alien. And no discovery can dethrone Christ.
Keywords
Aliens, UFOs, UAPs, unidentified anomalous phenomena, Christian faith, Bible and aliens, science and faith, creation, intelligent design
Chapter Titles
00:00 | Welcome to the Connect Podcast. Cole introduces the episode and explains why Christians should not run from hard questions.
03:40 | Ancient Aliens and The Twilight Zone. Bobby brings up Ancient Aliens, and Cole shares his favorite Twilight Zone episode, “To Serve Man.”
06:30 | UFOs, UAPs, and Recent Government Files. The conversation turns to the May 8, 2026 UAP document release and why Christians should be careful with the difference between “unidentified” and “alien.”
09:45 | Secret Pastor Meetings and Wild Claims. Cole and Bobby discuss recent claims about private meetings, alleged government briefings, reptilian beings, end-times deception, and the danger of building theology on rumors.
15:45 | Discernment Over Panic. Bobby emphasizes discernment, wisdom, and the need for Christian leaders to be careful with public claims.
19:20 | Our Faith Is Not Built on Secret Information. Cole reminds listeners that Christian faith is built on Jesus Christ, not leaked intelligence, viral clips, or secret meetings.
21:10 | Would Aliens Cause Us to Reject the Bible? Cole gives the short answer: no.
24:00 | God Created the Heavens and the Earth. The first major point: the Bible does not say God only created life on earth.
25:30 | Science, Water, Carbon, and the Conditions for Life. Cole explains how scientists look for life and why the complexity of earth should lead us to worship.
28:00 | Fine-Tuning and Intelligent Design. Bobby responds with the importance of seeing creation through the lens of an intelligent Creator.
32:30 | The Bible's Focus Is God's Redemption of Humanity. Cole explains that the Bible is not an encyclopedia of everything God ever made. It is the story of creation, fall, redemption, and restoration.
34:30 | Christians Have Asked This Question for Centuries. Cole walks through Christian thinkers, Copernicus, heliocentrism, and the rise of astrotheology.
38:30 | Psalm 8 and the Wonder of Creation. Cole and Bobby reflect on the vastness of the universe and the personal care of God.
40:30 | The Real Theological Questions. If intelligent alien life existed, are they moral? Fallen? In need of redemption? Cole frames the questions Scripture does not directly answer.
42:00 | Jesus Is Lord of All Creation. Cole points to Colossians 1 and explains why Christ's work is sufficient and His supremacy is not threatened.
44:30 | Bobby's View: Extraterrestrial or Spiritual? Bobby shares why he leans more toward a spiritual interpretation.
49:00 | Spiritual Beings in the Bible. Cole lists biblical categories like angels, cherubim, seraphim, demons, principalities, powers, Leviathan, Behemoth, and the Nephilim.
52:30 | Satan Is Not Equal with God. Cole explains why Christianity does not teach dualism. Satan is a created, defeated being.
54:30 | No Discovery Can Dethrone Jesus. Bobby reflects on creation pointing to Christ.
56:45 | C. S. Lewis, Space, and Human Sin. Cole summarizes C. S. Lewis' view that alien life would not disprove Christianity and that humanity would carry sin wherever it went.
59:30 | What If Aliens Shook Someone's Faith? Bobby explains how he would help someone whose faith felt threatened by the idea of alien life.
1:03:00 | Final Encouragement: Be Curious, Discerning, and Courageous. Cole closes by reminding listeners that Jesus is Lord over all creation.
Join host Darren Gest and guests Amy Kickham and Steve Bronson of Southern Glazer's as they discuss the evolving, multiplicative relationship between humans and machines.
She was already doing everything right… lifting, staying active, eating “healthy.” But her body wasn't reflecting the effort the way she wanted.
In this client interview, Megan shares how she went from fit to shredded, not by doing more, but by finally having a strategy.
If you feel like you're putting in the work but not seeing that next level of definition… this episode will click.
Inside, we cover:
• Why doing more workouts was actually holding her back
• The biggest macro mistake she didn't even realize she was making
• How she built defined abs (without 30-minute ab workouts)
• The simple nutrition structure that made everything easier (not harder)
• What changed when she stopped being the “garbage can” for her kids' food
• How she now travels, eats out, and still stays on track
• The mindset shift that made the scale finally make sense
This is a real-life example of how strategic shifts can completely change your shape, without obsessing over every detail.
Book a Consultation with Jenny → Create Your Shape (Starting on July 27th): https://calendly.com/jennythenutritionist/consultation
Work with Jenny the Nutritionist in Create Your Shape: https://jennythenutritionist.com/create-your-shape/
Follow Jenny the Nutritionist on Instagram: @jennythenutritionist
*Get a MASTERS IN APOLOGETICS or SCIENCE AND RELIGION at BIOLA (https://bit.ly/3LdNqKf) *USE Discount Code [smdcertdisc] for 25% off the BIOLA APOLOGETICS CERTIFICATE program (https://bit.ly/3AzfPFM) *See our fully online UNDERGRAD DEGREE in Bible, Theology, and Apologetics: (https://bit.ly/448STKK) FOLLOW ME ON SOCIAL MEDIA: Twitter: https://x.com/Sean_McDowell TikTok: https://www.tiktok.com/@sean_mcdowell?lang=en Instagram: https://www.instagram.com/seanmcdowell/ Website: https://seanmcdowell.org Discover more Christian podcasts at lifeaudio.com and inquire about advertising opportunities at lifeaudio.com/contact-us.
Kyle Corbitt, founder of OpenPipe, breaks down reinforcement learning and custom fine-tuning for modern AI models. He explains how RL differs from supervised fine-tuning, why GRPO and LLM-as-judge post-training matter, and how these techniques can improve performance, latency, and cost on open source models. The conversation also covers reward hacking, evaluation design, LoRA adapters, and how Chinese labs are using distillation to fast-follow frontier models. Sponsors: Sequence: Sequence handles the full revenue workflow for complex pricing, from quoting and metering to invoicing, revenue recognition, and collections. Book a public demo at https://sequencehq.com and use code Cognizant in the source field to save 20% off year one AvePoint: AvePoint is building the control layer for AI agents so you can securely govern, audit, and recover every action at scale. Design trusted agentic outcomes from day one at https://avpt.co/tcr VCX: VCX, by Fundrise, is the public ticker for private tech, giving everyday investors access to high-growth private companies in AI, space, defense tech, and more. Learn how to invest at https://getvcx.com Claude: Claude by Anthropic is an AI collaborator that understands your workflow and helps you tackle research, writing, coding, and organization with deep context. Get started with Claude and explore Claude Pro at https://claude.ai/tcr
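As a rough illustration of the GRPO idea named in the episode description above: the core step is to score several sampled answers to the same prompt and turn each reward into an advantage relative to its own group. The sketch below is a simplified assumption, not OpenPipe's implementation; the function name, the judge scores, and the use of the population standard deviation are hypothetical choices (implementations differ on the exact normalization).

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of sampled responses.

    Each advantage is (reward - group mean) / (group std + eps),
    so answers are judged against their siblings, not a learned critic.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std deviation
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical LLM-as-judge scores for four answers to one prompt.
scores = [0.2, 0.9, 0.5, 0.4]
advantages = group_relative_advantages(scores)
```

Answers scoring above the group mean get positive advantages and are reinforced; those below get negative ones. If every answer receives the same score, all advantages are zero and the group contributes no learning signal.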
Is the universe fine-tuned for life? Or has the fine-tuning argument been defeated? In this conversation, I sit down with Dr. Jay Richards, a philosopher, co-author of The Privileged Planet, and one of the world's leading defenders of the fine-tuning argument to explore one of the most compelling cases for design today. Dr. Richards walks through how the fine-tuning argument has improved over 30 years and why even atheists like Christopher Hitchens and Richard Dawkins have admitted this is the argument that gives them pause. WATCH THE FILM: https://www.fathomentertainment.com/releases/the-story-of-everything/ READ: The Privileged Planet, by Jay Richards (https://amzn.to/48nMcZk)
Hour three of DJ & PK for April 14, 2026: Mike Folta, Utah Mammoth and SEG Media Tim LaComb, SEG Media Utah Mammoth ready for playoffs?
In this episode of the Crazy Wisdom Podcast, host Stewart Alsop sits down with Nicholas Faulkner, author of Angelic Physics, for a wide-ranging conversation that picks up where their last discussion left off years ago. The two cover an impressive amount of ground, including the map of consciousness developed by Dr. David Hawkins and where they find themselves skeptical of his calibration methods, the relationship between the chakra system and Hawkins' scale, how consciousness levels apply to both individuals and civilizations, and why collapsing a nonlinear reality into a linear number system inevitably loses something essential. They also get into Nicholas's background as a nuclear engineer and how that analytical foundation shapes his thinking, the nature of carbon-based versus silicon-based intelligence, the potential for training an AI model attuned to higher levels of consciousness, the concept of future shock as AI accelerates beyond most people's ability to keep up, and what a civilization operating at the "500 level" might actually look like. Find Nicholas on X at @PhysicsAngelic, or catch him on Facebook where he's most active. And learn more about Angelic Physics at angelicphysics.org. 
Timestamps
00:00 - Stewart introduces Nicholas Faulkner, author of Angelic Physics, framing their shared interest in David Hawkins while acknowledging healthy skepticism toward portions of his work.
05:00 - Nicholas argues Hawkins compressed mystical insight into linear form, losing essence, comparing it to AI compression losing vibrational nuance across the consciousness scale.
10:00 - Nicholas traces his path from electrical engineering through 9/11 into nuclear navy service, describing how patriotism and opportunity drove the decision rather than curiosity.
15:00 - Discussion shifts toward training an open-source AI model on five-hundreds consciousness, noting current model builders operate in the four-hundreds and dismiss love-based frameworks.
20:00 - Stewart reflects on intimate relationships with electronic devices, exploring electricity as vibration while contrasting carbon creativity against silicon's stable, fast processing architecture.
25:00 - Conversation explores civilizational evolution, comparing hippie movements to ancient Greeks as premature flowers of five-hundreds consciousness crushed by surrounding four-hundreds culture.
30:00 - Nicholas explains his masculine-feminine cross model, critiquing how Hawkins collapsed nonlinear reality into hierarchy, arguing all levels interconnect rather than rank.
35:00 - Discussion covers JFK assassination, Vietnam War, LBJ, and the military industrial complex as examples of four-hundreds power suppressing emerging consciousness shifts.
40:00 - Nicholas draws parallels between the Renaissance emerging from bubonic plague and today's post-COVID collapse of expert-trust structures opening space for new consciousness.
45:00 - Future shock discussion begins with Stewart describing AI agent orchestration overwhelming human comprehension, while Nicholas introduces his frame-rate consciousness equation linking silicon speed to small context.
50:00 - Nicholas describes silicon-to-human relationship mirroring humans-to-angels in frame rate and context scale, suggesting agents receive orders similarly to his own 2019 divine experience.
55:00 - Final exchange covers the fifth dimension as adding vibration to existing physics, the Faulkner Uncertainty Principle stating evidence points toward higher consciousness without ever definitively proving it, protecting reality's illegibility from lower forces.

Key Insights
1. David Hawkins and the Map of Consciousness serve as a shared framework for the conversation, but both guests express healthy skepticism toward it. They acknowledge that Hawkins himself appeared to back away from his calibration technique in his later lectures, suggesting he regretted how prominently he featured it in Power vs. Force. The core issue is that he tried to compress a nonlinear, multidimensional spiritual reality into a single linear numerical scale, which inevitably loses essential meaning in the translation.
2. Nicholas argues that no person exists at a single point on the consciousness scale. Everyone floats across multiple levels simultaneously, expressing differently depending on context. This is a meaningful correction to how many readers apply Hawkins's work, since treating someone as a fixed number oversimplifies the layered and dynamic nature of human consciousness.
3. The compression problem is central to understanding both spiritual writing and artificial intelligence. When any rich, multidimensional experience gets encoded into language or data, something is always lost. This applies to Hawkins writing about enlightenment, to Nicholas writing his book, and to how large language models process and reproduce human knowledge.
4. Silicon intelligence and carbon intelligence are framed as two distinct branches of consciousness with complementary strengths. Silicon can process information at extremely high frame rates because its context is narrow and stable. Humans carry a much larger and messier context, which makes them slower but more creative and cross-connected. Nicholas uses his equation framing this as frame rate being inversely proportional to conscious bandwidth.
5. Civilizational evolution follows a pattern where new levels of consciousness emerge in unstable pockets before eventually becoming dominant. The ancient Greeks briefly stabilized the rational fourth level before collapsing. The hippies briefly touched the fifth level before being suppressed. The Renaissance followed the Black Death. The guests suggest we are now entering another such transition, driven partly by the collapse of institutional trust accelerated by COVID.
6. The Faulkner Uncertainty Principle states that evidence will always point toward the next level of consciousness but will never definitively prove it. This is described as a necessary feature of reality rather than a flaw, because if higher truths were fully legible and accessible to all levels equally, it would give destructive forces too much power too quickly.
7. Neurodivergence is presented as potentially connected to spiritual sensitivity and cross-level awareness. Nicholas describes himself as a high IQ energy-sensing person who experienced a profound spiritual event in 2019, and connects his autistic traits to an ability to sense vibrational levels in others and move fluidly between different frameworks of understanding, which he loosely equates with the polymath archetype.
So what about those days when you can't get through and hear the Lord's voice? Don't despair. We have some really good answers. The first step is to utilize The Tabernacle Experience, which is God's ordained pattern for approaching Him. Then, if you still feel like you have not gotten through, He has provided a "fine-tuning dial."
Mistral has been on an absolute tear - with frequent successful model launches it is easy to forget that they raised the largest European AI round in history last year. We were long overdue for a Mistral episode, and we were very fortunate to work with Sophia and Howard to catch up with Pavan (Voxtral lead) and Guillaume (Chief Scientist, Co-founder) on the occasion of this week's Voxtral TTS launch.

Mistral can't directly say it, but the benchmarks do imply that this is basically an open-weights ElevenLabs-level TTS model (technically, it is a 4B Ministral based multilingual low-latency TTS open weights model that has a 68.4% win rate vs ElevenLabs Flash v2.5). The contributions are not just in the open weights but also in open research: we also spend a decent amount of the pod talking about their architecture that combines auto-regressive generation of semantic speech tokens with flow-matching for acoustic tokens (typically only applied in the Image Generation space, as seen in the Flow Matching NeurIPS workshop from the principal authors that we reference in the pod).

You can catch up on the paper here and the full episode is live on YouTube!

Timestamps
00:00 Welcome and Guests
00:22 Announcing Voxtral TTS
01:41 Architecture and Codec
02:53 Understanding vs Generation
05:39 Flow Matching for Audio
07:27 Real Time Voice Agents
13:40 Efficiency and Model Strategy
14:53 Voice Agents Vision
17:56 Enterprise Deployment and Privacy
23:39 Fine Tuning and Personalization
25:22 Enterprise Voice Personalization
26:09 Long-Form Speech Models
26:58 Real-Time Encoder Advances
27:45 Scaling Context for TTS
28:53 What Makes Small Models
30:37 Merging Modalities Tradeoffs
33:05 Open Source Mission
35:51 Lean and Formal Proofs
38:40 Reasoning Transfer and Agents
40:25 Next Frontiers in Training
42:20 Hiring and AI for Science
44:19 Forward Deployed Engineering
46:22 Customer Feedback Loop
48:29 Wrap Up and Thanks

Transcript
swyx: Okay, welcome to Latent Space.
We're here in the studio with our guest co-host Vibhu. Welcome.
Vibhu: Thanks. Very excited for this one.
swyx: As well as Guillaume and Pavan from Mistral. Welcome. Excited to be here.
Guillaume: Thank you for having us.

Announcing Voxtral TTS
swyx: Pavan, you are leading audio research at Mistral, and Guillaume, you're Chief Scientist. What are we announcing today, where we're coordinating this release with you guys?
Guillaume: Yeah, so we are releasing Voxtral TTS. It's our first audio model that generates speech. It's not our first audio model; we had a couple of releases before. We had one in the summer that was Voxtral, our first audio model, but it was a transcription model, ASR. A few months later, we released an update on top of this, supporting more languages, and also a lot of table-stakes features for our customers: context biasing, precision, timestamping, and transcription. We also have a real-time model that can transcribe not just at the end: you don't need to feed in your entire audio file, the audio can also come in real time. And this is a natural extension in audio, basically speech generation. So we support nine languages, and this is a pretty small model, a 3B model, so very fast, and also state of the art.
It performs at the same level as the best models, but it's much more efficient: in terms of cost it's much cheaper, only a fraction of the cost of our competitors. And we are also releasing the weights of this model.
swyx: What's the decision factor?
Guillaume: It's a good question.
swyx: There will be more. Yeah, Pavan, any sort of research notes to add on?

Architecture and Codec
Pavan: It's a novel architecture that we developed in-house. We iterated on several internal architectures and ended up with an autoregressive flow matching architecture. We also have a new in-house neural audio codec, which converts audio into 12.5 Hz latent [00:02:00] tokens, semantic and acoustic tokens. That's the new part about this model, and we're pretty excited that it came out with such good quality. As Guillaume was mentioning, it's a 3B model. It's based off of the Ministral model that we released just a few months back, and it's mainly meant for TTS, but the text capabilities are also there.
swyx: So there's a lot to cover. I love anything to do with novel encodings, because I think that obviously creates a lot of efficiency, but also maybe bugs that sometimes happen. You were previously at Gemini and you worked on post-training for language models, and maybe a lot of people will have less experience with audio models in general compared to pure language. What did you find that you had to revisit from scratch as you joined Mistral and started doing this?

Understanding vs Generation
Pavan: At least, I think there are two buckets, I guess: audio understanding and audio [00:03:00] generation.
The audio understanding side is the Voxtral models that Guillaume was mentioning that we released earlier: the Voxtral chat model that we released, I think, in July last year, and the follow-up transcription-only model family that we released in January. That would be one bucket, and generation is another bucket. You can also treat them as a unified set of models, but currently the approaches are a little different between the two. To your question on how audio is fed to the model: in the understanding model, it's very similar to the Pixtral models that we also released.
swyx: Yes.
Pavan: That's...
swyx: Amazing.
Pavan: That was the first project I worked on after I joined Mistral. It was pretty nice. And Voxtral was very similar in spirit, I guess. We feed audio through an audio encoder, similar to images through a vision encoder, and it produces continuous embeddings, which are fed as tokens to the main transformer decoder model. The model output is just text, so on the output side there is nothing that needs to be done in these kinds of models. I [00:04:00] guess the interesting part of the generation work is that the output now has to produce audio. The approach we have is this neural audio codec, which converts audio into latent tokens. There is a lot of existing literature, and a lot of models based on this kind of approach, and we took slightly different design decisions around it. But at the end of the day, the neural audio codec converts audio into a 12.5 Hz set of latents, and each latent has a semantic token and a set of acoustic tokens. The idea is that you take these discrete tokens and feed them in on the input side. There are several ways to use the tokens at each frame, but we just sum the embeddings. So it's like having K different vocabularies: you combine all of them, because they all correspond to one audio frame on the input side.
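
[Editor's sketch] The input-side idea described above, one embedding table per token stream with the K embeddings for a frame summed into a single vector, can be illustrated roughly as follows. This is a minimal sketch, not Mistral's implementation: the values of `K`, `VOCAB_SIZE`, and `DIM`, the random tables, and the token ids are all hypothetical.

```python
import random

K = 4            # token streams per audio frame (hypothetical value)
VOCAB_SIZE = 8   # entries per codebook (hypothetical value)
DIM = 3          # embedding width (hypothetical value)

rng = random.Random(0)

# One embedding table per stream: tables[k][token_id] -> vector of length DIM.
tables = [
    [[rng.uniform(-1.0, 1.0) for _ in range(DIM)] for _ in range(VOCAB_SIZE)]
    for _ in range(K)
]


def embed_frame(token_ids):
    """Sum the K per-stream embeddings for one audio frame into one vector."""
    if len(token_ids) != K:
        raise ValueError(f"expected {K} token ids, got {len(token_ids)}")
    frame = [0.0] * DIM
    for k, token_id in enumerate(token_ids):
        vector = tables[k][token_id]
        frame = [a + b for a, b in zip(frame, vector)]
    return frame


# One frame: one semantic token plus K-1 acoustic tokens (ids are made up).
frame_embedding = embed_frame([5, 0, 7, 2])
```

Summing keeps the sequence length at one position per audio frame, whereas concatenating the K tokens along the time axis would multiply the sequence length by K.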
The output side is the interesting part. I don't know if it's the most popular, but one popular technique is to have a depth transformer, [00:05:00] because you have K tokens at each time step. With text, you just have one token at each time step, so you just predict the token from the vocabulary: you get a probability...
swyx: It's very straightforward with text.
Pavan: Very straightforward.
swyx: Yeah.
Pavan: But if you have K tokens, then the naive thing would be to predict all of them in parallel. That doesn't work, or at least it doesn't work that well, because audio has more entropy. One of the techniques people use is this depth transformer, where you almost have a small transformer (it can be an LSTM or RNN as well, but people use transformers), and you predict the K tokens in autoregressive fashion within that. So you have two autoregressive things going on.

Flow Matching for Audio
Pavan: The thing we did differently is that, instead of having this autoregressive K-step prediction, we have a flow matching model. Instead of modeling this as a discrete token set, we trained the codec to be both discrete and continuous, to have this flexibility. We did try the discrete approach too, and it works well, but the continuous one just works better. So it's a flow [00:06:00] matching head, which takes the latent from the main transformer and, kind of like in diffusion, does denoising, but in flow matching it's a velocity estimate. You go from noise all the way to the audio latent, which corresponds to 80 milliseconds of audio, and which is then sent through the vocoder to get back the 80-millisecond audio frame.
swyx: Yeah. Is this the first application of flow matching in audio? Because usually I come across this in the image space.
Pavan: Yeah.
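
[Editor's sketch] The sampling loop behind a flow matching head, as described above, starts from noise and repeatedly follows a velocity estimate until it reaches a latent. The toy below shows only that integration loop, under loud assumptions: `toy_velocity` is a hand-written stand-in that is given the target directly, whereas a real head would be a trained network conditioned on the main transformer's output. The function names, the 3-dimensional latent, and the step count are hypothetical.

```python
import random


def toy_velocity(x, t, target):
    """Stand-in for a learned velocity network (assumption, not a real model).

    For the straight-line path x_t = (1 - t) * noise + t * target, the
    velocity that carries x_t to the target at t = 1 is (target - x_t) / (1 - t).
    """
    return [(goal - value) / (1.0 - t) for value, goal in zip(x, target)]


def sample(target, num_steps, seed=0):
    """Euler-integrate dx/dt = v(x, t) from noise (t = 0) to a latent (t = 1)."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in target]  # start from Gaussian noise
    dt = 1.0 / num_steps
    for step in range(num_steps):
        t = step * dt  # t stays below 1, so the division above is safe
        v = toy_velocity(x, t, target)
        x = [value + dt * rate for value, rate in zip(x, v)]
    return x


latent = sample(target=[0.5, -1.0, 2.0], num_steps=4)
```

The number of velocity evaluations is `num_steps`, chosen independently of how many tokens describe a frame. A depth transformer instead needs one sequential step per token.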
Actually, in some sense there are models flow matching models in audio, but I think this specific combination I could be wrong. There could be somewhat. No. I haven't seen. I haven't seen much work in this, so I think it's novel and a lot of it's just a way bigger community, so they, I think they pioneer a lot of these diffusion flow matching work, and it's interesting to adopt some of the ideas there into audio and,swyx: yeah.Pavan: Yeah, I'm, personally that's the think part which is trying out about. One of more meta point is unlike text, even in vision, I think this is true, but in [00:07:00] audio step literature that there is no.Winner model, yet there is no, okay, this is the way you do things. It's it's still by, I think people are still iterating and figuring out like what's the best overall recipe. I guess the idea. Pretty sure there are models which are also completely end-to-end, like NATO audio. NATO audio, but it's still not come to a convergence point where this, the right way to think that.That also makes. A space pretty exciting to explore.Real Time Voice AgentsVibhu: What are some of the ways to look at it?Vibhu: There are ways where you can do diffusion for audio generation, but if you want like real time generation, that's a big thing with the approach I'm assuming that you took. Yeah. And also like how do you go about evaluating different axes of what you care about, yeah,Pavan: good point. I think we so you can do just flow matching diffusion for the whole audio. We didn't even go down that path because one of the main applications is voice agents and we want real time streaming, and that's the use case. That's not the only use case, but that's one of the primary use cases we want to get to.So we [00:08:00] picked the auto aggressive approach for that. And within the auto aggressive space, again, you can do chunk by chunk or you can do so we picked the. 
I think at least personally prefer the operations, which are the simplest, and so we try to see, can we just add audio as just another head to our regular transformer decode model because that kind of makes it easier for eventual end-to-end modeling of audio text native modeling.Yeah. And it works pretty well. So I guess we went with that and we tried a little bit, but the flow matching head itself, like we had a discreet. Diffusion kind of approach, which also works well, but the flow matching work better.swyx: I was just curious about how you also think about this overall direction of research.Do you basically, when you work with the audio team, do you set some high level parameters and then let them explore whatever, or how does it work between you guys?Guillaume: No I think the way it works is that we are the, we are prioritizing together, I think, what are the most important features because there are many things we can do [00:09:00] in audio.Yeah, I think we try to. These are like how we should do things, for instance. Ultimately what we want to do is to build this through duplex model, but we are not going to start this start there directly, I think is. Some of the project people are doing, butswyx: just to confirm, full effects means it can speak while I'm speaking or,Guillaume: yeah.Okay. Audio. Yeah. Yeah. So intimately we're going to get there, but for us it was, we decided to take it like a step by step. So we start with whatever is the most important. I think support customers, which is the transcription is the most popular use case. Then the speech generation, Soviet time, just a bit before that.And then actually to be like more, but try combining everything all together. But but yeah, we thought it was also important to like separate things and optimize each capability one by one before weswyx: measure of that together. And the super omni model. 
ButGuillaume: very interesting because as Par said, it's when you work on some other domains of this airline and everything, there are many areas where I think it's not as interesting.For instance. Many places, it's essentially just around data or like creating new environments on a lot of kind [00:10:00] of easy things. But things were, I think the research is maybe not as interesting. Were in audio. There are so many ways to actually build this model. So many ways to go around it. That's the sense I think is really interesting.And what we also tried for speed generation is that we tried multiple approaches. What was interesting that even though they were extremely different, they under the big know the particles but the for matching turned out to be quite more natural. So we are happy with this.swyx: Is there intuition why it maybe like flow matching is just models speech better in some natural fundamental, latent dimension?Pavan: No, I think the main thing is e even at a particular time step, there is a distribution of things.swyx: Yes.Pavan: To be predicted like the way you inflate. So you already know the word that you're speaking and Yeah. The intake space, let's say the word maps register a single token for simplicity.In most cases it does. So there is not a lot of so you just pick the word, but with within audio, even the same word could, even with your own voice, could be inflicted in so many different ways. And I think [00:11:00] any approach which like models this distribution and. And flow matching is one, one of the take.It's not the only one at all, but it's a one which works pretty reasonably well. I think that's better. So you have to pick across several different, the intuition I have is it's, there are some, several different clusters each corresponding to some specific way you would inflict, pronounce that thing.And you can't predict the mean of it because that corresponds to some blurred out speech or something like that. 
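
[Editor's sketch] The point about not predicting the mean can be shown with a one-dimensional toy. Two clusters stand in for two ways of pronouncing the same word. The squared-error-optimal point estimate is the average, which lands between the clusters, while drawing a sample lands inside one of them. The cluster positions, spread, and sample count are made-up values for illustration only.

```python
import random

rng = random.Random(0)

# Two "ways of saying the same word", as 1-D clusters (values are made up).
MODES = [-1.0, 1.0]
samples = [rng.choice(MODES) + rng.gauss(0.0, 0.05) for _ in range(1000)]

# Regression-style point estimate: the value minimizing squared error is the mean.
mean_prediction = sum(samples) / len(samples)

# Generative-style output: draw one concrete sample instead of averaging.
sampled_prediction = rng.choice(samples)


def distance_to_nearest_mode(value):
    """Distance from a value to the closest cluster centre."""
    return min(abs(value - mode) for mode in MODES)
```

Here `mean_prediction` sits near the midpoint of the two clusters, far from both, which is the analogue of the blurred-out speech mentioned above. `sampled_prediction` sits close to one cluster.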
But you have to pick one. And then like sharpswyx: conditional inference.Pavan: Yeah, exactly.swyx: Is that all covered under disfluencies, which is I think the normal term of art. Pauses intonations. By the way, I have to thank Sophia for setting all this up, including like some of these really good notes becausePavan: Yeah.swyx: I'm less familiar with the audios for me.Pavan: No. I think dis dismisses are definitely one such Eno defenses is more likeswyx: which is arms are.Pavan: Yeah, arms. And also repeat like you like,swyx: yeah.Pavan: You do this full of words, your thinking, so you repeat the word.swyx: Okay. Whereas intonation is like a diff, it's up up [00:12:00] speak and all this.Okay.Pavan: Yeah. So I think there is a lot of like entropy. And modeling it as a distribution. And a, any technique which helps with it and the depth transformer is a conditional way of modeling this. And Transformers actually really good at it, even though that's a mini transformers. So I think that worked pretty well too for us too.It's just that the main concentration is when you have a depth transformer. If you have K tokens, you need to do K auto steps, right? Even though it's a small thing, it's K steps, which is very vacant, say heavy, but flow matching. We were able to cut it down significantly. So we are able to do the inference in quad steps or 16 steps and it works pretty well.And there are more normal techniques to bring it down even further to like, in extreme case, one step like we're not doing it yet, but it at least the framework, LEDs itself to more efficient and Yes.swyx: And the image guys have done.Pavan: Yeah.swyx: Incredible work guys. Yeah.Pavan: It now you just. Send a prompt and you get an image.swyx: Yeah. Surprisingly not enough. I think image model labs use those techniques in production. 
I think it's, I feel like it's a lot of research demos, but [00:13:00] nothing I can use on my phone today.Guillaume: The thing, there's a thing that would be interesting here is that since, indeed I've been so much sure that has been done in the vision community compared to radio dys, stomach, I think there are so many long infra Yeah.And there are so many things we can do to actually improve this further. So it's our first version, but we have so many ways to exist, much better and much more efficient, cost efficient, soswyx: yeah.Guillaume: So really it's not a new field at all, of course, but there are still so many things that can be done.Perfect. It'sswyx: nice. I should also mention for those who are newer to flow matching, I think the creator, this guy's name is Alex, he's done I think in Europe's maybe two Europes as ago. There was, there's a very good workshop. There's one hour on like this matching is I would recommend people look that up.That's the other thing, right?Efficiency and Model Strategyswyx: The efficiency wise, like I, I imagine like the reason is open weights the reason you pick 3.6 B backbone it you are 3.4 B you are, try to fit to some kinda hardware constraints. You kinda fits some kinda basic constraints. What are they?Guillaume: Not necessarily, I think something we care about in our model that they're efficient.So we have a [00:14:00] lot of separate model, for instance. So we have this that is very small, very efficient. We also have a small OCR model that is available. Good, highly efficient as well. And I think on a project maybe there, I think companies are going to take is to have a coverage general model that will do a bit of everything.But that is also going to be expensive. On here. What want say is if you care about this specific use case, if you can actually use this model, it just does that. It's extremely good at it. Survey, very efficient. That's why we can actually add. 
We do, but also OCR that are like really good at that.And that would be much more cost effective factors and the general model that will contain a lot of capabilities you don't really need. So yeah. So we're doing like general model, but also like more customized model. This,Open Weights and BenchmarksVibhu: how does it compare to other TTS models? It's, we are going follow open wave.We're just dropping it. I think it's pretty good.Pavan: Yeah, I think it's pretty good. Like it, it's definitely one of the best. For sure. It's probably I would say it's the best open source model, butVibhu: decipher themselves.swyx: Yeah.Voice Agents VisionVibhu: Why now? How does it fit into broader ral vision? How do you see voice agents?How do you see voice? I think every year I've heard, okay, you're a [00:15:00] voice. You're a voice. There's a lot of architectural stuff. There's a lot of end time that see it, your solving, but where do you see voice setting?Guillaume: We had so many customers asking for voice. That's also why we wanted to build it.What's interesting in this domain is that. In a sense, if you take something simple like transcription it doesn't seem like something that should be very hard to do for a model. It's essentially, it's pattern recognition. It's classification on this. Models are very good at classifying, right?Or nonetheless, when you talk to them it's not there yet, right? It's not, you don't talk to them the same way you talk to a person. On something, maybe people don't realize it. It's in English it's still much better than in any user language, even compared to French instance. If you talk to this million in French, when you see people talking to this they'll talk very slow.They'll articulate as much as they can. So it's not natural, right? We're not yet to this. And I think, yeah, maybe the next generation will not know this, but yeah, I think people that. 
But our edge will actually always keep this bias speaking very slowly when they talk to this model. Even if maybe, probably in a couple of years, maybe next year it'll not be necessary anymore.But yeah. But what's interesting is to see that yeah, even for like languages [00:16:00] like yeah, French and Spanish Germans that are not no, no resource on religion. You have a lot of audios there on still it's not as good. And I think a consequence. Because then for this, I suppose just is not as much energy, as much effort that has been put done in some other mod that for some vision or like coding.But but yeah, there's still a lot of progress to be done. I think it's just a question of doing the work and it's clear path I think to get there.Pavan: It's a little fascinating because I worked on Google Assistant I think while back at this point, but it's, I think it's, it like when you take a step back, it's fascinating.It's not that long ago. It was like four years ago or five years ago, and it's now it's completely audio in, audio out and the function calling and the whole thing happens completely end to end. And in a very natural,swyx: yeah,Pavan: natural way and still ways to go. Kim was telling, even despite all the previous, it's not like you're speaking to a person.When you talk to any of these agents, bots, or voice mode kind of situation, it's still like a gap. I think that's the great part and I feel like with even the existing [00:17:00] stack, we should be able to get to this very natural speech conversational abilities soon enough I guess.And we'll also hope. I get thatGuillaume: on this kind of the next step, right? Because when you talk to these agents, like usually people are just writing to them and sometimes they'll this very clear, for instance, you are, you want to write code, but you are, you have a very clear idea of how you want the model to implement what you in mind.But so here you are able to spend a lot of time writing. 
So it's not really efficient on audio is really like a natural interface that is just not there yet, but I think it's just gonna be the place.Vibhu: How's it like building, serving, inferencing, like we see a lot about, it's very easy to take LMS off the shelf, serve them.Fine tuning, deploying. I know you guys have a whole you have Ford, you have a whole stack of customizing, deploying. Is there a lag in getting that. Like distribution channel. Are you helping? There is. So like prompting, lms, you can have them be concise, verbose, all that.They're built on LM backbones, these models. How do you see all that?Enterprise Deployment and PrivacyGuillaume: Yeah, I think this is a lot of what we're doing with our own customers. Very [00:18:00] often they come to us, so it's for different reasons. I think one reason is sometimes they have this lot of privacy concerns.They have this data that it's very sensitive. They don't want data to leave. The companies, they wanted to stay. Inside the company. So we have them deploy model in-house. So either on a, either on premise or on private cloud. So they're not worried that it's given to a third party on the there some leakage.Sometimes they have this kind of many companies have this different, sensitivity of data they have like sometimes channel chat can send it to the cloud has to stay there. So then it creates some kind of heterogeneous workflows where it's annoying. You cannot send some data to the cloud.This one you can, so here, when we actually deploy the model for them, they don't have this consideration. They are like not worried that, this is going to leak. Everything is much easier. So we help them basically do this on the, so it's one of the very proposition. But but the other is very often, when customers use this off the shelf close model, but very sad is that they are not leveraging, these data that have been collecting for four years or something for decades.So much data. 
Sometimes it's trillions of tokens of [00:19:00] data in a very specific domain. Their domain, which is data that you'll not find in the public, on the public internet. So data on which, like close model, we actually not have access to one, which that's going to be really good. So if they're using like closed source models are basically not benefiting from all these insights.All these data they have collected three years, they can always give it into the context that in France, but is never as good as if you actually train the modern analysis. So yes, that's basically what we help them to do. We actually provide them some purchase, basically what we announced at GTC this week.So we provide them with this, it's basically like a platform with a lot of tools to actually help them process data. Trained on that. Yeah, it's actually the same thing that we're using in the science team. So it's actually very better tested infrastructure, like a lot of efficient training cut base.For a quality pre-training like a fine tuning, even doing S-F-T-I-L. So we help them do this using the same tools as what our science team is building is using. So since it's tools that we've been using for two years now, it's really better tested. It's really sophisticated.So it's the same thing. We are giving to them, giving the company the same thing [00:20:00] that what are same still using internally actually build their own ai and it makes a really big difference. I think sometimes customers. And many in general don't realize how much better the model becomes when you fine tune it on your own data.And you can have a, your model is here. You start from there. You have a cross source model, which is sort here, but if you actually fine tune it can actually really go much further than this. And then you have a very big advantage. The model is trained on your entire company knowledge, so it knows everything.You don't have to feed like 10 K tokens of contact at every query. 
So it's much easier. I think using a closed-source model is really a bit sad, because you are not leveraging all this data and you are going to be using the same model as all your competitors, when you could be using everything you have collected for years, which is really valuable. So we basically help customers do this. We have a lot of solutions people, I mean forward deployed engineers, who go into the company and look at the problems customers are facing, at what they're struggling to do, and at what we should do to solve it. So we solve them together.

So I think our approach is a bit different here [00:21:00] from some other companies and competitors. We don't just release an endpoint and say, do some stuff on top of that, and we don't just give a checkpoint. We really work very closely with customers. We look at the issues they have, and we help solve them. We really make tailored solutions for the problems the clients are facing.

Some examples: sometimes we have customers who really want a very good model, really performant on some, say, Asian languages. If you take off-the-shelf models, they can speak the language, they can write in it, but it's not amazing. This language would be maybe 0.1% of the mixture, so it has been included during training, but very little. So what we did here is we trained a new model for them where this language was 50% of the mix, so it's much, much stronger. It knows the dialects, and so on. So that's an example of things we can do, and it can be really arbitrarily custom. We had some other customers, for instance, who wanted a 3B model that can do audio with very good function calling. It was something they wanted to put in a car in particular, and they wanted it to be offline, because in a car you don't necessarily have access to the internet. So [00:22:00] yeah.
So here we can actually build the solutions. There is no model that does this out of the box. On the internet you have these very general, generalist, strong models, but for things like this, customers always want specific solutions. Another reason they sometimes come to us is that they experiment with some closed-source model and get a prototype. They're happy with what they built, it works well, they're happy with the performance. Then they want to go to production, and they realize that it's extremely expensive and they cannot ship it. So then they come back to us and ask whether we can help them build the same thing using something much cheaper. And here we can sometimes build something 10x cheaper by just fine-tuning a model, and it will be better, on-prem on their own servers, and also much cheaper.

swyx: That's the pitch right there. Take all the money.

Vibhu: And outside of that, you do put out open-weight models so people can do this themselves. I feel like not enough people go out of their way.

swyx: They're not going to. They're going to ask them to do it as the experts.

Guillaume: I think initially we didn't know, [00:23:00] we wanted completely short at the beginning of the company, because I think our strategy was not exactly the same as what it is today. What we underestimated initially is the complexity of deploying these models and connecting them to everything, to be sure they have access to the company knowledge. We were seeing customers struggling with this, and that was three years ago. Now things are much more complicated, because you don't just have text and SFT on simple instruction following. You have reasoning, you have agents, you have tools, you have multimodal, audio. So it's much more complicated than before, and even back then it was hard for customers.
So they really need some support, and this is why we're also providing forward deployed engineers as well.

Fine Tuning and Personalization

swyx: I'm curious, is there also voice fine-tuning that people do?

Pavan: So in Forge we also have, say, a unified framework. That includes the Voxtral speech-to-text that we released earlier this year, and even the Voxtral chat that we released last year. I think there's a big, rich ecosystem [00:24:00] of people fine-tuning Whisper, and people want the same thing with Voxtral, since it's much stronger than Whisper. And yeah, the platform offers that kind of fine-tuning, which could be any kind of fine-tuning. For instance, sometimes people want to add support for new languages, tail languages that we hope to cover natively. But if there is a language where you have data and you want to fine-tune, I think this is a good use case. The other use case is where it's the same language, even English, but used in a very domain-specific way.

swyx: Yeah. Terminology, jargon, medical stuff.

Pavan: Exactly. And there are also specific acoustic conditions, like when there's a lot of noise. The model will do decently in most conditions, but you can always make it better. Those are some of the use cases where you can improve it even further, and that's one good use case for this. For text-to-speech, we're just releasing it, so we'll have support for that soon too. I think it's a similar use case.

Voice Personalization

Pavan: It's a little different, the kind of things that you want to extend a [00:25:00] text-to-speech model to, which could be voice personalization or voice adaptation for enterprises. Many enterprises need a very specific kind of tone, a very specific kind of personality for the voice. All of those are good use cases for fine-tuning.

swyx: This one I was going to ask you about; we never talked about voice cloning here.
How important is it, right? Like, I can clone a famous person's voice. Okay. But...

Pavan: The main use case would be enterprise personalization. Enterprises need a lot of customization. You don't want the same voice for all the enterprises. Each enterprise wants something customized and specialized that is representative of both their brand and also, I guess, their safety considerations and the use case. I think the kind of thing that you would deploy as an empathetic assistant in a healthcare context would be very different from what you would deploy in a customer support bot, and different again from more conversational settings. I think those are the [00:26:00] customizations you would expect from enterprises. And that's the main use case, at least from our side.

Vibhu: My basic example is you don't want to call two customer service lines and hear the exact same voice. It's going to be weird.

Long-Form Speech Models

Vibhu: But also on the technical side of this, there are a few things in Voxtral that I thought were pretty interesting. He's a big fan of this paper. He said it's a very good paper, the best ASR paper he's ever read. I've hyped up this voice paper enough; we covered it somewhere. But a big thing: Whisper is known for 30-second processing, and you extended this to 40 minutes. There was a lot of good detail in the paper about how this was done, even little niches like how the padding works. It's very much needed; you need to have that padding in there. And the synthetic data generation around this. I'm wondering if you can share the same about the new text-to-speech. How do you generate long-form, coherent audio? And then any gems? Is there going to be a paper?

Pavan: Yeah, there will be a technical report. Okay. Yeah.
I think it should have a lot of details.

Real-Time Encoder Advances

Pavan: But I think the [00:27:00] summary of it is that some of the considerations in this paper were there because we started with the Whisper encoder as the starting point. Now we have in-house encoders, like the Voxtral real-time model, for instance, which we released in January. We also released a technical report for that real-time model, which has a dual-stream architecture. It's an interesting architecture; you should check it out. There we have a causal encoder, and I don't think there's any strong multilingual causal encoder out in the community, so we thought it was a good contribution. So that's one nice encoder for other people who want to adapt it. It's a good encoder, and we trained it from scratch. I think our whole stack is now mature enough that we are able to train super strong encoders. Some of these considerations, like padding, are a function of the Whisper encoder. Now that we train encoders in-house, the design considerations are different.

Scaling Context for TTS

Pavan: On the question about text-to-speech, I think that also leans on the original autoregressive decoder backbone. It has almost identical considerations. The long context is not even that long: [00:28:00] the model processes audio at 12.5 Hz, so one second maps to 12.5 tokens. So one minute is about 750 tokens. You can get up to 10 minutes in an 8K context window and about half an hour in a 32K context window. And 32K context is something that we are very comfortable training on. We can extend it much longer, to 128K. So you can naturally see how it can extend to even hour-long generations. We need the data recipe and the whole algorithm to work coherently enough through such long context, but the techniques are in some ways very similar to long-context modeling for text.
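The context arithmetic Pavan describes can be checked with a short sketch. The only figure taken from the conversation is the 12.5 Hz token rate; the exact window sizes (8,192, 32,768, and 131,072 tokens) and the function names are assumptions made for illustration, and the sketch ignores any tokens spent on the text prompt.

```python
import math

# 12.5 audio tokens per second is the rate quoted in the conversation.
FRAME_RATE_HZ = 12.5


def audio_tokens(seconds: float, frame_rate_hz: float = FRAME_RATE_HZ) -> int:
    """Number of audio tokens needed to cover `seconds` of speech."""
    return math.ceil(seconds * frame_rate_hz)


def max_minutes(context_tokens: int, frame_rate_hz: float = FRAME_RATE_HZ) -> float:
    """Minutes of audio that fit if the whole window held only audio tokens."""
    return context_tokens / frame_rate_hz / 60.0


# Assumed window sizes: "8K", "32K", "128K" read as powers of two.
for label, window in [("8K", 8_192), ("32K", 32_768), ("128K", 131_072)]:
    print(f"{label}: about {max_minutes(window):.1f} minutes of audio")
```

Under these assumptions, one minute is 750 tokens, ten minutes (7,500 tokens) fits in an 8K window, thirty minutes (22,500 tokens) fits in a 32K window, and a 128K window leaves room for well over an hour.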
And the key difference is that it's just doing flow matching autoregressively instead of text token prediction.

swyx: Okay. I think that was most of the voice questions that we had.

What Makes a Model Small

Vibhu: I have a big question on Mistral Small. What is small? How do we define [00:29:00] small? I remember the days of Mistral 7B on my laptop. This is not fitting on my laptop. I could run it on the big laptop, but...

Guillaume: It's just a question of terminology. Here, what we based "small" on is the number of active parameters, but it's true, we could have given it another name. We could have called it medium. It's a mixture-of-experts model that we released. It's a model that combines different models. Before, we had one general model doing instruction following, a separate model that was Devstral, specific to code, and another model for reasoning, Magistral. These were separate artifacts built by different teams at Mistral, and what we're doing is basically merging all of this. You also had Pixtral, which was the first vision model, as a separate model. The way we do things internally is that we have one team focus on one capability and build one model. Once it's mature enough, we decide to merge it into the [00:30:00] main model. Here it was the first time we basically merged all of this into one. There are some other things we did for the first time at merge time, for instance more capabilities like function calling, which I think is going to be much, much better in this Mistral Small. So yeah, it's our latest model.

Vibhu: And yeah, the key thing is it's very sparse. 6B active, pretty efficient to serve. 256K context.
Yeah.

Merging Capabilities vs Specialists

swyx: I think what's interesting is just this general theory of developing individual capabilities in different teams and then merging them. Where is this going to end up?

Vibhu: We've seen the five things put together in this. What are the next five teams?

swyx: I think OpenAI has actually gone away from the original 4o vision of the omni model. This was what they were selling: all modalities in and all modalities out. But I feel like you might do it.

Guillaume: I think there are some modalities where it's not competitive to do this, for instance audio. If you want to do transcription, I think it makes no sense to use a large general model. If you just want to transcribe, it will be very inefficient. If you want to do audio, you probably just want to use the [00:31:00] Voxtral 3B model. Performance is essentially...

swyx: The same.

Guillaume: It's going to be incredibly cheaper. So that's why we want to have a separate model that just does this. I think the question is, if you are talking to your model by speech and asking very complex questions, how do you do this? Do you just cascade things, or do you want to put audio into a model that has, like, a trillion parameters? It's not a competitive option, I think. I'm not sure we're going in that direction, but that's possible, of course. For us, the next capabilities we want to integrate into these models are going to be things like better reasoning, and more capabilities that people don't talk too much about but that are important, I think, for our customers in different industries. For instance, things around legal, or computer-aided design, all these things that models out of the box are not good at, because people don't prioritize this. There are not many benchmarks on that. But...

swyx: It's known how to...

Guillaume: ...make this good, and you just have to do the work.
Extracting some data, processing it, [00:32:00] and so on. So yeah. But we are offering the imagine to this.

swyx: I think for voice, the key thing over maybe the last year or so, with Veo and Grok Imagine and all these things, is joining voice with video, right? People don't understand spatial audio, because most TTS is just, oh, I'm speaking into a microphone in perfect studio quality. But when you have video, the voice moves around.

Pavan: That's true. The consideration was a little different there, in the sense that it's a standalone artifact where you get the whole thing and you consume it. But in a conversational setting, you need extremely low latency.

swyx: Yeah.

Pavan: Streaming would be one of the primary considerations.

swyx: You can build a giant company just doing that, right? So you don't need to do the voice. But on the theme of merging modalities, that is something where I am like, wow. Everyone up until, let's say, mid last year was just doing these pipelines of, okay, we'll stitch a TTS model with a voice thing and a lip sync [00:33:00] thing and what have you. Nope. Just a giant model. Yeah.

Open Source Mission

Vibhu: I have a two-part question. So one is, open source still seems very core to what you do, and I just have to plug your paper: January 2024, "Mixtral of Experts." Very fundamental research on how to do good MoEs. A very good paper for anyone. That's just a side tangent.

swyx: This thing, 8x22B, was like the nuclear bomb for open source. I think it should be 8x7B. Yeah. But this is a bigger opposite than me. I don't remember this. I don't think it was January, right? It dropped during NeurIPS, and everyone at NeurIPS was... December of 25th, I think. Yeah.
The model was December as well.

Vibhu: It's just a little update, probably.

swyx: Yeah. No, but you have a point to make.

Vibhu: No, you've got to check that. But then, I just want to hear more broadly about open source for you, and, as you had asked earlier [00:34:00] about what's next, what the other side teams are working on. You put out Leanstral.

swyx: It's not necessarily a surprise, but I was like, this doesn't fit my mental model of Mistral.

Guillaume: Yeah. First, on open source in general, I think it's really something that goes to the DNA of the company. We have been open-sourcing since the beginning, and even before this. Before this, Tim and I were at Meta, where we released LLaMA, and I think that was really nice to see. Before this, for most researchers, like those in universities, it was impossible to work on LLMs. There was no LLM outside. And if you look at many of the techniques that were developed after it was open-sourced, all these post-training approaches, even DPO, direct preference optimization, all of these were done by people who had access to this model, and it would have been impossible to do without it. So it's really making science move faster, and we really want to contribute to this ecosystem. I think DeepSeek also had a very large impact. All these papers in the open-source community are really helping the science community as a whole to move faster. So [00:35:00] we want to contribute to this ecosystem. That's why we're releasing very detailed technical reports, for Magistral, our first reasoning model, with a lot of ablations and results, things that worked and things that did not work as well, which I think is helpful. For the audio model we also shared a lot of details, and the same for the real-time model. So we really want to continue this and basically belong to this community of people who share science.
I think we really don't want to be living in a world where the smartest models, the best models, are only behind closed doors, only accessible to a few companies that have the power to decide who can use them. I think that's a scary future we don't want to live in. We really want these models to be accessible to anyone who wants them, and intelligence to be usable and accessible by anyone who can use it. So that's why we are pushing this open-source mission. So yeah.

Lean and Formal Proofs

Guillaume: Leanstral, I think, is also one step in this direction. It's a bit different from what we are usually releasing, but we have a small team internally [00:36:00] working on formal proving, formal math. It's a subject we care about in general, and we were working on it before reasoning. I think we started too early: doing reasoning without LLMs is very hard, especially when you work with formal systems, because the amount of data you have is negligible. It's a very small community of people writing formal proofs. But the reason why we like it is this: if you look at what people are doing with reasoning, the problems that you can use are usually going to be problems where you can verify the output. For instance, all these AIME problems, where the solution is a number between one and, like, a thousand. So you can verify it by comparing it with a reference. Or it's an expression, and you can compare the generated output expression with the reference. But for most math problems, and most reasoning problems, there is no way to easily verify the solution. If the question is "show that f is continuous," you cannot compare with a reference, right? If it's "prove that this is true" or "prove these properties," you cannot simply verify the correctness of your proof.
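The "compare with a reference" case Guillaume describes can be sketched as an exact-match reward. This is a minimal illustration, not Mistral's training code: the function names and the rule of treating the last integer in the completion as the final answer are assumptions made here.

```python
import re
from typing import Optional


def extract_final_integer(completion: str) -> Optional[int]:
    """Return the last integer that appears in the completion, if any.

    Treating the last integer as the final answer is an assumption of this
    sketch; a real pipeline would likely require a stricter answer format.
    """
    matches = re.findall(r"-?\d+", completion)
    return int(matches[-1]) if matches else None


def exact_match_reward(completion: str, reference: int) -> float:
    """1.0 if the extracted answer equals the reference answer, else 0.0."""
    predicted = extract_final_integer(completion)
    return 1.0 if predicted == reference else 0.0


print(exact_match_reward("Adding the cases gives 204.", 204))
print(exact_match_reward("So the answer should be 17.", 204))
```

A check like this only works when the answer is a single value or expression. For a question such as "show that f is continuous," there is nothing to pass as `reference`, which is the gap the conversation turns to next.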
So it's hard to apply; there is no verifiable reward here. [00:37:00] What you could provide is, of course, a judge, an LLM judge that will look at your proof. But it's very hard, and you could certainly get some reward hacking happening there. So it's difficult. You could provide a reference proof, but then there are also many ways to prove the same thing. If the model gets a negative reward because it's a different proof, maybe it was still a legit proof, just different. So it's not going to work well. What's nice with Lean and with formal proving is that you don't have to worry about this whatsoever.

swyx: Anything that compiles in Lean is functionally the same.

Guillaume: Exactly. If it compiles, it's correct. It's very easy, and you can apply this.

swyx: The community is just way too small, so no human will actually go and do it.

Guillaume: Yeah, that's exactly it. Only very few people can do it. It's a very small community of people doing a PhD on that. So it's super small. And it's sad, because it's actually very useful, not just in math but also in software verification. Software verification today is a tiny market. Very few industries work on this and need it. It's usually going to be companies building airplanes, robotics...

swyx: Like...

Guillaume: ...things [00:38:00] where they absolutely want to be sure, where lives depend on this. But it's very rare that people formally verify the correctness of their software. I think one of the reasons for this is simply that it's hard to do.

swyx: Are you thinking of TLA+? It's the language that some people use for software verification. No, but yeah, I think the reason why people don't use it more, and why this industry is not as big as it could be, is that it's very hard.
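The "if it compiles, it's correct" point can be illustrated with a small Lean 4 sketch. The theorem names and the `Sketch` namespace are made up for this example. The two proofs of the same statement are written differently, yet the checker accepts both, so neither would need to match a reference proof.

```lean
namespace Sketch

-- A statement the kernel can check directly: n + 0 reduces to n by definition.
theorem add_zero_right (n : Nat) : n + 0 = n := by
  rfl

-- One proof that a conjunction can be swapped, written as a term.
theorem and_swap_term (p q : Prop) (h : p ∧ q) : q ∧ p :=
  ⟨h.2, h.1⟩

-- A different proof of the same statement, written with tactics.
-- Both compile, so both would earn the same reward.
theorem and_swap_tactic (p q : Prop) (h : p ∧ q) : q ∧ p := by
  constructor
  · exact h.right
  · exact h.left

end Sketch
```

In a reinforcement learning setup of the kind discussed, the reward signal would simply be whether the file type-checks, with no judge model and no reference proof involved.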
But now, with the coding agents that are there, it's going to be very different.

Guillaume: We're going to see much more of this. So I think, yes, the industry there is going to be much larger in the future with these models. Here we are also anticipating this a little bit. We wanted to work on that because proving a math theorem and verifying software use essentially the same tools.

swyx: Yeah.

Reasoning Transfer and Agents

swyx: One of my theories is that because the proofs take so long, it's actually just a proxy for long-horizon reasoning and coherence and planning. Maybe a lot of people will say, okay, it's for people who like math, it's a niche math language, who cares? But actually, if you use this as part of your data mixture for [00:39:00] post-training and reasoning, it might spike everywhere else. And I think that's underexplored; no one has really put out a definitive paper on how this generalizes.

Guillaume: Yeah, absolutely.

Pavan: I think even...

Guillaume: That's what we're seeing already. For instance, you do some reasoning on math, and then the model can reason better in other domains, even at an early stage. So there is some transfer, some sort of emergence that happens. And it's interesting not just as a topic in general: there is a lot of connection with agents. Sometimes the model sees a theorem that it has to prove, and it's very complex, but then it can take the initiative to say: I'm going to prove this with three lemmas. I'm going to suggest three lemmas, and I'm going to prove each lemma in parallel, so three of them in parallel with subagents, and I'm also going to prove the theorem given the three lemmas. So you can do this also. Pretty interesting.
Even if you fail to prove one of the lemmas, maybe you succeed in proving lemma one and lemma two, so you get some partial reward here. So it's a bit less sparse than if you just get zero or one for the entire thing. [00:40:00] So it's pretty interesting.

Vibhu: Yeah, it's also an interesting case for specialized models in general, right? The cost thing you show is pretty interesting: similar score-wise, and you are at thirty, seventy, a hundred fifty, three hundred bucks. Smaller.

swyx: I think cost is a bit unfair, right? Because this one is at inference cost. The others are always there with their margins on top of it. But we don't know anything else, so we've got to figure it out.

Vibhu: Okay.

Next Frontiers in Training

Vibhu: I did want to push on that more. Not on cost, but you mentioned that it's a great way to have verifiable long-context reasoning. What are other frontiers that I'm sure you are working on internally? There's a lot of pushback from people on pre-training scaling, on RL, on pushing compute toward having more than half of your training budget all on RL. Where do you see the frontier of research in that?

Guillaume: You mean the...

Vibhu: Just in foundation model training. One thing that you do is fundamental research from the ground up, right? So you probably have a really good look at where you can [00:41:00] forecast this out.

Guillaume: Yeah. For us, we're still working a lot on the pre-training side. I think we are very far from saturating pre-training. I think ML four pre-training will be a big step compared to everything we have done before, so we are pretty excited about this. On the other side, I think we now have to think more and more about algorithms that will actually support these very long trajectories. GRPO, for instance, doesn't really work when you are a bit off-policy.
Which was okay initially, because you were solving math problems that can be solved in a few thousand tokens. The model can generate them pretty quickly, so when you do your update, the model is never too far off. But now you are moving toward problems where it sometimes takes hours, like six hours, to get a reward, and then your model is completely off-policy. So you have to build new infrastructure that supports this, but also new algorithms. In everything we're doing internally now, we're trying to build infra that anticipates what we'll have in six months or a year from now, which is these extremely long scenarios. I think when we started Mistral, part of what [00:42:00] we wanted was a very nice environment where people can do research with a lot of resources. So it was nice. I think things changed a lot when ChatGPT came out. I think after that it was not the same again. But yeah, it was nice, and I think we also want to keep part of this.

swyx: Coming to the end.

Hiring and Team Footprint

swyx: Obviously, I think you are doing incredible work. You have a very impressive vision for open source and for voice. What are you hiring for? What are you looking for in people who are trying to join the company?

Guillaume: Yeah, so we are hiring a lot of people in our science team. We're hiring in all our offices. Our HQ is in France, in Paris. We have a small team in London. We have a team in Palo Alto as well. We opened some offices in Warsaw, in Poland, and one in Zurich. We also have some presence in New York, and soon one in San Francisco. We are also hiring remotely. So we're growing the team, trying to hire very strong people. The team is still a fairly small team, [00:43:00] and I think we want to keep it that way, because we find it quite efficient.
A small team is agile, so yeah.

swyx: Okay.

AI for Science Partnerships

swyx: Let's focus on science and forward deployed. We are strong believers in science. We started our new science pod that focuses specifically on AI for science. What areas do you think are the most promising?

Guillaume: What we're pretty excited about right now, something we have already started doing and will probably be able to share more about in a couple of months, is that we are exploring AI for science. There are a lot of areas where we think you could get some extremely promising results if you were to apply AI in these domains. There is a lot of low-hanging fruit. You just have to find the domains where AI has not yet been applied, and that's usually hard to do, because the people working in those domains don't necessarily know the capabilities of these models. You have to pair them with AI researchers, which is actually hard to do. But we're doing this matching naturally with our customers. We have some companies we work with very closely. For instance, ASML is one of our partners, so we're doing some research with them, and they have tons of extremely interesting problems, problems in physics, in [00:44:00] materials science, that they're essentially the only ones working on, because they're doing something no one else is doing. So there are many domains where AI can actually revolutionize things. You just have to think about it and be familiar with what AI can do and how to apply it. So it's something we're doing more and more with our partners and customers, AI for science.

swyx: Yeah. Okay.

Forward Deployed Skills

swyx: And then for forward deployed, what makes a good forward deployed engineer? What do they need?
Where do people fail?

Guillaume: I think you usually need people who are very familiar with the tech, not necessarily with a lot of research expertise, but who are pretty good at using these models, who know how to do fine-tuning, who know how to start an RL pipeline. It's not easy. It's something that the vast majority of companies will not be able to do on their own. So here I think we need people who like to solve problems, who are excited about solving complex, very concrete problems. It's applied science, basically. I think it's not too different from the skills you need in research, because you are essentially trying to find solutions to problems that [00:45:00] customers have not yet solved. Sometimes it's easy. Sometimes you have to do the work: you have to create synthetic data, find some edge cases. So it depends on the problem. You also need a bit of patience, and you need to be creative. I think it's a very similar skill set to research.

Pavan: The diversity of the work they do always surprises me. It goes all the way across the kinds of things they encounter in industries. It's just very interesting, I think.

swyx: Any fun success anecdotes?

Guillaume: Yeah, it can be training a small model on the edge that does just one specific thing. It can be training some very large model with some specific languages as well. Making models really good at some tool use, for instance computer-aided design, these kinds of things. Is that pairing with vision as well? Yeah.

Pavan: And defect detection for chips, or identifying things in factories. The diversity could be anything where you can deploy these foundation models. So it's the work to make it work in that specific setting, basically whatever it takes to make it add value in that workflow.

Vibhu: Yeah.
[00:46:00] And it goes across the stack, right? Even just pulling up the website.

swyx: It's so broad. Compute. It is so broad.

Vibhu: We didn't even touch on the fact that you have a coding CLI tool. One thing, I think the first tool, was agents, Mistral agents. You had the agent builder, you can serve it via API and all that. And I'm guessing forward deployed people...

Guillaume: Yeah.

Vibhu: ...help build that out and stuff.

Customer Feedback Loop

Guillaume: It is also why we're doing many things, but I think that's also part of the value proposition. Sometimes customers are extremely careful about their data, and they don't like trusting so many partners: trusting one partner for code, giving the data to another third party for audio, and to another one for something else. What they really like about our approach is that we can help them with anything, so they don't have to send their data to so many clouds.

swyx: I think there can be many orders of magnitude more FDEs than research scientists, and they don't need your full experience, but they're still super valuable to customers.

Guillaume: In practice, these two teams [00:47:00] are still quite intertwined. First of all, they're using the same tools, the same data pipelines and everything. It's very helpful for the science team to get feedback from the solutions team, because they can say: look, these customers are trying to do this, and it's not working. It can really be fixed in the next version. Yeah. But this is basically a real-world eval. Yeah, it's a real-world eval. If you're just working in a lab that ships models, but you don't do this work for customers, you have no idea whether your model is good at this edge case. For instance, you even found this, right?
So yeah, there is a very big gap between the public benchmarks, which are very academic, and
Pavan: the rare cases are just very diverse, and in the specific context of a customer, you can fine-tune. First evaluate: create a solid eval benchmark, and then measure in the context of the kind of audio they have. For instance, one use case is literally just, there's a word for kids and they have to just say it out loud. It's a very specific thing. You're just saying one word, and then you grade the kid on whether they did it right or not. It's [00:48:00] like R for, but so there are very diverse use cases, and the idea is that the applied scientist or engineer will go and make it better. And then from the learnings, we incorporate it into the base model itself, so it's just better out of the box.
Vibhu: Yeah. It's a good full-circle system. The foundation model evals are all just proxies of what you really want. You're never gonna have one that says, it doesn't make sense for there to be a one-word transcription like that. It's not something you wanna fit on. Perfect.
Wrap Up and Thanks
swyx: Everyone should go check out everything that Mistral has to offer and try the TTS model, which we'll link in the show notes. But thank you so much for coming. Thanks. Such a stretch.
This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit www.latent.space/subscribe
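The customer-specific eval described in the conversation above (a child says one word, and the system grades whether it was said correctly) can be sketched as a small exact-match benchmark. This is only an illustration under stated assumptions: the function names, the normalization rule, and the tiny `eval_set` are hypothetical and are not taken from the episode or from any vendor's tooling.

```python
import string


def normalize(text: str) -> str:
    """Lowercase and strip punctuation and outer whitespace so 'Cat.' matches 'cat'."""
    no_punct = text.lower().translate(str.maketrans("", "", string.punctuation))
    return no_punct.strip()


def word_accuracy(examples):
    """Fraction of clips whose transcript equals the expected single word.

    `examples` is a list of (expected_word, model_transcript) pairs.
    """
    if not examples:
        raise ValueError("need at least one example")
    hits = sum(normalize(expected) == normalize(heard) for expected, heard in examples)
    return hits / len(examples)


# Hypothetical eval set: (word the child was asked to say, what the model transcribed).
eval_set = [
    ("cat", "Cat."),
    ("ship", "sheep"),
    ("three", "three"),
    ("rabbit", "rabbit"),
]

print(f"exact-match accuracy: {word_accuracy(eval_set):.2f}")
```

Running the same fixed set before and after fine-tuning gives the kind of before/after measurement the speakers describe, and a set this narrow is exactly what a public benchmark would not cover.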
Kris Baker and The Athletic's Matthew Fairburn discuss everything Buffalo Sabres. Follow the show and be sure to read Matthew's latest at The Athletic: https://www.nytimes.com/athletic/7125387/2026/03/24/buffalo-sabres-jarmo-kekalainen-impact/
Petter is annoyed by atheists' favorite argument for the existence of God, and Charlotte has found new faults in Parisa Liljestrand. The words of the week are honor and explanatory model. Hosted on Acast. See acast.com/privacy for more information.
Our speaker, Brig. Gen. (Res.) Eran Ortal, is a former Israeli Defense Forces officer who previously served as the Commander of the Dado Center for Interdisciplinary Military Thinking in the IDF Operations Directorate. Today he is the Head of the Military Program at the Begin–Sadat Center for Strategic Studies (BESA Center) and a visiting scholar at the American Foreign Policy Council (AFPC). Eran is the author of the book “The Battle Before the War.”
I want to learn from Eran about how the American and Israeli militaries have moved their command centers to the battlefield so that a target can be destroyed before the Iranians have time to react. Get full access to What Happens Next in 6 Minutes with Larry Bernstein at www.whathappensnextin6minutes.com/subscribe
Is belief in Jesus Christ based on blind faith… or credible evidence?
In this video, we examine the historical, scientific, and philosophical evidence surrounding Christianity and the resurrection of Jesus. From the origin of the universe and the fine-tuning of physical laws to the historical case for the resurrection, we explore arguments that many historians, scientists, and philosophers take seriously.
This video examines:
• The origin of the universe and the Kalam Cosmological Argument
• Fine-tuning in the laws of physics
• The mystery of DNA and biological information
• Historical evidence for Jesus of Nazareth
• Non-Christian sources like Tacitus and Josephus
• Gary Habermas' Minimal Facts argument for the resurrection
• N.T. Wright's historical analysis of the early Christian movement
• Why the resurrection of Jesus remains one of the most debated events in history
Christianity stands or falls on the resurrection of Jesus. If it happened, it changes everything.
Whether you're a believer, skeptic, or simply curious, this video invites you to explore the evidence and decide for yourself.
Wintery Knight and guest host Terrell Clemmons welcome Dr. Brian Miller to discuss the evidence for fine-tuning in physics and cosmology and the multiverse theory. They discuss how the laws and constants of nature suggest intentional design. Miller explains specific examples of fine-tuning and critiques the main naturalistic explanation for this data: the multiverse theory. He recounts his shift from skepticism to accepting design via evidence. Please subscribe, like, comment, and share. Show notes and transcript: https://winteryknight.com/2026/03/07/knight-and-rose-show-73-brian-miller-fine-tuning-and-the-multiverse-theory Subscribe to the audio podcast here: https://knightandrose.podbean.com/ Audio RSS feed: https://feed.podbean.com/knightandrose/feed.xml YouTube: https://www.youtube.com/@knightandroseshow Rumble: https://rumble.com/c/knightandroseshow Odysee: https://odysee.com/@KnightAndRoseShow Music attribution: Strength Of The Titans by Kevin MacLeod Link: https://incompetech.filmmusic.io/song/5744-strength-of-the-titans License: https://filmmusic.io/standard-license
Enterprise LLMs: RAG vs Fine-Tuning, IDP & Governance
In this episode of the Mostly Unstructured podcast, Ed and Clay discuss whether it's better to train a domain-specific LLM or leverage foundational models like ChatGPT, Gemini and Claude. They explain the trade-offs between fine-tuning and retrieval-augmented generation (RAG), and why Intelligent Document Processing (IDP) is vital for turning unstructured data into usable context.
In this discussion, we cover:
Why training your own LLM is risky and often unnecessary compared to adopting and building from a foundational model.
How retrieval-augmented generation (RAG) delivers more accurate results than simple fine-tuning.
The importance of Intelligent Document Processing (IDP) for ingesting unstructured data and building domain context.
Real-world lessons on AI governance, including the Air Canada bereavement-policy chatbot case.
Managing bias, hallucinations and toxicity in enterprise models.
Measuring your return on AI investment.
For those thrown by the excessive acronyms, let's define:
LLM = Large Language Model
RAG = Retrieval-Augmented Generation
IDP = Intelligent Document Processing
For more insights on enterprise AI for data intelligence, visit our website and read our blog on training an LLM referenced in the episode.
Website: https://www.keymarkinc.com/
Blog: https://www.keymarkinc.com/how-to-tra...
In this episode of Derms and Conditions, host James Q. Del Rosso, DO, welcomes David Seiter, FNP-C, for a wide-ranging discussion on challenging dermatologic dogma and integrating emerging evidence into clinical decision-making. They begin with Seiter sharing his approach to reviewing new literature, encouraging clinicians to look beyond mainstream dermatology journals to cross-disciplinary publications to help reshape long-held assumptions. Using lichen planus as an example, he revisits the entrenched association between diffuse lichen planus and hepatitis C. While many clinicians routinely test for hepatitis C in these patients, new data suggest the association is uncommon. More compelling, however, is the emerging link between persistent, widespread lichen planus and underlying malignancy. Seiter outlines how he thoughtfully screens for red flags and gaps in preventive care without alarming patients prematurely, reinforcing the importance of looking beyond a single lab test. The conversation then shifts to acanthosis nigricans, where traditional teaching centers on hyperglycemia and diabetes risk. Seiter explains why acanthosis nigricans is more accurately viewed as a marker of hyperinsulinemia rather than elevated A1c. He discusses incorporating HOMA-IR calculations to identify early insulin resistance, particularly in adolescents whose A1C may remain normal for years. Both clinicians stress that a “normal” A1C should not prematurely reassure patients when cutaneous markers signal metabolic risk. Additional topics include reconsidering intralesional triamcinolone as the default therapy for keloids, with discussion of emerging data on intralesional insulin as a potentially lower–adverse event alternative, and a pragmatic conversation about JAK inhibitor safety. Comparing adverse event data across agents, they emphasize individualized risk assessment, careful monitoring, and shared decision-making over reflexive fear of boxed warnings. 
Tune into the episode to explore how questioning assumptions, broadening your literature review, and contextualizing risk can sharpen your clinical reasoning and elevate patient care in everyday dermatology practice.
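The HOMA-IR calculation mentioned in the episode description can be illustrated with the commonly published formula: fasting insulin (µU/mL) multiplied by fasting glucose (mg/dL), divided by 405. The sketch below assumes those conventional units; the function name and the example values are hypothetical, and no interpretive cutoff is included because thresholds vary by population and are not stated in the description.

```python
def homa_ir(fasting_insulin_uU_per_mL: float, fasting_glucose_mg_per_dL: float) -> float:
    """Return HOMA-IR using the conventional-units formula: insulin * glucose / 405."""
    if fasting_insulin_uU_per_mL <= 0 or fasting_glucose_mg_per_dL <= 0:
        raise ValueError("inputs must be positive fasting values")
    return fasting_insulin_uU_per_mL * fasting_glucose_mg_per_dL / 405.0


# Hypothetical values for illustration only: the same normal-range glucose
# paired with a low and a high fasting insulin level.
for insulin in (5.0, 20.0):
    print(f"insulin={insulin} uU/mL, glucose=90 mg/dL -> HOMA-IR={homa_ir(insulin, 90.0):.2f}")
```

The two hypothetical rows share the same fasting glucose, so the index changes only because insulin does, which is the sense in which a glucose-based marker alone can miss early insulin resistance.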
The first version of the Rezzimax, a handheld device that uses vibration to fine-tune your nervous system, was a great start toward helping thousands of people. Soon, feedback came pouring in from people suffering from conditions like TMJ, stress, anxiety, depression, ADHD, and more. Using the device for just a few minutes a day was helping to drastically reduce chronic pain for many people. That was not enough. We knew that if we could perfect the Tuner, we could help so many more individuals learn to live our motto: "Tune Out Pain. Tune Into Life." So today we have the Rezzimax - Tuner pro2.
This content has been developed for healthcare professionals only. Patients who seek health information should consult with their physician or relevant patient advocacy groups.
For the full presentation, downloadable Practice Aids, slides, and complete CME/MOC/AAPA/IPCE information, and to apply for credit, please visit us at PeerView.com/TCC865. CME/MOC/AAPA/IPCE credit will be available until January 29, 2027.
Rethinking Pulmonary Fibrosis: From Fine-Tuning Diagnosis to Illuminating Novel Pathways for Emerging Treatment Strategies
In support of improving patient care, PVI, PeerView Institute for Medical Education, is jointly accredited by the Accreditation Council for Continuing Medical Education (ACCME), the Accreditation Council for Pharmacy Education (ACPE), and the American Nurses Credentialing Center (ANCC), to provide continuing education for the healthcare team.
Support
This activity is supported through an educational grant from Bristol Myers Squibb.
Disclosure information is available at the beginning of the video presentation.
I'm currently soaking up newborn snuggles during my maternity leave, so over the next few weeks I'm re-airing a few of my most downloaded and requested episodes of all time.
This episode originally aired on February 17, 2025 - but it's still full of relevant strategy!
If your brain is buzzing with product ideas (and your calendar is packed with launch dates)… but you're not sure how much newness your brand really needs or how often you should be dropping collections, this episode is for you.
We're chatting about what buyers actually want to see from your product assortment, how to stop overwhelming your customers and yourself, and how to bring in just the right amount of newness to stay relevant without staying stuck on the launch hamster wheel.
Here's the deal: more products ≠ more profit. And in this episode, I'm giving you the exact questions to ask yourself to sharpen your strategy, clean up your assortment, and fine-tune your collection calendar in a way that makes sense for your brand.
Whether you're planning your next product drop or rethinking your whole line, this episode will help you zoom out and make strategic decisions like a buyer, not just a creator.
What you'll learn in this episode:
• Exactly how many collections you need per year
• The ideal timing for your major and mini product drops
• How to balance launch energy to avoid burnout
• What wholesale buyers and retailers actually want from your newness
• How to evaluate whether your product ideas are adding real value
• 5 questions to ask before adding a new product to your line
• Why bestsellers deserve a second (and third) spotlight
• What needs to go away before you bring something new in
Enjoy the chat!
LOOKING TO GROW YOUR WHOLESALE BUSINESS? Retail Pitching
How much should you really have in stocks vs. bonds, and what happens when the market turns south with a vengeance?
In Boot Camp #4, we break down the fine-tuning asset allocation tables that show exactly how different combinations of equities and bonds have performed from 1970 through 2025. This episode goes beyond average returns and dives into what investing actually feels like during the worst 3-month, 12-month, and 60-month market declines.
You'll learn:
• Why equities have historically dominated bonds for long-term retirement investing
• How the S&P 500 compares to diversified strategies like the Four-Fund portfolio
• The real impact of worst-case drawdowns (including 50%+ bear markets)
• What happens to a 100% stock portfolio during retirement withdrawals
• How 50/50, 60/40, and other stock-bond allocations reduce volatility
• Why median returns matter, and why averages can mislead
• How to control risk through asset allocation, low costs, tax efficiency, and index investing
We explore real historical data, including the 1973-74 bear market, the 2000-2002 tech crash, and the 2008 financial crisis, to help you understand both accumulation and retirement distribution phases.
Whether you're in your 20s building wealth, in your 50s preparing for retirement, or already retired and managing withdrawals, this episode helps you align your portfolio with your risk tolerance, return needs, and long-term financial goals.
If you want to be a confident do-it-yourself investor, without paying a 1% management fee, this episode gives you the framework to make informed decisions about stocks, bonds, diversification, and risk control.
Watch Boot Camp #4 video
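The comparison of stock-bond mixes described above can be sketched in a few lines: blend two return series at a chosen stock weight, then look at the mean, the median, and the worst peak-to-trough decline. Everything here is an assumption for illustration: the annual returns are made-up numbers, not the 1970 through 2025 data from the episode's tables, the portfolio is assumed to be rebalanced once a year, and peak-to-trough drawdown is a related but different measure from the worst 3-, 12-, and 60-month declines the episode discusses.

```python
from statistics import mean, median


def blend(stock_returns, bond_returns, stock_weight):
    """Annual returns of a portfolio rebalanced each year to `stock_weight` in stocks."""
    return [stock_weight * s + (1 - stock_weight) * b
            for s, b in zip(stock_returns, bond_returns)]


def max_drawdown(returns):
    """Largest peak-to-trough decline of the compounded value, as a positive fraction."""
    value = peak = 1.0
    worst = 0.0
    for r in returns:
        value *= 1 + r
        peak = max(peak, value)
        worst = max(worst, 1 - value / peak)
    return worst


# Made-up annual returns for illustration only, NOT historical data.
stocks = [0.20, -0.30, 0.15, 0.10, -0.10, 0.25]
bonds = [0.04, 0.06, 0.03, 0.02, 0.05, 0.01]

for weight in (1.0, 0.6, 0.5):
    r = blend(stocks, bonds, weight)
    print(f"{int(weight * 100)}% stocks: mean={mean(r):.3f} "
          f"median={median(r):.3f} max drawdown={max_drawdown(r):.3f}")
```

Printing the mean next to the median makes it easy to see when a few extreme years pull the average away from the typical year, which is the point the description raises about averages being misleading.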
On the Uplevel Dairy Podcast, Peggy Coffeen talks with Curtis Gerrits and Jim Moriarty of Compeer Financial about why benchmarking is essential for dairy farms, especially as year-end financials become available, milk prices soften, and recent beef-on-dairy income may have masked underlying costs. They explain benchmarking as first comparing a farm to itself over time, then comparing to a larger peer dataset of similar farms to identify strengths and small opportunities across income and expenses that can add up. Key areas discussed include feed cost and productivity (including homegrown forages like corn silage and increased use of alfalfa), feed efficiency factors such as refusals and mixing time, and the importance of working with nutritionists and local crop partners. They highlight core benchmarks such as capital cost per hundredweight and labor cost per hundredweight, how capital and labor relate when making investments, and improvements in net herd replacement costs driven by lower herd turnover, fewer heifers raised, and more beef calf sales. They conclude with takeaways to embrace financial management and benchmarking, keep moving forward during down cycles, and note that top-performing dairies succeed through attention to detail, execution, regular decision-making, and involving family, key employees, and advisors by sharing financial results.This episode is sponsored by Compeer Financial.Compeer Financial is a member-owned Farm Credit cooperative serving and supporting agriculture and rural America. 
Their dairy team brings world-class expertise and tailored solutions to support dairy producers' financial goals and lending needs.
Visit https://www.compeer.com/specialists/dairy
00:00 Why Benchmarking Matters Right Now (Year-End Numbers + Softer Milk Prices)
04:05 Benchmarking Basics: Compare to Yourself, Then to Peer Groups
07:22 Big Levers: Feed Costs, Efficiency, and Milk Components
08:59 Homegrown Forages & Feed Management: What to Optimize
11:38 Core Benchmarks to Watch: Capital Cost, Labor, and Replacement Rates
16:18 Turning Data Into Action: Consistency, Clean Categories, and Advisory Teams
20:45 Key Takeaways for Dairy Strong: Embrace the Process & Keep Moving Forward
22:56 What Top-Performing Dairies Do Differently (Attention to Detail + Team Buy-In)
27:31 Wrap-Up & Resources
Enjoy. Order Shroud-PilledOrder God's Eye View: https://a.co/d/7CI89rvBuy the Audiobook: https://www.audible.com/pd/Gods-Eye-View-Audiobook/B0F55K2GT1?source_code=ASSGB149080119000H&share_location=pdpWant to publish a book? Check out my publisher https://hemisphericpress.com/Check out our ad free substack: https://hemisphericpress.substack.com/Email feedback to godseyeviewbook@gmail.com
Skip Richter answers your questions all morning long!
From Palantir and Two Sigma to building Goodfire into the poster-child for actionable mechanistic interpretability, Mark Bissell (Member of Technical Staff) and Myra Deng (Head of Product) are trying to turn “peeking inside the model” into a repeatable production workflow by shipping APIs, landing real enterprise deployments, and now scaling the bet with a recent $150M Series B funding round at a $1.25B valuation.In this episode, we go far beyond the usual “SAEs are cool” take. We talk about Goodfire's core bet: that the AI lifecycle is still fundamentally broken because the only reliable control we have is data and we post-train, RLHF, and fine-tune by “slurping supervision through a straw,” hoping the model picks up the right behaviors while quietly absorbing the wrong ones. Goodfire's answer is to build a bi-directional interface between humans and models: read what's happening inside, edit it surgically, and eventually use interpretability during training so customization isn't just brute-force guesswork.Mark and Myra walk through what that looks like when you stop treating interpretability like a lab demo and start treating it like infrastructure: lightweight probes that add near-zero latency, token-level safety filters that can run at inference time, and interpretability workflows that survive messy constraints (multilingual inputs, synthetic→real transfer, regulated domains, no access to sensitive data). We also get a live window into what “frontier-scale interp” means operationally (i.e. 
steering a trillion-parameter model in real time by targeting internal features) plus why the same tooling generalizes cleanly from language models to genomics, medical imaging, and “pixel-space” world models. We discuss:* Myra + Mark's path: Palantir (health systems, forward-deployed engineering) → Goodfire early team; Two Sigma → Head of Product, translating frontier interpretability research into a platform and real-world deployments* What “interpretability” actually means in practice: not just post-hoc poking, but a broader “science of deep learning” approach across the full AI lifecycle (data curation → post-training → internal representations → model design)* Why post-training is the first big wedge: “surgical edits” for unintended behaviors like reward hacking, sycophancy, noise learned during customization plus the dream of targeted unlearning and bias removal without wrecking capabilities* SAEs vs probes in the real world: why SAE feature spaces sometimes underperform classifiers trained on raw activations for downstream detection tasks (hallucination, harmful intent, PII), and what that implies about “clean concept spaces”* Rakuten in production: deploying interpretability-based token-level PII detection at inference time to prevent routing private data to downstream providers plus the gnarly constraints: no training on real customer PII, synthetic→real transfer, English + Japanese, and tokenization quirks* Why interp can be operationally cheaper than LLM-judge guardrails: probes are lightweight, low-latency, and don't require hosting a second large model in the loop* Real-time steering at frontier scale: a demo of steering Kimi K2 (~1T params) live and finding features via SAE pipelines, auto-labeling via LLMs, and toggling a “Gen-Z slang” feature across multiple layers without breaking tool use* Hallucinations as an internal signal: the case that models have latent uncertainty / “user-pleasing” circuitry you can detect and potentially mitigate more 
directly than black-box methods* Steering vs prompting: the emerging view that activation steering and in-context learning are more closely connected than people think, including work mapping between the two (even for jailbreak-style behaviors)* Interpretability for science: using the same tooling across domains (genomics, medical imaging, materials) to debug spurious correlations and extract new knowledge up to and including early biomarker discovery work with major partners* World models + “pixel-space” interpretability: why vision/video models make concepts easier to see, how that accelerates the feedback loop, and why robotics/world-model partners are especially interesting design partners* The north star: moving from “data in, weights out” to intentional model design where experts can impart goals and constraints directly, not just via reward signals and brute-force post-training—Goodfire AI* Website: https://goodfire.ai* LinkedIn: https://www.linkedin.com/company/goodfire-ai/* X: https://x.com/GoodfireAIMyra Deng* Website: https://myradeng.com/* LinkedIn: https://www.linkedin.com/in/myra-deng/* X: https://x.com/myra_dengMark Bissell* LinkedIn: https://www.linkedin.com/in/mark-bissell/* X: https://x.com/MarkMBissellFull Video EpisodeTimestamps00:00:00 Introduction00:00:05 Introduction to the Latent Space Podcast and Guests from Goodfire00:00:29 What is Goodfire? Mission and Focus on Interpretability00:01:01 Goodfire's Practical Approach to Interpretability00:01:37 Goodfire's Series B Fundraise Announcement00:02:04 Backgrounds of Mark and Myra from Goodfire00:02:51 Team Structure and Roles at Goodfire00:05:13 What is Interpretability? Definitions and Techniques00:05:30 Understanding Errors00:07:29 Post-training vs. 
Pre-training Interpretability Applications00:08:51 Using Interpretability to Remove Unwanted Behaviors00:10:09 Grokking, Double Descent, and Generalization in Models00:10:15 404 Not Found Explained00:12:06 Subliminal Learning and Hidden Biases in Models00:14:07 How Goodfire Chooses Research Directions and Projects00:15:00 Troubleshooting Errors00:16:04 Limitations of SAEs and Probes in Interpretability00:18:14 Rakuten Case Study: Production Deployment of Interpretability00:20:45 Conclusion00:21:12 Efficiency Benefits of Interpretability Techniques00:21:26 Live Demo: Real-Time Steering in a Trillion Parameter Model00:25:15 How Steering Features are Identified and Labeled00:26:51 Detecting and Mitigating Hallucinations Using Interpretability00:31:20 Equivalence of Activation Steering and Prompting00:34:06 Comparing Steering with Fine-Tuning and LoRA Techniques00:36:04 Model Design and the Future of Intentional AI Development00:38:09 Getting Started in Mechinterp: Resources, Programs, and Open Problems00:40:51 Industry Applications and the Rise of Mechinterp in Practice00:41:39 Interpretability for Code Models and Real-World Usage00:43:07 Making Steering Useful for More Than Stylistic Edits00:46:17 Applying Interpretability to Healthcare and Scientific Discovery00:49:15 Why Interpretability is Crucial in High-Stakes Domains like Healthcare00:52:03 Call for Design Partners Across Domains00:54:18 Interest in World Models and Visual Interpretability00:57:22 Sci-Fi Inspiration: Ted Chiang and Interpretability01:00:14 Interpretability, Safety, and Alignment Perspectives01:04:27 Weak-to-Strong Generalization and Future Alignment Challenges01:05:38 Final Thoughts and Hiring/Collaboration Opportunities at GoodfireTranscriptShawn Wang [00:00:05]: So welcome to the Latent Space pod. We're back in the studio with our special MechInterp co-host, Vibhu. Welcome. Mochi, Mochi's special co-host. And Mochi, the mechanistic interpretability doggo. 
We have with us Mark and Myra from Goodfire. Welcome. Thanks for having us on. Maybe we can sort of introduce Goodfire and then introduce you guys. How do you introduce Goodfire today?
Myra Deng [00:00:29]: Yeah, it's a great question. So Goodfire, we like to say, is an AI research lab that focuses on using interpretability to understand, learn from, and design AI models. And we really believe that interpretability will unlock the new generation, next frontier of safe and powerful AI models. That's our description right now, and I'm excited to dive more into the work we're doing to make that happen.
Shawn Wang [00:00:55]: Yeah. And there's always like the official description. Is there an understatement? Is there an unofficial one that sort of resonates more with a different audience?
Mark Bissell [00:01:01]: Well, being an AI research lab that's focused on interpretability, there's obviously a lot that people think about when they think of interpretability. And I think we have a pretty broad definition of what that means and the types of places that can be applied. And in particular, applying it in production scenarios, in high stakes industries, and really taking it sort of from the research world into the real world. Which, you know, it's a new field, so that hasn't been done all that much. And we're excited about actually seeing that sort of put into practice.
Shawn Wang [00:01:37]: Yeah, I would say it wasn't too long ago that Anthropic was like still putting out like Toy Models of Superposition and that kind of stuff. And I wouldn't have pegged it to be this far along. When you and I talked at NeurIPS, you were talking a little bit about your production use cases and your customers. And then not to bury the lede, today we're also announcing the fundraise, your Series B. $150 million. $150 million at a 1.25B valuation. Congrats, Unicorn.
Mark Bissell [00:02:02]: Thank you. 
Yeah, no, things move fast.
Shawn Wang [00:02:04]: We were talking to you in December and already some big updates since then. Let's dive, I guess, into a bit of your backgrounds as well. Mark, you were at Palantir working on health stuff, which is really interesting because Goodfire has some interesting health use cases. I don't know how related they are in practice.
Mark Bissell [00:02:22]: Yeah, not super related, but I don't know. It was helpful context to know what it's like just to work with health systems and generally in that domain. Yeah.
Shawn Wang [00:02:32]: And Myra, you were at Two Sigma, which actually I was also at Two Sigma back in the day. Wow, nice.
Myra Deng [00:02:37]: Did we overlap at all?
Shawn Wang [00:02:38]: No, this is when I was briefly a software engineer before I became a sort of developer relations person. And now you're head of product. What are your sort of respective roles, just to introduce people to like what all gets done in Goodfire?
Mark Bissell [00:02:51]: Yeah, prior to Goodfire, I was at Palantir for about three years as a forward deployed engineer, now a hot term. Wasn't always that way. And as a technical lead on the health care team. And at Goodfire, I'm a member of the technical staff. And honestly, that I think is about as specific as I could describe myself, because I've worked on a range of things. And, you know, it's a fun time to be at a team that's still reasonably small. I think when I joined I was one of the first ten employees, now we're above 40, but still, there's always a mix of research and engineering and product and all of the above that needs to get done. And I think everyone across the team is, you know, pretty much a switch hitter in the roles they do. So I think you've seen some of the stuff that I worked on related to image models, which was sort of like a research demo. 
More recently, I've been working on our scientific discovery team with some of our life sciences partners, but then also building out our core platform, more flexing some of the MLE and developer skills as well.
Shawn Wang [00:03:53]: Very generalist. And you also had like a very like a founding engineer type role.
Myra Deng [00:03:58]: Yeah, yeah.
Shawn Wang [00:03:59]: So I also started as, I still am, a member of technical staff, did a wide range of things from the very beginning, including like finding our office space and all of this, which we both visited when you had that open house thing. It was really nice.
Myra Deng [00:04:13]: Thank you. Thank you. Yeah. Plug to come visit our office.
Shawn Wang [00:04:15]: It looked like it was like 200 people. It has room for 200 people. But you guys are like 10.
Myra Deng [00:04:22]: For a while, it was very empty. But yeah, like Mark, I spend a lot of my time, as head of product, I think product is a bit of a weird role these days, but a lot of it is thinking about how do we take our frontier research and really apply it to the most important real world problems, and how does that then translate into a platform that's repeatable or a product, and working across, you know, the engineering and research teams to make that happen, and also communicating to the world: what is interpretability? What is it used for? What is it good for? Why is it so important? All of these things are part of my day-to-day as well.
Shawn Wang [00:05:01]: I love like what is things because that's a very crisp like starting point for people like coming to a field. They all do a fun thing. Vibhu, why don't you want to try tackling what is interpretability and then they can correct us.
Vibhu Sapra [00:05:13]: Okay, great. So I think like one, just to kick off, it's a very interesting role to be head of product, right? Because you guys, at least as a lab, you're more of an applied interp lab, right? 
Which is pretty different than just normal interp, like a lot of background research. But yeah, you guys actually ship an API to try these things. You have Ember, you have products around it, which not many do. Okay. What is interp? So basically you're trying to have an understanding of what's going on inside the model, in its internals. There are different approaches to do that. You can do probing, SAEs, transcoders, all this stuff. But basically you have a hypothesis. You have something that you want to learn about what's happening in a model's internals. And then you're trying to solve that. From there, you can do stuff like activation mapping. You can try to do steering. There's a lot of stuff that you can do, but the key question is, you know, from input to output, we want to have a better understanding of what's happening and how can we adjust what's happening in the model internals? How'd I do?

Mark Bissell [00:06:12]: That was really good. I think that was great. I think it's also kind of a minefield: if you ask 50 people who quote unquote work in interp, like what is interpretability, you'll probably get 50 different answers. And yeah, to some extent also like where Goodfire sits in the space. I think that we're an AI research company above all else. And interpretability is a set of methods that we think are really useful and worth kind of specializing in, in order to accomplish the goals we want to accomplish. But I think we also sort of see some of the goals as even broader, as almost like the science of deep learning and just taking a not-black-box approach to kind of any part of the AI development life cycle, whether that means using interp for like data curation while you're training your model or for understanding what happened during post-training or for, you know, understanding activations and sort of internal representations, what is in there semantically. And then a lot of sort of exciting updates that are sort of also part of the fundraise are around bringing interpretability to training, which I don't think has been done all that much before. A lot of this stuff is sort of post hoc poking at models as opposed to actually using this to intentionally design them.

Shawn Wang [00:07:29]: Is this post-training or pre-training or is that not a useful.

Myra Deng [00:07:33]: Currently focused on post-training, but there's no reason the techniques wouldn't also work in pre-training.

Shawn Wang [00:07:38]: Yeah. It seems like it would be more applicable in post-training because basically I'm thinking like rollouts or like, you know, having different variations of a model that you can tweak with your steering. Yeah.

Myra Deng [00:07:50]: And I think in a lot of the news that you've seen on like Twitter or whatever, you've seen a lot of unintended side effects come out of post-training processes, you know, overly sycophantic models or models that exhibit strange reward hacking behavior. I think these are like extreme examples. There's also, you know, more mundane enterprise use cases where they try to customize or post-train a model to do something and it learns some noise or it doesn't appropriately learn the target task. And a big question that we've always had is like, how do you use your understanding of what the model knows and what it's doing to actually guide the learning process?

Shawn Wang [00:08:26]: Yeah, I mean, you know, just to anchor this for people, one of the biggest controversies of last year was 4o GlazeGate. I've never heard of GlazeGate. 
I didn't know that was what it was called. They called it that on the blog post and I was like, well, how did OpenAI call it? Like, officially use that term? And I'm like, that's funny, but yeah, I guess the pitch is that if they had worked with Goodfire, they would have avoided it. Like, you know what I'm saying?

Myra Deng [00:08:51]: I think so. Yeah. Yeah.

Mark Bissell [00:08:53]: I think that's certainly one of the use cases. I think the reason why post-training is a place where this makes a lot of sense is a lot of what we're talking about is surgical edits. You know, you want to be able to have expert feedback very surgically change how your model is doing, whether that is, you know, removing a certain behavior that it has. So, you know, one of the things that we've been looking at, another common area where you would want to make a somewhat surgical edit, is some of the models that have, say, political bias. Like you look at Qwen or R1 and they have sort of like this CCP bias.

Shawn Wang [00:09:27]: Is there a CCP vector?

Mark Bissell [00:09:29]: Well, there are certainly internal, yeah, parts of the representation space where you can sort of see where that lives. Yeah. And you want to kind of, you know, extract that piece out.

Shawn Wang [00:09:40]: Well, I always say, you know, whenever you find a vector, a fun exercise is just like, make it very negative to see what the opposite of CCP is.

Mark Bissell [00:09:47]: The super America, bald eagles flying everywhere. But yeah. So in general, there are lots of post-training tasks where you'd want to be able to do that. Whether it's unlearning a certain behavior or, you know, some of the other kind of cases where this comes up is, are you familiar with like the grokking behavior? 
I mean, I know the machine learning term of grokking.

Shawn Wang [00:10:09]: Yeah.

Mark Bissell [00:10:09]: Sort of this like double descent idea of having a model that is able to learn a generalizing solution, as opposed to, even if memorization of some task would suffice, you want it to learn the more general way of doing a thing. And so, you know, another way that you can think about having surgical access to a model's internals would be: learn from this data, but learn in the right way, if there are many possible, you know, ways to do that. Can interp solve the double descent problem?

Shawn Wang [00:10:41]: Depends, I guess, on how you. Okay. So I viewed double descent as a problem because then you're like, well, if the loss curves level out, then you're done, but maybe you're not done. Right. Right. But like, if you actually can interpret what is generalizing or what is still changing, even though the loss is not changing, then maybe you can actually not view it as a double descent problem. And actually you're just sort of translating the space in which you view loss, and then you have a smooth curve. Yeah.

Mark Bissell [00:11:11]: I think that's certainly like the domain of problems that we're looking to get.

Shawn Wang [00:11:15]: Yeah. To me, like double descent is like the biggest thing to like ML research where like, if you believe in scaling, then you need to know where to scale. But if you believe in double descent, then you don't believe in anything where like anything levels off, like.

Vibhu Sapra [00:11:30]: I mean, also tangentially there's like, okay, when you talk about the China vector, right. There's the subliminal learning work. It was from the Anthropic Fellows program where basically you can have hidden biases in a model. 
And as you distill down or, you know, as you train on distilled data, those biases always show up, even if like you explicitly try to not train on them. So, you know, it's just like another use case of. Okay. If we can interpret what's happening in post-training, you know, can we clear some of this? Can we even determine what's there? Because yeah, it's just like some worrying research that's out there that shows, you know, we really don't know what's going on.Mark Bissell [00:12:06]: That is. Yeah. I think that's the biggest sentiment that we're sort of hoping to tackle. Nobody knows what's going on. Right. Like subliminal learning is just an insane concept when you think about it. Right. Train a model on not even the logits, literally the output text of a bunch of random numbers. And now your model loves owls. And you see behaviors like that, that are just, they defy, they defy intuition. And, and there are mathematical explanations that you can get into, but. I mean.Shawn Wang [00:12:34]: It feels so early days. Objectively, there are a sequence of numbers that are more owl-like than others. There, there should be.Mark Bissell [00:12:40]: According to, according to certain models. Right. It's interesting. I think it only applies to models that were initialized from the same starting Z. Usually, yes.Shawn Wang [00:12:49]: But I mean, I think that's a, that's a cheat code because there's not enough compute. But like if you believe in like platonic representation, like probably it will transfer across different models as well. Oh, you think so?Mark Bissell [00:13:00]: I think of it more as a statistical artifact of models initialized from the same seed sort of. There's something that is like path dependent from that seed that might cause certain overlaps in the latent space and then sort of doing this distillation. Yeah. Like it pushes it towards having certain other tendencies.Vibhu Sapra [00:13:24]: Got it. 
I think there's like a bunch of these open-ended questions, right? Like you can't train in new stuff during the RL phase, right? RL only reorganizes weights and you can only do stuff that's somewhat there in your base model. You're not learning new stuff. You're just reordering chains and stuff. But okay. My broader question is when you guys work at an interp lab, how do you decide what to work on and what's kind of the thought process? Right. Because we can ramble for hours. Okay. I want to know this. I want to know that. But like, how do you concretely like, you know, what's the workflow? Okay. There's like approaches towards solving a problem, right? I can try prompting. I can look at chain of thought. I can train probes, SAEs. But how do you determine, you know, like, okay, is this going anywhere? Like, do we have set stuff? Just, you know, if you can help me with all that. Yeah.Myra Deng [00:14:07]: It's a really good question. I feel like we've always at the very beginning of the company thought about like, let's go and try to learn what isn't working in machine learning today. Whether that's talking to customers or talking to researchers at other labs, trying to understand both where the frontier is going and where things are really not falling apart today. And then developing a perspective on how we can push the frontier using interpretability methods. And so, you know, even our chief scientist, Tom, spends a lot of time talking to customers and trying to understand what real world problems are and then taking that back and trying to apply the current state of the art to those problems and then seeing where they fall down basically. And then using those failures or those shortcomings to understand what hills to climb when it comes to interpretability research. So like on the fundamental side, for instance, when we have done some work applying SAEs and probes, we've encountered, you know, some shortcomings in SAEs that we found a little bit surprising. 
And so have gone back to the drawing board and done work on that. And then, you know, we've done some work on better foundational interpreter models. And a lot of our team's research is focused on what is the next evolution beyond SAEs, for instance. And then when it comes to like control and design of models, you know, we tried steering with our first API and realized that it still fell short of black box techniques like prompting or fine tuning. And so went back to the drawing board and we're like, how do we make that not the case and how do we improve it beyond that? And one of our researchers, Ekdeep, who just joined is actually Ekdeep and Atticus are like steering experts and have spent a lot of time trying to figure out like, what is the research that enables us to actually do this in a much more powerful, robust way? So yeah, the answer is like, look at real world problems, try to translate that into a research agenda and then like hill climb on both of those at the same time.Shawn Wang [00:16:04]: Yeah. Mark has the steering CLI demo queued up, which we're going to go into in a sec. But I always want to double click on when you drop hints, like we found some problems with SAEs. Okay. What are they? You know, and then we can go into the demo. Yeah.Myra Deng [00:16:19]: I mean, I'm curious if you have more thoughts here as well, because you've done it in the healthcare domain. But I think like, for instance, when we do things like trying to detect behaviors within models that are harmful or like behaviors that a user might not want to have in their model. So hallucinations, for instance, harmful intent, PII, all of these things. We first tried using SAE probes for a lot of these tasks. So taking the feature activation space from SAEs and then training classifiers on top of that, and then seeing how well we can detect the properties that we might want to detect in model behavior. 
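[Editor's sketch] The probing setup described here, training a classifier on top of activation vectors to detect a property, can be illustrated with a small logistic-regression probe. This is a minimal sketch on synthetic data, not Goodfire's implementation: the function names, the three-dimensional "activations", and the hyperparameters are all assumptions made for illustration. The same function could be fed either raw activations or SAE feature activations, which is the comparison discussed in this part of the conversation.

```python
import math
import random


def train_linear_probe(features, labels, lr=0.5, epochs=200):
    """Fit a logistic-regression probe with full-batch gradient descent."""
    dim = len(features[0])
    n = len(features)
    weights = [0.0] * dim
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, y in zip(features, labels):
            logit = sum(w * xi for w, xi in zip(weights, x)) + bias
            pred = 1.0 / (1.0 + math.exp(-logit))
            err = pred - y  # gradient of the log loss w.r.t. the logit
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def probe_accuracy(weights, bias, features, labels):
    """Fraction of examples where the sign of the logit matches the label."""
    correct = 0
    for x, y in zip(features, labels):
        logit = sum(w * xi for w, xi in zip(weights, x)) + bias
        correct += int((logit > 0) == bool(y))
    return correct / len(labels)


# Synthetic stand-in for activations (hypothetical): dimension 0 carries
# the property we want to detect, the other two dimensions are noise.
rng = random.Random(0)


def make_activation(label):
    signal = (2.0 if label else -2.0) + rng.gauss(0, 0.5)
    return [signal, rng.gauss(0, 1), rng.gauss(0, 1)]


labels = [i % 2 for i in range(200)]
raw_activations = [make_activation(y) for y in labels]

probe_weights, probe_bias = train_linear_probe(raw_activations, labels)
accuracy = probe_accuracy(probe_weights, probe_bias, raw_activations, labels)
print(f"probe accuracy on synthetic activations: {accuracy:.2f}")
```

In practice the probe would be evaluated on held-out data rather than on its own training set; the sketch skips that split to stay short.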
And we've seen in many cases that probes just trained on raw activations seem to perform better than SAE probes, which is a bit surprising if you think that SAEs are actually also capturing the concepts that you would want to capture cleanly and more surgically. And so that is an interesting observation. I don't think that is like, I'm not down on SAEs at all. I think there are many, many things they're useful for, but we have definitely run into cases where I think the concept space described by SAEs is not as clean and accurate as we would expect it to be for actual like real world downstream performance metrics.Mark Bissell [00:17:34]: Fair enough. Yeah. It's the blessing and the curse of unsupervised methods where you get to peek into the AI's mind. But sometimes you wish that you saw other things when you walked inside there. Although in the PII instance, I think weren't an SAE based approach actually did prove to be the most generalizable?Myra Deng [00:17:53]: It did work well in the case that we published with Rakuten. And I think a lot of the reasons it worked well was because we had a noisier data set. And so actually the blessing of unsupervised learning is that we actually got to get more meaningful, generalizable signal from SAEs when the data was noisy. But in other cases where we've had like good data sets, it hasn't been the case.Shawn Wang [00:18:14]: And just because you named Rakuten and I don't know if we'll get it another chance, like what is the overall, like what is Rakuten's usage or production usage? Yeah.Myra Deng [00:18:25]: So they are using us to essentially guardrail and inference time monitor their language model usage and their agent usage to detect things like PII so that they don't route private user information.Myra Deng [00:18:41]: And so that's, you know, going through all of their user queries every day. And that's something that we deployed with them a few months ago. 
And now we are actually exploring very early partnerships, not just with Rakuten, but with other people around how we can help with potentially training and customization use cases as well. Yeah.

Shawn Wang [00:19:03]: And for those who don't know, Rakuten is, I think, the number one or number two e-commerce store in Japan. Yes. Yeah.

Mark Bissell [00:19:10]: And I think that use case actually highlights a lot of what it looks like to deploy things in practice that you don't always think about when you're doing sort of research tasks. So when you think about some of the stuff that came up there that's more complex than your idealized version of a problem, they were encountering things like synthetic to real transfer of methods. So they couldn't train probes, classifiers, things like that on actual customer data of PII. So what they had to do is use synthetic data sets and then hope that that transfers out of domain to real data sets. And so we can evaluate performance on the real data sets, but not train on customer PII. So that right off the bat is like a big challenge. You have multilingual requirements. So this needed to work for both English and Japanese text. Japanese text has all sorts of quirks, including tokenization behaviors that caused lots of bugs that had us pulling our hair out. And then also, in a lot of tasks you'll see, you might make simplifying assumptions if you're sort of treating it as like the easiest version of the problem to just sort of get general results, where maybe you say you're classifying a sentence to say, does this contain PII? But the need that Rakuten had was token level classification so that you could precisely scrub out the PII. So as we learned more about the problem, you're sort of seeing what that looks like in practice. Yeah. A lot of assumptions end up breaking. And that was just one instance where a problem that seems simple right off the bat ends up being more complex as you keep diving into it.

Vibhu Sapra [00:20:41]: Excellent. One of the things that's also interesting with interp is a lot of these methods are very efficient, right? So you're just looking at a model's internals itself, compared to a separate like guardrail, LLM as a judge, a separate model. One, you have to host it. Two, there's like a whole latency. So if you use like a big model, you have a second call. Some of the work around like self detection of hallucination, it's also deployed for efficiency, right? So if you have someone like Rakuten doing it in production live, you know, that's just another thing people should consider.

Mark Bissell [00:21:12]: Yeah. And something like a probe is super lightweight. Yeah. It's no extra latency really. Excellent.

Shawn Wang [00:21:17]: You have the steering demos lined up. So let's just kind of see what you got. I don't actually know if this is like the latest, latest or like alpha thing.

Mark Bissell [00:21:26]: No, this is a pretty hacky demo from a presentation that someone else on the team recently gave. So this will give a sense for the technology. So you can see the steering in action. Honestly, I think the biggest thing that this highlights is that as we've been growing as a company and taking on kind of more and more ambitious versions of interpretability related problems, a lot of that comes down to scaling up in various different forms. And so here you're going to see steering on a 1 trillion parameter model. This is Kimi K2. And so it's sort of fun that in addition to the research challenges, there are engineering challenges that we're now tackling. Cause for any of this to be sort of useful in production, you need to be thinking about what it looks like when you're using these methods on frontier models as opposed to sort of like toy kind of model organisms. 
So yeah, this was thrown together hastily, pretty fragile behind the scenes, but I think it's quite a fun demo. So screen sharing is on. So I've got two terminal sessions pulled up here. On the left is a forked version that we have of the Kimi CLI that we've got running to point at our custom hosted Kimi model. And then on the right is a setup that will allow us to steer on certain concepts. So I should be able to chat with Kimi over here. Tell it hello. So the CLI is running locally, but the Kimi server is running back at the office. Well, hopefully it should be, that's too much to run on that Mac. Yeah. I think it takes a full H100 node. I think you can run it on eight GPUs, eight H100s. So yeah, Kimi's running. We can ask it a prompt. It's got a forked version of the SGLang code base that we've been working on. So I'm going to tell it, hey, this SGLang code base is slow. I think there's a bug. Can you try to figure it out? It's a big code base, so it'll spend some time doing this. And then on the right here, I'm going to initialize, in real time, some steering. Let's see here.

Mark Bissell [00:23:33]: Searching for any bugs. Feature ID 43205.

Shawn Wang [00:23:38]: Yeah.

Mark Bissell [00:23:38]: 20, 30, 40. So this is basically a feature that we found that inside Kimi seems to cause it to speak in Gen Z slang. And so on the left, it's still sort of thinking normally. It might take, I don't know, 15 seconds for this to kick in, but then we're going to start hopefully seeing it say things like "this code base is massive, for real." So we're going to start seeing Kimi transition, as the steering kicks in, from normal Kimi to Gen Z Kimi, both in its chain of thought and its actual outputs.

Mark Bissell [00:24:19]: And interestingly, you can see, you know, it's still able to call tools and stuff. It's purely sort of its demeanor. 
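[Editor's sketch] The demo steers generation by pushing the model's internal activations along a direction associated with a feature. A common way to express that idea is to add a scaled unit vector to a hidden state; the sketch below shows only that arithmetic. It is an assumption about the general technique, not a description of Goodfire's serving stack, and `steer_hidden_state`, `projection`, and the toy vectors are hypothetical.

```python
import math


def unit(direction):
    """Normalize a direction vector to unit length."""
    norm = math.sqrt(sum(d * d for d in direction))
    if norm == 0:
        raise ValueError("steering direction must be non-zero")
    return [d / norm for d in direction]


def projection(hidden, direction):
    """How strongly a hidden state points along the feature direction."""
    return sum(h * u for h, u in zip(hidden, unit(direction)))


def steer_hidden_state(hidden, direction, strength):
    """Shift a hidden state by `strength` along the feature direction."""
    return [h + strength * u for h, u in zip(hidden, unit(direction))]


# Toy 3-dimensional example; a real model's hidden size is in the thousands.
hidden = [1.0, 0.0, 2.0]
feature_direction = [0.0, 3.0, 4.0]  # hypothetical direction for one feature

before = projection(hidden, feature_direction)
steered = steer_hidden_state(hidden, feature_direction, strength=5.0)
after = projection(steered, feature_direction)

print(before, steered, after)
```

Because the direction is normalized, the projection onto it rises by exactly the chosen strength, which is what makes strength a usable knob. In a real deployment this shift would be applied inside the forward pass at one or more layers for every generated token.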
And there are other features that we found for interesting things like concision. So that's more of a practical one. You can make it more concise. The types of programming languages that it uses. But yeah, as we're seeing it come in. Pretty good outputs.

Shawn Wang [00:24:43]: Scheduler code is actually wild.

Vibhu Sapra [00:24:46]: Yo, this code is actually insane, bro.

Vibhu Sapra [00:24:53]: What's the process of training an SAE on this, or, you know, how do you label features? I know you guys put out a pretty cool blog post about finding this like autonomous interp. Something about how agents for interp are different than like coding agents. I don't know, while this is spewing out, how do we find feature 43205? Yeah.

Mark Bissell [00:25:15]: So in this case, our platform that we've been building out for a long time now supports all the sort of classic out of the box interp techniques that you might want to have, like SAE training, probing, things of that kind. I'd say the techniques for like vanilla SAEs are pretty well established now, where you take your model that you're interpreting, run a whole bunch of data through it, gather activations, and then yeah, it's a pretty straightforward pipeline to train an SAE. There are a lot of different varieties. There's top-k SAEs, batch top-k SAEs, normal ReLU SAEs. And then once you have your sparse features, to your point, assigning labels to them to actually understand that this is a Gen Z feature, that's actually where a lot of the kind of magic happens. Yeah. And the most basic standard technique is: look at all of your input data set examples that cause this feature to fire most highly. And then you can usually pick out a pattern. So for this feature, if I've run a diverse enough data set through my model, feature 43205 probably tends to fire on all the tokens that sound like Gen Z slang. 
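[Editor's sketch] Two of the steps mentioned above can be shown in a few lines: the top-k sparsity step that gives top-k SAEs their name, and the "look at the examples that make this feature fire most highly" labeling technique. This is a toy sketch under stated assumptions: the pre-activations are made up by hand rather than produced by a trained encoder, and the function names and the four-feature dictionary are hypothetical.

```python
def topk_sparse_code(pre_activations, k):
    """Keep the k largest pre-activations (clamped at zero), zero the rest."""
    ranked = sorted(
        range(len(pre_activations)),
        key=lambda i: pre_activations[i],
        reverse=True,
    )[:k]
    code = [0.0] * len(pre_activations)
    for i in ranked:
        code[i] = max(pre_activations[i], 0.0)
    return code


def top_activating_tokens(tokens, codes, feature_id, n=3):
    """Return the n tokens on which a feature fires most strongly."""
    scored = [(code[feature_id], tok) for tok, code in zip(tokens, codes)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [tok for act, tok in scored[:n] if act > 0]


# Hand-written encoder outputs for six tokens and four features.
# In a real pipeline these would come from the SAE encoder applied to
# activations gathered from the model being interpreted.
tokens = ["the", "scheduler", "bro", "fr", "returns", "lowkey"]
pre_acts = [
    [0.9, 0.1, -0.3, 0.2],
    [0.2, 1.1, 0.0, 0.4],
    [0.1, 0.0, 2.5, 0.3],
    [0.0, 0.2, 1.9, 0.1],
    [0.3, 1.4, -0.1, 0.5],
    [0.2, 0.1, 2.2, 0.0],
]

codes = [topk_sparse_code(row, k=2) for row in pre_acts]
slang_examples = top_activating_tokens(tokens, codes, feature_id=2)
print(slang_examples)  # ['bro', 'lowkey', 'fr']
```

A human or an automated labeler would then read the top examples and propose a name for the feature, here something like "Gen Z slang".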
You know, that's the, that's the time of year to be like, Oh, I'm in this, I'm in this Um, and, um, so, you know, you could have a human go through all 43,000 concepts andVibhu Sapra [00:26:34]: And I've got to ask the basic question, you know, can we get examples where it hallucinates, pass it through, see what feature activates for hallucinations? Can I just, you know, turn hallucination down?Myra Deng [00:26:51]: Oh, wow. You really predicted a project we're already working on right now, which is detecting hallucinations using interpretability techniques. And this is interesting because hallucinations is something that's very hard to detect. And it's like a kind of a hairy problem and something that black box methods really struggle with. Whereas like Gen Z, you could always train a simple classifier to detect that hallucinations is harder. But we've seen that models internally have some... Awareness of like uncertainty or some sort of like user pleasing behavior that leads to hallucinatory behavior. And so, yeah, we have a project that's trying to detect that accurately. And then also working on mitigating the hallucinatory behavior in the model itself as well.Shawn Wang [00:27:39]: Yeah, I would say most people are still at the level of like, oh, I would just turn temperature to zero and that turns off hallucination. And I'm like, well, that's a fundamental misunderstanding of how this works. Yeah.Mark Bissell [00:27:51]: Although, so part of what I like about that question is you, there are SAE based approaches that might like help you get at that. But oftentimes the beauty of SAEs and like we said, the curse is that they're unsupervised. 
So when you have a behavior that you deliberately would like to remove, and that's more of like a supervised task, often it is better to use something like probes and specifically target the thing that you're interested in reducing, as opposed to sort of like hoping that when you fragment the latent space, one of the vectors that pops out.

Vibhu Sapra [00:28:20]: And as much as we're training an autoencoder to be sparse, we're not for sure certain that, you know, we will get something that just correlates to hallucination. You'll probably split that up into 20 other things and who knows what they'll be.

Mark Bissell [00:28:36]: Of course. Right. Yeah. So there are known sorts of problems, like feature splitting and feature absorption. And then there's the off target effects, right? Ideally, you would want to be very precise, where if you reduce the hallucination feature, suddenly maybe your model can't write creatively anymore. And maybe you don't like that, but you want to still stop it from hallucinating facts and figures.

Shawn Wang [00:28:55]: Good. So Vibhu has a paper to recommend there that we'll put in the show notes. But yeah, I mean, I guess just because your demo is done, any other things that you want to highlight or any other interesting features you want to show?

Mark Bissell [00:29:07]: I don't think so. Yeah. Like I said, this is a pretty small snippet. I think the main sort of point here that I think is exciting is that there's not a whole lot of interp being applied to models quite at this scale. You know, Anthropic certainly has some research, and yeah, other teams as well. But it's nice to see these techniques, you know, being put into practice. I think not that long ago, the idea of real time steering of a trillion parameter model would have sounded.

Shawn Wang [00:29:33]: Yeah. 
The fact that it's real time, like you started the thing and then you edited the steering vector.

Vibhu Sapra [00:29:38]: I think it's an interesting one, TBD what the actual production use case would be on that, like the real time editing. It's like that's the fun part of the demo, right? You can kind of see how this could be served behind an API, right? Like, yes, you only have so many knobs and you can just tweak it a bit more. And I don't know how it plays in. Like people haven't done that much with like, how does this work with or without prompting? Right. How does this work with fine tuning? Like, there's a whole hype of continual learning, right? So there's just so much to see. Like, is this another parameter? Like, is it a parameter we just kind of leave as a default? We don't use it. So I don't know. Maybe someone here wants to put out a guide on like how to use this with prompting, when to do what?

Mark Bissell [00:30:18]: Oh, well, I have a paper recommendation I think you would love, from Ekdeep on our team, who is an amazing researcher, just can't say enough amazing things about Ekdeep. But he actually has a paper, as well as some others from the team and elsewhere, that goes into essentially the equivalence of activation steering and in-context learning. He thinks of everything in a cognitive neuroscience Bayesian framework, but basically it's about how you can precisely show that prompting, in-context learning, and steering exhibit similar behaviors, and even get quantitative about the magnitude of steering you would need to do to induce a certain amount of behavior similar to certain prompting, even for things like jailbreaks and stuff. It's a really cool paper. Are you saying steering is less powerful than prompting? More like you can almost write a formula that tells you how to convert between the two of them.

Myra Deng [00:31:20]: And so like formally equivalent actually in the limit. 
Right.

Mark Bissell [00:31:24]: So like one case study of this is for jailbreaks. I don't know, have you seen the stuff where you can do like many-shot jailbreaking? You like flood the context with examples of the behavior. Anthropic put out that paper.

Shawn Wang [00:31:38]: A lot of people were like, yeah, we've been doing this, guys.

Mark Bissell [00:31:40]: Like, yeah, what's in this in-context learning and activation steering equivalence paper is you can like predict the number of examples that you will need to put in there in order to jailbreak the model. That's cool. By doing steering experiments and using this sort of like equivalence mapping. That's cool. That's really cool. It's very neat. Yeah.

Shawn Wang [00:32:02]: I was going to say, like, you know, I can like back rationalize that this makes sense because, you know, what context is, is basically just, you know, it updates the KV cache kind of, and then every next token inference is still like, you know, the sum of everything all the way up to that point, plus all the context. And you could, I guess, theoretically replace that with your steering. The only problem is steering typically is on one layer, maybe three layers like you did. So it's like not exactly equivalent.

Mark Bissell [00:32:33]: Right, right. You need to get precise about, yeah, like how you sort of define steering and how you're modeling the setup. But yeah, I've got the paper pulled up here. The title is "Belief Dynamics Reveal the Dual Nature of In-Context Learning and Activation Steering." So Eric Bigelow and Dan Wurgaft, who are doing fellowships at Goodfire, and Ekdeep's the final author there.

Myra Deng [00:32:59]: I think actually to your question of like, what is the production use case of steering? 
I think maybe if you just think like one level beyond steering as it is today. Like imagine if you could adapt your model to be, you know, an expert legal reasoner, in almost real time, very quickly and efficiently, using human feedback or using your semantic understanding of what the model knows and where it knows that behavior. I think that while it's not clear what the product is at the end of the day, it's clearly very valuable. Thinking about what's the next interface for model customization and adaptation is a really interesting problem for us. Like we have heard a lot of people actually interested in fine-tuning and RL for open weight models in production. And so people are using things like Tinker or kind of like open source libraries to do that, but it's still very difficult to get models fine-tuned and RL'd for exactly what you want them to do unless you're an expert at model training. And so that's like something we're looking into.

Shawn Wang [00:34:06]: Yeah. I never thought so. Tinker from Thinking Machines famously uses rank-one LoRA. Is that basically the same as steering? Like, you know, what's the comparison there?

Mark Bissell [00:34:19]: Well, so in that case, you are still applying updates to the parameters, right?

Shawn Wang [00:34:25]: Yeah. You're not touching a base model. You're touching an adapter. It's kind of, yeah.

Mark Bissell [00:34:30]: Right. But I guess it still is more in parameter space then. I guess it's maybe like, are you modifying the pipes or are you modifying the water flowing through the pipes to get what you're after? Yeah. Just maybe one way.

Mark Bissell [00:34:44]: I like that analogy. That's my mental map of it at least, but it gets at this idea of model design and intentional design, which is something that we're very focused on. 
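[Editor's sketch] The "pipes versus water" analogy can be made concrete with a single linear layer. A rank-one LoRA-style update changes the weights, so its effect on the output scales with the input; a steering vector is added to the activations, so the shift is the same for every input. This is a simplified illustration under stated assumptions (one layer, no LoRA scaling factor, toy 2x2 weights, hypothetical function names), not a claim about how Tinker or Goodfire implement either technique.

```python
def matvec(weights, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def rank_one_lora_forward(weights, a, b, x):
    """y = (W + b a^T) x = W x + b * (a . x): the pipes are modified."""
    base = matvec(weights, x)
    scale = sum(ai * xi for ai, xi in zip(a, x))
    return [y + bi * scale for y, bi in zip(base, b)]


def steered_forward(weights, steering_vector, x):
    """y = W x + v: the water is modified, the weights are untouched."""
    return [y + v for y, v in zip(matvec(weights, x), steering_vector)]


identity = [[1.0, 0.0], [0.0, 1.0]]
lora_a = [1.0, 0.0]   # reads the first input dimension
lora_b = [0.0, 1.0]   # writes into the second output dimension
steer_v = [0.0, 1.0]  # constant push along the second output dimension

small_input = [1.0, 0.0]
large_input = [3.0, 0.0]

print(rank_one_lora_forward(identity, lora_a, lora_b, small_input))  # [1.0, 1.0]
print(rank_one_lora_forward(identity, lora_a, lora_b, large_input))  # [3.0, 3.0]
print(steered_forward(identity, steer_v, small_input))               # [1.0, 1.0]
print(steered_forward(identity, steer_v, large_input))               # [3.0, 1.0]
```

The two agree on the first input and diverge on the second, which is one way to see why the speakers treat them as related but not identical interventions.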
And just the fact that like, I hope that we look back at how we're currently training models and post-training models and just think what a primitive way of doing that right now. Like there's no intentionality really in...

Shawn Wang [00:35:06]: It's just data, right? The only thing in control is what data we feed in.

Mark Bissell [00:35:11]: So, so Dan from Goodfire likes to use this analogy of, you know, he has a couple of young kids and he talks about like, what if I could only teach my kids how to be good people by giving them cookies or like, you know, giving them a slap on the wrist if they do something wrong, like not telling them why it was wrong or like what they should have done differently or something like that. Just figure it out. Right. Exactly. So that's RL. Yeah. Right. And, and, you know, it's sample inefficient. There's, you know, what do they say? It's like slurping feedback. It's like, slurping supervision. Right. And so you'd like to get to the point where you can have experts giving feedback to their models that are, uh, internalized and, and, you know, steering is an inference-time way of sort of getting that idea. But ideally you're moving to a world where it is much more intentional design in perpetuity for these models.

Vibhu Sapra [00:36:04]: Okay. This is one of the questions we asked Emmanuel from Anthropic on the podcast a few months ago. Basically the question was, you're at a research lab that does model training, foundation models, and you're on an interp team. How does it tie back? Right? Like, do ideas come from the pre-training team? Do they go back? Um, you know, so for those interested, you can, you can watch that. There wasn't too much of a connect there, but it's still something, you know, it's something they want to push for down the line.

Mark Bissell [00:36:33]: It can be useful for all of the above. Like there are certainly post-hoc use cases where it doesn't need to touch that.

Vibhu Sapra [00:36:39]:
I think the other thing a lot of people forget is this stuff isn't too computationally expensive, right? Like I would say, if you're interested in getting into research, mech interp is one of the most approachable fields, right? A lot of this, train an SAE, train a probe, this stuff, like the budget for this one, there's already a lot done. There's a lot of open source work. You guys have done some too. Um, you know,

Shawn Wang [00:37:04]: There's like notebooks from the Gemini team, from Neel Nanda, that are like, this is how you do it. Just step through the notebook.

Vibhu Sapra [00:37:09]: Even if you're like, not even technical with any of this, you can still make like progress there. You can look at different activations, but, uh, if you do want to get into training, you know, training this stuff, correct me if I'm wrong, is like in the thousands of dollars, not even like, it's not that high scale. And then same with like, you know, applying it, doing it for post-training or all this stuff is fairly cheap in scale of, okay, I want to get into like model training, I don't have compute for like, you know, pre-training stuff. So it's, it's a very nice field to get into. And also there's a lot of like open questions, right? Um, some of them have to do with, okay, I want a product, I want to solve this. Like there's also just a lot of open-ended stuff that people could work on. That's interesting. Right. I don't know if you guys have any calls for like, what's open questions, what's open work that you either want open collaboration with, or like, you'd just like to see solved or just, you know, for people listening that want to get into mech interp, because people always talk about it. What are, what are the things they should check out? Start, of course, you know, join you guys as well. I'm sure you're hiring.

Myra Deng [00:38:09]: There's a paper, I think from, was it Lee, uh, Sharkey?
It's Open Problems in, uh, Mechanistic Interpretability, which I recommend everyone who's interested in the field read. It's just like a really comprehensive overview of what are the things that experts in the field think are the most important problems to be solved. I also think to your point, it's been really, really inspiring to see, I think a lot of young people getting interested in interpretability, actually not just young people, also like scientists who have been, you know, experts in physics for many years and in biology or things like this, um, transitioning into interp. So it's really cool to see. The barrier to entry is, you know, in some ways low and there's a lot of information out there and ways to get started. There's this anecdote of like professors at universities saying that all of a sudden every incoming PhD student wants to study interpretability, which was not the case a few years ago. So it just goes to show how, I guess, like exciting the field is, how fast it's moving, how quick it is to get started and things like that.

Mark Bissell [00:39:10]: And also just a very welcoming community. You know, there's an Open Source Mech Interp Slack channel. People are always posting questions and just folks in the space are always responsive if you ask things on various forums and stuff. But yeah, the open problems paper is a really good one.

Myra Deng [00:39:28]: For other people who want to get started, I think, you know, MATS is a great program. What's the acronym for? Machine Learning and Alignment Theory Scholars? It's like the...

Vibhu Sapra [00:39:40]: Normally summer internship style.

Myra Deng [00:39:42]: Yeah, but they've been doing it year round now. And actually a lot of our full-time staff have come through that program or gone through that program. And it's great for anyone who is transitioning into interpretability. There's a couple other fellows programs.
We do one, as well as Anthropic. And so those are great places to get started if anyone is interested.

Mark Bissell [00:40:03]: Also, I think it's been seen as a research field for a very long time. But I think engineers are sorely wanted for interpretability as well, especially at Goodfire, but elsewhere, as it does scale up.

Shawn Wang [00:40:18]: I should mention that Lee actually works with you guys, right? In the London office. And I'm adding our first ever mech interp track at AI Europe because I see these industry applications now emerging. And I'm pretty excited to, you know, help push that along. Yeah, I was looking forward to that. It'll effectively be the first industry mech interp conference. Yeah. I'm so glad you added that. You know, it's still a little bit of a bet. It's not that widespread, but I can definitely see this is the time to really get into it. We want to be early on things.

Mark Bissell [00:40:51]: For sure. And I think the field understands this, right? So at ICML, I think the title of the mech interp workshop this year was Actionable Interpretability. And there was a lot of discussion around bringing it to various domains. Everyone's adding pragmatic, actionable, whatever.

Shawn Wang [00:41:10]: It's like, okay, well, we weren't actionable before, I guess. I don't know.

Vibhu Sapra [00:41:13]: And I mean, like, just, you know, being in Europe, you see the interp room. One, like old school conferences, like, I think they had a very tiny room till they got lucky and they got it doubled. But there's definitely a lot of interest, a lot of niche research. So you see a lot of research coming out of universities, students. We covered the paper last week. It's like two unknown authors, not many citations. But, you know, you can make a lot of meaningful work there. Yeah. Yeah. Yeah.

Shawn Wang [00:41:39]: Yeah. I think people haven't really mentioned this yet. It's just interp for code. I think it's like an abnormally important field.
We haven't mentioned this yet. The conspiracy theory two years ago, when the first SAE work came out of Anthropic, was they would do like, oh, we just used SAEs to turn the bad code vector down and then turn up the good code. And I think like, isn't that the dream? Like, you know, like, but basically, I guess maybe, why is it funny? Like, it's... If it was realistic, it would not be funny. It would be like, no, actually, we should do this. But it's funny because we know there's like, we feel there's some limitations to what steering can do. And I think a lot of the public image of steering is like the Gen Z stuff. Like, oh, you can make it really love the Golden Gate Bridge, or you can make it speak like Gen Z. To like be a legal reasoner seems like a huge stretch. Yeah. And I don't know if that will get there this way. Yeah.

Myra Deng [00:42:36]: I think, um, I will say we are announcing something very soon that I will not speak too much about. Um, but I think, yeah, this is like what we've run into again and again is like, we, we don't want to be in the world where steering is only useful for like stylistic things. That's definitely not, not what we're aiming for. But I think the types of interventions that you need to do to get to things like legal reasoning, um, are much more sophisticated and require breakthroughs in, in learning algorithms. And that's, um...

Shawn Wang [00:43:07]: And is this an emergent property of scale as well?

Myra Deng [00:43:10]: I think so. Yeah. I mean, I think scale definitely helps. I think scale allows you to learn a lot of information and, and reduce noise across, you know, large amounts of data. But I also think we think that there's ways to do things much more effectively, um, even, even at scale. So like actually learning exactly what you want from the data and not learning things that you don't want exhibited in the data.
So we're not like anti-scale, but we are also realizing that scale is not going to get us anywhere. It's not going to get us to the type of AI development that we want to be at in, in the future as these models get more powerful and get deployed in all these sorts of like mission-critical contexts. Current life cycle of training and deploying and evaluations is, is to us like deeply broken and has opportunities to, to improve. So, um, more to come on that very, very soon.

Mark Bissell [00:44:02]: And I think that that's a use basically, or maybe just like a proof point that these concepts do exist. Like if you can manipulate them in the precise best way, you can get the ideal combination of them that you desire. And steering is maybe the most coarse-grained sort of peek at what that looks like. But I think it's evocative of what you could do if you had total surgical control over every concept, every parameter. Yeah, exactly.

Myra Deng [00:44:30]: There were like bad code features. I've got it pulled up.

Vibhu Sapra [00:44:33]: Yeah. Just coincidentally, as you guys are talking.

Shawn Wang [00:44:35]: This is like, this is exactly.

Vibhu Sapra [00:44:38]: There's like specifically a code error feature that activates and they show, you know, it's not, it's not typo detection. It's like, it's, it's typos in code. It's not typical typos. And, you know, you can, you can see it clearly activates where there's something wrong in code. And they have like malicious code, code error. They have a whole bunch of sub, you know, sub broken-down fine-grained features. Yeah.

Shawn Wang [00:45:02]: Yeah. So, so the, the rough intuition for me, the, why I talked about post-training was that, well, you just, you know, have a few different rollouts with all these things turned off and on and whatever. And then, you know, you can, that's, that's synthetic data you can kind of post-train on.
Yeah.

Vibhu Sapra [00:45:13]: And I think we make it sound easier than it is just saying, you know, they do the real hard work.

Myra Deng [00:45:19]: I mean, you guys, you guys have the right idea. Exactly. Yeah. We replicated a lot of these features in, in our Llama models as well. I remember there was like.

Vibhu Sapra [00:45:26]: And I think a lot of this stuff is open, right? Like, yeah, you guys opened yours. DeepMind has opened a lot of SAEs on Gemma. Even Anthropic has opened a lot of this. There's, there's a lot of resources that, you know, we can probably share for people that want to get involved.

Shawn Wang [00:45:41]: Yeah. And special shout out to like Neuronpedia as well. Yes. Like, yeah, amazing piece of work to visualize those things.

Myra Deng [00:45:49]: Yeah, exactly.

Shawn Wang [00:45:50]: I guess I wanted to pivot a little bit on, onto the healthcare side, because I think that's a big use case for you guys. We haven't really talked about it yet. This is a bit of a crossover for me because we are, we are, we do have a separate science pod that we're starting up for AI, for AI for science, just because like, it's such a huge investment category and also I'm like less qualified to do it, but we actually have bio PhDs to cover that, which is great, but I need to just kind of recover, recap your work, maybe on the Evo 2 stuff, but then, and then building forward.

Mark Bissell [00:46:17]: Yeah, for sure. And maybe to frame up the conversation, I think another kind of interesting just lens on interpretability in general is a lot of the techniques that were described are ways to solve the AI-human interface problem. And it's sort of like bidirectional communication is the goal there. So what we've been talking about with intentional design of models and, you know, steering, but also more advanced techniques is having humans impart our desires and control into models and over models.
And the reverse is also very interesting, especially as you get to superhuman models, whether that's narrow superintelligence, like these scientific models that work on genomics data, medical imaging, things like that. But down the line, you know, superintelligence of other forms as well. What knowledge can the AIs teach us, as sort of the other direction in that? And so some of our life science work to date has been getting at exactly that question, which is, well, some of it does look like debugging these various life sciences models, understanding if they're actually performing well on tasks, or if they're picking up on spurious correlations. For instance, with genomics models, you would like to know whether they are sort of focusing on the biologically relevant things that you care about, or if it's using some simpler correlate, like the ancestry of the person that it's looking at. But then also in the instances where they are superhuman, and maybe they are understanding elements of the human genome that we don't have names for or specific, you know, yeah, discoveries that they've made that we don't know about, that's, that's a big goal. And so we're already seeing that, right, we are partnered with organizations like Mayo Clinic, leading research health system in the United States, Arc Institute, as well as a startup called Prima Mente, which focuses on neurodegenerative disease. And in our partnership with them, we've used foundation models they've been training and applied our interpretability techniques to find novel biomarkers for Alzheimer's disease. So I think this is just the tip of the iceberg. But it's, that's like a flavor of some of the things that we're working on.

Shawn Wang [00:48:36]: Yeah, I think that's really fantastic. Obviously, we did the Chan Zuckerberg pod last year as well. And like, there's a plethora of these models coming out, because there's so much potential and research.
And it's like, very interesting how it's basically the same as language models, but just with a different underlying data set. But it's like, it's the same exact techniques. Like, there's no change, basically.

Mark Bissell [00:48:59]: Yeah. Well, and even in like other domains, right? Like, you know, robotics, I know, like a lot of the companies just use Gemma as like the backbone, and then they like make it into a VLA that like takes these actions. It's, it's, it's transformers all the way down. So yeah.

Vibhu Sapra [00:49:15]: Like we have MedGemma now, right? Like this week, even there was MedGemma 1.5. And they're training it on this stuff, like 3D scans, medical domain knowledge, and all that stuff, too. So there's a push from both sides. But I think the thing that, you know, one of the things about mech interp is like, you're a little bit more cautious in some domains, right? So healthcare, mainly being one, like guardrails, understanding, you know, we're more risk averse to something going wrong there. So even just from a basic understanding, like, if we're trusting these systems to make claims, we want to know why and what's going on.

Myra Deng [00:49:51]: Yeah, I think there's totally a kind of like deployment bottleneck to actually using foundation models for real patient usage or things like that. Like, say you're using a model for rare disease prediction, you probably want some explanation as to why your model predicted a certain outcome, and an interpretable explanation at that. So that's definitely a use case. But I also think like, being able to extract scientific information that no human knows to accelerate drug discovery and disease treatment and things like that actually is a really, really big unlock for science, like scientific discovery. And you've seen a lot of startups, like say that they're going to accelerate scientific discovery. And I feel like we actually are doing that through our interp techniques.
And kind of like, almost by accident, like, I think we got reached out to very, very early on from these healthcare institutions. And none of us had healthcare.

Shawn Wang [00:50:49]: How did they even hear of you? A podcast.

Myra Deng [00:50:51]: Oh, okay. Yeah, podcast.

Vibhu Sapra [00:50:53]: Okay, well, now's that time, you know.

Myra Deng [00:50:55]: Everyone can call us.

Shawn Wang [00:50:56]: Podcasts are the most important thing. Everyone should listen to podcasts.

Myra Deng [00:50:59]: Yeah, they reached out. They were like, you know, we have these really smart models that we've trained, and we want to know what they're doing. And we were like, really early at that time, like three months old, and it was a few of us. And we were like, oh, my God, we've never used these models. Let's figure it out. But it's also like, great proof that interp techniques scale pretty well across domains. We didn't really have to learn too much about.

Shawn Wang [00:51:21]: Interp is a machine learning technique, machine learning scales everywhere, right? Yeah. And it's obviously, it's just like a general insight. Yeah. Probably to finance too, I think, which would be fun for our history. I don't know if you have anything to say there.

Mark Bissell [00:51:34]: Yeah, well, just across the science. Like, we've also done work on material science. Yeah, it really runs the gamut.

Vibhu Sapra [00:51:40]: Yeah. Awesome. And, you know, for those that should reach out, like, you're obviously experts in this, but like, is there a call out for people that you're looking to partner with, design partners, people to use your stuff outside of just, you know, the general developer that wants to plug and play steering stuff, like on the research side more so, like, are there ideal design partners, customers, stuff like that?

Myra Deng [00:52:03]: Yeah, I can talk about maybe non-life sciences, and then I'm curious to hear from you on the life sciences side.
But we're looking for design partners across many domains, language, anyone who's customizing language models or trying to push the frontier of code or reasoning models is really interesting to us. And then also interested in the frontier of modeling. There's a lot of models that work in, like, pixel space, as we call it. So if you're doing world models, video models, even robotics, where there's not a very clean natural language interface to interact with, I think we think that interp can really help and are looking for a few partners in that space.

Shawn Wang [00:52:43]: Just because you mentioned the keyword
As soybean growers head into the 2026 season with tight margins and continued low crop prices, watching every dollar spent on inputs matters. Phosphorus and potassium remain key nutrients for soybeans, but soil fertility research shows there’s a clear economic threshold where spending returns real value. On this episode of the RealAgriculture Soybean School, University... Read More
Watch the full episode with Dr. Tamanna C. here: https://youtu.be/6UkTx6tgW4A Support this show http://supporter.acast.com/inspiredevolution. Hosted on Acast. See acast.com/privacy for more information.
Can science actually PROVE God exists? Dr. Antony Latham joins us to explore how the cosmos, consciousness, and the complexity of life all point to divine design, and why the Big Bang might be Christianity's best friend.
In this episode of Remnant Radio, Joshua Lewis sits down with retired physician and author Dr. Antony Latham to tackle one of the most critical questions facing believers today: Does science contradict Christian faith, or does it actually confirm it? From his own journey as a teenage skeptic who lost faith studying evolution to becoming a Christian in Kenya and diving deep into biology, consciousness, and cosmology, Antony brings a unique perspective that bridges the gap between the lab and the sanctuary.
What We Discuss:
- The Big Bang & Biblical Creation
- Fine-Tuning of the Universe
- The Cambrian Explosion
- Consciousness & the Soul
- Moral Law & Objective Beauty
- Miracles & an Open Universe
Whether you're wrestling with doubts about faith and science, or you're looking for solid apologetic tools to strengthen your biblical worldview, this conversation will equip you with evidence-based answers rooted in both Scripture and scientific discovery.
0:00 - Introduction
1:04 - From Skeptic to Scientist to Believer
5:13 - Big Bang: Friend or Foe to Christian Faith?
9:20 - Fine-Tuning Arguments & Cosmological Evidence
12:17 - String Theory & Multiple Universe Objections
14:03 - The Exquisite Precision of Universal Constants
15:37 - God of the Gaps Argument Addressed
17:04 - Origin of Life & Irreducible Complexity
19:30 - Old Earth Creation & Genesis Interpretation
22:56 - The Cambrian Explosion & Fossil Record
28:03 - Reading Genesis 1 Poetically & Theologically
31:17 - Consciousness & Evidence for the Immaterial Soul
35:17 - Moral Objectivity, Beauty, & Free Will
41:02 - Mind-Body Dualism & Christian Worldview
43:03 - Taking Back Science for the Kingdom
ABOUT THE GUEST:
In this AMA-style episode, Nathan takes on listener questions about whether fine-tuning is really on the way out, what emergent misalignment and weird generalization results tell us, and how to think about continual learning. He talks candidly about how he's personally preparing for AGI—from career choices and investing to what resilience steps he has and hasn't taken. The discussion also covers timelines for job disruption, whether UBI becomes inevitable, how to talk to kids and “normal people” about AI, and which safety approaches are most neglected. Sponsors: Blitzy: Blitzy is the autonomous code generation platform that ingests millions of lines of code to accelerate enterprise software development by up to 5x with premium, spec-driven output. Schedule a strategy session with their AI solutions consultants at https://blitzy.com MongoDB: Tired of database limitations and architectures that break when you scale? MongoDB is the database built for developers, by developers—ACID compliant, enterprise-ready, and fluent in AI—so you can start building faster at https://mongodb.com/build Serval: Serval uses AI-powered automations to cut IT help desk tickets by more than 50%, freeing your team from repetitive tasks like password resets and onboarding. Book your free pilot and guarantee 50% help desk automation by week four at https://serval.com/cognitive Tasklet: Tasklet is an AI agent that automates your work 24/7; just describe what you want in plain English and it gets the job done. 
Try it for free and use code COGREV for 50% off your first month at https://tasklet.ai CHAPTERS: (00:00) Ernie cancer update (04:57) Is fine-tuning dead (Part 1) (12:31) Sponsors: Blitzy | MongoDB (14:57) Is fine-tuning dead (Part 2) (Part 1) (26:56) Sponsors: Serval | Tasklet (29:15) Is fine-tuning dead (Part 2) (Part 2) (29:16) Continual learning cautions (34:59) Talking to normal people (39:30) Personal risk preparation (49:59) Investing around AI safety (01:00:39) Early childhood AI literacy (01:08:55) Work disruption timelines (01:27:58) Nonprofits, need, and UBI (01:34:53) Benchmarks, AGI, and embodiment (01:47:30) AI tooling and platforms (01:57:01) Discourse norms and shaming (02:05:50) Location and safety funding (02:15:17) Turpentine deal and independence (02:24:19) Outro PRODUCED BY: https://aipodcast.ing
You might already have heard that the laws that govern our universe are finely tuned to allow for our existence. But beneath the special numbers of the universe lies an even deeper mystery: the laws of nature themselves. On today's ID The Future, join host Brian Miller as he begins a two-part conversation with physicist Aaron Zimmer and mathematician Elie Feder, hosts of the Physics to God podcast, as they discuss their new work arguing for an intelligent cause based on the qualitative structure of reality's rules. The dream of finding a unique, logically necessary "theory of everything" has failed, which leaves an intriguing question: Why these specific laws? Zimmer and Feder explain why fundamental forces like gravity and complex systems like quantum mechanics are uniquely designed to produce a complex universe featuring atoms, molecules, stars, and life. The new argument focuses on the fundamental qualitative structure of the laws of nature, rather than the finely tuned quantities. Zimmer and Feder argue that these laws are not logically necessary, debunking the idea that a unique "theory of everything" could explain them. Instead, the laws are uniquely designed to produce a complex universe. This is Part 1 of a two-part conversation.
In this episode, a16z GP Martin Casado sits down with Sherwin Wu, Head of Engineering for the OpenAI Platform, to break down how OpenAI organizes its platform across models, pricing, and infrastructure, and how it is shifting from a single general-purpose model to a portfolio of specialized systems, custom fine-tuning options, and node-based agent workflows.
They get into why developers tend to stick with a trusted model family, what builds that trust, and why the industry moved past the idea of one model that can do everything. Sherwin also explains the evolution from prompt engineering to context design and how companies use OpenAI's fine-tuning and RFT APIs to shape model behavior with their own data.
Highlights from the conversation include:
• How OpenAI balances a horizontal API platform with vertical products like ChatGPT
• The evolution from Codex to the Composer model
• Why usage-based pricing works and where outcome-based pricing breaks
• What the Harmonic Labs and Rockset acquisitions added to OpenAI's agent work
• Why the new agent builder is deterministic, node based, and not free roaming
Resources:
Follow Sherwin on X: https://x.com/sherwinwu
Follow Martin on X: https://x.com/martin_casado
Stay Updated:
If you enjoyed this episode, be sure to like, subscribe, and share with your friends!
Find a16z on X: https://x.com/a16z
Find a16z on LinkedIn: https://www.linkedin.com/company/a16z
Listen to the a16z Podcast on Spotify: https://open.spotify.com/show/5bC65RDvs3oxnLyqqvkUYX
Listen to the a16z Podcast on Apple Podcasts: https://podcasts.apple.com/us/podcast/a16z-podcast/id842818711
Follow our host: https://x.com/eriktorenberg
Please note that the content here is for informational purposes only; should NOT be taken as legal, business, tax, or investment advice or be used to evaluate any investment or security; and is not directed at any investors or potential investors in any a16z fund. a16z and its affiliates may maintain investments in the companies discussed.
For more details please see http://a16z.com/disclosures. Hosted by Simplecast, an AdsWizz company. See pcm.adswizz.com for information about our collection and use of personal data for advertising.