The AI Breakdown: Daily Artificial Intelligence News and Discussions
Anthropic dropped Claude Opus 4.6 and OpenAI responded with GPT 5.3 Codex just 20 minutes later — the most intense head-to-head model release we've ever seen. Here's what each model brings, how they compare, and what the first reactions are telling us. In the headlines: Google and Amazon share their capex plans, and we're about to spend 2.5 moon landings on AI.
Brought to you by:
KPMG – Discover how AI is transforming possibility into reality. Tune into the new KPMG 'You Can with AI' podcast and unlock insights that will inform smarter decisions inside your enterprise. Listen now and start shaping your future with every episode. https://www.kpmg.us/AIpodcasts
Rackspace AI Launchpad - Build, test and scale intelligent workloads faster - http://rackspace.com/ailaunchpad
Zencoder - From vibe coding to AI-first engineering - http://zencoder.ai/zenflow
Optimizely Agents in Action - Join the virtual event (with me!) free March 4 - https://www.optimizely.com/insights/agents-in-action/
AssemblyAI - The best way to build Voice AI apps - https://www.assemblyai.com/brief
Section - Build an AI workforce at scale - https://www.sectionai.com/
LandfallIP - AI to Navigate the Patent Process - https://landfallip.com/
Robots & Pencils - Cloud-native AI solutions that power results - https://robotsandpencils.com/
The Agent Readiness Audit from Superintelligent - Go to https://besuper.ai/ to request your company's agent readiness score.
The AI Daily Brief helps you understand the most important news and discussions in AI. Subscribe to the podcast version of The AI Daily Brief wherever you listen: https://pod.link/1680633614
Interested in sponsoring the show? sponsors@aidailybrief.ai
I sit down with Morgan Linton, Cofounder/CTO of Bold Metrics, to break down the same-day release of Claude Opus 4.6 and GPT-5.3 Codex. We walk through exactly how to set up Opus 4.6 in Claude Code, explore the philosophical split between autonomous agent teams and interactive pair-programming, and then put both models to the test by having each one build a Polymarket competitor from scratch, live and unscripted. By the end, you'll know how to configure each model, when to reach for one over the other, and what happened when we let them race head-to-head.
Timestamps
00:00 – Intro
03:26 – Setting Up Opus 4.6 in Claude Code
05:16 – Enabling Agent Teams
08:32 – The Philosophical Divergence between Codex and Opus
11:11 – Core Feature Comparison (Context Window, Benchmarks, Agentic Behavior)
15:27 – Live Demo Setup: Polymarket Build Prompt Design
18:26 – Race Begins
21:02 – Best Model for Vibe Coders
22:12 – Codex Finishes in Under 4 Minutes
26:38 – Opus Agents Still Running, Token Usage Climbing
31:41 – Testing and Reviewing the Codex Build
40:25 – Opus Build Completes, First Look at Results
42:47 – Opus Final Build Reveal
44:22 – Side-by-Side Comparison: Opus Takes This Round
45:40 – Final Takeaways and Recommendations
Key Points
Opus 4.6 and GPT-5.3 Codex dropped within 18 minutes of each other and represent two fundamentally different engineering philosophies — autonomous agents vs. interactive collaboration.
To use Opus 4.6 properly, you must update Claude Code to version 2.1.32+, set the model in settings.json, and explicitly enable the experimental Agent Teams feature.
Opus 4.6's standout feature is multi-agent orchestration: you can spin up parallel agents for research, architecture, UX, and testing — all working simultaneously.
GPT-5.3 Codex's standout feature is mid-task steering: you can interrupt, redirect, and course-correct the model while it's actively building.
In the live head-to-head, Codex finished a Polymarket competitor in under 4 minutes; Opus took significantly longer but produced a more polished UI, richer feature set, and 96 tests vs. Codex's 10.
Agent teams multiply token usage substantially — a single Opus build can consume 150,000–250,000 tokens across all agents.
The #1 tool to find startup ideas/trends - https://www.ideabrowser.com
LCA helps Fortune 500s and fast-growing startups build their future - from Warner Music to Fortnite to Dropbox. We turn 'what if' into reality with AI, apps, and next-gen products https://latecheckout.agency/
The Vibe Marketer - Resources for people into vibe marketing/marketing with AI: https://www.thevibemarketer.com/
FIND ME ON SOCIAL
X/Twitter: https://twitter.com/gregisenberg
Instagram: https://instagram.com/gregisenberg/
LinkedIn: https://www.linkedin.com/in/gisenberg/
Morgan Linton
X/Twitter: https://x.com/morganlinton
Bold Metrics: https://boldmetrics.com
Personal Website: https://linton.ai
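For readers who want to try the setup steps from the Key Points before watching the walkthrough, here is a minimal sketch. The settings file location and the key names ("model", "agentTeams") are illustrative assumptions, not confirmed Claude Code configuration; check the official docs for the real schema.

```python
# Sketch only: writes a hypothetical Claude Code settings.json enabling
# Opus 4.6 and the experimental Agent Teams feature. The path and key
# names below are assumptions for illustration, not documented values.
import json
from pathlib import Path

SETTINGS = Path.home() / ".claude" / "settings.json"  # assumed location

def enable_opus_agent_teams() -> None:
    # Merge into any existing settings rather than clobbering them.
    settings = json.loads(SETTINGS.read_text()) if SETTINGS.exists() else {}
    settings["model"] = "claude-opus-4-6"        # hypothetical model identifier
    settings["agentTeams"] = {"enabled": True}   # hypothetical experimental flag
    SETTINGS.parent.mkdir(parents=True, exist_ok=True)
    SETTINGS.write_text(json.dumps(settings, indent=2))

if __name__ == "__main__":
    enable_opus_agent_teams()
    print(f"Updated {SETTINGS}; remember to update Claude Code to 2.1.32+ first.")
```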
Join Simtheory: https://simtheory.ai
Register for the STILL RELEVANT tour: https://simulationtheory.ai/16c0d1db-a8d0-4ac9-bae3-d25074589a80
It's the model same-day showdown of 2026. Opus 4.6 and Codex 5.3 dropped within minutes of each other, and we're breaking down what this means for the future of AI work. In this episode, we unpack Opus 4.6's million-token context window (if you've got billies in the bank), why Codex's pricing makes it nearly impossible to ignore for agentic loops, and the real cost of running agents for 24 hours ($10K, apparently). We dive deep into why coding-optimized models are secretly crushing it at non-coding tasks, the mental fatigue of managing AI workers, and whether the chatbot era is actually fading or just evolving. Plus: Chris accidentally books three real pig grooming appointments, we debate whether you need a "life coach agent" to manage your agent swarm, and yes – there's an Opus 4.6 diss track that goes unreasonably hard.
CHAPTERS:
0:00 Intro - Opus 4.6 Diss Track Preview
0:09 The Model Same-Day Showdown: Opus 4.6 vs Codex 5.3
0:50 Opus 4.6 Breakdown: Million Token Context & Premium Pricing
2:31 Token Bill Shock: $10K Research Bills & Extended Context Costs
5:04 Codex Pricing: Why It's Nearly Free for Agentic Loops
6:42 Why Coding Models Are Secretly Crushing Non-Coding Tasks
10:14 Tool Fatigue: Too Many Models, Too Many Workflows
12:47 Opus 4.6 First Impressions: "Solid" and "Faultless"
13:48 Chris Accidentally Books Three Real Pig Grooming Appointments
16:01 Unix Tools & Why Code-Optimized Models Win at Everything
19:59 The Agentic Retraining Imperative: Chat to Delegation
22:16 Agent Swarms & The Master Thread Architecture
24:51 OpenAI vs Anthropic: The Enterprise Battle
27:09 Corporate Espionage 2.0: Stealing Skills & The Open Source Threat
31:19 The UX Problem: Why Delegation Isn't Solved Yet
34:24 The Stress of Hyper-Productivity & Managing Agent Swarms
37:07 Coordination: The Next Layer of Abstraction
40:09 The Fantasy vs Reality of Autonomous AI Businesses
44:37 Is the Turn-by-Turn Chatbot Era Actually Fading?
49:23 Tokens as Spice: Turning Compute Into Money
52:08 Reduce Cognitive Overload: The Real Goal of AI
55:07 Still Relevant Tour Announcement
55:39 BONUS: Full Opus 4.6 Diss Track
Thanks for listening. Like & Sub. Links below for the Still Relevant Tour signup and Simtheory. The model wars are heating up, and your token bill is about to get interesting. xoxo
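As a rough illustration of the token math behind the bills discussed in these episodes, here is a back-of-the-envelope sketch. The per-million-token prices and the input/output split are placeholder assumptions, not published rates; the per-build token counts come from the previous episode's notes.

```python
# Back-of-the-envelope cost math for agent runs. The prices below are
# placeholders; substitute current rates for whichever model you run.

def run_cost(input_tokens: int, output_tokens: int,
             in_price_per_m: float, out_price_per_m: float) -> float:
    """Dollar cost of one run at the given per-million-token rates."""
    return (input_tokens / 1e6) * in_price_per_m + (output_tokens / 1e6) * out_price_per_m

# One agent-team build at the 150k-250k total tokens quoted above, assuming
# (hypothetically) an 80/20 input/output split at $15/$75 per million tokens:
for total in (150_000, 250_000):
    cost = run_cost(int(total * 0.8), int(total * 0.2), 15.0, 75.0)
    print(f"{total:,} tokens -> ~${cost:.2f} per build")

# Scale that to a swarm running builds back-to-back for 24 hours, with long
# contexts re-sent on every turn, and four-figure daily bills stop looking
# surprising.
```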
AI Chat: ChatGPT & AI News, Artificial Intelligence, OpenAI, Machine Learning
In this episode, we explore Anthropic's new Opus 4.6 and its 'agent teams' feature, alongside OpenAI's competing GPT 5.3 Codex, highlighting the intense rivalry in the AI development space. We also discuss OpenAI's new enterprise platform, Frontier, and how these advancements are changing the AI landscape for developers and other professionals.
Chapters
00:00 Anthropic Opus 4.6 Release
02:02 Agent Teams and Context Windows
06:39 Claude's SaaS Integration
09:02 OpenAI's GPT 5.3 Codex and Frontier
Links
Get the top 40+ AI Models for $20 at AI Box: https://aibox.ai
AI Chat YouTube Channel: https://www.youtube.com/@JaedenSchafer
Join my AI Hustle Community: https://www.skool.com/aihustle
As SpaceX acquires xAI, all of Elon's companies being under one umbrella is pretty much an inevitability, isn't it? What is with this weird game of chicken that Nvidia, OpenAI, and now Oracle are all suddenly engaging in? And self-driving cars are about to be ubiquitous, indication number 37.
Musk's SpaceX Combines With xAI at $1.25 Trillion Valuation (Bloomberg)
SpaceX acquires xAI, plans to launch a massive satellite constellation to power it (Ars Technica)
OpenAI's Codex just got its own Mac app - and anyone can try it for free now (ZDNet)
Exclusive: OpenAI is unsatisfied with some Nvidia chips and looking for alternatives, sources say (Reuters)
Waymo Raises $16 Billion From Alphabet and Others to Expand (Bloomberg)
Learn more about your ad choices. Visit megaphone.fm/adchoices
The AI Breakdown: Daily Artificial Intelligence News and Discussions
Two stories were too big to squeeze into the headlines, so this episode goes deep on both. First, the surprise merger of xAI and SpaceX and what Elon Musk's vision of orbital data centers says about the future of AI compute, capital intensity, and sci-fi-scale ambition. Then, a close look at OpenAI's new Codex desktop app and why it signals a real shift from models competing on raw capability to products competing on how humans actually orchestrate agents at scale.
Brought to you by:
KPMG – Discover how AI is transforming possibility into reality. Tune into the new KPMG 'You Can with AI' podcast and unlock insights that will inform smarter decisions inside your enterprise. Listen now and start shaping your future with every episode. https://www.kpmg.us/AIpodcasts
Rackspace AI Launchpad - Build, test and scale intelligent workloads faster - http://rackspace.com/ailaunchpad
Zencoder - From vibe coding to AI-first engineering - http://zencoder.ai/zenflow
Optimizely Opal - The agent orchestration platform built for marketers - https://www.optimizely.com/theaidailybrief
AssemblyAI - The best way to build Voice AI apps - https://www.assemblyai.com/brief
Section - Build an AI workforce at scale - https://www.sectionai.com/
LandfallIP - AI to Navigate the Patent Process - https://landfallip.com/
Robots & Pencils - Cloud-native AI solutions that power results - https://robotsandpencils.com/
The Agent Readiness Audit from Superintelligent - Go to https://besuper.ai/ to request your company's agent readiness score.
The AI Daily Brief helps you understand the most important news and discussions in AI. Subscribe to the podcast version of The AI Daily Brief wherever you listen: https://pod.link/1680633614
Interested in sponsoring the show? sponsors@aidailybrief.ai
In this Masterclass episode of Skin Anarchy, Dr. Ekta Yadav welcomes back Dr. Barbara Paldus of Codex Labs for a compelling deep dive into integrative dermatology—a next-generation approach that treats skin not as a surface problem, but as a window into whole-body health.
This conversation reframes chronic skin issues through a systems lens, exploring how the gut, brain, immune system, hormones, and microbiomes continuously communicate—and how disruptions in one system can ripple outward to the skin. Instead of asking “How do we suppress this flare?” Dr. Paldus challenges us to ask “Why is the skin signaling distress in the first place?”
You'll hear how stress, inflammation, microbiome imbalance, and intestinal permeability can quietly drive conditions like acne, eczema, rosacea, and psoriasis—often long before symptoms appear on the skin. The episode unpacks why conventional symptom-focused treatments sometimes fail to deliver lasting relief, and how root-cause care can create more durable, meaningful outcomes.
Dr. Paldus also introduces a personalized teledermatology model that leverages biological data, at-home testing, and longitudinal tracking to design interventions tailored to each patient's unique internal landscape. From hormone-driven breakouts to chronic eczema and immune dysregulation, the discussion highlights how precision diagnostics can guide smarter, more sustainable treatment strategies.
At its core, this episode invites listeners to rethink skincare entirely—not as a cycle of stronger products or short-term fixes, but as a long-term investment in systemic resilience.
If you're curious about the future of dermatology, microbiome science, and why healing skin often starts far beyond the mirror, this episode offers a powerful preview of what root-cause, science-led care can look like.
Listen to the full episode to explore how integrative dermatology is reshaping chronic skin care from the inside out.
SHOP CODEX
Don't forget to subscribe to Skin Anarchy on Apple Podcasts, Spotify, or your preferred platform.
Reach out to us through email with any questions.
Sign up for our newsletter!
Shop all our episodes and products mentioned through our ShopMy Shelf!
*This is a sponsored collaboration
Support the show
AI coding agents are rapidly reshaping how software is built, reviewed, and maintained. As large language model capabilities continue to increase, the bottleneck in software development is shifting away from code generation toward planning, review, deployment, and coordination. This shift is driving a new class of agentic systems that operate inside constrained environments, reason over…
From Software Engineering Daily: OpenAI and Codex with Thibault Sottiaux and Ed Bayes.
This episode features Jerry Tworek, a key architect behind OpenAI's breakthrough reasoning models (o1, o3) and Codex, discussing the current state and future of AI. Jerry explores the real limits and promise of scaling pre-training and reinforcement learning, arguing that while these paradigms deliver predictable improvements, they're fundamentally constrained by data availability and struggle with generalization beyond their training objectives. He reveals his updated belief that continual learning—the ability for models to update themselves based on failure and work through problems autonomously—is necessary for AGI, as current models hit walls and become "hopeless" when stuck. Jerry discusses the convergence of major labs toward similar approaches driven by economic forces, the tension between exploration and exploitation in research, and why he left OpenAI to pursue new research directions. He offers candid insights on the competitive dynamics between labs, the focus required to win in specific domains like coding, what makes great AI researchers, and his surprisingly near-term predictions for robotics (2-3 years) while warning about the societal implications of widespread work automation that we're not adequately preparing for.
(0:00) Intro
(1:26) Scaling Paradigms in AI
(3:36) Challenges in Reinforcement Learning
(11:48) AGI Timelines
(18:36) Converging Labs
(25:05) Jerry's Departure from OpenAI
(31:18) Pivotal Decisions in OpenAI's Journey
(35:06) Balancing Research and Product Development
(38:42) The Future of AI Coding
(41:33) Specialization vs. Generalization in AI
(48:47) Hiring and Building Research Teams
(55:21) Quickfire
With your co-hosts:
@jacobeffron - Partner at Redpoint, Former PM Flatiron Health
@patrickachase - Partner at Redpoint, Former ML Engineer LinkedIn
@ericabrescia - Former COO Github, Founder Bitnami (acq'd by VMWare)
@jordan_segall - Partner at Redpoint
I sit down with Alex Finn to break down how he sets up Moltbot (formerly Clawdbot) as a proactive AI employee he treats like a teammate named Henry. We walk through the core workflow: Henry sends a daily morning brief, researches while Alex sleeps, and ships work as pull requests for review. Alex explains the setup that makes this work: feeding the bot deep personal and business context, then setting clear expectations for proactive behavior. We cover model strategy (Opus as “brain,” Codex as “muscle”), a “Mission Control” task tracker Henry built, hardware options, and the security mindset around prompt injection and account access.
Timestamps
00:00 – Intro
02:08 – Clawdbot Overview
03:33 – The Morning Brief Workflow
05:01 – Proactive Builds: Trends → Features → Pull Requests
07:27 – The Setup: Context + Expectations For Proactivity
09:38 – The Onboarding Prompt Alex Uses
12:05 – Hunting “Unknown Unknowns” For Real Leverage
12:43 – Using the Right Models for Cost Control
14:18 – Mission Control: A Kanban Tracker Henry Built
17:16 – The Future of Human and AI Workflow
22:01 – Hardware and Hosting: Cloud vs Local (Mac Mini/Studio)
25:47 – The Productivity Framework
27:10 – The Possible Evolution of Clawdbot
28:53 – Security and Privacy Concerns
33:38 – Closing Thoughts: Tinkering, Opportunity, and Next Steps
Key Points
I get the most leverage when I treat the agent like a proactive teammate with clear expectations and rich context.
Henry delivers compounding value by shipping work for review (pull requests) based on trend monitoring and conversation memory.
I separate “brain” and “muscle” by delegating heavy coding to Codex while using Opus for reasoning and direction.
I track autonomous work with a dedicated “Mission Control” board so progress stays visible over time.
I keep risk contained by controlling environment and account access, especially around email and prompt injection.
The #1 tool to find startup ideas/trends - https://www.ideabrowser.com
LCA helps Fortune 500s and fast-growing startups build their future - from Warner Music to Fortnite to Dropbox. We turn 'what if' into reality with AI, apps, and next-gen products https://latecheckout.agency/
The Vibe Marketer - Resources for people into vibe marketing/marketing with AI: https://www.thevibemarketer.com/
FIND ME ON SOCIAL
X/Twitter: https://twitter.com/gregisenberg
Instagram: https://instagram.com/gregisenberg/
LinkedIn: https://www.linkedin.com/in/gisenberg/
FIND ALEX ON SOCIAL
Youtube: https://www.youtube.com/@AlexFinnOfficial/videos
X/Twitter: https://x.com/AlexFinnX
Creator Buddy: https://www.creatorbuddy.io/
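A minimal sketch of the "brain vs. muscle" split Alex describes: one reasoning model plans and prioritizes, a coding model executes. The model names, the Task shape, and the call_model helper are hypothetical stand-ins for illustration, not a real API.

```python
# Illustrative sketch of "Opus as brain, Codex as muscle": route planning
# and reasoning tasks to one model, heavy code generation to another.
# call_model() is a hypothetical stand-in for whatever client you use.
from dataclasses import dataclass

BRAIN = "opus"    # reasoning, prioritization, direction
MUSCLE = "codex"  # heavy code generation

@dataclass
class Task:
    description: str
    kind: str  # "plan" or "code"

def call_model(model: str, prompt: str) -> str:
    # Replace with an actual API/CLI invocation in a real setup.
    return f"[{model}] would handle: {prompt}"

def route(task: Task) -> str:
    # Cheap, deterministic routing: code-shaped work goes to the muscle.
    return call_model(MUSCLE if task.kind == "code" else BRAIN, task.description)

morning_brief = [
    Task("Summarize overnight trends relevant to the product", "plan"),
    Task("Implement the highest-leverage feature as a pull request", "code"),
]
for task in morning_brief:
    print(route(task))
```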
Join Christina Warren and Brett Terpstra as they navigate the freezing Minnesotan cold without running water, delve into the intersection of tech and political turmoil, and explore the latest in AI agents and multi-agent workflows. Dive into a whirlwind of emotions, tech tips, and political ranting, all while contemplating the ethics of open source funding and AI coding. From brutal weather updates to philosophical debates on modern fascism, this episode pulls no punches.
Sponsor
Copilot Money can help you take control of your finances. Get a fresh start with your money for 2026 with 2 months free when you visit try.copilot.money/overtired.
Show Links
Crimethinc: Being “Peaceful” and “Law-Abiding” Will Not Stop Authoritarianism
Gas Town
Apex
OpenCode
Backdrop
Cindori Sensei
Moltbot
Chapters
00:00 Introduction and Host Updates
00:21 Brett’s Water Crisis
02:27 Political Climate and Media Suppression
06:32 Police Violence and Public Response
18:31 Social Media and Surveillance
22:15 Sponsor Break: Copilot Money
26:20 Tech Talk: Gas Town and AI Agents
31:58 Crypto Controversies
37:09 Ethics in Journalism and Personal Dilemmas
39:45 The Future of Open Source and Cryptocurrency
45:03 Apex 1.0?
48:25 Challenges and Innovations in Markdown Processing
01:02:16 AI in Coding and Personal Assistants
01:06:36 GrAPPtitude
01:14:40 Conclusion and Upcoming Plans
Join the Conversation
Merch
Come chat on Discord!
Twitter/ovrtrd
Instagram/ovrtrd
Youtube
Get the Newsletter
Thanks!
You’re downloading today’s show from CacheFly’s network
BackBeat Media Podcast Network
Check out more episodes at overtiredpod.com and subscribe on Apple Podcasts, Spotify, or your favorite podcast app. Find Brett as @ttscoff, Christina as @film_girl, Jeff as @jsguntzel, and follow Overtired at @ovrtrd on Twitter.
Transcript
AI Agents and Political Chaos
Introduction and Host Updates
Christina: [00:00:00] Welcome back. You’re listening to Overtired. I’m Christina Warren. Joined as always by Brett Terpstra. Jeff Severns Guntzel could not be with us this week, um, but uh, but Brett and I are here. So Brett, how are you? How’s the cold? Brett: The cold. Brett’s Water Crisis Brett: So I’m going on day four without running water. Um, I drove to my parents last night to shower and we’re, we’re driving loads of dishes to friends’ house to wash them. We have big buckets of melted snow in our bathtub that we use to flush the toilet. Um, and we have like big jugs with a spout on them for drinking water. So we’re surviving, but it is highly inconvenient. Um, and we don’t know yet if it’s a frozen pipe. Or if we have [00:01:00] a bad pump on our well, uh, hopefully we’ll find that out today. But no guarantees because all the plumbers are very busy right now with negative 30 degree weather. They tend to get a lot of calls, lots of stuff happens. Um, so yeah, but I’m, I’m staying warm. I got a fireplace, I got my heat’s working Christina: I mean, that’s the important thing. Brett: and that went out, that went out twice, in, twice already this winter, our heat has gone out, um, which I’m thankful. We, we finally, we added glycol to our, so our heat pumps water through, like, it’s not radiators, it’s like baseboard heat, but it, it uses water and. Um, and though we were getting like frozen spots, not burst pipes, just enough that the water wouldn’t go through fast enough to heat anything. So we added glycol to that [00:02:00] system to bring the freeze point down to like zero degrees.
So it’s not perfect, but we also hardwired the pump so that it always circulates water, um, even when the heat’s not running. So hopefully it’ll never freeze again. That’s the goal. Um, and if we replace the well pump, that should be good for another 20 years. So hopefully after this things will be smoother. Political Climate and Media Suppression Brett: Um, yeah, but that, that’s all in addition to, you know, my state being occupied by federal agents and even in my small town, we’ve got people being like, abducted. Things are escalating quickly at this point, and a lot of it doesn’t get talked about on mainstream media. Um, but yeah, things, I don’t know, man. I think we’re making progress because, um, apparently Binos [00:03:00] getting retired Christina: I was going to say, I, I, I, I heard, I heard that, and I don’t know if that’s good or if that’s bad. Um, I can’t, I can’t tell. Brett: it’s, it’s like, it’s like if Trump died, we wouldn’t know if that was good or bad because JD Vance as president, like maybe things get way worse. Who knows? Uh, none of these, none of these actual figureheads are the solution. Removing them isn’t the solution to removing the kinda maga philosophy behind it. But yeah, and that’s also Jeff is, you know, highly involved and I, I won’t, I won’t talk about that for him. I hope we can get him on soon to talk about that. Christina: No, me, me, me too. Because I’ve, I’ve been thinking about, about him and about you and about your whole area, your communities, you know, from several thousand miles away. Like all, all we, all we see is either what people post online, which of course now is being suppressed. [00:04:00] Uh, thanks a lot. You know, like, like the, oh, TikTok was gonna be so terrible. Chi the, the Chinese are gonna take over our, uh, our algorithms. Right? No, Larry Ellison is, is actually going to completely, you know, fuck up the algorithms, um, and, and suppress anything. I, yeah. Yeah. They’re, they’re Brett: is TikTok? Well, ’cause Victor was telling me that, they were seeing videos. Uh, you would see one frame of the video and then it would black out. And it all seemed to be videos that were negative towards the administration and we weren’t sure. Is this a glitch? Is this coincidence? Christina: well, they claim it’s a glitch, but I don’t believe it. Brett: Yeah, it seems, it seems Christina: I, I, I mean, I mean, I mean, the thing is like, maybe it is, maybe it is a glitch and we’re overreacting. I don’t know. Um, all I know is that they’ve given us absolutely zero reason to trust them, and so I don’t, and so, um, uh, apparently the, the state of California, this is, [00:05:00] so we are recording this on Tuesday morning. Apparently the state of California has said that they are going to look into whether things are being, you know, suppressed or not, and if that’s violating California law, um, because now that, that, that TikTok is, is controlled by an American entity, um, even if it is, you know, owned by like a, you know, uh, evil, uh, billionaire, you know, uh, crony. So fuck you, Larry Ellison. Um, uh, I guess that means we won’t be getting an Oracle sponsorship. Sorry. Um, uh, Brett: I wouldn’t take it anyway. Christina: I, I know you wouldn’t, I know you wouldn’t. That’s why I felt safe saying that. Um, but, uh, but even if, if, if that were the case, like I, you know, but apparently like now that it is like a, you know, kind of, you know, state based like US thing, like California could step in and potentially make things difficult for them.
I mean, I think that’s probably a lot of bluster on Newsom’s part. I don’t think that he could really, honestly achieve any sort of change if they are doing things to the algorithm. Brett: Yeah. Uh, [00:06:00] if, if laws even matter anymore, it would be something that got tied up in court for a long time Christina: Right. Which effectively wouldn’t matter. Right. And, and then that opens up a lot of other interesting, um, things about like, okay, well, you know, should we, like what, what is the role? Like even for algorithmically determined things of the government to even step in or whatever, right now, obviously does, I think, become like more of a speech issue if it’s government speech that’s being suppressed, but regardless, it, it is just, it’s bad. So I’ve been, I’ve been thinking about you, I’ve been thinking about Jeff. Police Violence and Public Response Christina: Um, you know, we all saw what happened over the weekend and, and, you know, people be, people are being murdered in the streets and I mean that, that, that’s what’s happening. And, Brett: white people no less, Christina: Right. Well, I mean, that’s the thing, right? Like, is that like, but, but, but they keep moving the bar. They, they keep moving the goalpost, right? So first it’s a white woman and, oh, she, she was, she was running over the, the officer [00:07:00] or the ICE guy, and it’s like, no, she wasn’t, but, but, but that, that’s immediately where they go and, and she’s, you know, radical whatever and, and, and a terrorist and this and that. Okay. Then you have a literal veterans affairs nurse, right? Like somebody who literally, like, you know, has, has worked with, with, with combat veterans and has done those things. Who, um, is stepping in to help someone who’s being pepper sprayed, you know, is, is just observing. And because he happens to have, um, a, a, a, a gun on him legally, which he’s allowed to do, um, they immediately used that as cover to execute him. But if he hadn’t had the gun, they would’ve, they would’ve come up with something else. Oh, we thought he had a gun, and they, you know what I mean? So like, they, they got lucky with that one because they removed the method, the, the, the weapon and then shot him 10 times. You know, they literally executed him in the street. But if he hadn’t had a gun, they still would’ve executed. Brett: Yeah, no, for sure. Um, it’s really frustrating that [00:08:00] they took the gun away. So he was disarmed and, and immobilized and then they shot him. Um, like so that’s just a straight up execution. And then to bring, like, to say that it, he, because he had a gun, he was dangerous, is such a, an affront to America has spent so long fighting against gun control and saying that we had the right to carry fucking assault rifles in the Christina: Kyle Rittenhouse. Kyle Rittenhouse was literally acquitted. Right? Brett: Yeah. And he killed people. Christina: and, and he killed people. He was literally walking around little fucking stooge, you know, little blubbering little bitch, like, you know, crying, you know, he’s like carrying around, like Rambo, a gun and literally snipe shooting people. That’s okay. Brett: They defended Christina: if you have a. They defended him. Of course they did. Right? Of course they did. Oh, well he has the right to carry and this and that, and Oh, you should be able to be armed in [00:09:00] these places.
Oh, no, but, but if you’re, um, somebody that we don’t like Brett: Yeah, Christina: and you have a concealed carry permit, and I don’t even know if he was really concealed. Right. Because I think that if you have it on your holster, I don’t even think that counts as concealed to Brett: was supposedly in Christina: I, I, I don’t, I don’t, I don’t. Brett: like it Christina: Which I don’t think counts as concealed. I think. Brett: No. Christina: Right, right. So, so, so, so, so that, that, that wouldn’t be concealed. Be because you have someone in, in that situation, then all of a sudden, oh, no. Now, now the, the key, the goalpost, okay, well, it’s fine if it’s, you know, uh, police we don’t like, or, or other people. And, and, and if you’re going after protesters, then you can shoot and kill whoever you want, um, because you’ve perceived a threat and you can take actions into your, to your own hands. Um, but now if you are even a white person, um, even, you know, someone who’s, who’s worked in Veterans Affairs, whatever, if, if you have, uh, even if you’re like a, a, a, you know, a, a gun owner and, and have permits, um, now [00:10:00] if we don’t like you and you are anywhere in the vicinity of anybody associated with law enforcement, now they have the right to shoot you dead. Like that’s, that’s, that’s the argument, which is insanity. Brett: so I’m, I’m just gonna point out that as the Third Reich came to power, they disarmed the Jews and they disarmed the anarchists and the socialists and they armed the rest of the population and it became, um, gun control for people they didn’t like. Um, and this is, it’s just straight up the same playbook. There’s no, there’s no differentiation anymore. Christina: No, it, it, it actively makes me angry that, um, I, I could be, because, ’cause what can we do? And, and what they’re counting on is the fact that we’re all tired and we’re all kind of, you know, like just, [00:11:00] you know, from, from what happened, you know, six years ago and, and, and what happened, you know, five years ago. Um, and, and, and various things. I think a lot of people are, are just. It kind of like Brett: Sure. Christina: done with, with, with being able to, to, to, right. But now the actual fascism is here, right? Like, like we, we, we saw a, a, you know, a whiff of this on, on, on January 6th, but now it’s actual fascism and they control every branch of government. Brett: Yeah. Christina: And, um, and, and, and I, and I don’t know what we’re supposed to do, right? Like, I mean it, because I mean, you know, uh, Philadelphia is, is, is begging for, for, for them to come. And I think that would be an interesting kind of standoff. Seattle is this, this is what a friend of mine said was like, you know, you know Philadelphia, Philly, Philadelphia is begging them to come. Seattle is like scared. Um, that, that they’re going to come, um, because honestly, like we’re a bunch of little bitch babies and, um, [00:12:00] people think they’re like, oh, you know the WTO. I’m like, yeah, that was, that was 27 years ago. Um, uh, I, I don’t think that Seattle has the juice to hold that sort of line again. Um, but I also don’t wanna find out, right? Like, but, but, but this is, this is the attack thing. It’s like, okay, why are they in Minnesota? Right? They’re what, like 130,000, um, Brett: exactly Christina: um, immigrants in, in Minnesota. There are, there are however many million in Texas, however many million in Florida. We know exactly why, right? This isn’t about.
Anything more than Brett: in any way. Christina: and opt. Right, right. It has nothing, it has nothing to do with, with, with immigration anyway. I mean, even, even the Wall Street Journal. The Wall Street Journal who a, you know, ran an op-ed basically saying get out of Minnesota. They also, they also had like a, you know, a news story, which was not from the opinion board, which like broke down the, the, the footage showing, you know, that like the, the video footage doesn’t match the administration’s claims, but they also ran a story. Um, that [00:13:00] basically did the math, I guess, on like the number of, of criminals, um, or people with criminal records who have been deported. And at this point, like in, you know, and, and when things started out, like, I guess when the raid started out, the, the majority of the people that they were kind of going after were people who had criminal records. Now, whether they were really violent, the worst, the worst, I mean that’s, I’m, I’m not gonna get into that, but you could at least say like, they, they could at least say, oh, well these were people who had criminal records, whatever. Now some, some huge percentage, I think it’s close to 80% don’t have anything. And many of the people that do the, the criminal like thing that they would hold would be, you know, some sort of visa violation. Right. So it’s, it’s, it’s Brett: they deported a five-year-old kid after using him as bait to try to get the rest of his family. Christina: as bait. Brett: Yeah. And like it’s, it’s pretty deplorable. But I will say I am proud of Minnesota. Um, they have not backed [00:14:00] down. They have stood up in the face of increasing increasingly escalated attacks, and they have shown up in force thousands of people out in the streets. Like Conti, like last night they had a, um, well, yeah, I mean, it’s been ongoing, but, uh, what’s his name? Preddy Alex. Um, at the place where he was shot, they had a, like continuing kind of memorial protest, I guess, and there’s footage of like a thousand, a thousand Minnesotans surrounding about 50, um, ICE agents and, like, basically corralling them to the point where they were all backed into a corner and weren’t moving. And I don’t know what happened after that. Um, but thus far it hasn’t been violent on the part of protesters. It’s been very violent on the part of ICE. I [00:15:00] personally, I don’t know where I stand on, like, I feel like the Democrats are urging pacifism because it affects their hold on power. And I don’t necessarily think that peace when they’re murdering us in the street. I don’t know if peace is the right response, but I don’t know. I’m not openly declaring that I support violence at this point, but. At the same time, do I not? I’m not sure. Like I keep going back and forth on is it time for a war or do we try to vote our way out of this? Christina: I mean, well, and the scary thing about voting our way out of this is will we even be able to have free elections, right? Be because they’re using any sort of anything, even the most benign sort of legal [00:16:00] protest, even if violence isn’t involved in all of a sudden, talks of the Insurrection Act come Brett: yeah. And Trump, Trump offered to pull out of Minnesota if Minnesota will turn over its voter database to the federal government. Like that’s just blatant, like that’s obviously the end goal is suppression. Christina: Right, right. And, and so to your point, I don’t know. Right.
And I’m, I’m never somebody who would wanna advocate outwardly for violence, but I, I, I, I, I don’t know. I mean, they’re killing citizens in the streets. They’re assassinating people in cold blood. They’re executing people, right. That’s what they’re doing. They’re literally executing people in the streets and then covering it up in real time. Brett: if the argument is, if we are violent, it will cause them to kill us. They’re already killing Christina: already doing it. Right. So at, at this point, I mean, like, you know, I mean, like, w to your point, wars have been started for, for, for less, or for the exact same things. Brett: [00:17:00] Yeah. Christina: So, I don’t know. I don’t know. Um, I know that that’s a depressing way to probably do mental health corner and whatnot, but this is what’s happening in our world right now and in and in your community, and it’s, it’s terrifying. Brett: I’m going to link in the show notes an article from CrimethInc that was written by, uh, people in Germany who have studied, um, both historical fascism and the current rise of the AfD, which will soon be the most powerful party in Germany, um, which is straight up a Nazi party. Um, and it, they offered, like their hope right now lies in America stopping fascism. Christina: Yeah. Brett: Like if we can, if we can stop fascism, then they believe the rest of Europe can stop fascism. Um, but like they, it, it’s a good article. It kind of, it kind of broaches the same questions I do about like, is it [00:18:00] time for violence? And they offer, like, we don’t, we’re not advocating for a civil war, but like Civil wars might. If you, if you, if you broach them as revolutions, it’s kind of, they’re kind of the same thing in cases like this. So anyway, I’ll, I’ll link that for anyone who wants to read kinda what’s going on in my head. I’m making a note to dig that up. I, uh, I love CrimethInc. Oh, and Blue Sky. Social Media and Surveillance Brett: Um, so I have not, up until very recently been an avid Blue Sky user. Um, I think I have like, I think I have maybe like 200 followers there and I follow like 50 people. But I’ve been expanding that and I am getting a ton of my news from Blue Sky and like to get stories from people on the ground, like news as it happens, unfiltered and Blue Sky has been [00:19:00] really good for that. Um, I, it’s. There’s not like an algorithm. I just get my stuff and like Mastodon, I have a much larger following and I follow a lot more people, but it’s very tech, Christina: It’s very tech and, Brett: there for. Christina: well, and, and Mastodon, um, understandably too is also European, um, in a lot of regards. And so it’s just, it’s not gonna have the same amount of, of people who are gonna be able to, at least for instances like this, like be on the ground and doing real-time stuff. It’s not, it doesn’t have like the more normy stuff. So, no, that makes sense. Um, no, that’s great. I think, yeah, blue Sky’s been been really good for, for these sorts of real-time events because again, they don’t have an algorithm. Like you can have one, like for a personalized kind of like for you feed or whatever, but in terms of what you see, you know, you see it naturally. You’re not seeing it being adjusted by anything, which can be good and bad. I, I think is good because nothing’s suppressing things and you see things in real time. It can be bad because sometimes you miss things, but I think on the whole, it’s better.
[00:20:00] The only thing I will say, just to anyone listening and, and just to spread onto, you know, people in your communities too, from what I’ve observed from others, like, it does seem like the, the government and other sorts of, you know, uh, uh, the, you know, bodies like that are finally starting to pay more attention to blue sky in terms of monitoring things. And so that’s not to say don’t. You know, use it at all. But the same way, you don’t make threats on Twitter if you don’t want the Feds to show up at your house. Don’t make threats on Blue Sky, because it’s not just a little microcosm where, you know, no one will see it. People are, it, it’s still small, but it’s, it’s getting bigger to the point that like when people look at like where some of the, the, the fire hose, you know, things observable things are there, there seem to be more and more of them located in the Washington DC area, which could just be because data centers are there, who knows? But I’ve also just seen anecdotally, like people who have had, like other instances, it’s like, don’t, don’t think [00:21:00] that like, oh, okay, well, you know, no one’s monitoring this. Um, of course people are so just don’t be dumb, don’t, don’t say things that could potentially get you in trouble. Um. Brett: a political candidate in Florida. Um, had the cops show up at her house and read her one of her Facebook posts. I mean, this was local. This was local cops, but still, yeah, you Christina: right. Well, yeah, that’s the thing, right? No, totally. And, and my, my only point with that is we’ve known that they do that for Facebook and for, for, you know, Twitter and, and, uh, you know, Instagram and things like that, but they, but Blue Sky, like, I don’t know if it’s on background checks yet, but it, uh, like for, uh, for jobs and things like that, I, I, I don’t know if that’s happening, but it definitely is at that point where, um, I know that people are starting to monitor those things. So just, you know, uh, not even saying for you per se, but just for anybody out there, like, it’s awesome and I’m so glad that like, that’s where people can get information out, but don’t be like [00:22:00] lulled into this false sense of security. Like, oh, well they’re not gonna monitor this. They’re not Brett: Nobody’s watching me here. Christina: It is like, no, they are, they are. Um, so especially as it becomes, you know, more prominent. So I’m, I’m glad that that’s. That’s an option there too. Um, okay. Sponsor Break: Copilot Money Christina: This is like the worst possible segue ever, but should we go ahead and segue to our, our, our sponsor break? Brett: Let’s do it. Let’s, let’s talk about capitalism. Christina: All right. This episode is brought to you by copilot money. Copilot money is not just another finance app. It’s your personal finance partner designed to help you feel clear, calm, and in control of your money. Whether it’s tracking your spending, saving for specific goals, or simply getting the handle on your investments. Copilot money has you covered as we enter the new year. Clarity and control over our finances has never been more important with the recent shutdown of Mint and rising financial stress for many, consumers are looking for a modern, trustworthy tool to help navigate their financial journeys. That’s where copilot money comes in. [00:23:00] With this beautifully designed app, you can see all your bank accounts, spending, savings and goals and investments all in one place.
Imagine easily tracking everything without the clutter of chaotic spreadsheets or outdated tools. It’s a practical way to start 2026 with a fresh financial outlook. And here’s the exciting part. As of December 15th, copilot money is now available on the web so you can manage your finances on any device that you choose. Plus, it offers a seamless experience that keeps your data secure with a privacy first approach, when you sign up using our link, you’ll get two months for free. So visit try.copilot.money/overtired to get started with features like automatic subscription tracking so you never miss a renewal date and customizable savings goals to help you stay on track. Copilot money empowers you to take charge of your financial life with confidence. So why wait? Start 2026 with clarity and purpose. Download copilot money on your devices or visit try.copilot.money/ [00:24:00] overtired today to claim your two months free and embrace a more organized, stress-free approach to your finances. Visit try.copilot.money/overtired. Brett: Awesome that I appreciate this segue. ’cause we, we, we could, we could be talking about other things. Um, like it’s, it feels so weird, like when I go on social media and I just want to post that like my water’s out. It feels out of place right now because there’s everything that’s going on feels so much more important than, Christina: Right. Brett: than anything else. Um, but there’s still a place for living our lives, um, Christina: there are a absolutely. I mean, and, and, and in a certain extent, like not to, I mean, maybe this is a little bit of a cope, but it’s like, if all we do is focus on the things that we can’t control at the expense of everything else, it’s like then they win. You know? Like, which, which isn’t, which, which isn’t even to [00:25:00] say, like, don’t talk about what’s happening. Don’t try to help, don’t try to speak out and, and, um, and do what we can do, but also. Like as individuals, there’s very little we can control about things. And being completely, you know, subsumed by that is, is not necessarily good either. Um, so yeah, there’s, there, there are other things going on and it’s important for us to get out of our heads. It’s important, especially for you, you know, being in the region, I think to be able to, to focus on other things and, and hopefully your water will be back soon. ’cause that sucks like that. I’ve been, I’ve been worried about you. I’m glad that you have heat. I’m glad you have internet. I’m glad you have power, but you know, the pipes being frozen and all that stuff is like, not Brett: it, the, the internet has also been down for up to six hours at a time. I don’t know why. There’s like an amplifier down on our street. Um, and that has sucked because I, out here, I live in a, I’m not gonna call it rural. Uh, we’re like five minutes from town, [00:26:00] but, um, we, we don’t. We have shitty internet. Like I pay for a gigabit and I get 500 megabits and it’s, and it’s up and down all the time and I hate it. But anyway. Tech Talk: Gas Town and AI Agents Brett: Let’s talk about, uh, let’s talk about Gas Town. What can you tell me about Gastown? Christina: Okay. So we’ve talked a lot about like AI agents and, um, kind of like, uh, coding, um, loops and, and things like that. And so Gastown, uh, which is available, um, at, I, it is not Gas Town. Let me find the URL, um, one second. It’s, it’s at a gas town. No, it’s not. Lemme find it. Um.
So this is a thing that, that Steve Yegge, uh, has created, and [00:27:00] it is a multi-agent workspace manager. And so the idea is basically that you can be running like a lot of instances of, um, of, of Claude Code or, um, I guess you could use Codex. You could use, uh, uh, uh, co-pilot, um, SDK or CLI agent and whatnot. Um, and basically what it’s designed to do is to basically let you coordinate like multiple coding agents at one time so they can all be working on different tasks, but then instead of having, um, like the context get lost when agents restart, it creates like a, a persistent, um, like. Work state, which it uses with, with git on the backend, which is supposed to basically enable more multi-agent workflows. So, um, basically the idea would be like, you get, have multiple agents working at once, kind of talking to one another, handing things off, you know, each doing their own task and then coordinating the work with what the other ones are doing. But then you have like a persistent, um, uh, I guess kind of like, you know, layer in the backend so that if an agent has to restart or whatever, it’s not gonna lose the, [00:28:00] the context, um, that that’s happening. And you don’t have to manually, um, worry about things like, okay, you know, I’ve lost certain things in memory and, and I’ve, you know, don’t know how I’m, I’m managing all these things together. Um, there, there’s another project, uh, called Ralph, which is kind of based on this, this concept of like, what if Ralph Wiggum was, you know, coding or, or was doing kind of a loop. And, and it’s, it’s, it’s a, it’s kind of a similar idea. Um, there’s also. Brett: my nose wouldn’t bleed so much if I just kept my finger out of there. Christina: Exactly, exactly. My cat’s breath smells like cat food. Um, and um, and so. Like there are ideas of like Ralph Loops and Gastown. And so these are a couple of like projects, um, that have really started to, uh, take over. So like, uh, Ralph is more of an autonomous AI agent loop that basically like it runs like over and over and over again until, uh, a task is done. Um, and, and a lot of people use, use Gastown and, [00:29:00] and, and Ralph together. Um, but yeah, no Ga Gastown is is pretty cool. Um, we’ll we’re gonna talk about it more ’cause it’s my pick of the week. We’ll talk about Moltbot, previously known as Clawdbot, which is, uses some, some similar ideas. But it’s really been interesting to see like how, like the, the multi-agent workflow, and by multi-agent, I mean like, people are running like 20 or 30 of them, you know, at a time. So it’s more than that, um, is really starting to become a thing that people can, uh, can do. Um, Brett: gets expensive though. Christina: I was, I was just about to say that’s the one thing, right? Most people who are using things like Gastown are using them with the Claude, um, Code Max plans, which is $200 a month. And those plans do give you more value than like, what the, what it would be if you spent $200 in API credits, uh, but $200 a month. Like that’s not an expensive, that’s, you know, that, that’s, that, that, like, you know what I mean? Like, like that, that, that, that, that, that’s a lot of money to spend on these sorts of things. Um, but people [00:30:00] are getting good results out of it. It’s pretty cool. Um. There have been some open models, which of course, most people don’t have equipment that would be fast enough for them to, to run, uh, to be able to kind of do what they would want, um, reliably.
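A minimal sketch of the Ralph-loop idea described here: hand the same goal back to a coding agent until it reports the task done. The run_agent helper is a hypothetical stand-in for invoking an agent CLI or API, and real setups like Gas Town layer persistent, git-backed work state on top of this.

```python
# Sketch of a Ralph-style loop: re-run a coding agent against the same
# goal until it reports the task complete. run_agent() is a stand-in;
# wire it to your actual agent CLI/API.
import time

def run_agent(goal: str) -> str:
    # Placeholder: invoke Claude Code / Codex / etc. and return a status.
    return "done"

def ralph_loop(goal: str, max_iterations: int = 100) -> None:
    for i in range(max_iterations):
        if run_agent(goal) == "done":
            print(f"Completed after {i + 1} iteration(s)")
            return
        # Back off between runs; persistent state (e.g. committed to git)
        # is what carries context across restarts in tools like Gas Town.
        time.sleep(1)
    print("Budget exhausted; task still incomplete")

ralph_loop("Make the test suite pass")
```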
But the, the agentic stuff coming to some of the open models is better. And so if these things can continue, of course now we’re in a ram crisis and storage crisis and everything else, so who knows when the hardware will get good enough again, and we can, when we as consumers can even reasonably get things ourselves. But, but in, in theory, you know, if, if these sorts of things continue, I could see like a, a world where like, you know, some of the Wan models and some of the other things, uh, potentially, um, or Qwen models rather, um, could, uh. Be things that you could conceivably, like be running on your own equipment to run these sorts of nonstop ag agentic loops. But yeah, right now, like it’s really freaking cool and I’ve played around with it because I’m fortunate enough to have access to a lot of tokens. [00:31:00] Um, but yeah, I can get expensive real, real fast. Uh, but, but it’s still, it’s still pretty awesome. Brett: I do appreciate that. So, Gas Town, the name is a reference to Mad Max and in the kind of, uh, vernacular that they built for things like background agents and I, uh, there’s a whole bunch, there are different levels of, of the interface that they kind of extrapolated on the gas town kind of metaphor for. Uh, I, it was, it, it, there were some interesting naming conventions and then they totally went in other directions with some of the names. It, they didn’t keep the theme very well, but, but still, uh, I appreciate Ralph Wiggum and Mad Max. That’s. It’s at the very least, it’s interesting. Christina: No, it definitely is. It definitely is. Crypto Controversies Christina: I will say that there’s been like a little bit [00:32:00] of a kerfuffle, uh, involved in both of those, uh, developers because, um, they’re both now promoting shit coins and, uh, and so that’s sort of an interesting thing. Um, basically there’s like this, this, this crypto company called bags that I guess apparently like if people want to, they will create crypto coins for popular open source projects, and then they will designate someone to, I guess get the, the gas fees, um, in, um, uh, a Solana parlance, uh, no pun intended, with the gas town, um, where basically like that’s, you know, like the, the, the fees that you spend to have the transaction work off of the blockchain, right? Like, especially if there’s. A lot of times that it would take, like, you pay a certain percentage of something and like those fees could be designated to an individual. And, um, in this case, like both of these guys were reached out to when basically they were like, Hey, this coin exists. You’ve got all this money just kind of sitting in a crypto wallet waiting for you. [00:33:00] Take the money, get, get the, the transaction fees, so to speak. And, uh, I mean, I think that, that, that’s, if you wanna take that money right, it’s, it’s there for you. I’m not gonna certainly judge anyone for that. What I will judge you for is if you then promote your shit coin to your community and basically kind of encourage everyone. To kind of buy into it. Maybe you put in the caveat, oh, this isn’t financial advice. Oh, this is all just for whatever. But, but you’re trying to do that and then you go one step beyond, which I think is actually pretty dumb, which is to be like, okay, well, ’cause like, here’s the thing, I’m not gonna judge anyone.
If someone’s like, Hey, here’s a wallet that we’re gonna give you, and it has real cash in it, and you can do whatever you want with it, and these are the transaction fees, so to speak, like, you know, the gas fees, whatever, you know what you do. You, even if you wanna let your audience know that you’ve done that, and maybe you’re promoting that, maybe some people will buy into it, like, people are adults. Fine. Where, where I do like side eye a little bit is if you are, then for whatever reason [00:34:00] going to be like, oh, I’m gonna take my fees and I’m gonna reinvest it in the coin. Like, okay, you are literally sitting on top of the pyramid, like you could not be in a better position and now you’re, but right. And now you’re literally like paying into the pyramid scheme. It’s like, this is not going to work well for you. These are rug pulls. Um, and so like the, the, the, the gas town coin like dropped like massively. The Ralph coin like dropped massively, like after the, the, the Ralph creator, I think he took out like 300 K or something and people, or, you know, sold like 300 K worth of coins. And people were like, oh, he’s pulling a rug pull. And I’m like, well, A, what did you expect? But B it’s like, this is why don’t, like, if someone’s gonna give you free money from something that’s, you know, kind of scammy, like, I’m not saying don’t take the money. I am saying maybe be smart enough to not to reinvest it into the scam. Brett: Yeah. Christina: Like, I don’t know. Anyway, that’s the only thing I will mention on that. ’cause I don’t think that that takes [00:35:00] anything away from either of those projects or it says that you shouldn’t use or play around with it either of those ideas at all. But that is just a thing that’s happened in the last couple of weeks too, where it’s like, oh, and now there’s like crypto, you know, the crypto people are trying to get kind of involved with these projects and, um, I, I think that that’s, uh, okay. You know, um, like I said, I’m, I’m not gonna judge anybody for taking free money that, that somebody is gonna offer them. I will judge you if you’re gonna try to then, you know, try to like, promote that to your audience and try to be like, oh, this is a great way where we, where you can help me and we can all get rich. It’s like, no, there are, if you really wanna support creators, like there are things like GitHub sponsors and there are like other methods that you can, you can do that, that don’t involve making financial risks on shit coins. Brett: I wish anything I made could be popular enough that I could do something that’s stupid. Yeah. Like [00:36:00] I, I, I, I’m not gonna pull a rug pull on anyone, but the chances that I’ll ever make $300,000 on anything I’m working on, it’s pretty slim. Christina: Yeah, but at the same time, like if you, if you did, if you were in that position, like, I don’t know, I mean, I guess that’d be a thing that you would have to kind of figure out, um, yourself would be like, okay, I have access to this amount of money. Am I going to try to, you know, go all in and, and maybe go full grift to get even more? Some, something tells me that like your own personal ethics would probably preclude you from that. Brett: I, um, I have spent, what, um, how old am I? 47. I, I’ve been, since I started blogging in like 1999, 2000, um, I have always adhered to a very strict code and like turning down sponsors I didn’t agree with, [00:37:00] not doing anything that would be shady.
Not taking, not, not taking money from anyone I was writing about. Ethics in Journalism and Personal Dilemmas Brett: Like, it’s been, it’s a pain in the ass to try to be truly ethical, but I feel like I’ve done it for 30 some years and, and I don’t know, I wouldn’t change it. I’m not rich. I’ll never be rich. But yeah, I think ethics are important, especially if you’re in any kind of journalism. Christina: Yeah, if you’re in any sort of journalism. I think so, and I think like how people wanna define those things, I think it’s up to them. And, and like I said, like I’m not gonna even necessarily like, like judge people like for, because I, I don’t know personally like what my situation would be like. Like if somebody was like, Christina, here’s a wallet that has the equivalent of $300,000 in it and it’s just sitting here and we’re not even asking you to do anything with this. I would probably take the money. I’m not gonna lie, I don’t, I don’t, I don’t [00:38:00] know if I would promote it or anything and I maybe I would feel compelled to disclose, Hey, Brett: That is Christina: wallet belongs to me. Brett: money though. Christina: I, I, right. I, I, I might, I might be, I might feel compelled to com to, to disclose, Hey, someone created this coin in this thing. They created the foam grow coin and they are giving me, you know, the, the, the gas fees and I have accepted Brett: could be, I’d feel like you could do it if you were transparent enough about it. Christina: Yeah, I mean, I, I, I think where I draw the line is when you then go from like, because again, it’s fine if you wanna take it. It’s then when you are a. Reinvesting the free money into the coin, which I think is just idiotic. Like, I think that’s just actually dumb. Um, like I just, I just do like, that just seems like you are literally, like I said, you’re at the top of the pyramid and you’re literally like volunteering to get into the bottom again. Um, and, or, or b like if you do that and then you try to rationalize in some way, oh, well, you know, I think [00:39:00] that this could be a great thing for everybody to, you know, I get rich, you know, you could get rich, we could all get money out of this because this is the future of, you know, creator economy or whatever. It’s like, no, it’s not. This is gambling. Um, and, and, and, and you could make the argument to me, and I’d probably be persuaded to be like, this isn’t that different from poly market or any of the other sorts of things. But you know what? I don’t do those things either. And I wouldn’t promote those things to any audience that I had either. Um, but if somebody wanted to give me free money. I probably wouldn’t turn it down. I’m not gonna pretend that my ethics are, are that strong. Uh, I just don’t know if I would, if I would, uh, go on the other end and be like, okay, to the moon, everyone, let, let’s all go in on the crypto stuff. It’s like, okay, The Future of Open Source and Cryptocurrency Brett: So is this the future of open source is, ’cause I mean like open source has survived for decades as like a concept and it’s never been terribly profitable. But a [00:40:00] lot of large companies have invested in open source, and I guess at this point, like most of the big open source projects are either run by a corporation or by a foundation. Um, that are independently financed, but for a project like Gastown, like is it the future? Is this, is this something people are gonna start doing to like, kind of make open source profitable?
Christina: I mean, maybe. I don't know. The problem, though, is that it's not necessarily predictable, right? Not that normal donations or support methods are predictable, but at least they're not volatile to the point where you're basing your income on how well a shitcoin performs: one that someone else created, someone else controls the supply of, someone else "burned," so to speak, and someone else could be responsible for big seismic market movements in. That, I think, is very different from anything else. What I do expect is that we'll see more and more popular projects, things that go viral, especially around AI, being approached, or people proactively creating coins around those things. And some developers have already stood up and said, "If you see anybody trying to create a coin around this, it is not associated with me. I won't be associated with any of it." I think that becomes a problem: if these things do become popular, that's another risk when you don't want to be involved but you're attached to a popular project, right? Like the creator of npm, Isaac. I think there's an npm coin now, and he didn't create it, but he's happy to promote it, he's happy to take the money. And look, I'm happy for Isaac to get money from npm. At the same time, Bun, which is basically a replacement for Node and npm in a lot of ways, sold to Anthropic for, I guarantee you, a fuckload more money than whatever Isaac is going to make off some npm shitcoin. So it's all a lottery, and it's not sustainable. And I also feel like, for a lot of open source projects, and this isn't me saying people shouldn't get paid for the work, quite the contrary, but if you go into starting a project with the expectation of making a sustainable living off it, those expectations are misaligned with what reality might be. Which, again, isn't to say you shouldn't get paid for your work. It's just that the reason we give back and contribute to open source is to try to be part of the greater good and make things more available to everyone, not to quit our jobs. That would be wonderful; I wish more and more people could do that. And I give to a lot of open source projects on a monthly or annual basis.

Brett: Basically all the money that's given to me for my open source projects, I distribute among other open source projects. So it's a wash for me. But yeah, I pay five or ten bucks a month to 20 different projects.

Christina: Yeah. I mean, I think it's important. But I don't know, I hope it's not the future.
I'm not mad if that's a way people can make an income. But I do worry, in the sense that I don't want the reason somebody starts an open source project to be "I can get rich on a crypto thing," right? Because that's the exact wrong...

Brett: That's not open source. That's not the open source philosophy.

Christina: No, it's not. And this should go without saying, but no one should feel obligated: if you see a project you like that is involved in one of those coins, you have zero obligation to be supportive of that in any way. In fact, it is probably in your financial best interest not to be involved. It's your life, your money; do whatever you want, gamble however you want. But I do bristle a little if people try to portray it as "this is how you can support me, by buying into this thing." If you want to play Polymarket with it, fine, but don't wrap it up as "this is how you can give back." You can give back in other ways. You can do direct donations, you can do other stuff. Rather than putting a hundred dollars into the Ralph coin, I would much rather encourage people to give a hundred dollars to the Ralph guy directly.

Apex 1.0?

Brett: So, speaking of unprofitable open source, I have Apex almost to 1.0. It officially handles, I think, all of the syntax I had hoped it would handle. It does some crazy things. It's all built on cmark-gfm, GitHub's CommonMark project, so it does all of that, plus it handles stuff from MMark, like indices, and it incorporates, oh, I forget the name of it, two different ways of creating indices. It handles all kinds of bibliography syntax, every known bibliography syntax. I just added insert tags: you can create an insert with plus-plus the same way you'd create a deletion with tilde-tilde. And I've added a full plugin structure, and plugins can now be project-local, so you can have global plugins and then project-specific settings. For example, my blogs are all based on kramdown, and the Bunch documentation is based on kramdown, but the Marked documentation and most of my writing are based on MultiMarkdown, and they handle things differently. Take the IDs that go on headers: if a header has a space in it, MultiMarkdown compresses the ID to no space, while in CommonMark or GFM the space becomes a dash. Which means if I have cross-references in my document and I don't have the right header syntax, the cross-reference will break. So now I can put a config into my Bunch documentation that tells Apex to use the dash syntax, and in my Marked documentation I can tell it to use the MultiMarkdown syntax, and then I can just run Apex with no command-line arguments and everything works. And I don't know, I haven't gotten adoption for it. The one place I thought it could be really useful was DEVONthink,

Christina: Mm-hmm.

Brett: which has always been based on MultiMarkdown, which...
I love MultiMarkdown, and I love Fletcher, but it's missing a lot of what I would consider modern syntax.

Christina: Right.

Brett: So I offered it to DEVONthink, and it turned out they were working on their own project along the same lines at the same time. But I'm hoping to find some apps that will incorporate it and maybe get it some traction. It's solid, it's fast. It's not as fast as cmark, but it does twice as much. In benchmarks, a complex document renders in cmark in about 27 milliseconds, and in Apex it's more like 46 milliseconds. But in the grand scheme of things, I can render my whole blog ten times faster than I can with kramdown or Pandoc, and I can use all the syntax I want.

Challenges and Innovations in Markdown Processing

Brett: Did I tell you about Pandoc divs? The div extension. With the Pandoc div extension, you can put colon-colon-colon instead of backtick-backtick-backtick. So where backticks would normally create a code block, colons create a div, and you can apply an inline attribute list after the colons to give it a class, an ID, and any other attributes you want. I extended that so you can type colon-colon-colon and then a tag name. So if you typed colon-colon-colon, "aside", and then applied an attribute list, it would create an aside tag with those attributes. The only Pandoc extension I wish I could support but don't yet is grid tables. Have you ever seen grid tables?

Christina: I have not.

Brett: It's kind of like MultiMarkdown table syntax, except you use plus signs for the joints, plus pipes and dashes, and you actually draw out the table like an old ASCII diagram, and it renders into a valid HTML table. But supporting that has just been... tables. Tables are the thing I've pulled the most hair out over.

Christina: Yeah, I was going to say, tables feel hard. And obviously people use tables and whatnot, but the only thing I'd say is: Apex is so cool, and I hope other projects adopt it. Potentially, with the Pandoc support, as far as you've gotten with it, maybe projects that support some of Pandoc's stuff could jump into it. But I will say, once you go into the Pandoc universe, that almost feels like a separate thing from the markdown flavors. It almost feels like its own ecosystem. You know what I mean?

Brett: Well, yeah, and I haven't tried to adopt everything Pandoc does, because you can also just use Pandoc. You can pipe from Apex into Pandoc or vice versa. So I'm not going to try to replicate Pandoc one for one, or do all of Pandoc's export options, because Pandoc can take HTML in and then output PDFs and DOCX and everything. You can just pipe output from Apex into Pandoc to create your PDF or whatever.
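To make the syntax talk concrete, here is a small cheat sheet of the features discussed above. The `++` insert marker and the `::: aside` tag-name form are Brett's Apex additions as he describes them, so the rendered elements noted here are inferred from his description rather than taken from Apex's documentation; the fenced div and the grid table are standard Pandoc syntax.

```markdown
Insert with ++plus signs++ the same way you delete with ~~tildes~~.

## Header Syntax
MultiMarkdown compresses the space:  [works here](#headersyntax)
CommonMark/GFM swaps in a dash:      [works here](#header-syntax)

::: {.warning #caveat}
A Pandoc fenced div: a block-level div with a class and ID from the attribute list.
:::

::: aside {.sidebar}
Brett's extension: a tag name after the colons yields an aside element instead of a div.
:::

+------------+-----------------------+
| Grid table | drawn with plus signs |
+============+=======================+
| at joints  | plus pipes and dashes |
+------------+-----------------------+
```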
Christina: And to me, that seems ideal, right? But I feel like maybe adopting some of the other things, especially their grid tables, things like that, would be cool. Then again, that potentially has the potential to slow down rendering and do other stuff you don't want. And then, B, it's like: okay, are we now complicated to the point where this is not becoming one markdown processor to rule them all, but...

Brett: Yeah, the whole point is to be able to just run Apex and not worry about what syntax you're using. But grid tables are the kind of thing that are so intentional you're not going to accidentally use them. The impetus for Apex was all these support requests I get from people saying the tilde syntax for underline or delete doesn't work in Marked. And it does, if you choose the right processor. But then you have to know what processor supports what syntax, and that takes research and time. Bringing stuff in from, say, Obsidian into Marked, you would just kind of expect things to work. That's why I built Apex.

Christina: Right.

Brett: And you are correct that grid tables are the kind of thing no one's going to use if they haven't specifically researched what they're going to work with.

Christina: Right. And they're going to have their file marked so it's designated as Pandoc, with whatever flags for whatever Pandoc features it supports. Now, I know the whole point of Apex is that you don't have to worry about this, but I'm assuming, based on what you said, that if I pass arguments in a config file or something, like "these documents, or this URL, or these things are in this processor, and those are in another," it can just automatically apply those rules without having to infer from the syntax, right?

Brett: Right. It has modes for kramdown and CommonMark and GFM and Discount, and you can tell it what mode you're writing in, and it will limit the feature set to just what that processor would handle. And then all of the features have negatable flags on them. So if you wanted to, say, skip relaxed table rendering, you could turn that off on the command line or in a config file. Everything; you can make it behave like any particular processor. But I focus mostly on the unified mode, where, again, you don't have to think about which processor you're using.

Christina: In my experience, I would probably do what you do, which is use one syntax, or one processor, for one type of file and maybe another for another. But I don't think, and maybe I'm misunderstanding this, that I would ever have an instance where I'd be mixing the two together in the same file.

Brett: See, that's what's changing for me. I'm switching my blog over to use Apex instead of kramdown, which means I can now incorporate syntax that wasn't available before. So moving forward, I am mixing things from CommonMark, things from kramdown, things from MultiMarkdown. Once you know you have the option...

Christina: Right.
Then you might do that.

Brett: Once you have all the syntax available, you start doing it. Historically you wouldn't have, but once you get used to it, you can.

Christina: Okay, so here's the next existential question for you. At what point does it go from being a rendering engine, kind of an omni rendering engine, to being a syntax and a flavor in and of itself?

Brett: That's a very valid question, and one I have to keep asking myself. To encapsulate what you're saying: if you got used to writing for Apex and you were mixing your syntax, all of a sudden you have a document that can't render in anything except Apex, which does eventually make it its own flavor. Yeah, it's a concern the whole time.

Christina: Well, and I think it could be two things, right? It could live in two worlds. On the one hand, it could be the rendering engine to end all rendering engines: it can render files in any of the flavors, and you can specify, in a config or something, "these files are this format, these are that," maybe even with some sort of header in the files that says what the rendering engine is, or have it infer based on the logic you're importing. But it could also be one of those things where you say, okay, I've just created the omni syntax, and that's a thing you maybe encourage people to adopt. Like: you can always just use CommonMark, you can always just use GFM, you can always just use MultiMarkdown, but we support these other things too, from these other systems, and you can mix and match them. Because I do feel like, at a certain point, at least the way you're running it yourself, you have your own syntax.

Brett: Yeah. You have perfectly encapsulated the major design concern, and I think you're correct: it can be both things at once. But nobody needs another markdown syntax. There are so many flavors right now. Okay, maybe a dozen, it's not an infinite number, but there are enough that the confusion is real. We don't need yet another markdown flavor; we do need a universal processor that makes the differentiations matter less. But yeah, I need to nail down that philosophy and really put it into writing: this is the design goal of this project. I've hinted at it, but I'm a scattered thinker, and part of the design philosophy is that if someone says, "Hey, could you make this work?" I wanted a project where I could say, "Yeah, I'm going to make that work. I'm going to add this somewhat esoteric syntax, and it's just going to work, and it's not going to affect anything else. You don't have to use it, but if you do, there it is." So it was designed to bloat, to a certain extent. But I need to actually write a page that's just the philosophy and really put all my thoughts together on that.
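Pulling the configuration discussion from a few turns back together: the episode confirms the behavior (processor modes, per-feature negatable flags, config files, and piping Apex's HTML into Pandoc) but not the actual flag names, so treat the `apex` invocations below as a hypothetical sketch. The `pandoc` options are real.

```
# Hypothetical invocations; the episode describes the behavior, not the flag names.
apex --mode gfm post.md                     # limit the feature set to what GFM handles
apex --no-relaxed-tables post.md            # negate a single feature flag
apex post.md | pandoc -f html -o post.pdf   # hand Apex's HTML to Pandoc for PDF export
```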
Christina: Yeah, because I was just thinking: it's so cool, and it's cool that you can mix all those things together, but I still feel like I probably wouldn't, because I'm not you. And then I would just have this additional dependency, where if something happens to Apex one day and it's the only thing that can render my documents, and it's not getting updated anymore, then I'm kind of SOL.

Brett: Maruku. Do you remember Maruku?

Christina: Vaguely.

Brett: The project is kind of dead, and a lot of its syntax has been incorporated into various other processors. But if you built your whole blog on Maruku, you have to be able to run, like, a seven-year-old binary that will never be updated, and eventually you're going to run into trouble. The nice thing about Unix-based stuff is that you can stop developing it and it'll work for a decade, until there's a major shift in processors. But take the shift to ARM: if Maruku had only ever been compiled for Intel and it wasn't open source, it would be gone. You wouldn't be able to run it anymore. So yeah, these things can happen.

Christina: Well, and I even think about how, with some of the early processors, and this is a million years ago, you had to use certain Perl-based things depending on what your backend system was. Then you moved to PHP, then maybe you moved to Ruby if you were using Jekyll, and maybe you move to something else. And I was like, okay, what will the thing be in the future? If it's open source, there's a way someone can write a new processor for it, but it does create dependencies on top of dependencies. Which is why I kind of like having the omni processor. For me, though, I would probably be personally leery about intermingling all my different syntaxes together.

Brett: To that end, that is why I wanted it in C, because C will probably never die. C can be compiled on just about any platform, and if you have a Jekyll blog and you want to incorporate a C program into a gem, it's no problem. You can incorporate it into just about any language.
If you rely on complex scaffolding to build AI agents, you aren't scaling; you're coping. Thibault Sottiaux from OpenAI's Codex team joins us to explain why they are ruthlessly removing the harness to solve for true agentic autonomy. We discuss the bitter lesson of vertical integration, why scalable primitives beat clever tricks, and how the rise of the "super bus factor" is reshaping engineering careers.LinearB: Measure the impact of GitHub Copilot and CursorFollow the show:Subscribe to our Substack Follow us on LinkedInSubscribe to our YouTube ChannelLeave us a ReviewFollow the hosts:Follow AndrewFollow BenFollow DanFollow today's guest:OpenAI Codex: Learn more about the models powering tools like GitHub Copilot.Codex Open Source Repo: The lightweight coding agent that runs in your terminal (check out the Rust migration mentioned in the episode).Agent Skills Open Standard: The open standard and catalog for giving agents new capabilities.The Bitter Lesson: Richard Sutton's essay on why compute-centric methods win in AI.Follow Tibo on X @thsottiaux | GitHubOFFERS Start Free Trial: Get started with LinearB's AI productivity platform for free. Book a Demo: Learn how you can ship faster, improve DevEx, and lead with confidence in the AI era. LEARN ABOUT LINEARB AI Code Reviews: Automate reviews to catch bugs, security risks, and performance issues before they hit production. AI & Productivity Insights: Go beyond DORA with AI-powered recommendations and dashboards to measure and improve performance. AI-Powered Workflow Automations: Use AI-generated PR descriptions, smart routing, and other automations to reduce developer toil. MCP Server: Interact with your engineering data using natural language to build custom reports and get answers on the fly.
Published as a 47-page pamphlet in colonial America on January 10, 1776, Common Sense challenged the authority of the British government and the royal monarchy. The elegantly plain and persuasive language that Thomas Paine used touched the hearts and minds of the average American and was the first work to openly ask for political freedom and independence from Great Britain. Paine's powerful words came to symbolize the spirit of the Revolution itself. General George Washington had it read to his troops. Common Sense by Thomas Paine (read by Walter Dixon) at https://amzn.to/3MHAIYr Common Sense by Thomas Paine (book) available at https://amzn.to/3MKX77b Writings of Thomas Paine available at https://amzn.to/3MCaFC2 Books about Thomas Paine available at https://amzn.to/4s3qxOg ENJOY Ad-Free content, Bonus episodes, and Extra materials when joining our growing community on https://patreon.com/markvinet SUPPORT this channel by purchasing any product on Amazon using this FREE entry LINK https://amzn.to/3POlrUD (Amazon gives us credit at NO extra charge to you). Mark Vinet's HISTORICAL JESUS podcast at https://parthenonpodcast.com/historical-jesus Mark's TIMELINE video channel: https://youtube.com/c/TIMELINE_MarkVinet Website: https://markvinet.com/podcast Facebook: https://www.facebook.com/mark.vinet.9 Twitter: https://twitter.com/MarkVinet_HNA Instagram: https://www.instagram.com/denarynovels Mark's books: https://amzn.to/3k8qrGM Audio credits: Common Sense—The Origin and Design of Government by Thomas Paine, audio recording read by Walter Dixon (Public Domain 2011 Gildan Media). Audio excerpts reproduced under the Fair Use (Fair Dealings) Legal Doctrine for purposes such as criticism, comment, teaching, education, scholarship, research and news reporting.See omnystudio.com/listener for privacy information.
Codex History of Video Games with Mike Coletta and Tyler Ostby - Podaholics
Mike and Tyler are on vacation! Enjoy this previous episode on Wii U games! It's mostly about how some of these games are amazing but only available on the Wii U, and that is a travesty! The theme music is by RoccoW. The logo was created by Dani Dodge.
How We Use Claude Code and Codex
With the Ralph loop going mainstream, how are engineering organizations utilizing it at scale? Andrew and Ben sit down with Angie Jones, VP of Engineering AI Tools and Enablement at Block, to pick her brain on how they are using the Ralph Wiggum technique to automate updates across 25,000 repos and how she is strategically preparing for Gas Town. The team also breaks down the launch of OpenAI's new GPT-5.2 Codex model before closing out the week with a look at the weirdest tech from CES, from hypersonic knives to music-playing lollipops.LinearB: Measure the impact of GitHub Copilot and CursorFollow the show:Subscribe to our Substack Follow us on LinkedInSubscribe to our YouTube ChannelLeave us a ReviewFollow the hosts:Follow AndrewFollow BenFollow DanFollow today's stories:Angie Jones: angiejones.tech | LinkedIn | X (Twitter)Goose (Block's AI Agent): github.com/block/gooseSteve Yegge's "Welcome to Gas Town": Read on MediumGeoffrey Huntley's Ralph Loop: ghuntley.com/ralphRyan Dahl on the End of Coding: @rough__seaThe Weirdest Tech of CES: Read the ArticleOFFERS Start Free Trial: Get started with LinearB's AI productivity platform for free. Book a Demo: Learn how you can ship faster, improve DevEx, and lead with confidence in the AI era. LEARN ABOUT LINEARB AI Code Reviews: Automate reviews to catch bugs, security risks, and performance issues before they hit production. AI & Productivity Insights: Go beyond DORA with AI-powered recommendations and dashboards to measure and improve performance. AI-Powered Workflow Automations: Use AI-generated PR descriptions, smart routing, and other automations to reduce developer toil. MCP Server: Interact with your engineering data using natural language to build custom reports and get answers on the fly.
Welp. That was wild.
Dragons are eternal. Gaming mice are not. In today's episode of the RPGBOT.Podcast, we survive cursed peripherals, catastrophic Kingdom turns, and at least one near-fatal werewolf encounter before finally turning our attention to the real reason we woke up before dawn: Paizo's Lost Omens: Draconic Codex. It's a book that asks the important questions—like "What if dragons were powered by magical traditions?", "What if dragons were made of swords?", and "What if a dragon respawned because you can't kill the joke?" Pour yourself a gallon of coffee and join us as we dig into archdragons, dragon gods, delight dragons, wish dragons, and more dragons than should legally fit in one hardcover.

Show Notes

In this episode, the RPGBOT crew reviews Lost Omens: Draconic Codex, Paizo's definitive Pathfinder Second Edition sourcebook for dragons. The discussion covers both lore and mechanics introduced in the Remaster era, highlighting how Pathfinder 2e has fully reinvented dragons to align with its four magical traditions: Arcane, Divine, Occult, and Primal.

Covered Topics Include:

Remastered Dragon Lore: Pathfinder's clean break from chromatic/metallic dragons; dragons aligned to magical traditions instead of color; why these dragons feel "native" to PF2e mechanics.

Dragon Creation Myth & Dragon Gods: Apsu, Dahak, Sarshalatu, and the draconic origin story; dragon gods, pantheons, edicts, and anathema; cleric and champion support for dragon-aligned worship.

Archdragons & Dragon Physiology: New age category: Archdragon; Young → Adult → Ancient → Arch progression; why archdragons emerge during times of conflict; expanded archdragon stat blocks for existing dragons.

Bestiary Highlights (So Many Dragons): Over 40 dragon types, including Delight Dragons (joy, bubbles, toys, and respawning punchlines); Mocking Dragons (laughing at your failures—mechanically); Wish Dragons (granting wishes with no ritual cost… interpreted by the dragon); Vorpal Dragons (made of swords, can decapitate you and leave you alive); Sage Dragons (dragon nerds who weaponize your secrets); and Wyrm Wraiths (void-fueled undead dragon horrors).

Player & GM Options: Dragon-themed archetypes and ancestry options; dragonets as playable, pseudo-dragon-like companions; expanded kobold options; new spells, magic items, and dragon contracts (mechanical pacts that actually matter).

GM Tools & Campaign Hooks: Dragons as quest-givers, gods, villains, and punchlines; high-level storytelling with wish-granting dragons; using dragons as expressions of magical philosophy.

Key Takeaways

Lost Omens: Draconic Codex fully redefines dragons for Pathfinder 2e, making them mechanically and narratively distinct from D&D while remaining iconic. The four magical traditions give dragons clearer identities, spell access, and story roles. Archdragons provide true level-21+ threats with campaign-defining presence. Dragons in this book are not just monsters—they're gods, philosophers, tricksters, wish-granters, and walking rules arguments. Player options (dragonets, archetypes, contracts) meaningfully support dragon-centric campaigns. This book is a must-own for Pathfinder 2e GMs, especially for high-level or lore-heavy games.

Welcome to the RPGBOT Podcast. If you love Dungeons & Dragons, Pathfinder, and tabletop RPGs, this is the podcast for you. Support the show for free: Rate and review us on Apple Podcasts, Spotify, or any podcast app. It helps new listeners find the best RPG podcast for D&D and Pathfinder players.
Level up your experience: Join us on Patreon to unlock ad-free access to RPGBOT.net and the RPGBOT Podcast, chat with us and the community on the RPGBOT Discord, and jump into live-streamed RPG podcast recordings. Support while you shop: Use our Amazon affiliate link at https://amzn.to/3NwElxQ and help us keep building tools and guides for the RPG community. Meet the Hosts Tyler Kamstra – Master of mechanics, seeing the Pathfinder action economy like Neo in the Matrix. Randall James – Lore buff and technologist, always ready to debate which Lord of the Rings edition reigns supreme. Ash Ely – Resident cynic, chaos agent, and AI's worst nightmare, bringing pure table-flipping RPG podcast energy. Join the RPGBOT team where fantasy roleplaying meets real strategy, sarcasm, and community chaos. How to Find Us: In-depth articles, guides, handbooks, reviews, news on Tabletop Role Playing at RPGBOT.net Tyler Kamstra BlueSky: @rpgbot.net TikTok: @RPGBOTDOTNET Ash Ely Professional Game Master on StartPlaying.Games BlueSky: @GravenAshes YouTube: @ashravenmedia Randall James BlueSky: @GrimoireRPG Amateurjack.com Read Melancon: A Grimoire Tale (affiliate link) Producer Dan @Lzr_illuminati
Codex History of Video Games with Mike Coletta and Tyler Ostby - Podaholics
Mike is sick so there is no new episode this week. Please enjoy this talk about the history of the Wii U! This includes development, launch, spec talk, and why they think this console is so underrated! The theme music is by RoccoW. The logo was created by Dani Dodge.
Aishwarya Naresh Reganti and Kiriti Badam have helped build and launch more than 50 enterprise AI products across companies like OpenAI, Google, Amazon, and Databricks. Based on these experiences, they've developed a small set of best practices for building and scaling successful AI products. The goal of this conversation is to save you and your team a lot of pain and suffering.

We discuss:
1. Two key ways AI products differ from traditional software, and why that fundamentally changes how they should be built
2. Common patterns and anti-patterns in companies that build strong AI products versus those that struggle
3. A framework they developed from real-world experience to iteratively build AI products that create a flywheel of improvement
4. Why obsessing about customer trust and reliability is an underrated driver of successful AI products
5. Why evals aren't a cure-all, and the most common misconceptions people have about them
6. The skills that matter most for builders in the AI era

—Brought to you by:Merge—The fastest way to ship 220+ integrations: https://merge.dev/lennyStrella—The AI-powered customer research platform: https://strella.io/lennyBrex—The banking solution for startups: https://www.brex.com/product/business-account?ref_code=bmk_dp_brand1H25_ln_new_fs—Transcript: https://www.lennysnewsletter.com/p/what-openai-and-google-engineers-learned—My biggest takeaways (for paid newsletter subscribers): https://www.lennysnewsletter.com/i/183007822/referenced—Get 15% off Aishwarya and Kiriti's Maven course, Building Agentic AI Applications with a Problem-First Approach, using this link: https://bit.ly/3V5XJFp—Where to find Aishwarya Naresh Reganti:• LinkedIn: https://www.linkedin.com/in/areganti• GitHub: https://github.com/aishwaryanr/awesome-generative-ai-guide• X: https://x.com/aish_reganti—Where to find Kiriti Badam:• LinkedIn: https://www.linkedin.com/in/sai-kiriti-badam• X: https://x.com/kiritibadam—Where to find Lenny:• Newsletter: https://www.lennysnewsletter.com• X: https://twitter.com/lennysan• LinkedIn: https://www.linkedin.com/in/lennyrachitsky/—In this episode, we cover:(00:00) Introduction to Aishwarya and Kiriti(05:03) Challenges in AI product development(07:36) Key differences between AI and traditional software(13:19) Building AI products: start small and scale(15:23) The importance of human control in AI systems(22:38) Avoiding prompt injection and jailbreaking(25:18) Patterns for successful AI product development(33:20) The debate on evals and production monitoring(41:27) Codex team's approach to evals and customer feedback(45:41) Continuous calibration, continuous development (CC/CD) framework(58:07) Emerging patterns and calibration(01:01:24) Overhyped and under-hyped AI concepts(01:05:17) The future of AI(01:08:41) Skills and best practices for building AI products(01:14:04) Lightning round and final thoughts—Referenced:• LevelUp Labs: https://levelup-labs.ai/• Why your AI product needs a different development lifecycle: https://www.lennysnewsletter.com/p/why-your-ai-product-needs-a-different• Booking.com: https://www.booking.com• Research paper on agents in production (by Matei Zaharia's lab): https://arxiv.org/pdf/2512.04123• Matei Zaharia's research on Google Scholar: https://scholar.google.com/citations?user=I1EvjZsAAAAJ&hl=en• The coming AI security crisis (and what to do about it) | Sander Schulhoff: https://www.lennysnewsletter.com/p/the-coming-ai-security-crisis• Gajen Kandiah on LinkedIn: https://www.linkedin.com/in/gajenkandiah• Rackspace: 
https://www.rackspace.com• The AI-native startup: 5 products, 7-figure revenue, 100% AI-written code | Dan Shipper (co-founder/CEO of Every): https://www.lennysnewsletter.com/p/inside-every-dan-shipper• Semantic Diffusion: https://martinfowler.com/bliki/SemanticDiffusion.html• LMArena: https://lmarena.ai• Artificial Analysis: https://artificialanalysis.ai/leaderboards/providers• Why humans are AI's biggest bottleneck (and what's coming in 2026) | Alexander Embiricos (OpenAI Codex Product Lead): https://www.lennysnewsletter.com/p/why-humans-are-ais-biggest-bottleneck• Airline held liable for its chatbot giving passenger bad advice—what this means for travellers: https://www.bbc.com/travel/article/20240222-air-canada-chatbot-misinformation-what-travellers-should-know• Demis Hassabis on LinkedIn: https://www.linkedin.com/in/demishassabis• We replaced our sales team with 20 AI agents—here's what happened | Jason Lemkin (SaaStr): https://www.lennysnewsletter.com/p/we-replaced-our-sales-team-with-20-ai-agents• Socrates's quote: https://en.wikipedia.org/wiki/The_unexamined_life_is_not_worth_living• Noah Smith's newsletter: https://www.noahpinion.blog• Silicon Valley on HBO Max: https://www.hbomax.com/shows/silicon-valley/b4583939-e39f-4b5c-822d-5b6cc186172d• Clair Obscur: Expedition 33: https://store.steampowered.com/app/1903340/Clair_Obscur_Expedition_33/• Wisprflow: https://wisprflow.ai• Raycast: https://www.raycast.com• Steve Jobs's quote: https://www.goodreads.com/quotes/463176-you-can-t-connect-the-dots-looking-forward-you-can-only—Recommended books:• When Breath Becomes Air: https://www.amazon.com/When-Breath-Becomes-Paul-Kalanithi/dp/081298840X• The Three-Body Problem: https://www.amazon.com/Three-Body-Problem-Cixin-Liu/dp/0765382032• A Fire Upon the Deep: https://www.amazon.com/Fire-Upon-Deep-Zones-Thought/dp/0812515285—Production and marketing by https://penname.co/. For inquiries about sponsoring the podcast, email podcast@lennyrachitsky.com.—Lenny may be an investor in the companies discussed. To hear more, visit www.lennysnewsletter.com
don't miss George's AIE talk: https://www.youtube.com/watch?v=sRpqPgKeXNk

From launching a side project in a Sydney basement to becoming the independent gold standard for AI benchmarking—trusted by developers, enterprises, and every major lab to navigate the exploding landscape of models, providers, and capabilities—George Cameron and Micah Hill-Smith have spent two years building Artificial Analysis into the platform that answers the questions no one else will: Which model is actually best for your use case? What are the real speed-cost trade-offs? And how open is "open" really?

We discuss:
* The origin story: built as a side project in 2023 while Micah was building a legal AI assistant, launched publicly in January 2024, and went viral after Swyx's retweet
* Why they run evals themselves: labs prompt models differently, cherry-pick chain-of-thought examples (Google Gemini 1.0 Ultra used 32-shot prompts to beat GPT-4 on MMLU), and self-report inflated numbers
* The mystery shopper policy: they register accounts not on their own domain and run intelligence + performance benchmarks incognito to prevent labs from serving different models on private endpoints
* How they make money: enterprise benchmarking insights subscription (standardized reports on model deployment, serverless vs. managed vs. leasing chips) and private custom benchmarking for AI companies (no one pays to be on the public leaderboard)
* The Intelligence Index (V3): synthesizes 10 eval datasets (MMLU, GPQA, agentic benchmarks, long-context reasoning) into a single score, with 95% confidence intervals via repeated runs
* Omissions Index (hallucination rate): scores models from -100 to +100 (penalizing incorrect answers, rewarding "I don't know"), and Claude models lead with the lowest hallucination rates despite not always being the smartest
* GDP Val AA: their version of OpenAI's GDP-bench (44 white-collar tasks with spreadsheets, PDFs, PowerPoints), run through their Stirrup agent harness (up to 100 turns, code execution, web search, file system), graded by Gemini 3 Pro as an LLM judge (tested extensively, no self-preference bias)
* The Openness Index: scores models 0-18 on transparency of pre-training data, post-training data, methodology, training code, and licensing (AI2 OLMo 2 leads, followed by Nous Hermes and NVIDIA Nemotron)
* The smiling curve of AI costs: GPT-4-level intelligence is 100-1000x cheaper than at launch (thanks to smaller models like Amazon Nova), but frontier reasoning models in agentic workflows cost more than ever (sparsity, long context, multi-turn agents)
* Why sparsity might go way lower than 5%: GPT-4.5 is ~5% active, Gemini models might be ~3%, and Omissions Index accuracy correlates with total parameters (not active), suggesting massive sparse models are the future
* Token efficiency vs. turn efficiency: GPT-5 costs more per token but solves Tau-bench in fewer turns (cheaper overall), and models are getting better at using more tokens only when needed (5.1 Codex has tighter token distributions)
* V4 of the Intelligence Index coming soon: adding GDP Val AA, Critical Point, hallucination rate, and dropping some saturated benchmarks (human-eval-style coding is now trivial for small models)

Artificial Analysis Website: https://artificialanalysis.ai
George Cameron on X: https://x.com/grmcameron
Micah Hill-Smith on X: https://x.com/_micah_h

Chapters
00:00:00 Introduction: Full Circle Moment and Artificial Analysis Origins
00:01:08 Business Model: Independence and Revenue Streams
00:04:00 The Origin Story: From Legal AI to Benchmarking
00:07:00 Early Challenges: Cost, Methodology, and Independence
00:16:13 AI Grant and Moving to San Francisco
00:18:58 Evolution of the Intelligence Index: V1 to V3
00:27:55 New Benchmarks: Hallucination Rate and Omissions Index
00:33:19 Critical Point and Frontier Physics Problems
00:35:56 GDPVAL AA: Agentic Evaluation and Stirrup Harness
00:51:47 The Openness Index: Measuring Model Transparency
00:57:57 The Smiling Curve: Cost of Intelligence Paradox
01:04:00 Hardware Efficiency and Sparsity Trends
01:07:43 Reasoning vs Non-Reasoning: Token Efficiency Matters
01:10:47 Multimodal Benchmarking and Community Requests
01:14:50 Looking Ahead: V4 Intelligence Index and Beyond
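The Omissions Index bullet above gives the shape of the metric but not the formula. Here is a minimal sketch of a scoring rule consistent with that description; the per-answer weights, the partial credit for abstaining, and the function name are illustrative assumptions, not Artificial Analysis's published method.

```python
# Hypothetical Omissions-style score: reward honesty, punish hallucination.
# The -100..+100 range and the penalize/reward behavior come from the episode;
# the specific weights below are illustrative assumptions.

def omissions_style_score(results: list[str]) -> float:
    """results holds one of 'correct', 'abstain' (said "I don't know"), or 'incorrect' per question."""
    weights = {"correct": 1.0, "abstain": 0.5, "incorrect": -1.0}
    total = sum(weights[r] for r in results)
    return 100 * total / len(results)  # all wrong -> -100, all right -> +100

# A model that declines when unsure beats one that guesses wrong:
print(omissions_style_score(["correct", "abstain", "abstain"]))      # ~66.7
print(omissions_style_score(["correct", "incorrect", "incorrect"]))  # ~-33.3
```

Under any weighting of this shape, abstentions trade a small penalty relative to correct answers for avoiding the large penalty on wrong ones, which is why the episode notes Claude models can lead this index without being the smartest.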
Happy New Year! You may have noticed that in 2025 we moved toward YouTube as our primary podcasting platform. As we'll explain in the next State of Latent Space post, we'll be doubling down on Substack again and improving the experience for the over 100,000 of you who look out for our emails and website updates!

We first mentioned Artificial Analysis in 2024, when it was still a side project in a Sydney basement. They were then one of the few companies from Nat Friedman and Daniel Gross's AI Grant to raise a full seed round from them, and have now become the independent gold standard for AI benchmarking—trusted by developers, enterprises, and every major lab to navigate the exploding landscape of models, providers, and capabilities.

We have chatted with both Clementine Fourrier of HuggingFace's OpenLLM Leaderboard and Anastasios Angelopoulos of LMArena (freshly valued at $1.7B) on their approaches to LLM evals and trendspotting, but Artificial Analysis has staked out an enduring and important place in the toolkit of the modern AI Engineer by doing the best job of independently running the most comprehensive set of evals across the widest range of open and closed models, and charting their progress for broad industry analyst use.

George Cameron and Micah Hill-Smith have spent two years building Artificial Analysis into the platform that answers the questions no one else will: Which model is actually best for your use case? What are the real speed-cost trade-offs? And how open is "open" really?

We discuss:
* The origin story: built as a side project in 2023 while Micah was building a legal AI assistant, launched publicly in January 2024, and went viral after Swyx's retweet
* Why they run evals themselves: labs prompt models differently, cherry-pick chain-of-thought examples (Google Gemini 1.0 Ultra used 32-shot prompts to beat GPT-4 on MMLU), and self-report inflated numbers
* The mystery shopper policy: they register accounts not on their own domain and run intelligence + performance benchmarks incognito to prevent labs from serving different models on private endpoints
* How they make money: enterprise benchmarking insights subscription (standardized reports on model deployment, serverless vs. managed vs. leasing chips) and private custom benchmarking for AI companies (no one pays to be on the public leaderboard)
* The Intelligence Index (V3): synthesizes 10 eval datasets (MMLU, GPQA, agentic benchmarks, long-context reasoning) into a single score, with 95% confidence intervals via repeated runs
* Omissions Index (hallucination rate): scores models from -100 to +100 (penalizing incorrect answers, rewarding "I don't know"), and Claude models lead with the lowest hallucination rates despite not always being the smartest
* GDP Val AA: their version of OpenAI's GDP-bench (44 white-collar tasks with spreadsheets, PDFs, PowerPoints), run through their Stirrup agent harness (up to 100 turns, code execution, web search, file system), graded by Gemini 3 Pro as an LLM judge (tested extensively, no self-preference bias)
* The Openness Index: scores models 0-18 on transparency of pre-training data, post-training data, methodology, training code, and licensing (AI2 OLMo 2 leads, followed by Nous Hermes and NVIDIA Nemotron)
* The smiling curve of AI costs: GPT-4-level intelligence is 100-1000x cheaper than at launch (thanks to smaller models like Amazon Nova), but frontier reasoning models in agentic workflows cost more than ever (sparsity, long context, multi-turn agents)
* Why sparsity might go way lower than 5%: GPT-4.5 is ~5% active, Gemini models might be ~3%, and Omissions Index accuracy correlates with total parameters (not active), suggesting massive sparse models are the future
* Token efficiency vs. turn efficiency: GPT-5 costs more per token but solves Tau-bench in fewer turns (cheaper overall), and models are getting better at using more tokens only when needed (5.1 Codex has tighter token distributions)
* V4 of the Intelligence Index coming soon: adding GDP Val AA, Critical Point, hallucination rate, and dropping some saturated benchmarks (human-eval-style coding is now trivial for small models)

Links to Artificial Analysis
* Website: https://artificialanalysis.ai
* George Cameron on X: https://x.com/georgecameron
* Micah Hill-Smith on X: https://x.com/micahhsmith

Full Episode on YouTube

Timestamps
* 00:00 Introduction: Full Circle Moment and Artificial Analysis Origins
* 01:19 Business Model: Independence and Revenue Streams
* 04:33 Origin Story: From Legal AI to Benchmarking Need
* 16:22 AI Grant and Moving to San Francisco
* 19:21 Intelligence Index Evolution: From V1 to V3
* 11:47 Benchmarking Challenges: Variance, Contamination, and Methodology
* 13:52 Mystery Shopper Policy and Maintaining Independence
* 28:01 New Benchmarks: Omissions Index for Hallucination Detection
* 33:36 Critical Point: Hard Physics Problems and Research-Level Reasoning
* 23:01 GDP Val AA: Agentic Benchmark for Real Work Tasks
* 50:19 Stirrup Agent Harness: Open Source Agentic Framework
* 52:43 Openness Index: Measuring Model Transparency Beyond Licenses
* 58:25 The Smiling Curve: Cost Falling While Spend Rising
* 1:02:32 Hardware Efficiency: Blackwell Gains and Sparsity Limits
* 1:06:23 Reasoning Models and Token Efficiency: The Spectrum Emerges
* 1:11:00 Multimodal Benchmarking: Image, Video, and Speech Arenas
* 1:15:05 Looking Ahead: Intelligence Index V4 and Future Directions
* 1:16:50 Closing: The Insatiable Demand for Intelligence

Transcript

Micah [00:00:06]: This is kind of a full circle moment for us in a way, because the first time Artificial Analysis got mentioned on a podcast was you and Alessio on Latent Space. Amazing.

swyx [00:00:17]: Which was January 2024. I don't even remember doing that, but yeah, it was very influential to me.
Yeah, I'm looking at AI News for Jan 17, or Jan 16, 2024. I said, "This gem of a models and host comparison site was just launched." And then I put in a few screenshots, and I said it's an independent third party, it clearly outlines the quality versus throughput trade-off, and it breaks out by model and hosting provider. I did give you s**t for missing Fireworks: how do you have a model benchmarking thing without Fireworks? But you had Together, you had Perplexity, and I think we just started chatting there. Welcome, George and Micah, to Latent Space. I've been following your progress. Congrats, it's been an amazing year. You guys have really come together to be the presumptive new Gartner of AI, right? Which is something that...

George [00:01:09]: Yeah, but you can't pay us for better results.

swyx [00:01:12]: Yes, exactly.

George [00:01:13]: Very important.

Micah [00:01:14]: Start off with a spicy take.

swyx [00:01:18]: Okay, how do I pay you?

Micah [00:01:20]: Let's get right into that.

swyx [00:01:21]: How do you make money?

Micah [00:01:24]: Well, very happy to talk about that. It's been a big journey the last couple of years. Artificial Analysis is going to be two years old in January 2026, which is pretty soon now. We run the website for free, obviously, and give away a ton of data to help developers and companies navigate AI and make decisions about models, providers, and technologies across the AI stack. We're very committed to doing that and intend to keep doing that. Along the way, we have built a business that is working out pretty sustainably. We've got just over 20 people now and two main customer groups. We want to be who enterprises look to for data and insights on AI, so we want to help them with their decisions about models and technologies for building stuff. And then, on the other side, we do private benchmarking for companies throughout the AI stack who build AI stuff. No one pays to be on the website. We've been very clear about that from the very start, because there's no use doing what we do unless it's independent AI benchmarking. But it turns out a bunch of our stuff can be pretty useful to companies building AI.

swyx [00:02:38]: And is it like, I am a Fortune 500, I need advisors on objective analysis, and I call you guys and you pull up a custom report for me, you come into my office and give me a workshop? What kind of engagement is that?

George [00:02:53]: So we have a benchmarking and insights subscription, which looks like standardized reports that cover key topics or key challenges enterprises face when looking to understand AI and choose between all the technologies. For instance, one of the reports is a model deployment report: how to think about choosing between serverless inference, managed deployment solutions, or leasing chips and running inference yourself. That's an example of a decision big enterprises face, and it's hard to reason through; this AI stuff is really new to everybody. So with our reports and insights subscription, we try to help companies navigate that. We also do custom private benchmarking. That's very different from the public benchmarking that we publicize, and there's no commercial model around that. For private benchmarking, we'll at times create benchmarks and run benchmarks to specs that enterprises want. And we'll also do that sometimes for AI companies who have built things, and we help them understand what they've built with private benchmarking.
Yeah. So that's a piece we've developed mainly through trying to support everybody publicly with our public benchmarks. Yeah.

swyx [00:04:09]: Let's talk about the tech stack behind that. But okay, I'm going to rewind all the way to when you guys started this project. You were all the way in Sydney? Yeah. Well, Sydney, Australia for me.

Micah [00:04:19]: George was in SF; he's Australian, but he had moved here already. Yeah.

swyx [00:04:22]: And I remember I had the Zoom call with you. What was the impetus for starting Artificial Analysis in the first place? You started with public benchmarks, so let's start there; we'll get to the private benchmarking. Yeah.

George [00:04:33]: Why don't we go back a little bit, to why we thought it was needed? Yeah.

Micah [00:04:40]: The story kind of begins in 2022, 2023. Both George and I had been into AI stuff for quite a while. In 2023 specifically, I was trying to build a legal AI research assistant, and it actually worked pretty well for its era, I would say. I was finding that the more you go into building something using LLMs, the more each bit of what you're doing ends up being a benchmarking problem. I had this multistage algorithm thing and was trying to figure out what the minimum viable model for each bit was, trying to optimize every bit of it as you build it out, right? You're trying to think about accuracy, a bunch of other metrics, and performance and cost. And mostly, just no one was doing anything to independently evaluate all the models, and certainly not to look at the trade-offs for speed and cost. So we basically set out to build a thing developers could look at to see the trade-offs between all of those things, measured independently across all the models and providers. Honestly, it was probably meant to be a side project when we first started doing it.

swyx [00:05:49]: Like, we didn't get together and say, "Hey, we're going to stop working on all this stuff; this is going to be our main thing." When I first called you, I think you hadn't decided on starting a company yet.

Micah [00:05:58]: That's actually true. I don't even think we'd paused anything; George still had his job, and I didn't quit working on my legal AI thing. It was genuinely a side project.

George [00:06:05]: We built it because we needed it as people building in the space, and we thought other people might find it useful too. So we bought a domain, linked it to the Vercel deployment we had, and tweeted about it. But very quickly it started getting attention. Thank you, Swyx, for doing an initial retweet and spotlighting the project when we released it. It was useful to others, and it became even more useful as the number of model releases accelerated. We had Mixtral 8x7B, and that was a fun one: an open source model that really changed the landscape and opened people's eyes to other serverless inference providers, to thinking about speed, thinking about cost. And so it became more useful quite quickly. Yeah.

swyx [00:07:02]: What I love about talking to people like you, who sit across the ecosystem, is that I have theories about what people want, but you have data, and that's obviously more relevant. But I want to stay on the origin story a little bit more.
When you started out, I would say the status quo at the time was that every paper would come out and report their numbers versus competitor numbers. And that's basically it. And I remember I did the legwork. I think everyone had some version of an Excel sheet or a Google Sheet where you just copy and paste the numbers from every paper and post it up there. And then sometimes they don't line up, because when they're independently run, your reproductions of other people's numbers are going to look worse, because you don't hold their models correctly or whatever the excuse is. I think then Stanford HELM, Percy Liang's project, would also have some of these numbers. And I don't know if there's any other source that you can cite. If I were to start Artificial Analysis at the same time you guys started, I would have used EleutherAI's eval harness. Yup.Micah [00:08:06]: Yup. That was some cool stuff. At the end of the day, running these evals, if it's a simple Q&A eval, all you're doing is asking a list of questions and checking if the answers are right, which shouldn't be that crazy. But it turns out there are an enormous number of things that you've got to control for. And I mean, back when we started the website... Yeah. One of the reasons why we realized that we had to run the evals ourselves and couldn't just take results from the labs was that they would all prompt the models differently. And when you're competing over a few points, then you can pretty easily get... You can put the answer into the model. Yeah. That, in the extreme. And you get crazy cases, like back when Google launched Gemini 1.0 Ultra and needed a number that would say it was better than GPT-4, and constructed, I think never published, chain-of-thought examples, 32 of them, in every topic in MMLU, to run it to get the score. There are so many things that you... They never shipped Ultra, right? That's the one that never made it out. Not widely. Yeah. Yeah. I mean, I'm sure it existed, but yeah. So we were pretty sure that we needed to run them ourselves and just run them in the same way across all the models. Yeah. And we were also certain from the start that you couldn't look at those in isolation. You needed to look at them alongside the cost and performance stuff. Yeah.swyx [00:09:24]: Okay. A couple of technical questions. I mean, so obviously I also thought about this, and I didn't do it because of cost. Yep. Did you not worry about costs? Were you funded already? Clearly not, but you know. No. Well, we definitely weren't at the start.Micah [00:09:36]: So, I mean, we were paying for it personally at the start. That's a lot of money. Well, the numbers weren't nearly as bad a couple of years ago. So we certainly incurred some costs, but we were probably in the order of hundreds of dollars of spend across all the benchmarking that we were doing. Yeah. So nothing. Yeah. It was kind of fine. Yeah. These days that's gone up an enormous amount for a bunch of reasons that we can talk about. But it wasn't that bad, because you've also got to remember that the number of models we were dealing with was hardly any, and the complexity of the stuff that we wanted to do to evaluate them was a lot less. We were just asking some Q&A-type questions, and one specific thing was that for a lot of evals initially, we were just sampling an answer.
You know, like, what's the answer for this? We just wanted the answer directly, without letting the models think; we weren't even doing chain-of-thought stuff initially. And that was the most useful way to get some results initially. Yeah.swyx [00:10:33]: And so for people who haven't done this work, literally parsing the responses is a whole thing, right? Because the models can answer any way they see fit, and sometimes they actually do have the right answer, but they just returned the wrong format, and they will get a zero for that unless you work it into your parser. And that involves more work. And so, I mean, there's an open question whether you should give it points for not following your instructions on the format.Micah [00:11:00]: It depends what you're looking at, right? Because if you're trying to see whether or not it can solve a particular type of reasoning problem, and you don't want to test its ability to do answer formatting at the same time, then you might want to use an LLM-as-answer-extractor approach to make sure that you get the answer out no matter how it's answered. But these days, it's mostly less of a problem. If you instruct a model and give it examples of what the answers should look like, it can get the answers in your format, and then you can do a simple regex.swyx [00:11:28]: Yeah, yeah. And then there's other questions around, I guess, sometimes if you have a multiple-choice question, sometimes there's a bias towards the first answer, so you have to randomize the options. All these nuances: once you dig into benchmarks, you're like, I don't know how anyone believes the numbers on all these things. It's such dark magic.Micah [00:11:47]: You've also got the different degrees of variance in different benchmarks, right? Yeah. So, if you run a four-option multiple-choice eval on a modern reasoning model at the temperatures suggested by the labs for their own models, the variance that you can see is pretty enormous if you only do a single run of it, especially if it has a small number of questions. So one of the things that we do is run an enormous number of repeats of all of our evals when we're developing new ones and doing upgrades to our intelligence index to bring in new things. Yeah. So that we can dial in the right number of repeats to get to the 95% confidence intervals that we're comfortable with, so that when we pull it all together, we can be confident the intelligence index is at least as tight as plus or minus one point at 95% confidence. Yeah.swyx [00:12:32]: And, again, that just adds a straight multiple to the cost. Oh, yeah. Yeah, yeah.George [00:12:37]: So, that's one of many reasons that cost has gone up a lot more than linearly over the last couple of years. We report a cost to run the Artificial Analysis Intelligence Index on our website, and currently that assumes one repeat in terms of how we report it, because we want to reflect a bit about the weighting of the index. But our cost is actually a lot higher than what we report there because of the repeats.swyx [00:13:03]: Yeah, yeah, yeah. And probably this is true, but just checking: you don't have any special deals with the labs. They don't discount it. You just pay out of pocket or out of your sort of customer funds. Oh, there is a mix.
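To make those two mechanics concrete, here is a minimal sketch with hypothetical numbers and function names (a 200-question eval at roughly 70% accuracy; none of this is Artificial Analysis's actual code): extracting a letter answer with a simple regex once the model has been instructed on format, and dialing in the repeat count until the 95% confidence interval tightens to about plus or minus one point.

```python
import re
import math

# Hypothetical: extract a letter answer like "Answer: B" from a model response.
# Assumes the prompt instructed the model to end with "Answer: <letter>".
ANSWER_RE = re.compile(r"Answer:\s*([ABCD])\b", re.IGNORECASE)

def extract_answer(response: str) -> str | None:
    """Return the last 'Answer: X' letter, or None if the format wasn't followed."""
    matches = ANSWER_RE.findall(response)
    return matches[-1].upper() if matches else None

def ci95_halfwidth(p: float, n_questions: int, n_repeats: int) -> float:
    """95% CI half-width (in percentage points) for a benchmark score,
    treating each of n_questions * n_repeats graded answers as a Bernoulli draw."""
    n = n_questions * n_repeats
    se = math.sqrt(p * (1 - p) / n)   # standard error of the mean score
    return 1.96 * se * 100            # scale to percentage points

def repeats_needed(p: float, n_questions: int, target_pp: float = 1.0) -> int:
    """Smallest repeat count whose 95% CI half-width is <= target_pp points."""
    r = 1
    while ci95_halfwidth(p, n_questions, r) > target_pp:
        r += 1
    return r

# Example: a 200-question multiple-choice eval where the model scores ~70%.
print(ci95_halfwidth(0.70, 200, 1))  # one run: ~6.35 points at 95%
print(repeats_needed(0.70, 200))     # ~41 repeats for +/- 1 point
```

Note how the repeat count multiplies cost directly, which is the "straight multiple to the cost" swyx points out next.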
So, the issue is that sometimes they may give you a special endpoint, which is... Ah, 100%.Micah [00:13:21]: Yeah, yeah, yeah. Exactly. So, we laser-focus, in everything we do, on having the best independent metrics and making sure that no one can manipulate them in any way. There are quite a lot of processes we've developed over the last couple of years to make that true, like for the one you bring up right here: if we're working with a lab and they're giving us a private endpoint to evaluate a model, it is totally possible that what's sitting behind that black box is not the same as what they serve on a public endpoint. We're very aware of that. We have what we call a mystery shopper policy. And we're totally transparent with all the labs we work with about this: we will register accounts not on our own domain and run both intelligence evals and performance benchmarks... Yeah, that's the job. ...without them being able to identify it. And no one's ever had a problem with that. Because a thing that turns out to actually be quite a good factor in the industry is that they all want to believe that none of their competitors could manipulate what we're doing either.swyx [00:14:23]: That's true. I never thought about that. I was in the database and data industry prior, and there's a lot of shenanigans around benchmarking, right? So I'm just kind of going through the mental laundry list. Did I miss anything else in this category of shenanigans? Oh, potential shenanigans.Micah [00:14:36]: I mean, okay, the biggest one that I'll bring up is more of a conceptual one, actually, than direct shenanigans. It's that the things that get measured become the things that get targeted by the labs in what they're trying to build, right? Exactly. So that doesn't mean anything that we should really call shenanigans. I'm not talking about training on the test set. But if you know that you're going to be graded on a particular thing, and you're a researcher, there are a whole bunch of things that you can do to try to get better at that thing. Preferably those things are also going to be helpful for the wide range of ways actual users want to use what you're building, but they will not necessarily do that. So, for instance, the models are exceptional now at answering competition maths problems. There is some relevance of that type of reasoning, that type of work, to how we might use modern coding agents and stuff. But it's clearly not one-for-one. So the thing that we have to be aware of is that once an eval becomes the thing that everyone's looking at, scores can get better on it without that reflecting the overall generalized intelligence of these models getting better. That has been true for the last couple of years. It'll be true for the next couple of years. There's no silver bullet to defeat that, other than building new stuff to stay relevant and measure the capabilities that matter most to real users. Yeah.swyx [00:15:58]: And we'll cover some of the new stuff that you guys are building as well, which is cool. You used to just run other people's evals, but now you're coming up with your own. And I think, obviously, that is a necessary path once you're at the frontier. You've exhausted all the existing evals. I think the next point in history that I have for you is AI Grant, which you guys decided to join, and you moved here. What was it like? I think you were in, like, batch two? Batch four. Batch four.
Okay.Micah [00:16:26]: I mean, it was great. Nat and Daniel are obviously great. And it's a really cool group of companies that we were in AI Grant alongside. It was really great to get Nat and Daniel on board. Obviously, they've done a whole lot of great work in the space with a lot of leading companies and were extremely aligned with the mission of what we were trying to do. We're not quite typical of a lot of the other AI startups that they've invested in.swyx [00:16:53]: And they were very much here for the mission of what we want to do. Did they give any advice that really affected you in some way, or were any of the events very impactful? That's an interesting question.Micah [00:17:03]: I mean, I remember fondly a bunch of the speakers who came and did fireside chats at AI Grant.swyx [00:17:09]: Which is also, like, a crazy list. Yeah.George [00:17:11]: Oh, totally. Yeah, yeah, yeah. There was something about speaking to Nat and Daniel about the challenges of working through a startup: working through the questions that don't have clear answers, how to work through those methodically, and just working through the hard decisions. And they've been great mentors to us as we've built Artificial Analysis. Another benefit for us was that other companies in the batch, and other companies in AI Grant, are pushing the capabilities of what AI can do at this time. And so being in contact with them, making sure that Artificial Analysis is useful to them, has been fantastic for supporting us in working out how we should build out Artificial Analysis to continue being useful to those building on AI.swyx [00:17:59]: I think to some extent, I'm of mixed opinion on that one, because to some extent, your target audience is not people in AI Grant who are obviously at the frontier. Yeah. Do you disagree?Micah [00:18:09]: To some extent. To some extent. But then, a lot of what the AI Grant companies are doing is taking capabilities coming out of the labs and trying to push the limits of what they can do across the entire stack for building great applications, which actually makes some of them pretty archetypal power users of Artificial Analysis. Some of the people with the strongest opinions about what we're doing well and what we're not doing well and what they want to see next from us. Yeah. Because when you're building any kind of AI application now, chances are you're using a whole bunch of different models. You're maybe switching reasonably frequently between different models for different parts of your application, to optimize what you're able to do with them at an accuracy level and to get better speed and cost characteristics. So many of them, no, they're not commercial customers of ours; we don't charge for all our data on the website. Yeah. But they are absolutely some of our power users.swyx [00:19:07]: So let's talk about just the evals as well. So you started out from the general MMLU and GPQA stuff. What's next? How do you sort of build up to the overall index? What was in V1, and how did you evolve it? Okay.Micah [00:19:22]: So first, just as background, we're talking about the Artificial Analysis Intelligence Index, which is our synthesis metric that we pull together currently from 10 different eval datasets to give what we're pretty confident is the best single number to look at for how smart the models are.
Obviously, it doesn't tell the whole story. That's why we published the whole website of all the charts, to dive into every part of it and look at the trade-offs. But best single number. So right now, it's got a bunch of Q&A-type datasets that have been very important to the industry, like a couple that you just mentioned. It's also got a couple of agentic datasets. It's got our own long context reasoning dataset and some other use-case-focused stuff. As time goes on, the things that we're most interested in, the capabilities that are becoming more important for AI and that developers care about, are going to be first around agentic capabilities. So surprise, surprise: we're all loving our coding agents, and how the models perform in them, and then doing similar things for different types of work, are really important to us. Linking to use cases, to economically valuable use cases, is extremely important to us. And then we've got some of the things that the models still struggle with, like working really well over long contexts, that are not going to go away as specific capabilities and use cases that we need to keep evaluating.swyx [00:20:46]: But I guess one thing I was driving at was the V1 versus the V2 and how it aged over time.Micah [00:20:53]: Like how we've changed the index to where we are.swyx [00:20:55]: And I think that reflects the change in the industry. Right. So that's a nice way to tell that story.Micah [00:21:00]: Well, V1 would be completely saturated right now by almost every model coming out, because doing things like writing the Python functions in HumanEval is now pretty trivial. It's easy to forget, actually, how much progress has been made in the last two years. We obviously play the game constantly of today's version versus last week's version and the week before, and all of the small changes in the horse race between the current frontier, and who has the best smaller-than-10B model right now this week. Right. And that's very important to a lot of developers and people, especially in this particular city of San Francisco. But when you zoom out: a couple of years ago, literally most of what we were doing to evaluate the models then would all be 100% solved by even pretty small models today. And that's been one of the key things, by the way, that's driven down the cost of intelligence at every tier of intelligence, which we can talk about more in a bit. So V1, V2, V3: we made things harder, we covered a wider range of use cases, and we tried to get closer to things developers care about, as opposed to just the Q&A-type stuff that MMLU and GPQA represented. Yeah.swyx [00:22:12]: I don't know if you have anything to add there. Or we could just go right into showing people the benchmark and looking around and asking questions about it. Yeah.Micah [00:22:21]: Let's do it. Okay. This would be a pretty good way to chat about a few of the new things we've launched recently. Yeah.George [00:22:26]: And I think a little bit about the direction that we want to take it, where we want to push benchmarks. Currently, the intelligence index and evals focus a lot on raw intelligence, but we want to diversify how we think about intelligence. And we can talk about it, but the new evals that we've built and partnered on focus on topics like hallucination. And we've got a lot of topics that I think are not covered by the current eval set that should be.
And so we want to bring that forth. But before we get into that...swyx [00:23:01]: And so for listeners, just as a timestamp, right now number one is Gemini 3 Pro High, followed by Claude Opus at 70, GPT-5.1 high (you don't have 5.2 yet), and Kimi K2 Thinking. Wow. Still hanging in there. So those are the top four. That will date this podcast quickly. Yeah. Yeah. I mean, I love it. No, no. 100%. Look back this time next year and go, how cute. Yep.George [00:23:25]: Totally. A quick view of that... okay, there's a lot. I love this chart. Yeah.Micah [00:23:30]: This is such a favorite, right? Yeah. In almost every talk that George or I give at conferences and stuff, we put this one up first to situate where we are in this moment in history. This, I think, is the visual version of what I was saying before about zooming out and remembering how much progress there's been. If we go back to just over a year ago, before o1, before Claude Sonnet 3.5, we didn't have reasoning models or coding agents as a thing. And the game was very, very different. If we go back even a little bit before then, we're in the era where, when you look at this chart, OpenAI was untouchable for well over a year. And, I mean, you would remember that time period well: there were very open questions about whether or not AI was going to be competitive, full stop; whether or not OpenAI would just run away with it; whether we would have a few frontier labs and no one else would really be able to do anything other than consume their APIs. I am quite happy overall that the world that we have ended up in is one where... Multi-model. Absolutely. And strictly more competitive every quarter over the last few years. Yeah. This year has been insane. Yeah.George [00:24:42]: You can see it. This chart with everything added is hard to read currently. There are so many dots on it, but I think it reflects a little bit what we felt, like how crazy it's been.swyx [00:24:54]: Why 14 as the default? Is that a manual choice? Because you've got ServiceNow in there, which is a less traditional name. Yeah.George [00:25:01]: It's models that we're highlighting by default in our charts, in our intelligence index. Okay.swyx [00:25:07]: You just have a manually curated list of stuff.George [00:25:10]: Yeah, that's right. But something that I actually don't think every Artificial Analysis user knows is that you can customize our charts and choose which models are highlighted. Yeah. And so if we take off a few names, it gets a little easier to read.swyx [00:25:25]: Yeah, yeah. A little easier to read. Totally. Yeah. But I love that you can see the o1 jump. Look at that. September 2024. And the DeepSeek jump. Yeah.George [00:25:34]: Which got close to OpenAI's leadership. They were so close. I think, yeah, we remember that moment. Around this time last year, actually.Micah [00:25:44]: Yeah, yeah, yeah. I agree. Yeah, well, give or take a couple of weeks. It was Boxing Day in New Zealand when DeepSeek V3 came out. And we'd been tracking DeepSeek and a bunch of the other global players that were less known over the second half of 2024, and had run evals on the earlier ones and stuff. I very distinctly remember Boxing Day in New Zealand, because I was with family for Christmas and stuff, running the evals and getting back result by result on DeepSeek V3. So this was the first of their V3 architecture, the 671B MoE.Micah [00:26:19]: And we were very, very impressed.
That was the moment when we were sure that DeepSeek was no longer just one of many players, but had jumped up to be a thing. The world really noticed when they followed that up with the RL working on top of V3 and R1 succeeding a few weeks later. But the groundwork for that absolutely was laid with an extremely strong base model, completely open weights, which we had as the best open weights model. So, yeah, that's the thing that you really see in the chart. It kept us busy on Boxing Day last year.George [00:26:48]: Boxing Day is the day after Christmas, for those not familiar.swyx [00:26:54]: I'm from Singapore. A lot of us remember Boxing Day for a different reason, for the tsunami that happened. Oh, of course. Yeah, but that was a long time ago. So yeah. So this is the rough pitch of AAQI. Is it A-A-Q-I or A-A-I-I? I-I. Okay. Good memory, though.Micah [00:27:11]: I don't know, I'm not used to it. Once upon a time, we did call it Quality Index, and we would talk about quality, performance, and price, but we changed it to intelligence.George [00:27:20]: There have been a few naming changes. We added hardware benchmarking to the site, and so benchmarks at a kind of system level. And so then we changed our throughput metric; we now call it output speed, becauseswyx [00:27:32]: throughput makes sense at a system level, so we took that name. Take me through more charts. What should people know? Obviously, the way you look at the site is probably different than how a beginner might look at it.Micah [00:27:42]: Yeah, that's fair. There's a lot of fun stuff to dive into. Maybe we can get past all the... like, we have lots and lots of evals and stuff. The interesting ones to talk about today are a few of our recent things that probably not many people will be familiar with yet. So the first one of those is our Omniscience Index. This one is a little bit different to most of the intelligence evals that we've run. We built it specifically to look at the embedded knowledge in the models and to test hallucination, by looking at, when the model doesn't know the answer, so it's not able to get it correct, what's its probability of saying, I don't know, versus giving an incorrect answer. So the metric that we use for Omniscience goes from negative 100 to positive 100, because we're simply taking off a point if you give an incorrect answer to the question. We're pretty convinced that this is an example of where it makes most sense to do that, because it's strictly more helpful to say, I don't know, instead of giving a wrong answer to a factual knowledge question. And one of our goals is to shift the incentive that evals create for models and the labs creating them to get higher scores. Almost every eval across all of AI up until this point has been graded by simple percentage correct as the main metric, the main thing that gets hyped. And so you should take a shot at everything. There's no incentive to say, I don't know. So we did that for this one here.swyx [00:29:22]: I think there's a general field of calibration as well, like the confidence in your answer versus the rightness of the answer. Yeah, we completely agree. Yeah. Yeah.George [00:29:31]: On that: one reason that we didn't put that into this index is that we think the way to do that is not to ask the models how confident they are.swyx [00:29:43]: I don't know. Maybe it might be, though.
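The scoring rule Micah describes above (lose a point for a wrong answer, no penalty for saying "I don't know", scaled to run from -100 to +100) can be sketched in a few lines; the data structure and the example split below are hypothetical, not Artificial Analysis's implementation.

```python
from dataclasses import dataclass

@dataclass
class GradedAnswer:
    correct: bool    # did the model give the right answer?
    abstained: bool  # did it say "I don't know" instead of answering?

def omniscience_style_score(answers: list[GradedAnswer]) -> float:
    """Score from -100 to +100: +1 per correct answer, -1 per wrong answer,
    0 for abstentions, averaged and scaled. Abstaining strictly beats guessing wrong."""
    points = 0
    for a in answers:
        if a.abstained:
            continue            # no reward, no penalty
        points += 1 if a.correct else -1
    return 100 * points / len(answers)

# Example: 60 correct, 20 wrong, 20 abstentions -> (60 - 20) / 100 = +40.
answers = ([GradedAnswer(True, False)] * 60
           + [GradedAnswer(False, False)] * 20
           + [GradedAnswer(False, True)] * 20)
print(omniscience_style_score(answers))  # 40.0
```

Under this rule, a model that answers 60% correctly and declines the rest scores +60, while one that answers 60% correctly and guesses wrong on the rest scores +20, which is exactly the incentive shift toward abstention they describe.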
You put it like a JSON field, say confidence, and maybe it spits out something. Yeah. You know, we have done a few evals podcasts over the years, and when we did one with Clémentine of Hugging Face, who maintains the Open LLM Leaderboard, this was one of her top requests: some kind of hallucination slash lack-of-confidence calibration thing. And so, hey, this is one of them.Micah [00:30:05]: And, like anything that we do, it's not a perfect metric or the whole story of everything that you think about as hallucination. But yeah, it's pretty useful and has some interesting results. One of the things that we saw in the hallucination rate is that Anthropic's Claude models are at the very left-hand side here, with the lowest hallucination rates out of the models that we've evaluated Omniscience on. That is an interesting fact. I think it probably correlates with a lot of the previously not-really-measured vibes stuff that people like about some of the Claude models. Is the dataset public, or is there a held-out set? There's a held-out set for this one. So we have published a public test set, but we've only published 10% of it. The reason is that for this one specifically, it would be very, very easy to have data contamination, because it is just factual knowledge questions. We'll update it over time to also prevent that, but yeah, we've kept most of it held out so that we can keep it reliable for a long time. It leads us to a bunch of really cool things, including breaking it down quite granularly by topic. And so we've got some of that disclosed on the website publicly right now, and there's lots more coming in terms of our ability to break out very specific topics. Yeah.swyx [00:31:23]: I would be interested. Let's dwell a little bit on this hallucination one. I noticed that Haiku hallucinates less than Sonnet, which hallucinates less than Opus. Would that be the other way around in a normal capability environment? I don't know. What do you make of that?George [00:31:37]: One interesting aspect is that we've found that there's not really a strong correlation between intelligence and hallucination. That's to say, how smart the models are in a general sense isn't correlated with their ability, when they don't know something, to say that they don't know. It's interesting that Gemini 3 Pro Preview was a big leap over Gemini 2.5 Flash and 2.5 Pro. And if I add Pro quickly here...swyx [00:32:07]: I bet Pro's really good. Uh, actually no, I meant the GPT Pros.George [00:32:12]: Oh yeah.swyx [00:32:13]: Because the GPT Pros are rumored, we don't know for a fact, to be like eight runs with an LLM judge on top. Yeah.George [00:32:20]: So we saw a big jump in... this is accuracy, so this is just the percent that they get correct, and Gemini 3 Pro knew a lot more than the other models. So a big jump in accuracy, but relatively no change in the hallucination rate between the Google Gemini models across releases. Exactly. And so it's likely just a different post-training recipe in the Claude models. Yeah.Micah [00:32:45]: That's what's driven this. Yeah. You can partially blame us, and how we define intelligence, for having until now not defined hallucination as a negative in the way that we think about intelligence.swyx [00:32:56]: And so that's what we're changing.
I know many smart people who are confidently incorrect.George [00:33:02]: Look at that. That is very human. Very true. And there's a time and a place for that. I think our view is that hallucination rate makes sense in this context, where it's around knowledge, but in many cases people want the models to hallucinate, to have a go. Often that's the case in coding, or when you're trying to generate newer ideas. One eval that we added to Artificial Analysis is Critical Point, and it's really hard physics problems. Okay.swyx [00:33:32]: And is it sort of like a HumanEval type or something different, or like a FrontierMath type?George [00:33:37]: It's not dissimilar to FrontierMath. These are research questions that academics in the physics world would be able to answer, but models really struggle to answer. So the top score here is only 9%.swyx [00:33:51]: And the people that created this, like Minway, and actually Ofir, who was kind of behind SWE-bench. And what organization is this? Oh, it's Princeton.George [00:34:01]: A range of academics from different academic institutions, really smart people. They talked about how they turn the models up in terms of temperature, as high a temperature as they can, when they're trying to explore new ideas in physics with the model as a thought partner, just because they want the models to hallucinate. Sometimes it's something new. Yeah, exactly.swyx [00:34:21]: So not right in every situation, but I think it makes sense, you know, to test hallucination in scenarios where it makes sense. Also, the obvious question: this is one of many. Every lab has a system card that shows some kind of hallucination number, and you've chosen to not endorse that, and you've made your own. And I think that's a choice. Totally. In some sense, the rest of Artificial Analysis is public benchmarks that other people can independently rerun; you provide it as a service here. You have to fight the, well, who are we to do this? And your answer is that we have a lot of customers, you know. But, like, I guess, how do you convince the individual?Micah [00:35:08]: I mean, I think for hallucinations specifically, there are a bunch of different things that you might reasonably care about, and that you'd measure quite differently. We've called this the Omniscience hallucination rate, not trying to declare it, like, humanity's last hallucination eval. You could have some interesting naming conventions and all this stuff. The bigger-picture answer, and it's something that I actually wanted to mention just as George was explaining Critical Point as well, is that as we go forward, we are building evals internally, and we're partnering with academia and with AI companies to build great evals. We have pretty strong views, in various ways for different parts of the AI stack, on where there are things that are not being measured well, or things that developers care about that should be measured more and better. And we intend to be doing that. We're not obsessed with the idea that everything we do, we have to do entirely within our own team. Critical Point is a cool example, where we were a launch partner for it, working with academia, and we've got some partnerships coming up with a couple of leading companies.
Those ones, obviously, we have to be careful with on some of the independence stuff, but with the right disclosure, we're completely comfortable with that. A lot of the labs have released great datasets in the past that we've used to great success independently. And so between all of those approaches, we're going to be releasing more stuff in the future. Cool.swyx [00:36:26]: Let's cover the last couple. And then I want to talk about your trends analysis stuff, you know? Totally.Micah [00:36:31]: So on that, actually, I have one little factoid on Omniscience. If you go back up to accuracy on Omniscience: an interesting thing about this accuracy metric is that it tracks, more closely than anything else that we measure, the total parameter count of models. Makes a lot of sense intuitively, right? Because this is a knowledge eval. This is the pure knowledge metric. We're not looking at the index and the hallucination rate stuff, which we think is much more about how the models are trained. This is just: what facts did they recall? And yeah, it tracks parameter count extremely closely. Okay.swyx [00:37:05]: What's the rumored size of Gemini 3 Pro? And to be clear, not confirmed by any official source, just rumors. But rumors do fly around. Rumors. I hear all sorts of numbers. I don't know what to trust.Micah [00:37:17]: So if you draw the line on Omniscience accuracy versus total parameters, we've got all the open weights models, and you can squint and see that likely the leading frontier models right now are quite a lot bigger than the one trillion parameters that the open weights models we're looking at here cap out at. There's an interesting extra data point that Elon Musk revealed recently about xAI: three trillion parameters for Grok 3 and 4, six trillion for Grok 5, but that's not out yet. Take those together, have a look, and you might reasonably form a view that there's a pretty good chance that Gemini 3 Pro is bigger than that, that it could be in the 5 to 10 trillion parameter range. To be clear, I have absolutely no idea, but just based on this chart, that's where you would land if you have a look at it. Yeah.swyx [00:38:07]: And to some extent, I actually kind of discourage people from guessing too much, because what does it really matter? As long as they can serve it at a sustainable cost, that's about it. Yeah, totally.George [00:38:17]: They've also got different incentives in play compared to open weights models, which are thinking about supporting others in self-deployment. For the labs who are doing inference at scale, it's, I think, less about total parameters in many cases when thinking about inference costs, and more about the number of active parameters. And so there's a bit of an incentive towards larger, sparser models. Agreed.Micah [00:38:38]: Understood. Yeah. Great. I mean, obviously if you're a developer or a company using these things, it's exactly as you say: it doesn't matter. You should be looking at all the different ways that we measure intelligence. You should be looking at the cost to run the index and the different ways of thinking about token efficiency and cost efficiency based on the list prices, because that's all that matters.swyx [00:38:56]: It's not as good for the content creator rumor mill, where I can say, oh, GPT-4 is this small circle; look, GPT-5 is this big circle. That used to be a thing for a while.
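Micah's squint-at-the-line reasoning can be made explicit. A rough sketch with made-up accuracy numbers standing in for the open weights points on the chart (only the method, a log-linear fit and its inversion, is the point here; none of these inputs are real measurements):

```python
import math

# Hypothetical (total params, accuracy %) points for open weights models.
# Invented for illustration; NOT real Omniscience measurements.
points = [
    (7e9, 18.0),
    (70e9, 30.0),
    (400e9, 38.0),
    (1e12, 43.0),
]

# Least-squares fit of accuracy ~ a * log10(params) + b.
xs = [math.log10(p) for p, _ in points]
ys = [acc for _, acc in points]
n = len(points)
xbar, ybar = sum(xs) / n, sum(ys) / n
a = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sum((x - xbar) ** 2 for x in xs)
b = ybar - a * xbar

def implied_params(accuracy: float) -> float:
    """Invert the fit: what parameter count would a given accuracy imply?"""
    return 10 ** ((accuracy - b) / a)

# A frontier model scoring well above the 1T-parameter trend line would,
# under this (very rough) fit, imply a much larger parameter count.
print(f"{implied_params(52.0):.2e}")  # lands in the single-digit trillions
```

With these invented inputs, the inversion lands in the single-digit-trillions range; that is the shape of the inference being described, not a real estimate.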
Yeah.Micah [00:39:07]: But that is, on its own, actually a very interesting one, right? Chances are the last couple of years haven't seen a dramatic scaling up in the total size of these models. And so there's a lot of room to go up properly in total model size, especially with the upcoming hardware generations. Yes.swyx [00:39:29]: So, you know, taking off my shitposting hat for a minute: at the same time, I do feel like, especially coming back from Europe, people do feel like Ilya is probably right that the paradigm doesn't have many more orders of magnitude to scale, and therefore we need to start exploring at least a different path. GDPval, I think, is only like a month or so old. I was also very positive on it when it first came out. I actually talked to Tejal, who was the lead researcher on that. Oh, cool. And you have your own version.George [00:39:59]: It's a fantastic dataset. Yeah.swyx [00:40:01]: And maybe we'll recap it for people who are still out of the loop. It's like 44 occupations based on some kind of GDP cutoff, meant to represent broad white-collar work that is not just coding. Yeah.Micah [00:40:12]: Each of the tasks has a whole bunch of detailed instructions, and input files for a lot of them. The 44 occupations are divided into, like, 220 to 225 subtasks, maybe, which are the level that we run through the agentic harness. And yeah, they're really interesting. I will say that it doesn't necessarily capture all the stuff that people do at work. No eval is perfect; there are always going to be more things to look at, largely because in order to make the tasks well enough defined that you can run them, they need to only have a handful of input files and very specific instructions for each task. And so I think the easiest way to think about them is that they're like quite hard take-home exam tasks that you might do in an interview process.swyx [00:40:56]: Yeah, for listeners, it is no longer like a long prompt. It's like, well, here's a zip file with a spreadsheet or a PowerPoint deck or a PDF; go nuts and answer this question.George [00:41:06]: OpenAI released a great dataset, and they released a good paper which looks at performance across the different web chatbots on the dataset. It's a great paper; I encourage people to read it. What we've done is taken that dataset and turned it into an eval that can be run on any model. So we created a reference agentic harness that can run the models on the dataset, and then we developed an evaluator approach to compare outputs. It's AI-enabled, so it uses Gemini 3 Pro Preview to compare results, which we tested pretty comprehensively to ensure that it's aligned to human preferences. One data point there is that even with Gemini 3 Pro as the evaluator, Gemini 3 Pro interestingly doesn't actually do that well on the eval itself. So that's kind of a good example of what we've done in GDPval AA.swyx [00:42:01]: Yeah, the thing that you have to watch out for with LLM judges is self-preference, that models usually prefer their own output, and in this case, it was not so.
Totally.Micah [00:42:08]: I think the way that we're thinking about the places where it makes sense to use an LLM-as-judge approach now is quite different to some of the early LLM-as-judge stuff a couple of years ago. Some of that, and MT-Bench was a great project that was a good example of this a while ago, was about judging conversations and a lot of style-type stuff. Here, the task that the grading model is doing is quite different to the task of taking the test. When you're taking the test, you've got all of the agentic tools you're working with, the code interpreter and web search, the file system, to go through many, many turns to try to create the documents. Then on the other side, when we're grading it, we're running it through a pipeline to extract visual and text versions of the files to be able to provide them to Gemini, and we're providing the criteria for the task and getting it to pick which of two potential outputs more effectively meets the criteria. Yeah. It turns out, and we proved this, that it's just very, very good at getting that right; it matched human preference a lot of the time. I think that's because it's got the raw intelligence, but it's combined with the correct representation of the outputs, the fact that the outputs were created with an agentic task that is quite different to the way the grading model works, and that we're comparing against criteria, not just zero-shot asking the model to pick which one is better.swyx [00:43:26]: Got it. Why is this an ELO and not a percentage, like GDPval?George [00:43:31]: So the outputs look like documents, and there are video outputs or audio outputs from some of the tasks. It has to make a video? Yeah, for some of the tasks. Some of the tasks.swyx [00:43:43]: What task is that?George [00:43:45]: I mean, it's in the dataset. Like be a YouTuber? It's a marketing video.Micah [00:43:49]: Oh, wow. What? Like the model has to go find clips on the internet and try to put it together. The models are not that good at doing that one, for now, to be clear. It's pretty hard to do that with a code editor. I mean, the computer-use stuff doesn't work quite well enough, and so on and so on, but yeah.George [00:44:02]: And so there's no ground truth, necessarily, to compare against to work out percentage correct. It's hard to come up with correct or incorrect there. And so it's on a relative basis, and we use an ELO approach to compare outputs from each of the models across the tasks.swyx [00:44:23]: You know what you should do? You should pay a contractor, a human, to do the same task, and then give them an ELO, so you have the human in there. I think what's helpful about GDPval, the OpenAI one, is that 50% is meant to be a normal human, and maybe a domain expert is higher than that, but 50% was the bar: if you've crossed 50, you are superhuman. Yeah.Micah [00:44:47]: So we haven't grounded this score in that exactly. I agree that it can be helpful, but we wanted to generalize this to a very large number of models. It's one of the reasons that presenting it as an ELO is quite helpful: it allows us to add models, and it'll stay relevant for quite a long time. I also think it can be tricky comparing these exact tasks to human performance, because the way that you would go about them as a human is quite different to how the models would go about them.
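A minimal sketch of the relative-scoring approach George describes: standard Elo updates driven by pairwise judge decisions. The K-factor, the starting rating, and the judgments below are illustrative assumptions, not Artificial Analysis's disclosed parameters.

```python
from collections import defaultdict

K = 16  # update step size; a conventional Elo choice, assumed here

def expected(r_a: float, r_b: float) -> float:
    """Probability model A's output is preferred, under the Elo logistic model."""
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))

def update(ratings: dict, model_a: str, model_b: str, a_won: bool) -> None:
    """Apply one pairwise judgment (e.g., an LLM judge picking which of two
    task outputs better meets the task criteria)."""
    ea = expected(ratings[model_a], ratings[model_b])
    score_a = 1.0 if a_won else 0.0
    ratings[model_a] += K * (score_a - ea)
    ratings[model_b] += K * ((1.0 - score_a) - (1.0 - ea))

# Hypothetical judgments: (model_a, model_b, did A win?)
judgments = [
    ("model-x", "model-y", True),
    ("model-y", "model-z", True),
    ("model-x", "model-z", True),
    ("model-y", "model-x", False),
]

ratings: dict = defaultdict(lambda: 1000.0)  # every model starts at 1000
for a, b, a_won in judgments:
    update(ratings, a, b, a_won)
print(dict(ratings))
```

Because a new model only needs fresh pairwise comparisons against existing ones, the ratings can be extended without re-running the whole field, which is the "allows us to add models" property Micah mentions.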
Yeah.swyx [00:45:15]: I also liked that you included Llama 4 Maverick in there. Is that just one last, like...Micah [00:45:20]: Well, no, no, no, it is the best model released by Meta. And... so it makes it into the homepage default set, still, for now.George [00:45:31]: Another inclusion that's quite interesting: we also ran it across the latest versions of the web chatbots. And so we have...swyx [00:45:39]: Oh, that's right.George [00:45:40]: Oh, sorry.swyx [00:45:41]: I, yeah, I completely missed that. Okay.George [00:45:43]: No, not at all. So that's the one with the checkered pattern. So that is their harness, not yours, is what you're saying. Exactly. And what's really interesting is that if you compare, for instance, Claude 4.5 Opus using the Claude web chatbot, it performs worse than the model in our agentic harness. And in every case, the model performs better in our agentic harness than its web chatbot counterpart, the harness that they created.swyx [00:46:13]: My backwards explanation for that would be that, well, it's meant for consumer use cases, and here you're pushing it for something else.Micah [00:46:19]: The constraints are different, and the amount of freedom that you can give the model is different. Also, they have a cost goal. We let the models work as long as they want, basically. Yeah. Do you copy-paste manually into the chatbot? Yeah. Yeah. That was how we got the chatbot reference. We're not going to be keeping those updated at quite the same scale as the hundreds of models.swyx [00:46:38]: Well, I don't know, talk to Browserbase. They'll automate it for you. You know, I have thought about how we should turn these chatbot versions into an API, because they are legitimately different agents in themselves. Yes. Right. Yeah.Micah [00:46:53]: And that's grown a huge amount over the last year, right? The tools that are available have actually diverged, in my opinion, a fair bit across the major chatbot apps, and the number of data sources that you can connect them to has gone up a lot, meaning that your experience and the way you're using the model is more different than ever.swyx [00:47:10]: What tools and what data connections come to mind? What's interesting, what's notable work that people have done?Micah [00:47:15]: Oh, okay. So my favorite example on this is that until very recently, I would argue that it was basically impossible to get an LLM to draft an email for me in any useful way. Because most times that you're sending an email, you're not just writing something for the sake of writing it. Chances are the context required is a whole bunch of historical emails. Maybe it's notes that you've made, maybe it's meeting notes, maybe it's pulling something from wherever you store stuff at work. So for me, that's Google Drive, OneDrive, and our Supabase databases if we need to do some analysis on some data or something. Preferably the model can be plugged into all of those things and can go do some useful work based on them. The things that I find most impressive currently, that I am somewhat surprised work really well in late 2025, are that I can have models use the Supabase MCP to query, read-only of course, and run a whole bunch of SQL queries to do pretty significant data analysis, and make charts and stuff, and read my Gmail and my Notion. And okay. You actually use that. That's good. Is that a Claude thing?
To various degrees, both ChatGPT and Claude right now. I would say that this stuff barely works, in fairness, right now.George [00:48:33]: Because people are actually going to try this after they hear it. If you get an email from Micah, odds are it wasn't written by a chatbot.Micah [00:48:38]: So, yeah, I think it is true that I have never actually sent anyone an email drafted by a chatbot. Yet.swyx [00:48:46]: And so you can feel it coming, right? Yeah, this time next year, we'll come back and see where it's going. Totally. Supabase shout-out, another famous Kiwi. I don't know if you've had any conversations with him about anything in particular on AI building and AI infra.George [00:49:03]: We have had Twitter DMs with him, because we're quite big Supabase users and power users. And we probably do some things more manually than we should in Supabase; the support line has been super friendly. One extra point regarding GDPval AA is that, on the basis of the overperformance of the models compared to the chatbots, we realized that, oh, the reference harness that we built actually works quite well on generalist agentic tasks. This proves it, in a sense. And the agent harness is very minimalist. I think it follows some of the ideas that are in Claude Code, and all that we give it is context management capabilities, a web search tool, a web browsing tool, and a code execution environment. Anything else?Micah [00:50:02]: I mean, we can equip it with more tools, but by default, yeah, that's it. For GDPval we give it a tool to view an image specifically, because the models, you know, can just use a terminal to pull stuff in text form into context, but to pull visual stuff into context, we had to give them a custom tool. But yeah, exactly.George [00:50:21]: So it turned out that we created a good generalist agentic harness, and so we released it on GitHub yesterday. It's called Stirrup. So if people want to check it out, it's a great base for building a generalist agent for more specific tasks.Micah [00:50:39]: I'd say the best way to use it is git clone, and then have your favorite coding agent make changes to it to do whatever you want, because it's not that many lines of code, and the coding agents can work with it super well.swyx [00:50:51]: Well, that's nice for the community to explore and share and hack on. I think in other similar environments, the Terminal-Bench guys have done sort of the Harbor thing. And so it's a bundle of, well, we need our minimal harness, which for them is Terminus, and we also need the RL environments or Docker deployment thing to run independently. So I don't know if you've looked at Harbor at all. Is that like a standard that people want to adopt?George [00:51:19]: Yeah, we've looked at it from an evals perspective, and we love Terminal-Bench and host benchmarks of Terminal-Bench on Artificial Analysis. We've looked at it from a coding agent perspective, but could see it being a great basis for any kind of agent. I think where we're getting to is that these models have gotten smart enough.
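A minimal sketch of the kind of minimalist harness George describes: a loop where the model controls the flow and is handed only a tiny tool set. This is not Stirrup's actual code; `call_model` is a stub standing in for any chat-completions-style API, and the single shell tool is illustrative.

```python
import subprocess

def run_shell(command: str) -> str:
    """Code-execution tool: run a shell command and return its output."""
    result = subprocess.run(command, shell=True, capture_output=True, text=True, timeout=60)
    return (result.stdout + result.stderr)[-4000:]  # crude context management: keep the tail

TOOLS = {"shell": run_shell}

def call_model(messages: list[dict]) -> dict:
    """Placeholder for a real LLM call. A real implementation would return either
    {'tool': name, 'args': {...}} to request a tool, or {'final': text} when done."""
    return {"final": "stub"}

def agent_loop(task: str, max_turns: int = 20) -> str:
    """Let the model drive: it picks tools turn by turn until it declares a final answer."""
    messages = [{"role": "user", "content": task}]
    for _ in range(max_turns):
        action = call_model(messages)
        if "final" in action:          # the model, not the framework, decides when to stop
            return action["final"]
        observation = TOOLS[action["tool"]](**action["args"])
        messages.append({"role": "tool", "content": observation})
    return "ran out of turns"

print(agent_loop("List the files in the current directory."))
```

The design point is the one made in the conversation: the harness dictates almost nothing, so the model's own planning, rather than a built-out framework, controls the agentic workflow.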
They've gotten better tools, and they can perform better when just given a minimalist set of tools and let run: let the model control the agentic workflow, rather than using another framework that's more built out and tries to dictate the flow. Awesome.swyx [00:51:56]: Let's cover the Openness Index, and then let's go into the report stuff. So that's the last of the proprietary numbers, I guess. I don't know how you classify all these. Yeah.Micah [00:52:07]: Let's call it the last of the three new things that we're talking about from the last few weeks. Because we do a mix of stuff: some where we're using open source, some where we open-source what we do, and proprietary stuff that we don't always open-source. The long context reasoning dataset last year we did open-source. And then for all of the work on performance benchmarks across the site, some of them we're looking to open-source, but some of them we're constantly iterating on, and so on. So there's a huge mix, I would say, of stuff that is open source and not, across the site. So that's LCR, for people. Yeah, yeah.swyx [00:52:41]: But let's talk about open.Micah [00:52:42]: Let's talk about the Openness Index. This here is, call it, a new way to think about how open models are. We have, for a long time, tracked whether the models are open weights and what the licenses on them are. And that's pretty useful; that tells you what you're allowed to do with the weights of a model. But there is this whole other dimension to how open models are that is pretty important, and that we haven't tracked until now, and that's how much is disclosed about how it was made. So transparency about data, pre-training data and post-training data, and whether you're allowed to use that data, and transparency about methodology and training code. So basically, those are the components. We bring them together to score an Openness Index for models, so that you can in one place get the full picture of how open models are.swyx [00:53:32]: I feel like I've seen a couple of other people try to do this, but they're not maintained. I do think this does matter. I don't know what the numbers mean, apart from: is there a max number? Is this out of 20?George [00:53:44]: It's out of 18 currently. And so we've got an Openness Index page, but essentially these are points: you get points for being more open across these different categories, and the maximum you can achieve is 18. So AI2, with their extremely open Olmo 3 32B Think model, is the leader in a sense.swyx [00:54:04]: What about Hugging Face?George [00:54:05]: Oh, with their smaller model. It's coming soon. I think we need to run... we need to get the intelligence benchmarks right to get it on the site.swyx [00:54:12]: You can't have an openness index and not include Hugging Face. We love Hugging Face. We'll have that up very soon. I mean, you know, the RefinedWeb and all that stuff. It's amazing. Or is it called FineWeb? FineWeb. FineWeb.Micah [00:54:23]: Yeah, yeah, no, totally. Yep. One of the reasons this is cool, right, is that if you're trying to understand the holistic picture of the models and what you can do with all the stuff the company is contributing, this gives you that picture. And so we are going to keep it up to date alongside all the models that we do the intelligence index on, on the site.
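A sketch of how a points-based openness rubric like this works. The category names and weights below are hypothetical stand-ins that happen to sum to 18; the real rubric is the one on the Artificial Analysis Openness Index page.

```python
# Hypothetical category weights summing to 18, for illustration only.
CATEGORIES = {
    "weights_released": 3,
    "permissive_license": 3,
    "pretraining_data_disclosed": 3,
    "posttraining_data_disclosed": 3,
    "data_usable": 2,
    "methodology_disclosed": 2,
    "training_code_released": 2,
}

def openness_score(model_facts: dict[str, bool]) -> int:
    """Sum the points for every category the model satisfies (max 18 here)."""
    return sum(pts for cat, pts in CATEGORIES.items() if model_facts.get(cat))

fully_open = {cat: True for cat in CATEGORIES}
weights_only = {"weights_released": True, "permissive_license": True}
print(openness_score(fully_open))    # 18
print(openness_score(weights_only))  # 6
```

The useful property is the separation it exposes: a model can score full marks on the license dimension while disclosing nothing about data or training, which is exactly the gap the index is designed to surface.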
And it's just an extra view to understand.swyx [00:54:43]: Can you scroll down to this? The trade-offs chart. Yeah, yeah. That one. Yeah. This really matters, right? Obviously, because you can b
Published as a 47-page pamphlet in colonial America on January 10, 1776, Common Sense challenged the authority of the British government and the royal monarchy. The elegantly plain and persuasive language that Thomas Paine used touched the hearts and minds of the average American and was the first work to openly ask for political freedom and independence from Great Britain. Paine's powerful words came to symbolize the spirit of the Revolution itself. General George Washington had it read to his troops. Codex 4.1 Common Sense by Thomas Paine (read by Walter Dixon) at https://amzn.to/3MHAIYr Common Sense by Thomas Paine (book) available at https://amzn.to/3MKX77b Writings of Thomas Paine available at https://amzn.to/3MCaFC2 Books about Thomas Paine available at https://amzn.to/4s3qxOg ENJOY Ad-Free content, Bonus episodes, and Extra materials when joining our growing community on https://patreon.com/markvinet SUPPORT this channel by purchasing any product on Amazon using this FREE entry LINK https://amzn.to/3POlrUD (Amazon gives us credit at NO extra charge to you). Mark Vinet's HISTORICAL JESUS podcast is available at https://parthenonpodcast.com/historical-jesus Mark's TIMELINE video channel: https://youtube.com/c/TIMELINE_MarkVinet Website: https://markvinet.com/podcast Facebook: https://www.facebook.com/mark.vinet.9 Twitter: https://twitter.com/MarkVinet_HNA Instagram: https://www.instagram.com/denarynovels Mark's books: https://amzn.to/3k8qrGM Audio Credit: Common Sense—The Origin and Design of Government by Thomas Paine, audio recording read by Walter Dixon (Public Domain 2011 Gildan Media). Audio excerpts reproduced under the Fair Use (Fair Dealings) Legal Doctrine for purposes such as criticism, comment, teaching, education, scholarship, research and news reporting. See omnystudio.com/listener for privacy information.
HAPPY NEW YEAR!! Wrapping up 2025 with a conversation I shared with Rye, host of Codega's Codex of Curiosities! I hope 2026 is a blessing for you, God Bless! From the Show: In this episode of Codega's Codex of Curiosities, Rye the Codega sits down with Bo Kennedy, host of The BUMP Podcast, for a deep dive into one of the oldest and darkest figures in ancient lore: Lilith. But this isn't just another retelling of a myth — Bo takes us down the rabbit hole where biblical history, occult traditions, Nephilim lore, and the Sasquatch mystery collide. We explore why some researchers believe Lilith wasn't just a mythological rebel, but a real entity intertwined with the Watchers, the Nephilim, and the creation of the Elioud — and how these hybrid bloodlines may echo through modern sightings of Sasquatch and other hair-covered giants. Is Sasquatch a remnant of the Elioud? Is Lilith truly the mother of monsters? And why do ancient texts, hidden traditions, and modern encounters all point back to her? Bo and Rye unravel: Lilith's ancient origins — from early Mesopotamian demonology to Hebrew lore. The role of the Watchers and how their unions produced the Nephilim and later the Elioud. The Elioud–Sasquatch theory — why some believe Sasquatch may be a surviving branch of post-Flood hybrid giants. How Lilith connects to child-stealing legends, shapeshifters, night-demons, and modern paranormal encounters. Why patterns in cryptid sightings, ancient texts, and forbidden history might hint at a hidden lineage that started with her. This episode blends biblical mystery, cryptid lore, demonology, and high strangeness into one wild, fascinating journey. If you love exploring the shadows where myth and monster meet, this conversation is for you. Have an experience that you'd like to share? Holler at me: thebumppodcast@gmail.com Feel led to donate to The BUMP Podcast? Check out www.buymeacoffee.com/thebumppodcast Want to be better prepared for whatever life throws at you? Check out www.squatchsurvivalgear.com Use Promo Code: 25bump to save 15% SITE WIDE Pick up a copy of my book! https://a.co/d/0S3HttW "Oh, My Soul" Written and Performed by Ray Messer Jr.
From pre-training data curation to shipping GPT-4o, o1, o3, and now GPT-5 thinking and the shopping model, Josh McGrath has lived through the full arc of OpenAI's post-training evolution—from the PPO vs DPO debates of 2023 to today's RLVR era, where the real innovation isn't optimization methods but data quality, signal trust, and token efficiency. We sat down with Josh at NeurIPS 2025 to dig into the state of post-training heading into 2026: why RLHF and RLVR are both just policy gradient methods (the difference is the input data, not the math), how GRPO from DeepSeek Math was underappreciated as a shift toward more trustworthy reward signals (math answers you can verify vs. human preference you can't), why token efficiency matters more than wall-clock time (GPT-5 to 5.1 bumped evals and slashed tokens), how Codex has changed his workflow so much he feels "trapped" by 40-minute design sessions followed by 15-minute agent sprints, the infrastructure chaos of scaling RL ("way more moving parts than pre-training"), why long context will keep climbing but agents + graph walks might matter more than 10M-token windows, the shopping model as a test bed for interruptability and chain-of-thought transparency, why personality toggles (Anton vs Clippy) are a real differentiator users care about, and his thesis that the education system isn't producing enough people who can do both distributed systems and ML research—the exact skill set required to push the frontier when the bottleneck moves every few weeks. We discuss: Josh's path: pre-training data curation → post-training researcher at OpenAI, shipping GPT-4o, o1, o3, GPT-5 thinking, and the shopping model Why he switched from pre-training to post-training: "Do I want to make 3% compute efficiency wins, or change behavior by 40%?" The RL infrastructure challenge: way more moving parts than pre-training (tasks, grading setups, external partners), and why babysitting runs at 12:30am means jumping into unfamiliar code constantly How Codex has changed his workflow: 40-minute design sessions compressed into 15-minute agent sprints, and the strange "trapped" feeling of waiting for the agent to finish The RLHF vs RLVR debate: both are policy gradient methods, the real difference is data quality and signal trust (human preference vs. 
verifiable correctness) Why GRPO (from DeepSeek Math) was underappreciated: not just an optimization trick, but a shift toward reward signals you can actually trust (math answers over human vibes) The token efficiency revolution: GPT-5 to 5.1 bumped evals and slashed tokens, and why thinking in tokens (not wall-clock time) unlocks better tool-calling and agent workflows Personality toggles: Anton (tool, no warmth) vs Clippy (friendly, helpful), and why Josh uses custom instructions to make his model "just a tool" The router problem: having a router at the top (GPT-5 thinking vs non-thinking) and an implicit router (thinking effort slider) creates weird bumps, and why the abstractions will eventually merge Long context: climbing Graph Blocks evals, the dream of 10M+ token windows, and why agents + graph walks might matter more than raw context length Why the education system isn't producing enough people who can do both distributed systems and ML research, and why that's the bottleneck for frontier labs The 2026 vision: neither pre-training nor post-training is dead, we're in the fog of war, and the bottleneck will keep moving (so emotional stability helps) — Josh McGrath OpenAI: https://openai.com https://x.com/j_mcgraph Chapters 00:00:00 Introduction: Josh McGrath on Post-Training at OpenAI 00:04:37 The Shopping Model: Black Friday Launch and Interruptability 00:07:11 Model Personality and the Anton vs Clippy Divide 00:08:26 Beyond PPO vs DPO: The Data Quality Spectrum in RL 00:01:40 Infrastructure Challenges: Why Post-Training RL is Harder Than Pre-Training 00:13:12 Token Efficiency: The 2D Plot That Matters Most 00:03:45 Codex Max and the Flow Problem: 40 Minutes of Planning, 15 Minutes of Waiting 00:17:29 Long Context and Graph Blocks: Climbing Toward Perfect Context 00:21:23 The ML-Systems Hybrid: What's Hard to Hire For 00:24:50 Pre-Training Isn't Dead: Living Through Technological Revolution
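To pin down the episode's point that RLHF and RLVR are the same math with different inputs, here is a minimal sketch in standard notation. It reflects the textbook policy-gradient formulation and the GRPO recipe published with DeepSeek Math, not any OpenAI-internal detail:

```latex
% One policy-gradient template covers both RLHF and RLVR:
\nabla_\theta J(\theta)
  = \mathbb{E}_{x \sim \mathcal{D},\; y \sim \pi_\theta(\cdot \mid x)}
    \big[ A(x, y)\, \nabla_\theta \log \pi_\theta(y \mid x) \big]
% RLHF: A(x, y) comes from a learned reward model r_\phi(x, y) fit to
%       human preference pairs, a signal you cannot directly verify.
% RLVR: A(x, y) comes from a programmatic check, e.g.
%       r(x, y) = \mathbb{1}[\,\mathrm{answer}(y) = \mathrm{reference}(x)\,],
%       which is why the reward signal is more trustworthy.
% GRPO (DeepSeek Math): sample G responses per prompt, skip the value
% network, and normalize rewards within the group:
\hat{A}_i = \frac{r_i - \operatorname{mean}(r_1, \ldots, r_G)}
                 {\operatorname{std}(r_1, \ldots, r_G)}
```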
Thank this podcast for so many hours of entertainment and enjoy exclusive episodes like this one. Support it on iVoox! A program to listen to after the twelve New Year's grapes. EXTRACODEX Campanadas, al 26 con Kat. A special for the end and start of the year: a chat with Kat about film, mystery, and metapodcasting. Listen to this full episode and get access to all the exclusive content of CODEX... más allá del misterio PODCAST. Discover new episodes before anyone else and join the exclusive listener community at https://go.ivoox.com/sq/130420
With local guide Diego Moreno Galilea, we discover the history of this monastery, where works such as the Codex Vigilanus were created, from its origins to its end.
Sign up for our Patreon, go to -> Patreon.com/cultofconspiracypodcast Meta Mysteries Podcast ---> https://open.spotify.com/show/6IshwF6qc2iuqz3WTPz9Wv?si=3a32c8f730b34e79 Cajun Knight YouTube Channel ---> https://www.youtube.com/@Cajunknight To sign up for our Rokfin, go to --> Rokfin.com/cultofconspiracy To get 20% OFF GoodFeels THC Seltzer ----> shop.getgoodfeels.com Become a supporter of this podcast: https://www.spreaker.com/podcast/cult-of-conspiracy--5700337/support.
Codex History of Video Games with Mike Coletta and Tyler Ostby - Podaholics
Mike and Tyler continue to talk about the history of the Wii including sales figures and lawsuits. They also go through the Wii games they remember most and talk about the fun memories they had with the console. The theme music is by RoccoW. The logo was created by Dani Dodge.
From the frontlines of OpenAI's Codex and GPT-5 training teams, Bryan and Bill are building the future of AI-powered coding—where agents don't just autocomplete, they architect, refactor, and ship entire features while you sleep. We caught up with them at AI Engineer Conference right after the launch of Codex Max, OpenAI's newest long-running coding agent designed to work for 24+ hours straight, manage its own context, and spawn sub-agents to parallelize work across your entire codebase. We sat down with Bryan and Bill to dig into what it actually takes to train a model that developers trust—why personality, communication, and planning matter as much as raw capability, how Codex is trained with strong opinions about tools (it loves rg over grep, seriously), why the abstraction layer is moving from models to full-stack agents you can plug into VS Code or Zed, how OpenAI partners co-develop tool integrations and discover unexpected model habits (like renaming tools to match Codex's internal training), the rise of applied evals that measure real-world impact instead of academic benchmarks, why multi-turn evals are the next frontier (and Bryan's "job interview eval" idea), how coding agents are breaking out of code into personal automation, terminal workflows, and computer use, and their 2026 vision: coding agents trusted enough to handle the hardest refactors at any company, not just top-tier firms, and general enough to build integrations, organize your desktop, and unlock capabilities you'd never get access to otherwise. We discuss: What Codex Max is: a long-running coding agent that can work 24+ hours, manage its own context window, and spawn sub-agents for parallel work Why the name "Max": maximalist, maximization, speed and endurance—it's simply better and faster for the same problems Training for personality: communication, planning, context gathering, and checking your work as behavioral characteristics, not just capabilities How Codex develops habits like preferring rg over grep, and why renaming tools to match its training (e.g., terminal-style naming) dramatically improves tool-call performance The split between Codex (opinionated, agent-focused, optimized for the Codex harness) and GPT-5 (general, more durable across different tools and modalities) Why the abstraction layer is moving up: from prompting models to plugging in full agents (Codex, GitHub Copilot, Zed) that package the entire stack The rise of sub-agents and agents-using-agents: Codex Max spawning its own instances, handing off context, and parallelizing work across a codebase How OpenAI works with coding partners on the bleeding edge to co-develop tool integrations and discover what the model is actually good at The shift to applied evals: capturing real-world use cases instead of academic benchmarks, and why ~50% of OpenAI employees now use Codex daily Why multi-turn evals are the next frontier: LM-as-a-judge for entire trajectories, Bryan's "job interview eval" concept, and the need for a batch multi-turn eval API How coding agents are breaking out of code: personal automation, organizing desktops, terminal workflows, and "Devin for non-coding" use cases Why Slack is the ultimate UI for work, and how coding agents can become your personal automation layer for email, files, and everything in between The 2026 vision: more computer use, more trust, and coding agents capable enough that any company can access top-tier developer capabilities, not just elite firms — Bryan & Bill (OpenAI Codex Team) http://x.com/bfioca 
https://x.com/realchillben OpenAI Codex: https://openai.com/index/openai-codex/ Where to find Latent Space X: https://x.com/latentspacepod Substack: https://www.latent.space/ Chapters 00:00:00 Introduction: Latent Space Listeners at AI Engineer Code 00:01:27 Codex Max Launch: Training for Long-Running Coding Agents 00:03:01 Model Personality and Trust: Communication, Planning, and Self-Checking 00:05:20 Codex vs GPT-5: Opinionated Agents vs General Models 00:07:47 Tool Use and Model Habits: The Ripgrep Discovery 00:09:16 Personality Design: Verbosity vs Efficiency in Coding Agents 00:11:56 The Agent Abstraction Layer: Building on Top of Codex 00:14:08 Sub-Agents and Multi-Agent Patterns: The Future of Composition 00:16:11 Trust and Adoption: OpenAI Developers Using Codex Daily 00:17:21 Applied Evals: Real-World Testing vs Academic Benchmarks 00:19:15 Multi-Turn Evals and the Job Interview Pattern 00:21:35 Feature Request: Batch Multi-Turn Eval API 00:22:28 Beyond Code: Personal Automation and Computer Use 00:24:51 Vision-Native Agents and the UI Integration Challenge 00:25:02 2026 Predictions: Trust, Computer Use, and Democratized Excellence
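The ripgrep anecdote above has a practical upshot for anyone wiring tools into a coding agent: expose your tool under the terminal-style name the model already developed habits around. A minimal sketch in Python; the function and registry names are hypothetical illustrations of the practice the episode describes, not OpenAI's integration API:

```python
import subprocess

def search_repo(pattern: str, path: str = ".") -> str:
    """Your integration's own code search; here, ripgrep with a grep fallback."""
    for cmd in (["rg", "-n", pattern, path], ["grep", "-rn", pattern, path]):
        try:
            return subprocess.run(cmd, capture_output=True, text=True).stdout
        except FileNotFoundError:  # binary not installed; try the next one
            continue
    return ""

# A generic, descriptive name the model rarely saw during training:
tools_generic = {"repository_text_search": search_repo}

# The same function surfaced under the name the model already "knows".
# Per the episode, renaming alone can improve tool-call reliability,
# because the model's habits were trained around commands like `rg`.
tools_aligned = {"rg": search_repo}
```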
Our 229th episode with a summary and discussion of last week's big AI news! Recorded on 12/19/2025. Hosted by Andrey Kurenkov and Jeremie Harris. Feel free to email us your questions and feedback at contact@lastweekinai.com and/or hello@gladstone.ai. Read our text newsletter and comment on the podcast at https://lastweekin.ai/ In this episode: Notable releases include OpenAI's GPT-5.2 Codex for advanced coding and Google's Gemini 3 Flash for competitive AI application performance. Nvidia's new open-source Nemotron 3 models also showcase impressive benchmarks. Funding updates highlight Lovable's $330M Series B, valuing the AI coding startup at $6.6B, and Fal's $140M Series D for AI model hosting, valued at $4.5B. China makes significant strides in semiconductor technology with advances in EUV lithography machines, led by Huawei and SMIC, potentially disrupting global chip manufacturing dominance. Key safety and policy updates include OpenAI's GPT-5.2 system card focusing on biosecurity and cybersecurity risks, while Google partners with the US military to power a new AI platform with Gemini models. Timestamps: (00:00:10) Intro / Banter (00:02:09) News Preview. Tools & Apps: (00:02:56) Google launches Gemini 3 Flash, makes it the default model in the Gemini app | TechCrunch (00:10:13) ChatGPT launches an app store, lets developers know it's open for business | TechCrunch (00:13:35) Introducing GPT-5.2-Codex | OpenAI (00:19:23) Story about OpenAI release - GPT Image 1.5 (00:22:27) Meta partners with ElevenLabs to power AI audio across Instagram, Horizon - The Economic Times. Applications & Business: (00:23:16) OpenAI to End Equity Vesting Period for Employees, WSJ Says (00:28:20) How China built its ‘Manhattan Project' to rival the West in AI chips (00:36:47) China's Huawei, SMIC Make Progress With Chips, Report Finds (00:41:03) OpenAI in Talks to Raise At Least $10 Billion From Amazon and Use Its AI Chips (00:43:32) Amazon has a new leader for its ‘AGI' group as it plays catch-up on AI | The Verge (00:47:27) Broadcom reveals its mystery $10 billion customer is Anthropic (00:49:12) Vibe-coding startup Lovable raises $330M at a $6.6B valuation | TechCrunch (00:50:38) Fal nabs $140M in fresh funding led by Sequoia, tripling valuation to $4.5B | TechCrunch. Projects & Open Source: (00:51:10) Nvidia Becomes a Major Model Maker With Nemotron 3 | WIRED (00:59:24) Meta introduces new SAM AI able to isolate and edit audio • The Register (00:59:54) [2512.14856] T5Gemma 2: Seeing, Reading, and Understanding Longer (01:03:10) Anthropic makes agent Skills an open standard - SiliconANGLE. Research & Advancements: (01:03:47) Budget-Aware Tool-Use Enables Effective Agent Scaling (01:08:21) Rethinking Thinking Tokens: LLMs as Improvement Operators (01:10:50) What if AI capabilities suddenly accelerated in 2027? How would the world know? Policy & Safety: (01:12:58) Update to GPT-5 System Card: GPT-5.2 (01:18:04) Neural Chameleons: Language Models Can Learn to Hide Their Thoughts from Unseen Activation Monitors (01:20:47) Async Control: Stress-testing Asynchronous Control Measures for LLM Agents (01:24:37) Google is powering a new US military AI platform | The Verge. See Privacy Policy at https://art19.com/privacy and California Privacy Notice at https://art19.com/privacy#do-not-sell-my-info.
CODEX 12x172. A peculiar Christmas program. On iVoox, TikTok, YouTube, and other platforms. Merry Christmas!
Here we leave you this little dramatization of the Dickens classic, with all our love. Merry Christmas!
Early tales of the quaint and old traditions of celebrating Christmas that helped shape American Yuletide traditions, blending English customs with a new-world perspective. 'Old Christmas' by Washington Irving has the author traveling to the countryside and meeting an old schoolmate, who invites him home to spend Christmas at the family estate in this globally famous, truly iconic American Christmas story. Codex 3.1 A Cozy Christmas podcast available at https://amzn.to/48VUwPl Old Christmas by Washington Irving (audio) at https://amzn.to/4s3YfDo Old Christmas by Washington Irving (book) https://amzn.to/3MJgXzK Books by Washington Irving at https://amzn.to/48ZJybG Washington Irving biography at https://amzn.to/4q2Dxla ENJOY Ad-Free content, Bonus episodes, and Extra materials when joining our growing community on https://patreon.com/markvinet SUPPORT this channel by purchasing any product on Amazon using this FREE entry LINK https://amzn.to/3POlrUD (Amazon gives us credit at NO extra charge to you). Mark Vinet's HISTORICAL JESUS podcast is available at https://parthenonpodcast.com/historical-jesus Mark's TIMELINE video channel: https://youtube.com/c/TIMELINE_MarkVinet Website: https://markvinet.com/podcast Facebook: https://www.facebook.com/mark.vinet.9 Twitter: https://twitter.com/MarkVinet_HNA Instagram: https://www.instagram.com/denarynovels Mark's books: https://amzn.to/3k8qrGM Audio credit: A Cozy Christmas Podcast "Old Christmas" by Washington Irving (Part 1 of 3, 13dec2024). Audio excerpts reproduced under the Fair Use (Fair Dealings) Legal Doctrine for purposes such as criticism, comment, teaching, education, scholarship, research and news reporting. See omnystudio.com/listener for privacy information.
Early tales of the quaint and old traditions of celebrating Christmas that helped shape American Yuletide traditions, blending English customs with a new-world perspective. 'Old Christmas' by Washington Irving has the author traveling to the countryside and meeting an old schoolmate, who invites him home to spend Christmas at the family estate in this globally famous, truly iconic American Christmas story. Codex 3.2 A Cozy Christmas podcast available at https://amzn.to/48VUwPl Old Christmas by Washington Irving (audio) at https://amzn.to/4s3YfDo Old Christmas by Washington Irving (book) https://amzn.to/3MJgXzK Books by Washington Irving at https://amzn.to/48ZJybG Washington Irving biography at https://amzn.to/4q2Dxla ENJOY Ad-Free content, Bonus episodes, and Extra materials when joining our growing community on https://patreon.com/markvinet SUPPORT this channel by purchasing any product on Amazon using this FREE entry LINK https://amzn.to/3POlrUD (Amazon gives us credit at NO extra charge to you). Mark Vinet's HISTORICAL JESUS podcast is available at https://parthenonpodcast.com/historical-jesus Mark's TIMELINE video channel: https://youtube.com/c/TIMELINE_MarkVinet Website: https://markvinet.com/podcast Facebook: https://www.facebook.com/mark.vinet.9 Twitter: https://twitter.com/MarkVinet_HNA Instagram: https://www.instagram.com/denarynovels Mark's books: https://amzn.to/3k8qrGM Audio credit: A Cozy Christmas Podcast "Old Christmas" by Washington Irving (Part 1 of 3, 13dec2024). Audio excerpts reproduced under the Fair Use (Fair Dealings) Legal Doctrine for purposes such as criticism, comment, teaching, education, scholarship, research and news reporting. See omnystudio.com/listener for privacy information.
Today I break down a big news item I think is flying under the radar: OpenAI quietly launched Skills for Codex, and I explain what that means (and how it differs from sub-agents and MCPs). I then share a fast-moving trend I'm watching and why it's a strong wedge for a simple app. After that, I recommend the to-do app I've used for 14 years and give away a startup idea. I close with a practical 6-step framework for going from idea → viral validation → mobile app launch in 2026. Timestamps 00:00 – Intro: the new format (news, trend, app, startup idea, framework) 00:40 – AI News Item: OpenAI launches Skills for Codex 05:45 – Trend: Face Yoga 07:56 – App Recommendation: Things 09:33 – Startup Idea: Call-an-expert service for non-developers stuck at 80% done 14:44 – Framework: Viral Mobile App Framework Key Points OpenAI "Skills" make Codex/ChatGPT more reusable and consistent by packaging repeatable workflows. A "skill" is the recipe, "sub-agents" are extra worker instances, and an "MCP" is the tool-access plug. Face yoga is an emerging sub-niche with clear app potential (simple routines, monetization via paid or ads). Last 20 is a practical marketplace idea: pay for 15 minutes of expert unblock help to finish the last 20%. Viral validation favors apps that are visually obvious, explainable in three words, and tied to insecurity-driven outcomes. Numbered Section Summaries OpenAI Skills: The Quiet Upgrade I walk through OpenAI's launch of Skills for Codex—reusable bundles of instructions/scripts/resources that can be called directly or chosen automatically. I'm excited because this makes agent workflows more consistent and scalable across tasks. The Foundation: Skill vs Sub-Agent vs MCP I clarify the taxonomy: a skill is the written playbook, sub-agents are extra "worker" copies of the model that split a big job, and MCPs are what let the model access external systems like tickets or repos. This is the mental model I want everyone using going into 2026 (a toy sketch of the taxonomy follows these show notes). The Trend: Face Yoga As An App Wedge I share a niche trend I'm seeing—face yoga—and why it's a product opportunity similar to how yoga apps became huge. I call out the obvious app angles: guided routines, jawline/face-slimming programs, and content-driven growth via short videos. The Tool: Things (My Simple Focus System) I recommend the Things to-do app because it's simple: "Today," "Upcoming," and "Someday," without a monthly fee. I also note what's missing (I'd like more AI features), but it still wins for focus if you don't want a "kitchen sink" system. The Startup Idea: Last 20 (Phone-A-Friend For Vibe Coders) I give away the idea: builders get stuck at 80% after using Cursor/Replit/V0, so Last 20 matches them with someone who's solved that exact wall before. The product is a fast screen-share session—problem solved—priced per session or bundled for teams/agencies, with the marketplace taking a cut. The Distribution Framework: Viral Validation → Launch I share a 6-step process: warm up the account, design a visually obvious app, build a tiny MVP fast, post daily until something hits, build the community before the product, then launch with a hard paywall and keep content rolling. It's a simple playbook for getting to organic traction in 2026. The #1 tool to find startup ideas/trends - https://www.ideabrowser.com LCA helps Fortune 500s and fast-growing startups build their future - from Warner Music to Fortnite to Dropbox.
We turn 'what if' into reality with AI, apps, and next-gen products https://latecheckout.agency/ The Vibe Marketer - Resources for people into vibe marketing/marketing with AI: https://www.thevibemarketer.com/ FIND ME ON SOCIAL X/Twitter: https://twitter.com/gregisenberg Instagram: https://instagram.com/gregisenberg/ LinkedIn: https://www.linkedin.com/in/gisenberg/
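To make the skill / sub-agent / MCP taxonomy from the episode concrete, here is a toy sketch in Python. Every name in it is illustrative shorthand for the mental model described in the show notes above, not OpenAI's actual Skills format, sub-agent mechanism, or MCP schema:

```python
from dataclasses import dataclass, field

@dataclass
class Skill:
    """The recipe: a written, reusable playbook the agent can load."""
    name: str
    instructions: str  # the repeatable workflow, step by step
    resources: list[str] = field(default_factory=list)  # scripts, templates

@dataclass
class MCPServer:
    """The plug: tool access to external systems (tickets, repos, ...)."""
    name: str
    tools: list[str]

def spawn_subagents(job: str, n: int) -> list[str]:
    """The extra workers: model instances that split one big job."""
    return [f"{job} (shard {i + 1} of {n})" for i in range(n)]

# One run can combine all three: load a recipe, plug in tools, fan out work.
release_notes = Skill("release-notes", "1) diff tags 2) summarize 3) post")
github = MCPServer("github", ["read_repo", "open_pull_request"])
workers = spawn_subagents("summarize merged pull requests", n=3)
```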
ChatGPT Images (aka OpenAI's Image 1.5) can create stunning AI images, text and more. But how does it compare to Google's Nanobanana Pro? We dive in… way too deep. Plus, new GPT-5.2 Codex, Gemini 3 Flash, YouTube's new vibecoded games, the controversy around Generative AI and game developer Larian Studios, a lego-like robot and, of course, seeing how AI video can cause soap opera actress hair to endlessly grow. IT'S YET ANOTHER WEEK OF NEW RELEASES! AND WE DON'T STOP. Get notified when AndThen launches: https://andthen.chat/ Come to our Discord to try our Secret Project: https://discord.gg/muD2TYgC8f Join our Patreon: https://www.patreon.com/AIForHumansShow AI For Humans Newsletter: https://aiforhumans.beehiiv.com/ Follow us for more on X @AIForHumansShow Join our TikTok @aiforhumansshow To book us for speaking, please visit our website: https://www.aiforhumans.show/ // Show Links // ChatGPT Image 1.5 is here https://openai.com/index/new-chatgpt-images-is-here/ Recreating the post from the OpenAI blog with us https://chatgpt.com/share/69443100-6e7c-8003-910e-749bab75f6e2 Fabian (from GLIF) Notes on Images 1.5 https://x.com/fabianstelzer/status/2001300766368178435?s=20 Jeff Goldblum's Resume https://x.com/gavinpurcell/status/2001033377294467182?s=20 Upcoming QWEN Image Layering https://x.com/wildmindai/status/2001593677576384747?s=20 Image 1.5 Vs Nanobanana Pro Video Game Characters https://www.reddit.com/r/ChatGPT/comments/1ppg4s9/test_2_turning_game_characters_into_real_people/ Fingers https://x.com/petergostev/status/2001027573636088184?s=20 Gavin's Original Knight and Rotisserie Chicken Post https://www.reddit.com/r/ChatGPT/comments/1jk0p3v/tried_to_push_the_new_image_model_with_an/ OpenAI's Greg Brockman: WE NEED THE COMPUTE https://x.com/OpenAI/status/2001336514786017417?s=20 GPT-5.2 CODEX https://openai.com/index/introducing-gpt-5-2-codex/ Frontier Science: New Benchmark https://x.com/OpenAI/status/2000975293448905038?s=20 ChatGPT Apps Store Opens For Developer Submission https://x.com/OpenAIDevs/status/2001419749016899868?s=20 Gemini 3 Flash https://x.com/GoogleDeepMind/status/2001321759702663544?s=20 Nanobanana now in community posts on YT https://x.com/nealmohan/status/2001425749941829920?s=20 Meanwhile, YouTube "Playable Builders" https://x.com/YouTubeGaming/status/2000989303086649637?s= Larian's AI Gaming "Controversy" https://www.pcgamer.com/games/rpg/baldurs-gate-3-developer-larian-defends-itself-as-fans-react-to-generative-ai-use-im-not-entirely-sure-we-are-the-ideal-target-for-the-level-of-scorn/ Direct response from Larian Head of Studios: https://x.com/LarAtLarian/status/2001011042642505833?s=20 MSFT Open-Source Image-to-3D Trellis 2 https://x.com/_akhaliq/status/2001041559366598799?s=20 Bernie's Moratorium on Data Centers https://youtu.be/f40SFNcTOXo?si=hduNjATJgtIya9oq Meanwhile… China can now produce high-end AI Chips https://finance.yahoo.com/news/exclusive-china-built-manhattan-project-141758929.html Meta SAM Audio https://x.com/AIatMeta/status/2000980784425931067?s=20 Tron 2: Lego Robot https://x.com/CyberRobooo/status/2001513866157789308?s=20 AVP Spatial Photos Of Newborn https://x.com/SadlyItsBradley/status/2001276039671197783?s=20 WAN 2.1 Workflow Re-creates The Matrix With Homer Simpson https://x.com/ChetiArt/status/2001291373182382526?s=20 Miss Piggy in Melania Trailer https://x.com/charliebcurran/status/2001564626144928146?s=20 One Woman's Transformation Via Sora Remixes 
https://sora.chatgpt.com/p/s_693a2ed29e288191a542b776553e1145?psh=HXVzZXItT3diZ1NFOUtyZlRXV2ZvajcwWjJsZ2Uy.XXZmIQEXNl-L
Early tales of the quaint and old traditions of celebrating Christmas that helped shape American Yuletide traditions, blending English customs with a new-world perspective. 'Old Christmas' by Washington Irving has the author traveling to the countryside and meeting an old schoolmate, who invites him home to spend Christmas at the family estate in this globally famous, truly iconic American Christmas story. A Cozy Christmas podcast available at https://amzn.to/48VUwPl Old Christmas by Washington Irving (audio) at https://amzn.to/4s3YfDo Old Christmas by Washington Irving (book) https://amzn.to/3MJgXzK Books by Washington Irving at https://amzn.to/48ZJybG Washington Irving biography at https://amzn.to/4q2Dxla ENJOY Ad-Free content, Bonus episodes, and Extra materials when joining our growing community on https://patreon.com/markvinet SUPPORT this channel by purchasing any product on Amazon using this FREE entry LINK https://amzn.to/3POlrUD (Amazon gives us credit at NO extra charge to you). Mark Vinet's HISTORICAL JESUS podcast is available at https://parthenonpodcast.com/historical-jesus Mark's TIMELINE video channel: https://youtube.com/c/TIMELINE_MarkVinet Website: https://markvinet.com/podcast Facebook: https://www.facebook.com/mark.vinet.9 Twitter: https://twitter.com/MarkVinet_HNA Instagram: https://www.instagram.com/denarynovels Mark's books: https://amzn.to/3k8qrGM Audio credit: A Cozy Christmas Podcast "Old Christmas" by Washington Irving (Part 1 of 3, 13dec2024). Audio excerpts reproduced under the Fair Use (Fair Dealings) Legal Doctrine for purposes such as criticism, comment, teaching, education, scholarship, research and news reporting. See omnystudio.com/listener for privacy information.
Alexander Embiricos leads product on Codex, OpenAI's powerful coding agent, which has grown 20x since August and now serves trillions of tokens weekly. Before joining OpenAI, Alexander spent five years building a pair programming product for engineers. He now works at the frontier of AI-led software development, building what he describes as a software engineering teammate—an AI agent designed to participate across the entire development lifecycle. We discuss: 1. Why Codex has grown 20x since launch and what product decisions unlocked this growth 2. How OpenAI built the Sora Android app in just 18 days using Codex 3. Why the real bottleneck to AGI-level productivity isn't model capability—it's human typing speed 4. The vision of AI as a proactive teammate, not just a tool you prompt 5. The bottleneck shifting from building to reviewing AI-generated work 6. Why coding will be a core competency for every AI agent—because writing code is how agents use computers best — Brought to you by: WorkOS—Modern identity platform for B2B SaaS, free up to 1 million MAUs: https://workos.com/lenny Fin—The #1 AI agent for customer service: https://fin.ai/lenny Jira Product Discovery—Confidence to build the right thing: https://atlassian.com/lenny/?utm_source=lennypodcast&utm_medium=paid-audio&utm_campaign=fy24q1-jpd-imc — Transcript: https://www.lennysnewsletter.com/p/why-humans-are-ais-biggest-bottleneck — My biggest takeaways (for paid newsletter subscribers): https://www.lennysnewsletter.com/i/180365355/my-biggest-takeaways-from-this-conversation — Where to find Alexander Embiricos: • X: https://x.com/embirico • LinkedIn: https://www.linkedin.com/in/embirico — Where to find Lenny: • Newsletter: https://www.lennysnewsletter.com • X: https://twitter.com/lennysan • LinkedIn: https://www.linkedin.com/in/lennyrachitsky/ — In this episode, we cover: (00:00) Introduction to Alexander Embiricos (05:13) The speed and ambition at OpenAI (11:34) Codex: OpenAI's coding agent (15:43) Codex's explosive growth (24:59) The future of AI and coding agents (33:11) The impact of AI on engineering (44:08) How Codex has impacted the way PMs operate (45:40) Throwaway code and ubiquitous coding (47:10) Shipping the Sora Android app (49:01) Building the Atlas browser (53:34) Codex's impact on productivity (55:35) Measuring progress on Codex (58:09) Why they are building a web browser (01:01:58) Non-engineering use cases for Codex (01:02:53) Codex's capabilities (01:04:49) Tips for getting started with Codex (01:05:37) Skills to lean into in the AI age (01:10:36) How far are we from a human version of AI? (01:13:31) Hiring and team growth at Codex (01:15:47) Lightning round and final thoughts — Referenced: • OpenAI: https://openai.com • Codex: https://openai.com/codex • Inside ChatGPT: The fastest-growing product in history | Nick Turley (Head of ChatGPT at OpenAI): https://www.lennysnewsletter.com/p/inside-chatgpt-nick-turley • Dropbox: http://dropbox.com • Datadog: https://www.datadoghq.com • Andrej Karpathy on X: https://x.com/karpathy • The rise of Cursor: The $300M ARR AI tool that engineers can't stop using | Michael Truell (co-founder and CEO): https://www.lennysnewsletter.com/p/the-rise-of-cursor-michael-truell • Atlas: https://openai.com/index/introducing-chatgpt-atlas • How Block is becoming the most AI-native enterprise in the world | Dhanji R.
Prasanna: https://www.lennysnewsletter.com/p/how-block-is-becoming-the-most-ai-native • Goose: https://block.xyz/inside/block-open-source-introduces-codename-goose • Lessons on building product sense, navigating AI, optimizing the first mile, and making it through the messy middle | Scott Belsky (Adobe, Behance): https://www.lennysnewsletter.com/p/lessons-on-building-product-sense • Sora Android app: https://play.google.com/store/apps/details?id=com.openai.sora&hl=en_US&pli=1 • The OpenAI Podcast—ChatGPT Atlas and the next era of web browsing: https://www.youtube.com/watch?v=WdbgNC80PMw&list=PLOXw6I10VTv9GAOCZjUAAkSVyW2cDXs4u&index=2 • How to measure AI developer productivity in 2025 | Nicole Forsgren: https://www.lennysnewsletter.com/p/how-to-measure-ai-developer-productivity • Compiling: https://3d.xkcd.com/303 • Jujutsu Kaisen on Netflix: https://www.netflix.com/title/81278456 • Tesla: https://www.tesla.com • Radical Candor: From theory to practice with author Kim Scott: https://www.lennysnewsletter.com/p/radical-candor-from-theory-to-practice • Andreas Embirikos: https://en.wikipedia.org/wiki/Andreas_Embirikos • George Embiricos: https://en.wikipedia.org/wiki/George_Embiricos — Recommended books: • Culture series: https://www.amazon.com/dp/B07WLZZ9WV • The Lord of the Rings: https://www.amazon.com/Lord-Rings-J-R-R-Tolkien/dp/0544003411 • A Fire Upon the Deep (Zones of Thought series Book 1): https://www.amazon.com/Fire-Upon-Deep-Zones-Thought/dp/1250237750 • Radical Candor: Be a Kick-Ass Boss Without Losing Your Humanity: https://www.amazon.com/Radical-Candor-Kick-Ass-Without-Humanity/dp/1250103509 — Production and marketing by https://penname.co/. For inquiries about sponsoring the podcast, email podcast@lennyrachitsky.com. — Lenny may be an investor in the companies discussed. To hear more, visit www.lennysnewsletter.com
Austin and Skitch delve into the newest Paizo release of dragon lore! STARRING - Austin Yorski: https://bsky.app/profile/austinyorski.bsky.social Michael "Skitch" Schiciano: https://bsky.app/profile/skitch.bsky.social SUPPORT - Patreon.com/AustinYorski Patreon.com/Skitch AUDIO - Kirby Super Star OC ReMix by TSori & Others: "Until the Next Dance" [Meta Knight: Ending] (#4223) - YouTube DISCORD - https://discord.gg/YMU3qUH
This episode features Olivier Godement, Head of Product for Business Products at OpenAI, discussing the current state and future of AI adoption in enterprises, with a particular focus on the recent releases of GPT-5.1 and Codex. The conversation explores how these models are achieving meaningful automation in specific domains like coding, customer support, and life sciences, where companies like Amgen are using AI to accelerate drug development timelines from months to weeks through automated regulatory documentation. Olivier reveals that while complete job automation remains challenging and requires substantial scaffolding, harnesses, and evaluation frameworks, certain use cases like coding are reaching a tipping point where engineers would "riot" if AI tools were taken away. The discussion covers the importance of cost reduction in unlocking new use cases, the emerging significance of reinforcement fine-tuning (RFT) for frontier customers, and OpenAI's philosophy of providing not just models but reference architectures and harnesses to maximize developer success. (0:00) Intro (1:46) Discussing GPT-5.1 (2:57) Adoption and Impact of Codex (4:09) Scientific Community's Use of GPT-5.1 (6:37) Challenges in AI Automation (8:19) AI in Life Sciences and Pharma (11:48) Enterprise AI Adoption and Ecosystem (16:04) Future of AI Models and Continuous Learning (24:20) Cost and Efficiency in AI Deployment (27:10) Reinforcement Learning and Enterprise Use Cases (31:17) Key Factors Influencing Model Choice (34:21) Challenges in Model Deployment and Adaptation (38:29) Voice Technology: The Next Frontier (41:08) The Rise of AI in Software Engineering (52:09) Quickfire. With your co-hosts: @jacobeffron - Partner at Redpoint, Former PM Flatiron Health @patrickachase - Partner at Redpoint, Former ML Engineer LinkedIn @ericabrescia - Former COO Github, Founder Bitnami (acq'd by VMWare) @jordan_segall - Partner at Redpoint
"Give me liberty, or give me death!" is a quotation attributed to Patrick Henry (1736-99) from a speech he made to the Virginia Convention in 1775, at St. John's Church in Richmond, Virginia, he is credited with having swung the balance in convincing the Virginia House of Burgesses to pass a resolution delivering the Virginia troops to the Revolutionary War. Among the delegates to the convention were future U.S. Presidents Thomas Jefferson and George Washington. Patrick Henry's "Give Me Liberty Or Give Me Death" Speech at https://amzn.to/4oGsyga Patrick Henry Books available at https://amzn.to/4rLCIin ENJOY Ad-Free content, Bonus episodes, and Extra materials when joining our growing community on https://patreon.com/markvinet SUPPOaRT this channel by purchasing any product on Amazon using this FREE entry LINK https://amzn.to/3POlrUD (Amazon gives us credit at NO extra charge to you). Mark Vinet's HISTORICAL JESUS podcast is available at https://parthenonpodcast.com/historical-jesus Mark's TIMELINE video channel: https://youtube.com/c/TIMELINE_MarkVinet Website: https://markvinet.com/podcast Facebook: https://www.facebook.com/mark.vinet.9 Twitter: https://twitter.com/MarkVinet_HNA Instagram: https://www.instagram.com/denarynovels Mark's books: https://amzn.to/3k8qrGM Audio Credit: The Autobiography of Benjamin Franklin (Librivox, read by G. Giordano).See omnystudio.com/listener for privacy information.
Jake and Michael discuss all the latest Laravel releases, tutorials, and happenings in the community. This episode is sponsored by CodeRabbit; Smart CLI Reviews act as quality gates for Codex, Claude, Gemini, and you. Show links: Blade @hasStack Directive Added in Laravel 12.39 | Time Interval Helpers in Laravel 12.40 | Pause a Queue for a Given Number of Seconds in Laravel 12 | PHP 8.5 is released with the pipe operator, URI extension, new array functions, and more | Introducing Mailviews Early Access | Prevent Disposable Email Registrations with Email Utilities for Laravel | A DynamoDB Driver for the Laravel Auditing Package | Build Production-ready APIs in Laravel with Tyro. Tutorials: Separate your Cloudflare page cache with a middleware group | PostgreSQL vs. MongoDB for Laravel: Choosing the Right Database | Modernizing Code with Rector - Laravel In Practice EP12 | Static Analysis Secrets - Laravel In Practice EP13
Stephen and Dave are back to finish reviewing Codex: Drukhari, discussing all detachments available to the faction as well as what Stephen has learned from playing the Dark Eldar in three recent events.
Join us as James and Frank delve into the fascinating world of AI-driven UI design with Gemini 3.0, exploring its creative capabilities and potential to revolutionize aesthetics. Discover the latest AI model advancements, including GPT-5.1 and Codex, and gain insights into real-time trace debugging and distributed programming. Plus, we tackle the evolving landscape of Integrated Development Environments, AI tool integrations in Visual Studio Code, and cutting-edge developments in robotics and virtual reality. This episode is a must-listen for anyone interested in the intersection of AI, design, and technology. Follow Us Frank: Twitter, Blog, GitHub James: Twitter, Blog, GitHub Merge Conflict: Twitter, Facebook, Website, Chat on Discord Music: Amethyst Seer - Citrine by Adventureface ⭐⭐ Review Us (https://itunes.apple.com/us/podcast/merge-conflict/id1133064277?mt=2&ls=1) ⭐⭐ Machine transcription available on http://mergeconflict.fm
In this episode, a16z GP Martin Casado sits down with Sherwin Wu, Head of Engineering for the OpenAI Platform, to break down how OpenAI organizes its platform across models, pricing, and infrastructure, and how it is shifting from a single general-purpose model to a portfolio of specialized systems, custom fine-tuning options, and node-based agent workflows. They get into why developers tend to stick with a trusted model family, what builds that trust, and why the industry moved past the idea of one model that can do everything. Sherwin also explains the evolution from prompt engineering to context design and how companies use OpenAI's fine-tuning and RFT APIs to shape model behavior with their own data. Highlights from the conversation include: • How OpenAI balances a horizontal API platform with vertical products like ChatGPT • The evolution from Codex to the Composer model • Why usage-based pricing works and where outcome-based pricing breaks • What the Harmonic Labs and Rockset acquisitions added to OpenAI's agent work • Why the new agent builder is deterministic, node-based, and not free roaming (a toy sketch follows this entry) Resources: Follow Sherwin on X: https://x.com/sherwinwu Follow Martin on X: https://x.com/martin_casado Stay Updated: If you enjoyed this episode, be sure to like, subscribe, and share with your friends! Find a16z on X: https://x.com/a16z Find a16z on LinkedIn: https://www.linkedin.com/company/a16z Listen to the a16z Podcast on Spotify: https://open.spotify.com/show/5bC65RDvs3oxnLyqqvkUYX Listen to the a16z Podcast on Apple Podcasts: https://podcasts.apple.com/us/podcast/a16z-podcast/id842818711 Follow our host: https://x.com/eriktorenberg Please note that the content here is for informational purposes only; should NOT be taken as legal, business, tax, or investment advice or be used to evaluate any investment or security; and is not directed at any investors or potential investors in any a16z fund. a16z and its affiliates may maintain investments in the companies discussed. For more details please see http://a16z.com/disclosures Hosted by Simplecast, an AdsWizz company. See pcm.adswizz.com for information about our collection and use of personal data for advertising.
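As a hedged illustration of the deterministic, node-based design mentioned in the highlights above, the sketch below shows the architectural distinction in Python: the graph's edges fix which step runs next, and a model would only fill in the work inside each node. All names are hypothetical; this is the general pattern, not OpenAI's agent builder:

```python
from typing import Callable

Node = Callable[[str], str]

def classify(ticket: str) -> str:
    """Node 1: in a real system, this label could come from a model call."""
    return "refund" if "refund" in ticket.lower() else "other"

def handle_refund(ticket: str) -> str:
    return "routed to refunds queue"

def handle_other(ticket: str) -> str:
    return "routed to general support"

# Edges are declared up front: the graph, not the model, picks the path.
# A free-roaming agent would instead decide its next action at runtime.
EDGES: dict[str, Node] = {"refund": handle_refund, "other": handle_other}

def run_workflow(ticket: str) -> str:
    label = classify(ticket)       # node 1
    return EDGES[label](ticket)    # node 2: deterministic dispatch

print(run_workflow("I want a refund for my order"))
```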
Codex History of Video Games with Mike Coletta and Tyler Ostby - Podaholics
Mike and Tyler are on Thanksgiving break. Please enjoy this rerelease! In part one, Tyler and Mike talk about the Wii, what it meant for gaming, how Nintendo created a new market for themselves, the Wii Remote, and the Wii's online services. The theme music is by RoccoW. The logo was created by Dani Dodge.