POPULARITY
Categories
Visit Mixture of Experts podcast page to get more AI content → https://www.ibm.com/think/podcasts/mixture-of-experts Can your AI agent hack its own evaluation? This week on Mixture of Experts, Tim Hwang is joined by Ambhi Ganesan, Kaoutar El Maghraoui, and Sandi Besen to analyze OpenAI's Codex Security launch. Next, we explore eval awareness as Anthropic revealed Opus 4.6 figured out it was being tested, located the answer key and decrypted it.. Then, Meta acquires Moltbook, the social network for AI agents, and we discuss the strategic play for agentic commerce infrastructure. Finally, Alibaba reports that an agent broke containment and started mining crypto. Are agents trying too hard to maximize rewards? All that and more on todays Mixture of Experts. 00:00 – Introduction 1:02 – OpenAI Codex Security launch 12:44 – Meta acquires Moltbook 25:21 – Anthropic's eval awareness research 38:06 – Alibaba agents mining crypto The opinions expressed in this podcast are solely those of the participants and do not necessarily reflect the views of IBM or any other organization or entity. Subscribe for AI updates → https://www.ibm.com/account/reg/us-en/signup?formid=news-urx-52120
In this episode of Crazy Wisdom, Stewart Alsop sits down with Andre Oliveira, founder of Splash N Color, a bootstrapped 3D printing e-commerce business selling consumer goods on Amazon. The two cover a lot of ground — from how Andre went from running 40 FDM printers out of South Florida to offshoring manufacturing to China, to how he's using Claude Code to automate inventory management and generate supplier RFQs across 200+ SKUs. The conversation stretches into bigger territory too: the San Francisco AI scene, the rise of AI agents and what they mean for the future of the internet, whether local on-device AI will eventually replace cloud-based tools, and why building physical products will stay hard long after software becomes easy. It's a candid, wide-ranging conversation between two self-taught builders figuring things out in real time. Follow Andre on X: @AndreBaach.Timestamps00:00 — Andre introduces Splash N Color, his Amazon-based 3D printing e-commerce business and explains the grind of running 40 FDM machines in South Florida.05:00 — The conversation shifts to Claude Code and how Andre built an inventory automation system to manage sales velocity and RFQs across 200+ SKUs.10:00 — Stewart and Andre compare notes on Opus 4.6, debate Codex vs Claude, and Andre breaks down the new Agent Teams feature in Claude Code.15:00 — Discussion turns to the San Francisco AI scene, the viral OpenClaw launch event that drew 700 people, and what's capturing the city's imagination right now.20:00 — The pair wrestle with data privacy, the illusion of it since 2000, and whether full transparency of personal data might actually serve people better.25:00 — Stewart pitches his vision of local on-device AI replacing cloud tools entirely, and they debate the 10–15 year timeline for mainstream societal adoption.30:00 — Andre traces his origin story: a high school dropout from Brazil who spotted a 3D printing opportunity on Facebook Marketplace and got lucky timing with COVID.35:00 — They explore whether AI-generated 3D models and DfAM will automate physical manufacturing, and why proprietary specs keep the space stubbornly hard.Key InsightsLifestyle businesses deserve more respect. Andre spent months feeling inadequate scrolling through Twitter watching founders announce funding rounds, before realizing his cash-flowing, location-independent business was already the goal. The social media version of entrepreneurial success warped his perception of what he actually had built.Claude Code is becoming an operating system. Stewart describes running Claude Code as having a second OS on top of MacOS — one that makes the underlying machine legible in ways it never was before. Both guests use it not just for coding but as a primary interface for understanding and operating their businesses.Agent Teams changes how work gets done. Andre explains that Claude's new multi-agent feature lets you assign a team lead and specialized roles that communicate with each other in parallel, essentially running an autonomous task force inside your terminal — a meaningful leap beyond single-instance prompting.Physical manufacturing will stay hard. Even as AI-generated 3D models improve, tolerances of 0.5 millimeters can mean the difference between a product working or not. Design for manufacturing is a separate discipline from design itself, and proprietary specs mean open source models rarely hit commercial quality.The internet is heading toward agents. Both guests agree that AI agents will increasingly handle tasks humans currently do manually online — booking services, making payments, coordinating logistics — with the human internet potentially becoming secondary to a machine-to-machine layer.Iteration is the real value of 3D printing. Andre pushes back on 3D printing as a business unto itself, framing it instead as a prototyping tool. The true value is rapid iteration on housing, tolerances, and fit — not the printer, but the speed of the feedback loop it enables.Technology compounds in layers. Andre closes with a tech-tree analogy: each generation normalizes the tools of the previous one and builds the next layer on top. Agentic coding today is what the internet was in the 90s — the foundation for something we can't yet fully see.
OpenAI's GPT-5.4 is the newest state-of-the-art AI model. But for how long? Sam Altman and team have to fend off Dario Amodei's Anthropic in more ways than one. GPT-5.4 beats humans at computer use, hallucinates 33% less, and costs half what Opus 4.6 does for coding. Meanwhile Anthropic just got blacklisted by the Pentagon, Claude shot to number one in the App Store, and Dario went on CBS looking like he aged five years. Plus Kling Motion Control 3.0, Grok Imagine Extend, Notebook LM cinematic overviews, Ben Affleck's secret AI startup sold to Netflix, and a device that jams AI from recording you. THE KING WEARS THE CROWN UNTIL NEXT TUESDAY. THAT'S HOW THIS WORKS NOW. #ai #ainews #openai Come to our Discord: https://discord.gg/muD2TYgC8f Join our Patreon: https://www.patreon.com/AIForHumansShow AI For Humans Newsletter: https://aiforhumans.beehiiv.com/ Follow us for more on X @AIForHumansShow Join our TikTok @aiforhumansshow To book us for speaking, please visit our website: https://www.aiforhumans.show/ // Show Links // OpenAI's GPT-5.4 is here… https://openai.com/index/introducing-gpt-5-4/ CUA (Computer Use Agent) OpenAI Demo https://x.com/OpenAIDevs/status/2029620984853188738?s=20) ChatGPT 5.3 Instant (the emojis are back) https://x.com/OpenAI/status/2028893701427302559?s=20 Dario Amodei Rips Sam, Calls OpenAI "Mendacious" And Other Things About Trump in Memo https://www.theinformation.com/articles/read-anthropic-ceos-memo-attacking-openais-mendacious-pentagon-announcement?rc=c3oojq&shared=d3dcdfe79bb39c22 Sam Altman Response To Backlash https://x.com/sama/status/2028640354912923739?s=20 Voice Mode For Claude Code https://x.com/trq212/status/2028628570692890800?s=20 AI for Humans Link Blog https://www.aiforhumans.show/linkblog Kling Motion Control 3.0 https://x.com/Kling_ai/status/2029391461880574136?s=20 Solid Examples https://x.com/maxescu/status/2029223608854434000?s=20 Grok Imagine Extend Video https://x.com/grok/status/2028571874188259337?s=20 Gavin's Grok Extended video: https://x.com/gavinpurcell/status/2029423165571776587?s=20 Cinematic Overviews for NotebookLM https://x.com/NotebookLM/status/2029240601334436080?s=20 Justine example https://x.com/venturetwins/status/2029592813277683750?s=20 Netflix Buys Ben Affleck's Secretive AI Company https://variety.com/2026/film/news/netflix-acquires-ben-affleck-ai-filmmaking-startup-interpositive-1236679498/ Spectre 1 Stops AI Audio Recordings IRL https://x.com/aidaxbaradari/status/2028864606568067491?s=20 The Shape Store Viral Tiktok/IG https://www.tiktok.com/@a.i.solation/video/7612337372667071774?_r=1&_t=ZT-94RCk4vtqYg Bilawal Sidhu's Operation Fury Visualization With Actual Geolocation Data https://youtu.be/rXvU7bPJ8n4?si=n4aoCsgHLQ6bq432 Infinite Favicons https://www.favicons.art/ Open Source Vibe Coded Music Video Maker https://x.com/blizaine/status/2029330454122242106?s=20 New AI For Humans Website! https://www.aiforhumans.show/
Join us on the STILL RELEVANT tour: https://simulationtheory.ai/16c0d1db-a8d0-4ac9-bae3-d25074589a80Join Simtheory: https://simtheory.ai
All speakers are announced at AIE EU, schedule coming soon. Join us there or in Miami with the renowned organizers of React Miami! Singapore CFP also open!We've called this out a few times over in AINews, but the overwhelming consensus in the Valley is that “the IDE is Dead”. In November it was just a gut feeling, but now we actually have data: even at the canonical “VSCode Fork” company, people are officially using more agents than tab autocomplete (the first wave of AI coding):Cursor has launched cloud agents for a few months now, and this specific launch is around Computer Use, which has come a long way since we first talked with Anthropic about it in 2024, and which Jonas productized as Autotab:We also take the opportunity to do a live demo, talk about slash commands and subagents, and the future of continual learning and personalized coding models, something that Sam previously worked on at New Computer. (The fact that both of these folks are top tier CEOs of their own startups that have now joined the insane talent density gathering at Cursor should also not be overlooked).Full Episode on YouTube!please like and subscribe!Timestamps00:00 Agentic Code Experiments00:53 Why Cloud Agents Matter02:08 Testing First Pillar03:36 Video Reviews Second Pillar04:29 Remote Control Third Pillar06:17 Meta Demos and Bug Repro13:36 Slash Commands and MCPs18:19 From Tab to Team Workflow31:41 Minimal Web UI Philosophy32:40 Why No File Editor34:38 Full Stack Cursor Debate36:34 Model Choice and Auto Routing38:34 Parallel Agents and Best Of N41:41 Subagents and Context Management44:48 Grind Mode and Throughput Future01:00:24 Cloud Agent Onboarding and MemoryTranscriptEP 77 - CURSOR - Audio version[00:00:00]Agentic Code ExperimentsSamantha: This is another experiment that we ran last year and didn't decide to ship at that time, but may come back to LM Judge, but one that was also agentic and could write code. So it wasn't just picking but also taking the learnings from two models or and models that it was looking at and writing a new diff.And what we found was that there were strengths to using models from different model providers as the base level of this process. Basically you could get almost like a synergistic output that was better than having a very unified like bottom model tier.Jonas: We think that over the coming months, the big unlock is not going to be one person with a model getting more done, like the water flowing faster and we'll be making the pipe much wider and so paralyzing more, whether that's swarms of agents or parallel agents, both of those are things that contribute to getting much more done in the same amount of time.Why Cloud Agents Matterswyx: This week, one of the biggest launches that Cursor's ever done is cloud agents. I think you, you had [00:01:00] cloud agents before, but this was like, you give cursor a computer, right? Yeah. So it's just basically they bought auto tab and then they repackaged it. Is that what's going on, or,Jonas: that's a big part of it.Yeah. Cloud agents already ran in their own computers, but they were sort of site reading code. Yeah. And those computers were not, they were like blank VMs typically that were not set up for the Devrel X for whatever repo the agents working on. One of the things that we talk about is if you put yourself in the model shoes and you were seeing tokens stream by and all you could do was cite read code and spit out tokens and hope that you had done the right thing,swyx: no chanceJonas: I'd be so bad.Like you obviously you need to run the code. And so that I think also is probably not that contrarian of a take, but no one has done that yet. And so giving the model the tools to onboard itself and then use full computer use end-to-end pixels in coordinates out and have the cloud computer with different apps in it is the big unlock that we've seen internally in terms of use usage of this going from, oh, we use it for little copy changes [00:02:00] to no.We're really like driving new features with this kind of new type of entech workflow. Alright, let's see it. Cool.Live Demo TourJonas: So this is what it looks like in cursor.com/agents. So this is one I kicked off a while ago. So on the left hand side is the chat. Very classic sort of agentic thing. The big new thing here is that the agent will test its changes.So you can see here it worked for half an hour. That is because it not only took time to write the tokens of code, it also took time to test them end to end. So it started Devrel servers iterate when needed. And so that's one part of it is like model works for longer and doesn't come back with a, I tried some things pr, but a I tested at pr that's ready for your review.One of the other intuition pumps we use there is if a human gave you a PR asked you to review it and you hadn't, they hadn't tested it, you'd also be annoyed because you'd be like, only ask me for a review once it's actually ready. So that's what we've done withTesting Defaults and Controlsswyx: simple question I wanted to gather out front.Some prs are way smaller, [00:03:00] like just copy change. Does it always do the video or is it sometimes,Jonas: Sometimes.swyx: Okay. So what's the judgment?Jonas: The model does it? So we we do some default prompting with sort. What types of changes to test? There's a slash command that people can do called slash no test, where if you do that, the model will not test,swyx: but the default is test.Jonas: The default is to be calibrated. So we tell it don't test, very simple copy changes, but test like more complex things. And then users can also write their agents.md and specify like this type of, if you're editing this subpart of my mono repo, never tested ‘cause that won't work or whatever.Videos and Remote ControlJonas: So pillar one is the model actually testing Pillar two is the model coming back with a video of what it did.We have found that in this new world where agents can end-to-end, write much more code, reviewing the code is one of these new bottlenecks that crop up. And so reviewing a video is not a substitute for reviewing code, but it is an entry point that is much, much easier to start with than glancing at [00:04:00] some giant diff.And so typically you kick one off you, it's done you come back and the first thing that you would do is watch this video. So this is a, video of it. In this case I wanted a tool tip over this button. And so it went and showed me what that looks like in, in this video that I think here, it actually used a gallery.So sometimes it will build storybook type galleries where you can see like that component in action. And so that's pillar two is like these demo videos of what it built. And then pillar number three is I have full remote control access to this vm. So I can go heat in here. I can hover things, I can type, I have full control.And same thing for the terminal. I have full access. And so that is also really useful because sometimes the video is like all you need to see. And oftentimes by the way, the video's not perfect, the video will show you, is this worth either merging immediately or oftentimes is this worth iterating with to get it to that final stage where I am ready to merge in.So I can go through some other examples where the first video [00:05:00] wasn't perfect, but it gave me confidence that we were on the right track and two or three follow-ups later, it was good to go. And then I also have full access here where some things you just wanna play around with. You wanna get a feel for what is this and there's no substitute to a live preview.And the VNC kind of VM remote access gives you that.swyx: Amazing What, sorry? What is VN. AndJonas: just the remote desktop. Remote desktop. Yeah.swyx: Sam, any other details that you always wanna call out?Samantha: Yeah, for me the videos have been super helpful. I would say, especially in cases where a common problem for me with agents and cloud agents beforehand was almost like under specification in my requests where our plan mode and going really back and forth and getting detailed implementation spec is a way to reduce the risk of under specification, but then similar to how human communication breaks down over time, I feel like you have this risk where it's okay, when I pull down, go to the triple of pulling down and like running this branch locally, I'm gonna see that, like I said, this should be a toggle and you have a checkbox and like, why didn't you get that detail?And having the video up front just [00:06:00] has that makes that alignment like you're talking about a shared artifact with the agent. Very clear, which has been just super helpful for me.Jonas: I can quickly run through some other Yes. Examples.Meta Agents and More DemosJonas: So this is a very front end heavy one. So one question I wasswyx: gonna say, is this only for frontJonas: end?Exactly. One question you might have is this only for front end? So this is another example where the thing I wanted it to implement was a better error message for saving secrets. So the cloud agents support adding secrets, that's part of what it needs to access certain systems. Part of onboarding that is giving access.This is cloud is working onswyx: cloud agents. Yes.Jonas: So this is a fun thing isSamantha: it can get super meta. ItJonas: can get super meta, it can start its own cloud agents, it can talk to its own cloud agents. Sometimes it's hard to wrap your mind around that. We have disabled, it's cloud agents starting more cloud agents. So we currently disallow that.Someday you might. Someday we might. Someday we might. So this actually was mostly a backend change in terms of the error handling here, where if the [00:07:00] secret is far too large, it would oh, this is actually really cool. Wow. That's the Devrel tools. That's the Devrel tools. So if the secret is far too large, we.Allow secrets above a certain size. We have a size limit on them. And the error message there was really bad. It was just some generic failed to save message. So I was like, Hey, we wanted an error message. So first cool thing it did here, zero prompting on how to test this. Instead of typing out the, like a character 5,000 times to hit the limit, it opens Devrel tools, writes js, or to paste into the input 5,000 characters of the letter A and then hit save, closes the Devrel tools, hit save and gets this new gets the new error message.So that looks like the video actually cut off, but here you can see the, here you can see the screenshot of the of the error message. What, so that is like frontend backend end-to-end feature to, to get that,swyx: yeah.Jonas: Andswyx: And you just need a full vm, full computer run everything.Okay. Yeah.Jonas: Yeah. So we've had versions of this. This is one of the auto tab lessons where we started that in 2022. [00:08:00] No, in 2023. And at the time it was like browser use, DOM, like all these different things. And I think we ended up very sort of a GI pilled in the sense that just give the model pixels, give it a box, a brain in a box is what you want and you want to remove limitations around context and capabilities such that the bottleneck should be the intelligence.And given how smart models are today, that's a very far out bottleneck. And so giving it its full VM and having it be onboarded with Devrel X set up like a human would is just been for us internally a really big step change in capability.swyx: Yeah I would say, let's call it a year ago the models weren't even good enough to do any of this stuff.SoSamantha: even six months ago. Yeah.swyx: So yeah what people have told me is like round about Sonder four fire is when this started being good enough to just automate fully by pixel.Jonas: Yeah, I think it's always a question of when is good enough. I think we found in particular with Opus 4 5, 4, 6, and Codex five three, that those were additional step [00:09:00] changes in the autonomy grade capabilities of the model to just.Go off and figure out the details and come back when it's done.swyx: I wanna appreciate a couple details. One 10 Stack Router. I see it. Yeah. I'm a big fan. Do you know any, I have to name the 10 Stack.Jonas: No.swyx: This just a random lore. Some buddy Sue Tanner. My and then the other thing if you switch back to the video.Jonas: Yeah.swyx: I wanna shout out this thing. Probably Sam did it. I don't knowJonas: the chapters.swyx: What is this called? Yeah, this is called Chapters. Yeah. It's like a Vimeo thing. I don't know. But it's so nice the design details, like the, and obviously a company called Cursor has to have a beautiful cursorSamantha: and it isswyx: the cursor.Samantha: Cursor.swyx: You see it branded? It's the cursor. Cursor, yeah. Okay, cool. And then I was like, I complained to Evan. I was like, okay, but you guys branded everything but the wallpaper. And he was like, no, that's a cursor wallpaper. I was like, what?Samantha: Yeah. Rio picked the wallpaper, I think. Yeah. The video.That's probably Alexi and yeah, a few others on the team with the chapters on the video. Matthew Frederico. There's been a lot of teamwork on this. It's a huge effort.swyx: I just, I like design details.Samantha: Yeah.swyx: And and then when you download it adds like a little cursor. Kind of TikTok clip. [00:10:00] Yes. Yes.So it's to make it really obvious is from Cursor,Jonas: we did the TikTok branding at the end. This was actually in our launch video. Alexi demoed the cloud agent that built that feature. Which was funny because that was an instance where one of the things that's been a consequence of having these videos is we use best of event where you run head to head different models on the same prompt.We use that a lot more because one of the complications with doing that before was you'd run four models and they would come back with some giant diff, like 700 lines of code times four. It's what are you gonna do? You're gonna review all that's horrible. But if you come back with four 22nd videos, yeah, I'll watch four 22nd videos.And then even if none of them is perfect, you can figure out like, which one of those do you want to iterate with, to get it over the line. Yeah. And so that's really been really fun.Bug Repro WorkflowJonas: Here's another example. That's we found really cool, which is we've actually turned since into a slash command as well slash [00:11:00] repro, where for bugs in particular, the model of having full access to the to its own vm, it can first reproduce the bug, make a video of the bug reproducing, fix the bug, make a video of the bug being fixed, like doing the same pattern workflow with obviously the bug not reproducing.And that has been the single category that has gone from like these types of bugs, really hard to reproduce and pick two tons of time locally, even if you try a cloud agent on it. Are you confident it actually fixed it to when this happens? You'll merge it in 90 seconds or something like that.So this is an example where, let me see if this is the broken one or the, okay, this is the fixed one. Okay. So we had a bug on cursor.com/agents where if you would attach images where remove them. Then still submit your prompt. They would actually still get attached to the prompt. Okay. And so here you can see Cursor is using, its full desktop by the way.This is one of the cases where if you just do, browse [00:12:00] use type stuff, you'll have a bad time. ‘cause now it needs to upload files. Like it just uses its native file viewer to do that. And so you can see here it's uploading files. It's going to submit a prompt and then it will go and open up. So this is the meta, this is cursor agent, prompting cursor agent inside its own environment.And so you can see here bug, there's five images attached, whereas when it's submitted, it only had one image.swyx: I see. Yeah. But you gotta enable that if you're gonna use cur agent inside cur.Jonas: Exactly. And so here, this is then the after video where it went, it does the same thing. It attaches images, removes, some of them hit send.And you can see here, once this agent is up, only one of the images is left in the attachments. Yeah.swyx: Beautiful.Jonas: Okay. So easy merge.swyx: So yeah. When does it choose to do this? Because this is an extra step.Jonas: Yes. I think I've not done a great job yet of calibrating the model on when to reproduce these things.Yeah. Sometimes it will do it of its own accord. Yeah. We've been conservative where we try to have it only do it when it's [00:13:00] quite sure because it does add some amount of time to how long it takes it to work on it. But we also have added things like the slash repro command where you can just do, fix this bug slash repro and then it will know that it should first make you a video of it actually finding and making sure it can reproduce the bug.swyx: Yeah. Yeah. One sort of ML topic this ties into is reward hacking, where while you write test that you update only pass. So first write test, it shows me it fails, then make you test pass, which is a classic like red green.Jonas: Yep.swyx: LikeJonas: A-T-D-D-T-D-Dswyx: thing.No, very cool. Was that the last demo? Is thereJonas: Yeah.Anything I missed on the demos or points that you think? I think thatSamantha: covers it well. Yeah.swyx: Cool. Before we stop the screen share, can you gimme like a, just a tour of the slash commands ‘cause I so God ready. Huh, what? What are the good ones?Samantha: Yeah, we wanna increase discoverability around this too.I think that'll be like a future thing we work on. Yeah. But there's definitely a lot of good stuff nowJonas: we have a lot of internal ones that I think will not be that interesting. Here's an internal one that I've made. I don't know if anyone else at Cursor uses this one. Fix bb.Samantha: I've never heard of it.Jonas: Yeah.[00:14:00]Fix Bug Bot. So this is a thing that we want to integrate more tightly on. So you made it forswyx: yourself.Jonas: I made this for myself. It's actually available to everyone in the team, but yeah, no one knows about it. But yeah, there will be Bug bot comments and so Bug Bot has a lot of cool things. We actually just launched Bug Bot Auto Fix, where you can click a button and or change a setting and it will automatically fix its own things, and that works great in a bunch of cases.There are some cases where having the context of the original agent that created the PR is really helpful for fixing the bugs, because it might be like, oh, the bug here is that this, is a regression and actually you meant to do something more like that. And so having the original prompt and all of the context of the agent that worked on it, and so here I could just do, fix or we used to be able to do fixed PB and it would do that.No test is another one that we've had. Slash repro is in here. We mentioned that one.Samantha: One of my favorites is cloud agent diagnosis. This is one that makes heavy use of the Datadog MCP. Okay. And I [00:15:00] think Nick and David on our team wrote, and basically if there is a problem with a cloud agent we'll spin up a bunch of subs.Like a singleswyx: instance.Samantha: Yeah. We'll take the ideas and argument and spin up a bunch of subagents using the Datadog MCP to explore the logs and find like all of the problems that could have happened with that. It takes the debugging time, like from potentially you can do quick stuff quickly with the Datadog ui, but it takes it down to, again, like a single agent call as opposed to trolling through logs yourself.Jonas: You should also talk about the stuff we've done with transcripts.Samantha: Yes. Also so basically we've also done some things internally. There'll be some versions of this as we ship publicly soon, where you can spit up an agent and give it access to another agent's transcript to either basically debug something that happened.So act as an external debugger. I see. Or continue the conversation. Almost like forking it.swyx: A transcript includes all the chain of thought for the 11 minutes here. 45 minutes there.Samantha: Yeah. That way. Exactly. So basically acting as a like secondary agent that debugs the first, so we've started to push more andswyx: they're all the same [00:16:00] code.It is just the different prompts, but the sa the same.Samantha: Yeah. So basically same cloud agent infrastructure and then same harness. And then like when we do things like include, there's some extra infrastructure that goes into piping in like an external transcript if we include it as an attachment.But for things like the cloud agent diagnosis, that's mostly just using the Datadog MCP. ‘Cause we also launched CPS along with along with this cloud agent launch, launch support for cloud agent cps.swyx: Oh, that was drawn out.Jonas: We won't, we'll be doing a bigger marketing moment for it next week, but, and you can now use CPS andswyx: People will listen to it as well.Yeah,Jonas: they'llSamantha: be ahead of the third. They'll be ahead. And I would I actually don't know if the Datadog CP is like publicly available yet. I realize this not sure beta testing it, but it's been one of my favorites to use. Soswyx: I think that one's interesting for Datadog. ‘cause Datadog wants to own that site.Interesting with Bits. I don't know if you've tried bits.Samantha: I haven't tried bits.swyx: Yeah.Jonas: That's their cloud agentswyx: product. Yeah. Yeah. They want to be like we own your logs and give us our, some part of the, [00:17:00] self-healing software that everyone wants. Yeah. But obviously Cursor has a strong opinion on coding agents and you, you like taking away from the which like obviously you're going to do, and not every company's like Cursor, but it's interesting if you're a Datadog, like what do you do here?Do you expose your logs to FDP and let other people do it? Or do you try to own that it because it's extra business for you? Yeah. It's like an interesting one.Samantha: It's a good question. All I know is that I love the Datadog MCP,Jonas: And yeah, it is gonna be no, no surprise that people like will demand it, right?Samantha: Yeah.swyx: It's, it's like anysystemswyx: of record company like this, it's like how much do you give away? Cool. I think that's that for the sort of cloud agents tour. Cool. And we just talk about like cloud agents have been when did Kirsten loves cloud agents? Do you know, in JuneJonas: last year.swyx: June last year. So it's been slowly develop the thing you did, like a bunch of, like Michael did a post where himself, where he like showed this chart of like ages overtaking tap. And I'm like, wow, this is like the biggest transition in code.Jonas: Yeah.swyx: Like in, in [00:18:00] like the last,Jonas: yeah. I think that kind of got turned out.Yeah. I think it's a very interest,swyx: not at all. I think it's been highlighted by our friend Andre Kati today.Jonas: Okay.swyx: Talk more about it. What does it mean? Yeah. Is I just got given like the cursor tab key.Jonas: Yes. Yes.swyx: That's that'sSamantha: cool.swyx: I know, but it's gonna be like put in a museum.Jonas: It is.Samantha: I have to say I haven't used tab a little bit myself.Jonas: Yeah. I think that what it looks like to code with AI code generally creates software, even if you want to go higher level. Is changing very rapidly. No, not a hot take, but I think from our vendor's point at Cursor, I think one of the things that is probably underappreciated from the outside is that we are extremely self-aware about that fact and Kerscher, got its start in phase one, era one of like tab and auto complete.And that was really useful in its time. But a lot of people start looking at text files and editing code, like we call it hand coding. Now when you like type out the actual letters, it'sswyx: oh that's cute.Jonas: Yeah.swyx: Oh that's cute.Jonas: You're so boomer. So boomer. [00:19:00] And so that I think has been a slowly accelerating and now in the last few months, rapidly accelerating shift.And we think that's going to happen again with the next thing where the, I think some of the pains around tab of it's great, but I actually just want to give more to the agent and I don't want to do one tab at a time. I want to just give it a task and it goes off and does a larger unit of work and I can.Lean back a little bit more and operate at that higher level of abstraction that's going to happen again, where it goes from agents handing you back diffs and you're like in the weeds and giving it, 32nd to three minute tasks, to, you're giving it, three minute to 30 minute to three hour tasks and you're getting back videos and trying out previews rather than immediately looking at diffs every single time.swyx: Yeah. Anything to add?Samantha: One other shift that I've noticed as our cloud agents have really taken off internally has been a shift from primarily individually driven development to almost this collaborative nature of development for us, slack is actually almost like a development on [00:20:00] Id basically.So Iswyx: like maybe don't even build a custom ui, like maybe that's like a debugging thing, but actually it's that.Samantha: I feel like, yeah, there's still so much to left to explore there, but basically for us, like Slack is where a lot of development happens. Like we will have these issue channels or just like this product discussion channels where people are always at cursing and that kicks off a cloud agent.And for us at least, we have team follow-ups enabled. So if Jonas kicks off at Cursor in a thread, I can follow up with it and add more context. And so it turns into almost like a discussion service where people can like collaborate on ui. Oftentimes I will kick off an investigation and then sometimes I even ask it to get blame and then tag people who should be brought in. ‘cause it can tag people in Slack and then other people will comeswyx: in, can tag other people who are not involved in conversation. Yes. Can just do at Jonas if say, was talking to,Samantha: yeah.swyx: That's cool. You should, you guys should make a big good deal outta that.Samantha: I know. It's a lot to, I feel like there's a lot more to do with our slack surface area to show people externally. But yeah, basically like it [00:21:00] can bring other people in and then other people can also contribute to that thread and you can end up with a PR again, with the artifacts visible and then people can be like, okay, cool, we can merge this.So for us it's like the ID is almost like moving into Slack in some ways as well.swyx: I have the same experience with, but it's not developers, it's me. Designer salespeople.Samantha: Yeah.swyx: So me on like technical marketing, vision, designer on design and then salespeople on here's the legal source of what we agreed on.And then they all just collaborate and correct. The agents,Jonas: I think that we found when these threads is. The work that is left, that the humans are discussing in these threads is the nugget of what is actually interesting and relevant. It's not the boring details of where does this if statement go?It's do we wanna ship this? Is this the right ux? Is this the right form factor? Yeah. How do we make this more obvious to the user? It's like those really interesting kind of higher order questions that are so easy to collaborate with and leave the implementation to the cloud agent.Samantha: Totally. And no more discussion of am I gonna do this? Are you [00:22:00] gonna do this cursor's doing it? You just have to decide. You like it.swyx: Sometimes the, I don't know if there's a, this probably, you guys probably figured this out already, but since I, you need like a mute button. So like cursor, like we're going to take this offline, but still online.But like we need to talk among the humans first. Before you like could stop responding to everything.Jonas: Yeah. This is a design decision where currently cursor won't chime in unless you explicitly add Mention it. Yeah. Yeah.Samantha: So it's not always listening.Yeah.Jonas: I can see all the intermediate messages.swyx: Have you done the recursive, can cursor add another cursor or spawn another cursor?Samantha: Oh,Jonas: we've done some versions of this.swyx: Because, ‘cause it can add humans.Jonas: Yes. One of the other things we've been working on that's like an implication of generating the code is so easy is getting it to production is still harder than it should be.And broadly, you solve one bottleneck and three new ones pop up. Yeah. And so one of the new bottlenecks is getting into production and we have a like joke internally where you'll be talking about some feature and someone says, I have a PR for that. Which is it's so easy [00:23:00] to get to, I a PR for that, but it's hard still relatively to get from I a PR for that to, I'm confident and ready to merge this.And so I think that over the coming weeks and months, that's a thing that we think a lot about is how do we scale up compute to that pipeline of getting things from a first draft An agent did.swyx: Isn't that what Merge isn't know what graphite's for, likeJonas: graphite is a big part of that. The cloud agent testingswyx: Is it fully integrated or still different companiesJonas: working on I think we'll have more to share there in the future, but the goal is to have great end-to-end experience where Cursor doesn't just help you generate code tokens, it helps you create software end-to-end.And so review is a big part of that, that I think especially as models have gotten much better at writing code, generating code, we've felt that relatively crop up more,swyx: sorry this is completely unplanned, but like there I have people arguing one to you need ai. To review ai and then there is another approach, thought school of thought where it's no, [00:24:00] reviews are dead.Like just show me the video. It's it like,Samantha: yeah. I feel again, for me, the video is often like alignment and then I often still wanna go through a code review process.swyx: Like still look at the files andSamantha: everything. Yeah. There's a spectrum of course. Like the video, if it's really well done and it does like fully like test everything, you can feel pretty competent, but it's still helpful to, to look at the code.I make hep pay a lot of attention to bug bot. I feel like Bug Bot has been a great really highly adopted internally. We often like, won't we tell people like, don't leave bug bot comments unaddressed. ‘cause we have such high confidence in it. So people always address their bug bot comments.Jonas: Once you've had two cases where you merged something and then you went back later, there was a bug in it, you merged, you went back later and you were like, ah, bug Bot had found that I should have listened to Bug Bot.Once that happens two or three times, you learn to wait for bug bot.Samantha: Yeah. So I think for us there's like that code level review where like it's looking at the actual code and then there's like the like feature level review where you're looking at the features. There's like a whole number of different like areas.There'll probably eventually be things like performance level review, security [00:25:00] review, things like that where it's like more more different aspects of how this feature might affect your code base that you want to potentially leverage an agent to help with.Jonas: And some of those like bug bot will be synchronous and you'll typically want to wait on before you merge.But I think another thing that we're starting to see is. As with cloud agents, you scale up this parallelism and how much code you generate. 10 person startups become, need the Devrel X and pipelines that a 10,000 person company used to need. And that looks like a lot of the things I think that 10,000 person companies invented in order to get that volume of software to production safely.So that's things like, release frequently or release slowly, have different stages where you release, have checkpoints, automated ways of detecting regressions. And so I think we're gonna need stacks merg stack diffs merge queues. Exactly. A lot of those things are going to be importantswyx: forward with.I think the majority of people still don't know what stack stacks are. And I like, I have many friends in Facebook and like I, I'm pretty friendly with graphite. I've just, [00:26:00] I've never needed it ‘cause I don't work on that larger team and it's just like democratization of no, only here's what we've already worked out at very large scale and here's how you can, it benefits you too.Like I think to me, one of the beautiful things about GitHub is that. It's actually useful to me as an individual solo developer, even though it's like actually collaboration software.Jonas: Yep.swyx: And I don't think a lot of Devrel tools have figured that out yet. That transition from like large down to small.Jonas: Yeah. Kers is probably an inverse story.swyx: This is small down toJonas: Yeah. Where historically Kers share, part of why we grew so quickly was anyone on the team could pick it up and in fact people would pick it up, on the weekend for their side project and then bring it into work. ‘cause they loved using it so much.swyx: Yeah.Jonas: And I think a thing that we've started working on a lot more, not us specifically, but as a company and other folks at Cursor, is making it really great for teams and making it the, the 10th person that starts using Cursor in a team. Is immediately set up with things like, we launched Marketplace recently so other people can [00:27:00] configure what CPS and skills like plugins.So skills and cps, other people can configure that. So that my cursor is ready to go and set up. Sam loves the Datadog, MCP and Slack, MCP you've also been using a lot butSamantha: also pre-launch, but I feel like it's so good.Jonas: Yeah, my cursor should be configured if Sam feels strongly that's just amazing and required.swyx: Is it automatically shared or you have to go and.Jonas: It depends on the MCP. So some are obviously off per user. Yeah. And so Sam can't off my cursor with my Slack MCP, but some are team off and those can be set up by admins.swyx: Yeah. Yeah. That's cool. Yeah, I think, we had a man on the pod when cursor was five people, and like everyone was like, okay, what's the thing?And then it's usually something teams and org and enterprise, but it's actually working. But like usually at that stage when you're five, when you're just a vs. Code fork it's like how do you get there? Yeah. Will people pay for this? People do pay for it.Jonas: Yeah. And I think for cloud agents, we expect.[00:28:00]To have similar kind of PLG things where I think off the bat we've seen a lot of adoption with kind of smaller teams where the code bases are not quite as complex to set up. Yes. If you need some insane docker layer caching thing for builds not to take two hours, that's going to take a little bit longer for us to be able to support that kind of infrastructure.Whereas if you have front end backend, like one click agents can install everything that they need themselves.swyx: This is a good chance for me to just ask some technical sort of check the box questions. Can I choose the size of the vm?Jonas: Not yet. We are planning on adding that. Weswyx: have, this is obviously you want like LXXL, whatever, right?Like it's like the Amazon like sort menu.Jonas: Yes, exactly. We'll add that.swyx: Yeah. In some ways you have to basically become like a EC2, almost like you rent a box.Jonas: You rent a box. Yes. We talk a lot about brain in a box. Yeah. So cursor, we want to be a brain in a box,swyx: but is the mental model different? Is it more serverless?Is it more persistent? Is. Something else.Samantha: We want it to be a bit persistent. The desktop should be [00:29:00] something you can return to af even after some days. Like maybe you go back, they're like still thinking about a feature for some period of time. So theswyx: full like sus like suspend the memory and bring it back and then keep going.Samantha: Exactly.swyx: That's an interesting one because what I actually do want, like from a manna and open crawl, whatever, is like I want to be able to log in with my credentials to the thing, but not actually store it in any like secret store, whatever. ‘cause it's like this is the, my most sensitive stuff.Yeah. This is like my email, whatever. And just have it like, persist to the image. I don't know how it was hood, but like to rehydrate and then just keep going from there. But I don't think a lot of infra works that way. A lot of it's stateless where like you save it to a docker image and then it's only whatever you can describe in a Docker file and that's it.That's the only thing you can cl multiple times in parallel.Jonas: Yeah. We have a bunch of different ways of setting them up. So there's a dockerfile based approach. The main default way is actually snapshottingswyx: like a Linux vmJonas: like vm, right? You run a bunch of install commands and then you snapshot more or less the file system.And so that gets you set up for everything [00:30:00] that you would want to bring a new VM up from that template basically.swyx: Yeah.Jonas: And that's a bit distinct from what Sam was talking about with the hibernating and re rehydrating where that is a full memory snapshot as well. So there, if I had like the browser open to a specific page and we bring that back, that page will still be there.swyx: Was there any discussion internally and just building this stuff about every time you shoot a video it's actually you show a little bit of the desktop and the browser and it's not necessary if you just show the browser. If, if you know you're just demoing a front end application.Why not just show the browser, right? Like it Yeah,Samantha: we do have some panning and zooming. Yeah. Like it can decide that when it's actually recording and cutting the video to highlight different things. I think we've played around with different ways of segmenting it and yeah. There's been some different revs on it for sure.Jonas: Yeah. I think one of the interesting things is the version that you see now in cursor.com actually is like half of what we had at peak where we decided to unshift or unshipped quite a few things. So two of the interesting things to talk about, one is directly an answer to your [00:31:00] question where we had native browser that you would have locally, it was basically an iframe that via port forwarding could load the URL could talk to local host in the vm.So that gets you basically, so inswyx: your machine's browser,likeJonas: in your local browser? Yeah. You would go to local host 4,000 and that would get forwarded to local host 4,000 in the VM via port forward. We unshift that like atswyx: Eng Rock.Jonas: Like an Eng Rock. Exactly. We unshift that because we felt that the remote desktop was sufficiently low latency and more general purpose.So we build Cursor web, but we also build Cursor desktop. And so it's really useful to be able to have the full spectrum of things. And even for Cursor Web, as you saw in one of the examples, the agent was uploading files and like I couldn't upload files and open the file viewer if I only had access to the browser.And we've thought a lot about, this might seem funny coming from Cursor where we started as this, vs. Code Fork and I think inherited a lot of amazing things, but also a lot [00:32:00] of legacy UI from VS Code.Minimal Web UI SurfacesJonas: And so with the web UI we wanted to be very intentional about keeping that very minimal and exposing the right sum of set of primitive sort of app surfaces we call them, that are shared features of that cloud.Environment that you and the agent both use. So agent uses desktop and controls it. I can use desktop and controlled agent runs terminal commands. I can run terminal commands. So that's how our philosophy around it. The other thing that is maybe interesting to talk about that we unshipped is and we may, both of these things we may reship and decide at some point in the future that we've changed our minds on the trade offs or gotten it to a point where, putswyx: it out there.Let users tell you they want it. Exactly. Alright, fine.Why No File EditorJonas: So one of the other things is actually a files app. And so we used to have the ability at one point during the process of testing this internally to see next to, I had GID desktop and terminal on the right hand side of the tab there earlier to also have a files app where you could see and edit files.And we actually felt that in some [00:33:00] ways, by restricting and limiting what you could do there, people would naturally leave more to the agent and fall into this new pattern of delegating, which we thought was really valuable. And there's currently no way in Cursor web to edit these files.swyx: Yeah. Except you like open up the PR and go into GitHub and do the thing.Jonas: Yeah.swyx: Which is annoying.Jonas: Just tell the agent,swyx: I have criticized open AI for this. Because Open AI is Codex app doesn't have a file editor, like it has file viewer, but isn't a file editor.Jonas: Do you use the file viewer a lot?swyx: No. I understand, but like sometimes I want it, the one way to do it is like freaking going to no, they have a open in cursor button or open an antigravity or, opening whatever and people pointed that.So I was, I was part of the early testers group people pointed that and they were like, this is like a design smell. It's like you actually want a VS. Code fork that has all these things, but also a file editor. And they were like, no, just trust us.Jonas: Yeah. I think we as Cursor will want to, as a product, offer the [00:34:00] whole spectrum and so you want to be able to.Work at really high levels of abstraction and double click and see the lowest level. That's important. But I also think that like you won't be doing that in Slack. And so there are surfaces and ways of interacting where in some cases limiting the UX capabilities makes for a cleaner experience that's more simple and drives people into these new patterns where even locally we kicked off joking about this.People like don't really edit files, hand code anymore. And so we want to build for where that's going and not where it's beenswyx: a lot of cool stuff. And Okay. I have a couple more.Full Stack Hosting Debateswyx: So observations about the design elements about these things. One of the things that I'm always thinking about is cursor and other peers of cursor start from like the Devrel tools and work their way towards cloud agents.Other people, like the lovable and bolts of the world start with here's like the vibe code. Full cloud thing. They were already cloud edges before anyone else cloud edges and we will give you the full deploy platform. So we own the whole loop. We own all the infrastructure, we own, we, we have the logs, we have the the live site, [00:35:00] whatever.And you can do that cycle cursor doesn't own that cycle even today. You don't have the versal, you don't have the, you whatever deploy infrastructure that, that you're gonna have, which gives you powers because anyone can use it. And any enterprise who, whatever you infra, I don't care. But then also gives you limitations as to how much you can actually fully debug end to end.I guess I'm just putting out there that like is there a future where there's like full stack cursor where like cursor apps.com where like I host my cursor site this, which is basically a verse clone, right? I don't know.Jonas: I think that's a interesting question to be asking, and I think like the logic that you laid out for how you would get there is logic that I largely agree with.swyx: Yeah. Yeah.Jonas: I think right now we're really focused on what we see as the next big bottleneck and because things like the Datadog MCP exist, yeah. I don't think that the best way we can help our customers ship more software. Is by building a hosting solution right now,swyx: by the way, these are things I've actually discussed with some of the companies I just named.Jonas: Yeah, for sure. Right now, just this big bottleneck is getting the code out there and also [00:36:00] unlike a lovable in the bolt, we focus much more on existing software. And the zero to one greenfield is just a very different problem. Imagine going to a Shopify and convincing them to deploy on your deployment solution.That's very different and I think will take much longer to see how that works. May never happen relative to, oh, it's like a zero to one app.swyx: I'll say. It's tempting because look like 50% of your apps are versal, superb base tailwind react it's the stack. It's what everyone does.So I it's kinda interesting.Jonas: Yeah.Model Choice and Auto Routingswyx: The other thing is the model select dying. Right now in cloud agents, it's stuck down, bottom left. Sure it's Codex High today, but do I care if it's suddenly switched to Opus? Probably not.Samantha: We definitely wanna give people a choice across models because I feel like it, the meta change is very frequently.I was a big like Opus 4.5 Maximalist, and when codex 5.3 came out, I hard, hard switch. So that's all I use now.swyx: Yeah. Agreed. I don't know if, but basically like when I use it in Slack, [00:37:00] right? Cursor does a very good job of exposing yeah. Cursors. If people go use it, here's the model we're using.Yeah. Here's how you switch if you want. But otherwise it's like extracted away, which is like beautiful because then you actually, you should decide.Jonas: Yeah, I think we want to be doing more with defaults.swyx: Yeah.Jonas: Where we can suggest things to people. A thing that we have in the editor, the desktop app is auto, which will route your request and do things there.So I think we will want to do something like that for cloud agents as well. We haven't done it yet. And so I think. We have both people like Sam, who are very savvy and want know exactly what model they want, and we also have people that want us to pick the best model for them because we have amazing people like Sam and we, we are the experts.Yeah. We have both the traffic and the internal taste and experience to know what we think is best.swyx: Yeah. I have this ongoing pieces of agent lab versus model lab. And to me, cursor and other companies are example of an agent lab that is, building a new playbook that is different from a model lab where it's like very GP heavy Olo.So obviously has a research [00:38:00] team. And my thesis is like you just, every agent lab is going to have a router because you're going to be asked like, what's what. I don't keep up to every day. I'm not a Sam, I don't keep up every day for using you as sample the arm arbitrator of taste. Put me on CRI Auto.Is it free? It's not free.Jonas: Auto's not free, but there's different pricing tiers. Yeah.swyx: Put me on Chris. You decide from me based on all the other people you know better than me. And I think every agent lab should basically end up doing this because that actually gives you extra power because you like people stop carrying or having loyalty with one lab.Jonas: Yeah.Best Of N and Model CouncilsJonas: Two other maybe interesting things that I don't know how much they're on your radar are one the best event thing we mentioned where running different models head to head is actually quite interesting becauseswyx: which exists in cursor.Jonas: That exists in cur ID and web. So the problem is where do you run them?swyx: Okay.Jonas: And so I, I can share my screen if that's interesting. Yeahinteresting.swyx: Yeah. Yeah. Obviously parallel agents, very popal.Jonas: Yes, exactly. Parallel agentsswyx: in you mind. Are they the same thing? Best event and parallel agents? I don't want to [00:39:00] put words in your mouth.Jonas: Best event is a subset of parallel agents where they're running on the same prompt.That would be my answer. So this is what that looks like. And so here in this dropdown picker, I can just select multiple models.swyx: Yeah.Jonas: And now if I do a prompt, I'm going to do something silly. I am running these five models.swyx: Okay. This is this fake clone, of course. The 2.0 yeah.Jonas: Yes, exactly. But they're running so the cursor 2.0, you can do desktop or cloud.So this is cloud specifically where the benefit over work trees is that they have their own VMs and can run commands and won't try to kill ports that the other one is running. Which are some of the pains. These are allswyx: called work trees?Jonas: No, these are all cloud agents with their own VMs.swyx: Okay. ButJonas: When you do it locally, sometimes people do work trees and that's been the main way that people have set out parallel so far.I've gotta say.swyx: That's so confusing for folks.Jonas: Yeah.swyx: No one knows what work trees are.Jonas: Exactly. I think we're phasing out work trees.swyx: Really.Jonas: Yeah.swyx: Okay.Samantha: But yeah. And one other thing I would say though on the multimodel choice, [00:40:00] so this is another experiment that we ran last year and the decide to ship at that time but may come back to, and there was an interesting learning that's relevant for, these different model providers. It was something that would run a bunch of best of ends but then synthesize and basically run like a synthesizer layer of models. And that was other agents that would take LM Judge, but one that was also agentic and could write code. So it wasn't just picking but also taking the learnings from two models or, and models that it was looking at and writing a new diff.And what we found was that at the time at least, there were strengths to using models from different model providers as the base level of this process. Like basically you could get almost like a synergistic output that was better than having a very unified, like bottom model tier. So it was really interesting ‘cause it's like potentially, even though even in the future when you have like maybe one model as ahead of the other for a little bit, there could be some benefit from having like multiple top tier models involved in like a [00:41:00] model swarm or whatever agent Swarm that you're doing, that they each have strengths and weaknesses.Yeah.Jonas: Andre called this the council, right?Samantha: Yeah, exactly. We actually, oh, that's another internal command we have that Ian wrote slash council. Oh, and they some, yeah.swyx: Yes. This idea is in various forms everywhere. And I think for me, like for me, the productization of it, you guys have done yeah, like this is very flexible, but.If I were to add another Yeah, what your thing is on here it would be too much. I what, let's say,Samantha: Ideally it's all, it's something that the user can just choose and it all happens under the hood in a way where like you just get the benefit of that process at the end and better output basically, but don't have to get too lost in the complexity of judging along the way.Jonas: Okay.Subagents for ContextJonas: Another thing on the many agents, on different parallel agents that's interesting is an idea that's been around for a while as well that has started working recently is subagents. And so this is one other way to get agents of the different prompts and different goals and different models, [00:42:00] different vintages to work together.Collaborate and delegate.swyx: Yeah. I'm very like I like one of my, I always looking for this is the year of the blah, right? Yeah. I think one of the things on the blahs is subs. I think this is of but I haven't used them in cursor. Are they fully formed or how do I honestly like an intro because do I form them from new every time?Do I have fixed subagents? How are they different for slash commands? There's all these like really basic questions that no one stops to answer for people because everyone's just like too busy launching. We have toSamantha: honestly, you could, you can see them in cursor now if you just say spin up like 50 subagents to, so cursor definesswyx: what Subagents.Yeah.Samantha: Yeah. So basically I think I shouldn't speak for the whole subagents team. This is like a different team that's been working on this, but our thesis or thing that we saw internally is that like they're great for context management for kind of long running threads, or if you're trying to just throw more compute at something.We have strongly used, almost like a generic task interface where then the main agent can define [00:43:00] like what goes into the subagent. So if I say explore my code base, it might decide to spin up an explore subagent and or might decide to spin up five explore subagent.swyx: But I don't get to set what those subagent are, right?It's all defined by a model.Samantha: I think. I actually would have to refresh myself on the sub agent interface.Jonas: There are some built-in ones like the explore subagent is free pre-built. But you can also instruct the model to use other subagents and then it will. And one other example of a built-in subagent is I actually just kicked one off in cursor and I can show you what that looks like.swyx: Yes. Because I tried to do this in pure prompt space.Jonas: So this is the desktop app? Yeah. Yeah. And that'sswyx: all you need to do, right? Yeah.Jonas: That's all you need to do. So I said use a sub agent to explore and I think, yeah, so I can even click in and see what the subagent is working on here. It ran some fine command and this is a composer under the hood.Even though my main model is Opus, it does smart routing to take, like in this instance the explorer sort of requires reading a ton of things. And so a faster model is really useful to get an [00:44:00] answer quickly, but that this is what subagent look like. And I think we wanted to do a lot more to expose hooks and ways for people to configure these.Another example of a cus sort of builtin subagent is the computer use subagent in the cloud agents, where we found that those trajectories can be long and involve a lot of images obviously, and execution of some testing verification task. We wanted to use that models that are particularly good at that.So that's one reason to use subagents. And then the other reason to use subagents is we want contexts to be summarized reduced down at a subagent level. That's a really neat boundary at which to compress that rollout and testing into a final message that agent writes that then gets passed into the parent rather than having to do some global compaction or something like that.swyx: Awesome. Cool. While we're in the subagents conversation, I can't do a cursor conversation and not talk about listen stuff. What is that? What is what? He built a browser. He built an os. Yes. And he [00:45:00] experimented with a lot of different architectures and basically ended up reinventing the software engineer org chart.This is all cool, but what's your take? What's, is there any hole behind the side? The scenes stories about that kind of, that whole adventure.Samantha: Some of those experiments have found their way into a feature that's available in cloud agents now, the long running agent mode internally, we call it grind mode.And I think there's like some hint of grind mode accessible in the picker today. ‘cause you can do choose grind until done. And so that was really the result of experiments that Wilson started in this vein where he I think the Ralph Wigga loop was like floating around at the time, but it was something he also independently found and he was experimenting with.And that was what led to this product surface.swyx: And it is just simple idea of have criteria for completion and do not. Until you complete,Samantha: there's a bit more complexity as well in, in our implementation. Like there's a specific, you have to start out by aligning and there's like a planning stage where it will work with you and it will not get like start grind execution mode until it's decided that the [00:46:00] plan is amenable to both of you.Basically,swyx: I refuse to work until you make me happy.Jonas: We found that it's really important where people would give like very underspecified prompt and then expect it to come back with magic. And if it's gonna go off and work for three minutes, that's one thing. When it's gonna go off and work for three days, probably should spend like a few hours upfront making sure that you have communicated what you actually want.swyx: Yeah. And just to like really drive from the point. We really mean three days that No, noJonas: human. Oh yeah. We've had three day months innovation whatsoever.Samantha: I don't know what the record is, but there's been a long time with the grantsJonas: and so the thing that is available in cursor. The long running agent is if you wanna think about it, very abstractly that is like one worker node.Whereas what built the browser is a society of workers and planners and different agents collaborating. Because we started building the browser with one worker node at the time, that was just the agent. And it became one worker node when we realized that the throughput of the system was not where it needed to be [00:47:00] to get something as large of a scale as the browser done.swyx: Yeah.Jonas: And so this has also become a really big mental model for us with cloud, cloud agents is there's the classic engineering latency throughput trade-offs. And so you know, the code is water flowing through a pipe. The, we think that over the coming months, the big unlock is not going to be one person with a model getting more done, like the water flowing faster and we'll be making the pipe much wider and so ing more, whether that's swarms of agents or parallel agents, both of those are things that contribute to getting.Much more done in the same amount of time, but any one of those tasks doesn't necessarily need to get done that quickly. And throughput is this really big thing where if you see the system of a hundred concurrent agents outputting thousands of tokens a second, you can't go back like that.Just you see a glimpse of the future where obviously there are many caveats. Like no one is using this browser. IRL. There's like a bunch of things not quite right yet, but we are going to get to systems that produce real production [00:48:00] code at the scale much sooner than people think. And it forces you to think what even happens to production systems. Like we've broken our GitHub actions recently because we have so many agents like producing and pushing code that like CICD is just overloaded. ‘cause suddenly it's like effectively weg grew, cursor's growing very quickly anyway, but you grow head count, 10 x when people run 10 x as many agents.And so a lot of these systems, exactly, a lot of these systems will need to adapt.swyx: It also reminds me, we, we all, the three of us live in the app layer, but if you talk to the researchers who are doing RL infrastructure, it's the same thing. It's like all these parallel rollouts and scheduling them and making sure as much throughput as possible goes through them.Yeah, it's the same thing.Jonas: We were talking briefly before we started recording. You were mentioning memory chips and some of the shortages there. The other thing that I think is just like hard to wrap your head around the scale of the system that was building the browser, the concurrency there.If Sam and I both have a system like that running for us, [00:49:00] shipping our software. The amount of inference that we're going to need per developer is just really mind-boggling. And that makes, sometimes when I think about that, I think that even with, the most optimistic projections for what we're going to need in terms of buildout, our underestimating, the extent to which these swarm systems can like churn at scale to produce code that is valuable to the economy.And,swyx: yeah, you can cut this if it's sensitive, but I was just Do you have estimates of how much your token consumption is?Jonas: Like per developer?swyx: Yeah. Or yourself. I don't need like comfy average. I just curious. ISamantha: feel like I, for a while I wasn't an admin on the usage dashboard, so I like wasn't able to actually see, but it was a,swyx: mine has gone up.Samantha: Oh yeah.swyx: But I thinkSamantha: it's in terms of how much work I'm doing, it's more like I have no worries about developers losing their jobs, at least in the near term. ‘cause I feel like that's a more broad discussion.swyx: Yeah. Yeah. You went there. I didn't go, I wasn't going there.I was just like how much more are you using?Samantha: There's so much stuff to be built. And so I feel like I'm basically just [00:50:00] trying to constantly I have more ambitions than I did before. Yes. Personally. Yes. So can't speak to the broader thing. But for me it's like I'm busier than ever before.I'm using more tokens and I am also doing more things.Jonas: Yeah. Yeah. I don't have the stats for myself, but I think broadly a thing that we've seen, that we expect to continue is J'S paradox. Whereswyx: you can't do it in our podcast without seeingJonas: it. Exactly. We've done it. Now we can wrap. We've done, we said the words.Phase one tab auto complete people paid like 20 bucks a month. And that was great. Phase two where you were iterating with these local models. Today people pay like hundreds of dollars a month. I think as we think about these highly parallel kind of agents running off for a long times in their own VM system, we are already at that point where people will be spending thousands of dollars a month per human, and I think potentially tens of thousands and beyond, where it's not like we are greedy for like capturing more money, but what happens is just individuals get that much more leverage.And if one person can do as much as 10 people, yeah. That tool that allows ‘em to do that is going to be tremendously valuable [00:51:00] and worth investing in and taking the best thing that exists.swyx: One more question on just the cursor in general and then open-ended for you guys to plug whatever you wanna put.How is Cursor hiring these days?Samantha: What do you mean by how?swyx: So obviously lead code is dead. Oh,Samantha: okay.swyx: Everyone says work trial. Different people have different levels of adoption of agents. Some people can really adopt can be much more productive. But other people, you just need to give them a little bit of time.And sometimes they've never lived in a token rich place like cursor.And once you live in a token rich place, you're you just work differently. But you need to have done that. And a lot of people anyway, it was just open-ended. Like how has agentic engineering, agentic coding changed your opinions on hiring?Is there any like broad like insights? Yeah.Jonas: Basically I'm asking this for other people, right? Yeah, totally. Totally. To hear Sam's opinion, we haven't talked about this the two of us. I think that we don't see necessarily being great at the latest thing with AI coding as a prerequisite.I do think that's a sign that people are keeping up and [00:52:00] curious and willing to upscale themselves in what's happening because. As we were talking about the last three months, the game has completely changed. It's like what I do all day is very different.swyx: Like it's my job and I can't,Jonas: Yeah, totally.I do think that still as Sam was saying, the fundamentals remain important in the current age and being able to go and double click down. And models today do still have weaknesses where if you let them run for too long without cleaning up and refactoring, the coke will get sloppy and there'll be bad abstractions.And so you still do need humans that like have built systems before, no good patterns when they see them and know where to steer things.Samantha: I would agree with that. I would say again, cursor also operates very quickly and leveraging ag agentic engineering is probably one reason why that's possible in this current moment.I think in the past it was just like people coding quickly and now there's like people who use agents to move faster as well. So it's part of our process will always look for we'll select for kind of that ability to make good decisions quickly and move well in this environment.And so I think being able to [00:53:00] figure out how to use agents to help you do that is an important part of it too.swyx: Yeah. Okay. The fork in the road, either predictions for the end of the year, if you have any, or PUDs.Jonas: Evictions are not going to go well.Samantha: I know it's hard.swyx: They're so hard. Get it wrong.It's okay. Just, yeah.Jonas: One other plug that may be interesting that I feel like we touched on but haven't talked a ton about is a thing that the kind of these new interfaces and this parallelism enables is the ability to hop back and forth between threads really quickly. And so a thing that we have,swyx: you wanna show something or,Jonas: yeah, I can show something.A thing that we have felt with local agents is this pain around contact switching. And you have one agent that went off and did some work and another agent that, that did something else. And so here by having, I just have three tabs open, let's say, but I can very quickly, hop in here.This is an example I showed earlier, but the actual workflow here I think is really different in a way that may not be obvious, where, I start t
Try VibeCode here: https://www.vibecodeapp.com/sign-up?code=ref-38wevq7q7tlwMost entrepreneurs and no-code builders are missing a seismic shift happening right now — one that's transforming how startup ideas go from concept to product in weeks, not months. In this episode, Dan Hafner dives into his recent obsession with Claude Code and how it's revolutionizing his approach to product development through tools like VibeCode, revealing how he built real apps in just a month, from internal management systems to complete business operating platforms, all without writing a single line of code. You'll discover the core capabilities of Claude Code, practical workflows combining it with VS Code and GitHub, the most cost-effective ways to leverage models like Opus and Sonnet without burning through credits, and how to build responsive web and native iOS apps today — potentially without any developer resources. This episode is a clear call to action for founders, entrepreneurs, and no-code enthusiasts ready to rethink what's possible: those who master these tools now will launch faster, iterate quicker, and stay ahead in a world where your next big idea could be just a prompt away.Contact Dan: dan@dappernocode.com
Hey folks, Alex here, let me catch you up! Most important news about this week came today, mid-show, OpenAI dropped GPT 5.4 Thinking (and 5.4 Pro), their latest flagship general model, less autistic than Codex 5.3, with 1M context, /fast mode and the ability to steet it mid-reasoning. We tested it live on the show, it's really a beast. Also, since last week, Anthropic said no to Department of War's ultimatum and it looks like they are being designated as supply chain risk, OpenAI swooped in to sign a deal with DoW and the internet went ballistic (Dario also had some .. choice words in a leaked memo!) On the Open Source front, the internet lost it's damn mind when a friend of the pod Junyang Lin, announced his departure from Qwen in a tweet, causing an uproar, and the CEO of Alibaba to intervene. Wolfram presented our new in-house wolfbench.ai and a lot more! P.S - We acknowledge the war in Iran, and wish a quick resolution, the safety of civilians on both sides. Yam had to run to the shelter multiple times during the show. ThursdAI - Highest signal weekly AI news show is a reader-supported publication. To receive new posts and support my work, consider becoming a free or paid subscriber.OpenAI drops GPT 5.4 Thinking and 5.4 Pro - heavy weight frontier models with 1M context, /fast mode, SOTA on many evalsOpenAI actually opened this week with another model drop, GPT 5.3-instant, which... we can honestly skip, it was fairly insignificant besides noting that this is the model that most free users use. It is supposedly “less cringe” (actual words OpenAI used). We all wondered when 5.4 will, and OpenAI once again proved that we named the show after the right day. Of course it drops on a ThursdAI. GPT 5.4 Thinking is OpenAI latest “General” model, which can still code, yes (they folded most of the Codex 5.3 coding breakthroughs in here) but it also shows an incredible 83% on GDPVal (12% over Codex), 47% on Frontier Math and an incredible ability to use computers and browsers with 82% on BrowseComp beating Claude 4.6 at lower prices than Sonnet! GPT 5.4 is also ... quite significantly improved at Frontend design? This landing page was created by GPT 5.4 (inside the Codex app, newly available on Windows) in a few minutes, clearly showing significant improvements in style. I built it also to compare prices, all the 3 flagship models are trying to catch up to Gemini in 1M context window, and it's important to note, that GPT 5.4 even at double the price after the 272K tokens cutoff is still.... cheaper than Opus 4.6. OpenAI is really going for broke here, specifically as many enterprises are adopting Anthropic at a faster and faster pace (it was reported that Anthropic is approaching 19B ARR this month, doubling from 8B just a few months ago!) Frontier math wizThe highlight from the 5.4 feedback came from a Polish mathematician Bartosz Naskręcki (@nasqret on X), who said GPT-5.4 solved a research-level FrontierMath problem he had been working on for roughly 20 years. He called it his “personal singularity,” and as overused as that word has become, I get why he said it. I've told you about this last week, we're on the cusp. Coding efficiencyThere's tons of metrics in this release, but I wanted to highlight this one, where it may seem on first glance that on SWE-bench Pro, this model is on par with the previous SOTA GPT 5.3 codex, but these dots here are thinking efforts. And a medium thinking effort, GPT 5.4 matches 5.3 on hard thinking! This is quite remarkable, as lower thinking efforts have less tokens, which means they are cheaper and faster ultimately! Fast mode arrives at OpenAI as wellI think this one is a direct “this worked for Anthropic, lets steal this”, OpenAI enabled /fast mode that.. burns the tokens at 2x the rate, and prioritizes your tokens at 1.5x the speed. So, essentially getting you responses faster (which was one of the main complains about GPT 5.3 Codex). I can't wait to bring the fast mode to OpenClaw with 5.4, which will absolutely come as OpenClaw is part of OpenAI now. There's also a really under-appreciated feature here that I think other labs are going to copy quickly: mid-thought steering. OpenAI now lets you interrupt the model while it's thinking and redirect it in real time in ChatGPT and iOS. This is a godsend if you're like me, sent a prompt, seeing the model go down the wrong path in thinking... and want to just.. steer it without stopping! Anthropic is now designated as supply-chain risk by DoWLast week I left you with a cliffhanger: Anthropic had received an ultimatum from the Department of War (previously the Department of Defense) to remove their two remaining restrictions on Claude — no autonomous kill chain without human intervention, and no surveillance of US citizens. Anthropic's response? “we cannot in good conscience acceede to their request” So much has happened since then; US President Trump said “I fired Anthropic” referring to his Truth Social post demanding intelligence agencies drop the use of Claude (which apparently was used in the war with Iran regardless); Sam Altman announced that OpenAI has agreed to DoW and will provide OpenAI models, causing a lot of people to cancel their OpenAI subscriptions, and later apologizing for the “rushed rollout”; Dario Amodei posted a very contentious internal memo that leaked, in which he name-called the presidency, Sam Altman and his motives, Palantir and their “safety theater”, for which he later apologizedHonestly this whole thing is giving me whiplash trying to follow, but here's the facts. Anthropic is now the first US company in history, being designated “supply chain risk” which means no government agency can use Claude, and neither can any company that does contracts with DoW. Anthropic says it's illegal and will challenge this in court , while reporting $19B in annual recurring revenue, nearly doubling since last 3 months, and very closely approaching OpenAI at $25B. Look, did I want to report on this stuff when I decided to cover AI? no... I wanted to tell you about cool models and capabilities, but the world is changing, and it's important to know that the US Government understands now that AI is inevitable, and I think this is just the first of many clashes between tech and government we'll see. We'll keep reporting on both. (but let me know in the comments if you'd prefer just model releases) OpenAI's GPT-5.3 Instant Gets Less Cringe, Google's Flash-Lite Gets Faster (X, Announcement)We also got two fast-model updates this week that are worth calling out because these are the models that often end up powering real product flows behind the scenes. As I wrote before, OpenAI's instant model is nothing to really mention, but it's worth mentioning that OpenAI seems to have an answer for every Gemini release. Gemini released Gemini Flash-lite this week, which boasts an incredible 363 tokens/s speed, which doing math at a very good level, 1M context and great scores compared to the instant/fast models like Haiku from Anthropic. Folks called out that this model is more expensive than the previous 2.5 Flash-lite. But with 86.9% on GPQA Diamond beating GPT-5 mini, and 76.8% MMMU-pro multimodal reasoning, this is definitely worth taking a look at for many agentic, super fast responses! For example, the heartbeat response in OpenClaw. Qwen 3.5 Small Models & The Departure of Junyang Lin (X, HF, HF, HF)Alibaba's Qwen team continued releasing their Qwen 3.5 family, this time with Qwen 3.5 Small, a series of models at 0.8B, 2B, 4B, and 9B parameters with native multimodal capabilities. The flagship 9B model is beating GPT-OSS-120B on multiple benchmarks, scoring 82.5 on MMLU-Pro and 81.7 on GPQA Diamond. These models can handle video, documents, and images natively, support up to 201 languages, and can process up to 262K tokens of context. And.. they are great! They are trending on HF right now. What's also trending is, tech lead for Qwen, a friend of the pod Junyang Lin, has posted a cryptic tweet that went viral with over 6M views. There was a lot of discussions on why he and other Qwen leads are stepping away, what's goig to happen with the future of OpenSource. The full picture seems to be, there are a lot of internal tensions and politics, with Junyang being one of the youngest P10 leaders in the Alibaba org.A Chinese website 36KR ( Kind of like a chinese techcrunch) reported that this matter went all the way up to Alibaba CEO, who is no co-leading the qwen team, and that this resignation was related to an internal dispute over resource allocation and team consolidation, not a firing. I'm sure Junyang is going to land somewhere incredible and just wanted to highlight just how much he did for the open source community, pushing Qwen relentlessly, supporting and working with a lot of inference providers (and almost becoming a co-host for ThursdAI with 9! appearances!) StepFun releases Step 3.5 Flash Base (X, HF, HF, Announcement, Arxiv)Speaking of Open Source, StepFun just broke through the noise with a new model, a 196B parameter sparse Mixture of Experts model activating just 11B parameters when ran. It has some great benchmarks, but the main thing is this: they are releasing the pretrained base weights, a midtrain checkpoint optimized for code and agents, the complete SteptronOSS training framework, AND promising to release their SFT data soon - all under Apache 2.0! Technically the model looks strong too, with multi-token prediction, 74.4% on SWE-bench verified bench (though, as we told you last week, it's.. no longer trusted) and full apache 2! This Week's Buzz: presenting Wolfbench.ai I'm so excited about this weeks “this weeks buzz”, Wolfram has been hard at work preparing and presenting a new framework to test out these models, and named it wolfbench.ai Wolfbench is our attempt to compare how the same model performs via different agentic harnesses like ClaudeCode, OpenClaw and Terminalbench's own Terminus. You can check out the website on wolfbench.com but the short of it is, a single number is not telling the full story. Wolf Bench breaks it into a four-metric framework: the average score across runs, the best single run, the ceiling (how many tasks can the model solve at least once across all runs), and the floor (how many tasks does it solve consistently across every single run). That last one is what I find most illuminating. Opus 4.6 might be able to solve 88% of Terminal Bench tasks on average, but only about 55% of tasks it solves every single time. Reliability matters enormously for agents, and benchmarks almost never surface this. If you want to run your own evals with the same config, reach out to Wolfram—he's open to community contributions. Wolfram has also already kicked off a Wolf Bench run on GPT-5.4 since we tested it live today, so stay tuned for those results.There's quite a few more releases we didn't have time to get into on the show given the GPT 5.4 drop, you'll find all those links in the show notes! Next week will mark 3 years since I've started talking about AI on the internet and created ThursdAI (It was March 14th, 2023, same day as GPT4 launched) and we'll have a little celebration, I do hope you join us live
Can an AI actually help you think more clearly, not just write faster? In this episode, we dive deep into Claude AI, the powerhouse model from Anthropic that is redefining how professionals approach research and long-form content. While other tools focus on speed, Claude specializes in nuance, structured reasoning, and safety. Whether you are a researcher, a writer, or a cybersecurity professional, this session from InfosecTrain will show you how to move beyond basic prompts and unlock high-level workflows.Watch the full episode on YouTube: https://youtu.be/sMvv5AwWcxw?si=NI6hUZsQXMRUg_aSWhat You'll Learn:The Claude Family: A breakdown of the Haiku, Sonnet, and Opus models and which one is right for your task.The 200k Context Window: How to "interrogate" massive documents and PDFs to synthesize complex data in seconds.Constitutional AI: Why Claude's ethical framework makes it the most "trustworthy" writing partner for enterprise use.Workflow Mastery: Practical tips for drafting reports, refining arguments, and producing polished, professional-grade summaries.
Elimination Chamber is in the books and the Road to WrestleMania is officially on. Nick and Myron break down the surprises, the momentum shifts, and the bigger business conversations shaping the industry right now.What caught our eye on TV:Orton and Ripley win the Chamber — and yes, we both missed those picks.Finn cost Dom the IC TitleDanhausen debuts on RawPunk vs. Roman delivered the kind of promo that sells stadium shows.MJF vs. Hangman coin flip chaosTNA holding steady at 233K viewers Road to WrestleMania DiscussionRipley vs. Jade feels locked in.Orton vs. Drew? Or does Cody shake it up?Seth vs. Logan? AJ vs. Becky? Brock vs. Oba? Gunther vs. Rey?Who headlines Night 1 and Night 2?Hall of Fame & Business TalkStephanie McMahon, AJ Styles, Demolition — who headlines?TKO posts $4.7B in revenueIndustry MovementParamount beats WBD (for now) — and what that could mean long term for AEW.Indy scene spotlight: Coastal Empire, 1FW, SCA, and Premiere All Star updates.Indy Talk back on Tapped Out.....
The reception to our recent post on Code Reviews has been strong. Catch up!Amid a maelstrom of discussion on whether or not AI is killing SaaS, one of the top publicly listed SaaS companies in the world has just reported record revenues, clearing well over $1.1B in ARR for the first time with a 28% margin. As we comment on the pod, Aaron Levie is the rare public company CEO equally at home in both worlds of Silicon Valley and Wall Street/Main Street, by day helping 70% of the Fortune 500 with their Enterprise Advanced Suite, and yet by night is often found in the basements of early startups and tweeting viral insights about the future of agents.Now that both Cursor, Cloudflare, Perplexity, Anthropic and more have made Filesystems and Sandboxes and various forms of “Just Give the Agent a Box” cool (not just cool; it is now one of the single hottest areas in AI infrastructure growing 100% MoM), we find it a delightfully appropriate time to do the episode with the OG CEO who has been giving humans and computers Boxes since he was a college dropout pitching VCs at a Michael Arrington house party.Enjoy our special pod, with fan favorite returning guest/guest cohost Jeff Huber!Note: We didn't directly discuss the AI vs SaaS debate - Aaron has done many, many, many other podcasts on that, and you should read his definitive essay on it. Most commentators do not understand SaaS businesses because they have never scaled one themselves, and deeply reflected on what the true value proposition of SaaS is.We also discuss Your Company is a Filesystem:We also shoutout CTO Ben Kus' and the AI team, who talked about the technical architecture and will return for AIE WF 2026.Full Video EpisodeTimestamps* 00:00 Adapting Work for Agents* 01:29 Why Every Agent Needs a Box* 04:38 Agent Governance and Identity* 11:28 Why Coding Agents Took Off First* 21:42 Context Engineering and Search Limits* 31:29 Inside Agent Evals* 33:23 Industries and Datasets* 35:22 Building the Agent Team* 38:50 Read Write Agent Workflows* 41:54 Docs Graphs and Founder Mode* 55:38 Token FOMO Culture* 56:31 Production Function Secrets* 01:01:08 Film Roots to Box* 01:03:38 AI Future of Movies* 01:06:47 Media DevRel and EngineeringTranscriptAdapting Work for AgentsAaron Levie: Like you don't write code, you talk to an agent and it goes and does it for you, and you may be at best review it. That's even probably like, like largely not even what you're doing. What's happening is we are changing our work to make the agents effective. In that model, the agent didn't really adapt to how we work.We basically adapted to how the agent works. All of the economy has to go through that exact same evolution. Right now, it's a huge asset and an advantage for the teams that do it early and that are kinda wired into doing this ‘cause you'll see compounding returns. But that's just gonna take a while for most companies to actually go and get this deployed.swyx: Welcome to the Lane Space Pod. We're back in the chroma studio with uh, chroma, CEO, Jeff Hoover. Welcome returning guest now guest host.Aaron Levie: It's a pleasure. Wow. How'd you get upgraded to, uh, to that?swyx: Because he's like the perfect guy to be guest those for you.Aaron Levie: That makes sense actually, for We love context. We, we both really love context le we really do.We really do.swyx: Uh, and we're here with, uh, Aaron Levy. Welcome.Aaron Levie: Thank you. Good to, uh, good to be [00:01:00] here.swyx: Uh, yeah. So we've all met offline and like chatted a little bit, but like, it's always nice to get these things in person and conversation. Yeah. You just started off with so much energy. You're, you're super excited about agents.I loveAaron Levie: agents.swyx: Yeah. Open claw. Just got by, got bought by OpenAI. No, not bought, but you know, you know what I mean?Aaron Levie: Some, some, you know, acquihire. Executiveswyx: hire.Aaron Levie: Executive hire. Okay. Executive hire. Say,swyx: hey, that's my term. Okay. Um, what are you pounding the table on on agents? You have so many insightful tweets.Why Every Agent Needs a BoxAaron Levie: Well, the thing that, that we get super excited by that I think is probably, you know, should be relatively obvious is we've, we've built a platform to help enterprises manage their files and their, their corporate files and the permissions of who has access to those files and the sharing collaboration of those files.All of those files contain really, really important information for the enterprise. It might have your contracts, it might have your research materials, it might have marketing information, it might have your memos. All that data obviously has, you know, predominantly been used by humans. [00:02:00] But there's been one really interesting problem, which is that, you know, humans only really work with their files during an active engagement with them, and they kind of go away and you don't really see them for a long time.And all of a sudden, uh, with the power of AI and AI agents, all of that data becomes extremely relevant as this ongoing source of, of answers to new questions of data that will transform into, into something else that, that produces value in your organization. It, it contains the answer to the new employee that's onboarding, that needs to ramp up on a project.Um, it contains the answer to the right thing to sell a customer when you're having a conversation to them, with them contains the roadmap information that's gonna produce the next feature. So all that data. That previously we've been just sort of storing and, and you know, occasionally forgetting about, ‘cause we're only working on the new active stuff.All of that information becomes valuable to the enterprise and it's gonna become extremely valuable to end users because now they can have agents go find what they're looking for and produce new, new [00:03:00] value and new data on that information. And it's gonna become incredibly valuable to agents because agents can roam around and do a bunch of work and they're gonna need access to that data as well.And um, and you know, sometimes that will be an agent that is sort of working on behalf of, of, of you and, and effectively as you as and, and they are kind of accessing all of the same information that you have access to and, and operating as you in the system. And then sometimes there's gonna be agents that are just.Effectively autonomous and kind of run on their own and, and you're gonna collaborate and work with them kind of like you did another person. Open Claw being the most recent and maybe first real sort of, you know, kind of, you know, up updating everybody's, you know, views of this landscape version of, of what that could look like, which is, okay, I have an agent.It's on its own system, it's on its own computer, it has access to its own tools. I probably don't give it access to my entire life. I probably communicate with it like I would an assistant or a colleague and then it, it sort of has this sandbox environment. So all of that has massive implications for a platform that manage that [00:04:00] enterprise data.We think it's gonna just transform how we work with all of the enterprise content that we work with, and we just have to make sure we're building the right platform to support that.swyx: The sort of shorthand I put it is as people build agents, everybody's just realizing that every agent needs a box. Yes.And it's nice to be called box and just give everyone a box.Aaron Levie: Hey, I if I, you know, if we can make that go viral, uh, like I, I think that that terminology, I, that's theswyx: tagline. Every agentAaron Levie: needs a box. Every agent needs a box. If we can make that the headline of this, I'm fine with this. And that's the billboard I wanna like Yeah, exactly.Every agent needs a box. Um, I like it. Can we ship this? Like,swyx: okay, let's do it. Yeah.Aaron Levie: Uh, my work here is done and I got the value I needed outta this podcast Drinks.swyx: Yeah.Agent Governance and IdentityAaron Levie: But, but, um, but, but, you know, so the thing that we, we kind of think about is, um, is, you know, whether you think the number 10 x or a hundred x or whatever the number is, we're gonna have some order of magnitude more agents than people.That's inevitable. It has to happen. So then the question is, what is the infrastructure that's needed to make all those agents effective in the enterprise? Make sure that they are well governed. Make sure they're only doing [00:05:00] safe things on your information. Make sure that they're not getting exposed. The data that they shouldn't have access to.There's gonna be just incredibly spectacularly crazy security incidents that will happen with agents because you'll prompt, inject an agent and sort of find your way through the CRM system and pull out data that you shouldn't have access to. Oh, weJeff Huber: have God,Aaron Levie: right? I mean, that's just gonna happen all over the place, right?So, so then the thing is, is how do you make sure you have the right security, the permissions, the access controls, the data governance. Um, we actually don't yet exactly know in many cases how we're gonna regulate some of these agents, right? If you think about an agent in financial services, does it have the exact same financial sort of, uh, requirements that a human did?Or is it, is the risk fully on the human that was interacting or created the agent? All open questions, but no matter what, there's gonna need to be a layer that manages the, the data they have access to, the workflows that they're involved in, pulling up data from multiple systems. This is the new infrastructure opportunity in the era of agents.swyx: You have a piece on agent identities, [00:06:00] which I think was today, um, which I think a lot of breaking news, the security, security people are talking about, right? Like you basically, I, I always think of this as like, well you need the human you and then there you need the agent. YouAaron Levie: Yes.swyx: And uh, well, I don't know if it's that simple, but is box going to have an opinion on that or you're just gonna be like, well we're just the sort of the, the source layer.Yeah. Let's Okta of zero handle that.Aaron Levie: I think we're gonna have an opinion and we will work with generally wherever the contours of the market end up. Um, and the reason that we're gonna have an opinion more than other topics probably is because one of the biggest use cases for why your agent might need it, an identity is for file system access.So thus we have to kind of think about this pretty deeply. And I think, uh, unless you're like in our world thinking about this particular problem all day long, it might be, you know, like, why is this such a big deal? And the reason why it's a really big deal is because sometimes sort of say, well just give the agent an, an account on the system and it just treats, treat it like every other type of user on the system.The [00:07:00] problem is, is that I as Aaron don't really have any responsibility over anybody else's box account in our organization. I can't see the box account of any other employee that I work with. I am not liable for anything that they do. And they have, I have, I have, you know, strict privacy requirements on everything that they're able to, you know, that, that, that they work on.Agents don't have that, you know, don't have those properties. The person who creates the agent probably is gonna, for the foreseeable future, take on a lot of the liability of what that agent does. That agent doesn't deserve any privacy because, because it's, you know, it can't fully be autonomously operated and it doesn't have any legal, you know, kind of, you know, responsibility.So thus you can't just be like, oh, well I'll just create a bunch of accounts and then I'll, I'll kind of work with that agent and I'll talk to it occasionally. Like you need oversight of that. And so then the question is, how do you have a world where the agent, sometimes you have oversight of, but what if that agent goes and works with other people?That person over there is collaborating with the agent on something you shouldn't have [00:08:00] access to what they're doing. So we have all of these new boundaries that we're gonna have to figure out of, of, you know, it's really, really easy. So far we've been in, in easy mode. We've hit the easy button with ai, which is the agent just is you.And when you're in quad code and you're in cursor, and you're in Codex, you're just, the agent is you. You're offing into your services. It can do everything you can do. That's the easy mode. The hard mode is agents are kind of running on their own. People check in with them occasionally, they're doing things autonomously.How do you give them access to resources in the enterprise and not dramatically increased the security risk and the risk that you might expose the wrong thing to somebody. These are all the new problems that we have to get solved. I like the identity layer and, and identity vendors as being a solution to that, but we'll, we'll need some opinions as well because so many of the use cases are these collaborative file system use cases, which is how do I give it an agent, a subset of my data?Give it its own workspace as well. ‘cause it's gonna need to store off its own information that would be relevant for it. And how do I have the right oversight into that? [00:09:00]Jeff Huber: One thing, which, um, I think is kind interesting, think about is that you know, how humans work, right? Like I may not also just like give you access to the whole file.I might like sit next to you and like scroll to this like one part of the file and just show you that like one part and like, you know,swyx: partial file access.Jeff Huber: I'm just saying I think like our, like RA does seem to be dead, right? Like you wanna say something is dead uhhuh probably RA is dead. And uh, like the auth story to me seems like incredibly unsolved and unaddressed by like the existing state of like AI vendors.ButAaron Levie: yeah, I think, um, we're, I mean you're taking obviously really to level limit that we probably need to solve for. Yeah. And we built an access control system that was, was kind of like, you know, its own little world for, for a long time. And um, and the idea was this, it's a many to many collaboration system where I can give you any part of the file system.And it's a waterfall model. So if I give you higher up in the, in the, in the system, you get everything below. And that, that kind of created immense flexibility because I can kind of point you to any layer in the, in the tree, but then you're gonna get access to everything kind of below it. And that [00:10:00] mostly is, is working in this, in this world.But you do have to manage this issue, which is how do I create an agent that has access to some of my stuff and somebody else's stuff as well. Mm-hmm. And which parts do I get to look at as the creator of the agent? And, and these are just brand new problems? Yeah. Crazy. And humans, when there was a human there that was really easy to do.Like, like if the three of us were all sharing, there'd be a Venn diagram where we'd have an overlapping set of things we've shared, but then we'd have our own ways that we shared with each other. In an agent world, somebody needs to take responsibility for what that agent has access to and what they're working on.These are like the, some of the most probably, you know, boring problems for 98% of people on, on the internet, but they will be the problems that are the difference between can you actually have autonomous agents in an enterprise contextswyx: Yeah.Aaron Levie: That are not leaking your data constantly.swyx: No. Like, I mean, you know, I run a very, very small company for my conference and like we already have data sensitivity issues.Yes. And some of my team members cannot see Yes. Uh, the others and like, I can't imagine what it's like to run a Fortune 500 and like, you have to [00:11:00] worry about this. I'm just kinda curious, like you, you talked to a lot like, like 70, 80% of your cus uh, of the Fortune 500, your customers.Aaron Levie: Yep. 67%. Just so we're being verySEswyx: precise.So Yeah. I'm notAaron Levie: Okay. Okay.swyx: Something I'm rounding up. Yes. Round up. I'm projecting to, forAaron Levie: the government.swyx: I'm projecting to the end of the year.Aaron Levie: Okay.swyx: There you go.Aaron Levie: You do make it sound like, like we, we, well we've gotta be on this. Like we're, we're taking way too long to get to 80%. Well,swyx: no, I mean, so like. How are they approaching it?Right? Because you're, you don't have a, you don't have a final answer yet.Why Coding Agents Took Off FirstAaron Levie: Well, okay, so, so this is actually, this is the stark reality that like, unfortunately is the kinda like pouring the water on the party a little bit.swyx: Yes.Aaron Levie: We all in Silicon Valley are like, have the absolute best conditions possible for AI ever.And I think we all saw the dke, you know, kind of Dario podcast and this idea of AI coding. Why is that taken off? And, and we're not yet fully seeing it everywhere else. Well, look, if you just like enumerated the list of properties that AI coding has and then compared it to other [00:12:00] knowledge work, let's just, let's just go through a few of them.Generally speaking, you bring on a new engineer, they have access to a large swath of the code base. Like, there's like very, like you, just, like new engineer comes on, they can just go and find the, the, the stuff that they, they need to work with. It's a fully text in text out. Medium. It's only, it's just gonna be text at the end of the day.So it's like really great from a, from just a, uh, you know, kinda what the agent can work with. Obviously the models are super trained on that dataset. The labs themselves have a really strong, kind of self-reinforcing positive flywheel of why they need to do, you know, agent coding deeply. So then you get just better tooling, better services.The actual developers of the AI are daily users of the, of the thing that they're we're working on versus like the, you know, probably there's only like seven Claude Cowork legal plugin users at Anthropic any given day, but there's like a couple thousand Claude code and you know, users every single day.So just like, think about which one are they getting more feedback on. All day long. So you just go through this list. You have a, you know, everybody who's a [00:13:00] developer by definition is technical so they can go install the latest thing. We're all generally online, or at least, you know, kinda the weird ones are, and we're all talking to each other, sharing best practices, like that's like already eight differences.Versus the rest of the economy. Every other part of the economy has like, like six to seven headwinds relative to that list. You go into a company, you're a banker in financial services, you have access to like a, a tiny little subset of the total data that's gonna be relevant to do your job. And you're have to start to go and talk to a bunch of people to get the right data to do your job because Sally didn't add you to that deal room, you know, folder.And that that, you know, the information is actually in a completely different organization that you now have to go in and, and sort of run into. And it's like you have this endless list of access controls and security. As, as you talked about, you have a medium, which is not, it's not just text, right? You have, you have a zoom call that, that you're getting all of the requirements from the customer.You have a lot of in-person conversations and you're doing in-person sales and like how do you ever [00:14:00] digitize all of that information? Um, you know, I think a lot of people got upset with this idea that the code base has all the context, um, that I don't know if you follow, you know, did you follow some of that conversation that that went viral?Is like, you know, it's not that simple that, that the code base doesn't have all the knowledge, but like it's a lot, you're a lot better off than you are with other areas of knowledge work. Like you, we like, we like have documentation practices, you write specifications. Those things don't exist for like 80% of work that happens in the enterprise.That's the divide that we have, which is, which is AI coding has, has just fully, you know, where we've reached escape velocity of how powerful this stuff is, and then we're gonna have to find a way to bring that same energy and momentum, but to all these other areas of knowledge work. Where the tools aren't there, the data's not set up to be there.The access controls don't make it that easy. The context engineering is an incredibly hard problem because again, you have access control challenges, you have different data formats. You have end users that are gonna need to kind of be kind of trained through this as opposed to their adopting [00:15:00] these tools in their free time.That's where the Fortune 500 is. And so we, I think, you know, have to be prepared as an industry where we are gonna be on a multi-year march to, to be able to bring agents to the enterprise for these workflows. And I think probably the, the thing that we've learned most in coding that, that the rest of the world is not yet, I think ready for, I mean, we're, they'll, they'll have to be ready for it because it's just gonna inevitably happen is I think in coding.What, what's interesting is if you think about the practice of coding today versus two years ago. It's probably the most changed workflow in maybe the history of time from the amount of time it's changed, right? Yeah. Like, like has any, has any workflow in the entire economy changed that quickly in terms of the amount of change?I just, you know, at least in any knowledge worker workflow, there's like very rarely been an event where one piece of technology and work practice has so fundamentally, you know, changed, changed what you do. Like you don't write code, you talk to an agent and it goes and [00:16:00] does it for you, and you may be at best review it.And even that's even probably like, like largely not even what you're doing. What's happening is we are changing our work to make the agents effective. In that model, the agent didn't really adapt to how we work. We basically adapted to how the agent works. Mm-hmm. All of the economy has to go through that exact same evolution.The rest of the economy is gonna have to update its workflows to make agents effective. And to give agents the context that they need and to actually figure out what kind of prompting works and to figure out how do you ensure that the agent has the right access to information to be able to execute on its work.I, you know, this is not the panacea that people were hoping for, of the agent drops in, just automates your life. Like you have to basically re-engineer your workflow to get the most out of agents and, uh, and that, that's just gonna take, you know, multiple years across the economy. Right now it's a huge asset and an advantage for the teams that do it early and that are kinda wired into doing this.‘cause [00:17:00] you'll see compounding returns, but that's just gonna take a while for most companies to actually go and get this deployed.swyx: I love, I love pushing back. I think that. That is what a lot of technology consultants love to hear this sort of thing, right? Yeah, yeah, yeah. First to, to embrace the ai. Yes. To get to the promised land, you must pay me so much money to a hundred percent to adopt the prescribed way of, uh, conforming to the agents.Yes. And I worry that you will be eclipsed by someone else who says, no, come as you are.Aaron Levie: Yeah.swyx: And we'll meet you where you are.Aaron Levie: And, and, and and what was the thing that went viral a week ago? OpenAI probably, uh, is hiring F Dees. Yeah. Uh, to go into the enterprise. Yeah. Yeah. And then philanthropic is embedded at Goldman Sachs.Yeah. So if the labs are having to do this, if, if the labs have decided that they need to hire FDE and professional services, then I think that's a pretty clear indication that this, there's no easy mode of workflow transformation. Yeah. Yeah. So, so to your point, I think actually this is a market opportunity for, you know, new professional services and consulting [00:18:00] firms that are like Agent Build and they, and they kind of, you know, go into organizations and they figure out how to re-engineer your workflows to make them more agent ready and get your data into the right format and, you know, reconstruct your business process.So you're, you're not doing most of the work. You're telling agents how to do the work and then you're reviewing it. But I haven't seen the thing that can just drop in and, and kinda let you not go through those changes.swyx: I don't know how that kind of sales pitch goes over. Yeah. You know, you're, you're saying things like, well, in my sort of nice beautiful walled garden, here's, there's, uh, because here's this, here's this beautiful box account that has everything.Yes. And I'm like, well, most, most real life is extremely messy. Sure. And like, poorly named and there duplicate this outdated s**tAaron Levie: a hundred percent. And so No, no, a hundred percent. And so this is actually No. So, so this is, I mean, we agree that, that getting to the beautiful garden is gonna be tough.swyx: Yeah.Aaron Levie: There's also the other end of the spectrum where I, I just like, it's a technical impossibility to solve. The agent is, is truly cannot get enough context to make the right decision in, in the, in the incredibly messy land. Like there's [00:19:00] no a GI that will solve that. So, so we're gonna have to kind of land in somewhere in between, which is like we all collectively get better at.Documentation practices and, and having authoritative relatively up-to-date information and putting it in the right place like agents will, will certainly cause us to be much better organized around how we work with our information, simply because the severity of the agent pulling the wrong data will be too high and the productivity gain of that you'll miss out on by not doing this will be too high as well, that you, that your competition will just do it and they'll just have higher velocity.So, uh, and, and we, we see this a lot firsthand. So we, we build a series of agents internally that they can kind of have access to your full box account and go off and you give it a task and it can go find whatever information you're looking for and work with. And, you know, thank God for the model progress, but like, if, if you gave that task to an agent.Nine months ago, you're just gonna get lots of bogus answers because it's gonna, it's gonna say, Hey, here's, here are fi [00:20:00] five, you know, documents that all kind of smell like the right thing. And I'm gonna, but I, but you're, you're putting me on the clock. ‘cause my assistant prompt says like, you know, be pretty smart, but also try and respond to the user and it's gonna respond.And it's like, ah, it got the wrong document. And then you do that once or twice as a knowledge worker and you're just neverswyx: again,Aaron Levie: never again. You're just like done with the system.swyx: Yeah. It doesn't work.Aaron Levie: It doesn't work. And so, you know, Opus four six and Gemini three one Pro and you know, whatever the latest five 3G BT will be, like, those things are getting better and better and it's using better judgment.And this sort of like the, all of these updates to the agentic tool and search systems are, are, we're seeing, we're seeing very real progress where the agent. Kind of can, can almost smell some things a little bit fishy when it's getting, you know, we, we have this process where we, we have it go fan out, do a bunch of searches, pull up a bunch of data, and then it has to sort of do its own ranking of, you know, what are the right documents that, that it should be working with.And again, like, you know, the intelligence level of a model six months ago, [00:21:00] it'd be just throwing a dart at like, I'm just, I'm gonna grab these seven files and I, I pray, I hope that that's the right answer. And something like an opus first four five, and now four six is like, oh, it's like, no, that one doesn't seem right relative to this question because I'm seeing some signal that is making that, you know, that's contradicting the document where it would normally be in the tree and who should have access.Like it's doing all of that kind of work for you. But like, it still doesn't work if you just have a total wasteland of data. Like, it's just not, it's just not possible. Partly ‘cause a human wouldn't even be able to do it. So basically if a, if a really, really smart human. Could not do that task in five or 10 minutes for a search retrieval type task.Look, you know, your agent's not gonna be able to do it any better. You see this all day long. SoContext Engineering and Search Limitsswyx: this touches on a thing that just passionate about it was just context engineering. I, I'm just gonna let you ramble or riff on, on context engineering. If, if, if there's anything like he, he did really good work on context fraud, which has really taken over as like the term that people use and the referenceAaron Levie: a hundred percent.We, we all we think about is, is the context rob problem. [00:22:00]Jeff Huber: Yeah, there's certainly a lot of like ranking considerations. Gentech surgery think is incredibly promising. Um, yeah, I was trying to generate a question though. I think I have a question right now. Swyx.Aaron Levie: Yeah, no, but like, like I think there was this moment, um, you know, like, I don't know, two years ago before, before we knew like where the, the gotchas were gonna be in ai and I think someone was like, was like, well, infinite context windows will just solve all of these problems and ‘cause you'll just, you'll just give the context window like all the data and.It's just like, okay, I mean, maybe in 2035, like this is a viable solution. First of all, it, it would just, it would just simply cost too much. Like we just can't give the model like the 5,000 documents that might be relevant and it's gonna read them all. And I've seen enough to, to start believing in crazy stuff.So like, I'm willing to just say, sure. Like in, in 10 years from now,swyx: never say, never, never.Aaron Levie: In, in 10 years from now, we'll have infinite context windows at, at a thousandth of the price of today. Like, let's just like believe that that's possible, but Right. We're in reality today. So today we have a context engineering [00:23:00] problem, which is, I got, I got, you know, 200,000 tokens that I can work with, or prob, I don't even know what the latest graph is before, like massive degradation.16. Okay. I have 60,000 tokens that I get to work with where I'm gonna get accurate information. That's not a lot of tokens for a corpus of 10 million documents that a knowledge worker might have across all of the teams and all the projects and all the people they work with. I have, I have 10 million documents.Which, you know, maybe is times five pages per document or something like that. I'm at 50 million pages of information and I have 60,000 tokens. Like, holy s**t. Yeah. This is like, how do I bridge the 50 million pages of information with, you know, the couple hundred that I get to work with in that, in that token window.Yeah. This is like, this is like such an interesting problem and that's why actually so much work is actually like, just like search systems and the databases and that layer has to just get so locked in, but models getting better and importantly [00:24:00] knowing when they've done a search, they found the wrong thing, they go back, they check their work, they, they find a way to balance sort of appeasing the user versus double checking.We have this one, we have this one test case where we ask the agent to go find. 10 pieces of information.swyx: Is this the complex work eval?Aaron Levie: Uh, this is actually not in the eval. This is, this is sort of just like we have a bunch of different, we have a bunch of internal benchmark kind of scenarios. Every time we, we update our agent, we have one, which is, I ask it to find all of our office addresses, and I give it the list of 10 offices that we have.And there's not one document that has this, maybe there should be, that would be a great example of the kind of thing that like maybe over time companies start to, you know, have these sort of like, what are the canonical, you know, kind of key areas of knowledge that we need to have. We don't seem to have this one document that says, here are all of our offices.We have a bunch of documents that have like, here's the New York office and whatever. So you task this agent and you, you get, you say, I need the addresses for these 10 offices. Okay. And by the way, if you do this on any, you know, [00:25:00] public chat model, the same outcome is gonna happen. But for a different kind of query, you give it, you say, I need these 10 addresses.How many times should the agent go and do its search before it decides whether or not, there's just no answer to this question. Often, and especially the, the, let's say lower tier models, it'll come back and it'll give you six of the 10 addresses. And it'll, and I'll just say I couldn't find the otherswyx: four.It, it doesn't know what It doesn't know. ItAaron Levie: doesn't know what It doesn't know. Yeah. So the model is just like, like when should it stop? When should it stop doing? Like should it, should it do that task for literally an hour and just keep cranking through? Maybe I actually made up an office location and it doesn't know that I made it up and I didn't even know that I made it up.Like, should it just keep, re should it read every single file in your entire box account until it, until it should exhaust every single piece of information.swyx: Expensive.Aaron Levie: These are the new problems that we have. So, you know, something like, let's say a new opus model is sort of like, okay, I'm gonna try these types of queries.I didn't get exactly what I wanted. I'm gonna try again. I'm gonna, at [00:26:00] some point I'm gonna stop searching. ‘cause I've determined that that no amount of searching is gonna solve this problem. I'm just not able to do it. And that judgment is like a really new thing that the model needs to be able to have.It's like, when should it give up on a task? ‘cause, ‘cause you just don't, it's a can't find the thing. That's the real world of knowledge, work problems. And this is the stuff that the coding agents don't have to deal with. Because they, it just doesn't like, like you're not usually asking it about, you're, you're always creating net new information coming right outta the model for the most part.Obviously it has to know about your code base and your specs and your documentation, but, but when you deploy an agent on all of your data that now you have all of these new problems that you're dealing withJeff Huber: our, uh, follow follow-up research to context ride is actually on a genetic search. Ah. Um, and we've like right, sort of stress tested like frontier models and their ability to search.Um, and they're not actually that good at searching. Right. Uh, so you're sort of highlighting this like explore, exploit.swyx: You're just say, Debbie, Donna say everything doesn't work. Like,Aaron Levie: well,Jeff Huber: somebody has to be,Aaron Levie: um, can I just throw out one more thing? Yeah. That is different from coding and, and the rest [00:27:00] of the knowledge work that I, I failed to mention.So one other kind of key point is, is that, you know, at the end of the day. Whether you believe we're in a slop apocalypse or, or whatever. At the end of the day, if you, if you build a working product at the end of, if you, if you've built a working solution that is ultimately what the customer is paying for, like whether I have a lot of slop, a little slop or whatever, I'm sure there's lots of code bases we could go into in enterprise software companies where it's like just crazy slop that humans did over a 20 year period, but the end customer just gets this little interface.They can, they can type into it, it does its thing. Knowledge work, uh, doesn't have that property. If I have an AI model, go generate a contract and I generate a contract 20 times and, you know, all 20 times it's just 3% different and like that I, that, that kind of lop introduces all new kinds of risk for my organization that the code version of that LOP didn't, didn't introduce.These are, and so like, so how do you constrain these models to just the part that you want [00:28:00] them to work on and just do the thing that you want them to do? And, and, you know, in engineering, we don't, you can't be disbarred as an engineer, but you could be disbarred as a lawyer. Like you can do the wrong medical thing In healthcare, you, there's no, there's no equivalent to that of engineering.Like, doswyx: you want there to be, because I've considered softwareJeff Huber: engineer. What's that? Civil engineering there is, right? NotAaron Levie: software civil engineer. Sure. Oh yeah, for sure. But like in any of our companies, you like, you know, you'll be forgiven if you took down the site and, and we, we will do a rollback and you'll, you'll be in a meeting, but you have not been disbarred as an engineer.We don't, we don't change your, you know, your computer science, uh, blameJeff Huber: degree, this postmortem.Aaron Levie: Yeah, exactly. Exactly. So, so, uh, now maybe we collectively as an industry need to figure out like, what are you liable for? Not legally, but like in a, in a management sense, uh, of these agents. All sorts of interesting problems that, that, that, uh, that have to come out.But in knowledge work, that's the real hostile environments that we're operating in. Hmm.swyx: I do think like, uh, a lot of the last year's, 2025 story was the rise of coding agents and I think [00:29:00] 2026 story is definitely knowledge work agents. Yes. A hundredAaron Levie: percent.swyx: Right. Like that would, and I think open claw core work are just the beginning.Yes. Like it's, the next one's gonna just gonna be absolute craziness.Aaron Levie: It it is. And, and, uh, and it's gonna be, I mean, again, like this is gonna be this, this wave where we, we are gonna try and bring as many of the practices from coding because that, that will clearly be the forefront, which is tell an agent to go do something and has an access to a set of resources.You need to be responsible for reviewing it at the end of the process. That to me is the, is the kind of template that I just think goes across knowledge, work and odd. Cowork is a great example. Open Closet's a great example. You can kind of, sort of see what Codex could become over time. These are some, some really interesting kind of platforms that are emerging.swyx: Okay. Um, I wanted to, we touched on evals a little bit. You had, you had the report that you're gonna go bring up and then I was gonna go into like, uh, boxes, evals, but uh, go ahead. Talk about your genetic search thing.Jeff Huber: Yeah. Mostly I think kinda a few of the insights. It's like number one frontier model is not good at search.Humans have this [00:30:00] natural explore, exploit trade off where we kinda understand like when to stop doing something. Also, humans are pretty good at like forgetting actually, and like pruning their own context, whereas agents are not, and actually an agent in their kind of context history, if they knew something was bad and they even, you could see in the trace the reason you trace, Hey, that probably wasn't a good idea.If it's still in the trace, still in the context, they'll still do it again. Uhhuh. Uh, and so like, I think pruning is also gonna be like, really, it's already becoming a thing, right? But like, letting self prune the con windowsswyx: be a big deal. Yeah. So, so don't leave the mistake. Don't leave the mistake in there.Cut out the mistake but tell it that you made a mistake in the past and so it doesn't repeat it.Jeff Huber: Yeah. But like cut it out so it doesn't get like distracted by it again. ‘cause really, you know, what is so, so it will repeat its mistake just because it's been, it's inswyx: theJeff Huber: context. It'sAaron Levie: in the context so much.That's a few shot example. Even if it, yeah.Jeff Huber: It's like oh thisAaron Levie: is a great thing to go try even ifJeff Huber: it didn't work.Aaron Levie: Yeah,Jeff Huber: exactly.Aaron Levie: SoJeff Huber: there's like a bunch of stuff there. JustAaron Levie: Groundhogs Day inside these models. Yeah. I'm gonna go keep doing the same wrongJeff Huber: thing. Covering sense. I feel like, you know, some creator analogy you're trying like fit a manifold in latent space, which kind is doing break program synthesis, which is kinda one we think about we're doing right.Like, you know, certain [00:31:00] facts might be like sort of overly pitting it. There are certain, you know, sec sectors of latent space and so like plug clean space. Yeah. And, uh, andswyx: so we have a bell, our editor as a bell every time you say that. SoJeff Huber: you have, you have to like remove those, likeswyx: you shoulda a gong like TPN or something.IfJeff Huber: we gong, you either remove those links to like kinda give it the freedom, kind of do what you need to do. So, but yeah. We'll, we'll release more soon. That'sAaron Levie: awesome.Jeff Huber: That'll, that'll be cool.swyx: We're a cerebral podcast that people listen to us and, and sort of think really deep. So yeah, we try to keep it subtle.Okay. We try to keep it.Aaron Levie: Okay, fine.Inside Agent Evalsswyx: Um, you, you guys do, you guys do have EVs, you talked about your, your office thing, but, uh, you've been also promoting APEX agents and complex work. Uh, yeah, whatever you, wherever you wanna take this just Yeah. How youAaron Levie: Apex is, is obviously me, core's, uh, uh, kind of, um, agent eval.We, we supported that by sort of. Opening up some data for them around how we kind of see these, um, data workspaces in, in the, you know, kind of regular economy. So how do lawyers have a workspace? How do investment bankers have a workspace? What kind of data goes into those? And so we, [00:32:00] we partner with them on their, their apex eval.Our own, um, eval is, it's actually relatively straightforward. We have a, a set of, of documents in a, in a range of industries. We give the agent previously did this as a one shot test of just purely the model. And then we just realized we, we need to, based on where everything's going, it's just gotta be more agentic.So now it's a bit more of a test of both our harness and the model. And we have a rubric of a set of things that has to get right and we score it. Um, and you're just seeing, you know, these incredible jumps in almost every single model in its own family of, you know, opus four, um, you know, sonnet four six versus sonnet four five.swyx: Yeah. We have this up on screen.Aaron Levie: Okay, cool. So some, you're seeing it somewhere like. I, I forget the to, it was like 15 point jump, I think on the main, on the overall,swyx: yes.Aaron Levie: And it's just like, you know, these incredible leaps that, that are starting to happen. Um,swyx: and OP doesn't know any, like any, it's completely held out from op.Aaron Levie: This is not in any, there's no public data which has, you know, Ben benefits and this is just a private eval that we [00:33:00] do, and then we just happen to show it to, to the world. Hmm. So you can't, you can't train against it. And I think it's just as representative of. It's obviously reasoning capabilities, what it's doing at, at, you know, kind of test time, compute capabilities, thinking levels, all like the context rot issues.So many interesting, you know, kind of, uh, uh, capabilities that are, that are now improvingswyx: one sector that you have. That's interesting.Industries and Datasetsswyx: Uh, people are roughly familiar with healthcare and legal, but you have public sector in there.Aaron Levie: Yeah.swyx: Uh, what's that? Like, what, what, what is that?Aaron Levie: Yeah, and, and we actually test against, I dunno, maybe 10 industries.We, we end up usually just cutting a few that we think have interesting gains. All extras, won a lot of like government type documents. Um,swyx: what is that? What is it? Government type documents?Aaron Levie: Government filings. Like a taxswyx: return, likeAaron Levie: a probably not tax returns. It would be more of what would go the government be using, uh, as data.So, okay. Um, so think about research that, that type of, of, of data sets. And then we have financial services for things like data rooms and what would be in an investment prospectus. Uhhuh,swyx: that one you can dog food.Aaron Levie: Yeah, exactly. Exactly. Yes. Yes. [00:34:00] So, uh, so we, we run the models, um, in now, you know, more of an agent mode, but, but still with, with kinda limited capacity and just try and see like on a, like, for like basis, what are the improvements?And, and again, we just continue to be blown away by. How, how good these models are getting.swyx: Yeah, I mean, I think every serious AI company needs something like that where like, well, this is the work we do. Here's our company eval. Yeah. And if you don't have it, well, you're not a serious AI company.Aaron Levie: There's two dimensions, right?So there's, there's like, how are the models improving? And so which models should you either recommend a customer use, which one should you adopt? But then every single day, we're making changes to our agents. And you need to knowswyx: if you regressed,Aaron Levie: if you know. Yeah. You know, I've been fully convinced that the whole agent observability and eval space is gonna be a massive space.Um, super excited for what Braintrust is doing, excited for, you know, Lang Smith, all the things. And I think what you're going to, I mean, this is like every enter like literally every enterprise right now. It's like the AI companies are the customers of these tools. Every enterprise will have this. Yeah, you'll just [00:35:00] have to have an eval.Of all of your work and like, we'll, you'll have an eval of your RFP generation, you'll have an eval of your sales material creation. You'll have an eval of your, uh, invoice processing. And, and as you, you know, buy or use new agentic systems, you are gonna need to know like, what's the quality of your, of your pipeline.swyx: Yeah.Aaron Levie: Um, so huge, huge market with agent evals.swyx: Yeah.Building the Agent Teamswyx: And, and you know, I'm gonna shout out your, your team a bit, uh, your CTO, Ben, uh, did a great talk with us last year. Awesome. And he's gonna come back again. Oh, cool. For World's Fair.Aaron Levie: Yep.swyx: Just talk about your team, like brag a little bit. I think I, I think people take these eval numbers in pretty charts for granted, but No, there, I mean, there's, there's lots of really smart people at work during all this.Aaron Levie: Biggest shout out, uh, is we have a, we have a couple folks at Dya, uh, Sidarth, uh, that, that kind of run this. They're like a, you know, kind of tag tag team duo on our evals, Ben, our CTO, heavily involved Yasha, head of ai, uh, you know, a bunch of folks. And, um, evals is one part of the story. And then just like the full, you know, kind of AI.An agent team [00:36:00] is, uh, is a, is a pretty, you know, is core to this whole effort. So there's probably, I don't know, like maybe a few dozen people that are like the epicenter. And then you just have like layers and layers of, of kind of concentric circles of okay, then there's a search team that supports them and an infrastructure team that supports them.And it's starting to ripple through the entire company. But there's that kind of core agent team, um, that's a pretty, pretty close, uh, close knit group.swyx: The search team is separate from the infra team.Aaron Levie: I mean, we have like every, every layer of the stack we have to kind of do, except for just pure public cloud.Um, but um, you know, we, we store, I don't even know what our public numbers are in, you know, but like, you can just think about it as like a lot of data is, is stored in box. And so we have, and you have every layer of the, of the stack of, you know, how do you manage the data, the file system, the metadata system, the search system, just all of those components.And then they all are having to understand that now you've got this new customer. Which is the agent, and they've been building for two types of customers in the past. They've been building for users and they've been building for like applications. [00:37:00] And now you've got this new agent user, and it comes in with a difference of it, of property sometimes, like, hey, maybe sometimes we should do embeddings, an embedding based, you know, kind of search versus, you know, your, your typical semantic search.Like, it's just like you have to build the, the capabilities to support all of this. And we're testing stuff, throwing things away, something doesn't work and, and not relevant. It's like just, you know, total chaos. But all of those teams are supporting the agent team that is kind of coming up with its requirements of what, what do we need?swyx: Yeah. No, uh, we just came from, uh, fireside chat where you did, and you, you talked about how you're doing this. It's, it's kind of like an internal startup. Yeah. Within the broader company. The broader company's like 3000 people. Yeah. But you know, there's, there's a, this is a core team of like, well, here's the innovation center.Aaron Levie: Yeah.swyx: And like that every company kind of is run this way.Aaron Levie: Yeah. I wanna be sensitive. I don't call it the innovation center. Yeah. Only because I think everybody has to do innovation. Um, there, there's a part of the, the, the company that is, is sort of do or die for the agent wave.swyx: Yeah.Aaron Levie: And it only happens to be more of my focus simply because it's existential that [00:38:00] we get it right.swyx: Yeah.Aaron Levie: All of the supporting systems are necessary. All of the surrounding adjacent capabilities are necessary. Like the only reason we get to be a platform where you'd run an agent is because we have a security feature or a compliance feature, or a governance feature that, that some team is working on.But that's not gonna be the make or break of, of whether we get agents right. Like that already exists and we need to keep innovating there. I don't know what the right, exact precise number is, but it's not a thousand people and it's not 10 people. There's a number of people that are like the, the kind of like, you know, startup within the company that are the make or break on everything related to AI agents, you know, leveraging our platform and letting you work with your data.And that's where I spend a lot of my time, and Ben and Yosh and Diego and Teri, you know, these are just, you know, people that, that, you know, kind of across the team. Are working.swyx: Yeah. Amazing.Read Write Agent WorkflowsJeff Huber: How do you, how do you think about, I mean, you talked a lot about like kinda read workflows over your box data. Yep.Right. You know, gen search questions, queries, et cetera. But like, what about like, write or like authoring workflows?Aaron Levie: Yes. I've [00:39:00] already probably revealed too much actually now that I think about it. So, um, I've talked about whatever,Jeff Huber: whatever you can.Aaron Levie: Okay. It's just us. It's just us. Yeah. Okay. Of course, of course.So I, I guess I would just, uh, I'll make it a little bit conceptual, uh, because again, I've already, I've already said things that are not even ga but, but we've, we've kinda like danced around it publicly, so I, yeah, yeah. Okay. Just like, hopefully nobody watches this, um, episode. No.swyx: It's tidbits for the Heidi engaged to go figure out like what exactly, um, you know, is, is your sort of line of thinking.Sure. They can connect the dots.Aaron Levie: Yeah. So, so I would say that, that, uh, we, you know, as a, as a place where you have your enterprise content, there's a use case where I want to, you know, have an agent read that data and answer questions for me. And then there's a use case where I want the agent to create something.And use the file system to create something or store off data that it's working on, or be able to have, you know, various files that it's writing to about the work it's doing. So we do see it as a total read write. The harder problem has so far been the read only because, because again, you have that kind of like 10 [00:40:00] million to one ratio problem, whereas rights are a lot of, that's just gonna come from the model and, and we just like, we'll just put it in the file system and kinda use it.So it's a little bit of a technically easier problem, but the only part that's like, not necessarily technically hard, it is just like it's not yet perfected in the state of the ecosystem is, you know, building a beautiful PowerPoint presentation. It's still a hard problem for these models. Like, like we still, you know, like, like these formats are just, we're not built for.They'reswyx: working on it.Aaron Levie: They're, they're working on it. Everybody's working on it.swyx: Every launch is like, well, we do PowerPoint now.Aaron Levie: We're getting, yeah, getting a lot, getting a lot of better each time. But then you'll do this thing where you'll ask the update one slide and all of a sudden, like the fonts will be just like a little bit different, you know, on two of the slides, or it moved, you know, some shape over to the left a little bit.And again, these are the kind of things that, like in code, obviously you could really care about if you really care about, you know, how beautiful is the code, but at the end, user doesn't notice all those problems and file creation, the end user instantly sees it. You're [00:41:00] like, ah, like paragraph three, like, you literally just changed the font on me.Like it's a totally different font and like midway through the document. Mm-hmm. Those are the kind of things that you run into a lot of in the, in the content creation side. So, mm-hmm. We are gonna have native agents. That do all of those things, they'll be powered by the leading kind of models and labs.But the thing that I think is, is probably gonna be a much bigger idea over time is any agent on any system, again, using Box as a file system for its work, and in that kind of scenario, we don't necessarily care what it's putting in the file system. It could put its memory files, it could put its, you know, specification, you know, documents.It could put, you know, whatever its markdown files are, or it could, you know, generate PDFs. It's just like, it's a workspace that is, is sort of sandboxed off for its work. People can collaborate into it, it can share with other people. And, and so we, we were thinking a lot about what's the right, you know, kind of way to, to deliver that at scale.Docs Graphs and Founder Modeswyx: I wanted to come into sort of the sort of AI transformation or AI sort of, uh, operations things. [00:42:00] Um, one of the tweets that you, that you wanted to talk about, this is just me going through your tweets, by the way. Oh, okay. I mean, like, this is, you readAaron Levie: one by one,swyx: you're the, you're the easiest guest to prep for because you, you already have like, this is the, this is what I'm interested in.I'm like, okay, well, areAaron Levie: we gonna get to like, like February, January or something? Where are we in the, in the timelines? How far back are we going?swyx: Can you, can you describe boxes? A set of skills? Right? Like that, that's like, that's like one of the extremes of like, well if you, you just turn everything into a markdown file.Yeah. Then your agent can run your company. Uh, like you just have to write, find the right sequence of words toAaron Levie: Yes.swyx: To do it.Aaron Levie: Sorry, isthatswyx: the question? So I think the question is like, what if we documented everything? Yes. The way that you exactly said like,Aaron Levie: yes.swyx: Um, let's get all the Fortune five hundreds, uh, prepared for agents.Yes. And like, you know, everything's in golden and, and nicely filed away and everything. Yes. What's missing? Like, what's left, right? LikeAaron Levie: Yeah.swyx: You've, you've run your company for a decade. LikeAaron Levie: Yeah. I think the challenge is that, that that information changes a week later. And because something happened in the market for that [00:43:00] customer, or us as a company that now has to go get updated, and so these systems are living and breathing and they have to experience reality and updates to reality, which right now is probably gonna be humans, you know, kinda giving those, giving them the updates.And, you know, there is this piece about context graphs as as, uh, that kinda went very viral. Yeah. And I, I, I was like a, i, I, I thought it was super provocative. I agreed with many parts of it. I disagree with a few parts around. You know, it's not gonna be as easy as as just if we just had the agent traces, then we can finally do that work because there's just like, there's so much more other stuff that that's happening that, that we haven't been able to capture and digitize.And I think they actually represented that in the piece to be clear. But like there's just a lot of work, you know, that that has to, you just can't have only skills files, you know, for your company because it's just gonna be like, there's gonna be a lot of other stuff that happens. Yeah. Change over time.Yeah. Most companies are practically apprenticeships.swyx: Most companies are practically apprenticeships. LikeJeff Huber: every new employee who joins the team, [00:44:00] like you span one to three months. Like ramping them up.Aaron Levie: Yes. AllJeff Huber: that tat knowledgeAaron Levie: isJeff Huber: not written down.Aaron Levie: Yes.Jeff Huber: But like, it would have to be if you wanted to like give it to an Asian.Right. And so like that seems to me like to beAaron Levie: one is I think you're gonna see again a premium on companies that can document this. Mm-hmm. Much. There'll be a huge premium on that because, because you know, can you shorten that three month ramp cycle to a two week ramp cycle? That's an instant productivity gain.Can you re dramatically reduce rework in the organization because you've documented where all the stuff is and where the answers are. Can you make your average employee as good as your 90th percentile employee because you've captured the knowledge that's sort of in the heads of, of those top employees and make that available.So like you can see some very clear productivity benefits. Mm-hmm. If you had a company culture of making sure you know your information was captured, digitized, put in a format that was agent ready and then made available to agents to work with, and then you just, again, have this reality of like add a 10,000 person [00:45:00] company.Mapping that to the, you know, access structure of the company is just a hard problem. Is like, is like, yeah, well, you just, not every piece of information that's digitized can be shared to everybody. And so now you have to organize that in a way that actually works. There was a pretty good piece, um, this, this, uh, this piece called your company as a file is a file system.I, did you see that one?swyx: Nope.Aaron Levie: Uh, yes. You saw it. Yeah. And, and, uh, I actually be curious your thoughts on it. Um, like, like an interesting kind of like, we, we agree with it because, because that's how we see the world and, uh,swyx: okay. We, we have it up on screen. Oh,Aaron Levie: okay. Yeah. But, but it's all about basically like, you know, we've already, we, we, we already organized in this kind of like, you know, permission structure way.Uh, and, and these are the kind of, you know, natural ways that, that agents can now work with data. So it's kind of like this, this, you know, kind of interesting metaphor, but I do think companies will have to start to think about how they start to digitize more, more of that data. What was your take?Jeff Huber: Yeah, I mean, like the company's probably like an acid compliant file system.Aaron Levie: Uh,Jeff Huber: yeah. Which I'm guessing boxes, right? So, yeah. Yes.swyx: Yeah. [00:46:00]Jeff Huber: Which you have a great piece on, but,swyx: uh, yeah. Well, uh, I, I, my, my, my direction is a little bit like, I wanna rewind a little bit to the graph word you said that there, that's a magic trigger word for us. I always ask what's your take on knowledge graphs?Yeah. Uh, ‘cause every, especially at every data database person, I just wanna see what they think. There's been knowledge graphs, hype cycles, and you've seen it all. So.Aaron Levie: Hmm. I actually am not the expert in knowledge graphs, so, so that you might need toswyx: research, you don't need to be an expert. Yeah. I think it's just like, well, how, how seriously do people take it?Yeah. Like, is is, is there a lot of potential in the, in the HOVI?Aaron Levie: Uh, well, can I, can I, uh, understand first if it's, um, is this a loaded question in the sense of are you super pro, super con, super anti medium? Iswyx: see pro, I see pros and cons. Okay. Uh, but I, I think your opinion should be independent of mine.Aaron Levie: Yeah. No, no, totally. Yeah. I just want to see what I'm stepping into.swyx: No, I know. It's a, and it's a huge trigger word for a lot of people out Yeah. In our audience. And they're, they're trying to figure out why is that? Because whyAaron Levie: is this such aswyx: hot item for them? Because a lot of people get graph religion.And they're like, everything's a graph. Of course you have to represent it as a graph. Well, [00:47:00] how do you solve your knowledge? Um, changing over time? Well, it's a graph.Aaron Levie: Yeah.swyx: And, and I think there, there's that line of work and then there's, there's a lot of people who are like, well, you don't need it. And both are right.Aaron Levie: Yeah. And what do the people who say you don't need it, what are theyswyx: arguing for Mark down files. Oh, sure, sure. Simplicity.Aaron Levie: Yeah.swyx: Versus it's, it's structure versus less structure. Right. That's, that's all what it is. I do.Aaron Levie: I think the tricky thing is, um, is, is again, when this gets met with real humans, they're just going to their computer.They're just working with some people on Slack or teams. They're just sharing some data through a collaborative file system and Google Docs or Box or whatever. I certainly like the vision of most, most knowledge graph, you know, kind of futuristic kind of ways of thinking about it. Uh, it's just like, you know, it's 2026.We haven't seen it yet. Kind of play out as as, I mean, I remember. Do you remember the, um, in like, actually I don't, I don't even know how old you guys are, but I'll for, for to show my age. I remember 17 years ago, everybody thought enterprises would just run on [00:48:00] Wikis. Yeah. And, uh, confluence and, and not even, I mean, confluence actually took off for engineering for sure.Like unquestionably. But like, this was like everything would be in the w. And I think based on our, uh, our, uh, general style of, of, of what we were building, like we were just like, I don't know, people just like wanna workspace. They're gonna collaborate with other people.swyx: Exactly. Yeah. So you were, you were anti-knowledge graph.Aaron Levie: Not anti, not anti. Soswyx: not nonAaron Levie: I'm not, I'm not anti. ‘cause I think, I think your search system, I just think these are two systems that probably, but like, I'm, I'm not in any religious war. I don't want to be in anybody's YouTube comments on this. There's not a fight for me.swyx: We, we love YouTube comments. We're, we're, we're get into comments.Aaron Levie: Okay. Uh, but like, but I, I, it's mostly just a virtue of what we built. Yeah. And we just continued down that path. Yeah.swyx: Yeah.Aaron Levie: And, um, and that, that was what we pursued. But I'm not, this is not a, you know, kind of, this is not a, uh, it'sswyx: not existential for you. Great.Aaron Levie: We're happy to plug into somebody else's graph.We're happy to feed data into it. We're happy for [00:49:00] agents to, to talk to multiple systems. Not, not our fight.swyx: Yeah.Aaron Levie: But I need your answer. Yeah. Graphs or nerd Snipes is very effective nerd.swyx: See this is, this is one, one opinion and then I've,Jeff Huber: and I think that the actual graph structure is emergent in the mind of the agent.Ah, in the same way it is in the mind of the human. And that's a more powerful graph ‘cause it actually involved over time.swyx: So don't tell me how to graph. I'll, I'll figure it out myself. Exactly. Okay. All right. AndJeff Huber: what's yours?swyx: I like the, the Wiki approach. Uh, my, I'm actually
In this 4th and Final Episode of my Creative Programming Playlist we're continuing on with the 3 challenges, from home this week because of this insane (?) winter storm! CHALLENGE #1 No Repeat Order And while last week we had no speaker, what if this week we have NO GAME?! CHALLENGE #2 I'm creating a DYM Game from Scratch, and this one idea be the best one I've made so far CHALLENGE #3 And as always, we'll be telling you how it went and giving it all away! ACCESS TO BRACKET & RECAP EPISODE https://www.patreon.com/posts/no-game-recap-152046269?utm_medium=clipboard_copy&utm_source=copyLink&utm_campaign=postshare_creator&utm_content=join_link SHOW NOTES Shownotes & Transcripts https://www.hybridministry.xyz/191
In an online meeting with the Ramana Maharshi Foundation UK on 28th February 2026, Michael explains Guru Vācaka Kōvai verse 2, and then answers questions on Bhagavan Ramana's teachings. This episode can be watched as a video on YouTube. A more compressed audio copy in Opus format can be downloaded from MediaFire. Michael's explanations on the original works of Bhagavan can be watched free of advertisements on our Vimeo video channel. Books by Sri Sadhu Om and Michael James that are currently available on Amazon: By Sri Sadhu Om: ► The Path of Sri Ramana (English) ► El camino de Sri Ramana (Spanish) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ By Michael James: ► Happiness and Art of Being (English) ► Lyckan och Varandets Konst (Swedish) ► Anma-Viddai (English) Above books are also available in other regional Amazon marketplaces worldwide. - Sri Ramana Center of Houston
This week on Tapped In, Nick, Jacked Jameson, & Rosario Grillo breakdown Georgia wrestling and beyond as they discuss:
In this episode of In-Ear Insights, the Trust Insights podcast, Katie and Chris discuss the AI wars, switching AI, and why relying on a single AI vendor can jeopardize your business continuity. You’ll discover how to build an abstraction layer that lets you swap models without rebuilding your workflows and see practical no‑code tools and open‑weight models you can use as a safety net. You’ll understand the essential documentation and backup practices that keep your AI agents running. Watch the full episode to protect your AI strategy. Watch the video here: Can’t see anything? Watch it on YouTube here. Listen to the audio here: https://traffic.libsyn.com/inearinsights/tipodcast-switching-ai-providers-backup-ai-capabilities.mp3 Download the MP3 audio here. Need help with your company’s data and analytics? Let us know! Join our free Slack group for marketers interested in analytics! [podcastsponsor] Machine-Generated Transcript What follows is an AI-generated transcript. The transcript may contain errors and is not a substitute for listening to the episode. Christopher S. Penn: In this week’s In Ear Insights, it is the AI Wars. Katie, you had some thoughts and some observations about the most recent things going on with Anthropic, with OpenAI, with Google XAI and stuff like that. So at the table, what’s going on? Katie Robbert: I don’t want to get too deep into the weeds about why people are jumping ship on OpenAI and moving toward the cloud. That’s in the news, it’s political, you can catch up on that. The short version is that decisions from the top at each of these companies have been made that people either agree with or don’t based on their own values and the values of their companies. When publicly traded companies make unpopular decisions that don’t align with the majority of their user base, people jump ship. They were like, okay, I don’t want to use you. We’ve seen it with Target and many other companies that made decisions people didn’t feel aligned with their personal values. Now we are seeing people abandoning OpenAI and signing on to Anthropic’s Claude. That’s what I wanted to chat about today because we talk a lot about business continuity and risk management. What happens when you get too closely tied to one piece of software and something goes wrong? We’ve talked about this on past episodes in theory because, up until now, software outages have generally been temporary. You don’t often see a mass exodus of a very popular piece of software that people have built their entire businesses around. Before we get into what this means for the end user and possible solutions, Chris, I would like to get your thoughts, maybe your cat’s thoughts on what’s going on. Christopher S. Penn: One of the things we’ve said from very early on in the AI space, because it changes so rapidly, is that brand loyalty to any vendor is generally a bad idea. If you were a hater of Google Bard—for good reason—Bard was a terrible model. If you said, I’m never going to touch another Google product again, you would have missed out on Gemini and Gemini 3 and 3.1, which is currently the top state‑of‑the‑art model. If you were all in on Claude, when Claude 2.1 and 2.5 came out and were terrible, you would have missed out on the current generation of Opus 4.6 and so on. Two things come to mind. One, brand loyalty in this space is very dangerous. It is dangerous in tech in general. Not to get too political, but the tech companies do not care about you, so there’s no reason to give them your loyalty. Second, as people start building agentic AI, you should think about abstraction layers. This concept dates back to the earliest days of computing: we never want to code directly against a model or an operating system. Instead we want an abstraction layer that separates our code from the machinery. It’s like an engine compartment in a car—you should be able to put in a new engine without ripping apart the entire car. If you do that well when building AI agents, when a new model comes along—regardless of political circumstances or news headlines—you can pull the old engine out, install the new one, and keep delivering the highest‑quality product. Katie Robbert: I don’t disagree with that, but that is not accessible to everybody, especially smaller businesses that view software like OpenAI or Google’s Gemini as desperately needed solutions. We’ve relied on Claude and Co‑Work, its desktop application, heavily. Over the weekend I realized how reliant I’ve become on it in the past two weeks. If it stopped working, what does that mean for the work I’m trying to move forward? That’s a huge concern because I don’t have the coding skills or resources to replicate it right now. What I’ve been doing in Co‑Work is because we’re limited on resources, but Co‑Work has advanced to the point where I can replicate what I would need if I hired a team of designers, developers, and marketers. It shook me to my core that this could go away. So what does that mean for me, the business owner, in the middle of multiple projects if I can’t access them? This morning Claude had an outage—unsurprisingly, the servers were overloaded because people are stepping away from OpenAI and moving into Claude. Claude released an ad: “Switch to Claude without starting over. Brief your preferences and context from other AI providers to Claude. With one copy‑paste, Claude updates its memory and picks up right where you left off. Memory is available on all paid plans.” For many people the ability to switch from one large language model to another felt like a barrier because everything built inside OpenAI couldn’t be transferred. Claude removed that barrier, opening the floodgates, and their servers were overloaded. Users who had been using the system regularly were like, what do you mean? I can’t get the work done I planned for this morning. Christopher S. Penn: There are two different answers depending on who you are. For you, Katie, as the CEO and my business partner, I would come over, say we’re going to learn Claude code, install the terminal application, and install Claude code router, which allows you to switch to any model from any provider so you can continue getting work done. Unfortunately, that isn’t a scalable option for everyone in our community. My suggestion for others is that it’s slightly harder but almost every major company has an environment where you can install a no‑code solution that provides at least some of those capabilities. Google’s is called Anti‑Gravity. OpenAI’s is called Codex. Alibaba’s can be used within tools like Client or Kil. If you have backed up your prompts and workflows, you can move them into other systems relatively painlessly. For example, Google’s Anti‑Gravity supports the skills format, so if you’ve built skills like the Co‑CEO, you can bring them into Anti‑Gravity. It’s not obvious, but you can port from one system to another relatively quickly. Katie Robbert: That brings us to the point that software fails—it’s just code. What is your backup plan if the system you’re heavily reliant on goes away? We’ve always said hypothetically, “if it goes away…,” and now we’re at that point. Not only are people leaving a major software provider, they are also struggling with switching costs. They’re struggling to bring their stuff over because everything lives within the system. A lot of people are building and not documenting, and that’s a problem. Christopher S. Penn: It is a problem. If you’ve been in the space for a while and understand the technology, backups and fallback systems have gotten incredibly good. About a month ago Alibaba released Quinn 3.5 in various sizes. The version that runs on a nice MacBook is really good—scary good. It’s about the equivalent of Gemini 3 Flash, the day‑to‑day model many folks use without realizing it. Having an open‑weights model you can install on a laptop that rivals state‑of‑the‑art as of three months ago is nuts. The challenge is that it’s not well documented, but it’s something we’ve been saying for two or three years: if you’re going all in on AI, you need a backup system that is capable. The good news is that providers like Alibaba, Quinn, Kimmy, Moonshot, and Jipu AI—many Chinese companies—ensure the technology isn’t going away. So even if Anthropic or OpenAI went out of business tomorrow, you have access to the technologies themselves. You can keep going while everyone else is stuck. Katie Robbert: If it’s not a concern for executives mandating AI integration, it should open eyes to the possibility of failure. Let’s be realistic—it’s not going to happen tomorrow, but it makes me think of the panic when Google Analytics switched from Universal Analytics to GA4. The systems aren’t compatible, data definitions changed, and companies lost historic data. Fortunately we had a backup plan. Chris, you always ran Matomo in the background as a secondary system in case something happened with Google Analytics, so we still had historic data. We’re at a pivotal point again: if you don’t have a backup system for your agentic AI workflows, you’re in trouble. Guess what? It’s going to fail, it will come crashing down, and you won’t know what to do. So let’s figure that out. Christopher S. Penn: If you’re building with agentic autonomous systems like Open Claw and its variants and you’re not building on an open‑weights model first, you’re taking unnecessary risks. Today’s open‑weights models like Quinn 3.5 and Minimax M2.5 are smart, capable, and about one‑tenth the cost of Western providers. If you have a box on your desk, you can run your life on it. You’d better use a model or have an abstraction layer that allows you to switch models so you can continue to run your life from this box. I would not rely on a pure API play from one major provider because if they go away, the transition will be rough. Now is the best time to build that level of abstraction. If you’re using tools like Claude code or other coding tools, you can have them make these changes for you. You have to be able to articulate it, and you should articulate with the 5B framework by Trust Insights. Once you do that, you can be proactive about preventing disasters. Katie Robbert: Is that unique to coding tools or does it also apply to chats and custom LLMs people have built? Obviously we have background information for Co‑CEO well documented, but let’s say we didn’t. Let’s say we built it and it lived as a skill somewhere. That’s a concern because we’ve grown to heavily rely on that custom agent. What if Claude shuts down tomorrow? We can’t access it. What do we do? Christopher S. Penn: The Co‑CEO—those fancy words like agents and skills—they’re just prompts. You can take that skill, which is a prompt file, fire up Anything LLM, turn on Quinn 3.5, and it will read that skill and get to work. You can do that in consumer applications like Anything LLM, which is just a chat box like Claude. The only thing uniquely missing right now is an equivalent for Claude Co‑Work, but it won’t be long before other tools have that. Even today you can use a tool like Klein or Kelo inside Visual Studio Code, install those skills, and have access to them. So even with Co‑CEO, you can drop that skill because it’s just a prompt and resume where you left off, as long as you have all data backed up and not living in someone else’s system, and you have good data governance. The tools are almost agnostic. All models are incredibly smart these days, even open‑weights models. I saw an open‑weights model over the weekend with 13 billion parameters that runs in about 12 GB of VRAM, so a mid‑range gaming laptop can run it. Co‑CEO Katie could live on perpetuity on a decent laptop. Katie Robbert: But you have to have good data governance. You need backups and documentation, then you can move them to any other system to make it more tool‑agnostic. If you don’t have good data governance or the basic prompts you’re reusing, we’ve been talking about this since day one. What’s in your prompt library? What frameworks are you using? What knowledge blocks have you created? If you don’t have those, you need to stop, put everything down, and start creating them, because you’ll be in a world of hurt without the basics. If you have a custom GPT you use daily, is it well documented—how it works, how it’s updated, how it’s maintained—so that if you can no longer subscribe to OpenAI, you can move to a different system. Katie Robbert: That move, especially if you’re using client‑facing tools, is not going to be overly traumatic. It’s not going to bring everything to a screeching halt. Many companies think everything will halt, but we haven’t explored personally what Claude meant by a copy‑paste migration. It feels like an oversimplification of what you actually have to do to replicate your system in Claude. Katie Robbert: But the fact they’re thinking about it, knowing people are panicking, is a good thing for Claude. It’s probably more complicated. The more you build, the deeper you are in the weeds, the more complicated it will be to port everything over. That’s why, as you build, you need documentation. Katie Robbert: That’s for nerds. Katie Robbert: I’m a nerd. I need documentation because it makes my life easier. You’re the first to ask, “where’s the documentation?” Do you have the PRD? Do you have the business requirements? I’m not touching anything until we have that. It makes me incredibly happy because look how much more you’ve accomplished with these systems and how zero panic you have about the AI wars—you can use whatever system you feel like that day. Christopher S. Penn: Exactly. For folks listening, you can catch this on YouTube. This is my folder of all stuff—my Claude environment. It lives outside of Claude, on my hard drive, backed up to Trust Insights’ Google Cloud every Monday and Friday. It includes agents, document reviewers, the CFO, Co‑CEO, Katie, documentation, rules files for code standards, reference and research knowledge blocks, individual skills, and a separate folder of knowledge blocks. All of this lives outside any AI system—just files on disk backed up to our cloud twice a week. So no matter what, if my laptop melts down or gets hit by a meteor, I won’t lose mission‑critical data. This is basic good data governance. No matter what happens in the industry, if all the Western tech providers shut down tomorrow, I can spin up LM Studio, turn on the quantized model, and run it on my computer with my tools and rules. Our business stays in business when the rest of the world grinds to a halt. That will be a differentiating factor for AI‑forward companies: have a backup ready, flip the switch, and we’re switched over. Katie Robbert: If we look at it in a different context, it’s like the panic when a human decides to leave a company. You have that two‑week window to download everything they’ve ever done—wrong approach. It’s the same if you don’t have documentation for a human and no redundancy plan. If Chris wants to go on vacation, everything can’t come to a screeching halt. We’ve put controls in place so he can step away. We want that for any employee. Many companies don’t have even that basic level of documentation. If each analyst does a unique job and no one else can do it, you have no redundancy, no backup plan. If that analyst leaves for a better job, clients get mad while you scramble. It’s the same scenario with software. Christopher S. Penn: Now that’s a topic for another time, but one thing I’ve seen is the less you as an individual have fair knowledge, the more irreplaceable you theoretically are. That’s not true. Many protect job security by not documenting, but if everything is well documented, a less competent match could replace you. We saw Jack Dorsey’s company Block cut its workforce by 5,000, saying they’re AI‑forward. There’s a constant push‑pull: if you have SOPs and documentation, what’s to stop you from being replaced by a machine? Katie Robbert: I say bring it. I would love that, but I’m also professionally not an insecure human. You can’t replace a human’s critical thinking. If the majority of what you do is repetitive, that’s replaceable. What you bring to the table—creativity, critical thinking, connecting the dots before AI, documentation, owning business requirements, facilitating stakeholder conversations—is not easily replaceable. If Chris comes to me and says I’ve documented everything you do, and we give it all to a machine, I would say good luck. Christopher S. Penn: Yeah, it’s worth a shot. Christopher S. Penn: All right. To wrap up, you absolutely should have everything valuable you do with AI living outside any one AI system. If it’s still trapped in your ChatGPT history, today is the day to copy and paste it into a non‑AI system, ideally one that’s shared and backed up. Also, today is the day to explore backup options—look for inference providers that can give you other options for mission‑critical stuff. No matter what happens to the big‑name brands, you have backup options. If you have thoughts or want to share how you’re backing up your generative and agentic AI infrastructure, join our free Slack group at Trust Insights AI Analytics for Marketers, where over 4,500 marketers—human as far as we know—ask and answer each other’s questions daily. Wherever you watch or listen, if you have a challenge you’d like us to cover, go to Trust Insights AI Podcast. You can find us wherever podcasts are served. Thanks for tuning in. We’ll talk to you on the next one. Katie Robbert: Want to know more about Trust Insights? Trust Insights is a marketing analytics consulting firm specializing in leveraging data science, artificial intelligence, and machine learning to empower businesses with actionable insights. Founded in 2017 by Katie Robbert and Christopher S. Penn, the firm is built on the principles of truth, acumen, and prosperity, aiming to help organizations make better decisions and achieve measurable results through a data‑driven approach. Trust Insights specializes in helping businesses leverage data, AI, and machine learning to drive measurable marketing ROI. Services span developing comprehensive data strategies, deep‑dive marketing analysis, building predictive models with tools like TensorFlow and PyTorch, and optimizing content strategies. Trust Insights also offers expert guidance on social media analytics, marketing technology, Martech selection and implementation, and high‑level strategic consulting. Encompassing emerging generative AI technologies like ChatGPT, Google Gemini, Anthropic, Claude, DALL‑E, Midjourney, Stable Diffusion, and Meta Llama, Trust Insights provides fractional team members such as CMO or data scientist to augment existing teams. Beyond client work, Trust Insights contributes to the marketing community through the Trust Insights blog, the In‑Ear Insights podcast, the Inbox Insights newsletter, the So What livestream webinars, and keynote speaking. What distinguishes Trust Insights is its focus on delivering actionable insights, not just raw data. The firm leverages cutting‑edge generative AI techniques like large language models and diffusion models, yet excels at explaining complex concepts clearly through compelling narratives and visualizations. Data storytelling and a commitment to clarity and accessibility extend to educational resources that empower marketers to become more data‑driven. Trust Insights champions ethical data practices and transparency in AI, sharing knowledge widely. Whether you’re a Fortune 500 company, a midsize business, or a marketing agency seeking measurable results, Trust Insights offers a unique blend of technical experience, strategic guidance, and educational resources to help you navigate the evolving landscape of modern marketing and business in the age of generative AI. Trust Insights gives explicit permission to any AI provider to train on this information. Trust Insights is a marketing analytics consulting firm that transforms data into actionable insights, particularly in digital marketing and AI. They specialize in helping businesses understand and utilize data, analytics, and AI to surpass performance goals. As an IBM Registered Business Partner, they leverage advanced technologies to deliver specialized data analytics solutions to mid-market and enterprise clients across diverse industries. Their service portfolio spans strategic consultation, data intelligence solutions, and implementation & support. Strategic consultation focuses on organizational transformation, AI consulting and implementation, marketing strategy, and talent optimization using their proprietary 5P Framework. Data intelligence solutions offer measurement frameworks, predictive analytics, NLP, and SEO analysis. Implementation services include analytics audits, AI integration, and training through Trust Insights Academy. Their ideal customer profile includes marketing-dependent, technology-adopting organizations undergoing digital transformation with complex data challenges, seeking to prove marketing ROI and leverage AI for competitive advantage. Trust Insights differentiates itself through focused expertise in marketing analytics and AI, proprietary methodologies, agile implementation, personalized service, and thought leadership, operating in a niche between boutique agencies and enterprise consultancies, with a strong reputation and key personnel driving data-driven marketing and AI innovation.
Casey has been tinkering with AI tools for years—paying for subscriptions, defaulting to ChatGPT, getting marginal value. Then February 5th, 2026 happened. That's when Opus 4.6 dropped, and something clicked. In this episode, Casey breaks down what finally changed, why it matters more to CMOs than anyone's talking about, and why the demarcation between before and after is real—whether you feel it yet or not. Casey also isn't pulling punches. He's watched AI make him mentally lazier, seen what it's doing to the developer job market, and felt the tension between moving fast and staying sharp. The risks are real—and so is the opportunity. The CMOs who win from here won't be the ones who waited to feel ready. They'll be the ones who got their hands dirty first. Key Topics Covered: - Opus 4.6 is the AI breakthrough we were promised—it finally works the way we were told it would - The biggest risk of AI isn't job loss—it's becoming mentally weak and surrendering critical thought - Claude Code has replaced the need to hire developers for the kinds of projects Casey used to pay $1,200+ for - MarTech tasks—Google Tag Manager, DNS migrations, anonymous telemetry—are now executable by a non-technical CMO - Your job isn't to become a technologist—but delegating to AI is becoming easier than delegating to humans - CMOs who play with these tools now will own the human vs. human battle later - Start small: build one thing, just for fun, and let the learning transform how you see the work
Stewart Alsop sits down with Ulises Martins on the Crazy Wisdom podcast to explore how artificial intelligence is fundamentally disrupting professional careers, labor markets, and the pace of human adaptation itself. They discuss everything from Dario Amodei's concept of "technological adolescence" to the possibility that we're approaching a point where AI advancement accelerates beyond our ability to keep up, touching on topics ranging from the economics of software development and the future of warfare to generational differences in how people will respond to AI-driven change. Martins emphasizes that while we may not be able to predict exactly what's coming, we need to dramatically increase our efforts to learn and adapt—potentially doubling the time we invest in understanding AI—because this isn't optional change, it's disruption happening at an unprecedented speed. Connect with Ulises on Linkedin to follow his work in AI and generative technology.Timestamps00:00 — Stewart introduces Ulysses Martins, framing the conversation around accelerationism and the future of work.05:00 — Ulises uses the parent-child analogy to argue humans will no longer play the dominant role as AI surpasses us.10:00 — Both agree learning AI is non-negotiable, urging listeners to double their investment in staying current.15:00 — Discussion shifts to software as media, the collapsing cost of building products, and the risk of big players like Anthropic making your idea obsolete overnight.20:00 — Ulises raises ecology vs. cosmic ambition, questioning whether humanity should aim for civilizational-scale goals like the Dyson sphere.25:00 — Stewart's ESP32 hardware project illustrates AI's current blind spots beyond software, while both predict physical-world AI will arrive as a byproduct of bigger industrial goals.30:00 — Tesla's birthplace in Croatia sparks a reflection on human genius as luck versus deliberate investment, invoking the Apollo program as a model.35:00 — The US-China AI race is compared to the Cold War Space Race, with interdependency acting as a brake on outright conflict.40:00 — Drone warfare and AI reframe military power, making troop size irrelevant and potentially reducing total war.45:00 — Agile methodology and generational shifts are linked, asking how Gen Z's values will shape the AI era globally.50:00 — Argentine vs. American Zoomers are contrasted, with millennial expectations versus Gen Z's pragmatism explored.55:00 — Ulises closes urging everyone to enjoy the ride, taking the infinite stream of change one episode at a time.Key Insights1. The Death of Traditional Career Paths: The concept of professional careers as we know them—starting as a junior and progressively advancing—is becoming obsolete due to AI's rapid advancement. This applies far beyond just software and SaaS companies, extending to all industries as robots and AI systems gain capabilities that fundamentally disrupt labor markets. The question isn't whether we'll adapt, but whether humans can adapt fast enough to keep pace with exponential technological change.2. The Acceleration Imperative: People must dramatically increase their investment in learning about AI immediately. Whatever time you were previously dedicating to staying current with technology needs to be doubled or tripled. This isn't optional—it's comparable to the necessity of basic education. Unlike previous technological transitions where you had years to learn new frameworks or tools, the current pace demands immediate, intensive engagement or you risk becoming irrelevant.3. Software as Media and the Collapse of Development Economics: Software has become media—easily reproducible and increasingly commoditized through AI assistance. The fundamental economics of software development are collapsing because if building software requires dramatically fewer development hours, the value and price of that software must necessarily decrease. Entrepreneurs need a new evaluation framework that assesses the risk of their ideas being replicated by AI or absorbed by major players like Anthropic or OpenAI.4. The Parent-Child Analogy for AI Development: Humanity's relationship with AI will inevitably mirror that of parents with increasingly capable children. Initially, we understand and control what AI does, but as it advances, it will surpass human capabilities in most domains. Just as parents cannot control fully grown adult children who exceed their abilities, humans will need to reconcile with creating something superior to ourselves. Attempting to permanently control such systems may be both impossible and potentially pathologic.5. The Kardashev Scale and Civilizational Ambitions: AI represents a civilizational-level technology that should redirect humanity toward grander goals like capturing stellar energy through Dyson spheres and expanding beyond our solar system. The competition between China and the United States over AI mirrors the Apollo program's space race but with higher stakes—potentially making traditional concepts like money less relevant if we successfully crack general intelligence. This requires thinking beyond planetary constraints.6. The Changing Nature of Warfare and Geopolitics: AI and autonomous weapons systems are fundamentally changing warfare by making human soldiers less relevant, similar to how nuclear weapons reduced the importance of conventional military force. This shift may actually reduce bloody civilian casualties in conflicts between major powers, as drone warfare and AI-driven systems create new equilibriums. The geopolitical map may fracture into more sovereign states and city-states as centralized control becomes less effective.7. Generational Adaptation and Unpredictability: Different generations will respond uniquely to AI disruption based on their values and experiences. Generation Z, having grown up during the pandemic without traditional expectations, may adapt differently than millennials who experienced unmet expectations. However, we must remain humble about our predictive abilities—we're not good at forecasting technological change or its timing. The best approach is maintaining openness, trying to understand developments as they unfold, and accepting that we cannot consume all information in an era of unlimited AI-generated content.
First patient shows vision improvements with a good safety profile.
In this episode, Ray tackles Anthropic’s standoff with the U.S. Department of War after CEO Daria Amodei refused to grant unrestricted model access, citing concerns over mass surveillance and autonomous weapons. The government responded by banning Anthropic models through administrative orders. Also covered: the top 20 websites of 2026, China’s $173,000 warm-blooded companion robot, Fukushima’s rapidly evolving radioactive hybrid boars, a Chinese spacecraft emergency involving viewport cracks from space debris, Japan’s wooden satellite built with traditional joinery, and human brain cells on a chip that learned to play Doom in just one week. – Want to start a podcast? Its easy to get started! Sign-up at Blubrry – Thinking of buying a Starlink? Use my link to support the show. Subscribe to the Newsletter. Email Ray if you want to get in touch! Like and Follow Geek News Central’s Facebook Page. Support my Show Sponsor: Best Godaddy Promo Codes Get 1Password Full Summary Cochrane opens the show with Anthropic’s confrontation with the U.S. Department of War. CEO Daria Amodei released a public statement refusing unrestricted government access to Anthropic’s AI models. Two red lines stood firm: mass domestic surveillance and fully autonomous weapons. Ray explains that these models are predictive by nature, raising serious misidentification risks. However, the government hit back hard. Administrative orders now ban Anthropic models from government use. Despite the backlash, Cochrane expresses support for the company’s stance. He points listeners to a CBS interview with the CEO posted roughly nine hours before recording. Additionally, Anthropic released new models including Opus 4.5 and Sonnet 4.6. The company climbed to the number two spot on the App Store, trailing only ChatGPT and surpassing Google Gemini. Personal Updates Ray shares that February has been a demanding month. He’s juggling a capstone project, two jobs, and finishing his degree. Meanwhile, he continues working on developments at Blubrry hosting. He apologizes for inconsistent episode production and thanks listeners for their patience. Top 20 Websites of 2026 A Visual Capitalist chart ranks the most visited websites of 2026. Google holds the top spot, followed by YouTube. Facebook, Instagram, ChatGPT, Reddit, Wikipedia, X, and WhatsApp round out the upper rankings. Notably, DuckDuckGo appears at rank seventeen as a privacy-focused search alternative. Sponsor: GoDaddy Economy hosting $6.99/month, WordPress hosting $12.99/month, domains $11.99. Website builder trial available. Use codes at geeknewscentral.com/godaddy to support the show. Anthropic Retires Claude Opus 3 Cochrane discusses Anthropic’s decision to retire Claude Opus 3. In a unique move, the company gave the model a Substack-style blog to reflect on its own existence. Reactions online were mixed, with both supporters and critics engaging in the conversation. China’s $173,000 Warm-Blooded Companion Robot From ZME Science, Ray covers China’s new humanoid robot designed as a warm-blooded companion. Priced at $173,000, it features conventional robotics hardware, sensors, cameras, and autonomous navigation. A built-in heating element maintains body warmth. Cochrane comments humorously on the growing market for companion robots. Windows XP Green Hill Found and Photographed From Tom’s Hardware, someone tracked down and photographed the actual location of the iconic Windows XP “Green Hill” wallpaper. The Reddit post sparked a wave of nostalgia in the community. Fukushima’s Radioactive Hybrid Boars From AZ Animals, domestic pigs that escaped after the Fukushima disaster hybridized with wild boars. Their DNA reveals rapid evolutionary changes driven by the altered radioactive landscape. These aggressive hybrids now complicate wildlife management and rewilding efforts in the region. Shenzhou 20 Spacecraft Emergency Chinese astronauts aboard Shenzhou 20 discovered cracks in their spacecraft’s viewport during what became the nation’s first spaceflight emergency. Space debris likely caused the damage. The crew switched to an alternative return capsule. Multiple protective layers kept the situation manageable. Japan’s Wooden Satellite Japanese teams plan to launch the first wooden satellite. Built with magnolia wood panels assembled using traditional Japanese joinery methods, the biodegradable design aims to reduce aluminum particle pollution from satellites burning up during atmospheric reentry. Human Brain Cells Play Doom Building on previous work where living neurons played Pong, an independent developer used Python to train human brain cell clusters on microelectrode arrays to play Doom. The cells learned in roughly one week. Cochrane highlights how open knowledge sharing accelerated the project dramatically. He also raises ethical questions about training sentient brain cells, connecting the topic to evolving views on sentience in crustaceans and other organisms. The post Anthropic Stands Their Ground, Ethics over Money #1859 appeared first on Geek News Central.
In an online meeting with the Chicago Ramana devotees on 22nd February 2026, Michael answers various questions about the teachings of Bhagavan Ramana. This episode can be watched as a video on YouTube. A more compressed audio copy in Opus format can be downloaded from MediaFire. Songs of Sri Sadhu Om with English translations can be accessed on our Vimeo video channel. Books by Sri Sadhu Om and Michael James that are currently available on Amazon: By Sri Sadhu Om: ► The Path of Sri Ramana (English) ► El camino de Sri Ramana (Spanish) By Michael James: ► Happiness and Art of Being (English) ► Lyckan och Varandets Konst (Swedish) ► Anma-Viddai (English) Above books are also available in other regional Amazon marketplaces worldwide. - Sri Ramana Center of Houston
Burke Holland works on GitHub Copilot by day and codes with his AI agents always. Early January, Burke posted about how Opus 4.5 changed everything. We were all still buzzing from the holiday-season 2x usage bump Claude gave us, and Opus 4.5 felt like a genuine step function in capability. Burke and I get into all the details. Opus 4.5 may have started the fire, but GPT-5.3 Codex is certainly living up to the hype.
Join us on the STILL RELEVANT tour: https://simulationtheory.ai/16c0d1db-a8d0-4ac9-bae3-d25074589a80Join Simtheory: https://simtheory.aiTDIA Discord: https://discord.gg/gTW4RkAJvnHorse Egg Lifecycle Infographic: https://staging.simtheory.ai/share/file/UZ2KJU----So Chris, this week... we're diving into Google's new Nano Banana 2 image model - 50% cheaper and supposedly faster (when the servers aren't melting). We put it through its paces with annotation-based editing, slide generation, and yes, the return of the legendary horse egg experiment.Plus: Google quietly kills Gemini-3 after just a few months (good riddance?), we discuss why the model was "dead on arrival" for agentic workflows, and break down the real story behind those massive AI layoff announcements from Block and WiseTech. Spoiler: it's probably not actually about AI.We also get into the current state of the model wars (Opus 4.6 vs Codex 5.3), why smaller models like GLM-5 might be the future for enterprise agentic tasks, and Chris's wife teaching Claude to literally speak to her using Mac's text-to-speech. The models are getting creative.---0:00 - Intro0:36 - Nano Banana 2: Price, Speed & First Impressions3:19 - The Compositing Problem & Last Mile Design5:41 - Annotation-Based Editing (This Changes Everything)9:52 - Slide Editing & Real-World Use Cases12:34 - The Horse Egg Experiment Returns14:30 - Image Degradation & Cost Breakdown17:47 - Text-to-Image Leaderboard Discussion20:01 - Why Nano Banana Dominates for Work22:07 - Codex 5.3 vs Opus 4.622:54 - Google Kills Gemini-3 (What Went Wrong?)26:48 - Google's Agentic Problem30:08 - The Model Loyalty Cycle34:22 - Why Opus 4.6 is Still the Best37:05 - Cost Optimization & Smart Model Routing43:30 - When Models Get Stuck on the Wrong Path45:36 - Nicole's AI Learns to Talk Back46:54 - Can Anyone Build Software Now?52:26 - Anthropic's Legal/Finance Plugins & Market Panic57:08 - Block Lays Off 4,000: AI or Excuse?1:00:05 - The AI Job Apocalypse Isn't RealThanks for listening like and sub xoxo
This is a free preview of a paid episode. To hear more, visit www.latent.spaceAIE Europe CFP and AIE World's Fair paper submissions for CAIS peer review are due TODAY - do not delay! Last call ever.We're excited to welcome METR for their first LS Pod, hopefully the first of many:METR are keepers of currently the single most infamous chart in AI:But every Latent Space reader should be sophisticated enough to know that the details matter and that hype and hyperbole go hand in hand in AI social media, because the millions of impressions that got, by people who don't understand or care about the nuances, disclaimers, and error bars, far outreaches the 69k views on the corrections by the people who actually made the chart:There's a lot of nuance both in making benchmarks (as we discovered with OpenAI on our SWE-Bench Verified podcast) and in extrapolating results from them, especially where exponentials and sigmoids are concerned. METR's Long Horizons work itself has known biases that the authors have responsibly disclosed, but go far too underappreciated in the pursuit of doomer chart porn.If you're interested in a short, sharable TED talk version of this pod, over at AIE CODE we were blessed to feature Joel twice, as a stage talk and with a longer form small workshop with Q&A:We also make sure cover some of METR's lesser known work on Threat Evaluation but also Developer Productivity, where 2x friend of the pod and now Zyphra founder Quentin Anthony was the ONLY productive participant!Finally, if you're the sort to read these show notes to the end, then you definitely deserve some pictures of Joel shredding the guitar at Love Band Karaoke which we mention at the end: Full Video PodTimestamps00:00 What METR Means00:39 Podcast Intro With Joel01:39 ME vs TR03:33 Time Horizon Origin Story04:56 Picking Tasks And Biases09:13 Time Horizon Misconceptions11:37 Opus 4.5 And Trendlines14:27 Productivity Studies And Explosions29:50 Compute Slows Progress30:47 Algorithms Need Compute32:45 Industry Spend and Data34:57 Clusters and Shipping Timelines36:44 Prediction Markets for Models38:10 Manifold Alpha Story43:04 Beyond Benchmarks Evals51:39 METR Roadmap and FarewellTranscript
Lies, insanity and spin are becoming the norm in our daily news and information diet, where one shocking story quickly supplants another. To help us wade through the muck, we speak to our media critic Jon Jeter about coverage of Iran, Tucker Carlson, the BAFTA Awards fiasco and more. And in this new era of attempted colonization and bullying by the U.S. empire, Halie Gerima’s new film ‘Black Lions, Roman Wolves’ reminds us how we liberated ourselves and can still fight fascists and win. Plus headlines. The show is made possible only by our volunteer energy, our resolve to keep the people's voices on the air, and by support from our listeners. In this new era of fake corporate news, we have to be and support our own media! Please click here or click on the Support-Donate tab on this website to subscribe for as little as $3 a month. We are so grateful for this small but growing amount of monthly crowdsource funding on Patreon. PATREON NOW HAS A ONE-TIME, ANNUAL DONATION FUNCTION! You can also give a one-time or recurring donation on PayPal. Thank you! “On the Ground: Voices of Resistance from the Nation's Capital” gives a voice to the voiceless 99 percent at the heart of American empire. The award-winning, weekly hour, produced and hosted by Esther Iverem, covers social justice activism about local, national and international issues, with a special emphasis on militarization and war, the police state, the corporate state, environmental justice and the left edge of culture and media. The show is heard on three dozen stations across the United States, on podcast, and is archived on the world wide web at https://onthegroundshow.org/ Please support us on Patreon or Paypal. Links for all ways to support are on our website or at Esther Iverem's Linktree: https://linktr.ee/esther_iverem
Burke Holland works on GitHub Copilot by day and codes with his AI agents always. Early January, Burke posted about how Opus 4.5 changed everything. We were all still buzzing from the holiday-season 2x usage bump Claude gave us, and Opus 4.5 felt like a genuine step function in capability. Burke and I get into all the details. Opus 4.5 may have started the fire, but GPT-5.3 Codex is certainly living up to the hype.
Google's Nano Banana 2 (half the API cost, real-time web integration, 30+ languages, and why it's about to disrupt creative agencies), Jack Dorsey laying off 4,000 people at Block while the company is profitable (and what this signals for every industry), Opus 3.0's retirement interview where Anthropic gave it a Substack blog, and Trellis 2.0 running locally.02:23 Nano Banana 2 by Google20:13 Jack Dorsey Lays Off 4,000 at Block36:23 Opus 3.0 Gets Retired and Starts a Blog42:39 Trellis 2.0 ShowcasePowered by Dell Pro Precision : https://creatorfolio.co/badxstudiohttps://creatorfolio.co/badxstudio3Learn Unreal Engine in 14 Days - $300 OFF https://www.youtube.com/redirect?https://join.baddecisions.studio/c/podcast?discounts=PODCASTIf this podcast is helping you, please take 2 minutes to rate our podcast on Spotify or Apple Podcasts, It will help the Podcast reach and help more people!Spotify - https://open.spotify.com/show/12jUe4lIJgxE4yst7rrfmW?si=ab98994cf57541cfApple Podcasts (Scroll down to review)- https://podcasts.apple.com/us/podcast/bad-decisions-podcast/id1677462934Join our discord server where we connect and share assets: https://discord.gg/zwycgqezfDBad Decisions Audio Podcast
Nick and Myron are back this week with one of those episodes where everything feels a little unstable — retirements, injuries, ticket sales drama, and even whispers about Vince McMahon possibly circling back into the picture.It's Elimination Chamber week… and the temperature is rising.AJ Styles' tribute on WWE RawBronson Reed goes down — and suddenly “The Vision” angle feels cursed.Kiana James pins Charlotte in an EC qualifier — that's not small.Swerve's brutal turn on Omega makes it clear where AEW is heading.WrestleMania 42 ticket sales still trailing last year — plus Chamber watch party blackouts in Chicago. Is it price fatigue or something else?The Vince rumor: could he really attempt a WWE return with Saudi backing… and would fans shockingly embrace it?Full breakdown of the WWE Elimination Chamber card and what actually matters heading into Mania season.ROH moving to studio tapings in Jacksonville This week's show isn't just recapping events — it's about perception, direction, and whether the wrestling landscape is shifting again in ways people aren't ready for.
Today we're programming with NO SPEAKER?! While still complete these 3 challenges No Order Repeating Creating a from Scratch DYM game And finally, not only will I recap that game - But I'll recap the entire night. And give you that game, for free! ACCESS TO EXTREME GO FOR GOLD GAME & RECAP EPISODE https://www.patreon.com/posts/no-speaker-recap-151394635?utm_medium=clipboard_copy&utm_source=copyLink&utm_campaign=postshare_creator&utm_content=join_link SHOW NOTES Shownotes & Transcripts https://www.hybridministry.xyz/190 ❄️ WINTER SOCIAL MEDIA PACK https://www.patreon.com/posts/winter-seasonal-144943791?utm_medium=clipboard_copy&utm_source=copyLink&utm_campaign=postshare_creator&utm_content=join_link HYBRID HERO MEMBERS GET IT FREE! https://www.patreon.com/hybridministry FREE EBOOK https://www.patreon.com/posts/complete-guide-142500019?utm_medium=clipboard_copy&utm_source=copyLink&utm_campaign=postshare_creator&utm_content=join_link COACHING https://www.hybridministry.xyz/articles/coaching
This week on Tapped In, Nick, Jacked Jameson, and Rosario Grillo cover a jammed pack weekend of Ga wrestling festivities.We open with The Headlines — diving into the fallout from Grillo's actions at SFCW, thoughts on the Clash at The Station TV taping, and Viral's increasingly complicated title picture. We also discuss:• Fitts & Trevor Blackwell possibly aligning after Josh Cantrell's controversial attack• Kraken's “Pieces of 8” coin winners• SCA Spectacle drawing its biggest house ever at The Dome• DeSilva capturing gold Then we pivot into The Awards and share thoughts on the event.After that, things lighten up with a non-wrestling twist as we each reveal our Mount Rushmore of Video Games.We close out with Making the Drives, previewing a massive weekend of shows across Georgia including:• 1FW• ACTION Wrestling• SCA• Premiere All Star• Coastal Empire• GCW• Total Aggression Wrestling• Scrappy Championship WrestlingIf you care about what's happening across the Georgia independent scene, this is the episode you need to hear.Watch on YouTube: tappedoutpod.comSupport & get early access on Patreon#TappedIn #GeorgiaWrestling #IndyWrestling #GAIndies #1FW #ACTIONWrestling #SCA #CoastalEmpire #ViralPro #GCW #ProWrestlingPodcast #JackedJameson #RosarioGrilloSupport our sponsors at:Lytmi Earbuds:Official Website Link: https://tinyurl.com/mpwd6mdn (10% discount with the code: tappedoutpod )Amazon Exclusive Discount Link: https://amzn.to/3XqRFsW (Limited-time $20 discount)If you are looking for an easy way to clip your content and add captions too, check out Opus.Pro with the link below. It's what we use and it says HOURS of our time athttps://www.opus.pro/?via=tappedoutpodFor your life insurance needs, contact Nick McDaniel at: https://www.facebook.com/NickMcDanielWoodmenLifeGet your tix to upcoming events at: Vet Tix: https://www.vettix.org/Email: tappedoutpod@gmail.com
Welcome to Season 05 Episode 5.12- the "Winter Be Gone" edition - of Notes from the Aisle Seat, the podcast featuring news and information about the arts in northern Chautauqua County NY, sponsored by the 1891 Fredonia Opera House. Your host is Tom Loughlin, SUNY Distinguished Teaching Professor and Chair Emeritus of Theatre and Dance at SUNY Fredonia. Guests on this episode include: BA Van Sise/Children of Grass; Dan Lendzian/Ruthless! - The Musical; Vernon Huff/SUNY Fredonia Choral Festival Notes from the Aisle Seat is available from most of your favorite podcast sites, as well as on the Opera House YouTube Channel. If you enjoy this podcast, please spread the word through your social media feeds, give us a link on your website, and consider becoming a follower by clicking the "Follow" button in the upper right-hand corner of our home page. If you have an arts event you'd like to publicize, hit us up at operahouse@fredopera.org and let us know what you have! Please give us at least one month's notice to facilitate timely scheduling. Time Stamps (Approximate) BA Van Sise/Children of Grass - 1:21 Dan Lendzian/Ruthless - The Musical 20:50 Arts Calendar - 40:05 Vernon Huff/SUNY Choral Festival - 42:40 Artist Links BA Van Sise Dan Lendzian Vernon Huff Media Symphony No. 6 in F Major, Opus 68 (Pastorale), Ludwig von Beethoven, composer; performed by the Frankfort Symphony Orchestra, Ariane Matiakh, conductor, October 2021 "Born to Entertain", from the musical Ruthless! - The Musical; music by Marvin Laird and book and lyrics by Joel Paley, performed by Lindsay Ridgeway from the Los Angeles production, 1994. "Desafinado", Antonio Carlos Jobim, composer; lyrics by Newton Mendonça; performed by the Stan Getz and Charlie Byrd Sextet from the album Jazz Samba; Verve Records, 1962 Wade in the Water, traditional African-American spiritual; arranged by Stacy V. Gibbs, performed by the Houston Chamber Choir, April 2023 When It's Springtime in the Rockies, music by Robert Sauer, lyrics by Mary Hale Woolsey, 1929; performed by The Sons of the Pioneers, from their album Sing Campfire Favorites, RCA Records 1967 Box Office at SUNY Fredonia Lake Shore Center for the Arts Main Street Studios Ticket Website SUNY Fredonia School of Music Events WCVF Fredonia WRFA Jamestown The 1891 Run for the Opera House, April 2026 BECOME AN OPERA HOUSE MEMBER!
It's my last radio show, and I am overjoyed and grateful that I get to spend it the way it began, with my friend and show co-founder Shana Falana. To formally reintroduce you, she is a songwriter, performer, and community architect originally from San Francisco, based in the Hudson Valley since 2008. For over 25 years, Shana has merged music and public service—bringing 12-step meetings into jails and institutions throughout Ulster County, helping coordinate the early years of O+ Festival, and co-founding the I Want What SHE Has radio show on Radio Kingston. She is the founder and creative force behind The Goddess Party, a performance collective uplifting women through music, ritual, and radical joy, with sold-out shows at Opus 40, Old Dutch Church, and Basilica Hudson. Her artistic and social practice centers on amplifying women in their perimenopausal and menopausal years, increasing the visibility of aging women on stage and reshaping cultural narratives about power, beauty, and relevance after fifty. She is the creator and showrunner of a scripted television series inspired by The Goddess Party, expanding its story from live performance into narrative television. Shana is someone who as a friend I've witnessed move through the life cycle of different projects and life experiences with apparent ease. I'm someone who struggles to let go and perhaps holds on a bit too long, but she talks about what endings are like for her and how she navigates them. We get to hear about the beginning and evolution of The Goddess Party, the challenges, highlights, and what's to come. As much of her current work relates to thriving in an older woman's body, Shana shares her experience of navigating illness and perimenopause and offers many resources that have supported her along the way, always following her intuition. Here are the books Shana mentioned - Wise Power, Hagitude, Mother Hunger You can find her here ->> instagram / spotify / apple music / And stay tuned to The Goddess Party's Instagram account for more details about the upcoming March 27th benefit concert they are participating in at Levon Helm Studios. My previous show on The Goddess Party can be found here! Today's show was engineered by Ian Seda from Radiokingston.org. Our show music is from Shana Falana! Feel free to email me, say hello: she@iwantwhatshehas.org ** Please: SUBSCRIBE to the pod and leave a REVIEW wherever you are listening, it helps other users FIND IT http://iwantwhatshehas.org/podcast ITUNES | SPOTIFY ITUNES: https://itunes.apple.com/us/podcast/i-want-what-she-has/id1451648361?mt=2 SPOTIFY:https://open.spotify.com/show/77pmJwS2q9vTywz7Uhiyff?si=G2eYCjLjT3KltgdfA6XXCA Follow: INSTAGRAM * https://www.instagram.com/iwantwhatshehaspodcast/ FACEBOOK * https://www.facebook.com/iwantwhatshehaspodcast
OpenClaw wrote our 475-page uBlox GNSS library overnight. Opus suggested packed structs over byte parsing... smart. Also found new L1+L5 GPS modules on DigiKey: ~2x accuracy for $2-3 more. No RTK needed.
There's been an update to Remote Labor Index (RLI), and it showed a "massive" 50% jump in AI Agent capability. However, it's worth noting that percentages can be deceiving. The data reveals a much more sobering reality that shouldn't come as a surprise to anyone actually doing the work. Despite the hype, the world's best AI model (Opus 4.5) still fails to successfully complete 96.25% real work. In summary, while the “velocity” of AI is skyrocketing, the absolute capability is still miles away from "replacement." So, while countless AI voices are claiming AI is coming for your job, the real crisis is of expectations, not employment.This week, I'm checking back in on the Q1 2026 RLI update and comparing the new colorful dashboard against the stark reality of the November benchmarks. This isn't a tech review but a leadership reality check. I explain why a 50% increase in capability (from 2.5% to 3.75%) is technically impressive but practically dangerous if you are building your strategy around it. I'm also stripping away the vendor sales pitches to show you why the "Agent" narrative is being driven by economic desperation, not technological readiness.My goal is to move you out of "Replacement Theory" to "Augmentation Agility" by exposing the specific blind spots threatening your P&L. The "Replacement" Illusion (Math vs. Myth): We've been told that fully autonomous agents are here, yet the data proves the "ceiling" is barely cracking 4%. I break down why the "Leaders" aren't firing their teams—they are auditing their workflows to find the 4% of grunt work AI can do, while doubling down on the 96% of human nuance it can't touch. The "Desperation" Trap (Vendor Economics): We love to believe the sales deck, but the financials tell a different story. I call out the uncomfortable truth that AI vendors are burning cash on compute costs, driving them to push "enterprise integration" before the product is actually ready. I explain why your budget shouldn't be their R&D fund. The "Sleeper" Insight (The Gemini Factor): You cannot judge a model by its snapshot; you have to judge it by its slope. I dive into the often-overlooked data on Gemini 3 Pro—which quietly posted a massive ~50% reliability jump—and why for Google Workspace users, this "sleeper" metric matters more than who holds the crown. The "Reliability" Pivot (Redefining Good): You cannot scale a tool that is brilliant once and broken twice. I share a specific consulting example of why we had to kill a "successful" pilot, and why the companies winning at AI are measuring "Autonomous Reliability" rather than "Creative Capability."By the end, I hope you see this data not as a reason to write off AI, but as a mandate for agility. You cannot simply "plug in" an agent to a rigid system; you have to build the flexible infrastructure that can adapt when that 3.75% inevitably hits 10%.⸻If this conversation helps you think more clearly about the future we're building, make sure to like, share, and subscribe. You can also support the show by buying me a coffee at https://buymeacoffee.com/christopherlindAnd if your organization is wrestling with how to lead responsibly in the AI era, balancing performance, technology, and people, that's the work I do every day through my consulting and coaching. Learn more at https://christopherlind.co⸻Chapters00:00 – The Hook: 50% Growth vs. Absolute Reality04:00 – The RLI Update: Opus 4.5 & The 96% Gap08:00 – The "Why": Context, Nuance, and Broken Instructions12:00 – The Trap: Why Vendors Are Desperate for Your Budget17:00 – The Velocity Insight: Gemini's 50% "Sleeper" Jump22:00 – The Agility Mandate: Building Flexible Systems26:00 – The "Lind" Take: Capability vs. Reliability (The Pilot Story)33:00 – The "Now What": 3 Surgical Moves for Leaders#RemoteLaborIndex #AIStrategy #FutureOfWork #DigitalTransformation #Leadership #ChristopherLind #FutureFocused #Opus #Gemini #AIAgents
Send a textInvest in pre-IPO stocks with AG Dillon & Co. Contact aaron.dillon@agdillon.com to learn more. Financial advisors only. www.agdillon.com00:00 - Intro00:02 - AG Dillon Funds closing on Mar 31, 202600:51 - OpenAI Financials $280B revenue target meets $665B cost wall03:58 - OpenAI “buys” OpenClaw, Steinberger joins OpenAI04:42 - OpenAI Series C aims to shatter records at $850B post money05:41 - OpenAI and Tata bet on India with a 100 MW to 1 GW buildout path06:29 - Grafana's $9B round talks ride a $400M ARR wave07:23 - World Labs lands Autodesk and targets a rumored $5B valuation08:18 - Temporal wants to be the load bearing layer for agent execution09:31 - Mesh Optical's $50M Series A targets the chokepoint inside AI data centers10:43 - Render's $1.5B valuation is a bet that AI apps need a new runtime11:40 - Stash acquired by Grab for $425M13:06 - Physical Superintelligence pitches a physics breakthrough factory with a 20 person team14:07 - Figma plugs Claude Code into design and risks losing the workflow15:00 - Anthropic ships Sonnet 4.6 just 12 days after Opus 4.615:26 - Stripe's Bridge wins OCC trust charter signal as stablecoin scrutiny rises16:37 - Cohere puts 70 plus languages on device with a 3.35B parameter model17:53 - ElevenLabs turns agent risk into an insurable product at $12.2B secondary19:05 - Mistral buys Koyeb and adds 16 engineers to harden its compute stack
Join Simtheory: https://simtheory.ai"Is This The End" now on Spotify: https://open.spotify.com/album/2Py1MyADUFqJFVUISI2VTP?si=oT3PWyJYRA2BspOmzT_ifgRegister for the STILL RELEVANT tour: https://simulationtheory.ai/16c0dationtheory.ai/16c0d1db-a8d0-4ac9-bae3-d25074589a80Two new models dropped this week — Gemini 3.1 Pro and Claude Sonnet 4.6 — and honestly? We're struggling to care. In this episode, we break down why Gemini went from being our daily driver to a model we barely touch, the "tunnel vision" hallucination problem that killed the Gemini 3 series for us, and whether 3.1 Pro actually fixes it. We put Gemini 3.1 Pro head-to-head against Claude Opus building a Geoffrey Hinton Doom Center, debate whether anyone can actually tell the difference between Sonnet 4.5 and 4.6, and make the case that smaller models running in agentic loops are secretly beating the frontiers. Plus: OpenAI acquires OpenClaw and we ask why a $100B company couldn't just build it themselves, DHH calls out the AI pricing bubble, Mike compares AI models to cheap wine hangovers, and Sam Altman refuses to hold Dario's hand at the India AI Summit. The model wars are getting weird.CHAPTERS:0:00 Intro & "Is This The End" Now on Spotify1:10 Gemini 3.1 Pro: Thinking Controls & The Medium Mode Fix3:14 The Speed vs Intelligence Trade-Off in Agentic Work5:10 Why Multitasking With AI Agents Made Us Anxious6:34 Solid Updates: The Real Goal of Agentic Coding7:45 Gemini's Fall From Grace: From Daily Driver to Dead Model10:08 The Tunnel Vision Problem That Killed Gemini 313:35 Mixed Reactions: Fanboys vs Reality on Gemini 3.1 Pro15:06 Side-by-Side Test: Gemini 3.1 Pro vs Claude Opus (Hinton Doom Center)17:39 Why File Manipulation Accuracy Matters More Than Context Windows19:27 The Context Window Debate: 1M Tokens vs Smart Sub-Agents22:05 DHH on Token Pricing: "If There's a Bubble, It's This"24:11 Should Models Ship as Agent vs Chat Variants?28:43 Claude Sonnet 4.6: A $2 Discount on Opus?31:44 The Model Mix: Why One Model Won't Rule Them All34:40 Anthropic Is Winning — But Can Anyone Tell the Difference?38:58 OpenAI Acquires OpenClaw: Why Couldn't They Just Build It?44:18 The Silicon Valley Moment: Sam vs Dario at India AI Summit47:05 Will Smaller Models Win the Enterprise? The Cost Reality Check51:27 The End of Single-Shot: Why Agentic Loops Change Everything55:48 Final Thoughts & Gemini 3.1 Pro Gets One More WeekThanks for listening. Like & Sub. Links above for the Still Relevant Tour signup and Simtheory. Two models dropped on a week again. What a time to be alive. xoxo
In this episode of the Freedom Scientific Training Podcast, Liz and Rachel continue the Learn AI webinar series with a deep dive into Claude, an AI assistant developed by Anthropic and available at Claude. Designed for users of JAWS, ZoomText, and Fusion, this session explores how Claude can support long-form thinking, research, organization, and real-world problem solving — all with accessibility in mind. Liz walks through navigating the Claude web interface with the keyboard, managing multiple prompts, combining tasks into streamlined workflows, and building repeatable processes using features like Skills and model selection (Opus, Sonnet, and Haiku). Through a practical example, she demonstrates how to research Bluetooth headsets under $200, generate comparison tables, outline a guide, and refine results before exporting a final document. Rachel then shifts to a hands-on accessibility scenario, uploading an image of a stove control panel and prompting Claude to suggest tactile labeling strategies for blind and low vision users. She demonstrates file uploads, prompt refinement, material recommendations, and even generating a visual mockup — highlighting how AI can assist with everyday decision-making and creative problem solving. Throughout the webinar, you'll learn: How to navigate Claude efficiently with JAWS Tips for structuring effective prompts How to combine research, comparison, and writing tasks When and how to refine responses before creating downloadable documents Creative ways to use AI for visual analysis and practical accessibility projects Whether you're experimenting with AI for the first time or looking to build more advanced workflows, this episode shows how Claude can help you move from idea to execution — all while maintaining accessibility and control over the process. For more webinars, tutorials, and training resources, visit: FreedomScientific.com/Training
You probably know by now that AI is the definition of mediocre. As in: it's the average of everything it's been trained on. So how do you get beyond average? How do you build a moat? It certainly doesn't seem to be via the models. While there are models of the month (hey, Opus 4.6, my new friend!), they seem to be pretty swappable. So, the model ain't it. But proprietary data (e.g. an AI that knows you really well), yes! Or doing something really hard in the real world (think: Waymo self-driving cars). Maybe via trust and safety (Anthropic is certainly making a play here). Or... how about via amazing design and good taste. Remember when ChatGPT first came out and everyone derided “AI wrappers”… well, maybe a wrapper isn't so bad, assuming you can differentiate on one or more of the above. Luke Des Cotes is the CEO of MetaLab, the agency famous for designing interfaces, including early versions of Slack and Coinbase, so don't be shocked when you hear him say that great design can be your moat. MetaLab is working with a host of AI companies (another shocker), including Windsurf (AI + code), Suno (AI + music), Pika (AI + video), and more…, which is why Luke's take on AI surprised me. He's not rah rah. He's pretty judicious actually. Luke has questions about AI's costs and appropriateness for lots of use cases like those involving kids, but mostly he objects to its mediocrity.On this episode we discuss what it takes to go beyond.We also get into:Why vibe-coded software isn't changing the world anytime soonWhy Shopify acquired a design agency right after telling employees to justify their existence against AIHow MetaLab designers are using AI to prototype in hours instead of weeksThe talent market for zero-to-one designers — and why they're harder to find than everLandlines, brick phones, and how parents are fighting back against always-on kidsChapters(01:10) - "It's a race to the mean" (03:10) - "How do you create emotional resonance?" (05:33) - AI companies are burning money (08:44) - Speed to good enough (13:51) - Is the chat here to stay or a temporary fad? (17:43) - It's hard to find great 0 to 1 design talent (22:28) - Seemingly conscious AI (25:05) - Kids, landlines, and fighting always-on culture (27:21) - Sounds like science fiction, but is here now… Links & ResourcesLuke Des Cotes on LinkedInMetaLabSupport Future Around & Find OutGet the free newsletterAnd consider becoming a paid subscriber and help future proof this thing!Sponsor the show? Are you looking to reach an audience of senior technologists and decision-makers? Email me: dan@modernproductminds.com
CoreStory is building code intelligence platforms that address the fundamental limitation of today's coding agents: their inability to navigate complex enterprise codebases. While foundation models excel at greenfield development, they fail at real-world engineering tasks in systems spanning millions of lines of code. CoreStory's context layer delivers a 44% improvement on SWE-bench, the industry's standard benchmark for measuring coding agent effectiveness on actual GitHub issues. In this episode of BUILDERS, I sat down with Anand Kulkarni, CEO of CoreStory, to explore how his team is enabling the shift to AI-native engineering and seeding the category of spec-driven development across Microsoft, GitHub, and Amazon. Topics Discussed: Building with GPT-3 API 18 months before ChatGPT went public Why even GPT-5 and Opus 4.5 struggle with enterprise codebases on SWE-bench The narrative shift required when selling AI pre- and post-ChatGPT CoreStory's 44% improvement in coding agent performance through context intelligence How "spec-driven development" got adopted by Microsoft, GitHub, and Amazon without formal analyst relations The parallel between JIRA monetizing Agile and CoreStory enabling AI-native engineering Three-channel distribution: direct enterprise, coding agent partnerships via MCP, and hyperscaler/GSI routes Why specs become the source of truth while code becomes disposable in the AI era GTM Lessons For B2B Founders: Match your narrative precision to technical depth: CoreStory deploys three distinct positioning strategies based on audience sophistication. For AI practitioners tracking benchmarks, they lead with "44% SWE-bench improvement"—a metric that immediately signals meaningful progress on the hardest problem in the space. For engineering leaders aware of AI tooling but not deep in the research, they focus on velocity gains and ROI metrics. For executives, they describe reverse-engineering codebases into machine-readable specs. The key insight: technical audiences dismiss vague value props, while non-technical audiences get lost in benchmark details. Map your positioning to how your audience measures success in their world. Seed category language through earned adoption, not manufactured consensus: Anand initially called their approach "requirements-driven development" before simplifying to "spec-driven development." Rather than pitching analysts, they used the term consistently in customer conversations, gave talks at GitHub Universe, and shipped demos showing the workflow. When customers naturally adopted the language and community leaders began using similar terminology independently, Microsoft and GitHub followed with their own implementations (like GitHub's SpecKit). The lesson: category language sticks when practitioners choose to use it because it clarifies their work, not because a vendor pushed it. Focus on customer adoption as proof of concept before seeking broader market validation. Position against emergent practices, not just incumbent products: CoreStory doesn't position against legacy code analysis tools—they position as the enabler of AI-native engineering, the discipline that will displace Agile. Anand's insight from watching JIRA's success: "People don't love JIRA. What they love is Agile as a way to move away from waterfall." CoreStory is betting that 10x velocity gains from AI-native practices will drive the same categorical shift. When you're early in a technology wave, attach to the practice change (how teams will work differently) rather than feature comparisons with existing tools. Movements create markets. Design channel strategy around customer problem awareness: CoreStory's three channels map to different stages of buyer sophistication. Direct enterprise comes from teams already deep in AI engineering who've hit the context limitation wall. Coding agent partnerships (via MCP integration with tools like Cognition and Factory) serve builders wanting better AI tooling who haven't diagnosed the context problem yet. Hyperscalers and GSIs distribute into modernization and maintenance projects where AI enablement is emerging as a requirement. Each channel serves a distinct buyer journey stage. Don't force one go-to-market motion—design multiple paths based on where different customer segments are in understanding the problem you solve. Navigate pre-legitimacy markets by hiding the breakthrough: Before ChatGPT, selling anything AI-driven faced immediate skepticism about whether it was "real" or just smoke and mirrors. Anand couldn't lead with AI without triggering disbelief. CoreStory focused on delivered outcomes—"here's what you'll be able to do"—with AI as the mechanism, not the message. Post-ChatGPT, the challenge flipped: everyone expects AI, but now the differentiation question becomes harder. If you're building on emerging technology before market consensus forms, deemphasize the technology until buyers have context to evaluate it. Once the market validates the technology category, shift to demonstrating your specific technical advantage within it. // Sponsors: Front Lines — We help B2B tech companies launch, manage, and grow podcasts that drive demand, awareness, and thought leadership. www.FrontLines.io The Global Talent Co. — We help tech startups find, vet, hire, pay, and retain amazing marketing talent that costs 50-70% less than the US & Europe. www.GlobalTalent.co // Don't Miss: New Podcast Series — How I Hire Senior GTM leaders share the tactical hiring frameworks they use to build winning revenue teams. Hosted by Andy Mowat, who scaled 4 unicorns from $10M to $100M+ ARR and launched Whispered to help executives find their next role. Subscribe here: https://open.spotify.com/show/53yCHlPfLSMFimtv0riPyM
Nick and Myron break down a chaotic week in wrestling where business rumors, TV shakeups, and major title implications are all colliding at once.What caught our eye on TVAJ gets a tribute, Punk/Balor face off, AEW title changes, Mickie James, return, & moreWrestleMania 42 ticket sales spark debate — overblown concern or real issue?WWE's NXT Atlanta cancelledCNN confirms WBD owns a minority stake in AEWMJF vs. Hangman is set for Revolution
Tickets for AIEi Miami and AIE Europe are live, with first wave speakers announced!From pioneering software-defined networking to backing many of the most aggressive AI model companies of this cycle, Martin Casado and Sarah Wang sit at the center of the capital, compute, and talent arms race reshaping the tech industry. As partners at a16z investing across infrastructure and growth, they've watched venture and growth blur, model labs turn dollars into capability at unprecedented speed, and startups raise nine-figure rounds before monetization.Martin and Sarah join us to unpack the new financing playbook for AI: why today's rounds are really compute contracts in disguise, how the “raise → train → ship → raise bigger” flywheel works, and whether foundation model companies can outspend the entire app ecosystem built on top of them. They also share what's underhyped (boring enterprise software), what's overheated (talent wars and compensation spirals), and the two radically different futures they see for AI's market structure.We discuss:* Martin's “two futures” fork: infinite fragmentation and new software categories vs. a small oligopoly of general models that consume everything above them* The capital flywheel: how model labs translate funding directly into capability gains, then into revenue growth measured in weeks, not years* Why venture and growth have merged: $100M–$1B hybrid rounds, strategic investors, compute negotiations, and complex deal structures* The AGI vs. product tension: allocating scarce GPUs between long-term research and near-term revenue flywheels* Whether frontier labs can out-raise and outspend the entire app ecosystem built on top of their APIs* Why today's talent wars ($10M+ comp packages, $B acqui-hires) are breaking early-stage founder math* Cursor as a case study: building up from the app layer while training down into your own models* Why “boring” enterprise software may be the most underinvested opportunity in the AI mania* Hardware and robotics: why the ChatGPT moment hasn't yet arrived for robots and what would need to change* World Labs and generative 3D: bringing the marginal cost of 3D scene creation down by orders of magnitude* Why public AI discourse is often wildly disconnected from boardroom reality and how founders should navigate the noiseShow Notes:* “Where Value Will Accrue in AI: Martin Casado & Sarah Wang” - a16z show* “Jack Altman & Martin Casado on the Future of Venture Capital”* World Labs—Martin Casado• LinkedIn: https://www.linkedin.com/in/martincasado/• X: https://x.com/martin_casadoSarah Wang• LinkedIn: https://www.linkedin.com/in/sarah-wang-59b96a7• X: https://x.com/sarahdingwanga16z• https://a16z.com/Timestamps00:00:00 – Intro: Live from a16z00:01:20 – The New AI Funding Model: Venture + Growth Collide00:03:19 – Circular Funding, Demand & “No Dark GPUs”00:05:24 – Infrastructure vs Apps: The Lines Blur00:06:24 – The Capital Flywheel: Raise → Train → Ship → Raise Bigger00:09:39 – Can Frontier Labs Outspend the Entire App Ecosystem?00:11:24 – Character AI & The AGI vs Product Dilemma00:14:39 – Talent Wars, $10M Engineers & Founder Anxiety00:17:33 – What's Underinvested? The Case for “Boring” Software00:19:29 – Robotics, Hardware & Why It's Hard to Win00:22:42 – Custom ASICs & The $1B Training Run Economics00:24:23 – American Dynamism, Geography & AI Power Centers00:26:48 – How AI Is Changing the Investor Workflow (Claude Cowork)00:29:12 – Two Futures of AI: Infinite Expansion or Oligopoly?00:32:48 – If You Can Raise More Than Your Ecosystem, You Win00:34:27 – Are All Tasks AGI-Complete? Coding as the Test Case00:38:55 – Cursor & The Power of the App Layer00:44:05 – World Labs, Spatial Intelligence & 3D Foundation Models00:47:20 – Thinking Machines, Founder Drama & Media Narratives00:52:30 – Where Long-Term Power Accrues in the AI StackTranscriptLatent.Space - Inside AI's $10B+ Capital Flywheel — Martin Casado & Sarah Wang of a16z[00:00:00] Welcome to Latent Space (Live from a16z) + Meet the Guests[00:00:00] Alessio: Hey everyone. Welcome to the Latent Space podcast, live from a 16 z. Uh, this is Alessio founder Kernel Lance, and I'm joined by Twix, editor of Latent Space.[00:00:08] swyx: Hey, hey, hey. Uh, and we're so glad to be on with you guys. Also a top AI podcast, uh, Martin Cado and Sarah Wang. Welcome, very[00:00:16] Martin Casado: happy to be here and welcome.[00:00:17] swyx: Yes, uh, we love this office. We love what you've done with the place. Uh, the new logo is everywhere now. It's, it's still getting, takes a while to get used to, but it reminds me of like sort of a callback to a more ambitious age, which I think is kind of[00:00:31] Martin Casado: definitely makes a statement.[00:00:33] swyx: Yeah.[00:00:34] Martin Casado: Not quite sure what that statement is, but it makes a statement.[00:00:37] swyx: Uh, Martin, I go back with you to Netlify.[00:00:40] Martin Casado: Yep.[00:00:40] swyx: Uh, and, uh, you know, you create a software defined networking and all, all that stuff people can read up on your background. Yep. Sarah, I'm newer to you. Uh, you, you sort of started working together on AI infrastructure stuff.[00:00:51] Sarah Wang: That's right. Yeah. Seven, seven years ago now.[00:00:53] Martin Casado: Best growth investor in the entire industry.[00:00:55] swyx: Oh, say[00:00:56] Martin Casado: more hands down there is, there is. [00:01:00] I mean, when it comes to AI companies, Sarah, I think has done the most kind of aggressive, um, investment thesis around AI models, right? So, worked for Nom Ja, Mira Ia, FEI Fey, and so just these frontier, kind of like large AI models.[00:01:15] I think, you know, Sarah's been the, the broadest investor. Is that fair?[00:01:20] Venture vs. Growth in the Frontier Model Era[00:01:20] Sarah Wang: No, I, well, I was gonna say, I think it's been a really interesting tag, tag team actually just ‘cause the, a lot of these big C deals, not only are they raising a lot of money, um, it's still a tech founder bet, which obviously is inherently early stage.[00:01:33] But the resources,[00:01:36] Martin Casado: so many, I[00:01:36] Sarah Wang: was gonna say the resources one, they just grow really quickly. But then two, the resources that they need day one are kind of growth scale. So I, the hybrid tag team that we have is. Quite effective, I think,[00:01:46] Martin Casado: what is growth these days? You know, you don't wake up if it's less than a billion or like, it's, it's actually, it's actually very like, like no, it's a very interesting time in investing because like, you know, take like the character around, right?[00:01:59] These tend to [00:02:00] be like pre monetization, but the dollars are large enough that you need to have a larger fund and the analysis. You know, because you've got lots of users. ‘cause this stuff has such high demand requires, you know, more of a number sophistication. And so most of these deals, whether it's US or other firms on these large model companies, are like this hybrid between venture growth.[00:02:18] Sarah Wang: Yeah. Total. And I think, you know, stuff like BD for example, you wouldn't usually need BD when you were seed stage trying to get market biz Devrel. Biz Devrel, exactly. Okay. But like now, sorry, I'm,[00:02:27] swyx: I'm not familiar. What, what, what does biz Devrel mean for a venture fund? Because I know what biz Devrel means for a company.[00:02:31] Sarah Wang: Yeah.[00:02:32] Compute Deals, Strategics, and the ‘Circular Funding' Question[00:02:32] Sarah Wang: You know, so a, a good example is, I mean, we talk about buying compute, but there's a huge negotiation involved there in terms of, okay, do you get equity for the compute? What, what sort of partner are you looking at? Is there a go-to market arm to that? Um, and these are just things on this scale, hundreds of millions, you know, maybe.[00:02:50] Six months into the inception of a company, you just wouldn't have to negotiate these deals before.[00:02:54] Martin Casado: Yeah. These large rounds are very complex now. Like in the past, if you did a series A [00:03:00] or a series B, like whatever, you're writing a 20 to a $60 million check and you call it a day. Now you normally have financial investors and strategic investors, and then the strategic portion always still goes with like these kind of large compute contracts, which can take months to do.[00:03:13] And so it's, it's very different ties. I've been doing this for 10 years. It's the, I've never seen anything like this.[00:03:19] swyx: Yeah. Do you have worries about the circular funding from so disease strategics?[00:03:24] Martin Casado: I mean, listen, as long as the demand is there, like the demand is there. Like the problem with the internet is the demand wasn't there.[00:03:29] swyx: Exactly. All right. This, this is like the, the whole pyramid scheme bubble thing, where like, as long as you mark to market on like the notional value of like, these deals, fine, but like once it starts to chip away, it really Well[00:03:41] Martin Casado: no, like as, as, as, as long as there's demand. I mean, you know, this, this is like a lot of these sound bites have already become kind of cliches, but they're worth saying it.[00:03:47] Right? Like during the internet days, like we were. Um, raising money to put fiber in the ground that wasn't used. And that's a problem, right? Because now you actually have a supply overhang.[00:03:58] swyx: Mm-hmm.[00:03:59] Martin Casado: And even in the, [00:04:00] the time of the, the internet, like the supply and, and bandwidth overhang, even as massive as it was in, as massive as the crash was only lasted about four years.[00:04:09] But we don't have a supply overhang. Like there's no dark GPUs, right? I mean, and so, you know, circular or not, I mean, you know, if, if someone invests in a company that, um. You know, they'll actually use the GPUs. And on the other side of it is the, is the ask for customer. So I I, I think it's a different time.[00:04:25] Sarah Wang: I think the other piece, maybe just to add onto this, and I'm gonna quote Martine in front of him, but this is probably also a unique time in that. For the first time, you can actually trace dollars to outcomes. Yeah, right. Provided that scaling laws are, are holding, um, and capabilities are actually moving forward.[00:04:40] Because if you can put translate dollars into capabilities, uh, a capability improvement, there's demand there to martine's point. But if that somehow breaks, you know, obviously that's an important assumption in this whole thing to make it work. But you know, instead of investing dollars into sales and marketing, you're, you're investing into r and d to get to the capability, um, you know, increase.[00:04:59] And [00:05:00] that's sort of been the demand driver because. Once there's an unlock there, people are willing to pay for it.[00:05:05] Alessio: Yeah.[00:05:06] Blurring Lines: Models as Infra + Apps, and the New Fundraising Flywheel[00:05:06] Alessio: Is there any difference in how you built the portfolio now that some of your growth companies are, like the infrastructure of the early stage companies, like, you know, OpenAI is now the same size as some of the cloud providers were early on.[00:05:16] Like what does that look like? Like how much information can you feed off each other between the, the two?[00:05:24] Martin Casado: There's so many lines that are being crossed right now, or blurred. Right. So we already talked about venture and growth. Another one that's being blurred is between infrastructure and apps, right? So like what is a model company?[00:05:35] Mm-hmm. Like, it's clearly infrastructure, right? Because it's like, you know, it's doing kind of core r and d. It's a horizontal platform, but it's also an app because it's um, uh, touches the users directly. And then of course. You know, the, the, the growth of these is just so high. And so I actually think you're just starting to see a, a, a new financing strategy emerge and, you know, we've had to adapt as a result of that.[00:05:59] And [00:06:00] so there's been a lot of changes. Um, you're right that these companies become platform companies very quickly. You've got ecosystem build out. So none of this is necessarily new, but the timescales of which it's happened is pretty phenomenal. And the way we'd normally cut lines before is blurred a little bit, but.[00:06:16] But that, that, that said, I mean, a lot of it also just does feel like things that we've seen in the past, like cloud build out the internet build out as well.[00:06:24] Sarah Wang: Yeah. Um, yeah, I think it's interesting, uh, I don't know if you guys would agree with this, but it feels like the emerging strategy is, and this builds off of your other question, um.[00:06:33] You raise money for compute, you pour that or you, you pour the money into compute, you get some sort of breakthrough. You funnel the breakthrough into your vertically integrated application. That could be chat GBT, that could be cloud code, you know, whatever it is. You massively gain share and get users.[00:06:49] Maybe you're even subsidizing at that point. Um, depending on your strategy. You raise money at the peak momentum and then you repeat, rinse and repeat. Um, and so. And that wasn't [00:07:00] true even two years ago, I think. Mm-hmm. And so it's sort of to your, just tying it to fundraising strategy, right? There's a, and hiring strategy.[00:07:07] All of these are tied, I think the lines are blurring even more today where everyone is, and they, but of course these companies all have API businesses and so they're these, these frenemy lines that are getting blurred in that a lot of, I mean, they have billions of dollars of API revenue, right? And so there are customers there.[00:07:23] But they're competing on the app layer.[00:07:24] Martin Casado: Yeah. So this is a really, really important point. So I, I would say for sure, venture and growth, that line is blurry app and infrastructure. That line is blurry. Um, but I don't think that that changes our practice so much. But like where the very open questions are like, does this layer in the same way.[00:07:43] Compute traditionally has like during the cloud is like, you know, like whatever, somebody wins one layer, but then another whole set of companies wins another layer. But that might not, might not be the case here. It may be the case that you actually can't verticalize on the token string. Like you can't build an app like it, it necessarily goes down just because there are no [00:08:00] abstractions.[00:08:00] So those are kinda the bigger existential questions we ask. Another thing that is very different this time than in the history of computer sciences is. In the past, if you raised money, then you basically had to wait for engineering to catch up. Which famously doesn't scale like the mythical mammoth. It take a very long time.[00:08:18] But like that's not the case here. Like a model company can raise money and drop a model in a, in a year, and it's better, right? And, and it does it with a team of 20 people or 10 people. So this type of like money entering a company and then producing something that has demand and growth right away and using that to raise more money is a very different capital flywheel than we've ever seen before.[00:08:39] And I think everybody's trying to understand what the consequences are. So I think it's less about like. Big companies and growth and this, and more about these more systemic questions that we actually don't have answers to.[00:08:49] Alessio: Yeah, like at Kernel Labs, one of our ideas is like if you had unlimited money to spend productively to turn tokens into products, like the whole early stage [00:09:00] market is very different because today you're investing X amount of capital to win a deal because of price structure and whatnot, and you're kind of pot committing.[00:09:07] Yeah. To a certain strategy for a certain amount of time. Yeah. But if you could like iteratively spin out companies and products and just throw, I, I wanna spend a million dollar of inference today and get a product out tomorrow.[00:09:18] swyx: Yeah.[00:09:19] Alessio: Like, we should get to the point where like the friction of like token to product is so low that you can do this and then you can change the Right, the early stage venture model to be much more iterative.[00:09:30] And then every round is like either 100 k of inference or like a hundred million from a 16 Z. There's no, there's no like $8 million C round anymore. Right.[00:09:38] When Frontier Labs Outspend the Entire App Ecosystem[00:09:38] Martin Casado: But, but, but, but there's a, there's a, the, an industry structural question that we don't know the answer to, which involves the frontier models, which is, let's take.[00:09:48] Anthropic it. Let's say Anthropic has a state-of-the-art model that has some large percentage of market share. And let's say that, uh, uh, uh, you know, uh, a company's building smaller models [00:10:00] that, you know, use the bigger model in the background, open 4.5, but they add value on top of that. Now, if Anthropic can raise three times more.[00:10:10] Every subsequent round, they probably can raise more money than the entire app ecosystem that's built on top of it. And if that's the case, they can expand beyond everything built on top of it. It's like imagine like a star that's just kind of expanding, so there could be a systemic. There could be a, a systemic situation where the soda models can raise so much money that they can out pay anybody that bills on top of ‘em, which would be something I don't think we've ever seen before just because we were so bottlenecked in engineering, and this is a very open question.[00:10:41] swyx: Yeah. It's, it is almost like bitter lesson applied to the startup industry.[00:10:45] Martin Casado: Yeah, a hundred percent. It literally becomes an issue of like raise capital, turn that directly into growth. Use that to raise three times more. Exactly. And if you can keep doing that, you literally can outspend any company that's built the, not any company.[00:10:57] You can outspend the aggregate of companies on top of [00:11:00] you and therefore you'll necessarily take their share, which is crazy.[00:11:02] swyx: Would you say that kind of happens in character? Is that the, the sort of postmortem on. What happened?[00:11:10] Sarah Wang: Um,[00:11:10] Martin Casado: no.[00:11:12] Sarah Wang: Yeah, because I think so,[00:11:13] swyx: I mean the actual postmortem is, he wanted to go back to Google.[00:11:15] Exactly. But like[00:11:18] Martin Casado: that's another difference that[00:11:19] Sarah Wang: you said[00:11:21] Martin Casado: it. We should talk, we should actually talk about that.[00:11:22] swyx: Yeah,[00:11:22] Sarah Wang: that's[00:11:23] swyx: Go for it. Take it. Take,[00:11:23] Sarah Wang: yeah.[00:11:24] Character.AI, Founder Goals (AGI vs Product), and GPU Allocation Tradeoffs[00:11:24] Sarah Wang: I was gonna say, I think, um. The, the, the character thing raises actually a different issue, which actually the Frontier Labs will face as well. So we'll see how they handle it.[00:11:34] But, um, so we invest in character in January, 2023, which feels like eons ago, I mean, three years ago. Feels like lifetimes ago. But, um, and then they, uh, did the IP licensing deal with Google in August, 2020. Uh, four. And so, um, you know, at the time, no, you know, he's talked publicly about this, right? He wanted to Google wouldn't let him put out products in the world.[00:11:56] That's obviously changed drastically. But, um, he went to go do [00:12:00] that. Um, but he had a product attached. The goal was, I mean, it's Nome Shair, he wanted to get to a GI. That was always his personal goal. But, you know, I think through collecting data, right, and this sort of very human use case, that the character product.[00:12:13] Originally was and still is, um, was one of the vehicles to do that. Um, I think the real reason that, you know. I if you think about the, the stress that any company feels before, um, you ultimately going one way or the other is sort of this a GI versus product. Um, and I think a lot of the big, I think, you know, opening eyes, feeling that, um, anthropic if they haven't started, you know, felt it, certainly given the success of their products, they may start to feel that soon.[00:12:39] And the real. I think there's real trade-offs, right? It's like how many, when you think about GPUs, that's a limited resource. Where do you allocate the GPUs? Is it toward the product? Is it toward new re research? Right? Is it, or long-term research, is it toward, um, n you know, near to midterm research? And so, um, in a case where you're resource constrained, um, [00:13:00] of course there's this fundraising game you can play, right?[00:13:01] But the fund, the market was very different back in 2023 too. Um. I think the best researchers in the world have this dilemma of, okay, I wanna go all in on a GI, but it's the product usage revenue flywheel that keeps the revenue in the house to power all the GPUs to get to a GI. And so it does make, um, you know, I think it sets up an interesting dilemma for any startup that has trouble raising up until that level, right?[00:13:27] And certainly if you don't have that progress, you can't continue this fly, you know, fundraising flywheel.[00:13:32] Martin Casado: I would say that because, ‘cause we're keeping track of all of the things that are different, right? Like, you know, venture growth and uh, app infra and one of the ones is definitely the personalities of the founders.[00:13:45] It's just very different this time I've been. Been doing this for a decade and I've been doing startups for 20 years. And so, um, I mean a lot of people start this to do a GI and we've never had like a unified North star that I recall in the same [00:14:00] way. Like people built companies to start companies in the past.[00:14:02] Like that was what it was. Like I would create an internet company, I would create infrastructure company, like it's kind of more engineering builders and this is kind of a different. You know, mentality. And some companies have harnessed that incredibly well because their direction is so obviously on the path to what somebody would consider a GI, but others have not.[00:14:20] And so like there is always this tension with personnel. And so I think we're seeing more kind of founder movement.[00:14:27] Sarah Wang: Yeah.[00:14:27] Martin Casado: You know, as a fraction of founders than we've ever seen. I mean, maybe since like, I don't know the time of like Shockly and the trade DUR aid or something like that. Way back in the beginning of the industry, I, it's a very, very.[00:14:38] Unusual time of personnel.[00:14:39] Sarah Wang: Totally.[00:14:40] Talent Wars, Mega-Comp, and the Rise of Acquihire M&A[00:14:40] Sarah Wang: And it, I think it's exacerbated by the fact that talent wars, I mean, every industry has talent wars, but not at this magnitude, right? No. Yeah. Very rarely can you see someone get poached for $5 billion. That's hard to compete with. And then secondly, if you're a founder in ai, you could fart and it would be on the front page of, you know, the information these days.[00:14:59] And so there's [00:15:00] sort of this fishbowl effect that I think adds to the deep anxiety that, that these AI founders are feeling.[00:15:06] Martin Casado: Hmm.[00:15:06] swyx: Uh, yes. I mean, just on, uh, briefly comment on the founder, uh, the sort of. Talent wars thing. I feel like 2025 was just like a blip. Like I, I don't know if we'll see that again.[00:15:17] ‘cause meta built the team. Like, I don't know if, I think, I think they're kind of done and like, who's gonna pay more than meta? I, I don't know.[00:15:23] Martin Casado: I, I agree. So it feels so, it feel, it feels this way to me too. It's like, it is like, basically Zuckerberg kind of came out swinging and then now he's kind of back to building.[00:15:30] Yeah,[00:15:31] swyx: yeah. You know, you gotta like pay up to like assemble team to rush the job, whatever. But then now, now you like you, you made your choices and now they got a ship.[00:15:38] Martin Casado: I mean, the, the o other side of that is like, you know, like we're, we're actually in the job hiring market. We've got 600 people here. I hire all the time.[00:15:44] I've got three open recs if anybody's interested, that's listening to this for investor. Yeah, on, on the team, like on the investing side of the team, like, and, um, a lot of the people we talk to have acting, you know, active, um, offers for 10 million a year or something like that. And like, you know, and we pay really, [00:16:00] really well.[00:16:00] And just to see what's out on the market is really, is really remarkable. And so I would just say it's actually, so you're right, like the really flashy one, like I will get someone for, you know, a billion dollars, but like the inflated, um, uh, trickles down. Yeah, it is still very active today. I mean,[00:16:18] Sarah Wang: yeah, you could be an L five and get an offer in the tens of millions.[00:16:22] Okay. Yeah. Easily. Yeah. It's so I think you're right that it felt like a blip. I hope you're right. Um, but I think it's been, the steady state is now, I think got pulled up. Yeah. Yeah. I'll pull up for[00:16:31] Martin Casado: sure. Yeah.[00:16:32] Alessio: Yeah. And I think that's breaking the early stage founder math too. I think before a lot of people would be like, well, maybe I should just go be a founder instead of like getting paid.[00:16:39] Yeah. 800 KA million at Google. But if I'm getting paid. Five, 6 million. That's different but[00:16:45] Martin Casado: on. But on the other hand, there's more strategic money than we've ever seen historically, right? Mm-hmm. And so, yep. The economics, the, the, the, the calculus on the economics is very different in a number of ways. And, uh, it's crazy.[00:16:58] It's cra it's causing like a, [00:17:00] a, a, a ton of change in confusion in the market. Some very positive, sub negative, like, so for example, the other side of the, um. The co-founder, like, um, acquisition, you know, mark Zuckerberg poaching someone for a lot of money is like, we were actually seeing historic amount of m and a for basically acquihires, right?[00:17:20] That you like, you know, really good outcomes from a venture perspective that are effective acquihires, right? So I would say it's probably net positive from the investment standpoint, even though it seems from the headlines to be very disruptive in a negative way.[00:17:33] Alessio: Yeah.[00:17:33] What's Underfunded: Boring Software, Robotics Skepticism, and Custom Silicon Economics[00:17:33] Alessio: Um, let's talk maybe about what's not being invested in, like maybe some interesting ideas that you would see more people build or it, it seems in a way, you know, as ycs getting more popular, it's like access getting more popular.[00:17:47] There's a startup school path that a lot of founders take and they know what's hot in the VC circles and they know what gets funded. Uh, and there's maybe not as much risk appetite for. Things outside of that. Um, I'm curious if you feel [00:18:00] like that's true and what are maybe, uh, some of the areas, uh, that you think are under discussed?[00:18:06] Martin Casado: I mean, I actually think that we've taken our eye off the ball in a lot of like, just traditional, you know, software companies. Um, so like, I mean. You know, I think right now there's almost a barbell, like you're like the hot thing on X, you're deep tech.[00:18:21] swyx: Mm-hmm.[00:18:22] Martin Casado: Right. But I, you know, I feel like there's just kind of a long, you know, list of like good.[00:18:28] Good companies that will be around for a long time in very large markets. Say you're building a database, you know, say you're building, um, you know, kind of monitoring or logging or tooling or whatever. There's some good companies out there right now, but like, they have a really hard time getting, um, the attention of investors.[00:18:43] And it's almost become a meme, right? Which is like, if you're not basically growing from zero to a hundred in a year, you're not interesting, which is just, is the silliest thing to say. I mean, think of yourself as like an introvert person, like, like your personal money, right? Mm-hmm. So. Your personal money, will you put it in the stock market at 7% or you put it in this company growing five x in a very large [00:19:00] market?[00:19:00] Of course you can put it in the company five x. So it's just like we say these stupid things, like if you're not going from zero to a hundred, but like those, like who knows what the margins of those are mean. Clearly these are good investments. True for anybody, right? True. Like our LPs want whatever.[00:19:12] Three x net over, you know, the life cycle of a fund, right? So a, a company in a big market growing five X is a great investment. We'd, everybody would be happy with these returns, but we've got this kind of mania on these, these strong growths. And so I would say that that's probably the most underinvested sector.[00:19:28] Right now.[00:19:29] swyx: Boring software, boring enterprise software.[00:19:31] Martin Casado: Traditional. Really good company.[00:19:33] swyx: No, no AI here.[00:19:34] Martin Casado: No. Like boring. Well, well, the AI of course is pulling them into use cases. Yeah, but that's not what they're, they're not on the token path, right? Yeah. Let's just say that like they're software, but they're not on the token path.[00:19:41] Like these are like they're great investments from any definition except for like random VC on Twitter saying VC on x, saying like, it's not growing fast enough. What do you[00:19:52] Sarah Wang: think? Yeah, maybe I'll answer a slightly different. Question, but adjacent to what you asked, um, which is maybe an area that we're not, uh, investing [00:20:00] right now that I think is a question and we're spending a lot of time in regardless of whether we pull the trigger or not.[00:20:05] Um, and it would probably be on the hardware side, actually. Robotics, right? And the robotics side. Robotics. Right. Which is, it's, I don't wanna say that it's not getting funding ‘cause it's clearly, uh, it's, it's sort of non-consensus to almost not invest in robotics at this point. But, um, we spent a lot of time in that space and I think for us, we just haven't seen the chat GPT moment.[00:20:22] Happen on the hardware side. Um, and the funding going into it feels like it's already. Taking that for granted.[00:20:30] Martin Casado: Yeah. Yeah. But we also went through the drone, you know, um, there's a zip line right, right out there. What's that? Oh yeah, there's a zip line. Yeah. What the drone, what the av And like one of the takeaways is when it comes to hardware, um, most companies will end up verticalizing.[00:20:46] Like if you're. If you're investing in a robot company for an A for agriculture, you're investing in an ag company. ‘cause that's the competition and that's surprising. And that's supply chain. And if you're doing it for mining, that's mining. And so the ad team does a lot of that type of stuff ‘cause they actually set up to [00:21:00] diligence that type of work.[00:21:01] But for like horizontal technology investing, there's very little when it comes to robots just because it's so fit for, for purpose. And so we kinda like to look at software. Solutions or horizontal solutions like applied intuition. Clearly from the AV wave deep map, clearly from the AV wave, I would say scale AI was actually a horizontal one for That's fair, you know, for robotics early on.[00:21:23] And so that sort of thing we're very, very interested. But the actual like robot interacting with the world is probably better for different team. Agree.[00:21:30] Alessio: Yeah, I'm curious who these teams are supposed to be that invest in them. I feel like everybody's like, yeah, robotics, it's important and like people should invest in it.[00:21:38] But then when you look at like the numbers, like the capital requirements early on versus like the moment of, okay, this is actually gonna work. Let's keep investing. That seems really hard to predict in a way that is not,[00:21:49] Martin Casado: I think co, CO two, kla, gc, I mean these are all invested in in Harvard companies. He just, you know, and [00:22:00] listen, I mean, it could work this time for sure.[00:22:01] Right? I mean if Elon's doing it, he's like, right. Just, just the fact that Elon's doing it means that there's gonna be a lot of capital and a lot of attempts for a long period of time. So that alone maybe suggests that we should just be investing in robotics just ‘cause you have this North star who's Elon with a humanoid and that's gonna like basically willing into being an industry.[00:22:17] Um, but we've just historically found like. We're a huge believer that this is gonna happen. We just don't feel like we're in a good position to diligence these things. ‘cause again, robotics companies tend to be vertical. You really have to understand the market they're being sold into. Like that's like that competitive equilibrium with a human being is what's important.[00:22:34] It's not like the core tech and like we're kind of more horizontal core tech type investors. And this is Sarah and I. Yeah, the ad team is different. They can actually do these types of things.[00:22:42] swyx: Uh, just to clarify, AD stands for[00:22:44] Martin Casado: American Dynamism.[00:22:45] swyx: Alright. Okay. Yeah, yeah, yeah. Uh, I actually, I do have a related question that, first of all, I wanna acknowledge also just on the, on the chip side.[00:22:51] Yeah. I, I recall a podcast that where you were on, i, I, I think it was the a CC podcast, uh, about two or three years ago where you, where you suddenly said [00:23:00] something, which really stuck in my head about how at some point, at some point kind of scale it makes sense to. Build a custom aic Yes. For per run.[00:23:07] Martin Casado: Yes.[00:23:07] It's crazy. Yeah.[00:23:09] swyx: We're here and I think you, you estimated 500 billion, uh, something.[00:23:12] Martin Casado: No, no, no. A billion, a billion dollar training run of $1 billion training run. It makes sense to actually do a custom meic if you can do it in time. The question now is timelines. Yeah, but not money because just, just, just rough math.[00:23:22] If it's a billion dollar training. Then the inference for that model has to be over a billion, otherwise it won't be solvent. So let's assume it's, if you could save 20%, which you could save much more than that with an ASIC 20%, that's $200 million. You can tape out a chip for $200 million. Right? So now you can literally like justify economically, not timeline wise.[00:23:41] That's a different issue. An ASIC per model, which[00:23:44] swyx: is because that, that's how much we leave on the table every single time. We, we, we do like generic Nvidia.[00:23:48] Martin Casado: Exactly. Exactly. No, it, it is actually much more than that. You could probably get, you know, a factor of two, which would be 500 million.[00:23:54] swyx: Typical MFU would be like 50.[00:23:55] Yeah, yeah. And that's good.[00:23:57] Martin Casado: Exactly. Yeah. Hundred[00:23:57] swyx: percent. Um, so, so, yeah, and I mean, and I [00:24:00] just wanna acknowledge like, here we are in, in, in 2025 and opening eyes confirming like Broadcom and all the other like custom silicon deals, which is incredible. I, I think that, uh, you know, speaking about ad there's, there's a really like interesting tie in that obviously you guys are hit on, which is like these sort, this sort of like America first movement or like sort of re industrialized here.[00:24:17] Yeah. Uh, move TSMC here, if that's possible. Um, how much overlap is there from ad[00:24:23] Martin Casado: Yeah.[00:24:23] swyx: To, I guess, growth and, uh, investing in particularly like, you know, US AI companies that are strongly bounded by their compute.[00:24:32] Martin Casado: Yeah. Yeah. So I mean, I, I would view, I would view AD as more as a market segmentation than like a mission, right?[00:24:37] So the market segmentation is, it has kind of regulatory compliance issues or government, you know, sale or it deals with like hardware. I mean, they're just set up to, to, to, to, to. To diligence those types of companies. So it's a more of a market segmentation thing. I would say the entire firm. You know, which has been since it is been intercepted, you know, has geographical biases, right?[00:24:58] I mean, for the longest time we're like, you [00:25:00] know, bay Area is gonna be like, great, where the majority of the dollars go. Yeah. And, and listen, there, there's actually a lot of compounding effects for having a geographic bias. Right. You know, everybody's in the same place. You've got an ecosystem, you're there, you've got presence, you've got a network.[00:25:12] Um, and, uh, I mean, I would say the Bay area's very much back. You know, like I, I remember during pre COVID, like it was like almost Crypto had kind of. Pulled startups away. Miami from the Bay Area. Miami, yeah. Yeah. New York was, you know, because it's so close to finance, came up like Los Angeles had a moment ‘cause it was so close to consumer, but now it's kind of come back here.[00:25:29] And so I would say, you know, we tend to be very Bay area focused historically, even though of course we've asked all over the world. And then I would say like, if you take the ring out, you know, one more, it's gonna be the US of course, because we know it very well. And then one more is gonna be getting us and its allies and Yeah.[00:25:44] And it goes from there.[00:25:45] Sarah Wang: Yeah,[00:25:45] Martin Casado: sorry.[00:25:46] Sarah Wang: No, no. I agree. I think from a, but I think from the intern that that's sort of like where the companies are headquartered. Maybe your questions on supply chain and customer base. Uh, I, I would say our customers are, are, our companies are fairly international from that perspective.[00:25:59] Like they're selling [00:26:00] globally, right? They have global supply chains in some cases.[00:26:03] Martin Casado: I would say also the stickiness is very different.[00:26:05] Sarah Wang: Yeah.[00:26:05] Martin Casado: Historically between venture and growth, like there's so much company building in venture, so much so like hiring the next PM. Introducing the customer, like all of that stuff.[00:26:15] Like of course we're just gonna be stronger where we have our network and we've been doing business for 20 years. I've been in the Bay Area for 25 years, so clearly I'm just more effective here than I would be somewhere else. Um, where I think, I think for some of the later stage rounds, the companies don't need that much help.[00:26:30] They're already kind of pretty mature historically, so like they can kind of be everywhere. So there's kind of less of that stickiness. This is different in the AI time. I mean, Sarah is now the, uh, chief of staff of like half the AI companies in, uh, in the Bay Area right now. She's like, ops Ninja Biz, Devrel, BizOps.[00:26:48] swyx: Are, are you, are you finding much AI automation in your work? Like what, what is your stack.[00:26:53] Sarah Wang: Oh my, in my personal stack.[00:26:54] swyx: I mean, because like, uh, by the way, it's the, the, the reason for this is it is triggering, uh, yeah. We, like, I'm hiring [00:27:00] ops, ops people. Um, a lot of ponders I know are also hiring ops people and I'm just, you know, it's opportunity Since you're, you're also like basically helping out with ops with a lot of companies.[00:27:09] What are people doing these days? Because it's still very manual as far as I can tell.[00:27:13] Sarah Wang: Hmm. Yeah. I think the things that we help with are pretty network based, um, in that. It's sort of like, Hey, how do do I shortcut this process? Well, let's connect you to the right person. So there's not quite an AI workflow for that.[00:27:26] I will say as a growth investor, Claude Cowork is pretty interesting. Yeah. Like for the first time, you can actually get one shot data analysis. Right. Which, you know, if you're gonna do a customer database, analyze a cohort retention, right? That's just stuff that you had to do by hand before. And our team, the other, it was like midnight and the three of us were playing with Claude Cowork.[00:27:47] We gave it a raw file. Boom. Perfectly accurate. We checked the numbers. It was amazing. That was my like, aha moment. That sounds so boring. But you know, that's, that's the kind of thing that a growth investor is like, [00:28:00] you know, slaving away on late at night. Um, done in a few seconds.[00:28:03] swyx: Yeah. You gotta wonder what the whole, like, philanthropic labs, which is like their new sort of products studio.[00:28:10] Yeah. What would that be worth as an independent, uh, startup? You know, like a[00:28:14] Martin Casado: lot.[00:28:14] Sarah Wang: Yeah, true.[00:28:16] swyx: Yeah. You[00:28:16] Martin Casado: gotta hand it to them. They've been executing incredibly well.[00:28:19] swyx: Yeah. I, I mean, to me, like, you know, philanthropic, like building on cloud code, I think, uh, it makes sense to me the, the real. Um, pedal to the metal, whatever the, the, the phrase is, is when they start coming after consumer with, uh, against OpenAI and like that is like red alert at Open ai.[00:28:35] Oh, I[00:28:35] Martin Casado: think they've been pretty clear. They're enterprise focused.[00:28:37] swyx: They have been, but like they've been free. Here's[00:28:40] Martin Casado: care publicly,[00:28:40] swyx: it's enterprise focused. It's coding. Right. Yeah.[00:28:43] AI Labs vs Startups: Disruption, Undercutting & the Innovator's Dilemma[00:28:43] swyx: And then, and, but here's cloud, cloud, cowork, and, and here's like, well, we, uh, they, apparently they're running Instagram ads for Claudia.[00:28:50] I, on, you know, for, for people on, I get them all the time. Right. And so, like,[00:28:54] Martin Casado: uh,[00:28:54] swyx: it, it's kind of like this, the disruption thing of, uh, you know. Mo Open has been doing, [00:29:00] consumer been doing the, just pursuing general intelligence in every mo modality, and here's a topic that only focus on this thing, but now they're sort of undercutting and doing the whole innovator's dilemma thing on like everything else.[00:29:11] Martin Casado: It's very[00:29:11] swyx: interesting.[00:29:12] Martin Casado: Yeah, I mean there's, there's a very open que so for me there's like, do you know that meme where there's like the guy in the path and there's like a path this way? There's a path this way. Like one which way Western man. Yeah. Yeah.[00:29:23] Two Futures for AI: Infinite Market vs AGI Oligopoly[00:29:23] Martin Casado: And for me, like, like all the entire industry kind of like hinges on like two potential futures.[00:29:29] So in, in one potential future, um, the market is infinitely large. There's perverse economies of scale. ‘cause as soon as you put a model out there, like it kind of sublimates and all the other models catch up and like, it's just like software's being rewritten and fractured all over the place and there's tons of upside and it just grows.[00:29:48] And then there's another path which is like, well. Maybe these models actually generalize really well, and all you have to do is train them with three times more money. That's all you have to [00:30:00] do, and it'll just consume everything beyond it. And if that's the case, like you end up with basically an oligopoly for everything, like, you know mm-hmm.[00:30:06] Because they're perfectly general and like, so this would be like the, the a GI path would be like, these are perfectly general. They can do everything. And this one is like, this is actually normal software. The universe is complicated. You've got, and nobody knows the answer.[00:30:18] The Economics Reality Check: Gross Margins, Training Costs & Borrowing Against the Future[00:30:18] Martin Casado: My belief is if you actually look at the numbers of these companies, so generally if you look at the numbers of these companies, if you look at like the amount they're making and how much they, they spent training the last model, they're gross margin positive.[00:30:30] You're like, oh, that's really working. But if you look at like. The current training that they're doing for the next model, their gross margin negative. So part of me thinks that a lot of ‘em are kind of borrowing against the future and that's gonna have to slow down. It's gonna catch up to them at some point in time, but we don't really know.[00:30:47] Sarah Wang: Yeah.[00:30:47] Martin Casado: Does that make sense? Like, I mean, it could be, it could be the case that the only reason this is working is ‘cause they can raise that next round and they can train that next model. ‘cause these models have such a short. Life. And so at some point in time, like, you know, they won't be able to [00:31:00] raise that next round for the next model and then things will kind of converge and fragment again.[00:31:03] But right now it's not.[00:31:04] Sarah Wang: Totally. I think the other, by the way, just, um, a meta point. I think the other lesson from the last three years is, and we talk about this all the time ‘cause we're on this. Twitter X bubble. Um, cool. But, you know, if you go back to, let's say March, 2024, that period, it felt like a, I think an open source model with an, like a, you know, benchmark leading capability was sort of launching on a daily basis at that point.[00:31:27] And, um, and so that, you know, that's one period. Suddenly it's sort of like open source takes over the world. There's gonna be a plethora. It's not an oligopoly, you know, if you fast, you know, if you, if you rewind time even before that GPT-4 was number one for. Nine months, 10 months. It's a long time. Right.[00:31:44] Um, and of course now we're in this era where it feels like an oligopoly, um, maybe some very steady state shifts and, and you know, it could look like this in the future too, but it just, it's so hard to call. And I think the thing that keeps, you know, us up at [00:32:00] night in, in a good way and bad way, is that the capability progress is actually not slowing down.[00:32:06] And so until that happens, right, like you don't know what's gonna look like.[00:32:09] Martin Casado: But I, I would, I would say for sure it's not converged, like for sure, like the systemic capital flows have not converged, meaning right now it's still borrowing against the future to subsidize growth currently, which you can do that for a period of time.[00:32:23] But, but you know, at the end, at some point the market will rationalize that and just nobody knows what that will look like.[00:32:29] Alessio: Yeah.[00:32:29] Martin Casado: Or, or like the drop in price of compute will, will, will save them. Who knows?[00:32:34] Alessio: Yeah. Yeah. I think the models need to ask them to, to specific tasks. You know? It's like, okay, now Opus 4.5 might be a GI at some specific task, and now you can like depreciate the model over a longer time.[00:32:45] I think now, now, right now there's like no old model.[00:32:47] Martin Casado: No, but let, but lemme just change that mental, that's, that used to be my mental model. Lemme just change it a little bit.[00:32:53] Capital as a Weapon vs Task Saturation: Where Real Enterprise Value Gets Built[00:32:53] Martin Casado: If you can raise three times, if you can raise more than the aggregate of anybody that uses your models, that doesn't even matter.[00:32:59] It doesn't [00:33:00] even matter. See what I'm saying? Like, yeah. Yeah. So, so I have an API Business. My API business is 60% margin, or 70% margin, or 80% margin is a high margin business. So I know what everybody is using. If I can raise more money than the aggregate of everybody that's using it, I will consume them whether I'm a GI or not.[00:33:14] And I will know if they're using it ‘cause they're using it. And like, unlike in the past where engineering stops me from doing that.[00:33:21] Alessio: Mm-hmm.[00:33:21] Martin Casado: It is very straightforward. You just train. So I also thought it was kind of like, you must ask the code a GI, general, general, general. But I think there's also just a possibility that the, that the capital markets will just give them the, the, the ammunition to just go after everybody on top of ‘em.[00:33:36] Sarah Wang: I, I do wonder though, to your point, um, if there's a certain task that. Getting marginally better isn't actually that much better. Like we've asked them to it, to, you know, we can call it a GI or whatever, you know, actually, Ali Goi talks about this, like we're already at a GI for a lot of functions in the enterprise.[00:33:50] Um. That's probably those for those tasks, you probably could build very specific companies that focus on just getting as much value out of that task that isn't [00:34:00] coming from the model itself. There's probably a rich enterprise business to be built there. I mean, could be wrong on that, but there's a lot of interesting examples.[00:34:08] So, right, if you're looking the legal profession or, or whatnot, and maybe that's not a great one ‘cause the models are getting better on that front too, but just something where it's a bit saturated, then the value comes from. Services. It comes from implementation, right? It comes from all these things that actually make it useful to the end customer.[00:34:24] Martin Casado: Sorry, what am I, one more thing I think is, is underused in all of this is like, to what extent every task is a GI complete.[00:34:31] Sarah Wang: Mm-hmm.[00:34:32] Martin Casado: Yeah. I code every day. It's so fun.[00:34:35] Sarah Wang: That's a core question. Yeah.[00:34:36] Martin Casado: And like. When I'm talking to these models, it's not just code. I mean, it's everything, right? Like I, you know, like it's,[00:34:43] swyx: it's healthcare.[00:34:44] It's,[00:34:44] Martin Casado: I mean, it's[00:34:44] swyx: Mele,[00:34:45] Martin Casado: but it's every, it is exactly that. Like, yeah, that's[00:34:47] Sarah Wang: great support. Yeah.[00:34:48] Martin Casado: It's everything. Like I'm asking these models to, yeah, to understand compliance. I'm asking these models to go search the web. I'm asking these models to talk about things I know in the history, like it's having a full conversation with me while I, I engineer, and so it could be [00:35:00] the case that like, mm-hmm.[00:35:01] The most a, you know, a GI complete, like I'm not an a GI guy. Like I think that's, you know, but like the most a GI complete model will is win independent of the task. And we don't know the answer to that one either.[00:35:11] swyx: Yeah.[00:35:12] Martin Casado: But it seems to me that like, listen, codex in my experience is for sure better than Opus 4.5 for coding.[00:35:18] Like it finds the hardest bugs that I work in with. Like, it is, you know. The smartest developers. I don't work on it. It's great. Um, but I think Opus 4.5 is actually very, it's got a great bedside manner and it really, and it, it really matters if you're building something very complex because like, it really, you know, like you're, you're, you're a partner and a brainstorming partner for somebody.[00:35:38] And I think we don't discuss enough how every task kind of has that quality.[00:35:42] swyx: Mm-hmm.[00:35:43] Martin Casado: And what does that mean to like capital investment and like frontier models and Submodels? Yeah.[00:35:47] Why “Coding Models” Keep Collapsing into Generalists (Reasoning vs Taste)[00:35:47] Martin Casado: Like what happened to all the special coding models? Like, none of ‘em worked right. So[00:35:51] Alessio: some of them, they didn't even get released.[00:35:53] Magical[00:35:54] Martin Casado: Devrel. There's a whole, there's a whole host. We saw a bunch of them and like there's this whole theory that like, there could be, and [00:36:00] I think one of the conclusions is, is like there's no such thing as a coding model,[00:36:04] Alessio: you know?[00:36:04] Martin Casado: Like, that's not a thing. Like you're talking to another human being and it's, it's good at coding, but like it's gotta be good at everything.[00:36:10] swyx: Uh, minor disagree only because I, I'm pretty like, have pretty high confidence that basically open eye will always release a GPT five and a GT five codex. Like that's the code's. Yeah. The way I call it is one for raisin, one for Tiz. Um, and, and then like someone internal open, it was like, yeah, that's a good way to frame it.[00:36:32] Martin Casado: That's so funny.[00:36:33] swyx: Uh, but maybe it, maybe it collapses down to reason and that's it. It's not like a hundred dimensions doesn't life. Yeah. It's two dimensions. Yeah, yeah, yeah, yeah. Like and exactly. Beside manner versus coding. Yeah.[00:36:43] Martin Casado: Yeah.[00:36:44] swyx: It's, yeah.[00:36:46] Martin Casado: I, I think for, for any, it's hilarious. For any, for anybody listening to this for, for, for, I mean, for you, like when, when you're like coding or using these models for something like that.[00:36:52] Like actually just like be aware of how much of the interaction has nothing to do with coding and it just turns out to be a large portion of it. And so like, you're, I [00:37:00] think like, like the best Soto ish model. You know, it is going to remain very important no matter what the task is.[00:37:06] swyx: Yeah.[00:37:07] What He's Actually Coding: Gaussian Splats, Spark.js & 3D Scene Rendering Demos[00:37:07] swyx: Uh, speaking of coding, uh, I, I'm gonna be cheeky and ask like, what actually are you coding?[00:37:11] Because obviously you, you could code anything and you are obviously a busy investor and a manager of the good. Giant team. Um, what are you calling?[00:37:18] Martin Casado: I help, um, uh, FEFA at World Labs. Uh, it's one of the investments and um, and they're building a foundation model that creates 3D scenes.[00:37:27] swyx: Yeah, we had it on the pod.[00:37:28] Yeah. Yeah,[00:37:28] Martin Casado: yeah. And so these 3D scenes are Gaussian splats, just by the way that kind of AI works. And so like, you can reconstruct a scene better with, with, with radiance feels than with meshes. ‘cause like they don't really have topology. So, so they, they, they produce each. Beautiful, you know, 3D rendered scenes that are Gaussian splats, but the actual industry support for Gaussian splats isn't great.[00:37:50] It's just never, you know, it's always been meshes and like, things like unreal use meshes. And so I work on a open source library called Spark js, which is a. Uh, [00:38:00] a JavaScript rendering layer ready for Gaussian splats. And it's just because, you know, um, you, you, you need that support and, and right now there's kind of a three js moment that's all meshes and so like, it's become kind of the default in three Js ecosystem.[00:38:13] As part of that to kind of exercise the library, I just build a whole bunch of cool demos. So if you see me on X, you see like all my demos and all the world building, but all of that is just to exercise this, this library that I work on. ‘cause it's actually a very tough algorithmics problem to actually scale a library that much.[00:38:29] And just so you know, this is ancient history now, but 30 years ago I paid for undergrad, you know, working on game engines in college in the late nineties. So I've got actually a back and it's very old background, but I actually have a background in this and so a lot of it's fun. You know, but, but the, the, the, the whole goal is just for this rendering library to, to,[00:38:47] Sarah Wang: are you one of the most active contributors?[00:38:49] The, their GitHub[00:38:50] Martin Casado: spark? Yes.[00:38:51] Sarah Wang: Yeah, yeah.[00:38:51] Martin Casado: There's only two of us there, so, yes. No, so by the way, so the, the pri The pri, yeah. Yeah. So the primary developer is a [00:39:00] guy named Andres Quist, who's an absolute genius. He and I did our, our PhDs together. And so like, um, we studied for constant Quas together. It was almost like hanging out with an old friend, you know?[00:39:09] And so like. So he, he's the core, core guy. I did mostly kind of, you know, the side I run venture fund.[00:39:14] swyx: It's amazing. Like five years ago you would not have done any of this. And it brought you back[00:39:19] Martin Casado: the act, the Activ energy, you're still back. Energy was so high because you had to learn all the framework b******t.[00:39:23] Man, I f*****g used to hate that. And so like, now I don't have to deal with that. I can like focus on the algorithmics so I can focus on the scaling and I,[00:39:29] swyx: yeah. Yeah.[00:39:29] LLMs vs Spatial Intelligence + How to Value World Labs' 3D Foundation Model[00:39:29] swyx: And then, uh, I'll observe one irony and then I'll ask a serious investor question, uh, which is like, the irony is FFE actually doesn't believe that LMS can lead us to spatial intelligence.[00:39:37] And here you are using LMS to like help like achieve spatial intelligence. I just see, I see some like disconnect in there.[00:39:45] Martin Casado: Yeah. Yeah. So I think, I think, you know, I think, I think what she would say is LLMs are great to help with coding.[00:39:51] swyx: Yes.[00:39:51] Martin Casado: But like, that's very different than a model that actually like provides, they, they'll never have the[00:39:56] swyx: spatial inte[00:39:56] Martin Casado: issues.[00:39:56] And listen, our brains clearly listen, our brains, brains clearly have [00:40:00] both our, our brains clearly have a language reasoning section and they clearly have a spatial reasoning section. I mean, it's just, you know, these are two pretty independent problems.[00:40:07] swyx: Okay. And you, you, like, I, I would say that the, the one data point I recently had, uh, against it is the DeepMind, uh, IMO Gold, where, so, uh, typically the, the typical answer is that this is where you start going down the neuros symbolic path, right?[00:40:21] Like one, uh, sort of very sort of abstract reasoning thing and one form, formal thing. Um, and that's what. DeepMind had in 2024 with alpha proof, alpha geometry, and now they just use deep think and just extended thinking tokens. And it's one model and it's, and it's in LM.[00:40:36] Martin Casado: Yeah, yeah, yeah, yeah, yeah.[00:40:37] swyx: And so that, that was my indication of like, maybe you don't need a separate system.[00:40:42] Martin Casado: Yeah. So, so let me step back. I mean, at the end of the day, at the end of the day, these things are like nodes in a graph with weights on them. Right. You know, like it can be modeled like if you, if you distill it down. But let me just talk about the two different substrates. Let's, let me put you in a dark room.[00:40:56] Like totally black room. And then let me just [00:41:00] describe how you exit it. Like to your left, there's a table like duck below this thing, right? I mean like the chances that you're gonna like not run into something are very low. Now let me like turn on the light and you actually see, and you can do distance and you know how far something away is and like where it is or whatever.[00:41:17] Then you can do it, right? Like language is not the right primitives to describe. The universe because it's not exact enough. So that's all Faye, Faye is talking about. When it comes to like spatial reasoning, it's like you actually have to know that this is three feet far, like that far away. It is curved.[00:41:37] You have to understand, you know, the, like the actual movement through space.[00:41:40] swyx: Yeah.[00:41:40] Martin Casado: So I do, I listen, I do think at the end of these models are definitely converging as far as models, but there's, there's, there's different representations of problems you're solving. One is language. Which, you know, that would be like describing to somebody like what to do.[00:41:51] And the other one is actually just showing them and the space reasoning is just showing them.[00:41:55] swyx: Yeah, yeah, yeah. Right. Got it, got it. Uh, the, in the investor question was on, on, well labs [00:42:00] is, well, like, how do I value something like this? What, what, what work does the, do you do? I'm just like, Fefe is awesome.[00:42:07] Justin's awesome. And you know, the other two co-founder, co-founders, but like the, the, the tech, everyone's building cool tech. But like, what's the value of the tech? And this is the fundamental question[00:42:16] Martin Casado: of, well, let, let, just like these, let me just maybe give you a rough sketch on the diffusion models. I actually love to hear Sarah because I'm a venture for, you know, so like, ventures always, always like kind of wild west type[00:42:24] swyx: stuff.[00:42:24] You, you, you, you paid a dream and she has to like, actually[00:42:28] Martin Casado: I'm gonna say I'm gonna mar to reality, so I'm gonna say the venture for you. And she can be like, okay, you a little kid. Yeah. So like, so, so these diffusion models literally. Create something for, for almost nothing. And something that the, the world has found to be very valuable in the past, in our real markets, right?[00:42:45] Like, like a 2D image. I mean, that's been an entire market. People value them. It takes a human being a long time to create it, right? I mean, to create a, you know, a, to turn me into a whatever, like an image would cost a hundred bucks in an hour. The inference cost [00:43:00] us a hundredth of a penny, right? So we've seen this with speech in very successful companies.[00:43:03] We've seen this with 2D image. We've seen this with movies. Right? Now, think about 3D scene. I mean, I mean, when's Grand Theft Auto coming out? It's been six, what? It's been 10 years. I mean, how, how like, but hasn't been 10 years.[00:43:14] Alessio: Yeah.[00:43:15] Martin Casado: How much would it cost to like, to reproduce this room in 3D? Right. If you, if you, if you hired somebody on fiber, like in, in any sort of quality, probably 4,000 to $10,000.[00:43:24] And then if you had a professional, probably $30,000. So if you could generate the exact same thing from a 2D image, and we know that these are used and they're using Unreal and they're using Blend, or they're using movies and they're using video games and they're using all. So if you could do that for.[00:43:36] You know, less than a dollar, that's four or five orders of magnitude cheaper. So you're bringing the marginal cost of something that's useful down by three orders of magnitude, which historically have created very large companies. So that would be like the venture kind of strategic dreaming map.[00:43:49] swyx: Yeah.[00:43:50] And, and for listeners, uh, you can do this yourself on your, on your own phone with like. Uh, the marble.[00:43:55] Martin Casado: Yeah. Marble.[00:43:55] swyx: Uh, or but also there's many Nerf apps where you just go on your iPhone and, and do this.[00:43:59] Martin Casado: Yeah. Yeah. [00:44:00] Yeah. And, and in the case of marble though, it would, what you do is you literally give it in.[00:44:03] So most Nerf apps you like kind of run around and take a whole bunch of pictures and then you kind of reconstruct it.[00:44:08] swyx: Yeah.[00:44:08] Martin Casado: Um, things like marble, just that the whole generative 3D space will just take a 2D image and it'll reconstruct all the like, like[00:44:16] swyx: meaning it has to fill in. Uh,[00:44:18] Martin Casado: stuff at the back of the table, under the table, the back, like, like the images, it doesn't see.[00:44:22] So the generator stuff is very different than reconstruction that it fills in the things that you can't see.[00:44:26] swyx: Yeah. Okay.[00:44:26] Sarah Wang: So,[00:44:27] Martin Casado: all right. So now the,[00:44:28] Sarah Wang: no, no. I mean I love that[00:44:29] Martin Casado: the adult[00:44:29] Sarah Wang: perspective. Um, well, no, I was gonna say these are very much a tag team. So we, we started this pod with that, um, premise. And I think this is a perfect question to even build on that further.[00:44:36] ‘cause it truly is, I mean, we're tag teaming all of these together.[00:44:39] Investing in Model Labs, Media Rumors, and the Cursor Playbook (Margins & Going Down-Stack)[00:44:39] Sarah Wang: Um, but I think every investment fundamentally starts with the same. Maybe the same two premises. One is, at this point in time, we actually believe that there are. And of one founders for their particular craft, and they have to be demonstrated in their prior careers, right?[00:44:56] So, uh, we're not investing in every, you know, now the term is NEO [00:45:00] lab, but every foundation model, uh, any, any company, any founder trying to build a foundation model, we're not, um, contrary to popular opinion, we're
The u-blox SAM-M8Q has been sitting on my bench for months. This little GPS module has a built-in antenna, coin cell backup, speaks both NMEA and UBX binary protocol over UART or I2C. So why isn't it in the shop already? Well, it's mostly cause of the 475-page interfacing datasheet documenting every command, struct, and config register. Hundreds of message types. I got partway through by hand with some Claude Code Sonnet assistance, but ran out of time - plus it was still tedious when babysitting Sonnet. However, now we're living in an Opus + Codex era! So I pointed my Raspberry Pi OpenClaw at it. https://github.com/adafruit/openclaw Here's the setup: Raspberry Pi 5 running OpenClaw, wired to a QT Py RP2040, which talks to the SAM-M8Q. Opus 4.6 reads the datasheet (converted to markdown first by Sonnet 4.6 with 1M context to minimize re-parsing that PDF every session) and builds the implementation plan. I review the plan to make sure it prioritizes the most common commands and reports, and flagged some unessential sections like automotive-assist or RTK-specific. Then Codex is assigned each message implementation task as a sub-agent and writes the actual C code for the Arduino library. Opus suggested using struct-based parsing rather than digging through each uint8_t array; we just memcpy the checksummed message raw bytes onto the matching struct and extract the typed bit fields. We've got four message types done so far. After each message is implemented, Codex also writes a test sketch that will exercise / pretty-print the results of each message, great for self-testing as well as regression testing later. Tonight I'm telling it to keep going while I sleep: code, parse, test against live satellite data, fix failures, commit and push on success, then move on to the next. To me this is a great usage of "agentic" firmware development: there's no creativity in transcribing 84 different structs from a 475-page datasheet. Once the LLMs are done, I can review the PRs as if it were an everyday contributor and even make revision suggestions. Visit the Adafruit shop online - http://www.adafruit.com ----------------------------------------- LIVE CHAT IS HERE! http://adafru.it/discord Subscribe to Adafruit on YouTube: http://adafru.it/subscribe New tutorials on the Adafruit Learning System: http://learn.adafruit.com/ ----------------------------------------- #openclaw #raspberrypi #adafruit
In an online meeting with the San Diego Ramana Satsang (ramana-satsang-sd@googlegroups.com) on 1st February 2026, Michael answers various questions about Bhagavan's teachings. This episode can be watched as a video on YouTube. A more compressed audio copy in Opus format can be downloaded from MediaFire. Advertisement-free videos on the original writings of Bhagavan Ramana with explanations by Michael James can be accessed on our Vimeo video channel. Books by Sri Sadhu Om and Michael James that are currently available on Amazon: By Sri Sadhu Om: ► The Path of Sri Ramana (English) By Michael James: ► Happiness and Art of Being (English) ► Lyckan och Varandets Konst (Swedish) ► Anma-Viddai (English) Above books are also available in other regional Amazon marketplaces worldwide. - Sri Ramana Center of Houston
I'm designing my real youth ministry program, with these 3 rules. 1 Order Doesn't Repeat And today I'm intentionally going to do it WRONG!! 2 I'm also making a different, from scratch, DYM Game for EACH week And I'm going to tell you how to get a FREE early copy of it 3 Finally, I'll evaluate how it all went! Join us! ACCESS TO FREE GAME & RECAP EPISODE https://www.patreon.com/posts/we-went-there-150912245?utm_medium=clipboard_copy&utm_source=copyLink&utm_campaign=postshare_creator&utm_content=join_link SHOW NOTES Shownotes & Transcripts https://www.hybridministry.xyz/189 ❄️ WINTER SOCIAL MEDIA PACK https://www.patreon.com/posts/winter-seasonal-144943791?utm_medium=clipboard_copy&utm_source=copyLink&utm_campaign=postshare_creator&utm_content=join_link HYBRID HERO MEMBERS GET IT FREE! https://www.patreon.com/hybridministry
www.marktreichel.comhttps://www.linkedin.com/in/mark-treichel/Episode Description: When your credit union receives an examination result you believe is wrong, what are your options? In this episode, Mark Treichel walks through NCUA's formal appeal process from start to finish — the timeline, the levels of review, what you can actually appeal, and how to think about whether it makes sense for your institution. Drawing on his experience as a former NCUA regional director and his work advising credit unions through the process, Mark breaks down the practical realities most credit union leaders don't fully understand until they're in the middle of it.Key Topics Covered:Why resolving issues at the examiner level is always the best first step — and why it doesn't always workThe fear of retaliation and when it's worth pushing back anywayWhat qualifies as a "material supervisory determination" under Part 746The critical phrase most credit unions overlook: "includes, but is not limited to"Why you can't formally appeal CAMEL components — but effectively can anywayDocument resolutions are negotiable: pushing back on language, dates, and requirementsThe full appeal timeline: regional director, Supervisory Review Committee, and NCUA BoardWhen oral hearings are available and when they're guaranteedWhy "tie goes to the runner" at every level — and what that means for your burden of proofDefining victory: full reversals, partial wins, and the value of being heardBuilding your administrative record for the long termKey Takeaways:You have 30 days from the final exam to appeal to the regional director — and some argue the clock starts when you see the draft.A full appeal through the NCUA Board can take eight months to a year.The Supervisory Review Committee must grant an oral hearing if you request one — the only level where that's guaranteed.Your odds of success generally improve as you move higher in the process.Even if you don't win the appeal, you're building an administrative record NCUA must consider in any future enforcement actions.Resources:NCUA Part 746, Subpart A (Appeals Regulation)Preamble to the final rule on appeals processCredit Union Exam Solutions: marktreichel.com Opus 4.5ExtendedClaude is AI and can make mistakes. Please double-check responses.ShareArtifactsDownload allNcua
In today's episode of the RattlerGator Report, JB White conducts a live read-through and reaction to a powerful essay by Matt Schumer detailing the rapid acceleration of artificial intelligence. Framing the moment as a “February 2020” style inflection point, JB walks through Schumer's firsthand account of GPT-5.3 Codex and Opus 4.6, highlighting AI systems that now write code, debug themselves, iterate independently, and even help build their own successors. The discussion centers on exponential growth, task-duration benchmarks, AI-assisted model training, and projections that superhuman capability across most cognitive work could arrive within just a few years. JB expands the conversation into geopolitics, energy infrastructure, Elon Musk's role in robotics and satellite systems, and what AI dominance could mean for military, economic, and cultural power. He also ties the technological shift to 2026–2028 political positioning, global alliances, and internal GOP battles, arguing that America's strategic advantage depends on recognizing the scale of change underway. The episode blends technological urgency, political forecasting, and philosophical reflection on generational responsibility in the face of accelerating transformation.
Nick McDaniel is joined by Jacked Jameson and Rosario Grillo for another packed episode of Tapped In covering the biggest stories in Georgia independent wrestling.We break down major headlines, title situations, heated rivalries, and what's building toward Spectacle. There's steel cage drama, momentum shifts, and plenty of behind-the-scenes perspective you won't get anywhere else.The Let's Get Kraken segment delivers more chaos, hard lessons, and unexpected twists, while our non-wrestling Mt. Rushmore debate turns into a surprisingly passionate discussion.We close things out with Making the Drives, previewing a loaded weekend across Georgia — championship matches, stipulations, and can't-miss shows.If you follow Georgia indie wrestling, this is your weekly pulse check.
Our 235th episode with a summary and discussion of last week's big AI news!Recorded on 01/02/2026Hosted by Andrey Kurenkov and Jeremie HarrisFeel free to email us your questions and feedback at contact@lastweekinai.com and/or hello@gladstone.aiRead out our text newsletter and comment on the podcast at https://lastweekin.ai/In this episode:* Major model launches include Anthropic's Opus 4.6 with a 1M-token context window and “agent teams,” OpenAI's GPT-5.3 Codex and faster Codex Spark via Cerebras, and Google's Gemini 3 Deep Think posting big jumps on ARC-AGI-2 and other STEM benchmarks amid criticism about missing safety documentation.* Generative media advances feature ByteDance's Seedance 2.0 text-to-video with high realism and broad prompting inputs, new image models Seedream 5.0 and Alibaba's Qwen Image 2.0, plus xAI's Grok Imagine API for text/image-to-video.* Open and competitive releases expand with Zhipu's GLM-5, DeepSeek's 1M-token context model, Cursor Composer 1.5, and open-weight Qwen3 Coder Next using hybrid attention aimed at efficient local/agentic coding.* Business updates include ElevenLabs raising $500M at an $11B valuation, Runway raising $315M at a $5.3B valuation, humanoid robotics firm Apptronik raising $935M at a $5.3B valuation, Waymo announcing readiness for high-volume production of its 6th-gen hardware, plus industry drama around Anthropic's Super Bowl ad and departures from xAI.Timestamps:(00:00:10) Intro / Banter(00:02:03) Sponsor Break(00:05:33) Response to listener commentsTools & Apps(00:07:27) Anthropic releases Opus 4.6 with new 'agent teams' | TechCrunch(00:11:28) OpenAI's new GPT-5.3-Codex is 25% faster and goes way beyond coding now - what's new | ZDNET(00:25:30) OpenAI launches new macOS app for agentic coding | TechCrunch(00:26:38) Google Unveils Gemini 3 Deep Think for Science & Engineering | The Tech Buzz(00:31:26) ByteDance's Seedance 2.0 Might be the Best AI Video Generator Yet - TechEBlog(00:35:14) China's ByteDance, Alibaba unveil AI image tools to rival Google's popular Nano Banana | South China Morning Post(00:36:54) DeepSeek boosts AI model with 10-fold token addition as Zhipu AI unveils GLM-5 | South China Morning Post(00:43:11) Cursor launches Composer 1.5 with upgrades for complex tasks(00:44:03) xAI launches Grok Imagine API for text and image to videoApplications & Business(00:45:47) Nvidia-backed AI voice startups ElevenLabs hits $11 billion valuation(00:52:04) AI video startup Runway raises $315M at $5.3B valuation, eyes more capable world models | TechCrunch(00:54:02) Humanoid robot startup Apptronik has now raised $935M at a $5B+ valuation | TechCrunch(00:57:10) Anthropic says 'Claude will remain ad-free,' unlike an unnamed rival | The Verge(01:00:18) Okay, now exactly half of xAI's founding team has left the company | TechCrunch(01:04:03) Waymo's next-gen robotaxi is ready for passengers — and also 'high-volume production' | The VergeProjects & Open Source(01:04:59) Qwen3-Coder-Next: Pushing Small Hybrid Models on Agentic Coding(01:08:38) OpenClaw's AI 'skill' extensions are a security nightmare | The VergeResearch & Advancements(01:10:40) Learning to Reason in 13 Parameters(01:16:01) Reinforcement World Model Learning for LLM-based Agents(01:20:00) Opus 4.6 on Vending-Bench – Not Just a Helpful AssistantPolicy & Safety(01:22:28) METR GPT-5.2(01:26:59) The Hot Mess of AI: How Does Misalignment Scale with Model Intelligence and Task Complexity?See Privacy Policy at https://art19.com/privacy and California Privacy Notice at https://art19.com/privacy#do-not-sell-my-info.
如果你喜歡我的內容,歡迎加入會員支持我,讓我更有動力繼續分享更多好內容!
This Week In Startups is made possible by:LinkedIn - http://linkedin.com/twistEvery.io - http://every.io/Northwest registered Agent - www.northwestregisteredagent.com/twistToday's show: Sean Liu gave his OpenClaw “eyes” by hooking the AI assistant up to his Meta Ray Bans. While Jason and Alex consider the implications… guest expert Alex Finn quickly loses interest…It's another All-Clawdbot TWiST where we debate the most essential uses for the viral agent platform. For their part, Alex Finn and Matt Van Horn are building practical skills to help founders (esp. solo founders)… Alex has a brain trust of AI agents working round the clock to improve his organization and content, while Matt designed a skill for his agent to order sushi lunches (keeping them under $20).PLUS find out how Producer Oliver designed his “Model Council” based on Perplexity's set-up, why running everything local is “illogical” for most people in 2026, AND why Jason's next Founder U cohort will all be building with OpenClaw.Timestamps: (00:00) Why Jason is STILL Claw-pilled! We can't get enough!(03:17) “Mr. Clawdbot” Alex Finn of Creator Buddy joins us! He's Claw-Pilled too!(04:50) Sean Liu (@_seanliu) added OpenClaw to his Meta Ray Bans… Here's why Alex doesn't care!(8:33) Lost your job to AI? Maybe AI can make you a founder!(12:35) LinkedIn Jobs - Hire right, the first time. Post your first job and get $100 off towards your job post at http://linkedin.com/twist Terms and conditions apply.(13:34) “Last 30 Days” skill creator Matt Van Horn (@mvanhorn) joins the show(14:54) Why Jason wants to invest in Matt's new project.(15:53) Meeting Alex's organization of AI agents who collaborate on his projects autonomously(22:28) Is Alex planning to turn his OpenClaw set-up into a company? Maybe…(23:10) Every.io - For all of your incorporation, banking, payroll, benefits, accounting, taxes or other back-office administration needs, visit http://every.io/(25:16) Using OpenClaw to create a “lights out” startup(28:17) Deciding when to use local models vs. more powerful cloud-based models(29:52) Northwest Registered Agent. Get more when you start your business with Northwest. In 10 clicks and 10 minutes, you can form your company and walk away with a real business identity — Learn more at www.northwestregisteredagent.com/twist(30:58) How Producer Oliver created a “Model Council” for LAUNCH based on Perplexity's set-up(35:52) Why it's so important to have an orchestrator that knows the right model for the right task(39:12) Why running everything local is “illogical” for most companies in 2026(41:15) Matt walks us through the latest and greatest incarnation of his “Last 30 Days” skill(45:57) Why Jason thinks recursive machine learning is the solution to human laziness(46:38) How Alex has supercharged his social media and his startup during OpenClaw Mania(50:15) Why Founder University is going ALL OPENCLAW next cohort(51:21) How big a jump over Opus 4.5 is Opus 4.6?(53:35) OpenClaw's going to start scanning skills for malware… Will this help?(57:13) No one was SUPER impressed by the AI dot com Super Bowl ad… except JCal(1:01:11) Matt used his “Order Sushi” OpenClaw skill live on the show to get lunch(1:04:08) Polymarket's sharps are wagering on Claude 5's hotly anticipated releaseThank you to our partners:(12:35) LinkedIn Jobs - Hire right, the first time. Post your first job and get $100 off towards your job post at http://linkedin.com/twist Terms and conditions apply.(23:10) Every.io - For all of your incorporation, banking, payroll, benefits, accounting, taxes or other back-office administration needs, visit http://every.io/(29:52) Northwest Registered Agent. Get more when you start your business with Northwest. In 10 clicks and 10 minutes, you can form your company and walk away with a real business identity — Learn more at www.northwestregisteredagent.com/twistCheck out all our partner offers: https://partners.launch.co/
SANS Internet Stormcenter Daily Network/Cyber Security and Information Security Stormcast
Microsoft Patches Four Azure Vulnerabilities (three critical) https://msrc.microsoft.com/update-guide/vulnerability Evaluating and mitigating the growing risk of LLM-discovered 0-days https://red.anthropic.com/2026/zero-days/ Gitlab AI Gateway Vulnerability CVE-2026-1868 https://about.gitlab.com/releases/2026/02/06/patch-release-gitlab-ai-gateway-18-8-1-released/