This Episode is Sponsored by StayFi: your ultimate tool for Vacation Rental WiFi marketing, allowing you to collect guest emails automatically via custom captive WiFi login splash pages. Drive repeat direct bookings and convert your OTA bookings to book direct for their next visit. Visit https://stayfi.com/vrsuccess/ and use code VRSUCCESS for 50% off 3 months of StayFi service.

Jodi Bourne is back for the latest installment of the new regular segment with Heather, built around a simple premise: AI is moving fast, and the two of them are going to keep working through it together, out loud, for listeners who want to come along. This conversation goes deeper into the practical mechanics of working with Claude. Heather and Jodi talk through connectors (MCPs that link Claude to tools like Gmail, Google Drive, Asana, and accounting software), skills (saved, reusable instruction sets that replace the old habit of copying and pasting prompts), and what both of them call their AI business brain - a structured foundational document that teaches Claude who you are, what you sell, and how you sound, before you ask it to produce anything. You'll come away with a clear starting point: build the foundation first, connect the tools you already use, and create one simple skill - Jodi's suggestion is a daily "morning coffee" briefing - before trying to do anything more ambitious.

Key Takeaways
AI output defaults to generic. The fix isn't a better prompt - it's a structured foundation document (Jodi calls hers the Hospitality Brand Bible; Heather calls hers her Business Brain) that teaches the model your business, voice, and audience before you ask it to create anything.
Building that foundation properly is not a five-minute job. Heather recommends setting aside the better part of a day and using reverse prompting - asking Claude to interview you, question by question, until it has a full picture of your business.
Connectors (MCPs) link Claude directly to the tools already in use - Gmail, Google Drive, Google Calendar, Asana, accounting platforms - so requests can be carried out end to end instead of copying information back and forth manually.
Skills replace the old habit of maintaining a library of saved prompts. A skill is a reusable, named instruction set that automatically pulls in the right reference documents and brand voice without being told to every time.
The recommended first skill for anyone starting out is a daily briefing - a "morning coffee" routine that summarizes email, flags anything unanswered, and reviews the calendar - because it is simple, immediately useful, and teaches the basics of how skills work.
AI will hallucinate and occasionally get things wrong with total confidence. Both hosts were emphatic that nothing goes out the door - a guest bio, an email, an Instacart order - without a human checking it first.
Thanks Pressable for supporting the show! Get your special hosting deal at https://pressable.com/wpminute
Become a WP Minute Supporter & Slack member at https://thewpminute.com/support
On this episode of The WP Minute+ podcast, GravityKit's Zack Katz joins Eric to discuss his company's forward-thinking features, including cryptographic signing on plugin updates and the new Block MCP tool. Zack shares that the recent plugin supply chain attacks inspired a more secure method for product distribution – potentially the first for a commercial plugin. Meanwhile, Block MCP fills a gap in the current WordPress AI landscape by working within the native block structure, rather than raw HTML. This episode provides you with the inside scoop on making WordPress safer and more user-friendly.
Takeaways:
Cryptographic signing ensures plugin updates are secure.
Supply chain attacks are a real concern for plugin developers.
GravityKit is the first to implement cryptographic signing in WordPress plugins.
The Block MCP tool addresses frustrations with existing MCPs.
AI can significantly enhance the editing experience in WordPress.
Granular editing is simplified with the Block MCP tool.
The Block MCP tool can automatically identify and use the best blocks.
Internal linking can be improved using AI with the Block MCP.
The plugin allows for non-destructive edits and easy rollbacks.
Important Links:
GravityKit products now give you a stronger reason to trust what you install
Introducing Block MCP: the WordPress MCP we built because nothing else worked
Block MCP: GitHub | Plugin
The WP Minute+ Podcast: thewpminute.com/subscribe
★ Support this podcast ★
In this episode of Business Brain, we dig into the question of who really controls AI. We trade notes on Anthropic’s new Mythos model and its Fable guardrails, including a jaw-dropping account of how relentlessly capable these tools have become, and we wrestle with the bigger issue lurking underneath: when AI decides what we can and can’t do, who’s holding the keys? We talk search-engine parallels, data retention, the push for government oversight, and why locally run, private AI might be the move for protecting our business data while still tapping the power. Then we get practical with MCPs (Model Context Protocol) and why this might be the easiest upgrade we can make to how we work. No fussy API tokens, no burning through credits letting AI drive a browser. We share real wins: connecting analytics dashboards, newsletter platforms, and entire email inboxes so our AI can summarize, draft, and act on our behalf. It’s platform-agnostic, dead simple to set up, and a genuine game-changer: exactly the kind of leverage that keeps us building the Charmed Life.
00:00:00 Business Brain – The Entrepreneurs' Podcast #763 for Casual FridAI, June 19, 2026
June 19th: Juneteenth and National Martini Day
00:01:44 AI Censorship
Fable/Mythos is relentlessly good!
Dave (unintentionally) hit Fable's cybersecurity guidelines
00:12:45 SPONSOR: Bitdefender. Keep your small business safe with Bitdefender Ultimate Small Business Security. Save 30% when you go to https://bitdefender.com/BRAIN
00:14:26 SPONSOR: OneSkin. Born from over a decade of longevity research, OneSkin's OS-01 Peptide is proven to target the visible signs of aging, helping you unlock your healthiest skin now and as you age. Get 15% off OneSkin with the code BRAIN at https://www.oneskin.co/BRAIN #oneskinpod #ad
00:16:38 David-China turns on underwater datacenter
00:17:32 MCPs – Model Context Protocols
Claude Cowork is becoming my primary email agent
Fastmail Official MCP
00:21:44 This Episode's Big Takeaway: MCPs are easy to implement, connect your AI to more things than you realize
00:23:00 Business Brain 763 Outro
Check out Business Brain Blueprints Tell Your Friends! Business Blueprints Review Business Brain Subscribe to the show
feedback@businessbrain.show Call/Text: (567) 274-6977 X/Twitter: @ShannonJean & @DaveHamilton, & @BizBrainShow LinkedIn: Shannon Jean, Dave Hamilton, & Business Brain Facebook: Dave Hamilton, Shannon Jean, & Business Brain
The post FridAI – AI Censorship and MCPs – Business Brain 763 appeared first on Business Brain - The Entrepreneurs' Podcast.
The skills problem isn't going anywhere — it's just wearing new clothes. In this episode, I unpack how the lessons we learned decades ago (limiting work in progress, the theory of constraints, test-driven development) are coming roaring back as the fundamentals that will carry you through the agentic shift. The bottleneck has moved, and knowing where it went changes how you should work. A lot of what we're learning about building with agentic tooling isn't new at all — it's a re-emphasis on lessons software engineers learned twenty years ago, just arriving in a new form. In today's episode, I walk through why the fundamentals are becoming more important than ever, why so many of us feel scattered despite having the most powerful tooling we've ever had, and where the real bottleneck in software delivery has quietly moved. My goal isn't to convince you that your job is now babysitting AI — it's to show you which parts of the work are still squarely yours, and how older principles can make you faster and more confident right now. Limiting Work in Progress Is Back: Just because you can spin up fifty agents doesn't mean you should split your focus across fifty things. Orchestrated fan-outs are powerful, but a human juggling agents across hiring, on-call, and a project all at once still pays the same old context-switching tax — and the quality drops while the speed never improves. Work Deeper, Not Wider: Instead of spreading yourself shallowly across more tickets, run multiple sessions on the same domain. Write a competing or adversarial version that critiques your assumptions, develop better documentation, or capture what you're learning as a reusable skill. Depth beats breadth. The Scattered-Engineer Epidemic: Engineers are burning out faster, not slower. We have the capacity to push more through the pipeline, so we're getting handed (or choosing) more than we can carry. 
Reducing parallelism often holds your delivery speed steady while dropping your cycle time and raising quality. The Theory of Constraints, Revisited: Treat your software development lifecycle as a pipeline with a bottleneck — and if you can't find one, you've optimized one part too far. Writing code used to be the choke point, so we spent enormous energy de-risking work before it ever reached an engineer. The Bottleneck Has Moved: When production gets cheap, it's no longer worth heavily de-risking upstream — which is why engineers are picking up more experimental, proof-of-concept, discovery work, and product folks are prototyping with these tools too. The new constraint isn't writing the code; it's verifying the agent didn't ship something broken. Verification Scales With Your Effort: The more an agent produces, the bigger the pile of PRs, MRs, and outputs waiting on human review. That backlog is the new bottleneck — and skepticism is creeping in because we're not even sure our tests are sufficient to verify what the agent built. Why TDD Fits This Moment: The honest question isn't "Can I trust the agent?" — it's "What verification loop do I need to build so I can trust it more?" Clear requirements feed a clear testing loop: write the failing test, let the agent write the code to turn it green, and you bridge the gap between requirements gathered and requirements met. It's not as simple as "go write a test," but it's a strong fit for where we are right now. Episode Homework: Go dig into the fundamentals — limiting WIP, the theory of constraints, test-driven development. Find the old lesson that still applies to your workflow today, bring it to your team's flow, and email me about what you discover.
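As a minimal sketch of the test-first loop described above (not code from the episode), the Python below writes the tests as the statement of the requirement and then shows the smallest implementation that turns them green. The `slugify` function, its requirement, and the test names are all hypothetical, chosen only for illustration.

```python
# Hypothetical requirement: "turn an episode title into a URL slug".
# Step 1 (human): write the tests first; they fail until slugify() meets the spec.
# Step 2 (agent or human): write the smallest slugify() that turns the tests green.


def slugify(title: str) -> str:
    """Lowercase the title and join its whitespace-separated words with hyphens."""
    return "-".join(title.lower().split())


def test_joins_words_with_hyphens() -> None:
    # Requirement: words are lowercased and separated by single hyphens.
    assert slugify("Limit Work In Progress") == "limit-work-in-progress"


def test_collapses_extra_whitespace() -> None:
    # Requirement: leading, trailing, and repeated whitespace never reaches the slug.
    assert slugify("  Theory   of Constraints ") == "theory-of-constraints"
```

A test runner such as pytest would collect the two `test_` functions automatically; the point of the sketch is that the tests, not a reading of the generated code, are what tell you whether the requirement was met.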
AI Chat: ChatGPT & AI News, Artificial Intelligence, OpenAI, Machine Learning
In this episode, we explore the functionalities of MCPs and how they enhance the capabilities of AI tools like Claude and ChatGPT. We also discuss the differences between MCPs and APIs, share practical use cases, and highlight some of the most effective MCPs available for maximizing your AI integrations.
Chapters
00:00 Introduction to MCPs
02:00 Understanding APIs vs. MCPs
03:59 Setting Up MCPs Easily
09:58 Top MCP Tools and Recommendations
15:01 Benefits of Using MCPs
Show Links
Get the AI Box MCP: https://aibox.ai/mcp
How I Grow and Scale My Business with AI: https://www.skool.com/aihustle
Get the AI Chat Daily Newsletter: https://www.aichatdaily.com/newsletter
ChatGPT: OpenAI, Sam Altman, AI, Joe Rogan, Artificial Intelligence, Practical AI
In this episode, we explore the functionalities of MCPs and how they enhance the capabilities of AI tools like Claude and ChatGPT. We also discuss the differences between MCPs and APIs, share practical use cases, and highlight some of the most effective MCPs available for maximizing your AI integrations.
Chapters
00:00 Introduction to MCPs
02:00 Understanding APIs vs. MCPs
03:59 Setting Up MCPs Easily
09:58 Top MCP Tools and Recommendations
15:01 Benefits of Using MCPs
Show Links
Get the AI Box MCP: https://aibox.ai/mcp
How I Grow and Scale My Business with AI: https://www.skool.com/aihustle
Get the AI Chat Daily Newsletter: https://www.aichatdaily.com/newsletter
ChatGPT: News on Open AI, MidJourney, NVIDIA, Anthropic, Open Source LLMs, Machine Learning
In this episode, we explore the functionalities of MCPs and how they enhance the capabilities of AI tools like Claude and ChatGPT. We also discuss the differences between MCPs and APIs, share practical use cases, and highlight some of the most effective MCPs available for maximizing your AI integrations.
Chapters
00:00 Introduction to MCPs
02:00 Understanding APIs vs. MCPs
03:59 Setting Up MCPs Easily
09:58 Top MCP Tools and Recommendations
15:01 Benefits of Using MCPs
Show Links
Get the AI Box MCP: https://aibox.ai/mcp
How I Grow and Scale My Business with AI: https://www.skool.com/aihustle
Get the AI Chat Daily Newsletter: https://www.aichatdaily.com/newsletter
See Privacy Policy at https://art19.com/privacy and California Privacy Notice at https://art19.com/privacy#do-not-sell-my-info.
Montgomery County, MD, pauses the processing of data center permits for six months. Student journalists in MCPS protest a new rule that requires prior review of all school-sponsored publications. We have crowd-sourced data on MCPS schools affected by HVAC failures. And Delegate Kris Fair from Frederick County has an explainer for what happens with the replacement ballots Maryland had to send out. And more. Music from Seth Kibel's brand new album, Clarinet Without A Net.
The skills that survive every industry shakeup aren't the ones you can Google — they're softer, harder to name, and far more durable. In this episode, Jonathan explores principle-oriented thinking: the practice of stripping away the labels we attach to tools, roles, and even ourselves to see what something actually does at its core. It's the difference between handing your coding off to an agent and rethinking your entire workflow around what these new materials are truly capable of. If you've been following along with our recent focus on durable skills, you know we've been hunting for the abilities that translate beyond this month, this year, or whatever AI does to our industry next. Today's skill doesn't have a tidy name you can search for — it's softer than that. Jonathan calls it "principle-oriented thinking": the habit of deconstructing the labels we put on things to understand their core components, properties, and capabilities. It's how NASA engineers turned a sock into a water filter on Apollo 13, and it's how forward-thinking engineers are reframing what AI can actually do rather than jamming it into a predetermined slot. Labels Are Useful Shortcuts — Until They Aren't: Every label, from "software engineer" to "sock," carries baggage, heuristics, and presupposition. That's not a flaw — labels are how we move through the world quickly. But when a label is the only lens you have, it quietly caps how much value you can get out of the thing you're looking at. The Apollo 13 Sock: When the crew needed to fix a life-threatening problem with mismatched parts, the engineers on the ground had to forget what a sock was for and ask what it actually is — a piece of cloth with tensile strength, flexibility, and filtering properties. Strip the assumption that it goes on a foot, and a whole new set of uses opens up. 
Stop Slotting AI Into Old Roles: The common move is to take one responsibility — coding, debugging, refactoring — hand it to an agent, and keep everything else the same. That works, but it's low-leverage. The more powerful approach starts by asking what the agent is fundamentally capable of, then rebuilding the workflow around those raw materials. See Things as Materials, Not Fixed Functions: When you deconstruct out from under a label, tools and concepts start to look like craftable raw materials. You can then combine them in new, valuable ways they haven't been combined before — alloying old methods with new capabilities to create properties neither had on its own. Reason From Properties, Not Personas: Ask what the actual properties of an LLM are. Non-determinism isn't a bug to apologize for — it's a property you can exploit. The existence of many different models is a property too, which is exactly what makes adversarial review possible. That's principle-oriented thinking applied to agents. Extend the Latticework: Charlie Munger talked about a latticework of mental models that weave together rather than sit in isolation. The durable skill isn't quarantining your concept of "AI" off to the side — it's grafting a new section onto the existing tapestry and letting it reshape everything you already understood. Episode Takeaway: Look at how you spend your time and ask new questions of it. What is the material here? What kind of thinking does the agent actually do? What can a human do that an LLM can't — and the other way around? That's how you avoid believing a sock is only ever good for a foot.
I've been seeing a recurring pattern with companies selling APIs, MCPs, data feeds, and other developer-focused AI products. While the technology is often sound if not impressive, sales momentum sometimes slows when prospects have to imagine how the product will create value in their own environment. My perspective on this is that the flexibility that makes these tools powerful can also make them harder to evaluate. Flexibility can adversely increase the Invisible Intelligence Gap, and I think certain types of AI-based solutions (LLM) may actually increase this because the boundaries of the product are often so much wider than ever before (if not invisible to the buyer). So, how to close this gap? Well, one way is to build a visual UI that showcases what's possible with your API/feed/data solution. You take the buyer out of the conceptual space and make things concrete. So today, that's what we dig into: when to consider adding a UI, how far you need to go with it, how you can use Copilot/AI agents to help customize these example implementations, and the benefits you might see. Highlights / Skip to: The challenges of selling API-based analytics and AI products (0:56) Why this topic matters right now (2:48) The Invisible Intelligence Gap that may be slowing your sales (3:34) Strategies for bridging the Invisible Intelligence Gap with a UI (user interface) layer (7:01) Client case study: the impact and results you may see adding a UI on top of your technical product (14:05) Signs that you should consider adding UI to your technical product (18:23) Leveraging humans' highly developed visual system to help potential customers see the full value of your product (26:24) Conclusion (27:32) Links Invisible Intelligence Gap Azeem Azhar's Exponential View (6/4/26 episode)
DMV Hoops Podcast – Episode 107
DMV Hoops Podcast – Episode 109
Interest in macrocyclic peptides (MCPs) continues to grow, which means manufacturers are facing mounting pressure to develop production methods capable of supporting commercial-scale demand for these molecules. While they offer a unique combination of potency, selectivity, and drug-like properties, the structural complexity of MCPs has historically made them difficult and costly to manufacture using traditional peptide synthesis techniques. As a result, new manufacturing approaches are emerging that aim to improve efficiency, scalability, and sustainability while expanding access to this promising class of therapeutics. In this episode of Off Script, we spoke with David Thaisrivongs, executive director, head of biocatalysis at Merck, about research recently published in Science detailing a biocatalytic manufacturing process for enlicitide, an investigational oral macrocyclic peptide designed to lower LDL cholesterol. The conversation explores the limitations of conventional solid-phase peptide synthesis, how Merck leveraged enzyme-driven manufacturing and crystallization strategies to significantly reduce process complexity, and why minimizing chromatography can be critical for commercial-scale peptide production. He also discusses the broader implications of biocatalysis for manufacturing increasingly complex therapeutic modalities and how the technology could help shape the future of pharmaceutical production.
Montgomery County Public Schools are looking to cut hundreds of positions to close a nearly $40 million budget gap. The cuts follow the Montgomery County Council's failure to fully fund MCPS's budget request. Throughout the budget process, District 5 Councilmember Kristin Mink looked for ways to fully fund the school district. She joins the show to explain what the council can do now to support schools. Plus, she discusses her support for bills signed by the County Executive this week limiting federal immigration enforcement in the county, and we ask her to weigh in on the County Executive race.

The D.C. Council is set for its first budget vote next week. It's been a challenging year as federal cuts and a decline in tax revenue have forced city leaders to weigh big cuts. Councilmember Christina Henderson takes us behind the scenes of negotiations and explains why she thinks slashing a fund that pays early childhood educators is the wrong move. Henderson also weighs in on the fight over a youth curfew, and we ask whether she's ready to endorse a candidate in the city's mayoral race.

Send us questions and comments for guests: kojo@wamu.org
Follow us on Instagram: instagram.com/wamu885
Follow us on Bluesky: bsky.app/wamu.org
We dedicate an episode to catching up on appsec news with Kalyani Pawar. We see parsing problems that led to the BadHost vuln, which exposed lots of LLMs, MCPs, and agents to potential compromise. We wonder where to look for security education and practice as the camaraderie of the CTF community becomes infiltrated by LLMs. We talk about the tradeoffs in trust between using public packages vs. having agents write replacements from scratch. And we examine some of the appsec details that the Verizon DBIR reveals about how orgs are being attacked -- and how orgs might use that information to protect themselves. Visit https://www.securityweekly.com/asw for all the latest episodes! Show Notes: https://securityweekly.com/asw-385
DMV Hoops Podcast – Episode 103
Affordable Maryland PAC's ad attacking Will Jawando's record on education leads to pushback. PAC Chair Jonathan Robinson joins us. The Montgomery County Board of Education is going to vote on a long list of position cuts this week, and MCPS school psychologist Alli Jacobus and parent Rachel Singer join to talk about the impact in one department. MCPS College and Career Navigator Sarah Kessler (whose position is also on the chopping block) joins to talk about University of Maryland's Fall 2026 admission numbers and clear up some common misconceptions about who is admitted and who is not. Music by Silver Spring rock musician MYSTR Treefrog.
Can Aavegotchi DAO take over the project, the State of Pixels, and the rise of open source in the agentic era.
[00:35] Aavegotchi dev Pixelcraft is one of the OG web3 gaming studios.
[05:16] It's looking to hand over control of Aavegotchi to the DAO.
[06:28] DAOs haven't been successful for reasons like coordination and authority.
[07:25] It's a nice vision, but the reality is Pixelcraft ran out of money.
[08:01] By 1st September, the DAO has to have decided what's happening going forward.
[09:16] Why “gamey games” are harder to hand over to communities or DAOs.
[09:55] State of Pixels. It's sustainable but not growing.
[11:30] Pixels is now considering adding open-source elements.
[12:05] AI significantly changes what community developers can build in blockchain games.
[13:50] The emerging pattern is surviving web3 games are moving to APIs, MCPs and agent access.
[15:15] Why blockchain and AI fit together culturally and technically.
[19:05] Define “game games” versus “non-game games”.
[20:49] Why blockchain games should focus less on moment-to-moment fun and more on meta.
[23:30] EVE Frontier, MapleStory and Soccerverse as examples of meta-focused web3 games.
[25:25] These games have emergent experiences. They don't require constant content updates.
[28:30] Don't put things onchain to create value. Put existing value onchain so it can be realized.
[32:40] Community-built Soccerverse fantasy football as a sign of where this goes next.
[35:05] The first 10 years of blockchain gaming were about discovering what didn't work.
[35:40] AI plus blockchain will enable things the traditional games industry won't build.
[37:06] Why agents will become native players for blockchain games.
[38:20] The future split: Mario-like gameplay games versus agent-filled systemic web3 worlds.
On the second hour of Nuanez Now, Colter Nuanez continues to provide live updates on prep sports action from across Montana, including results and storylines in track and field, softball, baseball, tennis, and more as postseason competition heats up around the state. Next, Colter is joined by fan favorite Carolyn, the Chick Who Doesn't Know Sports, for a fun discussion covering the Enhanced Games in Las Vegas, Johnny Manziel's recent fight, a near-disaster at the French Open, and plenty more entertaining headlines from around the sports world.
Colter Nuanez is live from MCPS Stadium in Missoula, Montana, bringing listeners live coverage and updates from prep sports action across the state. Colter breaks down the latest results and storylines in track & field, softball, baseball, and more as Montana's postseason competition continues to heat up. Later, Sammy Akem and Keenan Curran sit down with former Montana Grizzlies star receiver and current San Francisco 49ers player Junior Bergen to discuss his first year in the NFL, the transition to the professional level, and his decision to transfer from Montana State to Montana.
The new AIEWF website is live! CFPs close in 2 days and we will run our first New Engineer Orientation this weekend, get your tickets booked ASAP as they -will- sell out. Take the AI Engineering Survey and get >$2k in credits and free AIE WF tickets!

One of the central tensions in the agents industry is that even while there are major decacorn agent labs like Sierra, Decagon, Notion and Cursor being built up, it is also true that it has never been easier to DIY agents, with a plethora of agent frameworks like LangGraph and Pydantic and Flue, and managed agents from Anthropic and Gemini and Amazon. There has been a wave of companies building their own background agents from Shopify to Stripe to Paradigm to Razorpay, and even Cognition's friends Ramp have built their own coding agent with other friend Modal.

You'd think Cognition might feel a bit threatened, but they're not - even after all this, they were way oversubscribed for the $1B Series D they just announced.

Walden Yan, coiner of context engineering and Chief Product Officer/Cofounder of Cognition, invited OpenInspect's Cole Murray to talk about why the Devin is in the Details.

Full conversation live on the pod today.

In retrospect, async agents were the most AGI pilled bet you could make in 2024 - the models weren't good enough yet to vibecode, people didn't trust AI enough to let it rip, and nobody (including early Cognition) was sure about the form factors. Now it is obvious:

* The first wave of AI coding tools made the developer faster, but the developer remained heavily in the loop. Copilot and Cursor's tab autocomplete are prime examples. However, the workflow was still heavily centered around and bottlenecked by the developer's local workflow: a developer in an IDE, watching the model, accepting or rejecting changes, and pushing code one interaction at a time.
* The second wave was local agents: Claude Code, Windsurf, Cursor's agents pane: first one and increasingly many terminals all running concurrently.
* The current Age of Async Agents points to a different future focused more on agent orchestration which drives end-to-end development.

According to previous guest Steve Yegge, there are 8 finer-grained levels of agent adoption, but we have collapsed them into three.

As Cursor's Michael Truell put it in The third era of AI software development:

Cursor is no longer primarily about writing code. It is about helping developers build the factory that creates their software. This factory is made up of fleets of agents that they interact with as teammates: providing initial direction, equipping them with the tools to work independently, and reviewing their work.

The agent should not sit solely inside the developer's flow. It should be set up to work in the background so that you can give it a task, a repo, a machine, a shell, a browser, tests, memory, and review loops to go do the work somewhere else.

In less than a year, the sentiment has shifted from avoiding multi-agent systems to suggesting approaches that actually work.

From coining “context engineering” to building the infrastructure behind Devin's 7x PR growth and jump from 16% to 80% of commits across Cognition repos, Walden Yan has had a front-row seat to the background-agent shift.
In this episode, Cognition co-founder and CPO Walden Yan joins swyx alongside Cole Murray, creator of OpenInspect, to unpack why everyone is building their own Devin, what changed after the December 2025 model inflection, and why “spec to pull request” is now becoming a real production workflow.

We go deep on the architecture of background agents: harness-in-the-box vs out-of-the-box, why Devin separates the “brain” from the machine, why repo setup is still one of the hardest problems, why Docker is not always enough, and how full VMs, snapshots, scoped secrets, GitHub bots, Slack integrations, and video-based testing all fit together. Walden and Cole also dig into memory, MCP limitations, multi-agent orchestration, AI code review, SRE auto-triage, PMs shipping code from Slack, Windsurf 2.0, hybrid frontier/sub-frontier systems, and the real failure mode of uncontrolled vibe coding: your codebase regressing to your worst engineer.

And as agents eat software… and software eats the world… you can draw the conclusion on what is next.

We discuss:

* Why the engineering world is waking up to background agents and cloud agents
* The December 2025 model inflection that made spec-to-PR workflows practical
* Devin's 7x merged PR growth and rise from 16% to 80% of commits
* Why Cole built OpenInspect as an open-source background-agent system
* The economics of $20/seat agent products and why monetization is tricky
* What Cognition actually sells beyond Devin: infra, onboarding, integrations, and adoption
* Harness in the box vs out of the box, and why architecture matters
* Why Devin separates the brain from the machine for security and permissions
* Repo setup, scoped secrets, Docker Compose, and agent-ready dev environments
* Why full VMs matter when agents need to run real applications and test them
* Android, macOS, Windows, nested virtualization, and machine-specific agent work
* Why testing is much harder than “computer use”
* Screenshots, video verification, and the “I know it works” merge moment
* GitHub UX, Devin Review, AI reviewers, and agents responding to PR comments
* Why MCP alone is not enough for first-class Slack and enterprise integrations
* Memory, Knowledge, skills, Claude.md, and why retrieval is still unsolved
* Devin's auto-generated memories and the challenge of memory pruning
* Always-on agents as permanent PMs for issues, tickets, and product areas
* Sub-agents, meta-Devin management, and what multi-agent systems actually add
* Why pure auto-merge vibe coding breaks down after about two weeks
* AI code smells, lint rules, reward hacking, and Semgrep for agent-written code
* GitAI, inline context, and preserving the “why” behind code changes
* Local testing, mock servers, older codebases, and preparing companies for agents
* Windsurf 2.0 and the handoff between local foreground agents and cloud background agents
* SRE auto-triage, support workflows, and agents as first responders
* PMs, marketing, and non-engineers creating pull requests from Slack
* AI agent budgets, $1k-$5k per engineer spend, and hybrid frontier/sub-frontier systems
* The rise of autonomous coding factories and who Cognition is hiring

Walden Yan
* X: https://x.com/walden_yan
* LinkedIn: https://www.linkedin.com/in/waldenyan/

Cole Murray
* X: https://x.com/_colemurray
* LinkedIn: https://www.linkedin.com/in/colemurray/
* OpenInspect / Background Agents: https://github.com/ColeMurray/background-agents

Timestamps
00:00:00 Introduction
00:00:43 Why Everyone Is Building Their Own Devin
00:01:57 Devin's 2025 Ramp: 7x PR Growth and 80% of Commits
00:03:49 OpenInspect and the Rise of Open-Source Background Agents
00:07:59 What Cognition Actually Sells Beyond Devin
00:09:56 Background Agent Architecture: Harness In vs Out of the Box
00:12:08 Separating the Brain from the Machine
00:14:07 Repo Setup, Secrets, Docker, and Full VMs
00:19:13 Why Testing Is Harder Than Computer Use
00:22:40 Video Verification and the “I Know It Works” Merge Moment
00:23:19 GitHub UX, Devin Review, and AI Code Review
00:25:42 MCP, Slack, and Enterprise Agent Integrations
00:28:59 Memory, Knowledge, and Always-On Agents
00:36:16 Sub-Agents, Multi-Agent Orchestration, and Meta-Devin
00:43:55 Vibe Coding, Auto-Merge, and Codebase Decay
00:48:38 Agent Infra, VPCs, Cloud Providers, and Fast VM Restore
00:52:25 AI Code Smells, Reward Hacking, and Code Review Systems
00:56:10 Making Codebases Agent-Ready
00:58:30 Windsurf 2.0 and the Local-to-Cloud Agent Handoff
01:01:15 SRE Auto-Triage, PMs Shipping Code, and Agent Use Cases
01:04:32 Agent Budgets, Hybrid Models, and Autonomous Coding Factories
01:06:51 Hiring at Cognition and OpenInspect Consulting
01:07:45 Outro

Transcript

Introduction: Walden Yan, Cole Murray, and Context Engineering

Swyx [00:00:00]: All right, we're in the studio with Walden Yan, co-founder of Cognition, CPO.

Walden [00:00:08]: Happy to be here.

Swyx [00:00:09]: Which is a cool title. And coiner of context engineering.

Walden [00:00:15]: Although I think there are many people who'd used the terms in various ways beforehand, but I did find that people, both internally and externally, enjoyed the upgrade from prompt engineering or model wrapping into maybe a more thoughtful way to build agents.

Swyx [00:00:33]: For those who haven't caught up on that, I have on screen the Don't Build Multi-Agents post, which you should go read and we might refer to, and Cole Murray, who created OpenInspect.

Cole [00:00:43]: Great to be here.

Swyx [00:00:43]: So let's talk about it. Everyone is building their own Devins. What's going on?

The December Shift: From Handholding Models to Autonomous PRs

Cole [00:00:51]: So I think the engineering world is waking up to this idea of background agents, cloud agents, whatever you'd like to call it. And I think we saw a shift around the December timeframe of 2025, where the models Opus 4.5 and GPT 5.2, they reached a capability where we moved away from handholding the model to being able to actually more or less autonomously drive the model.
And what I mean by that is that we could pretty much go from a specification to a completed pull request, assuming the spec was good enough, with very little friction. And that paradigm alone, I think, changed a lot of how we interact with agents, and opened this world where background agents became more practical.

Swyx [00:01:41]: I think for Cole, everyone experienced this in December, but I feel like there was just this increasing ramp, right? There was this moment which was, I think, Sonnet 3.7, where you guys rewrote Devin in one night or something. So describe 2025 or how it felt from your side.

Walden [00:02:01]: In retrospect, we always thought it was ramping up, but then even now, over the last three, four months from today, it's been ramping up even faster. So it's almost funny to be talking about how big of a leap Sonnet 3.7 was, and honestly, a lot of it was stripping out parts of Devin that were no longer needed with that jump in intelligence. But I also just think that a lot of the recent leaps, especially if you look at models like Opus and the latest GPT models, they are reaching levels of autonomy where people are actually finding that they can just be hands-off. And people who were once debating, “Oh, do I need to be in the weeds with my model in the IDE? Can I just completely move it off into the cloud?” That's a more serious conversation, and we've seen that in all of our growth charts. Internally there's this funny graph where our usage, of PRs, our merged PRs, has grown 7X since I forget what it was called.

Swyx [00:02:57]: I think Dev, maybe tweeted that. Yes.

Walden [00:03:01]: It grew like 7X over the last, I think it was, two months, three months, something like that. And then you see our engineering headcount growth. It's gone up by 10% or something.

Swyx [00:03:11]: We were afraid to release this.
So this is Devin commit percentages on all Devin repos: it was 16% in January and now 80% in March.

Walden [00:03:25]: It's a big shift right now. And so it makes sense that a lot of people are now thinking about buying Devin, but also maybe trying to build their own. I have a lot of fun building Devin, so I can see why other people would want to build their own cloud agents as well. Well, maybe it's good to hear, what initially inspired you to try to build OpenInspect?

OpenInspect: Ramp, Cloud Agents, and Open Source

Cole [00:03:49]: OpenInspect came about primarily through my clients, observing how they were using tools like Claude, OpenAI's Codex at the time, and seeing some of the friction that they were having with it. Primarily, Claude was being used through Slack, and a big issue they ran into was that the sessions that were launched were specific to whoever called it via Slack. And so if a PM was the one who invoked the session and they would then go to pass context to engineering, engineering can't see the session. And that in itself was a deal breaker because the PM says, “Hey, engineering, can you jump in?” But there's nothing to jump in on unless they're copy-pasting out the single response that came back. And so seeing some of these problems, I had built a similar architecture internally, just to experiment with, test out different ideas as this trend of moving off of localhost was starting to become... And as Ramp released their blog post, I had a lot of the pieces for this already in place, and just thought it would be funny to see what Claude could do just purely from the blog post. And on my X account, there's actually a thread where I live tweeted going through this

Cole [00:05:14]: comparing GPT and Claude as both of them are going through it.

Swyx [00:05:17]: On the announcement thing or something else?

Cole [00:05:19]: Right after it got released. We can put it in the show notes.
Yeah, it was helpful that I already knew how to verify the system. I knew what I was looking for. I think Ramp did a great job of really illustrating the technical aspects of how to build something. It was much more than just like, “Hey, we built a great system.” It was, “And here's how you can build it too.” And so I resonated a lot with that, just with the problems that I was already seeing, and I thought that, looking around, I didn't really see anything in the open source community that met this type of system. I think there's a lot that run in localhost like Superset, Conductor, and many others. But nothing that was actually running in the cloud. And so I built it, and I thought it was interesting to just open source it and allow anyone to then have a foundation that they can mix and match on top of.

The Business of Background Agents: Open Source vs. Devin

Swyx [00:06:16]: So literally after Devin was launched, there was OpenDevin, which became All Hands. I don't know if you tried that or

Walden [00:06:22]: I was going to say, one of the things that interested me a lot with OpenInspect was, you didn't try to go make it then something you monetize. There are a lot of, I think, these open source projects would then go and really try to, raise V

Swyx [00:06:36]: That's why no OpenDevin. Yeah.

Walden [00:06:38]: Yeah, and how did you think about that? I thought that was very interesting.

Cole [00:06:44]: I thought, and just what I had seen across my clients, was that having a background agent system is going to become a critical infrastructure within their company. And so because of that, I think that I wanted to open source it so that they could fork it and put in whatever customization they wanted. To that question though, I get asked all the time, “Oh, are you going to raise? Are you going to turn this into a service?”

Walden [00:07:08]: I'm sure you've gotten offers.

Cole [00:07:09]: But primarily I don't want to do that for a few reasons.
One, I think that I don't want to compete for $20 a seat. I think that is just a really difficult business. I think it's very easy to copy the main pieces of it. Again, I built this fairly quickly. And I think because you are not owning, I guess, the entire stack, it's hard to monetize. You have money being made at the sandbox layer with Daytona, E2b, many other players. You have money being made at the model layer. And you sit in this weird in-between gray area where what are you actually selling? You're selling, I guess, the infrastructure. You're selling the integrations maybe.

Swyx [00:07:55]: Let's ask the guy. What are you selling?

Walden [00:07:59]: Well, yeah, there's multiple layers to this in practice, and actually it's funny you mentioned the infrastructure, 'cause when we got started building Devin as well, we had to go figure out how to make the infrastructure as well because,

Swyx [00:08:10]: You had to build this two years before everyone else?

Swyx [00:08:15]: Including the model side

Walden [00:08:17]: It was not very polished at the start. When we just built it off of raw VMs from cloud providers like EC2, the boot up time was so slow, I think. And especially then, turning off the machines, saving them, and then to be able to bring them back up again when you want Devin to wake up again later. It would just be out cold for like 10 minutes because that's just how long these systems took. They were not built for this repeated down and up usage. And so we actually had to go do all of that. And as a result now, one thing we offer when we go and sell Devin to people is, you don't have to worry about all the compute side of things. We'll make it work. We'll make it work in your cloud if you want it to.
But aside from the product, and I want to go into the agents and the tuning of the intelligence part later, but I think a big part of what we do at Cognition as well is to just make sure that your company learns and uses and adopts these coding agents. 'Cause I think for especially the largest enterprises in the world, you find that there are a lot of people who want to move over to using AI for their day-to-day workloads. But because of the way projects are planned, because not everyone is literate in using AI in these ways, having a team of engineers who can actually go in and onboard you, set up all the integrations you need, the automations you need to really get to that level of leverage with AI, is super helpful. And so we do that. We show up as thought partners to the customers that we work with as well.

Swyx [00:09:56]: So let's talk about architectural stuff. I think that is something that was the topic of conversation between the two of you. Is this the mental model that you want to start with or something else? I'll just leave the floor open to you guys.

Agent Architecture: Harness in the Box vs. Out of the Box

Cole [00:10:11]: I think maybe we can start here as just a general what are the pieces of a background agent system. And then maybe we can go into some of the nuances of decisions that you can make.

Swyx [00:10:22]: But I guess, maybe what Walden is saying is the agent is in this open code box, I guess. Right? This is infra, and then that's the agent. And you had this discussion about whether you put the agent in here or out, externally. Can you tease that out?

Cole [00:10:39]: In a background agent system, you have a decision to make of where the agent is actually going to run. This is typically described as the harness in the box or out of the box. With running the agent in the box, you're making some trade-offs by doing that. The negative trade-off you're making is primarily security.
Because the agent is running in that box, unless you otherwise design it, all of your secrets need to go into that box as well. And given the nature of AI, it can be unpredictable, and you could very easily end up accidentally exfiltrating your secrets, or other unintended behavior. Now, the out of the box is the idea that we are going to have the actual agent running not directly in the sandbox, and we will have, quote-unquote, the brain of the agent running in some type of worker, control plane. That sandbox then is going to serve as the hands where the brain is basically operating and making tool calls into that environment to manipulate it. I guess the other trade-off that you're making between the two systems is that, in my opinion, running it out of the box is much more complex because you have state that has to be managed, whereas if you're running it in the box, all of the state of that agent is actually in the box, and yes, you could persist it elsewhere, but it's all localized and you have fewer concerns to worry about.

Walden [00:12:08]: I think a lot of what you mentioned is why we actually from the start built Devin to what we called separate the brain from the machine. The other thing that this allows you to do is reuse any existing infrastructure you have for dev boxes, perhaps. And so you don't have to worry as much about making a new type of dev box that has all the dependencies the brain needs, as you mentioned, the secrets the brain needs as well. One thing that we've seen some customers run into is, you have a GitHub app and you want Devin, your agent, whatever, to be able to interact with GitHub through this application, but then you have different users with different actual permissions. If they are all interacting through the same GitHub app and there's no actual separation between the system that decides what it does and the actual secrets on the machine, then you run into an issue where, okay, it's hard to do the separation.
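The "out of the box" split Cole describes, with a brain in a worker or control plane and a sandbox acting as the hands, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not how Devin or OpenInspect is actually implemented: the `Brain` and `Sandbox` classes, the `plan` stub, the tool names, and the secret values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Sandbox:
    """The 'hands': an isolated machine that only ever sees scoped secrets."""
    scoped_secrets: Dict[str, str]
    calls: List[str] = field(default_factory=list)

    def run_tool(self, name: str, arg: str) -> str:
        # Hypothetical stand-in: a real sandbox would run this inside a VM.
        self.calls.append(f"{name}: {arg}")
        return f"ok: {name}"


@dataclass
class Brain:
    """The 'brain': runs in a worker or control plane, outside the sandbox."""
    model_api_key: str  # stays in the control plane, never copied to the box

    def plan(self, task: str) -> List[Tuple[str, str]]:
        # Hypothetical stand-in for the model deciding which tools to call.
        return [("shell", f"git checkout -b {task}"), ("shell", "make test")]

    def drive(self, task: str, sandbox: Sandbox) -> List[str]:
        # The brain reaches into the sandbox only through tool calls.
        return [sandbox.run_tool(name, arg) for name, arg in self.plan(task)]


brain = Brain(model_api_key="hypothetical-model-key")
box = Sandbox(scoped_secrets={"REPO_TOKEN": "hypothetical-read-only-token"})
results = brain.drive("fix-login", box)
```

The point of the split is that anything placed in `scoped_secrets` is what the agent (or a user on the machine) can reach, while the brain's own credentials and state live elsewhere. The extra cost, as Cole notes, is that the brain's state now has to be managed outside the box.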
But in practice, with Devin, it's much easier because we just say whatever you put on the machine, that is the scope of basically what the user is free to do, what the agent is free to do. So only put the most scoped secrets on that machine, and then the brain is fully not accessible from the machine. So you don't have to worry about messing with any of the most secure parts of the brain if the user is free to do whatever they want with the machine.

Swyx [00:13:31]: I was going to just bring, I have this chart from OpenAI, where I don't know if this is in the box, out of the box. That is something that they do use to describe it. And then also recently Anthropic did managed agents

Swyx [00:13:44]: Which is, this is their thing. I don't know. It's all variations of the same pattern, right?

Cole [00:13:49]: So this would be out of the box.

Swyx [00:13:51]: Which is preferable for them because it's less work?

Cole [00:13:56]: I would say it's more work.

Swyx [00:13:58]: It's more work?

Cole [00:13:58]: But in my opinion, it is the better architecture of the two. It's just, you're taking on a bit of complexity by doing that.

Repo Setup, Docker, and VM-Based Development Environments

Walden [00:14:07]: One thing I've not seen a lot of other players do well is how do you manage what's actually on the box? And this can be complex for many reasons. Let's say you have a big repository that's changing and updating a lot with changing dependencies. How do you make sure that the working environment of the agent actually stays up to date, has all the credentials it needs to, let's say, run the app and test it, and all the things you want your autonomous

Swyx [00:14:34]: So a repo setup.

Walden [00:14:35]: Exactly. So internally at Cognition, we call this repo setup.

Cole [00:14:39]: The hardest part of

Walden [00:14:40]: It's been a perennial problem since the start of the company, of how do we help people get this set up?
Because not everyone just has working cloud environments working out of the box. And do you find this to be a common problem with

Swyx [00:14:53]: How do you solve it?

Walden [00:14:53]: Your clients?

Cole [00:14:54]: This is a very common problem, and through my consulting, this is a lot of what I help teams do. A lot of teams don't really have great developer environment setups, if any. A lot of the times it's, “Go talk to Bob and get the secrets,” and that obviously doesn't work when the agent needs to actually set this up. And so a lot of that, most teams are using Docker Compose or some type of microservices. And so for the

Swyx [00:15:19]: Even in prod?

Cole [00:15:20]: Not in prod. With OpenInspect, you are using this primarily to interact and make code changes. There are other use cases, but you can hook, whether through CLI, MCPs, other tools, you can then hook that into your production systems primarily for SRE type use cases. But you are not necessarily trying to test your prod internal microservice through the system.

Walden [00:15:48]: And you mentioned Docker Compose. I think one direction we saw some of our friends take early on was using Docker containers as the level of abstraction for their models. There's lots of reasons, I think, why Docker containers are not great. One thing is, a Docker container's not really a true security boundary, for one. But the other is, if you are running real applications, a lot of times those applications use Docker, and then you have to think about Docker in Docker, which is really weird. And so I think part of the really hard challenge of getting VMs to work, why did we do that? Well, it was because we realized that you actually needed full VMs to be able to do these types of things. And especially nowadays where there's actually value in running the application and clicking around and sending you screen recordings of these things. The value just keeps adding on top of that.
But it is a decision I see people run into when they try to build their own systems, is, “Oh, in addition to this, do we put the agent in the machine or out of the machine? Do we use Docker? Do we use something else?” What do you recommend to people nowadays?

Cole [00:16:57]: I think Docker is a good solution for maybe not running the agent, but running your infrastructure, because that is more or less the same setup your engineers are probably already using. If they're not, then I don't know what they're using. But they're probably already using Docker Compose.

Swyx [00:17:14]: I've always had a small candle for web containers. I don't know if you guys have tried them before.

Swyx [00:17:19]: To me, they were supposed to be like Docker Lite.

Cole [00:17:22]: Is it?

Swyx [00:17:22]: I don't know.

Cole [00:17:22]: No, I haven't tried it. But yeah, I think any environment that you've set up that is a good experience for your developer naturally lends itself to being easy to set up for the agent. And once you figure out that local developer story, you've more or less solved the agent-in-a-sandbox environment setup. OpenInspect does have hooks as well, where you can run a setup.sh script that will pre-install everything. You can then pre-snapshot that build so it starts instantly, and then there is a second hook to actually then restore the state of the sandbox when it comes back. And so you can already have all of those microservices running and basically get the same experience that you would on your machine within the sandbox.

Testing Agents: Computer Use, Screenshots, and Real App Workflows

Walden [00:18:08]: Another thing that we've been thinking a lot about is different VM service offerings. Have you had customers where they needed like macOS specific VMs or like Windows specific

Walden [00:18:20]: VMs?

Walden [00:18:22]: There are like many technologies in the world that only work on specific types of machines, right?
If you're building a .NET application that has to run on Windows or like, maybe more commonly if you want to build iOS or macOS... Does that work

Swyx [00:18:32]: Does Cognition support

Swyx [00:18:33]: Choices like that?

Walden [00:18:35]: The fundamental architecture we do, because we do the separation, it does support, but the actual work in progress is happening right now on these. Another thing that we've actually recently added support now for, it's in beta, is doing Android development. To do that, we needed to support, I think, nested virtualization within our machines because the VM itself is a virtualized Firecracker instance, and then you had to then run another Android emulator inside. And there's like weird performance issues, which is why it's still in beta. We have to think through these problems, but it unlocks a lot for anyone who wants to do Android development.

Swyx [00:19:13]: I was trying to find like a reference video for the testing thing. I couldn't find it, but I think you worked on the testing capability. Why call it testing and not like computer use or, I don't know, what's the general category of problem?

Walden [00:19:26]: I think that when people think about the ability of an AI to run your app and test it, I think they actually over-index on the computer use part of it because computer use in my mind is the literal, okay, you know what button you want to click. Can you emit the right coordinates to go click that button? I think testing is actually a really interesting like

Walden [00:19:48]: Problem-solving challenge for these AIs because if you wanted to do arbitrary testing, imagine you make a change that spans the frontend and the backend, maybe even some other even more deeply nested service. To actually test that change, we have to reason through what-- how do you first run these applications to orchestrate with each other with the right version of the code?
Then, okay, how do I trigger the feature or how do I make the thing actually happen? And this can get arbitrarily hard, maybe you have to be an admin. Maybe a certain thing has to be feature flagged on. Maybe you have to like run two sessions and then send a very specific word into one of them to trigger a specific behavior. And figuring out how do you do that requires a lot of code base context, requires a lot of orchestration that we've specifically done. And in some cases, we found that no one frontier model can actually do this full end-to-end task itself.

Walden [00:20:42]: We've seen cases where we actually had to orchestrate different frontier models together to solve this problem together. That is where we spend most of our time when we think about this testing problem, not so much the computer use part. Computer use for what it's worth has gotten a lot better with recent models and it's made that part of the job certainly easier.

Swyx [00:20:58]: Especially with like even 4.7, that they released yesterday, apparently like way better in terms of the vision stuff, which is going to be encompassing computer use.

Walden [00:21:08]: Having evals for all these as well is something that like takes a while to build up. And having the evals be right is tricky as well. Do you ever see like, clients who are building their own agents have to start standing up evals to make sure things don't regress?

Cole [00:21:25]: Not so much evals in the traditional sense, but specific to the testing part that has just gone in. I just added support for screenshots. And in theory you can also do video. I need to put in a plugin to do that. But they do show up natively, and it was a very heavily requested feature, especially after Cursor's recording came out. I think that was very enlightening for everyone of like, “Oh, this is a very good feature to actually have.” I think with Devin you guys have had this for a while.

Swyx [00:21:57]: Oh, yeah.
See how screenshots work. Yeah, I don't know if there's anything super non-obvious. It's like once you know what feature to build, you can just prompt it and it will mostly work.

Cole [00:22:09]: I think to Walden's point, though, the computer use is a subset of the larger testing problem, and I think that's very specific to the code base that you're working in, and it's not something that, out of the box, you could just solve. You do need the code base context to actually know how to test it. And I think in the case of a background agent system, you fortunately do have that code base locally, what is changing, and could then inspect it and use that to drive the model.

Swyx [00:22:40]: For those who haven't seen it before, this is an example of how it works. After the PR is done, you click testing approved, and then it sends you back a video. What I really like is that it labels, it's very small here, but it actually labels what it's testing. And then you actually see the cursor and everything. So I don't know, yeah, the engineering in this, just whatever you want to show. 'Cause this is one of those, oh, few of the AGI moments, right? 'Cause once I look at this, I actually don't... I wish I can just merge inside of Slack instead of going to GitHub 'cause I don't need to see the code. I know it works.

Walden [00:23:19]: Maybe a new feature in Cursor. Yeah, the annotations at the bottom was also a big difference for me when I added those.

Swyx [00:23:27]: It's just like, what am I looking at? What are you trying to demonstrate?

Walden [00:23:30]: Exactly. There's a surprisingly long tail of small details that ends up making a big difference for this end metric of like how fast do you actually merge the code in. One experience that we spent a lot of time tuning early on was what is the right experience on GitHub for these tools.
Because I think with most tools out there, when you build the agent, you'll think, “Oh, it'll create the PR for you.” We try to take that a step further and say, “Oh, what if we actually made sure you could interact with Devin directly on GitHub?” And so we made sure that you can comment on GitHub, and Devin would actually receive those comments and address them. But there's actually quite a bit of tuning you have to do here. We recently added Devin Review, for example. Devin Review will post comments on his own PR, and then Devin has to then go...

GitHub Workflows: Devin Review, Comments, and PR Automation

Swyx [00:24:23]: He answers his own comments, which is really loopy. So yeah, I like that it just updates here that I have commented. But usually it's just me saying, “Hey, merged, fix any merge conflicts.”

Walden [00:24:37]: So when Devin fixes his own comments, you might be scared that, oh, maybe it'll infinite loop. But we've put a lot of work into making sure it doesn't, both by making sure that the comments are high signal, and by making sure the agent is thoughtful about which comments it immediately goes and tries to fix, and which comments it answers with, “Wait a second, I think you're wrong.” Actually, one of my favorite moments is when Devin tells me that I'm wrong when I try to get it to do something different. But tuning that behavior actually makes a big difference in terms of how useful the GitHub experience is.

Cole [00:25:06]: To touch on that as well, I think having the AI reviewer integrated into the system is a critical part of this background system. OpenInspect does have that. It has a GitHub code reviewer whose prompt you can control. It does do comments as well. It doesn't do them automatically yet. The capability is there, but it's not fully used.

Swyx [00:25:27]: So you have to ask for it?

Cole [00:25:28]: You do, yeah.
You can tag it on GitHub, and then whatever you named your, GitHub bot, it will then follow up on it. It will then, if you have merge conflicts or whatever you have asked it to resolve, it will then resolve it, but it doesn't do it automatically yet.Integrations: Slack, MCP, and First-Party Agent InterfacesWalden [00:25:42]: Well, I'm curious, what is, the most common thing that people end up requesting, that they still need on top of OpenInspect when you help them go implement it?Cole [00:25:52]: I think a lot of it comes down to actually integrating it into the company. It's one thing to have the background agent system set up, but if it isn't actually integrated into your larger ecosystem, it isn't that useful. It is useful to be able to kick off sessions, but what we really want to be able to do is hook it into all of our other systems, whether that is the production database with read-only credentials, the logs, a Confluence or internal knowledge-based system. I think that is where I see the huge leap for companies, and that can be a challenge for companies as well who are maybe not familiar with exactly how to approach it, especially if they're in environments that have more compliance type things where, access control can be pretty big and how do you deliberately think about these problems, I find to be, one of the problems that comes with a system like this.Walden [00:26:46]: The thing we found is So, MCPs, obviously it has been like this, really big explosion of, oh, you can go, integrate it with all these different things. But to actually get the integration right and the and get the right experience, oftentimes we found that we had to go build our own ad hoc things. I think Slack is a great example of this. You could give your agent a Slack MCP and okay, it can post messages back to you on Slack. But we actually use Devin like a coworker in Slack, and that's how it's been built from the ground up. 
But to do that, you actually need to support webhooks that come back, right? And then Devin has to respond in a natural way and hopefully not spam your threads too much and annoy the people in your company. So you've got to tune that experience just right. Especially when there's a lot of back and forth, we find that we actually have to go beyond the simple MCP integrations in these places.

Swyx [00:27:39]: I just pulled up the MCP marketplace. I know this is a fair amount of work. Is the answer to eventually take first-party control of all the top MCPs? Is that the...

Walden [00:27:48]: I would love a world where you could have something that's more expressive than MCP, that goes both ways: not just a set of tools, but a proper system that interacts back and lets it have the right experience with all these interfaces.

Swyx [00:28:03]: So there actually is sampling in the MCP spec, but nobody uses it, right?

Walden [00:28:07]: And I think that's the other part. We found that when the MCP spec starts to get too complicated, it starts to lose its original promise of being a simple one-step connect. Then we have to go figure out how to support all these different variations of things, and it starts to look a lot like just building the first-party integrations in a lot of these cases.

Cole [00:28:29]: I think it matters, too, how critical it is to your company, right? If this is something that nearly every session is going through, it probably makes sense to own it so that you can make optimizations on top of it, versus just using whatever is off the shelf.

Swyx [00:28:43]: Awesome. Other than MCPs, what else? Sorry, I don't know if that's narrowing in too much on integrations. But what else?
What other elements of building OpenInspect or Devin do you guys really sync on?

Memory and Knowledge: What Agents Should Remember

Cole [00:28:59]: I think a problem that comes up very frequently is this idea of memories or a knowledge base.

Swyx [00:29:05]: Oh, boy. How do you solve it?

Cole [00:29:08]: So, not solved yet, is the short answer.

Cole [00:29:11]: There's an open issue for it, someone asking about it.

Swyx [00:29:16]: DeepWiki hasn't indexed anything about memory yet.

Cole [00:29:20]: How I'm seeing it solved across my clients is primarily through skills. I find that skills can fill the gap there, or updating CLAUDE.md, but I think memory as a whole is a pretty unsolved problem, and it is why I've been hesitant to add it. I think there are parts of memory that can be addressed, but as a whole it's a very difficult retrieval problem.

Swyx [00:29:44]: Oh my God. Ramp didn't write anything about memory? I see zero search results.

Walden [00:29:50]: No. Memory can be quite tricky to get right, because it's not just the retrieval but also the generation of the memories that can be really tricky. You don't want it to just remember very specific details.

Swyx [00:29:59]: Walk us through the Devin memory journey, because I know there's been a journey.

Walden [00:30:03]: The first version of memory that stuck around for a while was a system we have called Knowledge. The idea was that we wanted it to pick up things over time and not need the user to be proactive about teaching Devin things. So any time you remind Devin, “Wait, no, that's not quite the way you're supposed to use Git,” we actually want Devin to say, “Hey, do you want me to actually just remember this for the future?” and for you to just quickly approve or reject, and for it to build up over time. ‘Cause I find that 95%, I think, or some crazy stat like that, of the memories that Devin has are all through these auto-generated things.
Very few people actually want to sit down and write big docs on “Here's how you're supposed to work with the technology,” et cetera. The generation and the retrieval are things we've been trying to tune a lot over the years. On generation, you don't want it to over-remember. If you asked one time, “Oh, please open this as a draft PR,” you don't want it to conclude, “Oh, everyone forever now should get their PRs as draft PRs.” But you do want some, conveyor. Maybe you want it to say, “Oh, Cole generally likes things to be created as draft PRs.” Same with retrieval: if you have thousands of these memories, how do you actually make sure they're retrieved at the right time? That can be quite tricky to do right without exploding the context with a bunch of useless information. There's a surprising amount of eval work just to make sure that memory remains a reliable system as new models come and go.

Cole [00:31:31]: Do you have anything that you could share on memory pruning? And the temporal aspect of memory?

Swyx [00:31:36]: Deleting and forgetting?

Walden [00:31:39]: Today, the thing it can do is edit memories. So if your memory used to say, “Oh, Cole likes to open everything as a draft PR,” then you can imagine saying, “No, don't do that.” And then it'll say, “Oh, do you want me to update the memory to be ‘Cole now wants everything as open PRs’?” At the same time, we don't know if this is going to be the final version of the system. Whatever we have here will probably translate into the new system that we'll be coming up with. But I think one big difference between two years ago and today is that these agents are really good at using anything that resembles a file system natively. And so part of us is thinking, “Oh, should we rebuild memories to feel more like a file system that we let the agent navigate on its own?” That's been an interesting exploration.
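To make the "memories as a file system" idea concrete, here is a minimal sketch under stated assumptions. It is not how Devin's Knowledge system is built. The class name `FileMemoryStore`, the one-Markdown-file-per-memory layout, and the method names are all hypothetical and chosen only for illustration. It shows the three behaviors discussed above: a memory is written only after the user approves it, a stale memory is edited in place, and retrieval is a plain grep-style scan.

```python
import tempfile
from pathlib import Path


class FileMemoryStore:
    """Hypothetical store: one memory per Markdown file in a directory."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def propose(self, slug: str, text: str, approved: bool) -> bool:
        """Persist a suggested memory only if the user approved it."""
        if not approved:
            return False
        (self.root / f"{slug}.md").write_text(text + "\n", encoding="utf-8")
        return True

    def edit(self, slug: str, new_text: str) -> None:
        """Overwrite a memory in place rather than adding a contradictory one."""
        path = self.root / f"{slug}.md"
        if not path.exists():
            raise KeyError(slug)
        path.write_text(new_text + "\n", encoding="utf-8")

    def grep(self, needle: str) -> list[str]:
        """Return slugs of memories whose text contains the needle (case-insensitive)."""
        needle = needle.lower()
        return sorted(
            p.stem
            for p in self.root.glob("*.md")
            if needle in p.read_text(encoding="utf-8").lower()
        )


if __name__ == "__main__":
    store = FileMemoryStore(Path(tempfile.mkdtemp()))
    store.propose("cole-prs", "Cole generally likes PRs opened as drafts.", approved=True)
    store.edit("cole-prs", "Cole now wants everything as open PRs.")
    print(store.grep("open prs"))
```

A substring scan like this obviously does not address the harder retrieval question raised above, which is surfacing the right memory at the right time among thousands. It only illustrates why a file layout is easy for an agent to navigate and audit.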
Also similar ideas in the skills space.

Swyx [00:32:35]: I am pulling up OpenClaude's memory thing right now. So OpenClaude has this daily memory journal thing, right? I mean, that is a file system you can grep through, and it is a source of truth. I don't know if it's the best. It's probably super noisy, but at least if you lose something you can discover it, or you can apply some forgetting algorithm to more ancient memories that don't get recalled again, or something. I don't know.

Walden [00:33:01]: One thing we've been trying to do to push the boundaries of how you use agents at your company is letting an agent have a very similar file, a memory.md or something, and just be your permanent PM for a specific set of issues. So we have some Slack channels internally, maybe a Slack channel dedicated to a specific product like DeepWiki. And you can imagine that you want a Devin that never stops, that's just always awake, but it has this memory doc that it can maintain for itself about, okay, what are the number one priorities of what we have to fix? Who is responsible for some upcoming work? Maybe Devin will even tag you on some recurring basis. So it's been an interesting move to see how we can actually use Devin for more than just engineering. Can we actually move upstream of the engineering process, where maybe it's just Devin creating tickets, which then maybe some humans do, but then maybe other Devins do?

Swyx [00:34:00]: One of my more fun automations is: go research competitors and just suggest stuff to me on a weekly basis. That's the automation. I can't find it right now, but basically it's just, “Look at competitors and suggest things,” and, “Here are three things that you've suggested that I don't want any more of,” and you just stick that in the prompt.
But like I wish actually So for like when I, for example, when I reject a PR, I wish that it updated memory so that I can then just not have to go up, go back and update the scheduled, sync, but anyway, feature request.Walden [00:34:31]: what? We might change it soon. I guess OpenInspect, in the time you've been around, has there been anything you tried to implement but then you had to like undo and like do a different way?OpenInspect Architecture: Webhooks, Control Planes, and Agent StateCole [00:34:41]: Nothing yet, but something that is on my mind. The initial way that I built it was that each of the integrations lives as its own package. And so you have The Slack bot, which is what's handling the webhooks, and then is basically interacting with the control plane. As I'm seeing the system starting to be more integrated, specifically with the GitHub bot integration, I'm considering bringing that all into the central control plane because especially now I want to start, And a request that I'm getting is the ability to monitor, the actual, pull requests being merged, as well as just tracking ofSwyx [00:35:19]: What do I have open?Cole [00:35:21]: What do I have open? How many of these are getting merged? How many comments are showing up? To just understand the health of the system. And so in the case of a GitHub app, you only have one webhook. And so then it's a question of do I put that webhook in that GitHub bot package? That's weird. It doesn't really make sense to live there because that package is more for like the code reviewer. Or do I like centralize it? So that's something that's on my mind of, making that decision. I think the other one we touched on earlier is the harness in the box versus out of the box. I think long term the architecture will eventually come back out of the box. Some of the newer tools that I've added are calling back into the control plane so that you don't have the secrets in the sandbox. 
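The point about tools "calling back into the control plane so that you don't have the secrets in the sandbox" can be sketched as follows. This is an illustrative sketch only, not OpenInspect's actual code. `ControlPlane`, `SandboxTool`, the session-token scheme, and the tool names are all hypothetical. In a real deployment the control plane would be a separate service reached over the network, and here it is an in-process object to keep the example self-contained.

```python
import secrets


class ControlPlane:
    """Hypothetical control plane: holds real credentials, hands out session tokens."""

    def __init__(self, api_keys: dict[str, str]) -> None:
        self._api_keys = api_keys  # real secrets stay on this side only
        self._sessions: dict[str, set[str]] = {}

    def open_session(self, allowed_tools: set[str]) -> str:
        """Issue an opaque token scoped to a set of tools."""
        token = secrets.token_hex(16)
        self._sessions[token] = set(allowed_tools)
        return token

    def call_tool(self, token: str, tool: str, payload: str) -> str:
        """Run a tool on behalf of a sandbox session, using the stored credential."""
        if tool not in self._sessions.get(token, set()):
            raise PermissionError(f"{tool} is not allowed for this session")
        key = self._api_keys[tool]
        # A real implementation would make the outbound request with `key` here.
        return f"{tool}:{payload}:signed-with-{len(key)}-char-key"


class SandboxTool:
    """Hypothetical tool stub living inside the sandbox; it only knows the token."""

    def __init__(self, control_plane: ControlPlane, token: str, tool: str) -> None:
        self.control_plane = control_plane
        self.token = token
        self.tool = tool

    def run(self, payload: str) -> str:
        return self.control_plane.call_tool(self.token, self.tool, payload)


if __name__ == "__main__":
    cp = ControlPlane({"logs": "example-secret"})
    tool = SandboxTool(cp, cp.open_session({"logs"}), "logs")
    print(tool.run("tail errors"))
```

The design choice being illustrated is that a compromised sandbox leaks at most a revocable, tool-scoped token, never the underlying credential.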
And so I think long term I probably will pull the actual, agent out of the box, but I think for now it's fine.Subagents and Multi-Agent Systems: When Parallelism Helps or HurtsSwyx [00:36:16]: Just, a quick question on pulling the agent out of the box. I'm One thing I'm very bullish on this year is agents calling other agents or spawning sub-agents or Whatever you want to call it. Does that make it harder or easier? I can't tell. Because if the harness is in the box, you can just spin up more boxes. If the harness is outside the box, then you're, it's less easy because you are, you have a unicorn pet of a, of a harness that's, living outside the box.Cole [00:36:45]: In theory it would be the same way, right? Whether, one agent has launched many, sub-sessions within it, OpenInspect, for example, can launch sub-sessions and actually create other environments and then monitor them. In the case where it is out of the box, that would basically just be an additional session that's running. And so that session is also running outside of the box. It's running in your worker plane, wherever you're running this. And then you really just have to think about how does your top level agent then interact with it. I do think it can be more complex, just ‘cause again, you have now a more difficult architecture. But I think if you figured it out once, it's probably fine.Swyx [00:37:26]: Well, then I'm just, throwing it open to you in terms of, I call this like meta Devin management. Which is like the, Devin's calling Devins or Devin scheduling Devins or querying trajectories or anything like that. What have you built or unshipped, anything?Cole [00:37:46]: I think one of the surprising things we've seen is that a lot of the ways that, these, separate agents work with each other, and you want them to, parallelize their work, has still mostly followed the same manager sub-agents regime. 
And a lot of people, I think, are excited about this world where you have swarms of agents that talk with each other all over the place. We've actually given Devin an MCP so they can just go arbitrarily message other Devins and create new Devins, et cetera. But it somehow creates a really chaotic world in that sense. And so we've still found that most practical use on a day-to-day basis has been one single Devin.

Cole [00:38:33]: Figuring out how to segregate the work and have other Devins work on it in a relatively isolated sense, each with their own boxes, not sharing machines, so there's very little room for conflict, is the regime that you have to create today.

Swyx [00:38:50]: I'll call out the experiments from Cursor, right? This is Wilson Lin's work on single agent to multi-agent, and you're obviously famously on the side of don't build multi-agent. But they went through the whole thing, only to arrive at this, which is exactly what Devin has, I think.

Cole [00:39:08]: I think there will be a revision to that post at some point about...

Swyx [00:39:12]: Tell us about it.

Cole [00:39:12]: I think multi-agents were very much not at all possible a year ago. You do see more multi-agent experiments today, but you can argue: are they really multi-agents, or are they just tool calls? There are people who will create sub-agents to go look for XYZ file, XYZ implementation. That has really nice context management benefits, because all of the tool calls and tokens that the sub-agent spends then get collapsed back to just the answer for the main agent. There's a lot of benefits to doing this. We basically have Devin do this with DeepWiki: make a call out to DeepWiki, get back the results. But that feels like a tool call. It's not like these two collaborators actually talking back and forth with each other.
But I think the thing that makes me most bullish that multi-agents might actually be possible is what I said earlier: Devin will actually sometimes tell me I'm wrong and push back, and I think that demonstrates a level of maturity and communication today that makes a multi-agent world possible. One, can two agents who have seen different information come back to each other and actually figure out who is right, what is the correct implementation? They're not just yes men. Claude, I guess, used to just say, what is it? “You're right,” or...

Swyx [00:40:25]: “You're absolutely right.”

Cole [00:40:26]: “You're absolutely right.” Yeah.

Swyx [00:40:28]: Have you seen, did you see...

Cole [00:40:29]: The age is over.

Swyx [00:40:30]: ...the Codex app trolling Anthropic? This is the Codex app. Inside of Settings, there's a little Easter egg, right? So if you go to Themes or Appearance, there's all these color codes, and the top one is “Absolutely,” and it's in Anthropic's colors. Which is such a troll. Anyway.

Model Behavior: Pushback, Adversarial Prompts, and Agent Skepticism

Cole [00:40:53]: I love that Easter egg. Did you discover that yourself?

Swyx [00:40:54]: No, someone was tweeting about it, and I was like, “Is this true?” Because sometimes people just tweet stuff to get a rise out of you. But yeah, there you go, in Anthropic colors.

Cole [00:41:06]: Yeah. So we're out of this regime where it just says you're absolutely right, and they can have real conversations and real back and forths.

Swyx [00:41:13]: You can prompt it as well to be more adversarial or whatever. Yeah. Okay. I mean, to me, that is more intelligence, right? That is not just a dumb tool, it's actually pushing back on you. Yeah.

Cole [00:41:24]: When you mentioned, of course, the blog posts.
There was one blog they had where they fed a swarm of agents together and built a browser.Swyx [00:41:34]: That was I think that was the one.Cole [00:41:36]: You can have, likeSwyx [00:41:37]: I think it's the same oneCole [00:41:37]: Creation of it. We found a surprising success of, don't do a swarm or anything, just have one Devin, it does its own context management. Just let it keep running for a while and give it some crazy tasks. I think we asked it to, rebuild, a Windows OS system. And it managed to do it just like, going on for long enough. It'sSwyx [00:41:55]: Was this Andrew's thing?Cole [00:41:58]: there were lots of demos that we ended up not posting, ‘cause at some point we'd just be posting way too much a bunch of, Demos. But I love that because it shows that I think the multi-agent thing still has, a bit of exciting sexiness to it, which is maybe still beyond still, the actual delta it adds to the capabilities of these systems. But it's absolutely the future. I think we're heading in that direction and we can see the progress being made there already.Swyx [00:42:25]: If I were to, make one super minor pushback because I don't feel that confident about it yetCole [00:42:33]: Go for itSwyx [00:42:33]: But I've had Ryan Lopopolo from OpenAI on the pod And he's a super slop cannon, right? Oh my God, that's my coding agent being done. I downloaded this, Peon Ping. I don't know if you guys have heard this. It takes like-, sound packs from popular games like, Command and Conquer and Warcraft, and then it plays it whenever it's done. And so it's like, “Work,” or whatever, “At your command,” or something. Anyway, what I got from the Cursor code base and from Ryan's thing was that there's a slop cannon approach where you try to loosen the single agent's, bottleneck, and I feel like that is, probably an, a very important thing to try to figure out. I don't think anyone's, really solved it. 
Because then you just have more reviewer slop on top of the agent slop To try to wrangle it all. Ryan will probably very strongly object that I say that he hasn't solved it, but he thinks he's He thinks he's completely solved it. But I think it's still I think it's, very important, ‘cause, that is a bottleneck, right? I feel Devin is slow sometimes Because I'm like, well, yeah, this is very readable and very sensible, but also it is slower than it could be if I just, I want a button to just say, “Just ramp this up 1,000 next parallel, in parallel and just, see what happens,”? And I don't know if that's, feasible at some point in the future.Code Review, Entropy, and AI SlopWalden [00:43:55]: I And we've also run experiments internally where we've basically tried to build entire products, true products that we knew we would eventually ship, but for now, let's try to see if we can do it just by purely, vibe coding on top of each other, auto merge, no code review at all. And then there's this benchmark of how many weeks can you go onto this for Before you say, “We have the trashiest code base.”Walden [00:44:18]: “Let's actually rewrite it from scratch.”Swyx [00:44:19]: Start a new factory, yeah. What'd you find?Walden [00:44:21]: I think we found that the state-of-the-art in December was you can probably, run this for about two weeks. By the end of those two weeks, you'd find that, hey, you want to, change the color of a button. Well, it turns out this button is implemented in, 10 different places, and they, have All these different variations, and oh, you forgot one of them, and actually it's a slightly different color in one spot. And you're like, “Okay, this is too much to work with. 
Let's actually try to do code review at the same time.” And make sure that we're on top of our software, actually cleaning it up a bit And making sure it's done in a scalable way.Cole [00:44:54]: I think building on that, the idea of, you don't have to look at code, I think is generally a bad idea. And the meme that I have for thatWalden [00:45:03]: What timeline, all right, is Do you think that statement will be true on?Cole [00:45:06]: I think probably for a while it'll be true that you should continue to look at your code. A problem that I see a lot of teams run into that I work with who are embracing AI native, AI first coding, is The meme that I have is that your code base regresses to your worst engineer, because that engineer who is, very gung-ho about AI and is not auditing their code, their pattern starts cementing into the code, and now the AI is referencing their patterns. And so now their if/else block that, is 20 if/elses back and forth, the AI is seeing that as the pattern of how things are done and starts to then exponentially grow this slop. And I find to your point, a pretty good approach to that is having scheduled cleanup, whether by humans or through systems, that are looking for duplication. They then address that. You'll end up with like 12 helpers for how to format a date. And you need to address that, because otherwise it will continue to sprawl.Swyx [00:46:09]: Within balance, I think it's fine to have some duplication, and then sometimes To have garbage collection, right? Yeah. The What I've been, talking about with a lot of engineering leaders is that you want to be very strict about the boundaries between modules, and it's your job as an architect, as a CTO, whatever, to say like, “Okay, here's the hard contract between you guys and you guys. Whatever you do inside this black box is your business. You do whatever. But between these guys, let's be, really damn clear, and any movement must be signed off by a human or me,” or. 
Then, and like that's that. I don't know if you have any other modifications or advice.Walden [00:46:44]: Well, I guess generally on the topic of, where humans can be useful, I found that ‘cause, some of these, really deep infra problems, sometimes just having a human that just has, really deep expertise can make a big difference. I've actually seen this come into play when actually building agents. So we've had a few friends now, try building their own coding agents, and I think one same problem that I recurringly heard a lot of them run into was this problem of like, “Oh, Grep is really slow on our agents' machines.” And so a lot of them, I assume because they're using AI and they themselves don't have, super deep infra background knowledge, say, “Okay, we're going to go build our own custom Grep index. It's going to be really fast,” and use that as a way around this problem. When we ran into this problem About like, maybe like a year and a half ago when we were, in the early days of building Devin, we obviously didn't have AI then. We just asked our, how to, how to do this. You can just swap out a new Grep index, so.Infrastructure Details: Grep, File Systems, and SandboxesSwyx [00:47:45]: What do you mean you hand-coded Devin? What?Walden [00:47:48]: It's like, can you believe we hand-wrote this code? And we had, our infra people who are really amazing, they were looking into it and they're like, “Oh, what? We realized that actually the root cause of this problem is actually super simple, but like fine-grain detail,” which is that a lot of these virtual machines actually underlying them don't use real file systems. They use these, network file systems where things are actually cached over the network actually in S3. So when you're Grepping, you're actually making network calls Every time you're doing these things, and that's why Grep is extremely slow on these machines. 
And so again, goes back to, what is all of the crazy infra work that we had to do to actually get these machines working. If you try to do this yourself, there are tons of small details like this, and so we had to eventually go swap out that network file system. ButSwyx [00:48:35]: I think there's a write-up about it, right? Silas did one about the virtual file system.Walden [00:48:38]: Oh, that was a whole other thing. TheSwyx [00:48:39]: Oh, that's a different thingWalden [00:48:40]: The BlockDev file storage formatSwyx [00:48:42]: I'll bring it upWalden [00:48:42]: Which is, a file system format that we built so that the VMs could be spun up and down very quickly. Basically, the intuition behind this is-Imagine you have, a terabyte of disk, and your agent only, wrote, a hundred lines of code on top of that disk. How long does it, say, take to, save and re-bring up that disk? And most systems, because you're not optimizing for this case, it's just, on the order of a terabyte of work because you have to Save all of that and bring it back up. In our system, we try to build a file system that incrementally builds on top of each other. So every time you save and bring the machine back up, you're only doing work that is proportional to effectively the diff in the file system. And so this, shaves off a lot of time in the boot-up process of Devin. I think we This is actually now outdated. We have a newer system inside of Devin. But yeah, there's a lot of tiny details you have to get right here to actually get the day-to-day experience of Devin to be good.Swyx [00:49:39]: It's, not technically agents, but it is agent infra, and when you sell an agent as a company, you sell agent plus agent infra.Walden [00:49:46]: At least the way we do it be And the other The nice thing about having the agent infra being done together is, you We get to deploy Devin in whatever environment we want now. 
We don't need to wait for some underlying infra provider to also go and support VPC or on-prem or FedGovCloud, for instance. Since we own the infrastructure, we can actually go and figure out how to get that set up for you.

Cloud Providers: Modal, Daytona, and Enterprise Sandboxes

Swyx [00:50:12]: Whereas you're Cloudflare dependent.

Cole [00:50:15]: So Cloudflare runs the control plane. For the sandboxes, Modal is supported. A contributor just added Daytona. E2B is on the roadmap, and there's an abstraction in place so that if any contributor wants to add a new provider, they can add that in.

Walden [00:50:32]: Well, the customers you work with, do they generally try to go set up a contract with another one of these third-party providers? Do they try to do the VMs in-house?

Cole [00:50:44]: Most of them I see using Modal. I think Modal has a great...

Walden [00:50:48]: Shout out Modal.

Swyx [00:50:48]: Shout out Modal.

Cole [00:50:50]: I think Modal has a great offering. It captures all of the sandbox pieces you need, snapshots being a pretty big piece of that, and given that they also offer GPUs, I think it's a pretty nice offering as a whole.

Swyx [00:51:04]: No debate there.

Walden [00:51:07]: Modal is great. I think their container offering is the most natural, so especially if you are willing to forgo the full VM requirements, Modal is a really fast place to spin something up on.

Swyx [00:51:20]: So Modal's very Python, and I feel like most workloads have really shifted to JavaScript. I don't know if you guys get the same feeling. When I started Latent Space and AIE and all these things, I was like 50/50 Python and JS, roughly. I think that's wrong now. I think JS has won. Maybe I'm overstating it, and maybe for Cognition there's C# and Java and what have you. But for new greenfield apps, do you get that sense?
Does it matter?

Cole [00:51:52]: I think that most of the libraries that I see in this space are Python native first, especially in the...

Cole [00:51:58]: ...observability space. That said, I think that there is a pretty big appeal to having your entire system in one language. Especially when you have both your frontend and backend communicating, you can have one central type, which is very nice.

Swyx [00:52:11]: That's my case against Modal, which is that then you have to run JS. You can run JS inside Modal. It's just one extra step that isn't native to the runtime. I don't know if...

Walden [00:52:22]: I don't know.

Swyx [00:52:23]: Reviews. Do you have numbers? I don't know.

Walden [00:52:25]: The one thing I don't like about Python is that whenever AI writes Python, it always does the weirdest patterns, and...

Swyx [00:52:32]: Oh, because it's mixing two and three or what?

Walden [00:52:34]: I think it's something like mixing two and three, yeah. I don't know if you see this. It always tries to do hasattr on objects, like...

Cole [00:52:41]: Oh, my God.

Walden [00:52:41]: But you shouldn't be doing that. It should error if there was...

Swyx [00:52:45]: Because it's training on library code?

Cole [00:52:47]: I think it's more of, like...

Cole [00:52:48]: From what I've seen, it's more of a reward hacking mechanism where it doesn't want to basically...

Walden [00:52:54]: It'll never error.

Cole [00:52:54]: It doesn't want the code to fail. And so even when it knows the object has the attribute, it'll call getattr on it. For a lot of my clients who have moved towards more autonomous coding, we've put that in as a lint rule: if you do getattr, your pull request is going to fail.

Slop Signatures: Comments, Backwards Compatibility, and Types

Swyx [00:53:12]: Ooh, this is a fun topic. Can you tell me more about this? What else is a sign of AI coding that you have to put guards in?

Walden [00:53:21]: So we were talking just before this about Opus 4.7.
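The getattr lint rule described a moment earlier could be approximated with Python's standard `ast` module. This is a rough sketch under assumptions, not the rule Cole's clients actually run: the function name `find_dynamic_attr_calls` is hypothetical, and flagging `hasattr` alongside `getattr` is an assumption based on the hasattr complaint in the same exchange. A CI step could fail the pull request whenever the returned list is non-empty.

```python
import ast


def find_dynamic_attr_calls(source: str) -> list[tuple[int, str]]:
    """Return (line number, function name) for each bare getattr/hasattr call."""
    banned = {"getattr", "hasattr"}
    hits = []
    for node in ast.walk(ast.parse(source)):
        # Match calls whose callee is a plain name, e.g. getattr(obj, "x").
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in banned
        ):
            hits.append((node.lineno, node.func.id))
    return sorted(hits)


if __name__ == "__main__":
    sample = (
        "x = getattr(obj, 'name', None)\n"
        "if hasattr(obj, 'id'):\n"
        "    y = obj.id\n"
        "z = obj.name\n"
    )
    for line, name in find_dynamic_attr_calls(sample):
        print(f"line {line}: avoid {name}(); access the attribute directly")
```

Plain attribute access such as `obj.name` is left alone, so code that should raise `AttributeError` on a missing attribute still does.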
One of the things this new model likes to do is write lots of comments. Not that it'll comment every line, but it'll write paragraph-long PRDs on top of every function. But I will say, to its credit, these aren't slop descriptions like they were before: “Oh, here's what this function does.” It's like, “Oh, here's actually the r
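
As a rough illustration of the lint rule Cole describes above (failing a pull request when code calls getattr), here is a minimal Python sketch built on the standard-library `ast` module. The set of banned names, the function name `find_banned_calls`, and the sample snippet are all assumptions made for illustration; the conversation does not say which linter or rule definition his clients actually use.

```python
import ast

# Assumption: the builtins treated as "never error" patterns in the discussion.
BANNED_CALLS = {"getattr", "hasattr"}


def find_banned_calls(source: str, filename: str = "<string>") -> list[tuple[int, str]]:
    """Return (line number, builtin name) for every direct call to a banned builtin."""
    tree = ast.parse(source, filename=filename)
    hits = []
    for node in ast.walk(tree):
        # Only direct calls by name are caught; aliased or indirect calls are not.
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in BANNED_CALLS
        ):
            hits.append((node.lineno, node.func.id))
    return sorted(hits)


# Hypothetical snippet of the defensive style being complained about.
SAMPLE = '''
def price(item):
    if hasattr(item, "price"):
        return getattr(item, "price", 0)
    return item.price
'''

if __name__ == "__main__":
    for line, name in find_banned_calls(SAMPLE):
        print(f"line {line}: avoid {name}()")
```

In a CI job, a wrapper could exit with a non-zero status whenever the returned list is non-empty, which is one way to make the pull request fail.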
Right now, the questions we have about our careers feel existential. We keep coming back to the same theme: how do you prepare for an industry that's changing this fast, and what mindset actually works in this new reality? One skill keeps surfacing as the answer — your ability to update your own mental models. In today's episode, I want to push on that further and put some of software engineering's most beloved thinking models under scrutiny. Some of these models served you well for years. Some of them now deserve to be challenged, replaced, or thrown out entirely — and learning how to tell the difference is itself the skill that will determine whether you hit a ceiling. Move Past "So What" Questions: The typical engineering objection to agentic coding is that it produces quality issues. But the people deciding to adopt these tools already accept that. Our job is to stop arguing the surface-level point and start asking the real one: so what do we actually do about this new economic reality? The Economics of Acceptable Loss: Abstraction always leaves something to be desired. An agent's code may not match what a staff engineer produces by hand over months — but that gap is usually an acceptable trade against shipping something two, three, or four times faster. Understand the cost-benefit picture instead of pretending the cost doesn't exist. Abstraction Has Always Done This: This isn't new. The calculator dissolved the specialization once required for complex math. Spreadsheets commoditized ledgering and accounting. Agentic coding is the same pattern arriving for our work — making something that required deep specialization suddenly far more accessible. Roles Are Blurring: As these generic tools raise everyone's ability to abstract, the boundaries soften. You're already seeing product managers open pull requests and engineers making product decisions. The neat lines around "what an engineer is" are not as fixed as they used to feel. 
Why Your Hard-Won Wisdom Is the Target: If you've spent years in this industry, your models were bought with blood, sweat, and failed projects. That experience is real wisdom — and it's exactly what I'm asking you to be willing to challenge, because the thing that always worked for you is the thing most likely to become a ceiling. This Skill Survives Either Way: Even if you think AI is mostly hype and I've been infected by it — fine. The ability to challenge your pre-existing models is a critical skill regardless. It's how you keep growing as you get more senior instead of repeating what used to work. Models Are Approximations: The whole point of a model is to approximate the reality around us. That's their value and their limitation. When the underlying reality shifts this dramatically, holding tightly to an old approximation stops being wisdom and starts being a liability.
In this episode, we're joined by Eric Ries, creator of The Lean Startup, to discuss insights from his latest book, Incorruptible: Why Good Companies Go Bad… and How Great Companies Stay Great. Eric shares what inspired him to write the book and why we need to move beyond and redefine what true profit looks like. He shares the history behind businesses transitioning from serving public interests to shareholder primacy and why leaving behind a people-first business approach can actually reduce profitability. Additionally, Eric discusses financial gravity, the “harder is easier” principle, and how these practices connect to AI & current engineering leadership challenges. ABOUT ERIC RIES Over the last two decades, Eric Ries's ideas about continuous innovation, long-term thinking, governance, and market reform have reshaped company building and management practices. He is the creator of the Lean Startup method, and the author of the New York Times bestseller The Lean Startup; The Leader's Guide; and The Startup Way. As a founder, he has put his own ideas into practice with The Long-Term Stock Exchange (LTSE); Answer.AI, an AI R&D lab; Virgil, a legal services startup; and IMVU. On The Eric Ries Show, he talks with world-class technologists, thought leaders, and executives building for the long-term. He lives in the San Francisco Bay Area with his wife and three children. Unblocked: The context engine your coding agents are missing. Give your coding agents the context your best engineers have. Your agents can read code, but they don't know how your team works. Rules and MCPs give access to information but not understanding. That's why you still have to tell them where to look and what to look for. Unblocked gives your agents the history, conventions, and decisions behind your code so they generate mergeable output without the back and forth. It automatically surfaces the right context for every task, so agents stay on track without the set up tax or the correction loops. 
getunblocked.com/elc SHOW NOTES: The inspiration behind Eric's new book Incorruptible (5:22) What it means to redefine profit (8:03) Understanding profit considerations like externality, ethics, and inputs (10:44) Why human life / value can never be an input factor of production (12:31) The history behind business practices benefitting the public (15:00) When businesses transitioned to shareholder primacy over public interest (17:16) Navigating the tension between mission vs. fiduciary responsibility (21:01) The role of financial gravity & shareholder primacy in the Silicon Valley bank story (25:04) Using Eric's book to build a mission-driven roadmap (29:12) How committing to a principled way of business can drive profitability (31:15) An example of the principle “harder is easier” (33:40) How this connects to AI & emerging eng leadership challenges (36:53) LINKS AND RESOURCES Incorruptible: Why Good Companies Go Bad and How Great Companies Stay Great - Drawing on two decades of work with founders, CEOs, and investors, best-selling author Eric Ries reveals the forces that make companies vulnerable to destruction from within and without. Then he offers solutions that safeguard against them for the long-term. Incorruptible is the blueprint for companies that will prosper and endure without losing their soul. Its lessons and tools are designed to help founders, executives, investors, and citizens of all kinds build organizations – and a society – truly aligned with human flourishing. https://news.theleanstartup.com/ - Eric's newsletter with ideas about how and why to build companies focused on human flourishing — and stories of the people who are doing it. The Eric Ries Show - Founder, entrepreneur, and best-selling author of The Lean Startup Eric Ries discusses how to build profitable companies for the long-term benefit of society. 
Ries talks with world-class technologists, thought leaders, executives, and others working to create a new ecosystem of trustworthy organizations with limitless potential for growth and a deep commitment to purpose. Together, they uncover the tools and methods to ensure the next generation of companies are designed to maximize human flourishing for generations. This episode wouldn't have been possible without the help of our incredible production team: Patrick Gallagher - Producer & Co-Host Jerry Li - Co-Host Noah Olberding - Associate Producer, Audio & Video Editor https://www.linkedin.com/in/noah-olberding/ Dan Overheim - Audio Engineer, Dan's also an avid 3D printer - https://www.bnd3d.com/ Ellie Coggins Angus - Copywriter, Check out her other work at https://elliecoggins.com/about/ Hosted by Simplecast, an AdsWizz company. See pcm.adswizz.com for information about our collection and use of personal data for advertising.
Immigration court data reveals dramatically different treatment of enforcement cases in Maryland and Virginia. Montgomery County Council passed its 2027 operating budget and MCPS announced its proposed cuts at a Board of Education meeting. The school district wanted to end open lunch, which is the practice of allowing students to leave the school campus during lunch time, with procedural argument. And more. Music by Silver Spring rock musician MYSTR Treefrog.
In the previous episode I talked about three fundamental pillars that completely change the rules of the game when we want to go a step beyond conventional language models: RAG (memory), skills, and tools. Today we are not going to stay up in the clouds of theory. Today we roll up our sleeves and get straight to the point with a fully practical example, because in the end what we want is to see how it is done, how it runs on our own server, and how we can start getting value out of these technologies right away.

Why is Rust the king of tinkering with MCPs?

If you look for tutorials online, you will see that the vast majority of MCP servers are developed in Python. Don't get me wrong, Python is fantastic for writing code quickly, but in the world of self-hosted microservices and containers it has drawbacks that are hard to ignore. Python takes longer to start and consumes a considerable amount of RAM just by existing.

For this reason I decided to write all my MCPs in Rust. Rust compiles to a clean, direct native binary, with no heavy interpreter in between. Response latency is practically zero, memory consumption is negligible, and it runs at breakneck speed. On top of that, thanks to modern AI-equipped editors such as OpenCode, once you have polished and structured your first MCP in Rust (for example, the weather one), creating the next one is very easy. You only have to give your coding tool the structure of your first project and ask it to adapt the same logic to connect to any other API or database you need. It is a delight to see how the system scales!

Under the hood: public APIs, Docker, and Quadlets

To make this weather MCP a reality, I combined two well-known public APIs:

Nominatim (OpenStreetMap): Because weather APIs need geographic coordinates (latitude and longitude), Nominatim translates readable text such as "Valencia" or "Tokio" into numeric location data.
Open-Meteo: It receives the coordinates sent by the MCP and returns the current, hourly, or daily weather forecast, with no need for complex API keys or restrictive sign-ups.

This whole data flow is packaged in a Docker container and managed through a Podman Quadlet, so that it starts natively and is integrated with our server's operating system.

Later on we will dive into the fascinating world of local RAG.

Episode chapters:
00:00:00 Introduction and recap of the previous episode
00:00:43 The problem with static AI models
00:01:29 The practical example: asking about the weather
00:03:20 Extreme token savings with MCP
00:04:49 Agentic AI and automation workshop with Slimbook
00:06:22 Tinkering with DeepSeek V4 Flash in OpenCode
00:07:33 What an MCP is and how it works
00:09:13 Why I develop my MCPs in Rust (and not in Python)
00:11:13 Data cleaning and error handling
00:12:40 How to connect an MCP to Open Web UI step by step
00:14:18 Testing the weather forecast live
00:15:37 The engine under the hood: Open-Meteo, Nominatim, and Docker
00:17:25 Codegraph: analyzing code to save tokens
00:18:22 Next episode: saving persistent tasks with MCP To Do
00:19:48 Other MCPs ready for the AI workshop
00:21:22 The future of the podcast: local RAG, notes, and more tinkering
00:22:50 Farewell, links of interest, and close

More information, links, and notes at https://atareao.es/podcast/799
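
The two-step flow described in these notes (Nominatim turns a place name into coordinates, then Open-Meteo returns a forecast for those coordinates) can be sketched as follows. This is only an illustration in Python, not the author's Rust implementation; the helper names are hypothetical, and the query parameters shown (`q`, `format`, `limit` for Nominatim; `latitude`, `longitude`, `current_weather` for Open-Meteo) are a minimal assumed subset of what each public API accepts. The sketch builds the request URLs and parses a geocoding result, and leaves the HTTP calls themselves out.

```python
from urllib.parse import urlencode

NOMINATIM_URL = "https://nominatim.openstreetmap.org/search"
OPEN_METEO_URL = "https://api.open-meteo.com/v1/forecast"


def geocode_url(place: str) -> str:
    """Build the Nominatim search URL that resolves a place name."""
    params = {"q": place, "format": "json", "limit": 1}
    return f"{NOMINATIM_URL}?{urlencode(params)}"


def parse_coordinates(results: list[dict]) -> tuple[float, float]:
    """Extract (latitude, longitude) from a decoded Nominatim JSON response."""
    if not results:
        raise ValueError("place not found")
    first = results[0]
    # Nominatim returns coordinates as strings, so convert them to floats.
    return float(first["lat"]), float(first["lon"])


def forecast_url(latitude: float, longitude: float) -> str:
    """Build the Open-Meteo URL for current weather at the given coordinates."""
    params = {
        "latitude": latitude,
        "longitude": longitude,
        "current_weather": "true",
    }
    return f"{OPEN_METEO_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical decoded response, standing in for a real geocoding call.
    fake_response = [{"lat": "39.47", "lon": "-0.38"}]
    lat, lon = parse_coordinates(fake_response)
    print(geocode_url("Valencia"))
    print(forecast_url(lat, lon))
```

A real client would fetch both URLs over HTTP and should send an identifying User-Agent header with Nominatim requests, as its usage policy asks.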
In this episode, Geddes Munson (SVP of Engineering @ Affirm) joins us to discuss operational / engineering excellence, scaling, and AI-native transformation! We explore Affirm's approach to operational and engineering excellence and how a 2024 outage became a turning point in refining that focus. We deconstruct “AI retooling week”, the internal tools it inspired (including an incident tracing system), how the AI-native transition is impacting operational / engineering excellence, and how to connect these projects to business goals. Plus, we take a look at their early work building in agentic commerce, infrastructure decisions they made years ago setting them up for success now, how they're thinking about designing for agent-first experiences. ABOUT GEDDES MUNSON Geddes Munson serves as Affirm's SVP, Engineering. Previously, Geddes held several engineering leadership roles at Affirm, including oversight of the merchant engineering group, where he was responsible for the development of Affirm's solutions for key partners including Amazon, Shopify and Walmart. Prior to Affirm, Geddes held various technical leadership roles at rapidly growing startups including Mixpanel, SingleStore and EasyPost. He received his B.A. from Haverford College, where he started the Linux club on campus. Geddes lives in New Jersey with his wife and three children. Unblocked: The context engine your coding agents are missing. Give your coding agents the context your best engineers have. Your agents can read code, but they don't know how your team works. Rules and MCPs give access to information but not understanding. That's why you still have to tell them where to look and what to look for. Unblocked gives your agents the history, conventions, and decisions behind your code so they generate mergeable output without the back and forth. It automatically surfaces the right context for every task, so agents stay on track without the set up tax or the correction loops. 
getunblocked.com/elc SHOW NOTES: Defining operational excellence & what it looks like @ Affirm (4:36) Understand why your company / product matters to your customers (8:11) Key pivot points around engineering excellence @ Affirm (11:10) Creating a genuine culture change of operational / engineering excellence (14:27) Adopting agentic models @ Affirm (16:30) Navigating the balance between transformation, safety & reliability (18:30) Affirm's AI retooling week & hackathon setup (20:57) How the hackathon helped quickly change the company culture (23:15) Ensuring your practices serve your overall organizational vision & goals (26:11) Insights on scaling & increasing CICD investment @ Affirm (28:28) Approaches to building agentic commerce products (30:11) Strategies for building an agent-first experience (33:33) Bridging the gap between engineering & business goals / outcomes (35:44) Rapid fire questions (38:46) LINKS AND RESOURCES 1929: Inside the Greatest Crash in History – and How It Shattered a Nation - New York Times bestselling author Andrew Ross Sorkin takes readers inside the chaos of the crash, behind the scenes of a raging battle between Wall Street and Washington and the larger-than-life characters whose ambition and naivete in an endless boom led to disaster. The dizzying highs and brutal lows of this era eerily mirror today's world—where markets soar, political tensions mount, and the fight over financial influence plays out once again. Delivering Happiness: A Path to Profits, Passion, and Purpose - a best-selling 2010 memoir by former Zappos CEO Tony Hsieh detailing his entrepreneurial journey and outlines his core philosophy: building a phenomenal corporate culture and focusing on the happiness of employees and customers ultimately drives long-term profits and business success. 
Do you want to use AI agents for programming without going broke? In this episode of atareao con Linux I compare the two most interesting options for developers in 2026: OpenCode Go and OpenRouter.

Over the last few weeks I have been completely absorbed in OpenCode, using it both to generate code and to review existing code. Along the way I ran into a key question: how do I get access to AI models without going broke?

The answer is not trivial. You have two classic options: buy dedicated hardware or pay for cloud services such as ChatGPT or Gemini. But there is a third way: combining open source tools with low-cost services.

In this episode I cover:
What OpenCode Go is and why $10/month can be enough
What OpenRouter is and how to use 400+ models (some of them free)
A head-to-head comparison of prices, models, advantages, and disadvantages
Which one to choose depending on your use case
A practical case: how I improved my tool Shul with Rust and React skills
Why skills are the real game-changer for AI agents

I also preview what is coming in the next episodes: a complete workflow with skills, RAG, MCPs... things are about to heat up.

Chapters:
00:00 - Introduction: the dilemma of AI and money
02:30 - What is OpenCode?
04:50 - OpenCode Go: the $10/month subscription
08:20 - OpenRouter: the aggregator of 400+ models
10:50 - Head-to-head comparison
13:00 - Practical case: improving Shul with skills
16:00 - The power of skills
19:00 - Conclusions and which one to choose
22:00 - Upcoming episodes

More information and links in the episode notes
In episode 68 of Fast Hours, Drew and Rory return from a two-week hiatus to prove that yes, the AI news cycle did continue without their permission. Rude.

They dig into Freepik changing its name to Magnific, why enterprise AI image tools are starting to feel more like creative operating systems, and how brands may be better off using approved model aggregators instead of building weird internal Franken-tools that immediately become outdated.

Then things get nerdier. Obviously.

Rory breaks down how he's using Codex, GPT-Image-2, Claude Code, MCPs, Higgsfield, Seedance, and visual style reference sheets to create repeatable image systems, character references, and bulk creative workflows without living inside a giant text prompt forever. Drew pushes into where Midjourney V8.1 still dominates, especially photorealistic faces, color, texture, and images that do not look like corporate stock photography that lost the will to live.

They also talk about Midjourney's upcoming 8.2, 8.3, V9 roadmap, edit model ambiguity, personalization drift, Luma Uni comparisons, Pinterest's internal AI image model, Salesforce going headless, and why AI video audio still sounds like it was recorded inside a cursed podcast booth.

And because no episode is complete without accidentally getting philosophical, they close with the viral Claude Monet AI social experiment, the weird bias people bring to AI-generated images, and why “how it was made” keeps hijacking whether people can actually see what's in front of them.

Basically, it's an episode about the future of AI creative tools, with two guys trying to sound calm while the ground turns into soup beneath them.

--

⏱️ Fast Hour
00:00 Cold open
01:12 AI news fatigue is real
01:44 Claude Code runs the day now
03:11 Remote work and coffee shop crimes
08:21 3 Ninjas nostalgia break
10:09 Freepik becomes Magnific
11:47 Why Magnific works for enterprise
13:00 Model aggregators vs internal tools
18:01 Pinterest builds its own AI image model
22:11 Salesforce goes "headless"
24:08 Higgsfield, MCPs, and Meta ads
29:51 Codex for GPT-Image-2 workflows
32:45 Pulling style from video frames
34:05 Building visual style reference sheets
37:36 Codex and textured illustration systems
39:34 The evolution beyond text prompts
41:50 Seedance storyboards and visual prompts
43:01 Reference images as reusable seeds
44:51 AI video still has an audio problem
46:13 Audio reference hacks in Dreamina
50:52 Omni-reference for video control
52:37 Midjourney V8.1 updated take
53:28 The blue and pink problem returns
55:37 Midjourney still owns realistic faces
58:56 Reworking old prompts with Describe
01:01:10 Luma Uni vs Midjourney color
01:02:11 Midjourney 8.2, 8.3, and V9
01:03:46 Midjourney edit model questions
01:07:13 Midjourney plus Seedance films
01:09:03 Midjourney's strange lane
01:12:28 The Monet AI social experiment
01:15:13 Why people over-detect AI
01:20:09 AI backlash and disclosure debates
01:22:50 AI as a career unlock
01:24:30 Keep making weird stuff
01:25:45 Wrap-up and seamstress CTA
Network automation has been "coming soon" for over a decade. So what's actually different this time? John Capobianco, Head of AI & Developer Relations at Itential, built NetClaw — a CCIE-level AI agent that manages network infrastructure through Slack and WhatsApp. It hit 300 GitHub stars in two weeks. It can analyze packet captures, configure routers, run compliance tests, and generate documentation — all through natural language. John spent 15 years as a network engineer before becoming one of the leading voices in network automation. He's published multiple books, created dozens of open-source projects, and just launched the VibeOps community where 600+ network engineers share AI code without judgment. Key takeaways: • Why natural language is the breakthrough that makes network automation finally work (hint: nobody has to learn Python anymore) • The 5 use cases beyond config management that deliver value on day one — all read-only, all low-risk • How to go from human-in-the-loop to fully agentic network operations without triggering panic • Why "shadow AI" is the new shadow IT — and what leadership needs to do about it • The contrarian case that writing configs by hand is now a solved problem Guest: John Capobianco — Head of AI & Developer Relations, Itential LinkedIn: linkedin.com/in/john-capobianco-644a1515 X/Twitter: @John_Capobianco NetClaw: github.com/automateyournetwork/netclaw VibeOps Forum: Reach John on LinkedIn or X for invite Chapters 0:00 Why AI Is Different for Network Automation 2:32 Natural Language: The Interface That Changes Everything 3:51 "The Network Should Be Like a Telephone" — Why Engineers Resist Change 6:08 The No-Win Life of a Network Engineer 8:08 OpenClaw: More GitHub Stars Than Linux 10:15 What NetClaw Actually Does (90 Skills, 43 MCPs) 11:37 The RFC Documentation Problem AI Can Solve 13:03 Day One Agent Rules: Start Read-Only 13:58 When Was the Last Time We Hired a Junior? 
15:54 How NetClaw Hit 300 Stars in Two Weeks 19:54 Deterministic vs Non-Deterministic: Getting Engineers Over the Hump 23:36 War Stories: Fat Fingers, MTU Issues, and the DNS Nightmare 28:32 Documentation: The AI Use Case Nobody Can Argue With 32:34 Beyond Config Management: 5 AI Use Cases That Matter Now 36:00 The IDS/IPS Analogy: Why AI Agents Succeed Where Signatures Failed 40:02 AI Hallucination Is Overstated — Misalignment Is the Real Problem 41:53 Model Convergence: Why the Stuff Around the Model Matters More 46:00 Shadow AI Is the New Shadow IT 47:59 What Happens When AI Understands Your Business Context 53:59 The Optimistic Case for AI and Humanity 56:05 VibeOps: Building a Safe Space for AI-Curious Engineers 1:00:36 Is Vibe Coding Just Coding Now? 1:01:54 "Don't Write the Configs Anymore" 1:02:43 Closing & Where to Find John

--

This episode of IT Visionaries is brought to you by Meter - the company building better networks. Businesses today are frustrated with outdated providers, rigid pricing, and fragmented tools. Meter changes that with a single integrated solution that covers everything wired, wireless, and even cellular networking. They design the hardware, write the firmware, build the software, and manage it all so your team doesn't have to. That means you get fast, secure, and scalable connectivity without the complexity of juggling multiple providers. Thanks to Meter for sponsoring. Go to meter.com/itv to book a demo.

---

IT Visionaries is made by the team at Mission.org. Learn more about our media studio and network of podcasts at mission.org. Hosted by Simplecast, an AdsWizz company. See pcm.adswizz.com for information about our collection and use of personal data for advertising.
In this episode, Nathan Wrigley and Matt Schwartz continue their discussion about how WordPress agencies can harness AI. They cover practical applications like connecting AI to agency documentation, implementing guardrails with MCPs, using AI for internal tools and QA, and the evolving impact on the WordPress plugin ecosystem. They also discuss risks such as security concerns, over-dependence on vendors, and the importance of human oversight. The conversation concludes with predictions about agency workflows, cautionary advice, and encouragement to responsibly experiment with AI in agency environments.
Nick agreed to personally set up your Orgo in a 15 min call: https://startup-ideas-pod.link/orgo_ai

I sit down with Nick from Orgo to break down exactly how to run a one-person AI agent business that can realistically clear a few million dollars a year. Nick walks through the offer, the verticals worth chasing, the full software stack, and the live setup of an agent that manages other agents. We focus on tactics over theory, with specific tools, pricing, and the playbook for landing customers as a solopreneur. By the end, anyone with solid AI fluency will have a clear path from offer design to fulfillment.

Timestamps
00:00 – Intro
02:54 – Designing the AI Agent Business Offer
06:38 – Selling an AI Employee, Not an Agent
07:26 – Industries to Target (and Two to Avoid)
14:54 – Content Is Overpowered and How to Get Customers
17:51 – The Customer-Facing Tool Stack
20:49 – Building Agents Stack
25:51 – Model Picks: GPT 5.5, GLM 5.1, Kimi, Opus 4.7
27:08 – Nick's Stack
28:14 – Why Obsidian Is the Second Brain Layer
30:22 – Live Walkthrough: Spinning Up a Cloud Computer in Orgo
33:53 – Cloud Computers vs. Mac Minis
38:37 – Building Agents and Structuring Workspaces for Customers
43:56 – Watchdogs, Observability, and Reliability
45:28 – Closing Thoughts on the Solopreneur Era

Key Points
Sell unlimited agents, unlimited usage, and unlimited support to remove friction; most customers actually use one to three agents.
Avoid healthcare and finance to start; focus on legacy verticals like marketing, law, insurance, manufacturing, wholesale, and real estate.
OpenClaw agents go for around 5K a month; Hermes agents can go for 10K a month.
The full stack: Granola, Trello, Loom, Superhuman, Asana, Codex, Hermes, Orgo, Composio, Agent Mail, and Obsidian.
GPT 5.5 is the recommended default model for tool calling; GLM 5.1 and Kimi work for lighter tasks; Opus 4.7 fits long-horizon coding.
Use agents to set up other agents: pair Claude Code or Codex with MCPs like Perplexity, Context7, and X MCP for live docs.

The #1 tool to find startup ideas/trends - https://www.ideabrowser.com

LCA helps Fortune 500s and fast-growing startups build their future - from Warner Music to Fortnite to Dropbox. We turn 'what if' into reality with AI, apps, and next-gen products https://latecheckout.agency/

The Vibe Marketer - Resources for people into vibe marketing/marketing with AI: https://www.thevibemarketer.com/

FIND ME ON SOCIAL
X/Twitter: https://twitter.com/gregisenberg
Instagram: https://instagram.com/gregisenberg/
LinkedIn: https://www.linkedin.com/in/gisenberg/

FIND NICK ON SOCIAL
Youtube: https://www.youtube.com/@nickvasiles
Instagram: https://www.instagram.com/nickvasilescu/
Personal Website: https://www.nickvasilescu.com/
Andrew McNamara, Director of Applied Machine Learning @ Shopify, joins the ELC podcast to share insights on building agentic platforms at scale, like Sidekick, that must keep reliability for its users at the forefront. Andrew describes the building philosophy behind Shopify and what it means to cultivate a culture of prototype-first while prioritizing hiring early-stage talent. We cover Sidekick's development journey and how user feedback impacted its product vision, why evaluation is so important for determining ground truth sets, and the benefit of user-driven use cases. Andrew also dissects how they went about making product design decisions, such as building proactive agents and identifying subagent specializations. ABOUT ANDREW MCNAMARA Andrew McNamara is Director of Applied Machine Learning at Shopify, where he leads the team behind Shopify Sidekick, an AI co-founder that gives merchants access to the e-commerce expertise they need to run and grow their business. With 16 years of experience building AI assistants, he brings a rare combination of applied research depth and production-scale thinking to some of the hardest problems in AI: getting systems to work reliably for people who depend on them. Andrew's work pushes Shopify to measure AI quality by whether it achieves what the user set out to do, a core standard in building AI that merchants trust. Outside Shopify, he runs Setting North, a small Canadian maple syrup brand built on the same platform he helps make for everyone else. Unblocked: The context engine your coding agents are missing. Give your coding agents the context your best engineers have. Your agents can read code, but they don't know how your team works. Rules and MCPs give access to information but not understanding. That's why you still have to tell them where to look and what to look for. Unblocked gives your agents the history, conventions, and decisions behind your code so they generate mergeable output without the back and forth. 
It automatically surfaces the right context for every task, so agents stay on track without the set up tax or the correction loops. getunblocked.com/elc SHOW NOTES: How Shopify utilizes reflexive AI & Andrew's building philosophy (2:38) Developing a prototype-first company culture (5:07) Andrew's reflections on building AI-enabled projects like Sidekick at scale (7:25) Translating customer surveys into Sidekick's product vision (9:34) Key inflection points while scaling out Sidekick (11:23) Strategies for evaluation / building a ground truth set (13:26) Analyzing the good & bad within ground truth sets (15:27) Shopify's system openness model to drive user-discovered use cases (17:47) How subagents fit into the Sidekick's model (19:55) Prioritization conversations around subagent specializations (23:06) Designing an agent with high-impact prompt optimization (27:22) Considerations for building highly reliable systems (29:40) Andrew's perspective on latency (31:24) Rapid fire questions (33:49) LINKS AND RESOURCES Cradle - a New York Times best-selling series from Will Wight following a character's growth as he goes from one of the weakest users of his world's magic to among the strongest. The series features an original magic system inspired by Chinese cultivation and martial arts novels, with a heavy emphasis on anime-style super-powered battles. This episode wouldn't have been possible without the help of our incredible production team: Patrick Gallagher - Producer & Co-Host Jerry Li - Co-Host Noah Olberding - Associate Producer, Audio & Video Editor https://www.linkedin.com/in/noah-olberding/ Dan Overheim - Audio Engineer, Dan's also an avid 3D printer - https://www.bnd3d.com/ Ellie Coggins Angus - Copywriter, Check out her other work at https://elliecoggins.com/about/ Hosted by Simplecast, an AdsWizz company. See pcm.adswizz.com for information about our collection and use of personal data for advertising.
Andrew Brown (ExamPro) joins the vBrownBag crew to talk Gen AI skills, bootcamps, and whether vibe coding has made "learning to code" irrelevant.
The AI Breakdown: Daily Artificial Intelligence News and Discussions
In this sponsored bonus episode, NLW is joined by Atlassian co-founder and CEO Mike Cannon-Brookes for a conversation about how to build AI native teams. They discuss what separates enterprise AI leaders from laggards, why context is becoming a critical layer of AI adoption, how agents and MCPs are changing the way people work with software, and why 2026 may be the year AI moves beyond chat into more natural product experiences. This episode is presented in partnership with Atlassian and includes a companion quiz to help you find out what kind of AI team you are. Sponsored by Atlassian https://www.atlassian.com/ Find out what kind of team you are: The AI Native Team Quiz - https://play.aidailybrief.ai/episodes/ai-team-archetypes/
What if the dashboards you rely on today are already obsolete? In this forward-looking conversation, AirDNA CEO Rohit Bezewada joins Jamie Lane to unpack how AI is fundamentally reshaping the short-term rental industry, from how software is built to how hosts operate day-to-day.
This episode goes beyond the hype. Rohit breaks down what “AI-native” actually means (and why most tools aren't there yet), how agents and MCPs are changing the way hosts interact with data, and why the future of STR tech may not live inside traditional platforms at all. The conversation also dives into AirDNA's latest moves, from launching an AI-powered pricing tool to rebuilding its entire data architecture to support a new generation of decision-making.
For hosts, property managers, and investors, the takeaway is clear: AI isn't just another feature; it's redefining how you analyze deals, set pricing, and run operations. Those who learn how to leverage it effectively will have a meaningful edge in an increasingly competitive market.
You don't want to miss this episode.
Key Takeaways:
Dashboards are giving way to AI interfaces: Instead of static reports, the future is dynamic. Hosts can query their data directly and get tailored insights instantly through AI agents.
“AI-native” tools require more than a chatbot: True AI-native platforms are built across multiple layers: data ownership, normalization, memory/context, model flexibility, and user interface. Most tools today only scratch the surface.
Agents are becoming your operational co-pilot: From pricing adjustments to performance tracking, AI agents can handle repetitive, analytical tasks, freeing up hosts to focus on guest experience and hospitality.
The STR tech stack is consolidating: Expect fewer point solutions and more all-in-one platforms that combine market data, pricing, listing optimization, and performance tracking into a single ecosystem.
Human touch still wins in hospitality: While AI can automate operations and analytics, guest experience remains a key differentiator. Personalization and service still drive reviews and repeat bookings.
Sign up for AirDNA for FREE
Why I'm Not "Picking a Fight" on AI: A listener asked if I'm intentionally stoking a flame war by treating agentic coding as a foregone conclusion. The honest answer is that I've used it, the data points one direction, and a show built around pretending otherwise would slowly drift away from reality, and away from being useful to you.
Respecting the Misgivings, Without Getting Stuck in Them: Ethical concerns, skill atrophy worries, and questions about long-term effects are all legitimate. But the goal of this show is practical applicability, so we focus on mental models you can use Monday morning rather than litigating every angle of the debate.
The "Minecraft" Principle: If I ask you to "build Minecraft," I've handed you several chapters of specification in a single word. That's meaning-rich abstraction: language that points at a huge amount of shared context with very little token cost.
Meaning-Rich AND Specific: "Human history" is meaning-rich but uselessly broad. "Block-building game" is specific but loses fidelity. The sweet spot is vocabulary that is both compact and unambiguous, sitting in the top right of the meaning-density / specificity graph.
A Real Example - Strategy Pattern: When working on authorization rules, I didn't want a pipeline. Instead of describing base classes, shared interfaces, and parallel execution to the LLM, I used the words "strategy pattern." Two words did the work of three paragraphs, and the output landed where I wanted it.
Vocabulary as Leverage: Named patterns, named algorithms (Monte Carlo, etc.), and named architectural concepts act like compressed pointers. The more of them you genuinely understand, the higher the leverage of every prompt you write and every conversation you have with another engineer.
How to Build This Vocabulary: Have conversations with senior engineers. Ask an LLM what patterns are at play in a codebase, which ones you're using incorrectly, and which ones you're tricked into thinking you're using. Learn the abstraction layer that sits one step above your day-to-day implementation work.
The Asterisk - Shared Context Required: This only works when both sides know the term. Public, well-documented concepts (patterns, papers, algorithms) translate immediately to LLMs. Private or organization-specific concepts need to be loaded into context (via CLAUDE.md, AGENTS.md, or skills) before that compression kicks in.
Episode Homework: Pick one area of your current codebase. Ask an LLM to name the patterns in play, the patterns you're using incorrectly, and the ones you might be missing. Use that conversation to add at least one new piece of meaning-rich vocabulary to your working set.
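For listeners who have not met the term, here is a minimal sketch of what "strategy pattern" can compress when applied to authorization rules. It is not the code from the episode: the Request fields, the two rule classes, and the is_authorized helper are all hypothetical names chosen for illustration, and the parallel execution mentioned above is left out to keep it short.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Request:
    """Hypothetical request shape; the field names are illustrative only."""
    user_id: str
    user_role: str
    resource_owner: str


class AuthRule(Protocol):
    """The shared interface: every strategy answers the same question."""

    def allows(self, request: Request) -> bool: ...


class AdminRule:
    """Strategy 1 (assumed rule): admins may access anything."""

    def allows(self, request: Request) -> bool:
        return request.user_role == "admin"


class OwnerRule:
    """Strategy 2 (assumed rule): users may access what they own."""

    def allows(self, request: Request) -> bool:
        return request.user_id == request.resource_owner


def is_authorized(request: Request, rules: list[AuthRule]) -> bool:
    """Grant access if any of the interchangeable rules allows the request."""
    return any(rule.allows(request) for rule in rules)


rules: list[AuthRule] = [AdminRule(), OwnerRule()]
own_resource = Request(user_id="alice", user_role="member", resource_owner="alice")
print(is_authorized(own_resource, rules))
```

Each rule can be added, removed, or swapped without touching is_authorized, which is the property the two-word name communicates to an LLM or to another engineer.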
I sit down with Riley Brown to get a hands-on tour of OpenAI's Codex, which he argues is the most powerful single interface for using AI agents today. Riley walks me through how Codex unifies vibe coding, knowledge work, browser use, computer use, and automations into one app, all running on GPT 5.5. I come in as a complete Codex skeptic, having spent most of my time in Claude Code, and Riley shows me skills, plugins, projects, Remotion, Chronicle, and the in-app browser to make his case. By the end, the question becomes whether the era of separate tools for documents, decks, code, and research is collapsing into a single super app.
00:00 – Intro
03:23 – What is Codex
06:46 – Why a GUI beats the terminal for most users
10:13 – Codex: the all-in-one platform
12:48 – Atlas browser inside Codex
14:21 – Remotion explained and motion graphics workflows
19:28 – Computer use and Chronicle
22:26 – Plugins, skills, MCPs, and integrations
31:57 – Evals, examples, and good output
38:43 – Hard questions: who Codex is built for
40:44 – Browser use plays itself in chess
43:20 – Running Claude Code inside Codex
45:58 – GPT 5.5 cost and effort settings
48:50 – GPT Images 2.0
54:09 – Why most people feel overwhelmed by AI tools
57:09 – Three projects to start with on day one and Closing thoughts
Key Points
Codex is positioned as a super app where coding, documents, decks, research, and automations live in one interface, with GPT 5.5 as the underlying model.
The trend across Codex, Cursor, and the Claude Code desktop app is the same GUI pattern: chats on the left, agent in the middle, output on the right.
Plugins offer official integrations like Slack, Notion, Sheets, Remotion, and Canva, while skills are user-created instructions stored as a SKILL.md file.
Computer use and browser use have crossed a speed threshold; the chess demo runs at near-human pace, a leap from earlier "dial-up" feeling agents.
Running Claude Code inside the Codex terminal lets you stack both subscriptions and use each model where it shines.
The biggest unlock for companies is collecting good examples of finished work so agents can match the bar.
The #1 tool to find startup ideas/trends - https://www.ideabrowser.com
LCA helps Fortune 500s and fast-growing startups build their future - from Warner Music to Fortnite to Dropbox. We turn 'what if' into reality with AI, apps, and next-gen products https://latecheckout.agency/
The Vibe Marketer - Resources for people into vibe marketing/marketing with AI: https://www.thevibemarketer.com/
FIND ME ON SOCIAL
X/Twitter: https://twitter.com/gregisenberg
Instagram: https://instagram.com/gregisenberg/
LinkedIn: https://www.linkedin.com/in/gisenberg/
FIND RILEY ON SOCIAL
X/Twitter: https://x.com/rileybrown
Vibe Code App: https://www.vibecodeapp.com
Youtube: https://www.youtube.com/@rileybrownai/videos