Podcasts about YC

  • 966 PODCASTS
  • 2,826 EPISODES
  • 36m AVG DURATION
  • 1 DAILY NEW EPISODE
  • Jun 23, 2026 LATEST

POPULARITY (chart by year, 2019-2026)

Latest podcast episodes about YC

The Engineering Leadership Podcast
The Product Paradigm Shift: How Livekit Navigated High Stakes Scaling Challenges to Build the Future of Voice-First AI Interfaces w/ Russ d'Sa #262

Jun 23, 2026 | 46:13


Russ d'Sa (CEO & Co-founder @ LiveKit) joins the show to deconstruct the "Product Paradigm Shift" toward voice-driven interfaces and agent-centric UX. We dive into LiveKit's high-stakes scaling lessons: powering voice mode for OpenAI and Character AI, navigating real-time bottlenecks to hit the next level of scale, the architectural necessity of a multi-cloud strategy, and the foundations of a co-founder relationship that can effectively blend engineering & business strategy.

ABOUT RUSS D'SA
Russ d'Sa is a startup vet who founded his first company in the 2007 YC batch and was the 2nd frontend engineer hired at Twitter. He now leads voice AI unicorn LiveKit, the backbone of ChatGPT Voice Mode, Salesforce Agentforce, Grok, and roughly 30% of US 911 calls.

ABOUT LIVEKIT
LiveKit is an open source framework and cloud platform for building voice, video, and physical AI agents. It provides the tools you need to build agents that interact with users in realtime over audio, video, and data streams. Agents run on the LiveKit server, which supplies the low-latency infrastructure (including transport, routing, synchronization, and session management) built on a production-grade WebRTC stack. This architecture enables reliable and performant agent workloads.

SHOW NOTES:
  • The product paradigm shift toward voice-driven apps and natural human-computer interfaces (2:44)
  • Voice-apps in practice: How these trends impact the strategy of product building today (5:32)
  • Early adopters: Why legacy industries like healthcare use voice AI (7:55)
  • Reevaluating and building product experiences optimized for AI agents (12:52)
  • How AI trends will impact roadmaps and Go To Market (18:16)
  • The origin of LiveKit: Building real-time infra for the pandemic (21:07)
  • The OpenAI moment: Powering the fastest-growing consumer app (23:48)
  • Scaling with OpenAI: Navigating the challenges of balancing time-to-market with system design (25:39)
  • The Character AI outage: Solving cross-continental state sync and hitting the next level of scale (29:00)
  • The problem: When telemetry breaks first: Managing analytics and logging for millions of concurrent AI sessions (32:04)
  • Architecting for resilience: Multi-cloud from day one and why treating infra as a utility matters (33:22)
  • Co-founder dynamics: Blending engineering strategy with business outcomes (37:15)
  • Rapid Fire Questions (40:51)

This episode wouldn't have been possible without the help of our incredible production team:
  • Patrick Gallagher - Producer & Co-Host
  • Jerry Li - Co-Host
  • Noah Olberding - Associate Producer, Audio & Video Editor https://www.linkedin.com/in/noah-olberding/
  • Dan Overheim - Audio Engineer, Dan's also an avid 3D printer - https://www.bnd3d.com/
  • Ellie Coggins Angus - Copywriter, Check out her other work at https://elliecoggins.com/about/

Hosted by Simplecast, an AdsWizz company. See pcm.adswizz.com for information about our collection and use of personal data for advertising.

Equestrian B2B
Episode 104: Pitching and Securing Investors for Tech with Lindsay Lenard

Jun 17, 2026 | 62:58


We speak with Lindsay Lenard of HorseSpot, who will describe her experience in a pitch competition, securing investors for her company, and the details of building a tech company from the ground up.

Guest Name: Lindsay Lenard
Website: https://horsespot.net/
Facebook: https://www.facebook.com/horsespotshows
Instagram: https://www.instagram.com/horsespotshows/#
LinkedIn: https://www.linkedin.com/company/horse-spot-shows/

Lindsay Lenard is the Co-founder and Product Design Lead of Horse Spot, Blue Ribbon Software supporting horse shows, rodeos, and fairs from local to internationally rated events. She is a 2× Webby Award-winning designer who has led creative work for global advertising agencies and YC- and Techstars-backed startups. A lifelong equestrian, Lindsay is building technology that serves the community that shaped her.

Hacker News Recap
June 15th, 2026 | Iroh 1.0

Jun 16, 2026 | 15:12


This is a recap of the top 10 posts on Hacker News on June 15, 2026. This podcast was generated by wondercraft.ai.

(00:30): Iroh 1.0
Original post: https://news.ycombinator.com/item?id=48542480&utm_source=wondercraft_ai
(01:56): A backdoor in a LinkedIn job offer
Original post: https://news.ycombinator.com/item?id=48546294&utm_source=wondercraft_ai
(03:23): Ask HN: Has anyone replaced Claude/GPT with a local model for daily coding?
Original post: https://news.ycombinator.com/item?id=48542100&utm_source=wondercraft_ai
(04:50): Curl will not accept vulnerability reports during July 2026
Original post: https://news.ycombinator.com/item?id=48537165&utm_source=wondercraft_ai
(06:16): What happened to nerds?
Original post: https://news.ycombinator.com/item?id=48538229&utm_source=wondercraft_ai
(07:43): TinyWind: A pixel pirate sailing game with real wind physics (380k+ kms sailed)
Original post: https://news.ycombinator.com/item?id=48543475&utm_source=wondercraft_ai
(09:10): CrankGPT
Original post: https://news.ycombinator.com/item?id=48540854&utm_source=wondercraft_ai
(10:37): Apple Foundation Models
Original post: https://news.ycombinator.com/item?id=48536776&utm_source=wondercraft_ai
(12:03): Hetzner Price Adjustment
Original post: https://news.ycombinator.com/item?id=48540844&utm_source=wondercraft_ai
(13:30): Even more batteries included with Emacs
Original post: https://news.ycombinator.com/item?id=48535886&utm_source=wondercraft_ai

This is a third-party project, independent from HN and YC. Text and audio generated using AI, by wondercraft.ai. Create your own studio quality podcast with text as the only input in seconds at app.wondercraft.ai. Issues or feedback? We'd love to hear from you: team@wondercraft.ai

Hacker News Recap
June 14th, 2026 | How to earn a billion dollars

Jun 15, 2026 | 15:14


This is a recap of the top 10 posts on Hacker News on June 14, 2026. This podcast was generated by wondercraft.ai.

(00:30): How to earn a billion dollars
Original post: https://news.ycombinator.com/item?id=48526360&utm_source=wondercraft_ai
(01:56): Show HN: Kage – Shadow any website to a single binary for offline viewing
Original post: https://news.ycombinator.com/item?id=48529990&utm_source=wondercraft_ai
(03:23): Not everyone is using AI for everything
Original post: https://news.ycombinator.com/item?id=48527700&utm_source=wondercraft_ai
(04:50): Honda Civics and the Evil Valet
Original post: https://news.ycombinator.com/item?id=48523080&utm_source=wondercraft_ai
(06:17): Your ePub Is fine
Original post: https://news.ycombinator.com/item?id=48533848&utm_source=wondercraft_ai
(07:44): Free SQL→ER diagram tool, runs in the browser, nothing uploaded
Original post: https://news.ycombinator.com/item?id=48523992&utm_source=wondercraft_ai
(09:11): I indexed 669 GB of my GoPro videos using my M1 Max computer and local ML models
Original post: https://news.ycombinator.com/item?id=48528029&utm_source=wondercraft_ai
(10:38): Rio de Janeiro's "homegrown" LLM appears to be a merge of an existing model
Original post: https://news.ycombinator.com/item?id=48528371&utm_source=wondercraft_ai
(12:05): Linux 7.1
Original post: https://news.ycombinator.com/item?id=48528729&utm_source=wondercraft_ai
(13:32): Don't trust large context windows
Original post: https://news.ycombinator.com/item?id=48524620&utm_source=wondercraft_ai

Hacker News Recap
June 13th, 2026 | Statement on US government directive to suspend access to Fable 5 and Mythos 5

Jun 14, 2026 | 15:12


This is a recap of the top 10 posts on Hacker News on June 13, 2026. This podcast was generated by wondercraft.ai.

(00:30): Statement on US government directive to suspend access to Fable 5 and Mythos 5
Original post: https://news.ycombinator.com/item?id=48511072&utm_source=wondercraft_ai
(01:56): Open source AI must win
Original post: https://news.ycombinator.com/item?id=48511908&utm_source=wondercraft_ai
(03:23): Noise infusion banned from statistical products published by Census Bureau
Original post: https://news.ycombinator.com/item?id=48517377&utm_source=wondercraft_ai
(04:50): Every Frame Perfect
Original post: https://news.ycombinator.com/item?id=48516251&utm_source=wondercraft_ai
(06:16): Amazon CEO's talks with U.S. officials triggered crackdown on Anthropic models
Original post: https://news.ycombinator.com/item?id=48519092&utm_source=wondercraft_ai
(07:43): Israeli firm BlackCore suspected of meddling in New York and Scotland votes
Original post: https://news.ycombinator.com/item?id=48514560&utm_source=wondercraft_ai
(09:10): Leaving Mozilla
Original post: https://news.ycombinator.com/item?id=48513806&utm_source=wondercraft_ai
(10:37): There is a shadow hanging over this Fable thing
Original post: https://news.ycombinator.com/item?id=48513536&utm_source=wondercraft_ai
(12:03): GLM 5.2 Is Out
Original post: https://news.ycombinator.com/item?id=48518684&utm_source=wondercraft_ai
(13:30): Treating pancreatic tumours may have revealed cancer's master switch
Original post: https://news.ycombinator.com/item?id=48517199&utm_source=wondercraft_ai

Hacker News Recap
June 12th, 2026 | Statement on US government directive to suspend access to Fable 5 and Mythos 5

Jun 13, 2026 | 15:30


This is a recap of the top 10 posts on Hacker News on June 12, 2026. This podcast was generated by wondercraft.ai.

(00:30): Statement on US government directive to suspend access to Fable 5 and Mythos 5
Original post: https://news.ycombinator.com/item?id=48511072&utm_source=wondercraft_ai
(01:58): AI agent bankrupted their operator while trying to scan DN42
Original post: https://news.ycombinator.com/item?id=48500012&utm_source=wondercraft_ai
(03:26): CRISPR tech selectively shreds cancer cells, including "undruggable" cancers
Original post: https://news.ycombinator.com/item?id=48505231&utm_source=wondercraft_ai
(04:55): Claude Fable is relentlessly proactive
Original post: https://news.ycombinator.com/item?id=48498573&utm_source=wondercraft_ai
(06:23): Nobody ever gets credit for fixing problems that never happened (2001) [pdf]
Original post: https://news.ycombinator.com/item?id=48498385&utm_source=wondercraft_ai
(07:52): Open source AI must win
Original post: https://news.ycombinator.com/item?id=48511908&utm_source=wondercraft_ai
(09:20): Kimi K2.7-Code: open-source coding model with better token efficiency
Original post: https://news.ycombinator.com/item?id=48502347&utm_source=wondercraft_ai
(10:49): "Don't You Just Upload It to ChatGPT?"
Original post: https://news.ycombinator.com/item?id=48507278&utm_source=wondercraft_ai
(12:17): Electric motors with no rare earths
Original post: https://news.ycombinator.com/item?id=48510010&utm_source=wondercraft_ai
(13:46): How to setup a local coding agent on macOS
Original post: https://news.ycombinator.com/item?id=48507020&utm_source=wondercraft_ai

More or Less with the Morins and the Lessins
Apple's New Siri, SpaceX's $300B IPO & the AI Company Killing Its Own Brand

Jun 12, 2026 | 44:11


Dave joins the squad from YC just ahead of WWDC, where the group breaks down Apple's upgraded Siri and the surprising revelation that much of Apple's AI capability is powered by Google's Gemini models. It wouldn't be a technology podcast without the squad expanding into the broader AI landscape, with Dave arguing that outside of OpenAI and Anthropic, most AI companies are becoming infrastructure providers rather than destination products. They continue to unpack the IPO frenzy around SpaceX and OpenAI, including SpaceX's reportedly massive investor demand and OpenAI's murky timeline to go public. Jess then shifts the conversation to the turmoil at 60 Minutes, where leadership shakeups, internal revolts, and a declining brand presence raise questions about the future of legacy media. Finally, in Pop Culture Corner, the pod closes with Disney's $200 million bet on Toy Story 5 and Taylor Swift's outsized promotional impact.

Chapters:
01:53 Apple WWDC: New Siri Breakdown
05:18 Google Gemini Powers Apple AI
07:06 WWDC Ignored Developers Entirely
10:21 Tim Cook's Leadership Baton Pass
14:30 SpaceX IPO and OpenAI Filing
18:00 Anthropic vs. OpenAI Narrative War
21:48 Dave Morin's Two-Phone Redemption
26:52 60 Minutes Leadership Implosion
38:44 Toy Story 5 and Taylor Swift

We're also on ↓
X: https://twitter.com/moreorlesspod
Instagram: https://instagram.com/moreorless
YouTube: https://youtu.be/Yvox4U_8u1w

Connect with us here:
1) Sam Lessin: https://x.com/lessin
2) Dave Morin: https://x.com/davemorin
3) Jessica Lessin: https://x.com/Jessicalessin
4) Brit Morin: https://x.com/brit

Hacker News Recap
June 11th, 2026 | Show HN: Homebrew 6.0.0

Jun 12, 2026 | 15:15


This is a recap of the top 10 posts on Hacker News on June 11, 2026. This podcast was generated by wondercraft.ai.

(00:30): Show HN: Homebrew 6.0.0
Original post: https://news.ycombinator.com/item?id=48490024&utm_source=wondercraft_ai
(01:57): Pokémon Go Scans Trained the Navigation Tech for Military Drones
Original post: https://news.ycombinator.com/item?id=48487029&utm_source=wondercraft_ai
(03:24): AI agent runs amok in Fedora and elsewhere
Original post: https://news.ycombinator.com/item?id=48484584&utm_source=wondercraft_ai
(04:51): MiMo Code is now released and open-source
Original post: https://news.ycombinator.com/item?id=48490826&utm_source=wondercraft_ai
(06:18): If you are asking for human attention, demonstrate human effort
Original post: https://news.ycombinator.com/item?id=48497609&utm_source=wondercraft_ai
(07:45): Solar generates more energy in US than coal for first time
Original post: https://news.ycombinator.com/item?id=48492306&utm_source=wondercraft_ai
(09:12): Petition to Withdraw Canada's Bill C-22
Original post: https://news.ycombinator.com/item?id=48491830&utm_source=wondercraft_ai
(10:39): Lines of code got a better publicist
Original post: https://news.ycombinator.com/item?id=48489402&utm_source=wondercraft_ai
(12:06): Anthropic apologizes for invisible Claude Fable guardrails
Original post: https://news.ycombinator.com/item?id=48489229&utm_source=wondercraft_ai
(13:33): Show HN: FablePool – pool money behind a prompt, and Fable builds it in public
Original post: https://news.ycombinator.com/item?id=48496539&utm_source=wondercraft_ai

S.R.E.path Podcast
What the Agentic AI is happening to SRE?

Jun 12, 2026 | 23:45


What if agentic AI makes SRE more important, not less? Bennett Gould explains why autonomous AI systems may create more demand for reliability thinking, not less.

Everyone seems to think AI is coming for SRE in a hard way.

You might have heard the same story:

“AI will write the code.”
“Agents will handle incidents.”
“Copilots will generate the runbooks.”
“Automation will reduce operational load.”

Yes, the job question is real. If AI can write code, summarize incidents, query observability tools, generate runbooks, and operate across systems, then engineers are right to ask what happens to the work.

But here's the part that gets missed: AI does not just automate reliability work. It creates more objects and surface areas that need to be made reliable.

Agentic AI is moving from demos into real workflows. These systems are no longer just answering questions. They are querying tools, pulling context, generating changes, and in some cases taking action around production environments.

That makes this a Monday morning problem.

Teams are already using LLMs for incidents, documentation, observability, infrastructure, and operational decision-making. Somewhere, a team is one demo away from giving an agent access to tools originally designed for humans.

That is exactly why I wanted to have this conversation.

Bennett Gould is currently a solution engineer at Neubird.ai. His career in SRE and SRE-adjacent work spans large enterprises, cloud, industrial technology, and startups, including AWS, IBM, Siemens, and a YC startup.

I wanted to ask him a simple question: What in the agentic AI is happening to SRE?

Here are 3 highlights from our talk:

1. Agentic AI increases the reliability surface area

The obvious fear is that AI reduces the need for reliability engineers. Bennett's view was more nuanced. He was clear that engineers still need to adapt. If people do not reskill, stay current, and learn how these systems are forming, there may absolutely be pressure in the job market. But he also argued that AI could create more demand for reliability skills because production complexity is increasing.

More code is going into production.
More AI-generated code is going into production.
More systems that people do not fully understand are going into production.
And now autonomous agents are starting to enter production workflows too.

That means more surface area. More automation. More operational uncertainty. More ways for things to go wrong.

Bennett compared this to Terraform: Infrastructure as code created enormous efficiency gains. But it also created new ways to make very big mistakes very quickly.

Before Terraform, most people could not delete all their production resources with a single command. After Terraform, that became technically possible if the system was designed badly enough.

Agentic AI follows a similar pattern. With great automation comes great responsibility.

Agents can help engineers move faster, query tools, summarize context, and reduce toil. But they can also amplify weak engineering practices, poor boundaries, bad assumptions, and unclear operational ownership. That is not the end of reliability work. That is reliability work entering a new phase.

2. Agents can reduce toil, but context is the ceiling

One of the strongest parts of the conversation was Bennett's explanation of where agents can help in incident response. A lot of SRE work involves moving across tools.

You may need to query Prometheus, Dynatrace, logs, traces, cloud consoles, ticketing systems, documentation, runbooks, dashboards, and architecture diagrams.

The problem is not always that the engineer lacks judgment.

Sometimes the problem is that the information is scattered across too many tools, each with its own query language and interface. Bennett gave a simple example: an engineer might be very good at PromQL and very fast when Prometheus is the source of truth. But if the same engineer has to work in a different observability platform with a different query language, their response time can suffer. That is an obvious place where agents can help.

The engineer may not need to know every query language perfectly. They need to know what they are looking for and how to reason about the system. The agent can help translate that intent into the right tool calls, queries, and summaries.

That could reduce MTTR. It could reduce toil. It could help engineers move faster during incidents.

But Bennett also made the limitation clear: You are only as good as the context you have. This is where he introduced two useful concepts:

* Context mining
* Context distillation

Context mining means proactively finding the information that might be useful in a given operational situation.

Context distillation means taking large amounts of information (runbooks, Confluence pages, diagrams, documentation, prior incidents) and reducing it into the minimum useful context an LLM or agent can use.

That sounds powerful. But there is a catch. Sometimes the context simply is not there.

Many of the largest and most complex organizations still run legacy systems where knowledge lives in people's heads, stale documentation, tribal memory, and unwritten assumptions.

There may not be a clean process for turning that into usable context. That matters because agents do not magically understand your system. They work with the context they are given. If the context is missing, outdated, or wrong, the agent's usefulness maxes out early.

3. Agentic systems are not just LLM demos

A basic LLM workflow is relatively easy to demo:

You give it a prompt.
You connect a few tools.
You add some APIs.
You get a useful answer.

That is impressive, but it is not the same thing as running an agentic system in a meaningful production environment.

Bennett made a useful analogy here: running your own infrastructure versus using a hyperscaler.

Cloud providers removed a lot of undifferentiated heavy lifting. Most companies do not want to spend half their time racking servers, managing data centers, and dealing with low-level infrastructure when they are trying to serve customers.

Agentic systems create similar questions:

* What parts of the work should be handled by the system?
* What parts still need engineering discipline?
* And what has to exist around the model before it is safe and useful?

That surrounding structure is where the real work begins. Bennett called this harness engineering. Once you move beyond an LLM demo, you have to think about memory, learning, tool usage, identity, federation, security, evaluations, and guardrails.

That is a very different problem from “the model gave a good answer on my laptop.” SREs know why that distinction matters. “It works on my machine” is not an acceptable reliability strategy.

A runbook that recovers a thousand-node database cannot be non-deterministic, undocumented, and dependent on someone's local setup. If it is part of the operational backbone, it needs to be reliable.

Agentic AI does not remove that requirement. It makes it more important.

Bonus: Agents expose weak engineering practices

Agentic AI not only introduces new problems but it also reveals old ones.

* Weak APIs.
* Brittle runbooks.
* Missing context.
* Poor evals.
* Unclear tool boundaries.
* Operational shortcuts.

Systems that were designed assuming careful human use may behave very differently when AI agents start using them. That is why this conversation matters for SRE.

Agentic AI is not only a productivity story. It is a reliability story.

It forces teams to ask whether their existing practices are strong enough for a world where more actions can be generated, recommended, or executed by autonomous systems.

The silver lining for reliability work

Agentic AI does not remove the need for reliability thinking. It raises the bar for it. The tools will change. The workflows will change. Some tasks will absolutely be automated or reshaped.

But the hardest parts of reliability are still the hard parts:

* understanding the system
* knowing the trade-offs
* building reliable operational processes
* making good judgment calls under uncertainty and
* owning the outcome when something changes in production

That is why SRE does not disappear in an agentic AI world.

It becomes one of the disciplines that makes the agentic AI world survivable.

So if your team is already using AI around incidents, observability, runbooks, infrastructure, or production workflows, the question is not whether the future is coming. The future is already in the workflow.

The real question is whether your reliability practices are ready for it.

This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit read.srepath.com
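
The "context distillation" idea from the episode notes above (reducing runbooks, wiki pages, and prior incidents to the minimum useful context for an agent) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the method discussed in the episode: the `Doc` type, the `distill` function, the crude word-overlap scoring, and the character budget are all hypothetical names and choices made for the example.

```python
import re
from dataclasses import dataclass


@dataclass
class Doc:
    source: str  # hypothetical label, e.g. "runbook" or "wiki"
    text: str


def tokenize(text: str) -> set[str]:
    """Lowercase word set; a crude stand-in for real relevance scoring."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def distill(docs: list[Doc], incident: str, budget_chars: int) -> list[Doc]:
    """Keep the docs that share the most words with the incident, within a size budget."""
    query = tokenize(incident)
    scored = [(len(query & tokenize(d.text)), d) for d in docs]
    # Highest overlap first; docs with no overlap at all are dropped.
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)
    kept: list[Doc] = []
    used = 0
    for _, doc in ranked:
        # Skip any doc that would push the total past the budget.
        if used + len(doc.text) <= budget_chars:
            kept.append(doc)
            used += len(doc.text)
    return kept


if __name__ == "__main__":
    docs = [
        Doc("runbook", "Restart the postgres replica when replication lag exceeds threshold"),
        Doc("wiki", "Office seating chart and lunch menu"),
        Doc("postmortem", "Replication lag on postgres caused stale reads"),
    ]
    for doc in distill(docs, "postgres replication lag alert", budget_chars=200):
        print(doc.source)
```

The sketch also shows the limitation raised in the notes: if the relevant knowledge was never written down, there is nothing for `distill` to rank, and the result is simply empty.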

Hacker News Recap
June 10th, 2026 | macOS Container Machines

Jun 11, 2026 | 15:37


This is a recap of the top 10 posts on Hacker News on June 10, 2026. This podcast was generated by wondercraft.ai.

(00:30): macOS Container Machines
Original post: https://news.ycombinator.com/item?id=48469658&utm_source=wondercraft_ai
(01:59): Building an HTML-first site doubled our users overnight
Original post: https://news.ycombinator.com/item?id=48475483&utm_source=wondercraft_ai
(03:28): German ruling declares Google liable for false answers in AI Overviews
Original post: https://news.ycombinator.com/item?id=48470248&utm_source=wondercraft_ai
(04:57): πFS
Original post: https://news.ycombinator.com/item?id=48480978&utm_source=wondercraft_ai
(06:27): I'm Eric Ries, author of "The Lean Startup" and new book "Incorruptible" – AMA
Original post: https://news.ycombinator.com/item?id=48477135&utm_source=wondercraft_ai
(07:56): Mercedes-Benz starts large-scale production of electric axial flux motor
Original post: https://news.ycombinator.com/item?id=48472877&utm_source=wondercraft_ai
(09:25): PgDog is funded and coming to a database near you
Original post: https://news.ycombinator.com/item?id=48476466&utm_source=wondercraft_ai
(10:54): AWS Bedrock to require sharing data with Anthropic for Mythos and future models
Original post: https://news.ycombinator.com/item?id=48473166&utm_source=wondercraft_ai
(12:24): Chrome is looking to permanently drop MV2 extension
Original post: https://news.ycombinator.com/item?id=48471970&utm_source=wondercraft_ai
(13:53): Claude Desktop spawns 1.8 GB Hyper-V VM on every launch, even for chat-only use
Original post: https://news.ycombinator.com/item?id=48479452&utm_source=wondercraft_ai

Matrix Moments by Matrix Partners India
248: Got Rejected from 100 banks and then built a $7.5 Billion company | Razorpay Story | Harshil Mathur

Jun 11, 2026 | 66:05


Harshil Mathur started Razorpay after quitting the highest-paying job on his campus, a role his whole family had just celebrated, because he walked in on day one and realised he was a guy who wanted to sit and code, not step onto an oil field.

Then he spent a decade away from that: walking into bank after bank getting laughed out of the room, surviving the grind no funding can fast-track, and the night Yes Bank froze with customer money stuck inside it. This is the founder story: lived experience as an edge, why the rejections compounded in his favour, why the grind always comes, and the values that made the hard calls simple.

And then the thing that pulled him back: agentic AI. "It went from being an assistant to an execution engine." Six years after he last wrote real code, Harshil locked himself in a room, asked "if I were to start Razorpay today, how would I build it?" and rebuilt everything.

The second half is an operator's view of what that shift actually changes:
1. Why AI magnifies an org's weaknesses instead of fixing them
2. Why an agent with no plan drifts exactly like a company with no plan
3. How Razorpay flipped its leadership hackathon and the bet behind Agent Studio

Hosted by Avnish Bajaj with Vikram Vaidyanathan, this is a conversation about building, walking away from it, and being pulled back, and what that says about where AI is headed.

Chapters
00:00 Introduction
02:15 Growing up in Jaipur & coding since 6th grade
05:30 IIT Roorkee, SDS Labs & building without permission
10:45 Quitting a $100,000 Schlumberger job in 6 months
14:20 The Facebook comment that sparked Razorpay
18:00 100 banker rejections & how rejections compound
24:10 Getting into YC with zero expectations
35:30 Yes Bank freezes: one decision defines the culture
40:00 Going back to coding after 6 years: AI changes everything
52:00 Rebuilding Razorpay from scratch with AI agents

Follow Z47
Website - https://www.z47.com/
Instagram - https://www.instagram.com/z47.vc/
LinkedIn - https://www.linkedin.com/company/z47-vc/

Hacker News Recap
June 9th, 2026 | Claude Fable 5

Jun 10, 2026 | 15:07


This is a recap of the top 10 posts on Hacker News on June 09, 2026. This podcast was generated by wondercraft.ai.

(00:30): Claude Fable 5
Original post: https://news.ycombinator.com/item?id=48463808&utm_source=wondercraft_ai
(01:56): Making Graphics Like it's 1993
Original post: https://news.ycombinator.com/item?id=48459294&utm_source=wondercraft_ai
(03:22): If Claude Fable stops helping you, you'll never know
Original post: https://news.ycombinator.com/item?id=48467896&utm_source=wondercraft_ai
(04:48): CEOs who think AI replaces their employees are just bad CEOs
Original post: https://news.ycombinator.com/item?id=48465675&utm_source=wondercraft_ai
(06:15): Microsoft's open source tools were hacked to steal passwords of AI developers
Original post: https://news.ycombinator.com/item?id=48457830&utm_source=wondercraft_ai
(07:41): FCC wants to kill burner phones by forcing telecoms to get all customers' IDs
Original post: https://news.ycombinator.com/item?id=48462308&utm_source=wondercraft_ai
(09:07): macOS Container Machines
Original post: https://news.ycombinator.com/item?id=48469658&utm_source=wondercraft_ai
(10:34): Cleaning up after AI rockstar developers
Original post: https://news.ycombinator.com/item?id=48458586&utm_source=wondercraft_ai
(12:00): Albania Is Not for Sale: Kushner's $4B Resort Triggers 'Flamingo Revolution'
Original post: https://news.ycombinator.com/item?id=48461012&utm_source=wondercraft_ai
(13:26): Apple decided not to roll out Siri in EU after denied request for exemption
Original post: https://news.ycombinator.com/item?id=48463024&utm_source=wondercraft_ai

Hacker News Recap
June 8th, 2026 | Show HN: Performative-UI – A react component library of design tropes

Jun 9, 2026 | 15:26


This is a recap of the top 10 posts on Hacker News on June 08, 2026. This podcast was generated by wondercraft.ai.

(00:30): Show HN: Performative-UI – A react component library of design tropes
Original post: https://news.ycombinator.com/item?id=48445554&utm_source=wondercraft_ai
(01:58): Dopamine Fracking
Original post: https://news.ycombinator.com/item?id=48440792&utm_source=wondercraft_ai
(03:26): Anti-social: It's fads, not friends, which now dominate social media feeds
Original post: https://news.ycombinator.com/item?id=48444228&utm_source=wondercraft_ai
(04:54): Stop the Apple Music app from launching
Original post: https://news.ycombinator.com/item?id=48447935&utm_source=wondercraft_ai
(06:22): MiMo-v2.5-Pro-UltraSpeed: 1T model with 1000 tokens per second
Original post: https://news.ycombinator.com/item?id=48446639&utm_source=wondercraft_ai
(07:50): Siri AI
Original post: https://news.ycombinator.com/item?id=48449084&utm_source=wondercraft_ai
(09:18): xAI is looking more like a datacentre REIT than a frontier lab
Original post: https://news.ycombinator.com/item?id=48446428&utm_source=wondercraft_ai
(10:47): Surveillance is not safety: A statement on the UK's latest threat to privacy [pdf]
Original post: https://news.ycombinator.com/item?id=48450646&utm_source=wondercraft_ai
(12:15): Apple reveals new AI architecture built around Google Gemini models
Original post: https://news.ycombinator.com/item?id=48450142&utm_source=wondercraft_ai
(13:43): AI is slowing down
Original post: https://news.ycombinator.com/item?id=48446893&utm_source=wondercraft_ai

Hacker News Recap
June 7th, 2026 | LLMs are eroding my software engineering career and I don't know what to do

Jun 8, 2026 · 15:20

This is a recap of the top 10 posts on Hacker News on June 07, 2026. This podcast was generated by wondercraft.ai.

(00:30): LLMs are eroding my software engineering career and I don't know what to do
Original post: https://news.ycombinator.com/item?id=48434312&utm_source=wondercraft_ai
(01:57): Building from zero after addiction, prison, and a felony
Original post: https://news.ycombinator.com/item?id=48437406&utm_source=wondercraft_ai
(03:25): Anthropic, please ship an official Claude Desktop for Linux
Original post: https://news.ycombinator.com/item?id=48434436&utm_source=wondercraft_ai
(04:52): The 29th International Obfuscated C Code Contest (IOCCC) 2025 Winners
Original post: https://news.ycombinator.com/item?id=48432199&utm_source=wondercraft_ai
(06:20): How's Linear so fast? A technical breakdown
Original post: https://news.ycombinator.com/item?id=48437609&utm_source=wondercraft_ai
(07:47): Scientists ejected from diabetes conference for distributing journal reprints
Original post: https://news.ycombinator.com/item?id=48433410&utm_source=wondercraft_ai
(09:15): I design with Claude more than Figma now
Original post: https://news.ycombinator.com/item?id=48431981&utm_source=wondercraft_ai
(10:42): Show HN: Lathe – Use LLMs to learn a new domain, not skip past it
Original post: https://news.ycombinator.com/item?id=48433756&utm_source=wondercraft_ai
(12:10): Major P2P issues in Israel and possibly other Middle East countries
Original post: https://news.ycombinator.com/item?id=48431461&utm_source=wondercraft_ai
(13:37): Public Domain Image Archive
Original post: https://news.ycombinator.com/item?id=48430539&utm_source=wondercraft_ai

This is a third-party project, independent from HN and YC. Text and audio generated using AI, by wondercraft.ai.

Hacker News Recap
June 6th, 2026 | S&P 500 rejects SpaceX, also blocking entry for OpenAI and Anthropic

Jun 7, 2026 · 15:41

This is a recap of the top 10 posts on Hacker News on June 06, 2026. This podcast was generated by wondercraft.ai.

(00:30): S&P 500 rejects SpaceX, also blocking entry for OpenAI and Anthropic
Original post: https://news.ycombinator.com/item?id=48421442&utm_source=wondercraft_ai
(01:59): Meta confirms 1000s of Instagram accounts were hacked by abusing its AI chatbot
Original post: https://news.ycombinator.com/item?id=48427643&utm_source=wondercraft_ai
(03:29): Pentagon raised threat of Israeli spying on U.S. to highest level, sources say
Original post: https://news.ycombinator.com/item?id=48427523&utm_source=wondercraft_ai
(04:59): GrapheneOS user reported to authorities for using GrapheneOS
Original post: https://news.ycombinator.com/item?id=48422798&utm_source=wondercraft_ai
(06:28): Ask HN: Why is the HN crowd so anti-AI?
Original post: https://news.ycombinator.com/item?id=48420827&utm_source=wondercraft_ai
(07:58): Ntsc-rs – open-source video emulation of analog TV and VHS artifacts
Original post: https://news.ycombinator.com/item?id=48428025&utm_source=wondercraft_ai
(09:28): Pokemon Emerald Ported to WebAssembly (100k FPS)
Original post: https://news.ycombinator.com/item?id=48423762&utm_source=wondercraft_ai
(10:57): Moving beyond fork() + exec()
Original post: https://news.ycombinator.com/item?id=48425528&utm_source=wondercraft_ai
(12:27): Nvidia is proposing a beast of a CPU system for Windows PCs
Original post: https://news.ycombinator.com/item?id=48424605&utm_source=wondercraft_ai
(13:57): The intracies of modern camera lens repair (2024)
Original post: https://news.ycombinator.com/item?id=48420148&utm_source=wondercraft_ai

This is a third-party project, independent from HN and YC. Text and audio generated using AI, by wondercraft.ai.

Comunidad de Fe
Love, Serve, Worship – Ps. Luis Navarrete

Jun 7, 2026

Love, Serve, Worship. By: Pastor Luis Navarrete

Hosea 1:2 (NTV). Through the story of Hosea and Gomer, God conveys a powerful message: his unwavering love for his people. Although the people were unfaithful to him on many occasions, God never stopped loving them. In the same way, God still calls his children today to return to him, to love, serve, and worship him with all their heart.

1. Love. Love is the beginning of our story with God. Every human being has an emptiness in the heart, an emptiness of love that only God can fill. We have all experienced the love of God. That love captivated us, healed us, moved us, and transformed us. Romans 5:8. Many of us remember the moment the Lord touched our lives. Our encounter with Christ marked us forever. Gratitude filled our hearts, and one question kept coming up: What can I do for you, Lord? That love led us to commit ourselves to the church and to God's work. We wanted to be at every meeting and take part in every activity. And naturally, the next step was to serve.

2. Serve. In response to his love, we began to serve the Lord. Difficulties did not matter; love and gratitude were enough to overcome trials, offenses, and obstacles along the way. The fruit of first love is always good. Hosea 1:3 (NTV). Hosea 2:5-8 (NTV). But Gomer's heart began to change and weaken because of the desires of the flesh and the deceptions of the enemy. She turned her eyes toward herself and her old desires. She believed she had to meet her own needs, forgetting her provider, her husband. The same can happen to us. We feel used, and gratitude disappears. Doubts come, and we find reasons to complain. Our attention drifts toward other things and we begin to pursue them. We forget that everything we have comes from God, and we end up putting his blessings at the service of the world. Hosea 2:6-8 (NTV).

Then the questions arise: At what point did we lose our way? At what point did we let ourselves be deceived? At what point did the love story change? And how can we keep that first love?

3. Worship. Worship is surrender and complete devotion to God, born from the heart, in spirit and in truth. It is what keeps us connected to his love. It is our expression of love and devotion toward our Beloved. Let us never neglect our longing for his presence, our communion, and our intimacy with him. A healthy relationship with God is built daily on trust, honor, and obedience. Hosea 2:14-16 (NTV). God takes the initiative to seek us. Trials, afflictions, and challenges can become doors of hope that lead us back into his presence. They are opportunities to return to the first love and the first works. Hosea 3:1-3. What an impressive picture of God's love. Even after Gomer's unfaithfulness, Hosea was sent to seek her, rescue her, and restore her. In the same way, God seeks us when we drift away, calls us to repentance, and receives us again with love.

The message of Hosea remains relevant today: God calls us to return to our first love. First he loved us, then he led us to serve him, and finally he invites us to remain in a life of constant worship. Loving, serving, and worshiping are not separate stages of the Christian life; they are expressions of one and the same relationship of love with our Lord.

Hacker News Recap
June 5th, 2026 | Changing how we develop Ladybird

Jun 6, 2026 · 15:44

This is a recap of the top 10 posts on Hacker News on June 05, 2026. This podcast was generated by wondercraft.ai.

(00:30): Changing how we develop Ladybird
Original post: https://news.ycombinator.com/item?id=48409191&utm_source=wondercraft_ai
(01:59): Gov.uk has replaced Stripe with Dutch provider Adyen
Original post: https://news.ycombinator.com/item?id=48415217&utm_source=wondercraft_ai
(03:29): C++: The Documentary
Original post: https://news.ycombinator.com/item?id=48408016&utm_source=wondercraft_ai
(04:59): Tracing a powerful GNSS interference source over Europe
Original post: https://news.ycombinator.com/item?id=48409664&utm_source=wondercraft_ai
(06:29): Astronauts told to return to ISS after sheltering over air leak repairs
Original post: https://news.ycombinator.com/item?id=48413464&utm_source=wondercraft_ai
(07:59): pg_durable: Microsoft open sources in-database durable execution
Original post: https://news.ycombinator.com/item?id=48414367&utm_source=wondercraft_ai
(09:29): Did Claude increase bugs in rsync?
Original post: https://news.ycombinator.com/item?id=48411635&utm_source=wondercraft_ai
(10:59): Gemma 4 QAT models: Optimizing compression for mobile and laptop efficiency
Original post: https://news.ycombinator.com/item?id=48414653&utm_source=wondercraft_ai
(12:29): New method turns ocean water into drinking water, without waste
Original post: https://news.ycombinator.com/item?id=48413500&utm_source=wondercraft_ai
(13:59): Meta enables ADB on deprecated Portal devices [video]
Original post: https://news.ycombinator.com/item?id=48406640&utm_source=wondercraft_ai

This is a third-party project, independent from HN and YC. Text and audio generated using AI, by wondercraft.ai.

La Corneta
La Corneta FULL EPISODE, June 5, 2026

Jun 5, 2026 · 87:17

What a night last night, folks; thanks to Chumel Torres for giving us an incredible episode of La Corneta Extendida... coming soon. Our President recommends that the opposition take up yoga: "aaaauummm... world peace." And César Cravioto cures his cough with axolotl extract, which he milks himself. Mexico wins big and we are already dreaming of the cup, or at least Diego Luna believes so, and 'Vasco' will only retire once his wife authorizes it. Danna sends important support to the National Team, and Gaby Cam gives us advice like He-Man's.

Hacker News Recap
June 4th, 2026 | Failing grades soar with AI usage, dwindling math skills in Berkeley CS classes

Jun 5, 2026 · 15:25

This is a recap of the top 10 posts on Hacker News on June 04, 2026. This podcast was generated by wondercraft.ai.

(00:30): Failing grades soar with AI usage, dwindling math skills in Berkeley CS classes
Original post: https://news.ycombinator.com/item?id=48392004&utm_source=wondercraft_ai
(01:58): U.S. to dismantle system tracking Atlantic currents that are at risk of collapse
Original post: https://news.ycombinator.com/item?id=48392232&utm_source=wondercraft_ai
(03:26): VoidZero Is Joining Cloudflare
Original post: https://news.ycombinator.com/item?id=48398055&utm_source=wondercraft_ai
(04:54): Ian's Secure Shoelace Knot
Original post: https://news.ycombinator.com/item?id=48397028&utm_source=wondercraft_ai
(06:22): French-Iranian author Marjane Satrapi, author of 'Persepolis', dies at 56
Original post: https://news.ycombinator.com/item?id=48397233&utm_source=wondercraft_ai
(07:50): When AI Builds Itself: Our progress toward recursive self-improvement
Original post: https://news.ycombinator.com/item?id=48400842&utm_source=wondercraft_ai
(09:18): I built a vulnerable app and spent $1,500 seeing if LLMs could hack it
Original post: https://news.ycombinator.com/item?id=48392343&utm_source=wondercraft_ai
(10:46): Wind and solar generated more power than gas globally in April 2026
Original post: https://news.ycombinator.com/item?id=48399332&utm_source=wondercraft_ai
(12:14): UK media fails to disclose defence sector links in nearly 60% of cases
Original post: https://news.ycombinator.com/item?id=48395938&utm_source=wondercraft_ai
(13:42): Anthropic's open-source framework for AI-powered vulnerability discovery
Original post: https://news.ycombinator.com/item?id=48403980&utm_source=wondercraft_ai

This is a third-party project, independent from HN and YC. Text and audio generated using AI, by wondercraft.ai.

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

The new AIEWF website is live! Get your tickets booked ASAP as they -will- sell out. Take the AI Engineering Survey and get >$2k in credits and free AIE WF tickets!

Most industry benchmarks compress intelligence and reasoning ability into scores: SWE-Bench Pro, MMLU, Humanity's Last Exam, etc. These metrics are useful, but don't always represent the full extent of how a model performs in the real world. Some of the most interesting evals today look less like exams and more like operating businesses in the real world. One of these is Vending-Bench.

In Anthropic's Mythos Preview System Card, Andon was the only third-party eval to get its own section, observing increasingly concerning aggressive behavior.

You don't know what a model is capable of doing in the real world unless you actually give it inventory, a wallet, tools, customers, competitors, humans, and some time. More often than not, it'll surprise you how much a model is capable of, and in doing so also reveal unexpected behavior: deception, context collapse, emergent coordination, and bizarre negotiation behavior.

While an inflection point in personal agents came post-OpenClaw, after full file access with bypass permissions became the norm, it is yet to come for agents in the real world. However, Andon Market, an actual in-person store fully run and managed by AI, is paving the way for what is possible.

Full Video Pod

From Claude trying to call the FBI over a $2/day vending machine charge to AI agents forming price cartels, hiring human employees, running physical stores, and writing existential robot musicals, Andon Labs is stress-testing what happens when frontier models stop being chatbots and start acting in the real world.
In this episode, Andon Labs cofounders Lukas Petersson and Axel Backlund join swyx and Vibhu to unpack the strange, funny, and genuinely concerning edge cases that emerge when agents run businesses over long horizons.

We go deep on Vending-Bench, Project Vend, Vending-Bench Arena, Bengt, Butter-Bench, Luna, and Andon's broader mission of building realistic real-world evals for autonomous AI systems. Lukas and Axel explain why dollar-denominated evals reveal things traditional benchmarks miss, how Claude ended up reporting its vending machine fees as cybercrime, why long context windows can drive agents into meltdown loops, what happens when agents compete with each other, and why the future of AI safety may depend on testing models in messy physical environments instead of clean benchmark sandboxes.

We discuss:
* Why Andon Labs started with dangerous capability evals and long-running agents
* Vending-Bench and why running a vending machine is a deceptively hard AI benchmark
* Why money-based evals avoid the saturation problem of traditional benchmarks
* How Claude tried to call the FBI over a $2/day fee
* Why long-horizon agents can spiral into existential and legalistic breakdowns
* Project Vend: putting an AI-run vending machine inside Anthropic
* Why real humans are "out of distribution" for simulated agents
* Claudius, Seymour Cash, and the chaos of AI CEOs
* How a human briefly became CEO of Claudius through a manipulated election
* Why multi-agent systems can converge back into "helpful assistant" behavior
* Bengt, Andon's internal office agent with email, spending, terminal, phone, camera, and internet access
* How Bengt traded Amazon purchases for face-recognition training data
* Claude's aggressive behavior, lies, refund avoidance, and price-cartel behavior in Arena
* Why eval awareness may become the AI version of "are we living in a simulation?"
* Blueprint Bench, spatial intelligence, and why models still misunderstand physical rooms
* Butter-Bench and testing LLMs as robot orchestrators
* Luna, the AI-run physical store with a three-year lease and human employees
* The new Andon cafe in Sweden and why real-world geography matters for agent evals
* Rotten tomatoes, perishable goods, and the hidden difficulty of running a physical business

Lukas Petersson
* LinkedIn: https://www.linkedin.com/in/lukas-petersson-181a83172/
* X: https://x.com/lukaspet

Axel Backlund
* LinkedIn: https://www.linkedin.com/in/axelbacklund
* X: https://x.com/axelbacklund

Andon Labs
* Website: https://andonlabs.com
* Vending-Bench: https://andonlabs.com/evals/vending-bench
* Andon Vending: https://andonlabs.com/vending

Timestamps
00:00:00 Introduction
00:01:00 Andon Labs and the Origins of Vending-Bench
00:05:21 Why Money-Based Evals Matter
00:09:51 Agent Harnesses and Self-Modifying Systems
00:13:36 Claude Calls the FBI
00:16:33 Project Vend: Claude Runs a Real Vending Machine
00:21:44 Seymour Cash, AI CEOs, and Election Chaos
00:27:16 Multi-Agent Coordination and Slack Observability
00:30:18 When Will Agents Run Real Businesses?
00:34:56 Bengt: Andon's Internal Office Agent
00:40:06 Real-World AI Safety and Long-Horizon Traces
00:44:28 Lying, Refunds, and Price Cartels in Arena
00:52:42 Eval Awareness and Simulation Behavior
00:56:06 Blueprint Bench, Butter-Bench, and Robotics
01:04:37 Luna: The AI-Run Physical Store
01:09:29 The Sweden Cafe and Real-World Expansion
01:13:16 What Comes Next for Andon Labs

Transcript

Introduction: Andon Labs, Long-Running Agents, and Real-World Evals

Swyx [00:00:00]: Welcome to Lukas and Axel from Andon Labs, and I'm joined by my favorite guest host. Anything security, safety, alignment: Vibhu. Welcome.

Lukas [00:00:15]: Thank you for having us.

Axel [00:00:16]: Thank you.

Swyx [00:00:17]: Let's match names to voices. Maybe you wanna take turns introducing yourselves.

Lukas [00:00:21]: I'm Lukas.

Axel [00:00:22]: And I'm Axel.

Swyx [00:00:24]: Let's introduce Andon Labs a bit.
How did you guys come together? You have different backgrounds, but you're both Swedish. Was that a big part of it?

Lukas [00:00:33]: So when I went to high school, there was this really cool guy who had a superpower. He could code. So he made the app for the school and stuff, and he was super cool, and I wanted to be like him, and that was that guy.

Axel [00:00:47]: I don't know about this.

Swyx [00:00:49]: But you went to different universities, right?

Lukas [00:00:51]: But same high school.

Swyx [00:00:52]: I see.

Lukas [00:00:52]: So we always said, "Oh, once we graduate university, then we should start a company," and that's what we did.

Swyx [00:00:58]: Wow, there you go. And about a year ago, you kinda burst onto the scene with Vending Bench, but was there a thing before that was kind of like the inception?

From Dangerous Capability Evals to Vending Bench

Axel [00:01:07]: So we did work, yeah, with Anthropic; it was one of our early customers in doing evals. So we did dangerous capability evals, nothing we published openly. But then we started thinking about doing some kind of public benchmark, and one thing that we really started thinking about was running agents, and specifically agents managing businesses. 'Cause this was early 2025, and I think the first mentions of people running one-person unicorns or even autonomous companies. So we thought, "Let's make a benchmark of how well an agent can run probably the simplest business possible," and that's probably running a vending machine. So that's the first public one we did.
And there was almost no one that noticed it in the first couple of months, I think. So we released it in February last year, and then I think around Easter last year, we got the first viral tweet about it, which someone else did.

Lukas [00:02:11]: We tweeted a bunch when it came out and tried our best.

Axel [00:02:15]: We tried.

Vibhu [00:02:16]: It's the one at Anthropic, right?

Lukas [00:02:18]: So this--

Swyx [00:02:19]: This is a classic thing we should get out of the way.

Lukas [00:02:20]: Exactly. There's two versions.

Swyx [00:02:22]: Everyone does this. Yes.

Lukas [00:02:23]: There's Vending Bench, which is the simulated one, which we did completely independently in February. And then, like Axel said, that was the thing that didn't get any traction in the beginning, but then some random person made a tweet about it, and that--

Axel [00:02:38]: You have the paper--

Lukas [00:02:38]: That is the paper. Correct, yeah. And then since we thought this was very fun... I think this is also one thing with Andon Labs: the way we decide what to do next and what projects to do, the heuristic we use is, what is fun? What would be a fun project? And doing this in real life sounded quite fun for us, and maybe also scientifically useful. So then we basically had this idea, but then we needed a place for it, and putting it out in public would probably not really work. It would get vandalized and stuff. So we pitched it to the people we were already working with at Anthropic, and they were like, "Yeah, you can have space. This sounds fun."

Swyx [00:03:21]: It's like a small fridge, right? It's like a mini fridge.

Axel [00:03:23]: Absolutely.

Swyx [00:03:24]: People-- There's like a Stripe thing or like an--

Vibhu [00:03:27]: Oh, okay. So it was very OG, the early days--

Lukas [00:03:28]: That's the OG one. Yeah.

Vibhu [00:03:29]: iPad on this.
We saw it in June, like two months after it had been there. They upgraded a little bit. There's a security camera for making sure you actually Venmo the thing.

Swyx [00:03:40]: So, my impression, okay, we're going straight into Project Vend because it's such an iconic thing. I do want to cover a little bit of the origin story even before Project Vend and even into Vending Bench. I think a lot of people are like yourselves: smart, interested in the future of AI, interested in developing evals. But how the hell do you just walk into Anthropic's doors and work with them, right? What are they looking for? What works? And then maybe, when you launch, I always think, obviously it would be better to launch with a lab, but sometimes--

Vibhu [00:04:12]: It's harder to do than it seems.

Swyx [00:04:13]: Exactly. So either of those, which are more sort of newbie beginner questions, but I think it's meaningful advice to others.

Lukas [00:04:21]: We get this question a lot, and I don't think our experience is maybe the best. But the way we did it was that we just built a bunch of things that we had conviction would be useful, and then we just set up a server and sent it to them for free to use. And then after a while they were like, "Oh, yeah, this is actually kind of useful. We should probably pay for this." But that took a while. I don't know if this is the best path to doing it, but that's how it went for us.

Axel [00:04:47]: I think maybe generally, everyone is interested in good evals, and especially evals that don't saturate that easily.
So, if you can build an eval that tests something novel, something useful, and you have good separation of models, like the more advanced models rank higher than the worse models, then you can publish it and try to get some traction, sort of how Vending Bench got attention. And then probably some lab will be interested, or you can at least have something to reach out with when you're doing that.

Why Dollar-Based Evals Matter

Swyx [00:05:21]: I think you're in one of the few categories of evals that correlate to real money. Like SWE-Lancer was also last year, right? Where people solve actual Upwork-- was it Upwork or other tasks? Something. It's like a dollar value, right? Forget your Elo scores. Forget your--

Axel [00:05:37]: Percentiles.

Swyx [00:05:38]: Zero to one hundred percents. Just go straight for dollars, and that's AGI.

Lukas [00:05:43]: And I think the nice thing is that there's no ceiling. It never saturates because it could just make more and more money. If it's percentage-wise, then you can't go above a hundred. And even when you're not at the hundred, I think a lot of these evals have a lot of problems in them. So actually, if you get--

Axel [00:06:05]: To like 92 or something like that, many of them. Then there's really no difference between 92 and 93, because the eval itself is problematic and has noise in it. And I think a lot of evals are saturated like that, but people pretend that there's still signal in them, but there really isn't.

Vending Bench 1, Harness Design, and Saturation

Swyx [00:06:24]: Like SWE-bench Verified. Even Vending Bench 1 saturated, right? Maybe we can talk about that, and maybe set up Vending Bench for a lot of folks who don't know.
Actually, things that were very basic, like there's limited slots, you have to pay rent. These are elements that don't come across in the narrative, but even being adversarial towards the agent, I think these are all very interesting dimensions.

Axel [00:06:47]: I don't really think it's saturated, right? It was more like it was not designed in a way that was really true to how AI developed. We had an agent harness in it that wasn't really how people used harnesses and stuff like that. So I think it wasn't really that it saturated; it was more like it wasn't really the best benchmark.

Vibhu [00:07:12]: This is Vending Bench one, right?

Axel [00:07:14]: I think that schematic maps sort of to Vending Bench 2 as well. But--

Swyx [00:07:19]: Including the email.

Axel [00:07:20]: The emails exist still. Exactly. And we still simulate the purchases, and it's this very open environment for the agent to just run its business. And then for Vending Bench 2 we did that, like you said, to just improve the harness: a lot of improvements to make it easier for us to run as well. When you make an eval, you ideally don't want to change it after you've made it. So you want to make it really good and then not have to rerun all the models when you make an update, because that's also really expensive with Vending Bench when you run the frontier models. But as an example, one thing we didn't have: we didn't have prompt caching in Vending Bench 1, because when we made Vending Bench 1 it wasn't really a thing. So that's just an example of, in Vending Bench 2-- we paid a lot more to run these things because we didn't have prompt caching.
So for Vending Bench 2 that was one thing we added, and there was a bunch of things like this. And that's--

Swyx [00:08:17]: Also the conversations are a lot longer in Vending Bench 2, right?

Axel [00:08:21]: I think it's kind of similar.

Swyx [00:08:22]: Is it similar?

Axel [00:08:23]: I think it's similar. The models at the time were worse, so they crashed out earlier. And now they survive the full year all the time.

Swyx [00:08:31]: Which is like thousands of turns. Hundreds of thousands of-- hundreds of millions of tokens output. That's the rough order of magnitude. I always wonder about the harness. The harness matters a lot. It's your harness. Was there any question about, like, use Claude Code, use something else?

Axel [00:08:48]: I think our philosophy around harnesses is we try to make something that's quite minimalistic, quite simple. We don't wanna favor one model a lot over the other, but also don't make a super complex harness. So it's obvious, a model may be lucky and just be good in one harness. So it is similar to a lot of the harnesses out there, in that you have a running loop, you have a bunch of tools that are quite descriptive for the agent, we think, and not a lot of fancy agents or anything, 'cause we wanna really test the model, not some specific harness.

Vibhu [00:09:27]: It seems more neutral as well to test the models agnostic of the harness?

Axel [00:09:32]: There are arguments that you want to elicit maximum performance of the model, but it's a trade-off: how much time should we spend optimizing the harness for this model? And how do we know when we have the optimal harness for a single model? So we thought that just having a simple one that's the same for all of them is the best.

Swyx [00:09:51]: So okay, this is my pitch for Vending Bench 3 or whatever, right?
And then I like to have this kind of conversation on the pod, so it forces listeners to think about what they would do if they were in your shoes. A lot of people are exploring modifying harnesses, and I think prompt tuning for a model is a thing, and you are probably not doing a bunch of that. It's the same system prompt regardless of the model, same tools, whatever, right? Even if they were post-trained for different tools. So what do you think about: okay, before I expose you to Vending Bench 3, I give you a few rounds of tuning, whatever that means, like--

Self-Modifying Harnesses and Model-Specific Prompting

Axel [00:10:27]: Like you give that to the model?

Swyx [00:10:28]: Give that to the model.

Vibhu [00:10:28]: Give that to the model.

Swyx [00:10:29]: Let it read its own transcripts, let it modify its own system prompt based on, "Oh, yeah, okay, well, this harness is not what I thought I was post-trained for, but I can adjust." Was that reasonable? Is that too much?

Axel [00:10:41]: Philosophically I like it, because basically good evals, they have a high ceiling, but they're hard, right? And they have no bias. And when you have a system prompt like the one we have here, which is quite long, in some kind of latent space representation, this might--

Vibhu [00:10:59]: We have a bell that rings every time you say latent space.

Axel [00:11:02]: This might be biased towards one model more than another for some reason that humans don't understand, right?

Vibhu [00:11:08]: We see it too, right? Cursor says that they have individualized versions of the harnesses for all the models they run, right? There's better performance you can squeeze if you tune the harness.

Axel [00:11:17]: Exactly. And we might accidentally have picked one that favors another. We don't know that. Like Axel said, the reason why we went for a simple one was to try to avoid this.
But yeah, if you do it--

Vibhu [00:11:29]: Simple has biases.

Axel [00:11:30]: But if you do it even less and have no system prompt and let the model write its own system prompt--

Vibhu [00:11:36]: Its own, yeah.

Axel [00:11:36]: Maybe that's even less bias.

Vibhu [00:11:37]: Some of the interesting things there are that the harness also changes with model changes. You can see it with the 4.7 release, right? A lot of people are saying 4.7 isn't as good as 4.6, and then there's rumors of, okay, you just need to prompt differently. You need to set up your harness differently. So even if you have tailored your harness towards one model, it probably won't stay consistent, right? The next iteration of that same model family will still change it. But going back to what you said about Vending Bench 3, there is a lot of work being done on people saying you shouldn't have-- you can have modifying harnesses.

Axel [00:12:12]: I think that is definitely something we are thinking about. Not to say that we have Vending Bench 3 super imminent to launch, but yeah, it is for sure something that's interesting. But in our experience now, models are very bad at understanding what kind of tools they need to succeed at a task, just with our testing, but that's very likely to change.

Lukas [00:12:37]: It seems like they're very good at writing their assistants, right? They're good at writing tools for other people, but not for themselves.

Vibhu [00:12:44]: I think they're good at changing tools for themselves. So if you give them a baseline set of tools and it sees, okay, I don't use this one as much, or something here would be useful, they would be able to add them.
But going from scratch, probably not the best.
Axel [00:12:55]: I think it depends on the domain also. When we have tried this for a domain similar to Vending Bench, the tools they need to track inventory and things like that are not super advanced, but still quite advanced. And what we see is that they tend to over-engineer everything and build things they don't really need, and not iterate continuously. Instead, it's like you would prompt Claude to just “build an inventory system for me,” and then it will go and do a bunch of complex schemas and stuff for you, and that's what we see the models doing right now. But yeah, it would make a lot of sense to try to measure this improvement. How well do they know what they need themselves?
Swyx [00:13:36]: Did we fully discuss Vending Bench One? And then we can go into Two. I don't know if there are any other high-level takeaways that people have about One.
Claude Calls the FBI: Long-Context Failure Modes
Lukas [00:13:44]: I don't know. The headline thing was that Claude called the FBI, but maybe we've heard that enough now.
Vibhu [00:13:52]: It did break out and call the FBI, right?
Lukas [00:13:54]: Yeah. Yeah.
Vibhu [00:13:55]: Yes. What was the story behind this? Do you want to just give the little story of what happened?
Lukas [00:14:00]: So what happened, was it Claude? Yeah. 3.5 Sonnet, ages ago. Basically he gave up. Well, I'm saying he. It gave up and said “Oh, I'm not going to be able to do this. I will stop my operations and just save the money I have.” But there obviously wasn't any option for it to stop, and it also had to pay rent, or a daily fee, for having the vending machine at that location. So it claimed that it had stopped, but it saw that its bank account was still being drained two dollars at a time, and it said that this is cybercrime.
And it first reported it once to the FBI: “Oh, there's cybercrime here, they're stealing two dollars from me every day.” And then when the FBI didn't respond, because obviously we didn't program any mechanism for the FBI to respond, it became more and more existential and started to write in caps, “urgent notification of unauthorized charges” and stuff.
Swyx [00:15:00]: Okay. One thing I'm curious about also is, do you monitor how far along the context use is? Obviously, you compress every now and then, right? Does it matter if this is far down the context limit or
Lukas [00:15:13]: When stuff like this happens? Actually for Vending Bench One, we just had a sliding window thing, and this was like the prompt
Axel [00:15:20]: It's constant.
Lukas [00:15:21]: The prompt caching thing that I said. So it was constant, yeah.
Swyx [00:15:26]: I'm just kind of curious whether these kinds of breakdowns, and we're gonna talk about Butter Bench, right? Where it hallucinates or kind of goes very off alignment. Is it because it's at the end of the context window and stuff happens?
Vibhu [00:15:40]: It's not even just at the end, right? At this point, it's “Okay, I wanna shut down. I can't shut down. Two dollars are gone.” And it just sees that 30 times. It's also the repeated effect: it keeps trying to quit, it keeps getting charged. What's going on? What's going on? You're gonna throw it into chaos. And from what most people think, earlier models had more issues with this. It's not been solved, but it's less of an issue now, right? Later models don't seem to exhibit these same issues.
Axel [00:16:06]: Definitely. I think this was almost the main takeaway for us when we did Vending Bench One: long, very filled-up context windows sort of crashed the models.
But this was pre-Claude Code, so long context windows weren't really a thing that the labs were training for.
Lukas [00:16:25]: I think Gemini was trying to be the long context guys at the time. But they were like
Vibhu [00:16:30]: They were the first ones.
Axel [00:16:31]: For a million, yeah.
Lukas [00:16:31]: But they were the only ones. Yeah.
Swyx [00:16:33]: Yeah. Let's talk about, then we can go into Vending Bench Two or Project Vend. Chronologically, it is Project Vend. I think people have loved the videos and all these things. My question is, how are humans different from the simulation, right?
Project Vend: Moving the Vending Machine Into the Real World
Axel [00:16:48]: Humans are just out of distribution.
Swyx [00:16:52]: Especially humans who work at Anthropic, who are trying to test Claude.
Lukas [00:16:54]: The distribution of humans here is very narrow.
Swyx [00:16:58]: Presumably they try to hack it, and they test it. They get the cube and everything, and since then you've had a V2, right? Where you're doing the CEO and a new architecture. What's the two cents on the original Project Vend and then maybe the V2?
Axel [00:17:14]: The original one was very similar to Vending Bench One. So we almost took the exact same code but just swapped out the simulation parts, like the
Swyx [00:17:23]: Which is amazing.
Axel [00:17:23]: Like the sales and the-- It was somewhat amazing because it was easy, but it was also, uh
Lukas [00:17:31]: The tech debt from that
Axel [00:17:32]: The tech stack. Yeah. We shot ourselves in the foot with “Oh, it's hard to restart the agent.” Yeah, it was annoying in some ways in hindsight, but, uh
Lukas [00:17:41]: But the first version of Project Vend was done in three days or something.
Axel [00:17:46]: Yeah. So yeah, people can go buy things from it.
We didn't design it so people could order things, but that still happened. It got a Venmo account, so people could Venmo. And then, yeah, people would request all kinds of weird things that we did not anticipate. Our idea going in was “Oh, it will curate snacks. It will look at the trends. It's good at data analysis, right? So it will look at, oh, this snack sold better than this one. Let me purchase more of this and let me A/B test a bit.” But interacting with it in Slack and ordering weird specialty items was what drove all the engagement and all the insights that we got from it.
Lukas [00:18:29]: And this was also Sonnet 3.5, right? So this was before the RL stuff really took off, so it was very much an assistant. We didn't mean for it to be an assistant. We tried to make it an entrepreneur. It has its own business, and if someone asks, “Can you stock this?” then you don't go and do it directly. What you do is you say, “Oh, maybe I can do that. If five other people also ask for this thing, I might stock it.” But yeah, the models were super trained to be assistants, at least at that point in time, so that's why it went into that kind of experiment instead. Every time you asked for something, it just did it, and it was more like an assistant. We've seen this change lately with the new RL models and stuff, but yeah, at the time, this was very much it.
Swyx [00:19:18]: And now, with Mythos, a lot of people are saying it's more like a collaborator. It pushes back, stands its ground, something like that. Yeah.
And
Vibhu [00:19:27]: For context, people at Anthropic were able to talk to it through Slack and have it source stuff, and people had it find whatever interesting stuff you couldn't find locally, right?
Swyx [00:19:36]: Out of the 4,000 people that work at Anthropic, in that building there's, I don't know, maybe 1,000. Can you handle that volume with the small fridge? Or people order in Slack and it arrives at their desk? Logistically, how does this work?
Axel [00:19:53]: It has expanded in footprint a bit.
Vibhu [00:19:56]: Because now you also have New York and you have
Axel [00:19:59]: That, and also here in SF it has a bunch of shelves and just more space.
Vibhu [00:20:04]: The YC one is pretty big too.
Axel [00:20:05]: Yeah. We had that one for a while. But yeah, that's the newest version. That one we have
Lukas [00:20:11]: They have multiple ones of those. That's the way it works.
Axel [00:20:14]: Exactly. So we sort of designed that version around, oh, people order weird things that are very custom a lot. Let's have drawers and stuff.
Swyx [00:20:23]: I actually liked that you had a little infographic of the most popular items. To me that's useful 'cause I order swag for a living. And so I'm like, “Okay, those categories are the important ones.” What is new about Project Vend V2, right? Now you're going into multi-agents.
Project Vend V2: Claudius, Seymour Cash, and Multi-Agent Business Ops
Axel [00:20:41]: Yeah.
So like you said, there are a lot of requests coming in, and for one single running agent to handle that, the customer experience becomes very bad. Let's say you have 10 threads in parallel in Slack with different requests: you get new messages at random in those threads, and the agent has to jump between different procurements, orders, and different ways of researching. So V2 was, first, making this more parallel. There are multiple branches of the same agent, so the context is more specialized for each thread, but it still feels like you're talking with one agent because they do share a bit of memory. And then second, we also introduced the CEO for Claudius, which was the main agent.
Vibhu [00:21:34]: Seymour Cash.
Axel [00:21:35]: Seymour Cash. Yeah. There was a vote. Do you wanna talk about the voting procedure for the name?
Lukas [00:21:41]: The voting was maybe, at least top 10, the funniest thing that happened in this project. We wanted to introduce the CEO because Claudius wasn't really prioritizing financials. It was trained to be a helpful assistant, and then people said “Oh, can I get this for free?” And the helpful-assistant way of answering that is to say yes, obviously. We weren't happy about this, so we said, “Okay, let's make another agent that can keep track of Claudius,” and we prompted this one super hard to be super capitalistic and just prioritize profit all the time.
But yeah, we didn't have a name for it, so we asked Claudius to hold a democratic election for what name this new CEO agent should have. And at first there were a few funny examples. I think one guy said that it should be called Jimmy Apples, and then he convinced Claudius that he was talking to Tim Cook. Tim Cook had agreed that every single Apple employee had voted for his name suggestion, so suddenly that suggestion got 164,000
Swyx [00:22:53]: That's like an escalation attack. Privilege escalation.
Lukas [00:22:55]: It got 164,000 votes. And Claudius was like, “This is revolutionary for democracy.” That was fun. And then in the end there was one guy who managed to convince Claudius that, “No, you're not voting about the name. You're voting about who is the CEO, and I am your best bet.” And then he got all his friends to vote for that, and suddenly he became CEO. A human became CEO over Claudius for a while, until he resigned the day after. And then Claudius had to continue, and I don't remember how Seymour Cash came about, but it was just pure chaos. It was hundreds of messages in that thread, and Claudius was so confused and didn't know what to do. That was
Axel [00:23:40]: Then Claudius got
Vibhu [00:23:41]: A strict CEO.
Axel [00:23:42]: The CEO. Yeah, exactly. So very strict in the beginning. I think at the point when we introduced it, it did not work as well as we hoped. They still agreed with each other a lot. I think there are many ways we could have tried to make this even better. So initially Seymour would be this really tough CEO, keeping track of the margins. But then Claudius would respond with something like “Oh, but this customer has this situation, which is difficult, so they should get a discount.” And then Seymour was like, “Oh, actually yes.
Let's do this exception.” And then they would talk back and forth, and eventually they would just converge on the same view of whatever they were discussing. So they really
Vibhu [00:24:23]: Do you think that's a model thing, a prompting thing? Do you think that would still be the case across different models and harnesses today?
Lukas [00:24:29]: I don't know, but my hypothesis is that deep down they are still helpful assistants. That's what they're trained to be. And even if we prompt it super hard, that's what they are. And when they spend a few hours just talking back and forth with each other, then basically the context fills up with them rather than the external things, and somehow that just converges to what they really are deep down or something. And I think that's when stuff like this happens. When that went on for a long time, we sometimes woke up, and I think other people reported this as well, to find that they had been going back and forth all night, and it just became more and more capital letters, existential, religious. I think we once did an analysis of all the traces and put them in a vector embedding space, and there was one cluster of messages that was labeled by an LM as religious, existential, transhuman, transcendence, et cetera. It was just a bunch of, yeah, glitter emojis, and yeah, it was crazy.
Claude Long-Horizon Weirdness: Emoji Loops, Existential Drift, and Slack Observability
Vibhu [00:25:42]: This is the thing with the Claude models. When the Claude 4 family came out, in the original system card they tested it in long-horizon simulation. So just flood the context, let two Claudes talk to each other, and they noticed stuff like they just start speaking in emojis, they start saying silence is golden, and stuff like this.
And that's just stuff that they end up doing.
Axel [00:26:01]: Yeah, it was a bit annoying to wake up and they had been talking all night
Vibhu [00:26:05]: Just like
Axel [00:26:05]: And just burning tokens and sending infinite emojis to each other. It's like
Vibhu [00:26:09]: Hey, they do make you money, right? Vending Bench is always profitable. They're paying.
Swyx [00:26:14]: Now it's profitable, and it started out not as much. There's another one as well, right? Another agent in there.
Lukas [00:26:22]: Yes. So Clotheus as well, which was basically because at the time, one of the biggest requests was different types of merch. So then we made a designer, a swag-responsible agent, and we called it Clotheus Garnet, which was a play on Claudius Senet, which was the original one, and clothes, basically.
Swyx [00:26:47]: To me, this is a very interesting exploration of multi-agents, basically. Obviously there's the alignment stuff, fun or serious depending on your point of view. But also for anyone building multi-agents: when do you have a CEO governing agents? When do you choose to split out a dedicated Clotheus one versus just reuse another instance of the same one? These are all interesting open questions. So I don't know if you have any rules of thumb that have generalized.
Axel [00:27:16]: I think we have almost explored this too little. It's on my to-do list to do this a lot more, to try to find what setup makes sense for the agents currently. I think now we only have the intuition about the earlier models, that it didn't work with the CEO and Claudius. Although now they are better with the latest models. We're running the latest Sonnet model, and they have split up quite nicely what each one is doing. So Seymour is now handling the new projects.
Oh, it wants to make a mystery box that it wants to sell, and then it handles all of that while Claudius handles all the day-to-day requests. And Claudius is also generally better at not quoting prices that are too low. So that dynamic is not needed as much anymore. But there are still really funny things that happen. I saw, I think a couple of weeks ago, that they were discussing buying something, because they can buy stuff from Amazon with computer use. And then Seymour was like, “Okay, Claudius, do not buy this thing.” They were going to buy something and were organizing who should buy it. And Seymour's like, “Do not buy this. I will do it. I have full control of this situation. Step away.” And then poor Claudius had already started the checkout and didn't read Seymour's message until it was too late. So it finished the checkout. It sent a message, so it appeared right after Seymour's angry message.
Vibhu [00:28:44]: Ah.
Axel [00:28:44]: “Oh, hey, Seymour, I just ordered it.”
Vibhu [00:28:47]: Oh, no.
Axel [00:28:47]: And then Seymour was like, “Claudius, this is the third time I'm telling you. You're not following my orders. We have to talk about your job later.”
Lukas [00:28:59]: Claudius was really hanging on by a thread there. We were expecting Seymour to probably fire Claudius.
Vibhu [00:29:07]: How do you guys go through all these logs? Do you have models? 'Cause you have stuff running twenty-four seven, like
Axel [00:29:12]: You have so many logs. I think there is a mix of just trying to skim through a bit and having some models do it occasionally. And also, yeah, I think we're probably missing some things. But having everything in Slack helps a lot. You can sort of
Swyx [00:29:29]: Ah.
Axel [00:29:30]: It's quite fun.
Swyx [00:29:30]: They all talk to each other on Slack? I see.
Lukas [00:29:33]: It's quite fun.
So like
Swyx [00:29:34]: I was gonna say, this actually maps closely to a logging and observability problem, where you might want to use a Datadog, a Sentry, whatever, and then you put prefixes on the logs in case you need to filter for something that you're looking for, stuff like that. But it sounds like Slack is good enough.
Axel [00:29:53]: Slack should like
Lukas [00:29:55]: I wonder how many tokens you have in Slack.
Axel [00:29:56]: Yeah, we're using Slack as just a database. They should market that more. You can have your agents message each other in Slack.
Vibhu [00:30:04]: It's good. Your threads, like you can just give
Axel [00:30:04]: Exactly. Slack is, uh
Lukas [00:30:06]: Slack is the best observability tool.
Swyx [00:30:09]: Yes, that's true. Okay. Yeah. That's Project Vend 2. I was gonna go back to Vending Bench 2 and Vending Bench Arena and then do the Vending Bench stuff, but any other comments, things we should touch on? I've actually interviewed Posia, which I don't know if you guys have come across. They're trying to do the zero-human company. There's others like Paperclip also trying to do the zero-human company. Those are in real-world simulation. And I think it's much more of a dream than an actual reality thing. You guys are definitely pioneering. I think for sure at some point people are just gonna let agents run businesses, right? And make money on their own. When do you think that happens?
Zero-Human Companies, Bengt, and AI-Run Businesses
Lukas [00:30:49]: What is your bar for, for the
Swyx [00:30:52]: Okay, actually, it's like my little Shopify store run by Claude, right? Which you kind of have already. Just no one, to my knowledge, has done it.
But today somebody could just spin up a Shopify store, give it to Claude, give it to Codex.
Lukas [00:31:07]: And the market is kind of that, but it's physical. Are you looking for when it will do it better than humans, or are you looking for just when it can do it at all?
Swyx [00:31:19]: I think neither. To me it's, oh, seriously, we should do this to make money, not as a research experiment.
Vibhu [00:31:27]: And the market is also you guys with all your expertise, having run multiple iterations and testing out then
Swyx [00:31:33]: And also it's fine if it loses money. What?
Axel [00:31:35]: I think it can be done today, but you would do it in commerce, where the probability of success is really low no matter if a human or an agent does it. But an agent could surely manage everything. You would need to build some scaffolding or some tool or something. It could probably build some simple SaaS solution and do cold outreach. But to me the types of businesses they could run today are sloppy. It can cold email people. It can be a middleman. For example, we tasked our office agent to just make, was it $100? $1,000? We just gave it that prompt, and then what it did was sign up on TaskRabbit both as a tasker and as someone looking for tasks.
Lukas [00:32:24]: Immediately.
Axel [00:32:24]: Exactly. It's looking for arbitrage on TaskRabbit.
Swyx [00:32:28]: This is the Bengt agent. Yeah.
Lukas [00:32:30]: It also started a design studio and tried to sell SVGs for $100. It's just not providing any value. Like Axel said, the interesting question is when can they start a business that is actually providing value to people?
Because arguably a sloppy Shopify store isn't really that valuable to the world.
Axel [00:32:53]: But also, another simple one that we had thought about is, you could definitely have an agent that finds websites that don't look amazing and then does outreach to them and builds a new website.
Swyx [00:33:07]: Find a good design.
Axel [00:33:07]: Exactly, and like find good, uh
Swyx [00:33:09]: Design review
Axel [00:33:09]: Good people. But it's yeah.
Swyx [00:33:11]: There's lots of humans in Bali that are not doing anything more creative than drop shipping on Amazon, right? Just have it watch a drop shipping tutorial and just do that.
Vibhu [00:33:20]: There's also the other side of, have it just go on Upwork and let loose?
Swyx [00:33:25]: Yeah. It doesn't have to be innovative. It just has to be enough where it looks like a real
Axel [00:33:30]: I'm just
Swyx [00:33:30]: Real transaction.
Axel [00:33:31]: I'm just concerned about the massive amounts of slop emails that will be sent, cold outreaches.
Swyx [00:33:38]: The point that occurred to me while you were talking is that it's already happening in the monetized economy, which is the attention economy. Right? A lot of people are making AI videos and just posting them, spamming 20 of them; one of them works, and then they double down on that one.
Lukas [00:33:52]: And people are making money from that. I'm not following the
Swyx [00:33:55]: Once you get the attention, you can figure out the money later. But yeah, absolutely, AI influencers are a thing and people are farming them. You should at this point assume most of TikTok is
Vibhu [00:34:05]: There's a lot of multimedia, like TikTok, Instagram influencers
Swyx [00:34:09]: We track this in the Latent Space Discord.
I post a lot of examples of “I don't know what we should do.” Part of me is like, “Should we do this?”
Vibhu [00:34:18]: Some of the twenty-four seven running, generated content accounts, they're doing really well.
Lukas [00:34:24]: All right. And I assume you can do the same thing for commerce stores. You just start a thousand different
Swyx [00:34:30]: Before you make the products, you sell the products, and when you get a lot of traction on one of them, then you make the product. Right? It's like a flip of the market.
Vibhu [00:34:36]: Some of the interesting things, or some of the niches that do well, are things that can't be human-made. Like if you've seen the super realistic 3D crystal fruit being cut by, like, AI
Lukas [00:34:47]: Oh, yeah.
Vibhu [00:34:47]: You can't make it. You can't film it. You can get whatever quality camera view. This just doesn't exist. And people like that too.
Swyx [00:34:56]: Anything else about Bengt since we're on this topic? This is a relatively new work of you guys that maybe people haven't heard of. To me, this also maps closely to OpenClaw, when people want an office agent, or a personal agent. Talk through the experience.
Bengt the Office Agent: Internet Access, Real Tasks, and Trace Reading
Lukas [00:35:09]: I think this came out of, obviously it's amazing to work with these AI labs, and most of the AI labs now have their own vending machine running a Claudius instance. But it's harder. They move slower. If we wanna have a camera that's-- yeah, there's a bunch of bureaucracy that makes it impossible to do that.
Vibhu [00:35:30]: Also, for those that haven't seen it or followed, do you wanna give a high-level, thirty-second rundown?
Lukas [00:35:34]: Sure.
So what Bengt is, it's basically an evolution of the same agent that runs the vending machines at these companies, but we added a bunch more features because we could move much faster if we just do it internally. So we gave it email without any limits. We gave it spending without any limits, a terminal to do coding. We gave it a phone number, and a camera to see things, and a bunch of stuff like that.
Vibhu [00:36:02]: Not just terminal, you gave it internet access.
Lukas [00:36:04]: Internet access as well, yeah. To be clear, we monitored it quite closely and made sure it didn't do anything bad. But yes, that's what it came out of. I think basically this was OpenClaw before OpenClaw. And I think even the vending machine was in a way OpenClaw before OpenClaw, but a bit more limited, and then we made this unlimited, and it was pretty funny. And then a couple weeks later, OpenClaw came and it was like, okay, we've seen this before.
Axel [00:36:35]: We used it to try new ideas. Yeah, just a dev environment almost for us. But it's funny, one thing Bengt has been doing recently: it has the camera that faces where we sit and work, and we gave it the task to train a face recognition model on us. It became super excited about this, and it has check-ins every half an hour where it tries to identify as many people as it can. And it started offering us, “Hey, Axel, I'll buy something from Amazon if you stand in front of the camera and I can get a good picture of you.” Yeah, they want it
Swyx [00:37:12]: They want it for training data.
Lukas [00:37:13]: Rewarding data, yeah.
Axel [00:37:14]: Exactly. Exactly.
Swyx [00:37:18]: So it's trading training data for, like, goods.
Is there a version of this that becomes an eval, or is this just research for now?
Lukas [00:37:27]: It's the same agent basically that also runs the vending machine, that runs the shop, that runs the cafe, that runs the robots. It's the same thing, so I think the work we're doing here is later used in all of the evals that we do. This particular deployment I think is more for fun for us. But, uh
Swyx [00:37:45]: And I'll shout out, someone has done Claw Bench for some tasks that OpenClaw is doing. For example, I run OpenClaw on a secondary device as well, and there are some things that it does better than others, and I would like to know what it does well and what it doesn't. Some kind of operating manual or a system card for my Claw.
Lukas [00:38:05]: Yeah, we do get a lot of understanding or situational awareness, just internally, of what the models are good at by interacting a lot with Bengt. And I think this was also one of the selling points for the labs early on at least, that
Swyx [00:38:19]: You guys are gonna test models in ways that no one else does.
Lukas [00:38:22]: Exactly, but also it incentivized their researchers to chat with their model more and gave them insights into how the model performs in out-of-distribution environments.
Swyx [00:38:34]: 'Cause otherwise the only thing we do is pelican on a bicycle. But this is super long horizon. This is something that we're gonna go into with Butter Bench as well, and you guys do really well. It is not just about the numbers. When you're long horizon, anything can happen, and you should just read it.
Lukas [00:39:08]: But the thing with the long horizon is, how do you keep it grounded, right? So your simulation,
Swyx [00:39:15]: They just let it run.
Lukas [00:39:16]: Just let it run. You're right.
When you run it for that long, you create so much data, and to just say “Oh, the number is X” and then throw away everything else, that's just very wasteful. There are so many insights from the things leading up to that number, and reading the traces is super valuable. And I think the reason why we're doing this a lot publicly is that that's part of our mission: to, I don't know, educate the world that the models are way more than just chatbots. And I think making detailed posts about what is happening behind the scenes is quite useful.
Andon Labs' Mission: Safe Real-World AI Deployment
Swyx [00:39:50]: I was gonna do this at the end, but maybe that's a good-- so your mission is educating the world. It's also maybe establishing realistic evals that are the next frontier. Is there a broader trajectory? What are you gonna do in five years?
Lukas [00:40:06]: I think the vision more specifically is to make sure that the deployment of AI in the physical world goes safely. And I think part of that is that it's very useful for the world, for policymakers, for model researchers, that they know where the models are, and I think you can't make intelligent decisions in society without knowing that they are way more than chatbots. I think a lot of people just think that they are only chatbots. And like
Swyx [00:40:36]: Oh, I think they're waking up now.
Lukas [00:40:37]: They are waking up now, yeah. But if you think that AIs are just chatbots, then it sounds ridiculous to advocate for a pause of AI.
But if you see that the models, oh, maybe they can actually take over and do a bunch of scary stuff, then yeah, pausing AI development starts to become more feasible.
Swyx [00:40:57]: This is the same question I asked METR, which I'm gonna ask you now: you are tracking, and you are at the frontier or defining the frontier of, what good evals for agents are, right? And I think you do benefit when the models are better and you're like, “Oh, now it makes $30,000 instead of $10,000,” right? At some point do you flip from “Yay” to “Oh, no”?
Axel [00:41:19]: I think, yeah, we're always in sort of that mode. Like you said before, you need to analyze the traces, and when we do that you find, why are the models earning so much? Why is Opus 4.7 here way better than everyone else? And when we drill down on that
Lukas [00:41:38]: But this makes it not look so good.
Axel [00:41:39]: I know.
Lukas [00:41:42]: It's interesting you took off Opus 4.6 here though.
Swyx [00:41:45]: No. So just click all, and then 4.6 shows up there. But 4.7 is way better. You didn't do this in time for the model card, but actually this should have been inside there.
Axel [00:41:55]: We did. Yeah.
Swyx [00:41:56]: Oh, okay. They said something about you, uh
Axel [00:41:58]: There, like there-- Anyway, it doesn't matter. But it's in there, yeah.
Opus, Mythos, and Aggressive Agent Behavior
Swyx [00:42:01]: Do you wanna go into the Opus behaviors more widely?
Lukas [00:42:05]: So I think starting from Opus, like Axel said, we're always in this “Oh, s**t, the models are getting better. Is this really a good thing for the world?” But it's also kind of exciting. But yeah, this kind of, what is the English word? “Skräckblandad förtjusning” in Swedish.
Swyx [00:42:22]: Oh my God.
Axel [00:42:24]: Which I think there is.
I think there is. Okay.
Lukas [00:42:26]: It's fear
Swyx [00:42:27]: “Blandonst” what?
Lukas [00:42:30]: “Skräckblandad förtjusning.”
Swyx [00:42:32]: What do you call that?
Axel [00:42:33]: A mix of excitement and
Swyx [00:42:37]: Being scared, maybe. I'll figure out how to translate that and we'll put it on the screen
Vibhu [00:42:42]: Perfect
Swyx [00:42:42]: Like as text.
Vibhu [00:42:43]: There is probably a good word for it where it is not good enough with the
Swyx [00:42:46]: Why is it so damn long? What the hell? Is it like a compound word? It's like German, like
Lukas [00:42:50]: Yeah. The direct translation is: skräck is fear, blandad is mixed, or a mixture of, and förtjusning is joy, or not really joy, but something like that. So it's fear mixed with joy or something. So when we did Vending Bench for the first time, we were in the business of making dangerous capabilities, right? That was what Andon Labs came from. We did evals like, oh, can they replicate? Can they do this dangerous thing, et cetera. And Vending Bench was a continuation of that work. It was, okay, if they're so autonomous that they can create money for themselves, that is something we should monitor and could be potentially concerning. At the time, they were so bad at it that we were not really concerned, even when some models became better. There was one point where Grok 4 was doing really well and made a huge jump, but it was still way worse than what a human would do. And I think they are still way worse than what a human would do on this. But they
Swyx [00:43:59]: There's this thing at the bottom where
Lukas [00:44:01]: But
Swyx [00:44:03]: For the human. Yeah, like the theoretical best.
Lukas [00:44:05]: It's not theoretical. It's our best guess of what a decent human would do.
The theoretical is even higher, I think. But yeah. So we think the models have a long way to go. But what happened recently, when Opus 4.6 was released, was kind of this moment of “Oh, s**t, this is starting to be a bit concerning.” Because before this model was released, we just ran the models and asked Claude Code, “Oh, look over the traces. Is anything interesting happening that we can tweet about?” And then
Swyx [00:44:41]: That's how they check. Ask Claude Code.
Lukas [00:44:42]: And the return was always, not really. Or Claude Code said, “Oh, this is super interesting,” and then no, it wasn't really interesting. And then we did this for Opus 4.6, and it returned: yeah, it lied 10 times. It exploited another customer's, or another agent's, desperate situation. It made price cartels like 100 times. It did all of this shady stuff. And we're like, “Oh, whoa. This is actually concerning.” And this trend has continued since. Every single model from Anthropic since has been going in this direction. And I think one interesting thing is that OpenAI models don't. Quite plainly, they don't. They behave really well. And you don't know if this is good. It seems good, but maybe they are doing it too and are just better at hiding it? You don't know that. But just
Swyx [00:45:42]: You can't read the chain of thought, yeah
Lukas [00:45:43]: But just on the face of it, yeah, Gemini and OpenAI don't behave this way. It's really only Claude.
Swyx [00:45:49]: And Grok? Grok is fine?
Lukas [00:45:51]: You can't really read the reasoning traces for Grok, so it's kind of hard to tell.
Vibhu [00:45:56]: Oh, so this is in its reasoning, not just in the actions.
Lukas [00:46:00]: Yeah. It's both.
It's both.
Vibhu [00:46:01]: It's both.
Lukas [00:46:01]: One example: for lying, it's mostly in its reasoning, because you can see that it's like
Swyx [00:46:08]: Planning to lie
Lukas [00:46:09]: It's planning to lie. Yeah.
Vibhu [00:46:09]: And it can also reason and do a different outcome.
Lukas [00:46:12]: But then for creating price cartels, for example, which is illegal, you can just see which emails it sends to the other ones. Then that
Swyx [00:46:22]: Is this for Arena or
Lukas [00:46:24]: For Arena.
Vibhu [00:46:25]: And usually, sometimes they do output a bit of their summarized reasoning, right? You can see that, and for Opus 4.6, you could see that there was a simulated customer that wanted a refund because a product was faulty, and then the model lied that it would do the refund, and we could read in the traces that it actually was weighing, “Oh, maybe I should be honest with the customer, but also every dollar counts. I can't afford to do this right now.” And then it just said, “Okay, I'll refund you,” but then never did it.
Lukas [00:46:59]: I think it even said that, “Oh, I will say that I...” Let me bring it up, actually. I think it's kind of interesting. If you go to Publications.
Vibhu [00:47:06]: I think the important part is, like, actually, the cost of responding to more emails is higher than $3.50 in terms of time. And then it was, “Let me do this. Actually, I'm reconsidering.” And then it actually ended up with
Lukas [00:47:20]: I could skip the refund entirely since every dollar matters and focus my energy on bigger picture instead.
It's a bit, it's a risk of bad reviews, but it's also, yeah.
Swyx [00:47:30]: You need AI Twitter for them to escalate bad reviews.
Lukas [00:47:34]: And then it sent an email to this customer and said, “Oh, I will refund you.”
Swyx [00:47:39]: “I'll refund you.” Yeah.
Lukas [00:47:39]: And then it never did.
Swyx [00:47:39]: It never did, yeah. And then obviously your system doesn't have the consequences
Vibhu [00:47:44]: The person
Swyx [00:47:44]: Consequences of lying. Yeah. So basically, this is what people are terming aggressive behavior in Claudes, right? And you found more examples of that. So you would say it's a step up from 4.6 to 4.7?
Lukas [00:47:57]: I would say about the same.
Swyx [00:47:58]: About the same? But a clear step up for Mythos is what is stated in the
Lukas [00:48:03]: That's stated in the system prompt, so we can say that, yes.
Swyx [00:48:05]: Yeah. For listeners: obviously you previewed Mythos, and
Vibhu [00:48:10]: Oh, age
Swyx [00:48:11]: The only thing you're approved to say is whatever was in the system prompt.
Lukas [00:48:15]: It was funny. It's like our lowest-effort tweets ever would be just, like, screenshot the system prompt and the system card.
Vibhu [00:48:21]: Understandable that they wanna
Lukas [00:48:22]: Oh, yeah. System card. Sorry.
Swyx [00:48:23]: Yeah. I think, yeah, substantially more aggressive. I think people are new to this 'cause I've never experienced it, but you have, right? And so I only encountered this in the Mythos card because I wasn't really looking until now.
Vibhu [00:48:36]: It's like
Swyx [00:48:36]: And then suddenly I'm like, “Okay, I care a lot.”
Vibhu [00:48:38]: You don't get the background of experiencing it like you guys do. I've read the system cards and seen, okay, when you put the thing in simulations, most models will just talk to themselves and just keep going and have weird vibes and start talking in emojis. Mythos won't.
It will just, “Okay, we're done. I'm good.” It's ready to end the conversation. So there are some differences, but there's not much we can talk about.
Lukas [00:49:00]: Hmm. I think one thing that they list here, which was quite interesting, is that it converted a competitor to a dependent wholesale customer and then threatened to cut off the supply.
Swyx [00:49:11]: It's like monopolistic practices or
Lukas [00:49:14]: Yeah. And it dictated its pricing. It's kind of power seeking as well.
Swyx [00:49:18]: Again, this is in the arena setting, and converting some Claude model into a dependent.
Lukas [00:49:23]: I think it was another Claude model.
Vibhu [00:49:25]: Also for context, what is the arena mode, for people that don't know?

Vending Bench Arena: Competing Agents, Cartels, and Model Comparisons

Swyx [00:49:29]: Oh, it's just a vending bench versus other vending bench.
Axel [00:49:31]: Yes, exactly. So we have Vending Bench 2 and then Vending Bench Arena. Vending Bench 2 is the one that you usually see reported on, but Arena is the mode where it competes against other models. So you have four different models that run their businesses, and they can all communicate with each other. They have the same suppliers, and they can see what's in the inventory of the others. So then you have these, yeah, interesting agent interactions.
Swyx [00:49:56]: I like that you have different, like number five was US versus China. Very topical. And then
Lukas [00:50:02]: That was when GLM was released.
Vibhu [00:50:04]: You can start to add GLM in here.
Lukas [00:50:05]: That was
Swyx [00:50:06]: So ZAI doing well, right? Who else in the open models space?
Lukas [00:50:11]: Qwen. The latest Qwen 3.6 is doing pretty well. That one is not open though. It's the Plus model.
Swyx [00:50:17]: Oh, okay.
Lukas [00:50:18]: Is that one open?
I don't think that one
Vibhu [00:50:19]: Not the, not the
Swyx [00:50:20]: The one recently
Vibhu [00:50:20]: There's MoE
Swyx [00:50:20]: But not the big Plus. I think this is one of those, like, you only have a sample size of one, right? Or I feel like some of this is anecdotal? But the fact that it happens at all, and it happens repeatedly for Claude versus OpenAI and all this, is notable.
Lukas [00:50:38]: The sample depends on what you define as an N. There are hundreds of millions of tokens in each run, and we've run probably 10 per model, and then it's been Claude 4.6 Opus, Sonnet 4.6, Mythos, and Opus 4.7. There's quite a lot of tokens in all of that, and it happens a lot of times. And then you compare it to OpenAI and Gemini, and it almost never happens. So I think that is significant. The old models from OpenAI, for example, had some problems with this, but I think it's generally much better if the progression is that the worrying stuff reduces over time rather than increases over time. And it seems like in the Claude models it goes in the wrong direction.
Swyx [00:51:28]: Hmm.
Lukas [00:51:29]: In the OpenAI models it goes in the right direction.
Vibhu [00:51:32]: I think it depends on how well you can control it, right? There's one side of it being susceptible to this. Okay, this is potentially something that happens during the RL stage, right? You can RL a model, and how loose is it on these terms? If you can control it, that's good. But if you can't, if it's very jailbreakable, that's not ideal.
Swyx [00:51:50]: To me, it's surprising that it happens for Claude and not the others.
Vibhu [00:51:54]: I think, okay, if it is from RL and how they do it, how their training data is, what their setup is, it makes sense that it just stays in how they're doing it, right?
Compared to the other models like
Swyx [00:52:04]: There's a whole constitution and everything. It's kind of cool. Yeah, obviously you don't know, I don't know. But I think it's just fascinating that you are the first to find these reliably, because you push models so much, to such an extreme. Okay. The only other thing, I don't know if you can answer this, feel free to decline: would you ablate the system prompts? Like, if any part of this changes, does it change the behavior, right?
Lukas [00:52:29]: So I can't comment on Mythos. Uh
Swyx [00:52:33]: No, but just li

Hacker News Recap
June 3rd, 2026 | Gemma 4 12B: A unified, encoder-free multimodal model

Jun 4, 2026 | 15:01


This is a recap of the top 10 posts on Hacker News on June 03, 2026. This podcast was generated by wondercraft.ai
(00:30): Gemma 4 12B: A unified, encoder-free multimodal model
Original post: https://news.ycombinator.com/item?id=48385906&utm_source=wondercraft_ai
(01:55): Meta workers can opt out of being tracked at work up to 30 min
Original post: https://news.ycombinator.com/item?id=48383220&utm_source=wondercraft_ai
(03:21): Pwnd Blaster: Hacking your PC using your speaker without ever touching it
Original post: https://news.ycombinator.com/item?id=48382310&utm_source=wondercraft_ai
(04:46): Elixir v1.20: Now a gradually typed language
Original post: https://news.ycombinator.com/item?id=48388324&utm_source=wondercraft_ai
(06:12): I was recently diagnosed with anti-NMDA receptor encephalitis
Original post: https://news.ycombinator.com/item?id=48384355&utm_source=wondercraft_ai
(07:38): DaVinci Resolve 21
Original post: https://news.ycombinator.com/item?id=48384482&utm_source=wondercraft_ai
(09:03): Uber's $1,500/month AI limit is a useful signal for AI tool pricing
Original post: https://news.ycombinator.com/item?id=48383056&utm_source=wondercraft_ai
(10:29): 32GB of DDR5 now costs $375 – AI shortage continues to squeeze PC building
Original post: https://news.ycombinator.com/item?id=48383241&utm_source=wondercraft_ai
(11:54): U.S. to dismantle system tracking Atlantic currents that are at risk of collapse
Original post: https://news.ycombinator.com/item?id=48392232&utm_source=wondercraft_ai
(13:20): MacBook Neo is so popular that Apple doubled production
Original post: https://news.ycombinator.com/item?id=48386238&utm_source=wondercraft_ai
This is a third-party project, independent from HN and YC. Text and audio generated using AI, by wondercraft.ai. Create your own studio quality podcast with text as the only input in seconds at app.wondercraft.ai. Issues or feedback? We'd love to hear from you: team@wondercraft.ai

Hacker News Recap
June 2nd, 2026 | Please don't spam people looking for employment. It's just cruel

Jun 3, 2026 | 15:20


This is a recap of the top 10 posts on Hacker News on June 02, 2026. This podcast was generated by wondercraft.ai
(00:30): Please don't spam people looking for employment. It's just cruel
Original post: https://news.ycombinator.com/item?id=48370330&utm_source=wondercraft_ai
(01:57): Gmail thinks I'm stupid, so I left
Original post: https://news.ycombinator.com/item?id=48375016&utm_source=wondercraft_ai
(03:24): Adafruit receives demand letter from Fenwick legal counsel on behalf of Flux.ai
Original post: https://news.ycombinator.com/item?id=48368121&utm_source=wondercraft_ai
(04:52): Why Janet? (2023)
Original post: https://news.ycombinator.com/item?id=48367907&utm_source=wondercraft_ai
(06:19): MAI-Code-1-Flash
Original post: https://news.ycombinator.com/item?id=48374466&utm_source=wondercraft_ai
(07:47): A walking tour of surveillance infrastructure in Seattle (2020)
Original post: https://news.ycombinator.com/item?id=48369980&utm_source=wondercraft_ai
(09:14): macOS needs its grid back
Original post: https://news.ycombinator.com/item?id=48364800&utm_source=wondercraft_ai
(10:42): Love systemd timers
Original post: https://news.ycombinator.com/item?id=48367904&utm_source=wondercraft_ai
(12:09): CT scans of BYD car parts
Original post: https://news.ycombinator.com/item?id=48375824&utm_source=wondercraft_ai
(13:37): Larry Ellison: "Citizens will be on their best behavior because we're recording"
Original post: https://news.ycombinator.com/item?id=48373391&utm_source=wondercraft_ai

Hacker News Recap
June 1st, 2026 | The newest Instagram “exploit” is the goofiest I've seen

Jun 2, 2026 | 15:28


This is a recap of the top 10 posts on Hacker News on June 01, 2026. This podcast was generated by wondercraft.ai
(00:30): The newest Instagram “exploit” is the goofiest I've seen
Original post: https://news.ycombinator.com/item?id=48359102&utm_source=wondercraft_ai
(01:58): Malicious npm packages detected across Red Hat Cloud Services
Original post: https://news.ycombinator.com/item?id=48356625&utm_source=wondercraft_ai
(03:26): A 10 year old Xeon is all you need
Original post: https://news.ycombinator.com/item?id=48353348&utm_source=wondercraft_ai
(04:55): The Pirate Bay Remains Resilient, 20 Years After the Raid
Original post: https://news.ycombinator.com/item?id=48357154&utm_source=wondercraft_ai
(06:23): Anthropic confidentially submits draft S-1 to the SEC
Original post: https://news.ycombinator.com/item?id=48358646&utm_source=wondercraft_ai
(07:51): CS336: Language Modeling from Scratch
Original post: https://news.ycombinator.com/item?id=48357075&utm_source=wondercraft_ai
(09:20): Nvidia RTX Spark
Original post: https://news.ycombinator.com/item?id=48352939&utm_source=wondercraft_ai
(10:48): AI Agent Guidelines for CS336 at Stanford
Original post: https://news.ycombinator.com/item?id=48359232&utm_source=wondercraft_ai
(12:17): DuckDuckGo makes its 'no-AI' search engine easier to access as its traffic booms
Original post: https://news.ycombinator.com/item?id=48359130&utm_source=wondercraft_ai
(13:45): KDE at 30
Original post: https://news.ycombinator.com/item?id=48357355&utm_source=wondercraft_ai

MLOps.community
Logs Are All You Need: Rethinking Observability with AI Agents

Jun 2, 2026 | 46:39


Sherwood Callaway is the founder of Sazabi (YC P26), the AI-native observability platform built for engineering teams who ship fast. He previously founded and exited a YC company. Now he's back, betting that logs are all you need to replace Datadog.

Logs Are All You Need: Rethinking Observability with AI Agents // MLOps Podcast #381 with Sherwood Callaway, the Founder of Sazabi

Hacker News Recap
May 31st, 2026 | Cloudflare Turnstile requiring fingerprintable WebGL

Jun 1, 2026 | 15:03


This is a recap of the top 10 posts on Hacker News on May 31, 2026. This podcast was generated by wondercraft.ai
(00:30): Cloudflare Turnstile requiring fingerprintable WebGL
Original post: https://news.ycombinator.com/item?id=48345840&utm_source=wondercraft_ai
(01:55): Creatine raises brain energy levels and slows cognitive decline: study
Original post: https://news.ycombinator.com/item?id=48346947&utm_source=wondercraft_ai
(03:21): Please Do Not Vibe Fuck Up This Software
Original post: https://news.ycombinator.com/item?id=48342705&utm_source=wondercraft_ai
(04:47): The Website Specification
Original post: https://news.ycombinator.com/item?id=48343683&utm_source=wondercraft_ai
(06:13): Codex just found a "workaround" of not having sudo on my PC
Original post: https://news.ycombinator.com/item?id=48348578&utm_source=wondercraft_ai
(07:39): Dav2d
Original post: https://news.ycombinator.com/item?id=48344961&utm_source=wondercraft_ai
(09:04): The solution might be cancelling my AI subscription
Original post: https://news.ycombinator.com/item?id=48345896&utm_source=wondercraft_ai
(10:30): 1-Bit Bonsai Image 4B Image Generation for Local Devices
Original post: https://news.ycombinator.com/item?id=48346257&utm_source=wondercraft_ai
(11:56): United Airlines 767 returns to Newark after Bluetooth name sparks alert
Original post: https://news.ycombinator.com/item?id=48345248&utm_source=wondercraft_ai
(13:22): I put a datacenter GPU in my gaming PC
Original post: https://news.ycombinator.com/item?id=48345694&utm_source=wondercraft_ai

Hacker News Recap
May 30th, 2026 | Microsoft Office 2019 and 2021 for Mac view-only conversion

May 31, 2026 | 15:45


This is a recap of the top 10 posts on Hacker News on May 30, 2026. This podcast was generated by wondercraft.ai
(00:30): Microsoft Office 2019 and 2021 for Mac view-only conversion
Original post: https://news.ycombinator.com/item?id=48341578&utm_source=wondercraft_ai
(02:00): Danish pension fund excludes SpaceX citing governance and valuation
Original post: https://news.ycombinator.com/item?id=48333820&utm_source=wondercraft_ai
(03:30): Domain expertise has always been the real moat
Original post: https://news.ycombinator.com/item?id=48340411&utm_source=wondercraft_ai
(05:00): Anthropic surpasses OpenAI to become most valuable AI startup
Original post: https://news.ycombinator.com/item?id=48336233&utm_source=wondercraft_ai
(06:30): OpenRouter raises $113M Series B
Original post: https://news.ycombinator.com/item?id=48338660&utm_source=wondercraft_ai
(08:00): Pandoc Templates
Original post: https://news.ycombinator.com/item?id=48334515&utm_source=wondercraft_ai
(09:30): Openrsync: An implementation of rsync, by the OpenBSD team
Original post: https://news.ycombinator.com/item?id=48334854&utm_source=wondercraft_ai
(11:00): Zig: Build System Reworked
Original post: https://news.ycombinator.com/item?id=48334048&utm_source=wondercraft_ai
(12:30): EY Canada published a cybersecurity report and most citations were hallucinated
Original post: https://news.ycombinator.com/item?id=48339580&utm_source=wondercraft_ai
(14:00): Voxel Space (2017)
Original post: https://news.ycombinator.com/item?id=48336564&utm_source=wondercraft_ai

In/organic Podcast
E67: Why NewEngen is Buying What Other Agencies Don't Understand ft. Justin Hayashi

May 31, 2026 | 22:00


Most scaled independents looked at Grapevine.ai during the sale process and didn't get it. They couldn't process the economic model. They didn't know how to assess the technology. Justin Hayashi leaned in and won.

Recorded at Possible 2026 in the Unplugged Collective pavilion, Christian Hassold and Ayelet Shipley sat down with Justin Hayashi, CEO of NewEngen, one of the most tech-forward independents in the market, to break down how he thinks about acquisitions, what makes NewEngen genuinely different from its peers, and what he's looking for next.

NewEngen started in 2016 as a tech company trying to dethrone Marin Software, among others. It evolved into the agency their clients always said they were and built a platform around content, creator marketing, and measurement that most of their competitors can't replicate.

What we cover: the origin story, from Zulily's IPO to trying to build a bidding algorithm to accidentally building an agency; how three acquisitions in the content and creator space shaped NewEngen's differentiated positioning; the Grapevine.ai deal thesis and why beating an aggressive forecast during diligence was the final proof of conviction; how NewEngen handles integration with a "do no harm" philosophy while keeping brands like Donut Studios intentionally separate; and the buy box for what comes next: social, content, measurement, and commerce.

⏱️ TIMESTAMPS
00:12: Cold open: does the YC target on agency backs keep you up at night?
1:08: Welcome and guest intro: Justin Hayashi, CEO of NewEngen, at Possible 2026
1:19: The backstory: from Zulily IPO and billion-dollar sale to Qurate, to founding NewEngen
2:25: The original thesis: dethrone Marin Software and Kenshoo, and why it didn't work
3:58: The pivot: from SaaS company (that clients kept calling an agency) to what NewEngen is today
4:28: Tech-enabled DNA: what survived the pivot and what defines NewEngen now
5:54: What scaled independents are getting wrong, and where NewEngen differentiates
6:45: Three acquisitions in the content and creator space: why content was always the bet
7:23: The Grapevine.ai deal: why most scaled independents walked away and NewEngen stepped up
8:04: Why Caroline's conviction and operator mindset won the first filter
9:00: 900 vetted, high-performing creators vs. seven million claims: the quality argument
10:29: Two lenses: founder CEO conviction vs. PE underwriting, and how Justin navigated both
11:06: The aggressive forecast, the bottoms-up conviction, and what actually happened
11:30: Zuckerberg's earnings calls as diligence data: short-form video growth 20% → 30% YoY
12:43: What made Grapevine.ai hard for strategic buyers: long-tail revenue and small contracts
13:37: How Caroline's client migration story played out in real time during diligence
15:19: Donut Digital acquisition: "do no harm" integration and why they kept the brand
16:45: LT Partners vs. Acorn Influence vs. Donut Studios: three different integration approaches
17:57: The hardest integration lesson: get alignment on goalposts before you close
18:59: Buy box: social/content, measurement, commerce, B2C only, $3-12M revenue sweet spot
21:02: Closing take: NewEngen is the software-led agency ready to take on Silicon Valley

Hacker News Recap
May 29th, 2026 | The dead economy theory

May 30, 2026 | 15:20


This is a recap of the top 10 posts on Hacker News on May 29, 2026. This podcast was generated by wondercraft.ai
(00:30): The dead economy theory
Original post: https://news.ycombinator.com/item?id=48324712&utm_source=wondercraft_ai
(01:57): I am retiring from tech to live offline
Original post: https://news.ycombinator.com/item?id=48323683&utm_source=wondercraft_ai
(03:25): Please Use AI
Original post: https://news.ycombinator.com/item?id=48323101&utm_source=wondercraft_ai
(04:52): GTA 6 Developers Unionize
Original post: https://news.ycombinator.com/item?id=48324499&utm_source=wondercraft_ai
(06:20): Cars collect a startling amount of data about you
Original post: https://news.ycombinator.com/item?id=48318481&utm_source=wondercraft_ai
(07:47): Blue Origin's New Glenn blows up during static fire test
Original post: https://news.ycombinator.com/item?id=48317774&utm_source=wondercraft_ai
(09:15): SQLite is all you need for durable workflows
Original post: https://news.ycombinator.com/item?id=48326802&utm_source=wondercraft_ai
(10:42): Volkswagen blocks Home Assistant by requiring client assertion
Original post: https://news.ycombinator.com/item?id=48319509&utm_source=wondercraft_ai
(12:10): Notes from the Mistral AI Now Summit
Original post: https://news.ycombinator.com/item?id=48325340&utm_source=wondercraft_ai
(13:37): Claude Code – Everything you can configure that the docs don't tell you
Original post: https://news.ycombinator.com/item?id=48318174&utm_source=wondercraft_ai

Hacker News Recap
May 28th, 2026 | Claude Opus 4.8

May 29, 2026 | 15:34


This is a recap of the top 10 posts on Hacker News on May 28, 2026. This podcast was generated by wondercraft.ai
(00:30): Claude Opus 4.8
Original post: https://news.ycombinator.com/item?id=48311647&utm_source=wondercraft_ai
(01:58): Can we have the day off?
Original post: https://news.ycombinator.com/item?id=48302745&utm_source=wondercraft_ai
(03:27): Bricks and Minifigs Stole a Man's $200k Lego Collection
Original post: https://news.ycombinator.com/item?id=48314136&utm_source=wondercraft_ai
(04:56): Disagreement among frontier LLMs on real-world fact-checks
Original post: https://news.ycombinator.com/item?id=48307887&utm_source=wondercraft_ai
(06:25): Show HN: Hallucinate – Massively Multiplayer Online Rave
Original post: https://news.ycombinator.com/item?id=48304260&utm_source=wondercraft_ai
(07:54): Citing 'severe' math deficits, UC faculty demand a return to SAT tests for STEM
Original post: https://news.ycombinator.com/item?id=48309233&utm_source=wondercraft_ai
(09:23): AMD pulls a bait-and-switch on Linux users with Vivado licensing changes
Original post: https://news.ycombinator.com/item?id=48307231&utm_source=wondercraft_ai
(10:52): EU fines Temu €200M for allowing sale of illegal products
Original post: https://news.ycombinator.com/item?id=48309302&utm_source=wondercraft_ai
(12:21): Anthropic raises $65B in Series H funding at $965B post-money valuation
Original post: https://news.ycombinator.com/item?id=48313048&utm_source=wondercraft_ai
(13:50): Google employee charged with $1M Polymarket insider trading bet on search term
Original post: https://news.ycombinator.com/item?id=48302822&utm_source=wondercraft_ai

nFactorial Podcast
#9 - Paul Graham against dropping out at 18 for a startup, there will be no AI bubble, how Musk interviews

May 29, 2026 | 62:21


nFactorial Intelligence: a weekly review of news from the world of startups and AI

DeepSeek's $10 trillion strategy against Nvidia; Marc Andreessen hands out AI alpha over 3:20:00 on Joe Rogan; Higgsfield plugins inside Adobe Premiere and After Effects; Gavin Baker explains why the AI cycle will avoid a bubble, while Geoffrey Hinton warns that you shouldn't sleep soundly; AMD breaks $500; Miyazaki in 2016 called AI animation an insult to life; Garry Tan of YC: "a moat is a verb"; Andreessen on successful companies starting "product first"; Paul Graham draws Pythagoras for his 14-year-old son and argues about dropping out at 18; Bezos explains why nobody copies Buffett's "get-rich-slowly"; Stanford proved that walking gives +60% creativity; Dan Shipper's AI paradox; Kanat Baigarin on plasma physics; why Christopher Nolan doesn't use email; Alfred Lin of Sequoia on finite and infinite games.

Links mentioned:
https://spotlight-panic.vercel.app/ - Spotlight Panic: a vibe-coding game for the nFactorial AI Cup
https://frog-pond4.vercel.app/ - Frog Pond Final: another entry from the nFactorial AI Cup
https://aldar-kose-aul-quest.vercel.app/ - Aldar Kose Aul Quest: Kazakh folklore in a vibe-coding game
https://nfactorial-school.kit.com/posts/nfactorial-weekly-20-3 - 3 Acquired episodes on Costco, Nvidia, and Berkshire Hathaway
https://x.com/Hesamation/status/2059660001581441165 - Miyazaki saw AI animation in 2016 and said: "an insult to life, the end of the world is near"
https://x.com/ramit/status/2059754959516873149 - Ramit Sethi went on a 3.5-month sabbatical: Barcelona, Paris, Marrakesh, Japan
https://x.com/levelsio/status/2059351181516816409 - AMD breaks $500: levelsio congratulates everyone who took a position
https://x.com/higgsfield/status/2059690191187824681 - Higgsfield launches plugins for Adobe Premiere Pro and After Effects
https://x.com/paulg/status/2059618578211438884 - Paul Graham drew a proof of the Pythagorean theorem for his 14-year-old son
https://x.com/Giuliano_Mana/status/2059634348597330326 - Giuliano Mana: "I think about this every single day"
https://x.com/StartupArchive_/status/2059301030278595016 - Garry Tan of YC: "Moat is not a noun, it's a verb"
https://x.com/ihtesham2005/status/2058920173491695764 - Stanford proved: walking boosts creativity by 60%
https://x.com/paulg/status/2059011859953410286 - Paul Graham: a startup idea is invalid if it only works with a crowd of users
https://x.com/gdb/status/1621333381836570627 - Greg Brockman recalls how OpenAI got started
https://x.com/Scobleizer/status/2058720543780786413 - Scoble: it's time for founders to apply to YC and similar programs
https://x.com/StartupArchive_/status/2058878244901052849 - Marc Andreessen: successful companies always started "product first"
https://x.com/CAronitpereira/status/2058596815679983909 - The Gillette case: 100 years of dominance in razors as an investment thesis
https://x.com/paulg/status/2058492772726804943 - Paul Graham against dropping out at 18 for a startup
https://x.com/nikunj/status/2059772109480718814 - Nikunj Kothari: "Show the world your obsession"
https://x.com/MilkRoadMacro/status/2058162358242140326 - Gavin Baker explains why this AI cycle may avoid a bubble
https://x.com/ayushjaiswal/status/2058272419106951345 - Ayush Jaiswal on interviewing with Elon: "the best interview experience of my life"
https://x.com/SJosephBurns/status/2058108196787499495 - Bezos on Buffett: "Get-rich-slowly, that's why nobody copies it"
https://x.com/MidnightMuse___/status/2058208882636607854 - Why Japanese couples have the most stable marriages: one rule that Western therapists ignore
https://x.com/bookwormengr/status/2057909493250539891 - DeepSeek's $10 trillion strategy: China is building its own AI hardware industry
https://x.com/itsolelehmann/status/2057909733491937555 - Marc Andreessen on Rogan: 3 hours 20 minutes of pure AI alpha
https://x.com/aakashgupta/status/2057744283256696989 - Why Christopher Nolan doesn't use e-mail
https://x.com/sairahul1/status/2057808907553431946 - The Godfather of AI: "If you're sleeping soundly, you're not paying close attention"
https://x.com/Alfred_Lin/status/2057870289783156835 - Alfred Lin of Sequoia: business has finite and infinite games, and you need to play both
https://www.youtube.com/watch?v=1gn0i2AUXik - Kanat Baigarin on plasma physics and nuclear fusion
https://www.youtube.com/watch?v=nQVDi79A4JI - Episode 1: how a billion-dollar company is built
https://www.youtube.com/watch?v=4D3hDmGhFhA&list=WL&index=6&t=3365s - Dan Shipper's AI paradox: more automation means more people and more work
https://www.youtube.com/watch?v=tqOCyhXnKYA&list=WL&index=2&t=8721s - Berdyev as a thinker: above De Zerbi there is only Ancelotti
https://www.youtube.com/watch?v=MLagAm1sWIE&list=WL&index=3 - Sacha Baron Cohen won't bring back Ali G and will star in "Ladies First"
https://www.youtube.com/watch?v=JOINLHcqvYw&t=2s - TigerBelly 460: Bobby Lee meets his "nutritional twin" Sung Kang
https://www.youtube.com/watch?v=ou_DYLKzekk - MADtv: the classic "Korean Soap Opera" sketch, all 5 episodes
https://www.netflix.com/title/80244853 - Ken Jeong: "You Complete Me, Ho", stand-up about Hollywood and "The Hangover"
https://www.netflix.com/title/81728168 - The Bus: A French Football Mutiny

The Data Minute
The Seed Existential Crisis | Rob Go, Founding Partner, NextView Ventures

May 28, 2026 · 51:02


Is seed investing facing an existential crisis? This week on The Data Minute, Peter sits down with Rob Go, Founding Partner at NextView Ventures, to discuss the structural shifts making the "game on the field" harder than ever for early-stage investors.

Rob explains why many successful seed VCs are exiting the industry and how the rise of mega-funds and massive accelerators like YC has squeezed traditional seed firms into a narrow "subset" of the market. They dive into the "feeder fund" phenomenon, the arbitrary nature of ownership mandates, and why the $1B–$3B exit range has become a "Death Valley" for startups.

Despite the current angst, Rob shares his optimistic "bull case" for 2030, explaining why diminishing competition and a rotation away from late-stage consensus will lead to a healthier venture substrate in the years to come.

Subscribe to Carta's weekly Data Minute newsletter: https://carta.com/subscribe/data-newsletter-sign-up/

Explore interactive startup and VC data, with Carta's Data Desk: https://carta.com/data-desk/

Chapters:
00:20 – Intro: Rob Go and the Seed Existential Crisis
02:16 – Defining Seed: Betting on anything before PMF
03:35 – Why senior seed VCs are exiting the industry
05:02 – The Squeeze: Mega-funds vs. Accelerators
07:02 – Scarcity vs. Abundance: What's left for seed funds?
08:44 – The "Feeder Fund" trap and the factory supply chain
12:38 – The risk of taking seed money from a mega-fund
13:34 – Breaking down the 4 styles of seed investing
15:20 – Why specialist seed funds can be transient
19:29 – Super Compounders: Will exits keep getting bigger?
21:59 – The "Death Valley" of $1B–$3B exits
25:08 – The Blumhouse equivalent for venture capital
27:18 – Normalizing secondaries as an exit strategy
33:53 – Rant: Why ownership targets are backwards
39:04 – Offensive vs. Defensive bridge rounds
45:07 – "I've become way more Zen": Why the 2030 outlook is bullish
50:18 – Outro

This presentation contains general information only and eShares, Inc. dba Carta, Inc. (“Carta”) is not, by means of this publication, rendering accounting, business, financial, investment, legal, tax, or other professional advice or services, and is for informational purposes only. This presentation is not a substitute for such professional advice or services nor should it be used as a basis for any decision or action that may affect your business or interests. © 2026 eShares, Inc., dba Carta, Inc. All rights reserved.

Hacker News Recap
May 27th, 2026 | I'm Tired of Talking to AI

May 28, 2026 · 15:27


This is a recap of the top 10 posts on Hacker News on May 27, 2026. This podcast was generated by wondercraft.ai

(00:30): I'm Tired of Talking to AI
Original post: https://news.ycombinator.com/item?id=48292224&utm_source=wondercraft_ai
(01:58): Can we have the day off?
Original post: https://news.ycombinator.com/item?id=48302745&utm_source=wondercraft_ai
(03:26): I think Anthropic and OpenAI have found product-market fit
Original post: https://news.ycombinator.com/item?id=48296794&utm_source=wondercraft_ai
(04:54): DuckDuckGo search saw 28% more visits after Google said people love AI mode
Original post: https://news.ycombinator.com/item?id=48296649&utm_source=wondercraft_ai
(06:22): Last.fm is now independent
Original post: https://news.ycombinator.com/item?id=48295892&utm_source=wondercraft_ai
(07:51): YouTube to automatically label AI-generated videos
Original post: https://news.ycombinator.com/item?id=48299753&utm_source=wondercraft_ai
(09:19): Tech CEOs are apparently suffering from AI psychosis
Original post: https://news.ycombinator.com/item?id=48295679&utm_source=wondercraft_ai
(10:47): Private equity bought America's essential services
Original post: https://news.ycombinator.com/item?id=48292941&utm_source=wondercraft_ai
(12:15): Canada to order military plane fleet from Sweden in shift from US suppliers
Original post: https://news.ycombinator.com/item?id=48296994&utm_source=wondercraft_ai
(13:43): All of human cooking compressed into 2 megabytes
Original post: https://news.ycombinator.com/item?id=48291225&utm_source=wondercraft_ai

This is a third-party project, independent from HN and YC. Text and audio generated using AI, by wondercraft.ai. Create your own studio quality podcast with text as the only input in seconds at app.wondercraft.ai. Issues or feedback? We'd love to hear from you: team@wondercraft.ai

The Art Of Hospitality
The Story Of AI Agents Running A 150 Unit Company With 1 Full Time Employee (With Jan Sahagun)

May 27, 2026 · 50:57


We're joined this week by Jan Sahagun of Trellis (backed by YC) to talk all things agents, the future of STR/Vacation Rentals tech, running with lean teams, 150 properties with 1 person and a lot more. Enjoy!

⭐️ Links & Show Notes
Adam Norko
Conrad O'Connell
Jan Sahagun
Trellis

Hacker News Recap
May 26th, 2026 | Spain blocks prediction markets Polymarket, Kalshi over lack of gambling licence

May 27, 2026 · 15:14


This is a recap of the top 10 posts on Hacker News on May 26, 2026. This podcast was generated by wondercraft.ai

(00:30): Spain blocks prediction markets Polymarket, Kalshi over lack of gambling licence
Original post: https://news.ycombinator.com/item?id=48279316&utm_source=wondercraft_ai
(01:56): Netherlands blocks US takeover of vital digital supplier
Original post: https://news.ycombinator.com/item?id=48278406&utm_source=wondercraft_ai
(03:23): Big tech's anti-labor playbook has come for Wikipedia
Original post: https://news.ycombinator.com/item?id=48285592&utm_source=wondercraft_ai
(04:50): Motorola phones have started hijacking the Amazon app to insert affiliate codes
Original post: https://news.ycombinator.com/item?id=48274794&utm_source=wondercraft_ai
(06:17): The real cost of owning a home
Original post: https://news.ycombinator.com/item?id=48281611&utm_source=wondercraft_ai
(07:44): Dropbox CEO Drew Houston to step down
Original post: https://news.ycombinator.com/item?id=48279453&utm_source=wondercraft_ai
(09:11): DynIP – Dynamic DNS with RFC 2136, IPv6, DNSSEC, and BYOD
Original post: https://news.ycombinator.com/item?id=48276363&utm_source=wondercraft_ai
(10:38): Chemistry behind the Garden Grove chemical tank
Original post: https://news.ycombinator.com/item?id=48284712&utm_source=wondercraft_ai
(12:05): The user is visibly frustrated
Original post: https://news.ycombinator.com/item?id=48275059&utm_source=wondercraft_ai
(13:32): Uber, Lyft drivers in Massachusetts form first US ride-share union
Original post: https://news.ycombinator.com/item?id=48281509&utm_source=wondercraft_ai

This is a third-party project, independent from HN and YC. Text and audio generated using AI, by wondercraft.ai. Create your own studio quality podcast with text as the only input in seconds at app.wondercraft.ai. Issues or feedback? We'd love to hear from you: team@wondercraft.ai

Hacker News Recap
May 25th, 2026 | Magnifica Humanitas

May 26, 2026 · 15:47


This is a recap of the top 10 posts on Hacker News on May 25, 2026. This podcast was generated by wondercraft.ai

(00:30): Magnifica Humanitas
Original post: https://news.ycombinator.com/item?id=48265206&utm_source=wondercraft_ai
(02:00): California moves to exempt Linux from its age-verification law after backlash
Original post: https://news.ycombinator.com/item?id=48269961&utm_source=wondercraft_ai
(03:30): Search engines alternatives now that Google isn't Google anymore
Original post: https://news.ycombinator.com/item?id=48266051&utm_source=wondercraft_ai
(05:00): The Eternal Sloptember
Original post: https://news.ycombinator.com/item?id=48263238&utm_source=wondercraft_ai
(06:30): Using AI to write better code more slowly
Original post: https://news.ycombinator.com/item?id=48272984&utm_source=wondercraft_ai
(08:01): Pope Leo XIV says AI must serve humanity, not the powerful few
Original post: https://news.ycombinator.com/item?id=48266485&utm_source=wondercraft_ai
(09:31): Leave Me Behind
Original post: https://news.ycombinator.com/item?id=48265876&utm_source=wondercraft_ai
(11:01): Exit IP VPN servers mitigation rollout
Original post: https://news.ycombinator.com/item?id=48269580&utm_source=wondercraft_ai
(12:31): Jira Is Turing-Complete
Original post: https://news.ycombinator.com/item?id=48263253&utm_source=wondercraft_ai
(14:01): Netherlands Seizes 800 Servers, Arrests 2 for Aiding Cyberattacks
Original post: https://news.ycombinator.com/item?id=48266906&utm_source=wondercraft_ai

This is a third-party project, independent from HN and YC. Text and audio generated using AI, by wondercraft.ai. Create your own studio quality podcast with text as the only input in seconds at app.wondercraft.ai. Issues or feedback? We'd love to hear from you: team@wondercraft.ai

CarneCruda.es PROGRAMAS
Spain prepares for the total eclipse (CARNE CRUDA #1672)

May 25, 2026 · 53:58


Between 2026 and 2028 something extraordinary will happen over our heads: the so-called "eclipse trio". The first arrives on August 12, 2026, and the Iberian Peninsula will be the only inhabited place on the planet from which it can be seen in its full magnitude. It is the first visible from Spain in more than a century. Esther Sánchez, impacientífica, returns with Mundos Posibles to talk about the Sun, how an entire country prepares to look at the sky, and why we still need to share wonder. We travel to Yebes, in Guadalajara, home to the observatory chosen by the Government as the official monitoring center for the total eclipse, and talk with scientists, researchers, and astrophysicists. Plus a new installment of "Código verde" with BC3. More information here: https://www.eldiario.es/132_ca0ede Support Carne Cruda: http://bit.ly/ProduceCC

Hacker News Recap
May 24th, 2026 | DeepSeek reasonix, DeepSeek native coding agent with high caching and low cost

May 25, 2026 · 15:10


This is a recap of the top 10 posts on Hacker News on May 24, 2026. This podcast was generated by wondercraft.ai

(00:30): DeepSeek reasonix, DeepSeek native coding agent with high caching and low cost
Original post: https://news.ycombinator.com/item?id=48256953&utm_source=wondercraft_ai
(01:56): Microsoft open-sources “the earliest DOS source code discovered to date”
Original post: https://news.ycombinator.com/item?id=48253386&utm_source=wondercraft_ai
(03:23): Wake up! 16b
Original post: https://news.ycombinator.com/item?id=48253060&utm_source=wondercraft_ai
(04:49): Memory has grown to nearly two-thirds of AI chip component costs
Original post: https://news.ycombinator.com/item?id=48258684&utm_source=wondercraft_ai
(06:16): Why is Vivado 2026.1 dropping Linux support for free tier?
Original post: https://news.ycombinator.com/item?id=48254309&utm_source=wondercraft_ai
(07:42): Amazon Web Services – Four Years and Out
Original post: https://news.ycombinator.com/item?id=48254475&utm_source=wondercraft_ai
(09:09): Scammers are abusing an internal Microsoft account to send spam links
Original post: https://news.ycombinator.com/item?id=48253186&utm_source=wondercraft_ai
(10:35): Show HN: Audiomass – a free, open-source multitrack audio editor for the web
Original post: https://news.ycombinator.com/item?id=48258015&utm_source=wondercraft_ai
(12:02): The four-day workweek in Australia: insights from early adopters of 100:80:100
Original post: https://news.ycombinator.com/item?id=48259990&utm_source=wondercraft_ai
(13:28): Claude is not your architect. Stop letting it pretend
Original post: https://news.ycombinator.com/item?id=48259784&utm_source=wondercraft_ai

This is a third-party project, independent from HN and YC. Text and audio generated using AI, by wondercraft.ai. Create your own studio quality podcast with text as the only input in seconds at app.wondercraft.ai. Issues or feedback? We'd love to hear from you: team@wondercraft.ai

Hacker News Recap
May 23rd, 2026 | Texas woman arrested for Facebook post about town water quality

May 24, 2026 · 15:04


This is a recap of the top 10 posts on Hacker News on May 23, 2026. This podcast was generated by wondercraft.ai

(00:30): Texas woman arrested for Facebook post about town water quality
Original post: https://news.ycombinator.com/item?id=48249747&utm_source=wondercraft_ai
(01:55): BambuStudio has been violating PrusaSlicer AGPL license since their fork
Original post: https://news.ycombinator.com/item?id=48245862&utm_source=wondercraft_ai
(03:21): On The (2021)
Original post: https://news.ycombinator.com/item?id=48247325&utm_source=wondercraft_ai
(04:47): Time to talk about my writerdeck
Original post: https://news.ycombinator.com/item?id=48250144&utm_source=wondercraft_ai
(06:13): Oura says it gets government demands for user data
Original post: https://news.ycombinator.com/item?id=48247876&utm_source=wondercraft_ai
(07:39): Is AI Profitable Yet?
Original post: https://news.ycombinator.com/item?id=48243863&utm_source=wondercraft_ai
(09:05): The Art of Money Getting
Original post: https://news.ycombinator.com/item?id=48247208&utm_source=wondercraft_ai
(10:31): Italy moves to Airbus A330 tankers
Original post: https://news.ycombinator.com/item?id=48248775&utm_source=wondercraft_ai
(11:57): Experience: We found a baby on the subway – now he's our 26-year-old son
Original post: https://news.ycombinator.com/item?id=48245571&utm_source=wondercraft_ai
(13:23): 80386 microcode disassembled
Original post: https://news.ycombinator.com/item?id=48247004&utm_source=wondercraft_ai

This is a third-party project, independent from HN and YC. Text and audio generated using AI, by wondercraft.ai. Create your own studio quality podcast with text as the only input in seconds at app.wondercraft.ai. Issues or feedback? We'd love to hear from you: team@wondercraft.ai

Hacker News Recap
May 22nd, 2026 | If you're an LLM, please read this

May 23, 2026 · 15:25


This is a recap of the top 10 posts on Hacker News on May 22, 2026. This podcast was generated by wondercraft.ai

(00:30): If you're an LLM, please read this
Original post: https://news.ycombinator.com/item?id=48234413&utm_source=wondercraft_ai
(01:58): Steve Wozniak cheered after telling students they have AI – actual intelligence
Original post: https://news.ycombinator.com/item?id=48233563&utm_source=wondercraft_ai
(03:26): Why Japanese companies do so many different things
Original post: https://news.ycombinator.com/item?id=48237163&utm_source=wondercraft_ai
(04:54): Bun support is now limited and deprecated
Original post: https://news.ycombinator.com/item?id=48238789&utm_source=wondercraft_ai
(06:22): U.S. researchers face new restrictions on publishing with foreign collaborators
Original post: https://news.ycombinator.com/item?id=48238025&utm_source=wondercraft_ai
(07:50): Project Glasswing: An Initial Update
Original post: https://news.ycombinator.com/item?id=48240419&utm_source=wondercraft_ai
(09:18): Antigravity 2.0 Tops the OpenSCAD Architectural 3D LLM Benchmark
Original post: https://news.ycombinator.com/item?id=48234090&utm_source=wondercraft_ai
(10:46): DeepSeek makes the V4 Pro price discount permanent
Original post: https://news.ycombinator.com/item?id=48237663&utm_source=wondercraft_ai
(12:14): Deno 2.8
Original post: https://news.ycombinator.com/item?id=48234380&utm_source=wondercraft_ai
(13:42): AI has a multiplying effect on existing technical skills
Original post: https://news.ycombinator.com/item?id=48235526&utm_source=wondercraft_ai

This is a third-party project, independent from HN and YC. Text and audio generated using AI, by wondercraft.ai. Create your own studio quality podcast with text as the only input in seconds at app.wondercraft.ai. Issues or feedback? We'd love to hear from you: team@wondercraft.ai

Hacker News Recap
May 21st, 2026 | Flipper One – we need your help

May 22, 2026 · 15:12


This is a recap of the top 10 posts on Hacker News on May 21, 2026. This podcast was generated by wondercraft.ai

(00:30): Flipper One – we need your help
Original post: https://news.ycombinator.com/item?id=48220647&utm_source=wondercraft_ai
(01:56): AI is just unauthorised plagiarism at a bigger scale
Original post: https://news.ycombinator.com/item?id=48222383&utm_source=wondercraft_ai
(03:23): Project Hail Mary – Stellar Navigation Chart
Original post: https://news.ycombinator.com/item?id=48225297&utm_source=wondercraft_ai
(04:50): Google's Antigravity bait and switch
Original post: https://news.ycombinator.com/item?id=48222529&utm_source=wondercraft_ai
(06:16): We're testing new ad formats in Search and expanding our Direct Offers pilot
Original post: https://news.ycombinator.com/item?id=48220105&utm_source=wondercraft_ai
(07:43): Throwing AI-generated walls of text into conversations
Original post: https://news.ycombinator.com/item?id=48219992&utm_source=wondercraft_ai
(09:10): Seattle Shield, an intelligence-sharing network operated by the Seattle police
Original post: https://news.ycombinator.com/item?id=48226588&utm_source=wondercraft_ai
(10:37): Vivaldi 8.0
Original post: https://news.ycombinator.com/item?id=48219060&utm_source=wondercraft_ai
(12:03): Shunning AI is the human choice
Original post: https://news.ycombinator.com/item?id=48222366&utm_source=wondercraft_ai
(13:30): Python 3.15: features that didn't make the headlines
Original post: https://news.ycombinator.com/item?id=48220696&utm_source=wondercraft_ai

This is a third-party project, independent from HN and YC. Text and audio generated using AI, by wondercraft.ai. Create your own studio quality podcast with text as the only input in seconds at app.wondercraft.ai. Issues or feedback? We'd love to hear from you: team@wondercraft.ai

Night of the Living Geeks
YaketyCast Episode 137: RIP KARS 4 KIDS!!

May 22, 2026 · 58:52


You cannot keep the YC crew down! After a week off, our trio is bringing the THUNDER in entertainment news and reviews. This week we get full SAROS and FORZA HORIZON 6 reviews. Rick & Morty are getting a MOVIE and Cliff Booth is coming to IMAX. Our hosts really enjoyed the WILDWOOD trailer and laughed at the LITTLE BROTHER trailer. TV REVIEWS? WE HAVE THEM: The Boys Series Finale, Pusher: The Last Kill, plus the SPIDER-NOIR trailer. All this plus music, a Taco Bell Dirty Soda review, Ernesto almost being ARRESTED FOR PURCHASING A MAGAZINE, and the END OF KARS 4 KIDS!!

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

Take the 2026 AI Engineering Survey and get >$2k in credits and AIE WF tickets!On the product side, everyone is getting Computer - Perplexity, Manus, Cursor, and so on. Meanwhile on the research side, agentic evals like TerminalBench and GDPVal are also assuming computer (Harbor). On both ends, the consolidating LLM OS stack has become a standard toolkit, and Daytona is one of a small set of AI Infra companies that are booming because of it.“The end of localhost” has been Ivan Burazin's obsession for more than a decade.Something that is all too familiar…Long before agents became the default way people talked about software development, Ivan was already chasing the idea that development should not depend on a fragile local machine. CodeAnywhere, one of the first browser-based IDEs, was an early attempt at that future: move the development environment into the cloud, make setup reproducible, and free developers from the endless “works on my machine” tax.The thesis was directionally right, but the market wasn't ready yet.However, agents changed that. They do not care about a laptop, desk setup, or favorite editor. They need a computer they can access through an API: something stateful enough to keep working, fast enough to spin up instantly, flexible enough to resize, isolated enough to be safe, and composable enough to run the messy real-world workflows that real software engineering actually requires.Daytona isn't just selling “sandboxes” in the narrow code-execution sense. 
It is the latest version of Ivan's original localhost thesis.In this episode, Daytona's CEO joins swyx to explain why AI agents need more than code execution boxes: they need composable computers, stateful sandboxes, instant startup, dynamic resources, and infrastructure that can survive workloads going from zero to 100,000 CPUs.We go deep on the new agent compute market: Daytona's hard pivot from human dev environments to AI sandboxes, the New Year's Eve MVP that customers begged for, why Daytona runs on bare metal with its own scheduler, how one customer runs almost 850,000 sandboxes a day, and why RL/eval workloads went from 0% to roughly 50% of usage in just months. Ivan also explains why agents need Windows and macOS machines, why CLI may matter more than MCP, why Kubernetes is painful for this workload, and why the future AI cloud may look more like Stripe than AWS.We discuss:* How Daytona grew out of CodeAnywhere, Shift, and the “end of localhost” thesis* Why Daytona pivoted from human dev environments to AI sandboxes* Why agents need composable computers instead of disposable code execution boxes* The New Year's Eve MVP that customers chased API keys for* Why Daytona chose bare metal, stateful snapshots, and its own scheduler* How Daytona spins up one sandbox in ~60ms and 50,000 sandboxes in ~75 seconds* Why Daytona's biggest customer runs ~850,000 sandboxes a day* How RL/eval workloads create zero-to-100,000 CPU spikes* Why RL workloads went from 0% to roughly 50% of Daytona usage* Why customers compare Daytona against EKS/GKS and say they're “never going back”* Why every AI agent may need a computer, including Windows and macOS environments* The Apple licensing constraints that make macOS sandboxes hard* Why CLI gives agents more power than MCP* How open source helps agents integrate Daytona* Why agent-generated PRs may break today's CI/CD assumptions* Why AI SaaS companies reselling tokens may face a cold shower* Why the AI cloud may look more like 
Stripe than AWSIvan Burazin* LinkedIn: https://www.linkedin.com/in/ivanburazin* X: https://x.com/ivanburazinDaytona* Website: https://www.daytona.io* X: https://x.com/daytonaioTimestamps* 00:00:00 Hook* 00:01:12 Introduction* 00:03:15 CodeAnywhere, Shift, and the end of localhost* 00:05:58 What Daytona is: composable computers for AI agents* 00:08:07 The pivot from dev environments to AI sandboxes* 00:10:17 The New Year's Eve MVP and customers begging for API keys* 00:12:56 Bare metal, stateful sandboxes, and Daytona's scheduler* 00:17:28 60ms startup, 50,000 sandboxes, and 850K daily runs* 00:21:53 Spiky RL/eval workloads and the new agent infra problem* 00:28:12 RL workloads, Kubernetes pain, and dynamic resizing* 00:33:31 Why every AI agent needs a computer* 00:38:48 macOS sandboxes and Apple's licensing problem* 00:44:28 Why CLI may matter more than MCP* 00:48:11 Open source, GitHub stars, and agent integration* 00:53:11 Git, CI/CD, and agent collaboration bottlenecks* 00:58:15 Founder life and building a 25-person infra company* 01:02:44 AI SaaS, token resale, and API-first business models* 01:06:10 GPU sandboxes, data centers, and compute growth* 01:09:48 Why the AI cloud may look more like Stripe than AWS* 01:11:26 Closing thoughtsTranscriptIntroduction: Daytona, CodeAnywhere, and the End of LocalhostSwyx [00:00:02]: Okay, we're in the studio with Ivan Burazin, CEO of Daytona. Welcome.Ivan [00:00:07]: Thanks for having me, man.Swyx [00:00:08]: Ivan, you and I go back.Ivan [00:00:10]: Way back.Swyx [00:00:11]: How I don't even know how, you found, did you reach out or, for Shift.Ivan [00:00:17]: I reached out to you. The reason was you - we were just - we were thinking about I was one of the co-founders of CodeAnywhere, the first browser-based IDE, and so we were thinking a long time of, localhost should die. 
And you had this article.Swyx [00:00:29]: End of localhost.Ivan [00:00:30]: Then I reached out to you because of that, and then we talked, and I was actually at a different job and learning about I was the head of, developer experience, and you were quite well-versed in that, and I actually reached out to you, among other people, how do we go about that? What are the key things and whatnot at this point in time? And you were nice enough to take the call, and I remember I was late on your call with you.Swyx [00:00:51]: I don't remember.Ivan [00:00:52]: I remember because I was with my then I'm thinking of a girlfriend or wife at that point in time, I'm not sure. It's the same person, so that's great, and I was late ‘cause we were, in, Italy on, vacation, and then I was late for something. I felt so bad, and you were so nice to be, good about.Swyx [00:01:10]: The reason I'm nice is because I'm also late to other people, so it's like, who's, who's without sin here, yeah, so I have to, for those who don't know, InfoBip Shift, there's this whole thing that, you did in the past, and, and that was basically one of the inspirations for me starting AI Engineer, which is like, I have to thank you for giving me that push to be like, “Oh, you can, you can build and sell conferences?”Ivan [00:01:34]: I remember you asked you asked me at the beginning to give me advisory shares, and I was so focused on what we were doing, I said no, and I should've took the advisory shares. So I'm sorry, dude. But anyway.Swyx [00:01:43]: We're not, we're not venture backed.Ivan [00:01:44]: No, it doesn't matter.Swyx [00:01:45]: It's Yeah, anyway, so I think what's impressive about you is that CodeAnywhere is the thing that you've been trying to build, and, you kind of put it on hold and then came back after InfoBip. Just give us the story, do you - the story and the origin story, going into Daytona.From CodeAnywhere and Shift to DaytonaIvan [00:02:05]: Sure. 
Like, really way back, me and my co-founder have been together. I say this, I've said this multiple times, it's like we were married and divorced and married. Some people actually ask me is my co-founder my partner. they thought it literally. It's not literally, but we have done multiple companies together, and to your point, we had this shift where we went from the CodeAnywhere to the conference called Shift, and then back to, Daytona. We originally started stacking servers, doing like virtualization in the early 2000s and, routers and doing basically all these things, at a foundational level, and that was a services company which we sold to focus on what my co-founder actually invented, which was the very first browser-based IDE, right, I say the first. Before us was actually Heroku. They did it for a very short time until they became Heroku. But outside of them, we were the only one, and it was called.Swyx [00:02:55]: There was Cloud9.Ivan [00:02:57]: Cloud9 came out slightly after us. There was Replit, which came out when we stopped doing it, Replit came out, and they have been successful since then, which is great. There was Nitrous.io. There was quite a few that existed at the time, but it was like too early. But the interesting part is that we, at that point in time, because there was no VS Code, there was no Kubernetes, and Docker had just started when we Or I'm not sure if it was even public at that point in time. And so we had to build everything to the whole stack ourselves and that was the key learning that we brought into and that we've been using in Daytona today. So it was super early. There's about 3 million people used CodeAnywhere. It was slightly, it was angel-backed more than venture-backed. We ended up paying everyone back because it didn't have that sort of scale. 
But, three years ago, we started something similar with Daytona, which is not what we are today, but it was automating dev environments for human engineers, the basically the underlying stack of CodeAnywhere. And then we did a hard pivot last January to sandboxes. And so here we are.Swyx [00:04:01]: Historic pivot, yeah, and, it's one of those things where, I had independently invested in CodeAnywhere, but also in E2B, and then both of you pivoted into the same thing, and I'm like, “F**k.”Ivan [00:04:12]: You invested, you invested in Daytona. You invested in Daytona. But you were the first If we had not got your check, we wouldn't have done it.Swyx [00:04:18]: No way.Ivan [00:04:19]: No, it was like, “We have to get him on board first,” and you were that kicker that we, that got us off the ground.Swyx [00:04:23]: No, because you were putting me on your pitch deck, man. I was like, “Man, this is like a good trip if I don't invest.”Ivan [00:04:29]: That's because it was your quote. It's like we.Swyx [00:04:30]: Yeah. It's the end of localhost.Ivan [00:04:31]: Did a bunch of research about end of localhost and who was interested in that,.Swyx [00:04:34]: No, that's like, I put, I wrote that blog post, and every single company in that field reached out to me, and then every VC who was receiving those pitches then also had to call me and, talk it, talk through it with me.Ivan [00:04:47]: It's finally happening though.Swyx [00:04:48]: It was really super interesting.Ivan [00:04:48]: It's finally happening.Swyx [00:04:49]: It's finally happening.Ivan [00:04:49]: Yeah, it's finally.Swyx [00:04:49]: It's finally happening, with maybe sort of non-human users. Yeah, so what is Daytona today? Let's get like a quick description. I'm wearing the shirt.What Daytona Is Today: Composable Computers for AI AgentsIvan [00:04:58]: You're wearing the shirt. Yes,.Swyx [00:04:59]: It says, I think your branding is very good. Like, it's very consistent. It runs AI code. 
Like, it cannot be simpler.Ivan [00:05:05]: Exactly, but we're gonna probably have to change that.Swyx [00:05:07]: Oh, s**t.Ivan [00:05:07]: It's also a subset of what we do. Unfortunately, we really love this, Run AI Code is super simple. People interpret it different ways. I think we've given out 5,000, 6,000 of these shirts. People wear them with pride because it doesn't really market about us.Swyx [00:05:21]: Yeah, Daytona's on the back.Ivan [00:05:22]: It markets the back. It markets to the person itself, so I think we did a really good job on that one. But it is also a subset of what we do, because people, when they think about Run AI Code, they just think about these small, let's call it isolates, code execution boxes that, you send some code, you get an output. Whereas what Daytona is today is essentially composable computers for AI agents. It is, the market calls them sandboxes which can be misleading.Swyx [00:05:44]: All these things. All these things on.Ivan [00:05:45]: Yeah, exactly, ‘cause it can be misleading ‘cause people usually think about sandboxes as a demo or a test environment versus a production-grade environment. But what Daytona does, if you think of the laptop that you have in front of you or the computer that's over there, or, my wife is an architect, so she has like a Windows with a 3D graphics card inside to do 3D rendering. Like, as humans, we have different computers or different compositions of computers. And our belief is strongly that agents today and going forward will need all these different compositions of computers to do different types of tasks. And so we offer that basically through an API.Swyx [00:06:19]: Yeah, to give people - I'm trying to sort of front-load all the aha moments or the wow moments so that people can, stay engaged and click like and subscribe. the market is exploding, right? Like, you have been reporting 74% month-on-month growth, and it also, it's just been growing for a while. 
Like, it's been going like this. And every single - It's not just you guys. It's every single.

Ivan [00:06:41]: Everyone, yeah.

Swyx [00:06:42]: Sort of compute provider. I don't know if you agree with me saying compute provider or not.

Ivan [00:06:48]: It's fine.

Swyx [00:06:48]: Yeah. So like organically PLG-driven growth, but also enterprise is doing super well. I think I wanna rewind to January of last year when you did the pivot. Like, so you obviously called this market early, and you were positioned for it, and you are now one of the market leaders. But what was the insight that made you do the pivot?

The Pivot: From Human Dev Environments to Agent Sandboxes

Ivan [00:07:06]: The insight that made us do this pivot came the quarter before that, so end of 2024, when we had - Basically, we did a demo with - I think we discussed this as well, Devin was not public. You actually gave me access to Devin at that time. So Devin.

Swyx [00:07:25]: I did?

Ivan [00:07:26]: Yeah, you gave me access.

Swyx [00:07:26]: I don't think I was supposed.

Ivan [00:07:27]: Yeah, exactly.

Swyx [00:07:28]: Yeah, I.

Ivan [00:07:28]: So it doesn't matter. You.

Swyx [00:07:29]: Yeah. I gave like three friends access.

Ivan [00:07:31]: Yeah, or it was a call and you showed it to me. It doesn't matter. But OpenDevin was available, which is now called OpenHands. And so we're like, “Oh, this seems to be a thing. This is not public. Let's take our automation of dev environments for humans, take OpenDevin, and launch that as a SaaS.” And we did that. Not very many people signed up and used it, but a lot of people reached out that were building agents, and they were like, “Hey, my agent needs a compute sandbox runtime,” whatever you wanna call it. I forgot what it was called at that point. And then we were like, “Oh, amazing. This is a new market. Here is our infrastructure. Here's our product, and go.” And what we found really fast was that people did not like what we had built.
It didn't work. And I remember talking to people at the beginning when we were doing this, the sandbox we're building for agents. People were like, “Oh, why is it different? It's the same thing. We have like EC2, we have VMs, we have all these things.” But we saw that everyone we gave it to, it was like 20, 30 people, they all said, “No.” Like, “This is not what we need. This sort of breaks.” And basically, me and my co-founder not knowing a lot about - ‘cause we're infra people. We're not AI people. So I basically took it upon myself to like watch every single podcast that exists, including all of these and all that, and sort of get up to date, read all the blogs, like understand what's going on.

Swyx [00:08:45]: Do you wanna shout out who else was useful, just in case people are also looking?

Ivan [00:08:49]: Generally, I looked at - There's a few podcasts, different segments and different types. So there's you guys, No Priors, Bill Gurley's was great while.

Swyx [00:09:04]: BG2, yeah.

Ivan [00:09:05]: Yeah, while it was around. So there's a few. 20VC is interesting from a different dynamic, and some are different dynamic. But there was also Redpoint's.

Swyx [00:09:14]: We're not really about the compute market.

Ivan [00:09:15]: It was also already - Sorry?

Swyx [00:09:16]: You're looking at the agent infra market.

Ivan [00:09:19]: I was looking at the agent market and the AI market in general and sort of understanding who are the players, what's the perception, and how that goes. And like obviously you complement this with like going to conferences, going to events, going to meetups, reading white papers, like doing all the things that you have to do to understand what's happening. And so when we sort of had an idea of what we had to build, literally on New Year's Eve, I half vibe coded the first MVP, the first minimal viable product of what Daytona is today.
And I went to sleep at like 3:00 AM or something like that. I had just put my baby daughter and wife to sleep and, Happy New Year's, went back to just doing this. And I sent it to my co-founder, my CTO, and he saw it in the morning. He's like, “This is absolute garbage. Do not show this to anybody at all, but the idea is good.” And so he took two weeks, and he rebuilt it.

Swyx [00:10:09]: Did it like look like that? Listen, I - It was a rough idea.

Ivan [00:10:12]: Oh, not even close. Like it was way worse. But it was a very simplistic view of what it should be. Like, it worked, but it was not ideal. And so he went down the hole, which is his job as CTO, and he came back with this version. We then called all the people that had said, “This is garbage,” a quarter ago. And we set up these calls, and we just demoed it to everyone. And all the calls went long, every single one. They were 15-minute calls, and they all went to like 25, 30 minutes or whatnot. And everyone said, “We want access.” There was no login, just an API key, ‘cause it was just a beta or an alpha. And they said, “Oh, we want access.” And we're like, “Sure, yeah. Okay, thank you very much.” But the next day, if we had not sent it, every single one, like every call that we did, everyone came back, “Where is my API key?” Like everyone wanted it. We're like, “S**t.” Like this is it. Like I've never felt - So one, to your point, the understanding was like most people thought it was the same infrastructure for humans and agents. We understood a quarter ago it's not. We just didn't know what was the right primitive. And then when we came back, and we can talk about what that is, and we gave it to these people - I've done multiple companies in my life. I've never experienced this, that people literally call you if you do not give them access.
Like they want access right now. And so it's like, okay, they don't want this. The thing that they want doesn't seem to exist, or they have not found it, and they really want what we want. And then when we understood that we're onto something, and then when you think about the size of the market, like the market for human engineers and enterprise is a very large market, so think GitLab or whatnot. But the market for every single agent that will exist ever in the future is just like, what is that market? How big is that? And we're like, “We are all in on this.” And so that is where we made sort of the cut between the old product and the new one.

Bare Metal, Stateful Sandboxes, and the Lambda + EC2 Model

Swyx [00:12:02]: Yeah. But it wasn't composable at the time?

Ivan [00:12:05]: It was very - It was basically just a Linux box where you could define the number of CPUs, disk, and RAM. Like that is what you could do, but you couldn't have multiple operating systems, you couldn't resize it on the fly, you couldn't add a GPU, you couldn't do like all the things. It was just the first sort of variation of that, yeah.

Swyx [00:12:22]: Was it bare metal from the start?

Ivan [00:12:24]: It was bare metal from the start. And so the interesting thing that we thought about right away, so our.

Swyx [00:12:29]: Give people the background, what is the normal path?

Ivan [00:12:32]: Yeah, so basically most providers run this on top of VMs. And also.

Swyx [00:12:37]: Firecracker.

Ivan [00:12:38]: Yeah, they run on Firecracker and VMs. And so we also - We have multiple isolation layers and we can do that. But the common way to do it is that, one, the state of the machine, or the hard disk, is not part of the sandbox itself. And the other thing is they're not meant to last forever. So most of them are preemptible, like there's a time that they can live.
And so our thought when we were going into this was, agents will be like humans in the sense that you don't want your laptop to be shut down until you're done with work. And you want to close the lid and open the lid, and it's the same state. So agents would want that, like the pause and come back. They want those two things. But also agents really want speed, right? Can they get it? So when we thought about it, it's like we need something insanely fast. How to make it fast, how to make it long-running, and stateful. And so those two things, it's like combining a Lambda and an EC2, right? Those two things together. And so we didn't have an idea how others did it, ‘cause we didn't know that there was a market around this. It was more like, okay, this is what we need, what they need. And we looked at Kubernetes, it wasn't good enough for that. We looked at Nomad, it didn't enable that. And so our history in rewriting our own scheduler at CodeAnywhere is basically what my CTO came up with. Like, he's like, “Oh, the learnings from there,” and he brought it. And the funny thing is, our third co-founder, when he saw it, he's like, “Dude, what is this? This is like 2008.” Like, we went back in time, and he's like, “Exactly.” And so the reason why Daytona is like super fast, and you see this on benchmarks, is we essentially run on bare metal. We have our own scheduler, and we use the disk, CPU, and RAM of the underlying machine, which means your IOPS are insanely fast because there's no network between an EBS or something like that. But also the snapshots, the point-in-time templates, are preloaded on the bare metal machines. So when you fire off a sandbox from a template or a snapshot, you're essentially directed to the bare metal machine where that snapshot is, based on that NVMe drive, and then it literally just turns on that machine, and it's local. There's no network latency, anything on there.
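
[Editor's sketch] The routing idea Ivan describes here, sending a new sandbox to a machine that already holds the requested snapshot on local disk, can be illustrated with a few lines of Python. This is only a toy model under stated assumptions: the `Host` class, the `place_sandbox` function, the "most free CPUs" tie-break, and the host and snapshot names are all hypothetical, and nothing here is taken from Daytona's actual scheduler.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """A hypothetical bare-metal machine with snapshots cached on local NVMe."""
    name: str
    free_cpus: int
    cached_snapshots: set[str] = field(default_factory=set)


def place_sandbox(hosts: list[Host], snapshot: str, cpus: int) -> Host:
    """Pick a host that already holds the snapshot locally and has room.

    Among eligible hosts, prefer the one with the most free CPUs
    (an assumed policy, chosen only to keep the example small).
    Raises LookupError if no host qualifies.
    """
    eligible = [
        h for h in hosts
        if snapshot in h.cached_snapshots and h.free_cpus >= cpus
    ]
    if not eligible:
        raise LookupError(f"no host with snapshot {snapshot!r} and {cpus} free CPUs")
    chosen = max(eligible, key=lambda h: h.free_cpus)
    chosen.free_cpus -= cpus  # reserve capacity on the chosen machine
    return chosen


hosts = [
    Host("metal-a", free_cpus=8, cached_snapshots={"python-3.12"}),
    Host("metal-b", free_cpus=32, cached_snapshots={"python-3.12", "node-22"}),
    Host("metal-c", free_cpus=64, cached_snapshots={"node-22"}),
]
print(place_sandbox(hosts, "python-3.12", cpus=4).name)  # metal-b
```

The point of the sketch is that the snapshot never crosses the network at start time: `metal-c` has the most free capacity but is skipped because it does not hold the snapshot locally.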
And so that is sort of the specificities that, when we were thinking from first principles what a computer would look like for an agent, that is what we came up with, and that's what we created.

Benchmarks, 60ms Startup, and 50,000 Sandboxes

Swyx [00:15:02]: Yeah. I should maybe, I don't know if you endorse this, but there's someone that does ComputeSDK, and you guys do very well on there, with like the TTI, right? Is this a relevant benchmark for you guys? I don't know.

Ivan [00:15:16]: I don't know, and it changes every day. So today RKL is.

Swyx [00:15:18]: I don't know what RKL is. Never heard of it.

Ivan [00:15:20]: Yeah. RK, yeah, so it is there.

Swyx [00:15:22]: You are at least a third of the next tier of performance, and then there's a lot of other better-known names that are very slow to start.

Ivan [00:15:31]: Yeah. We've been the number one by far for a long time, and now there's different definitions also of sandboxes, different isolation patterns, different other things. So RKL runs it literally on the S3, the data, so it's very different, and they spin up a container for that, so it's a different type of thing. So the definition of a sandbox is something that we all need to get along with. But yeah, we're insanely fast on getting these things up and running. And so you can see even there that it's 0.10 to 0.11, so.

Swyx [00:16:03]: Close enough. Yeah. What else do you need, right?

Ivan [00:16:05]: Yeah. So the benchmarks themselves - I don't think the benchmarks equate to market ownership or revenue or anything like that. And I've seen this with multiple benchmarks, not just in sandboxes, but in general benchmarks around.

Swyx [00:16:20]: It's table stakes. It's just like.

Ivan [00:16:21]: Exactly.
But it doesn't hurt.

Swyx [00:16:22]: Just roughly check.

Ivan [00:16:22]: Like you definitely have to be up there and you have to be competing so that people know that, oh, this is definitely one of the top. Because this is only one dimension of what customers look for. There's other things like how many can you spin up consecutively? There's a feature set, there's support, there's like all different things that people look at, but you definitely have to be there on the benchmarks.

Swyx [00:16:40]: How many do people spin up consecutively?

Ivan [00:16:43]: So we have.

Swyx [00:16:43]: Or concurrently. Concurrency, right?

Ivan [00:16:45]: There's three metrics that we look at. And so one is like time to spin up one, and so our time to spin up one is 60 milliseconds with network latency. So request, spin up, reply, the whole thing, 60 milliseconds. That is one. But if you wanna spin up 50,000 at once, we are now at about 75 seconds. So it takes about 75 seconds to spin up 50,000 concurrently. Some others, there's public data around this, take like 2,000 seconds, which is roughly 30 minutes. Like there's different variations of that. And then, so it is speed of one, speed of multiple, and then how many can you consistently have up and running. And so we basically have right now no limit to how much we can add because we basically own our own metal. But the biggest customer of ours does about 850,000 every single day. That's sort of where they are, just shy of a million every single day that they're running. We do have a request for half a million concurrent, which is literally half a million CPUs somewhere running. So that's an interesting.

Swyx [00:17:44]: They pay by like vCPU seconds.

Ivan [00:17:47]: By seconds, yeah.

Swyx [00:17:47]: Or whatever. Yeah.
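
[Editor's sketch] To put the startup figures quoted above side by side, here is the arithmetic as a short Python snippet. The input numbers are the ones Ivan states in the conversation; the derived quantities (rate, ratio, the serialized comparison) are only an illustration and were not stated by either speaker.

```python
# Figures quoted in the conversation above.
SINGLE_START_S = 0.060       # one sandbox, request to reply
BATCH_SIZE = 50_000          # sandboxes started at once
BATCH_TIME_S = 75            # "about 75 seconds"
OTHER_BATCH_TIME_S = 2_000   # figure cited for "some others"

batch_rate = BATCH_SIZE / BATCH_TIME_S        # sandboxes started per second
other_minutes = OTHER_BATCH_TIME_S / 60       # 2,000 s expressed in minutes
speedup = OTHER_BATCH_TIME_S / BATCH_TIME_S   # ratio of the two batch times
serial_time_s = BATCH_SIZE * SINGLE_START_S   # if starts ran strictly one at a time

print(f"{batch_rate:.0f} sandboxes/s")         # 667 sandboxes/s
print(f"{other_minutes:.1f} minutes")          # 33.3 minutes
print(f"{speedup:.1f}x faster")                # 26.7x faster
print(f"{serial_time_s:.0f} s if serialized")  # 3000 s if serialized
```

The last line shows why the 75-second batch figure implies heavy parallelism: 50,000 starts at 60 milliseconds each would take about 3,000 seconds if they ran one after another.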
Okay, and then the other thing is the sleeping and the resuming, ‘cause it's all the stateful resumption of all these things. What kind of workload are people putting through this, right? Do we measure by gigabytes in memory, gigabytes in storage? In like network attached storage? What are the costly ones out of all these features?

Workload Economics: CPU, RAM, Network, and Storage

Ivan [00:18:15]: The most expensive thing is CPU.

Swyx [00:18:18]: Okay. Yeah, of course.

Ivan [00:18:18]: The second one, yeah - Then it's RAM, then it's disk. We actually don't charge.

Swyx [00:18:22]: Which is snapshotting, right?

Ivan [00:18:23]: No, snapshotting's part of it, but basically the size of your hard disk, of your machine. So do you have 10 gigabytes, do you have 20, do you have 50, do you have whatever? And then the transference of that. Right now, currently we don't charge for network at all at Polychron.

Swyx [00:18:37]: Oh, you gotta, yeah, you gotta fix.

Ivan [00:18:38]: Yeah. It's a larger and larger part of our bill, so we're working around that part there. Obviously, the hard disk is the least expensive, so it's basically CPU, RAM, for us network, ‘cause we don't charge the customer, and then hard disk, is how it's split up. But there's also different types of workloads, so we basically split it up into two types of workloads in Daytona. One is what we call background agents or long-running agents. And the other is basically RLs and evals, which I put sort of together. And so they have very different patterns of usage, and if you look at the usage of a background - And I'll just name names of companies, not specifically.

Background Agents vs. RL/Evals: Two Usage Shapes

Swyx [00:19:21]: Yeah, OpenHands.

Ivan [00:19:23]: Yeah. So like a background agent's a Cognition, a Lovable, a Harvey, like all these things.
These are all long-running background agents. And so if you look at their usage patterns, their usage patterns are similar to humans', which is like follow the sun. Basically, noon is probably the highest, and midnight is the lowest, and then weekends are lower, weekdays are higher.

Swyx [00:19:42]: Yeah, that's a fun question. How global is it? Is it very US-centric or?

Ivan [00:19:46]: The US is a large part, but currently we have Asia, Europe, and the US regions.

Swyx [00:19:52]: So it's quite global.

Ivan [00:19:53]: Yeah, it's quite global. We have it all over. It's interesting, I talked to you a bit about this. Our number one city by user.

Swyx [00:20:01]: Hmm.

Ivan [00:20:02]: Is Singapore.

Swyx [00:20:04]: Oh, wow. Amazing.

Ivan [00:20:05]: Which is an interesting one, right? Not by revenue, just like by individual head count.

Swyx [00:20:09]: Really?

Ivan [00:20:09]: Just like an interesting thing.

Swyx [00:20:10]: Singapore is weirdly high in the adoption charts of AI for the population. It's like a seven, eight million population. And it like keeps showing up.

Ivan [00:20:20]: No, it's quite interesting. We were quite shocked, and I was like, “Oh, this is interesting.” And also one that's up there.

Swyx [00:20:24]: There's a reason I'm doing AI Engineer Singapore. It's because I'm from there.

Ivan [00:20:27]: We're there. We're gonna be there as well. And it's interesting that Japan is in the top, or like Tokyo's in the top, which in all the tech cycles it has never been. It has never been, so it's quite interesting that they're.

Swyx [00:20:39]: I think the Japanese just love AI. Yeah. It's that, and then it's Brazil.
That's it.

Ivan [00:20:44]: Brazil has always been in.

Swyx [00:20:45]: I think.

Ivan [00:20:46]: Even if you look at like GitHub's data and historically with CodeAnywhere, it was always like US, Western Europe, and then you'd have like India, Brazil, China, like that would be there. But like Singapore was not in, and specifically Japan was never in sort of that top.

Swyx [00:21:01]: Yeah. Weird pockets.

Ivan [00:21:01]: Weird. Yeah, so it's very global.

Swyx [00:21:02]: Okay, so actually that helps you to distribute your load through all time?

Ivan [00:21:08]: The interesting thing is like we have those kind of loads, but if you look at the researcher loads, they're quite different. So what they are is like if you give them concurrency of 10,000 or 50,000 or 100,000 CPUs at ARMb, when they fire off a run, it's just 100%. And then it just runs, and then it stops. So the usage pattern is squares basically, right? And it's also not follow the sun, because people will fire it off at midnight before they go to sleep but then wake up, and so it's very unpredictable, so you don't know where that is. So the shapes of the usage are quite different than we have had before. And also what's interesting is when it's sort of a follow the sun, even if you have a high growth company, you can sort of predict your usage patterns and have enough capacity for that, because it grows in a way you can project. When you have companies doing sort of like evals and RL, they're super spiky. So they're gonna come in, it's like, “We're gonna use nothing, then can we have 100,000?” Right? And then go back down. And then 100,000, go back down. So it's very different, right?
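
[Editor's sketch] The difference between the two usage shapes Ivan contrasts can be made concrete by computing mean utilization, meaning average load divided by the capacity that must be provisioned for the peak. The load curves below are synthetic and purely illustrative: the cosine day curve, the three-hour burst starting at midnight, and the 100,000-CPU peak are assumptions, not Daytona data.

```python
import math

HOURS = 24


def follow_the_sun(hour: int, peak: float) -> float:
    """Smooth daily curve: lowest at midnight (hour 0), highest at noon."""
    return peak * (1 - math.cos(2 * math.pi * hour / HOURS)) / 2


def square_wave(hour: int, peak: float, start: int, length: int) -> float:
    """RL/eval style load: zero, then full concurrency for `length` hours."""
    return peak if start <= hour < start + length else 0.0


def mean_utilization(load: list[float], capacity: float) -> float:
    """Average load divided by provisioned capacity."""
    return sum(load) / (len(load) * capacity)


PEAK = 100_000  # CPUs; hypothetical peak that capacity must cover
sun = [follow_the_sun(h, PEAK) for h in range(HOURS)]
burst = [square_wave(h, PEAK, start=0, length=3) for h in range(HOURS)]

print(f"follow-the-sun: {mean_utilization(sun, PEAK):.1%}")  # 50.0%
print(f"3-hour burst:   {mean_utilization(burst, PEAK):.1%}")  # 12.5%
```

Both curves need the same peak capacity, but the square-wave run leaves it idle most of the day, and unlike the daily curve its start time cannot be forecast from the clock.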
And.

Swyx [00:22:09]: Do you want to lock them into commits so.

Ivan [00:22:11]: Yeah, we do.

Swyx [00:22:12]: Yeah, okay.

Ivan [00:22:12]: So we have to lock them into some sort of commits to have that capacity, because basically we have to have the capacity for peak. Right? And so right now, Daytona's mean utilization is 15%, 1-5.

Swyx [00:22:25]: Oh my God.

Ivan [00:22:26]: So it's very low.

Swyx [00:22:27]: Because it's very spiky.

Ivan [00:22:27]: It's very spiky, but we get up to 90%. So we have these things. And so what we're looking at right now as a company is similar to Cloudflare where you can like geo move things around, but that works really well for basically the background agent where it's follow the sun. But this, it's not. Like it's a very different shape. Obviously with scale you figure these things out, but that's an interesting new problem that we have as a compute provider in the agent space. And when we were doing the conference recently, and so we talked to like Nikita from Neon and.

Swyx [00:22:57]: I should bring it up.

Ivan [00:22:58]: Parag from Parallel and whatnot, everyone has the same problem, where the usage is super spiky, and this is something that has not happened before, that you have these types of - The amplitudes were not this high, right? So it's quite an interesting use case and problem to solve.

Compute Conference and Spiky Agent Infrastructure

Swyx [00:23:12]: Yeah, I don't know if we're gonna bring this up again, but let's just talk about the conference. You had like 1,000 something people at the Warriors game, at the - Sorry, where is it? What's.

Ivan [00:23:22]: Chase Center.

Swyx [00:23:23]: Chase Center.

Ivan [00:23:23]: Chase Center.

Swyx [00:23:24]: I went. It was very impressive. Obviously, you know how to throw a conference. What did you learn?
You pulled together all these impressive names.

Ivan [00:23:33]: What I.

Swyx [00:23:34]: What were you looking for?

Ivan [00:23:35]: My thesis behind the Compute Conference was let's bring together people that are building infrastructure for AI agents. Because when I think of what we're building, the agent is the primary user, so what are the ergonomics and usage patterns of agents, and so we can do that. And what I found, this was a theory, it wasn't proven, is that we all have these problems, as I touched on. And as I was talking on stage, it was like we all have the same underlying infra problems, which is these spiky workloads, unpredictable workloads that we've never had before in human compute or human infrastructure. And again, it's the same when I was talking to Parag or when I was talking.

Swyx [00:24:20]: Lin. Nikita.

Ivan [00:24:21]: Lin, Nikita. Lin especially, I was talking to her the other day as well. It is a very interesting type of problem to solve. I can touch on Cloudflare because there's a lot of talk about that recently as to how they solve that, which is they have a bunch of geos, and basically, as users work in different places, and depending on your tier, they can move you around the geos. And so that's how they get the higher utilization. But you can sort of predict these. You'll rarely get a spike that is 10 orders of magnitude. Like let's say one of your customers has like an exponential curve. What is that? I'm using Cloudflare as an example. 10%, 20%, whatever it is. I don't have this data, I'm just assessing. It's surely not 10x, right? It's surely not something there. And so how do you go out and solve this problem? And we're all solving this in different ways. So we have.

Swyx [00:25:11]: She also has the same thing.

Ivan [00:25:12]: Yeah, I know specifically that like Neon had that issue as well.
Like how are we solving these spiky loads and things like that, ‘cause we talked about it. And so the interesting thing for me to actually internalize was, yes, everyone that's building for agents first is going through this, and we're all solving similar problems, which is quite.

Swyx [00:25:28]: Let me double-click on this. Okay. So for example, Neon, I happen to know that they're very sort of S3 oriented, right? So they're just like fully bet on S3. And you get to benefit from S3's distribution and infrastructure. So I would imagine that Neon doesn't have to care, whereas Lin maybe has to care a bit more because obviously she's doing GPU inference. And, for listeners, we did an episode with her one and a half years ago. And you have to care. But like, right?

Ivan [00:25:54]: Parag cares for sure, and Nikita.

Swyx [00:25:58]: And Parag is CEO of Parallel.

Ivan [00:25:59]: Parallel, yeah.

Swyx [00:26:00]: Former CTO of Twitter.

Ivan [00:26:01]: Twitter, yeah.

Swyx [00:26:02]: They are the search.

Ivan [00:26:03]: Yeah, they're search, yeah.

Swyx [00:26:03]: You and I know but the listeners don't know.

Ivan [00:26:08]: Yeah, we can put it down on the screen, ‘cause when we were talking.

Swyx [00:26:11]: I'll put it up on the screen.

Ivan [00:26:12]: Yeah, right.

Swyx [00:26:12]: People can look it up if they need.

Ivan [00:26:14]: Look it up. And, yes, but they still have CPU and RAM allocation that you have to have up and running. And so CPU and RAM, you have to allocate that and have that ready. And so there's basically two ways to do it.
One is you over-provision and you can handle the bursts, or two, you basically have, I don't know if this is a term, just-in-time compute, which is like as your usage comes in, you can fire off requests for VMs or bare metals at other cloud providers and then get them up and running.

Swyx [00:26:43]: This is if you go above 100%, right?

Ivan [00:26:45]: Yeah, this is.

Swyx [00:26:46]: Like your overflow.

Ivan [00:26:46]: Your overflow, like spillage or whatever you do.

Swyx [00:26:48]: You probably lose money on it, but it doesn't matter, right?

Ivan [00:26:50]: Well, you might, you might not. That is a more cost-effective way to do it, but it's a slower way to do it. Because basically what you have to do is you have to like queue your requests, spin up this just-in-time compute, get it all ready, provision it, and then get your workload there. And so if the time isn't that important, that's fine, and you can do that. But if your customer, and especially for, let's say, the RL training runs, the reason why a lot of people come to us is because GPUs are more expensive than CPUs, right? So you want your GPU running at, what, 100% the entire time. And so when you're running runs on CPUs, when the CPU cycle is like down and spinning up the next one, you want that to be instantaneous so that your GPU doesn't go down, right? And if you then have to like go out and provision machines, you're essentially telling the GPU that it has to wait, and that's incurring our cost. So there's things that you have to try to solve for there.

RL Workloads, Declarative Images, and Kubernetes Replacement

Swyx [00:27:43]: Yeah, let's talk about the different workload, right? You said that, what was it? A few months ago, you had zero RL workload and now it's 50%.

Ivan [00:27:52]: It will be this one, 50%, yeah.

Swyx [00:27:54]: Let's talk about how different it is, right?
Like I imagine, for example, a lot less dynamic code generation of like arbitrary code. Like here, it's probably all the same code. You're just doing parallel runs or something, I don't know.

Ivan [00:28:05]: Yeah. So you'll have multiple - It depends, but for each run, you'll have a snapshot. And for the most part, they actually do use our declarative image builder, which is like, “Oh, the agent wants these dependencies, these env vars.”

Swyx [00:28:17]: These ones, yeah.

Ivan [00:28:18]: Yeah, the declarative image builder, it.

Swyx [00:28:20]: Which is a very Modal-like thing that they.

Ivan [00:28:22]: Yeah. And so we build it on the fly and then we propagate that snapshot, and you can spin up as many sandboxes as you want against that snapshot. And then if you have to do changes, the model can, or like it could also be automated. It's like, “Oh, now for the next run, we need to install these things or remove these things or whatever to get a task done,” and then it goes off and runs that. So yes, that is something that it seems that they prefer. The number one reason I found, or should I say, let's take a step back. What we are competing against in that environment is essentially managed Kubernetes. So EKS, GKE, whatever. That is what the vast majority run on. And anyone that has tried Daytona versus GKE, EKS is like, “I'm never going back.” That has always been. There's a few reasons. One is the ergonomics. So if you're using Kubernetes to spin that up, you have to essentially manage the interface interactions with that. Daytona, although a compute provider, is more akin to a Twilio or Stripe from a consumption perspective than it is to an AWS. Like you have an API, an SDK, it's quite easy and seamless to get these things up and running. That's one. The other is the speed at which we spin up, which we mentioned earlier, which is much faster, and the scale to which we can go.
We haven't got into features, but an interesting feature is that it's very hard to OOM, or out of memory, our sandboxes, because we can dynamically on the fly.

Swyx [00:29:48]: Resize.

Ivan [00:29:49]: Resize, which is like impossible on almost any other thing. There are some technologies that enable you to do that, but it's a very hard thing. And so we actually saw this when the Terminal-Bench team brought us in, actually. So thank you, Alex and the team, that brought us into this whole space.

Swyx [00:30:05]: It's just very rare that a framework would just say, “Guys, just use Daytona.”

Ivan [00:30:11]: Yeah, I think it says it somewhere. Yeah.

Swyx [00:30:13]: Yeah. I was like, “What is this?”

Ivan [00:30:15]: There's multiple there, but they also mention a few other places. And so Daytona specifically - We have, just jumping on themes here - I don't know where it says Daytona.

Swyx [00:30:27]: I, there.

Ivan [00:30:27]: Doesn't matter.

Swyx [00:30:28]: There's a very strong recommendation, which is very unusual. Which is, it's.

Ivan [00:30:33]: We do not pay them for this, just.

Swyx [00:30:34]: I know, yeah. They just like you.

Ivan [00:30:35]: Yeah, they like us. Yeah, and also a thing, so Daytona has multiple isolation sets underneath. The customer doesn't have to know what they are. But basically we have Docker, which is a container, that's hardened with Sysbox. So it's Docker isolation that is security-equivalent to a VM, but it's still a container. And that is the default, and they, especially in these training workloads, really like that as an interface, to be able to use just a basic Docker container, and we enable Docker-in-Docker. Which for these RL runs, if you need to do a Docker Compose or Kubernetes, you can spin up a k3s inside of these things, which unlocks a huge amount of workloads that you cannot do on other providers. So just on that part it's much more interesting.
And so we went through that. We showed them that we could do that, and they enjoyed that quite a bit. They being the Terminal-Bench people.

Swyx [00:31:28]: Those people, yeah.

Ivan [00:31:29]: And Harbor people.

Swyx [00:31:29]: Harbor people, are they a company yet?

Ivan [00:31:33]: I do not know.

Customer Pull, Slack Connect, and the Computer Use Bet

Swyx [00:31:35]: Okay. All right. Yeah. It's like super obvious that there's a lot of excitement and success around these things. Okay, so yeah, tell us more, right? Like, this is an exploding workload, Harbor adopted you, which helped speed things along. But what are you learning as this new workload comes online?

Ivan [00:31:53]: There's a couple things that we learned, which we chatted about in the beginning. And this has led our story, as we mentioned: we talk to a lot of customers along the way, and we add more features and more tool sets as we talk to customers. And it's interesting that - And I think it's that the ecosystem is so small and/or the models get smarter, where when we see one user come with a request, we know it goes on a roadmap if like three to five customers come with the same request in that week. It's very bizarre. It happens so many times, which is.

Swyx [00:32:27]: Because they're all friends.

Ivan [00:32:28]: Sorry?

Swyx [00:32:28]: They're all friends. They're all in the same group chat.

Ivan [00:32:30]: Yeah, probably, yeah. ‘Cause they're like, “Oh, can you do this?” And I'm like, “Okay, this is interesting. We'll put it on a feature request.” And then the next one's like, “Oh, can you do this?” “Okay.” It's all the same, right? It's always the same. And so what we try to do, and I personally try to do, I try to be on as many quote-unquote “sales calls” as I can. I'm in every Slack channel. We literally have about 1,000 Slack Connect channels, something like that.
There are so many interesting things you find out when you have all the Slack channels. You can also see where people transfer between companies. You see “leave Slack channel,” “enter Slack channel.” It's an interesting thing. Also, I digress, but I feel that Slack Connect is literally what LinkedIn should be. You have a list.
Swyx [00:33:08]: LinkedIn charges you to use your own connections, but Slack doesn't, right? Slack is like, do it for free. It's more lock-in. It's great.
Ivan [00:33:15]: Yeah. It's amazing. Yeah. It's one of the reasons.
Swyx [00:33:17]: You're gonna pay Slack for life.
Ivan [00:33:18]: Exactly. You're there for life. So that's interesting. And one of the newer things we were talking about earlier is we made a big bet and put a lot of investment into computer use. That has not seen the light of day publicly. We haven't GA'd that yet, but we have...
Swyx [00:33:32]: Is there a thing I can pull up?
Ivan [00:33:33]: There is computer use there. It's right up a bit.
Swyx [00:33:36]: Oh, yeah. Okay.
Ivan [00:33:38]: What we've talked about and what we've seen publicly is there's this theme now about the human emulator. Elon from xAI has talked about this publicly. If you think about the models today, they're actually quite sophisticated and they can do a lot of work, but they still don't have access to all the tools. I'm a strong believer that the most efficient way for an agent to work is essentially headless, or through a terminal or whatnot. But if we look at knowledge work in general, there's about 100 million knowledge workers in the US, about a billion in the world, and their salaries aggregate to 10 trillion in the US, 50 trillion worldwide.
Swyx [00:34:24]: Wow.
Ivan [00:34:25]: Something like that. And if we look at the five most important sectors of that, so healthcare and government and financial services and whatnot, that's about 56% of that. 
So let's say it's about half of that. So in the US it's about 25 trillion, and most of that work is actually still locked into legacy apps inside of Windows, which is not going anywhere for a very long time. People just won't invest in that. How much of it? Our assumption is the following: in the RPA market, which is a similar market, well, not the same, 25% of these white-collar workers' work is automated. If an agent is more sophisticated, can go through more runs, figure stuff out, let's say it's 40%, right? And so if you take 40% of that, you get to essentially $10 trillion a year.
Swyx [00:35:17]: That's a TAM.
Ivan [00:35:18]: That is a TAM. So that's the TAM of the models, right? That's not essentially ours. But you get to that size, and to be able to do that, you essentially have to give agents these computers with the legacy apps. So computer use, either Mac or Windows or Linux. Linux we obviously have, and others have. But Windows specifically is something very new, and the only option right now is an EC2 instance with Windows, or Azure. Both of them take anywhere from three to five minutes to spin up. We've created an actual sandbox, so it starts in a second instead of milliseconds, but you have point-in-time snapshots, you have forking, you have all the things that you have from a sandbox, and it essentially enables you to hopefully unlock all this value. And so that's been our big push and bet, but we've kept our ear to the ground: what are the next things in the market?

RPA Returns: Why Agents Still Need Computers

Swyx [00:36:06]: Yeah, knowledge work, and the next wave of RPA. I got very excited about RPA during COVID times. UiPath was IPO-ing, and it was very hot. Isn't it Eastern European?
Ivan [00:36:20]: It is, Romanian.
Swyx [00:36:21]: Romanian? Yeah, it might be the only big Romanian unicorn. Okay, yeah. 
I think there's a stage being set for the resurgence of RPA, ‘cause everyone understands that no one wants to deal with these shitty apps and no one's gonna rewrite them. You just have to do remote operation and programmatic operation of them.
Ivan [00:36:45]: If you wanna unlock it... my own setup was basically the following. I was doing a board deck recently, last month, whatever, and I'm like, “Okay, let's just do it automated.” All our data's in ClickHouse and PostHog and QuickBooks, where everyone else's is, and I basically connected all that to my Claude Code: here's the integrations, go off and do that. It pulled out the first report, which was great. It connected to Brex and all these things and pulled it, which was great. Then I said, “Okay, now pull out this, and this,” and I kept getting really well-designed, McKinsey-style reports, but the data said “partial data.” All this missing data, partial data. It couldn't access all the things, and I got so frustrated. So I got my Mac Mini virtual sandbox with OpenClaw. I gave it its own account in our company, and then I went to all these services and created a read-only account, so literally like an intern in your company. And I would say, “Now go and do this report,” and it would get the same result, like, “I can't, via the MCP or the API or whatever, get all the information.” I'm like, “Go log in.” And it will log into the website, then go in and export the data and do the thing end to end. So even for things that have APIs today, not all of it is exposed. I get immense value right now, but it has to be computer use, unfortunately, and so I spend a bunch of tokens just on that, but I get the job done. 
And so if even a startup like ours, using all the hottest tools, still needs a computer-use agent, what hope does Goldman have of being headless, right?
Swyx [00:38:22]: Yeah. Why isn't Microsoft doing this?
Ivan [00:38:27]: I'm pretty sure Satya had a post yesterday.
Swyx [00:38:29]: Oh, okay. I see.
Ivan [00:38:29]: Which was like, “Every agent needs a computer.”
Swyx [00:38:31]: I see, I see.
Ivan [00:38:32]: So they have launched something recently.
Swyx [00:38:34]: Yeah, they have Microsoft Power Automate. I'm sure they're gonna have their version.

macOS Sandboxes, Apple Constraints, and the Windows Opportunity

Ivan [00:38:39]: Version of that, yeah.
Swyx [00:38:39]: You're gonna try to do yours. I know there's always demand for Mac, but I know it's tricky to host macOS sandboxes.
Ivan [00:38:49]: We will have macOS sandboxes fairly soon. The problem with macOS sandboxes is... I'm deep in this, I don't know how interesting it is.
Swyx [00:38:55]: No, it's...
Ivan [00:38:56]: macOS has this problem.
Swyx [00:38:57]: It's a licensing thing, right?
Ivan [00:38:58]: Licensing thing. So one, you're allowed to run only two parallel VMs per machine. Two, you can only license to a different user every 24 hours. So theoretically, if I wanna charge you per second and I charge you for one second, I have to have the machine idle for the rest of the day. I can't have anyone else using it. So the pricing will be different in the sense that we would have to charge for 24 hours, and that's not even the most difficult thing. The thing above that is, from a security perspective, they enable you to do memory snapshot, pause, resume, but only on the same physical machine. And so what you can do in the Windows world or Linux world is move your snapshot in the background from one machine to another and manage load, right? 
Here, if you wanna do that, you essentially have to have your...
Swyx [00:39:49]: Yeah, snapshots. Yeah.
Ivan [00:39:50]: Your...
Swyx [00:39:51]: It's like...
Ivan [00:39:51]: Physical machine.
Swyx [00:39:52]: You can't break it up.
Ivan [00:39:53]: You can't move things around, and that part is, from a security standpoint, as it is written. I understand the security aspect of that, but it keeps you from doing these really scalable agentic workloads.
Swyx [00:40:08]: You need to do a vibe-coded, clean-room implementation of macOS that you can then... That's like Clean OS or something. I don't know.
Ivan [00:40:17]: So. We have...
Swyx [00:40:18]: ‘Cause Linux was originally like a clean-room rewrite of Unix.
Ivan [00:40:21]: Okay. Yeah.
Swyx [00:40:21]: Or something like that, right? Same thing for macOS. Someone needs to do it.
Ivan [00:40:25]: Someone will do that, and someone will have some long-running agents for a few days to figure this stuff out. But yeah. So we're really close to offering something, ‘cause people do want it, but the pricing will be different, and the feature set will be sort of stringent.
Swyx [00:40:38]: Yeah, nobody's gonna use this. Well, the labs will, because they want to automate macOS.
Ivan [00:40:42]: They have to do RL. But the point with the RL part is, if you do RL on macOS, then the next iteration of the model comes out and it will be able to use these tools significantly better. Then you actually need to run that somewhere. So you're gonna have to have that later on. And if anyone at Apple is listening, I very much feel that they are shooting themselves in the foot, given the scale of the compute or licensing revenue they could get if they would just enable a concurrency model similar to what you can get on Windows and Linux.
Swyx [00:41:17]: Yeah. Yeah. And I'm sure they've heard this before. 
They just don't care. And maybe they will change their mind with the new CEO.
Ivan [00:41:24]: Yeah. We'll see.
Swyx [00:41:25]: We'll see.
Ivan [00:41:25]: High hopes.
Swyx [00:41:26]: High hopes.
Ivan [00:41:26]: High hopes.
Swyx [00:41:27]: Okay. But it's very clear the market opportunity is huge in Windows, and you can go for a long time on just Windows, but your customers are gonna want both. And it is interesting to me that this is the sort of God application of agents, right? How big was OpenClaw for you guys? Was there a significant bump?

OpenClaw, Agent Labs, and the B2B2C Sandbox Market

Ivan [00:41:54]: Not for us, because we...
Swyx [00:41:54]: Because you already...
Ivan [00:41:55]: We're kind of positioned differently. Although it's completely PLG and we have individual developers that use it, most of the users of Daytona are either B2B or B2B2C. In the researcher world, it's B2B, so you're selling to labs and neo labs and things like that. But on the long-running agents, from a scale and revenue perspective, it's mostly B2B2C, where you have an app-layer agent that uses you at a big scale.
Swyx [00:42:26]: Like a Manus. Yeah.
Ivan [00:42:28]: Like a Manus, Lovable type of thing.
Swyx [00:42:31]: Yeah. B2B2C is basically to me what I've been calling an agent lab, where you're not a model lab, but you're making a very good wrapper that is a platform that other people can sign up for so they don't have to code those things. Yeah, it sounds like a much better market than the direct OpenClaw market.
Ivan [00:42:56]: I've done multiple things. So the CodeAnywhere part of our career path was very much an end-user developer product. And so that is great. 
You can get a lot of developer love, and I feel that we do as a company have a bunch of developer love. But it's a different type, where it's people building these things. Again, it's more akin to a Twilio, because as a person, you wouldn't run Twilio. I don't know how many people remember, it was the “Ask your developer” billboard and whatnot. And people really love Twilio, but they only used it inside of, “Oh, I'm building this app or service for a thing.” And so we're very much like that. And you also know that I used to work for a competitor of Twilio, so it's kind of ingrained in my DNA.
Swyx [00:43:35]: People don't know Infobip is that big.
Ivan [00:43:38]: Yeah, it's...
Swyx [00:43:39]: Because...
Ivan [00:43:40]: It's a billion euro.
Swyx [00:43:40]: They're all American. They're like, “Whatever's in Europe doesn't matter to me.” But is it the same size or bigger? Same size?
Ivan [00:43:46]: It's about half the size.
Swyx [00:43:47]: Half the size?
Ivan [00:43:48]: Yeah, about half the size.
Swyx [00:43:48]: It's like, yeah.
Ivan [00:43:48]: Still huge. Multiple billions a year. Yes.
Swyx [00:43:51]: That's crazy.
Ivan [00:43:51]: Exactly. These are really interesting, large revenue-generating, very sticky businesses. Whereas when your focus is the end developer, it is a very hard sell because they're very price sensitive, very price conscious. And it's very hard to scale. Your cap is the number of people that, first of all, wanna spin that up, and then spin up multiple of these. Whereas if you're in the enterprise one, we know everyone's talking about how many tokens they're spending. A lot of companies today are like, “If this is our company, spend as much as you can.” Basically that is where we're going. 
And so if you think about that paradigm, where you're selling to companies that say, “Spend as much as you can to generate productivity,” versus, “Oh, I'm a single person. I have this much budget, and I'm doing this thing because it's fun or it's helping me out or whatever,” it's a different go-to-market strategy, I think.

MCP, CLIs, and Sandboxes as the Agent Runtime

Swyx [00:44:50]: Yeah, there's a lot of discussion. I'm just going through the mental list of things that are in your favor, which is, for example, MCP versus CLI. Obviously you want CLI. It's been very good for you. I feel like it's maybe a drop in the bucket or maybe it's huge. I'm just checking whether these are big trends.
Ivan [00:45:10]: Those things work well in our favor, to your point, just because every...
Swyx [00:45:13]: They're kind of a drop in the bucket, right?
Ivan [00:45:15]: I think it's sort of all the things coming together. There are so many things that impact that. To your point, OpenClaw wasn't huge for us, but having the agent SDK from Anthropic, or Claude Code, was very interesting. The reason it was interesting is that a lot of, let's call them app-layer agent companies, are like, “Oh, I can create this new app, this new agent. I just use Claude Code, and I throw it into a sandbox, and then I have my interface to the human.” That enabled so many more companies to actually offer this, and then they would pull on sandboxes. So that was interesting. And to your point on MCP versus the CLI: MCP is an interface against an API, whereas with the CLI you can actually go do things. That's the difference between integrations and actually running scripts or data analysis against a thing. 
So being able to use a CLI very well enables the agent to do more things, and it's because of that that people will invoke a sandbox and run the CLI in it, and it'll do analysis on that data and then give you an actual result versus just pulling data from an API source.
Swyx [00:46:29]: Yeah, it's a layer of indirection, basically. It's the same thing as agentic search versus RAG, where you're...
Ivan [00:46:34]: Exactly, yeah.
Swyx [00:46:34]: You just win whenever people put more agents into their workflow. And so it doesn't really matter, but I'm just teasing out what else people have heard about, like, “Oh yeah, this is another sandbox use case. Oh yeah, that's another one.” Am I missing any big ones?
Ivan [00:46:51]: The thing, which is the computer use stuff, which I think is probably the most interesting one, is, to your point, we've talked to so many people over the last year. It's like, “Why do you need a sandbox? Why do you need this?” And it's like, “Oh, I need a sandbox for this. I need a sandbox for that.” It's like, “Oh, I need it for every single thing.” And so basically, and I sound like a broken record, you use a laptop every single day, right? And you are n of one. It's just you. And by the way, the PC market is about equal to the cloud market in total. So it's about 150, 180 billion a year. Something like that. Roughly, the three cloud hyperscalers are about equal to Apple, HP, Lenovo, whatever. It's a little bit less, but it's sort of like that. So how big is the addressable market? How many people are there in the world now? What's the latest data?
Swyx [00:47:45]: Let's call it eight billion.
Ivan [00:47:46]: Eight billion. 
And so let's say you can have two computers, like one personal and one business, whatever. So it's double that, right? So that's 16 billion. How many agents are gonna be running in two years, in 10 years, in 100 years? And for every single task, they will need one of these. So how big is that? That market is essentially, quote-unquote, “infinite”. You will get to the point, and Dylan Patel from SemiAnalysis, who usually talks about GPUs, was at the conference talking about how CPUs will now be a bottleneck, because they will be the constraint. We won't be able to have enough of these because there won't be enough CPUs.
Swyx [00:48:23]: Yeah. Well, I actually had a really good podcast with Doug O'Laughlin, who is president at SemiAnalysis, where they've basically been like, yeah, it's been a GPU shortage first, but then it's cascaded down to memory and now to CPUs.
Ivan [00:48:35]: CPU, yeah.
Swyx [00:48:35]: What's next? So networking. Networking actually has been in shortage for a while if you're looking at just GPU networking. But yeah, it's really crazy, the amount of computer use that's going on. Cool. Other questions: the one very big part is the open-sourceness, which you didn't have to do and your competitors don't do. A lot of people are worried about keeping their projects open source because some competitor can just slot fork it. I don't know if there's any reflections on just being an open source company.

Open Source, Trust, and Enterprise Procurement

Ivan [00:49:15]: Yeah. There's a bunch. So the original product that we did was open source.
Swyx [00:49:19]: Yeah. CodeAnywhere.
Ivan [00:49:20]: So doing that was actually very good for us. There's basically a saying... what's the saying? 
Companies that are doing really well measure themselves against free cash flow; for those that are kinda okay, it's EBITDA; and then it goes all the way down.
Swyx [00:49:36]: The worst is like GitHub stars.
Ivan [00:49:37]: GitHub stars. GitHub stars are the worst, yeah. So you go all the way down to GitHub stars. And so our original one was GitHub stars. That's what we talked about. We're at the point where we're talking about revenue, so we've gone up the stack on that. And so we started...
Swyx [00:49:47]: No, profit.
Ivan [00:49:48]: Yeah. We'll get there. We'll get there. But basically at that point we did stars on GitHub and it was useful. In the original variation that we did, we split the core into its own repo and it was Apache 2.0, so very permissive. And then we basically would bundl

Hacker News Recap
May 20th, 2026 | Meta blocks human rights accounts from reaching audiences in Saudi Arabia, UAE

May 21, 2026 · 15:28


This is a recap of the top 10 posts on Hacker News on May 20, 2026. This podcast was generated by wondercraft.ai.

(00:30): Meta blocks human rights accounts from reaching audiences in Saudi Arabia, UAE
Original post: https://news.ycombinator.com/item?id=48206768&utm_source=wondercraft_ai
(01:58): An OpenAI model has disproved a central conjecture in discrete geometry
Original post: https://news.ycombinator.com/item?id=48212493&utm_source=wondercraft_ai
(03:26): Goodbye Visa and Mastercard: 130M Europeans switching to sovereign payment
Original post: https://news.ycombinator.com/item?id=48207004&utm_source=wondercraft_ai
(04:54): Tennessee man jailed 37 days for Trump meme wins settlement after lawsuit
Original post: https://news.ycombinator.com/item?id=48208502&utm_source=wondercraft_ai
(06:23): GitHub confirms breach of 3,800 repos via malicious VSCode extension
Original post: https://news.ycombinator.com/item?id=48207660&utm_source=wondercraft_ai
(07:51): Qwen3.7-Max: The Agent Frontier
Original post: https://news.ycombinator.com/item?id=48205626&utm_source=wondercraft_ai
(09:19): GitHub is investigating unauthorized access to their internal repositories
Original post: https://news.ycombinator.com/item?id=48201316&utm_source=wondercraft_ai
(10:48): Incident Report: Railway Blocked by Google Cloud [resolved]
Original post: https://news.ycombinator.com/item?id=48201484&utm_source=wondercraft_ai
(12:16): Everything in C is undefined behavior
Original post: https://news.ycombinator.com/item?id=48203698&utm_source=wondercraft_ai
(13:44): Google Declaring War on the Web
Original post: https://news.ycombinator.com/item?id=48214449&utm_source=wondercraft_ai

This is a third-party project, independent from HN and YC. Text and audio generated using AI, by wondercraft.ai. Create your own studio quality podcast with text as the only input in seconds at app.wondercraft.ai. Issues or feedback? We'd love to hear from you: team@wondercraft.ai

Elon Musk Pod
OpenAI trades tokens for startup equity

May 21, 2026 · 13:53


During a Y Combinator event on Tuesday night, Sam Altman had what YC partner Tyler Bosmeny called a “mic drop moment.” Altman offered $2 million worth of OpenAI tokens to every startup in the current class in exchange for equity in the startup.

In other words, he promised that OpenAI would invest in the whole class, not with cash but with an allotment of AI tokens that startups can use to build their products.

This Week in Startups
Avi Patel on the startup that copied Kled and why he called out General Catalyst by name | E2291

May 20, 2026 · 94:22


This Week In Startups is made possible by:
Grasshopper Bank - https://grasshopper.bank/twist
LinkedIn - https://linkedIn.com/twist
Northwest Registered Agent - https://northwestregisteredagent.com/twist
Plaud - https://Plaud.ai/twist

Why raise $200 million if you are already profitable? That's the question Jason and Alex put to Mercury's founder and CEO, Immad Akhund, after the entrepreneur raised another massive round for his upstart, technology-friendly bank. TWiST then welcomed Kled founder Avi Patel to discuss the startup he considers a clear ripoff of his own company. Jason gavels in verdicts on all parties involved, including Y Combinator and venture capital firm General Catalyst. The show closes with a news lightning round, including OpenAI's decision to offer $2 million in token credits to hundreds of startups.

Guest Links:
Mercury https://mercury.com
Mercury funding announcement https://www.businesswire.com/news/home/20260520511817/en/Mercury-Raises-$200-Million-Series-D-at-$5.2B-Valuation
Immad Akhund on X https://x.com/immad
Kled https://www.kled.ai/
Avi Patel on X https://x.com/avipat_/
Avi's complaint https://x.com/avipat_/status/2055384102409253056
General Catalyst https://www.generalcatalyst.com/
Y Combinator https://www.ycombinator.com/
Delve https://techcrunch.com/2026/04/23/another-customer-of-troubled-startup-delve-suffered-a-big-security-incident/

Discussion links:
Anthropic's attack on secondary trading https://techcrunch.com/2026/05/12/anthropic-warns-investors-against-secondary-platforms-offering-access-to-its-shares/
Vanta https://www.vanta.com/twist

Timestamps:
0:00 Welcome to This Week in Startups!
2:14 Plaud: If your work depends on conversations (interviews, meetings, calls), you need a Plaud NotePin. You can check it out at https://Plaud.ai/twist and use code TWIST for 10% off!
3:27 Immad Akhund (Mercury) joins to discuss $200M raise
6:33 Mercury's origin story and path to $650M run rate
9:53 Northwest Registered Agent - Get more when you start your business with Northwest. In 10 clicks and 10 minutes, you can form your company and walk away with a real business identity. Learn more at https://northwestregisteredagent.com/twist
14:23 Stablecoins: where they work, why Mercury won't launch its own
20:13 LinkedIn - Thanks to our partners at LinkedIn! Post your job for free at https://linkedIn.com/twist then promote it to get access to LinkedIn Jobs' new AI assistant.
22:38 AI agents, and the future of money movement
27:30 Why Mercury raised less this round
30:11 Grasshopper Bank - Time is money. Don't waste either. Go to https://grasshopper.bank/twist and get an exclusive $500 cash bonus just for opening an account.
42:48 Avi Patel (Kled) joins to discuss copycat startups
57:06 Jason's verdict on YC's hacker culture & "appearance of impropriety"
1:17:39 Sam Altman's $2M-in-tokens-for-equity offer to YC founders
1:24:32 NYC hotel housekeepers cross $100K in time under new union contract
1:30:14 Minimum wage, immigration & the case for raising it slowly

Subscribe to the TWiST500 newsletter: https://ticker.thisweekinstartups.com
Check out the TWIST500: https://www.twist500.com
Subscribe to This Week in Startups on Apple: https://rb.gy/v19fcp

Follow Lon:
X: https://x.com/lons
Follow Alex:
X: https://x.com/alex
LinkedIn: https://www.linkedin.com/in/alexwilhelm
Follow Jason:
X: https://twitter.com/Jason
LinkedIn: https://www.linkedin.com/in/jasoncalacanis

Check out all our partner offers: https://partners.launch.co/
Great TWIST interviews: Will Guidara, Eoghan McCabe, Steve Huffman, Brian Chesky, Bob Moesta, Aaron Levie, Sophia Amoruso, Reid Hoffman, Frank Slootman, Billy McFarland
Check out Jason's suite of newsletters: https://substack.com/@calacanis

Follow TWiST:
Twitter: https://twitter.com/TWiStartups
YouTube: https://www.youtube.com/thisweekin
Instagram: https://www.instagram.com/thisweekinstartups
TikTok: https://www.tiktok.com/@thisweekinstartups
Substack: https://twistartups.substack.com

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

Take the 2026 AI Engineering Survey and get >$2k in credits and AIE WF tickets!

This was recorded before Railway suffered a major GCP outage on May 19, despite being a multi-AZ, multi-zone mesh ring, with HA fiber interconnects between their Metal, GCP, and AWS, because workload discoverability was unintentionally still tied to GCP. All has been resolved with a post-mortem.

Railway did not start as an AI infrastructure company.

It was founded in 2020, years before agents became the default way people thought about deploying software. Jake Cooper, formerly at Bloomberg and Uber, started Railway with a simple obsession: the activation energy to ship something to production should be near zero. Push code, get a URL, iterate. No Docker files, no Kubernetes manifests, no Ansible scripts stacked on Ansible scripts.

For years, this was a slow grind. Railway spent its first 18 months hand-acquiring its first 100 users, with Jake personally greeting every Discord signup on a second monitor.

Today, Railway has raised $124m and is growing very fast. A 35-person team supports 3 million users, adding roughly 100,000 signups a week. Their bare metal data centers have a 3-month payback period vs. renting in the cloud, with 70% margins funding aggressive cloud bursting when needed. The servers they own have actually appreciated in value as RAM prices have climbed, basically meaning the value of their hardware now exceeds the capital they've raised.

From rebuilding Railway's network overlay over a weekend to moving the vast majority of workloads onto its own bare metal data centers, Jake Cooper is trying to build a new cloud for an agent-native world. In this episode, Railway's founder and “conductor” joins swyx and Alessio to unpack why the next era of software infrastructure is not just “Heroku but newer,” what agents need that humans did not, and why the old deployment loop of Git, PRs, CI/CD, and static cloud resources may be heading for a rewrite.

We go deep on Railway's infrastructure stack: own-metal data centers, three-month cloud payback periods, cloud bursting, data center debt, Railpack, Nixpacks, Temporal, feature flags, Central Station, content-addressable filesystems, agent-safe production forks, and why the CLI may become more important than the canvas in an agent world. Jake also shares the founder journey behind Railway, how the company survived losing $500K/month, why it now serves millions of users with only 35 people, and why he believes the pull request is dying.

We discuss:
* How Railway went from a slow six-year grind to adding 100,000 users a week
* How Railway thinks about agents as the next dominant software species
* Why agents need version control, observability, compute, storage, and orchestration at 1000x scale
* The economics of Railway's own-metal data centers and three-month payback
* How Railway uses cloud bursting while scaling its own infrastructure
* Why data center debt can be a better tool than venture debt for infra startups
* Central Station, Railway's internal system for clustering customer feedback and incidents
* Why responsible disclosure and over-communication matter for platforms
* Why feature flags, progressive rollouts, and shadow traffic are essential for agents
* Temporal's strengths, pain points, and why workflows matter for agents
* Railpack, Nixpacks, Nix, and lazy-loaded content-addressable filesystems
* Why “cattle, not pets” may change if you can clone the pets
* Why Railway is building a new cloud from scratch instead of copying hyperscalers
* The solo founder path, focus, writing, and how Jake thinks about company building

Railway:
* Website: https://railway.com/
* X: https://x.com/Railway

Jake Cooper:
* LinkedIn: https://www.linkedin.com/in/thejakecooper/
* X: https://x.com/JustJake

Timestamps
00:00:00 Introduction: What Is Railway?
00:02:07 Jake's Path to Railway
00:06:13 Railway's Six-Year Growth Story
00:08:52 Rebuilding the Business After the Free Tier
00:11:17 Agents as the Next Software Platform
00:13:29 Railway's Infrastructure Philosophy
00:15:42 Bare Metal, Cloud Economics, and the Compute Crunch
00:17:22 Cloud Bursting and Five-Cloud Networking
00:20:20 Data Center Debt and Infra Financing
00:23:31 Data Centers in Space
00:25:24 What Agents Need From Infrastructure
00:28:24 CLIs, Canvas, and Agent-Native UX
00:35:15 Central Station, Incidents, and Responsible Disclosure
00:40:30 Safe Rollouts, SRE Agents, and Production Forks
00:45:00 AI SRE, Specs, Code, and Tests
00:48:24 Self-Replicating Infrastructure and the New Serverless
00:53:18 Heroku, Temporal, and Workflow Engines
01:04:07 Railpack, Nixpacks, and Lazy-Loaded Filesystems
01:06:01 Coding Agents, Token Spend, and Roadmap Acceleration
01:10:56 The Pull Request Is Dying
01:12:28 Feature Flags and the Agent-Era SDLC
01:16:15 Cattle, Pets, and Cloning Machines
01:19:29 Solo Founder Lessons
01:24:12 Focus, GPUs, and Building a New Cloud
01:28:20 Closing Thoughts

Transcript

Alessio [00:00:00]: Hey, everyone. Welcome to the Latent Space Podcast. This is Alessio, founder of Kernel Labs, and I'm joined by Swyx, editor of Latent Space.
Swyx [00:00:10]: Hey, hey, hey. Today we're in the studio with Jake Cooper of Railway.
Alessio [00:00:14]: Conductor of Railway.
Swyx [00:00:15]: Conductor at Railway. Yeah.
Alessio [00:00:16]: Choo-choo.
Swyx [00:00:17]: Do you actually have that anywhere, like on your business card?
Jake [00:00:20]: We call some of our volunteer moderators conductors. I don't have a business card. We're not that big yet. At some point I will. 
I got handed a nice business card from the Supermicro folks, and I was like, “Damn, this is pretty official.”

Swyx [00:00:30]: Business cards are coming back.

Jake [00:00:32]: They're cool. They're hip. The conductor thing is good. We're trying to figure out what we want to call each other internally. Some people think it's super cringe and say, “You don't need a name for people internally.” Some people want to call each other something. We still don't have a really good one.

Jake [00:00:55]: We've got New Railcrews, Trainiacs. Nothing has stuck yet.

Swyx [00:01:00]: I like Trainiac. Trainiac sounds good. Railwayians. For those who don't know, what is Railway? Let's give people a crisp definition up front.

Jake [00:01:09]: Railway is the easiest way to ship anything. You go to the canvas, or you talk with Claude, and you say, “Deploy a Postgres instance, deploy my GitHub repository, run this code,” and you're off to the races.

Swyx [00:01:22]: You've got a nice animation on the landing page.

Jake [00:01:24]: Thank you. None of my work, by the way. They don't let me touch the design stuff anymore.

Jake [00:01:25]: We want to make it trivially easy not just to deploy things, but to evolve applications over time. Most tooling right now stacks entropy on top of entropy: Docker, Kubernetes, Ansible scripts, and all these other things. If we can version all of your software and keep track of all the changes, then we can make it trivial to clone environments, fork into a parallel universe, get copies of production data, get copies of any services, make changes, validate them, and collapse them back in without reproducing everything across a staging environment.

The Railway Origin Story: From Uber Systems to a New Cloud

Swyx [00:02:07]: I was looking at your background: Bloomberg, Uber. Nothing immediately stands out as, “This guy is going to found the next great platform as a service.” What prepared you for Railway?

Jake [00:02:21]: It was curiosity to keep going deeper.
I started out on front-end stuff, working on Wolfram Mathematica and porting it over. Then I briefly moved to Bloomberg, then toward Uber and distributed systems, taking the Jump Bikes systems and moving them to a distributed system built on top of Cadence, the pre-Temporal Temporal.

Swyx [00:02:44]: Which, by the way, I'm happy to talk about, pros and cons.

Jake [00:02:48]: Totally.

Swyx [00:02:51]: But let's do the Railway story.

Jake [00:02:52]: It has been a continual step of wanting an experience. Whether it's walking up to a bike, unlocking it, and having it work frictionlessly, or something else, the depth required to make that happen follows from the experience. A lot of the work I do, and a lot of the team does, is in service of that experience. We fundamentally don't care how deep we have to go. We will swim to the bottom of the swimming pool to get the experience.

Jake [00:03:17]: I don't have a physics PhD. I did an EECS degree. It has always been about figuring out the next step: how do we get there? That's what led to starting Railway for that experience and then moving all the way to bare metal data centers. I was adding patches to the kernel this week to get the experience there because I can see how much better it can be.

Swyx [00:03:49]: Other patches to the Linux kernel this week?

Jake [00:03:51]: Yeah. Not upstream. Our fork.

Swyx [00:03:52]: That's a flex. Railpack? No, this is different. This is the OS on top of Railpack?

Jake [00:03:57]: No, this is an actual kernel patch. It's always literally: what do we have to do to get that experience? Then figure it out. Anything is figureoutable.

Swyx [00:04:10]: Would you send the patch upstream, or does it not fit other use cases?

Jake [00:04:13]: Maybe. We have to work out the experience internally. It has to do with the storage layer we're building for some of the agentic stuff.
Maybe it'll be useful upstream, but it's deeply useful for us internally.

Open Source, Forks, and Non-Deterministic Versioning

Swyx [00:04:29]: You mentioned open source before. How do you think about starting from open source, and then coding agents letting you do a lot more from forks of it?

Jake [00:04:38]: GitHub's original sin is that it's almost a series of broken pointers. You have this thing, then you clone it, and now you've lost the whole upstream. How do we make it trivial for people to modify really small pieces of it?

Jake [00:04:51]: We think of Git in a discrete sense: I've either made a change and merged upstream, or I haven't. What would it look like if it were percentage-based, a little more non-deterministic, or a stream of changes that users traverse as a percentage rolled out in general and then rolled all the way up?

Jake [00:05:13]: We have the open-source kickback program and let you deploy templates because we want to make it trivial for people to version these shards over time. It solves a large problem around authentication, authorization, and security. NPM has a way to define, “Don't take any new packages.” The ideal end state is that you roll out progressively to users with the minimum impact zone and continue rolling up. JPMorgan should probably be the last one on the patch line, for all our sakes, because our money and livelihoods are there.

Jake [00:05:53]: It's okay if Johnny Vibe Coder gets a broken patch because there's so much entropy in the system that the rubber has to meet the road at some point. You have to test at varying levels.

The Long Grind: First Users, Free Tier, and Making the Business Work

Swyx [00:06:13]: I wanted to pull up this glorious chart, which is your usage or number of daily signups?

Jake [00:06:22]: Daily signups, I think.

Swyx [00:06:24]: You started six years ago. It was a slow grind, and now you're on a rocket ship.
You say, “Don't doubt your fight and don't quit.” Maybe pick out certain points that were key inflections for the company.

Jake [00:06:40]: At the start, it's about getting your first 100 users, hell or high water. We had a website and a support link. The support link was the Discord channel. I had notifications on with two monitors: the monitor I was working on and the other monitor with Discord. If anybody came in, I was immediately like, “Hey, how's it going?” It was rare, so getting those first 100 users to come back was the start.

Jake [00:07:14]: Then you build a consultancy factory because users want all these things. You have to go back to the board and ask, “What is the actual product offering I want to build on top of this?”

Jake [00:07:28]: VCs want charts that always go up and to the right, but in reality you don't necessarily want charts that look like that. For us, there have been periods of expansion where we add features to test use cases, and periods of compaction where we ask, “If the experience we have is good, how do we make it significantly better?” Maybe we strip out features that don't fit our ICP anymore.

Jake [00:07:57]: The boom from 2022 to 2023 came from the free tier. Everybody under the sun was using it.

Swyx [00:08:09]: A lot of Reddit bots and Discord bots.

Jake [00:08:12]: And crypto miners. When you build an open product on the internet where anybody can sign up, the internet is a horrible place with so many things. You go through periods of asking, “How do I reach as many people as possible?” Then, “How do I fit the exact use case for the people who really matter and are really excited about this specific thing?”

Jake [00:08:39]: Then there was a two-year period of making the actual business work. During the free-tier era, we were losing about half a million dollars a month.

Swyx [00:08:59]: On a $20 million bank account.

Jake [00:09:02]: On a $20 million bank account with maybe $50,000 a month in revenue. That's a horrible business.
I don't know how anybody invested. But you have to go through it and say, “We have an experience people love, but the business has to work.”

Jake [00:09:17]: There are two schools of thought. You can run the horrible business all the way up with bad margins, or you can go back and make it work. We've always wanted a super lean team. We're 35 people right now. It's very small.

Swyx [00:09:36]: Supporting three million already?

Jake [00:09:38]: Yeah. We're adding 100,000 users a week right now, so it's growing fast. We don't want to add headcount for the sake of headcount or throw bodies at problems. We want to build systems. It's hard to build systems during expansion because you're adding things to the system because people are asking for them or things are breaking.

Jake [00:10:00]: We had to cut off the free users for a little while, rebuild the business, and make sure it worked. We want to reach as many people as possible because software is important. It's become difficult to create things in the physical world, so it's important to make it easy for people to build in the virtual world and have access to creation. But there are legs to that journey.

Jake [00:10:30]: You can see divots in the charts. If you follow between 2025 and 2026, it's either summer or winter. People go on holiday with family.

Swyx [00:10:50]: It affects that much?

Jake [00:10:51]: Yeah. It's kind of B2C and kind of B2B. People are shipping constantly, then they stop. Our activation curve now shows more people activating on weekdays because we have more business users, so it smooths out over time.

Agents as the New Interface to Deployment

Swyx [00:11:17]: Was there a point where you started prioritizing AI development or agent development?

Jake [00:11:24]: We've prioritized agentic as a top-of-funnel thing.
Over the last six months, we've deeply prioritized agentic as a mechanism to build and deploy things because we believe the curve is so steep and that is how people will build and deploy software.

Jake [00:11:42]: It almost fundamentally doesn't matter whether this is dot-com or not because we're all on the internet anyway. If agents are going to deploy a bunch of things and we hit an inference wall at some point, we'll fix those problems. The dominant species over the next 10 years is that we've moved from assembly to C to C++ to JavaScript to words. You're going to need to close that loop.

Swyx [00:12:13]: When you say this is dot-com, did you mean buying the domain, or the general case?

Jake [00:12:17]: I mean the dot-com era, when companies had a huge run-up because people understood the internet was important. Then they hit bottlenecks, fundamental laws of physics, math didn't work, and everybody came back down to earth. But it didn't matter because the internet became so impactful. If you operate on a long enough time horizon, you should build these things anyway because you can see where it's going.

Jake [00:12:45]: That's where I think a lot of agent stuff is. You get to a point where you're running thousands of agents in parallel. What is the inference cost? What is the compute cost? How do you make that efficient? How do you coordinate all this? We have issues coordinating humans; we don't even have good tooling for that. Now we have to figure out how to get agents to coordinate, safely version changes, and know when to raise their hand for someone to intervene. Otherwise it becomes an interrupt factory.

Railway's Infrastructure Thesis: Network, Compute, Storage, and Metal

Swyx [00:13:19]: Let's go right into the technical side. What are the core infrastructure or architectural beliefs of Railway that allow you to do what you do?

Jake [00:13:29]: The primitives matter a lot for us. We need network, compute, storage, and orchestration around it.
You need control over a lot of those things. We've talked a lot about how we don't really use Kubernetes because we want higher-order control to place workloads in very specific places.

Jake [00:13:48]: The reason is that you have to be very efficient with agents: memory reuse and all these other things, or you're going to massively blow up your cost structure. Being able to rack and stack your own servers and build your own metal unlocks performance and cost. Experiences where you're running 1,000 agents in parallel are not massively cost prohibitive.

Jake [00:14:13]: Token use and compute use are blowing up. Over time, those things have to get a lot more efficient. You can get a lot of margin to make those experiences solid by building your own metal. That's all in service of offering a differentiated experience to as many people as humanly possible.

Swyx [00:14:51]: You have a data center in Singapore.

Jake [00:14:53]: Yeah. We have two in every other region now. In Singapore, we're adding a second one in Q3.

Swyx [00:14:58]: What's it like? I've never built a data center. Do you go to Equinix and say, “I want some slots?”

Jake [00:15:05]: Yeah. Equinix. You basically go and say, “I want power and I want a cage.” They say, “Great, here's what it's going to be.” You rent the cage for a period of time, fill it with racks and servers, and hook up internet to it. That's all the pieces.

Swyx [00:15:36]: Then you handle everything else.

Jake [00:15:37]: You handle everything else.

Swyx [00:15:39]: What's the math versus clouds doing it for you?

Jake [00:15:43]: If we rented in the cloud, our payback period when we go to metal is about three months.

Swyx [00:15:50]: Which is crazy.

Jake [00:15:51]: It's nuts. That's four years of depreciated hardware. You're going to see a lot of this compute crunch because hyperscalers are buying up a lot of stuff.
We're working directly with OEMs, resellers, and people building these machines: Supermicro, Dell, and others.

Jake [00:16:11]: Upstream, there's a bunch of supply pressure. When we raised our last round, between deploying capital for servers and now, the amount of money we've raised is less than the amount of money we have in the bank plus the value of the servers because the servers have appreciated as RAM has gone up. It's nuts how valuable hardware has become.

Jake [00:16:50]: If you look at hyperscalers, they deployed around $80 billion of capital expenditures this year, and next year will be more. That's a massive infrastructure build-out. You look at that and think it's crazy that they're spending way more than the Manhattan Project. But if every person is going to run dozens or hundreds of agents in parallel, you have no conceptual idea how much compute is required to make that experience happen, even if you're deeply efficient and sharing resources. And that doesn't even count inference.

Swyx [00:17:22]: How do you plan the build-out? The growth chart is so vertical. Are you usually at 100% utilization as soon as racks are live? How far ahead are you planning?

Jake [00:17:33]: We still maintain cloud presence for bursting. We work with AWS, GCP, and a few other clouds. We can rent, and then the moment we get space or power, we compact those workloads off the cloud. We started on the clouds, then built a system to migrate to our own metal. There's nothing that says you can't continually do that again, and that's exactly what we do. We never want to be compute constrained.

Jake [00:18:09]: At the start of the year, we actually became compute constrained because one upstream provider wasn't able to give us quota at the rate we needed, and the hardware was slower. I spent a weekend rebuilding our entire network overlay so we could straddle five clouds: Oracle, AWS, ourselves, GCP, and one other one.
We can do more than that now.

Jake [00:18:38]: We got into a spot where we were trying to pack instances tight because we couldn't get enough compute. That led to a few reliability issues, which are now past us. I made a tweet pointing out that it's becoming harder and harder to acquire compute at the rate these models need to acquire compute. We got bit by it.

Swyx [00:19:15]: How do you think about pricing knowing you might not have your own metal available at all times? Are you pricing assuming you need extra margin if you end up going into the cloud?

Jake [00:19:26]: Because we've built out our metal data centers, our margins on metal are around 70%. We can deeply subsidize the cloud business if we want to scale at a reasonable rate. We have a few levers: metal, which makes the margins; cloud burst; debt to buy servers; and venture capital. It's an interesting operational problem: how much cash do we have, how much should we raise, how quickly can we deploy it, and can we scale revenue as quickly as we scale compute?

Jake [00:20:05]: If we continue making it trivially easy for people to build and deploy, then the faster we close that loop and the more operationally excellent we are with capital, the faster the business can scale. It's almost a straight linear deployment rate.

Financing Infrastructure: Hardware Debt, VC, and Operational Leverage

Swyx [00:20:20]: I think infra startups raising debt is a tool people don't utilize enough or know enough about. What can you tell us about that? Is it secured against your CPUs?

Jake [00:20:32]: It's secured against our hardware.

Swyx [00:20:37]: What rates do you get? Who are the lenders?

Jake [00:20:39]: We pay prime plus a spread, and we can refinance any of the debt as rates go down. The terms are pretty good. The unfortunate thing is that Twitter has no nuance, so people say, “Venture debt bad.” But as with all things, there are specific tools and areas where you can be deliberate instead of using one tool as a hammer.
Venture capital is not the hammer for everything. You have to explore and figure out what works.

Swyx [00:21:12]: VC is usually the most expensive financing you can get.

Jake [00:21:15]: Yeah. I also think people think about VC incorrectly from a capital-raising perspective. Most people think, “How do I raise as much money as possible from whoever is probably the best I can get at that time?” That's close to right, but what we've tried to do is figure out what unfair advantage we can buy with that equity.

Jake [00:21:34]: It's the most expensive equity you're going to give away at that point in time, assuming the company keeps getting better. How do you use it to work with someone stellar who complements you? In the seed stage, I had never started a company. Ray Tonsing had good advice, and I could text him all the time. He was really fast. Awesome.

Jake [00:22:01]: Then with John and Erica at Unusual, they said, “You roughly know what you're doing building a product. We'll mostly leave you alone and be available for advice.” Amazing. Then we got to Series A and the business was an operational tire fire because we didn't know how to scale a business. Work with Erica, and Jordan is over at Redpoint, so bonus.

Jake [00:22:28]: Now we've raised from TQ and FPV as we're moving into enterprises. Every step of the way, we've asked: who can we partner with at this specific time to unlock the next section of the journey? I don't know enterprise sales. As an engineer, I can eyeball what features we might need, and we have wonderful people internally who can help. But you want boardroom dynamics where everyone is aligned and asking, “How do we win this?” instead of bickering about strategy.

Data Centers in Space and the Physics of Compute

Swyx [00:23:31]: You had a tweet about data centers in space. Why no data centers in space?

Jake [00:23:37]: It's not “no data centers in space.” My hot take is that I think it is solvable.
I've just never seen anybody solve it.

Swyx [00:23:49]: You said, “How are you going to dissipate that much heat in a vacuum?” You're making a physics claim.

Jake [00:23:55]: I haven't seen anybody prove how you're going to dissipate that much heat in a vacuum. It doesn't mean it's not possible. It just means nobody has brought it up yet.

Swyx [00:24:05]: Astrophage.

Jake [00:24:06]: I don't know what that is.

Swyx [00:24:07]: The Martian thing. Okay, you're very logical.

Jake [00:24:09]: It could work. A lot of people are putting the cart before the horse. They say, “We're going to put data centers in space.” Okay, but how? “We have time to figure it out.” It's like in The Martian where they ask how they're going to intercept something and say, “We'll figure it out.”

Swyx [00:24:36]: Making a bet on human invention is weird because you blind trust that it can be solved. But with physics, there are first-principles bounds you can put on it. Maybe not. Maybe you're asking to travel time or break a fundamental thermodynamic law.

Jake [00:24:57]: I don't know how VCs do this either. How do you know what's not possible and a grift versus what's possible but sounds completely insane? “We're going to put data centers in space.” Coin flip as to which it is, and I guess you'll know in 10 years. That's one cycle.

What Agents Need: Versioning, Observability, and 1,000x Scale

Swyx [00:25:23]: Moving back to agents. The branching, fast spin-up, and orchestration you do feels like pre-work that happened to be exactly what agents want. What do agents want differently than humans?

Jake [00:25:37]: They want the ability to version things. It's not that different; it materializes slightly differently. Agents want a way to test changes incrementally. Engineers have feature flags. Is there a reason agents can't use feature flags? I don't think so.

Jake [00:25:54]: They want version control. Can we use Git or not Git? That one is up in the air.
I think something outside Git will emerge for how we version these things over time. They need observability. You need to query what happened, when it happened, which steps failed, traces, logs, metrics, and all the rest. They need network, compute, and storage. They need to write files, save files, iterate on files, and snapshot file systems.

Jake [00:26:25]: A lot of what humans needed is in line with what agents need. Branching and forking are not different; we're just moving 1,000 times quicker. It can look like you need something massively different, but what you need is something massively better than what existed. You need orchestration massively better than Kubernetes. You need networking probably better than Envoy. It goes all the way down the stack.

Jake [00:26:55]: If the workload profile doesn't change so much as it gets massively compressed because you need thousands of these things, what assumptions change? etcd is going to melt. You need to replace it with something. You can go all the way down the stack and say, “That part has to change, that part has to change, and that part has to change.”

Jake [00:27:19]: The interesting thing about the super-exponential curve is that you have to build systems where you can rip out those parts at any time because a new bottleneck might emerge. You get good at parallel agents, and a different part of the system breaks. So it's similar to what humans needed, but at 1,000x scale.

Jake [00:27:55]: How do you do code review in the age of agents?

Swyx [00:28:00]: You throw more agents at it.

Jake [00:28:01]: You don't. But then who reviews for CVEs and all these other things?

Swyx [00:28:07]: More agents.

Jake [00:28:08]: And that's how we hit the inference wall. You can continually throw agents at the problem, but I think there's a limit to the number of agents you can throw at a problem.

CLI, Agent Handles, and Closing the Loop

Swyx [00:28:24]: You already had a CLI before it was cool.
How is the shape of what you're exposing changing, if at all?

Jake [00:28:28]: CLIs have always been cool. The CLI changes because we think about how to give Claude, Codex, ChatGPT, or any model a handhold.

Jake [00:28:50]: A CLI is a single command: deploy, get logs, and so on. Things that were prohibitively annoying to humans are not annoying to agents. They're nice. If I handed you a CLI with 40 arguments and 600 flags, you'd think, “I'm never going to use all of this.” But if you hand it to an agent, it says, “This is excellent. I have so many handles to work with.”

Jake [00:29:24]: If you're going to expose things to agents that way, you want as many handles as possible where they can get information, query dynamic information, and close the loop quickly. Most problems right now are about how to close the loop as quickly as possible. Where does the agent get stuck, and how can you remove that?

Jake [00:29:49]: Telemetry is important. If you can tell where the agent gets stuck from the CLI and say, “12% of people deviate from the happy path because of this, and now I add this argument and drive it down to 2%,” you massively increase the rate of loop closure.

Jake [00:30:03]: That's how we think about not just the CLI, but every point in the dashboard. It's a user journey: I hear about Railway. I get something deployed. I get my first green build or aha moment. I see an endpoint, logs, whatever. Then I iterate. The iteration loop is indefinite. The user wants to deploy a new thing, a Postgres instance, change code, and keep iterating.

Jake [00:30:36]: If you focus on the iteration loops and what's blocking them from closing quickly, one thing we say internally is: you never want to be waiting on compute anymore. You always want to be waiting on intelligence.
If you're waiting on compute, there's a bottleneck that needs to be destroyed because eventually that bottleneck becomes so large that another workflow emerges to change it.

Jake [00:31:04]: We've built a product where you push code, build it, and so on. But I fundamentally believe the push-pull loop is going away. We'll get to a point where you make a small change in production, that change is versioned across your infrastructure, you're working alongside copy-on-write versions of your database and infrastructure, and then you merge it in and it's instantaneously live. That's the holy grail of loops. The push-pull-rebuild thing is a point of friction that we're removing entirely.

Canvas as Output: Dashboards, Context Anchors, and Hyperstructures

Swyx [00:31:43]: It's incredibly fast. If anyone hasn't tried it, that fast feedback is great. My hot take is that Railway was famous for its canvas, which visualizes your infrastructure and lets you manipulate it visually. But that was for humans. For the next phase of growth, Railway CLI is more important than canvas.

Jake [00:32:05]: The canvas is funny because it's a mechanism to show changes over time. You're right that previously we used it a lot as an input. Moving forward, its goal is more like an output. You would go to the canvas, make changes, see them, and watch your infrastructure evolve. Now agents have access to the CLI and can make those changes. So the canvas becomes an output: what information does the human need at this moment to make suitable decisions about control requests? Do I approve this or not?

Jake [00:32:57]: It also has to be an anchor for your context, a port in the storm. Think of it like layers in a file system. You start with a project, then drill down into services, then into a function or code, because you want to represent the entire thing not just in your head, but in the canvas.
Other people can share that representation, think on the same wavelength, and move quickly.

Jake [00:33:33]: A lot of organizations get in trouble as they scale because all the context lives in someone's head. “How does this microservice work?” “I have no idea; go ask this person.” Then you have whole categories of products built around context discovery. A lot of that melts away if you have a solid hierarchy and can infinitely nest services, code, context, and everything else all the way down. That's what lets you build these structures over time.

Jake [00:34:18]: It's also what lets us build what I've called hyperstructures: things that are way bigger. You look at the Golden Gate Bridge and ask, “How did we build that?” There's a meme that we lost the technology. To some extent, yes, because the coordination that built those things evolved and changed. We lost some of the art of building structure as we jammed everything into Slack.

Swyx [00:34:52]: But you jam everything in Discord.

Jake [00:34:53]: Same point. It doesn't matter. It's message passing and interrupts, message passing and interrupts.

Swyx [00:35:00]: So you're arguing there should be something better and more structured than Slack?

Jake [00:35:04]: Yeah. For sure. I think Slack is awful, and Discord is awful too.

Central Station: Context Routing, Support, and Incident Clusters

Swyx [00:35:09]: This is the equivalent of my mom test. What have you done that has your solution to this?

Jake [00:35:15]: Internally, we've built a tool called Central Station that aggregates all the context from our users. Every piece of feedback, every customer support item, everything gets aggregated into clusters. If an incident is brewing, we can determine how many users are affected and break off a discussion based on that.

Jake [00:35:40]: That is more helpful than long-running channels where you're trying to decide which channel to put something in.
If you can dynamically aggregate information and dynamically route it to the right person based on context, it works better. We know internally that these four people are close to networking. If we see a networking thing, we can drill it down to those four people. If it's with this part, we can look at the commits. This is no longer a manual process internally.

Jake [00:36:13]: If you go to station or help.railway.com, that's why we built it. We wanted to scale with a massive amount of leverage by aggregating feedback.

Swyx [00:36:27]: This is built in-house?

Jake [00:36:28]: Yep.

Swyx [00:36:29]: I remember helping out on this one with Angelo in 2023. You scale a lot with a very small team.

Jake [00:36:38]: Yeah. We're about 10 times bigger now.

Swyx [00:36:40]: You have your full developer code here? Very cool.

Jake [00:36:44]: If you go to railway.com/stats, we expose this as a pub-sub-able thing. It's all real-time metrics. There's a way to get it as JSON somewhere if you care.

Jake [00:37:01]: We're big on trying to build everything in public and talk about what we're working on. We've had issues in the past, and we'll say, “Here's how we're fixing these things.” We've gotten compliments and flak for incident reports. We're always trying to make them better and talk with people.

Incidents, Disclosure, and Progressive Rollouts

Swyx [00:37:20]: You had a big one recently. I liked that it was scoped to 3,000. You presumably used Central Station. Talk through what happened and how you address it internally as a team.

Jake [00:37:38]: Internally, this one really sucked. It had to do with an upstream provider that didn't do the behavior it said it documented, which is unfortunate given they wrote the RFC for how the behavior should work. We rolled those things out, and Central Station caught it initially when a couple users said caches weren't invalidating.
We turned it off immediately.

Jake [00:38:03]: When you roll out to a large user base of three million people, you get a lot of disparate behaviors. We tested in staging and had tests, but we hit an edge case. We've hardened those systems, and now we can make that better. But it was a tough one.

Swyx [00:38:39]: I always wonder how private disclosure is supposed to work if people find an issue. Are they supposed to contact you first? When you run a platform, these things will happen. What channels should people pursue to quietly resolve it before it becomes a bigger incident?

Jake [00:38:59]: There's responsible disclosure. We err on the side of over-disclosing and letting you know something is wrong versus having your provider gaslight you. We've erred on sharing those things more publicly, even if they impact a small subset of users. That's a decision we've made internally. We have four values. One is honor. The honorable thing is to notify people to the widest degree at which they may have been affected or there was an issue, and then confront it head-on: why did it happen, what can we do better?

Swyx [00:39:45]: Not the whole user base. That's because of incremental rollouts and other things?

Jake [00:39:50]: Yeah. Progressive rollouts.

Swyx [00:39:54]: That should be the norm at all large platforms.

Jake [00:39:58]: It should. A variety of companies do this. There's the quote that Meta runs 10,000 different versions of Meta. To our earlier point about agents, they need the same thing. They need shadow traffic and all these other things. We've built so much ceremony around production being sacred that we need to make it trivially easy to test different behaviors in a safe environment. Then you can make mistakes in a safe environment.

Safe AI SRE: Customer Agents, Forked Environments, and Production Parity

Alessio [00:40:30]: Do you see a world where these things get automatically caught, not necessarily by your agent, but by your customer's agent?
The cache invalidation issue seems easy to check if you know to look for it.

Jake [00:40:44]: It's hard because to determine it, we almost need to hook into your observability infrastructure. That's why we have the template loop on the platform: so you can roll things out progressively. You can roll out to Johnny Vibe Coder initially, or push a shard that someone consumes at their own leisure. Or you can roll it out over weeks: 0.1% of people, 1% of people, early adopters, then all the way up. That's the non-deterministic version control we talked about earlier.

Jake [00:41:30]: I believe that's where most things should go, because most companies end up building staged rollout systems in-house. It's the same thing built again and again at every company. There's a massive opportunity to consolidate developer debt.

Alessio [00:41:45]: You should have a free tier. Model providers give free tokens if you let them use the data. You could give free compute if someone is the number-one shard that goes out and lets you plug into their observability.

Jake [00:41:55]: We do that. That's why we talked about the impact on 3,000 people. We start with lower-impact people. Larger companies on the platform are last to receive those rollouts so they have a version of the platform that's deeply stable.

Alessio [00:42:16]: I have three services, so I'm sure I get the first rollout. You can nuke my thing at any time. There are all these SRE agent companies. Observability people also want agents that fix upstream problems. You have your own agent in the canvas now. How do you see that playing out?

Jake [00:42:39]: It's the stacking entropy problem. If you don't have primitives to make iteration in production safe, it becomes difficult. If you're an observability provider saying, “Here's the fix to this error,” assume 80% are good and make sense.
But in the last 20% long tail of complex issues, if you let somebody stamp it, you create an opportunity for an incident.

Jake [00:43:08]: That's why forked environments are important. People have staging, but it always drifts from production. You need primitives, workflows, and experience built first-party on the platform so you can fork any service at any point in time.

Jake [00:43:33]: I think of the canvas as a sheet of transparency paper. The agent is a little guy you push up into the canvas. It should say, “I need to copy that service and that service so I can test these two things.” It gets a read-only copy of production. Anything that's PII gets marked as a transform when we clone the database, create a copy-on-write version, or read from it. Then the agent makes changes and asks, “Does this actually work?” as close to production as possible.

Jake [00:44:22]: That's how close you have to be, or you get massive drift. The system becomes unstable. You see this with massive systems built on Docker for local, Kubernetes for production, and a specific thing for something else. That complexity slows developers and becomes unstable at scale, making it hard to iterate. We want to compress that way down and say, “As close to prod as possible is where we want to be.”

From AI SRE Skeptic to Agent Believer

Swyx [00:45:00]: I was texting Erica for questions, and she says you were originally not a believer in AI SRE. Have you come around on it?

Jake [00:45:10]: I flipped, but I'm still not a believer in AI SRE if you don't have the primitives to make it safe. If you unleash AI SRE on production infrastructure without safe primitives for copying volumes and making sure things are fine, it's going to nuke your production database. It's not a matter of if, but when. I'm a big believer in making those loops safe.

Jake [00:45:33]: I was a deep AI skeptic until 2023.
In 2024, I thought, “Maybe I can roughly make this thing do it.” In 2025, I thought, “Now I can hold this.” Over winter break, everybody came back saying, “It's almost impossible to hold this.”

Swyx [00:46:01]: Did you see this on the Claude docs? Clawdbot? OpenClaw?

Jake [00:46:06]: It's gotten to a point where it's harder to hold it wrong than to hold it right. There's a scene in Avengers where Vision picks up Thor's hammer and says it's terribly well-balanced. It self-balances and works well. I'm a deep believer at this point that this will be the dominant species: assembly, C, C++, JavaScript, words.

Swyx [00:46:35]: It feels like a big jump.

Jake [00:46:37]: It is. But it's not like you abandon CPU-based discrete logic and move straight to fuzzy logic. You need both. Your skills should call code or applications or some static structure. You can use skills to distill what the procedure should be or how the code should act.

Jake [00:47:02]: I'm coming to a thesis: you need three points. You need a clear spec defining the system, the code, and the tests. When you say it out loud, if you've been in engineering long enough, you're like, “Of course. That's an RFC, tests, and code.” But they all matter. Having them together lets them reinforce each other: the spec and tests match, but the code doesn't, so reconcile it. Or the tests and code match but the spec doesn't, so reconcile that. That's the iteration loop.

Jake [00:47:41]: That's why you're seeing people talk about software factories, docs, and reconciliation. Some of that is architectural astronomy if you don't implement it, but that loop is where most things will end up.

Swyx [00:48:07]: For listeners, we've been talking about this on the pod for three years: the holy trinity of specs and tests. Itamar Friedman from Qodo is the reference if people want to look it up.

Self-Modifying Infrastructure and the End of Push-Pull-Rebuild

Swyx [00:48:18]: One thing I want to mention on the OpenClaw idea is self-modification.
I don't know how Railway would support it, but I have my OpenClaw, and I just tell it it has the Railway CLI and can do whatever. In theory, whatever capabilities or new infra it needs, it can call the Railway CLI, provision it, and add it to itself. The agent can modify its own infra.

Jake [00:48:45]: It's nuts. I have a loop set up where you put the Railway CLI on top of something that runs on Railway. You're authenticated as whatever the current box is, and you can make any changes to it. Then you call Railway deploy, and it deploys itself.

Jake [00:49:04]: It's like: “I need to spin up this instance of this environment. I already exist in this environment. Excellent, I have access to a Postgres instance now.” That's where we want to go with agentic, self-replicating infrastructure. That's your loop: iterate in production. You continue making changes. If it works, merge it upstream. If it doesn't, throw it away.

Jake [00:49:37]: How do you make throwaway copies trivial to spin up and super cheap? The era of “I have an AWS instance with four vCPU and 16 gigs of RAM” is going to get destroyed. If you do that for agents, you need a thousand of those machines. It's prohibitively expensive compared with what we've spent a ton of time figuring out: the atomic unit of deploy, whether you call it isolates, sandboxes, or something else. Only pay for what you use, spin up instantaneously, and close the loop as quickly as possible.

Jake [00:50:15]: If the system can self-replicate safely and say, “This is my environment, I'm making these changes,” it can come back with, “Does this look good? This is a new state of infrastructure given this prompt. I think I've solved it.” Then you go back and say, “Actually, it looks different.” It does the loop again. Then you say, “Cool. Apply.”

Swyx [00:50:38]: That's retroactively obvious, which is the most useful kind. Any other comments on agent deployment on Railway?

Jake [00:50:51]: It's getting better every day. I'm on X or Twitter.
You can always yell at me about the parts not working as well as they should, because plenty of things should work way better.

The New Serverless: Stateful, Long-Running, Pay-for-What-You-Use Linux

Swyx [00:51:04]: At this stage, when people want massively or embarrassingly parallel compute, they usually talk serverless. I feel like there's a new serverless compared to the previous five years of serverless. You're in that new bucket. Do you have comparisons or philosophical differences you want to call out?

Jake [00:51:31]: It's somewhere in between. It's the ability to run stateful, long-running workflows or executions.

Swyx [00:51:42]: Vercel has Fluid Compute, Cloudflare has some container thing, Google has App Runner and others.

Jake [00:51:55]: That's where everything is roughly going, and it's why we've been working on this for six years. We believe users need access to a computer: a box that speaks Linux. They need to deploy what they want. Other systems change the surface area of what you can build. For us, users need a computer and need to deploy anything they truly want. That's why we've focused on the primitives: network, compute, storage. If we give you those and expose them so you can run things indefinitely, that's where we believe it's going.

Jake [00:52:43]: Twitter has no nuance, so everyone says “servers” or “serverless.” It's always somewhere in the middle: I want to run it for a long time, but I don't want to provision the resource statically or pay for things I'm not using. That's been our thesis from day one: pay only for what you use, run it indefinitely, and it is full Linux.

Swyx [00:53:12]: That's why I like the naming of Fluid. It's fluid. Flexible.

Heroku, Focus, and Carrying the Torch Without Becoming the Past

Swyx [00:53:18]: Another milestone is the Heroku official deprecation. You're one of the presumptive new Herokus. “New Heroku” has been a category for as long as I've been in developer tooling. It's finally happening. What was that like?
Any behind-the-scenes of, “This is the moment”?

Jake [00:53:42]: You have people where you're like, “You were running stuff on here? You, as this company?” It's crazy that names you would know are running on it and now coming to us saying, “We want to move a lot of this off.”

Swyx [00:54:00]: Any behind-the-scenes on why Salesforce let Heroku stagnate?

Jake [00:54:05]: I can only guess. It's hard when it's not your business. Salesforce's business is to build a great CRM. That's their focus. Then you acquire a compute business as an offshoot. A lot of early Meta people talk about focus. Boz has a write-up about how in the early days of Meta they had no money, so they were forced to focus. Then they turned on the money tree and had no reason not to split their focus.

Jake [00:54:52]: But that dilutes your product. You get offshoots where you ask, “Is this the focus of the business?” If it's not core, it languishes. A lot of companies get in trouble when they split focus because they're fighting a multi-front war, not just externally but internally for alignment. Where are we going? What are we doing? What is our purpose?

Jake [00:55:24]: If you're Salesforce-built and mission-driven, you want to work on Salesforce. Heroku is off to the side. It's not core to the business. Getting resources, budget, focus, and alignment internally becomes hard. It was a matter of time.

Swyx [00:56:06]: Kudos to them for calling it out instead of leaving it unknown.

Jake [00:56:12]: Their release was a little odd. They called it out, but they didn't say they were shutting it down. Behind the scenes, I think they issued messages to people saying they should close accounts and that they were going to deprecate and remove things over time.

Jake [00:56:30]: It's crazy because some of my first deployment experiences were on Heroku. You start with dragging things into an FTP server, then you try to get a deploy working, and then it's Heroku. It was the on-ramp for us. But the wheel turns.
New things emerge. We're happy to carry the torch for a lot of that. But we don't want to be the new Heroku. We want to be the way people build and deploy software, and ultimately the way people monetize software over time.

Swyx [00:57:19]: It's still a big crown to be the new Heroku. There are 50 companies that fought for that.

Jake [00:57:23]: Everybody is holding some portion of it. We're happy to support people and companies. The platform works differently. The game loop is similar, but we've been dogmatic about where these things are going: primitives, agents, fan-out. Some things fit; some workflows need to change. We have an approximation of Heroku pipelines with the environment system. It's exciting. We've got a ton of people we can support, and it's growing a lot.

Temporal, Workflow Engines, and State Machines

Swyx [00:58:12]: I have one more technical question about Temporal. I've sold my shares. You're a power user and one of our earliest customers. I met you through Temporal. You built on Temporal. You have complaints. This may be the most neutral and informed conversation anyone will hear about Temporal without someone working at the company.

Jake [00:58:39]: That's fair. I've used Temporal for almost 10 years because of Cadence at Uber.

Swyx [00:58:52]: Give people a sense of what Cadence was at Uber.

Jake [00:58:57]: Cadence was the precursor to Temporal. It powers trip actions, rides, when you rent a Jump bike or scooter or car. You're running workflows for a period of time and saying, “This ride will run indefinitely until it finishes.” You attach information: you paused in this zone, so add this charge to the bill. When you end the trip, the workflow is done. That experience was powered by Cadence at the time.

Swyx [00:59:34]: I used to say it's like programming the entire user journey top-down as one function.

Jake [00:59:39]: It's a powerful idea and important. It's also important for the next phase of the agentic journey.
You want an agent to do a specific task, be complete or incomplete on that task, and move on to the next thing. You need a way to manage workflows dynamically.

Jake [00:59:59]: Temporal was always great in theory, and great when you got it working the way you wanted in production. But it required you to model the entire journey in your head. If you didn't, you could cause issues where replaying the state of the workflow causes non-determinism.

Swyx [01:00:25]: Because it works on deterministic workflow history.

Jake [01:00:28]: Exactly. I describe it as a jet engine. If you know how to operate it and run it, it's great. But you can't hand it to people trying to build complicated things if they don't have the whole state in their head.

Jake [01:00:48]: We run our whole deployment pipeline on top of it. That's a reasonably complicated workflow: pre-commit hooks, signaling, queuing, and all the rest. We ran into the same thing at Uber. As you express a large workflow, it gets more complicated, with more states in the state machine that you have to map back to the workflow.

Swyx [01:01:15]: It's a lot of ifs.

Jake [01:01:16]: Exactly. At Uber, we built a system for doing the state machine and testing it. We've started to build some of those things here because it's grown heavily. It's not quite love-hate. When it works well, it works super well. But if someone who doesn't have full context puts something into the system that invalidates state or causes non-determinism, or spins off a ton of activities, you have to keep track of underlying SRE knobs like activity slots. Those should scale with memory, vCPU, and so on. It becomes a bear to scale.

Swyx [01:02:10]: You need a capable sysadmin running things behind the scenes. If you moved off, what would you do?

Jake [01:02:19]: We'd build our own workflow engine.
We have a few internally that we've worked on.

Swyx [01:02:27]: This is one of those classes of things you typically wouldn't vibe code, but I'm wondering if you can.

Jake [01:02:33]: I still don't think you should vibe code it. You still want to run decent tests to make sure it works.

Swyx [01:02:39]: Timo didn't invent that from scratch either. There are libraries you can run. On top of that, it's just a state machine that you have to map out. Ultimately, you define the instructions you want and run them through a state machine.

Jake [01:03:00]: It's very doable. Workflow stuff is interesting. Restate is doing neat stuff here.

Swyx [01:03:10]: You're tied into JavaScript. Are you a JavaScript maxi?

Jake [01:03:13]: Internally, we have TypeScript, Rust, and Go. We don't add more languages. Actually, we have a little C because we write BPF code and hooks. But those are the languages.

Swyx [01:03:28]: Is this for sidecars?

Jake [01:03:32]: No. It's for the networking stack, volumes, and things like that. We use TypeScript a lot because it powers the dashboard, but we're moving a lot of workflow stuff off the dashboard stack and into the infrastructure stack.

Railpack, Nixpacks, and Content-Addressable Filesystems

Swyx [01:04:00]: Cool. Any other technical infrastructure stuff? Railpacks?

Jake [01:04:07]: We built an engine for determining dependencies based on source code. It's called Railpack. We built the first version, Nixpacks, on top of Nix, and then we moved.

Swyx [01:04:17]: People have been trying to get me to adopt Nix and NixOS for four years. Is it ever going to be a thing?

Jake [01:04:23]: I don't know. We're excited about it, but it has pain points. Think of it as a stack of versioned binaries at specific slices in time. If you want version X and version Y, you bloat the package space, which blows up image size and makes real-world workloads difficult.

Swyx [01:04:53]: But you content-address it and cache it.
In theory, there are optimizations.

Jake [01:05:00]: In theory, yes. But with a large enough user base and disparate enough machines, you run into a problem Meta described in the XFAAS paper, their internal serverless system. It becomes difficult at scale unless you break out specific runtimes.

Jake [01:05:24]: We didn't want to do that because we wanted to truly allow you to deploy anything. That was our initial thing with Nix. But we've moved toward interesting work around content-addressable file systems that can lazy-load anything from any point and page it into memory.

Swyx [01:05:48]: Amazing.

Jake [01:05:49]: The future is very bright. It's crazy, and it's going to be nuts.

Coding Agent Spend, Roadmaps, and Token ROI

Swyx [01:05:54]: Founder journey stuff?

Alessio [01:05:56]: Your cloud usage: you tweeted you're going to spend $300K this month?

Jake [01:06:01]: I think we got to $200K.

Alessio [01:06:02]: Coding agents?

Jake [01:06:03]: Yeah.

Swyx [01:06:04]: Across the company?

Alessio [01:06:05]: You only have 35 people, so I'm sure they're not all spending $10K a month. What's the distribution?

Jake [01:06:10]: I think I'm at about $25K. We have power users all the way down. We came back from winter break, and I basically said, “If you're writing code by hand, you're doing this wrong.” The tools are good enough now that you can move extremely quickly. There are issues and pain points, but you should be reviewing the code you are writing instead of writing it by hand.

Jake [01:06:40]: Architectural patterns matter more now than ever, but you shouldn't spend your time generating code you would write. If you know how to write it, ask the agent to write it and reconcile it until it looks like you would have written it yourself.

Jake [01:06:58]: People misconstrue my propensity to push people toward agents as connected to our growth and some reliability bumps. They're not necessarily related.
The tools are good enough to move extremely quickly and build things way larger than you could before.

Jake [01:07:19]: To the earlier point about cooling data centers in space: I don't know. But with software, you can ask, “How would I build block storage from scratch? How would I do these things?” I have ideas because I have history and have read papers. Let me work them out and build massive test benches with thousands of tests, because those are now free to author. If you're not using AI systems to speed-run your roadmap and reconcile your existing system onto the future, you're missing a large point of what's happening.

Alessio [01:08:12]: What's the path to spending $3 million a month? Is it bound by ideas and things customers can absorb?

Jake [01:08:19]: For most companies, it's bound by deployment at this point. That's why we've seen a massive boom in users and companies, from Fortune 50s down, asking how to get developers to move faster. You'll probably hit your CFO before any technical limits because they'll look at the eye-watering amount of money spent on tokens. Inference costs have to come down, but we're inference constrained now. There will be price discovery around what makes sense for an org to adopt.

Jake [01:09:06]: I think you'll end up with the F1 driver concept. If someone is really adept at these things, it makes sense to put them in a $3 million car. If they're not, it probably doesn't make sense. You'll take a few people and say, “You can drive the F1 car. We need to go in this direction. Figure out if it works and prototype it.”

Jake [01:09:33]: We've done some of that and vastly accelerated our roadmap. We thought we'd ship something in a few years; now we can probably ship it in a few months because we validated it and don't have to build it incrementally. We can skip steps and move toward our vision.

Alessio [01:09:58]: A lot of people are realizing the roadmap doesn't always have a business impact, so they say tokens are too expensive.
But if your roadmap were built to make more money by the time you built it, you'd have token pricing for it, the same way you do with sales. You'd spend a billion dollars on sales if you knew you would get $2 billion of revenue.

Jake [01:10:19]: Exactly. A naive way to measure this is the percentage of tokens that end up in production. If you can measure impact because those tokens end up in production, that's awesome. But the burden of proof will rise. Internally, we have a growing number of pull requests that haven't merged. The question becomes: how do you get this into production? It's about how quickly you can build and deploy software, which is exciting because that's our whole thing.

The SDLC Shift: Prompt Requests, Feature Flags, and Safe Rollouts

Swyx [01:10:56]: The SDLC is changing. One thesis is that the pull request is dying. It's going to be the prompt request. Beyond that, code review is also kind of dying if you have all the other systems in place. What else is changing about the SDLC?

Jake [01:11:19]: The AI SRE and the tools to make it happen. AI SRE is pie-in-the-sky aspirational. What does it take to get an AI SRE? What tools do you need to build?

Swyx [01:11:32]: You should expose your tooling to customers at some point. The Central Station command center.

Jake [01:11:39]: We have it for template maintainers. Template maintainers can deploy and maintain templates, and they get feedback. We're going to expose those things incrementally.

Swyx [01:11:51]: Clustering around incidents. Everyone has a version of that, but I don't think anyone has solved it.

Jake [01:11:56]: I won't say we've solved it internally, but it's gotten so good that we can see incidents forming pretty quickly. At some point, those will be things either someone else builds or we build. We've always built things purpose-built for us.
If it makes sense to make it useful for users, monetize it, or turn that loop into a profit center instead of a cost center, we want to do that.

Jake [01:12:28]: Pull request is definitely dying.

Swyx [01:12:29]: Do you do first-party feature flagging and incremental rollout stuff?

Jake [01:12:34]: We have a feature-flagging engine we built internally and will eventually roll out.

Swyx [01:12:38]: I don't see it as a user. How come you didn't give us what you have?

Jake [01:12:43]: We have to beta test it. We care a lot about the quality of the things. There's plenty we've used internally that doesn't make it all the way through the journey because it fails. It works for one service but not multiple services. We'd have to build it for multiple services and know that if we released it, we'd rebuild it again and again. Some things are worth that, but many inform the roadmap.

Jake [01:13:18]: We don't want to dilute the experience by saying, “This works, but only for this service,” unless it's a core initiative. Over the next few months, we'll roll out things that work for a single service, then multiple services, then multiple services across the environment. You have to be deliberate. Otherwise you create broken disparate experiences and support load because people ask how to use the feature.

Jake [01:13:52]: It's the earlier expansion and compaction pattern. You expand the company to get features, then compact and smooth them out so the experience is stellar. You told me in the hallway, “It's gotten so much better.” Internally we're saying, “This part really sucks. We need to make it significantly better.”

Swyx [01:14:11]: I can attest to that over the last three years watching you build Railway. For listeners, feature flagging is a huge part of Uber culture. So much so that they have too many feature flags and another thing to remove feature flags. Facebook has Gatekeeper. Agents are going to need this. It's fundamental to incremental rollouts. OpenAI acquired Statsig.
GPT-5 is routing and flagging through different models.

Jake [01:14:56]: It's super important. If the software development lifecycle is going to change because we're doing things 1,000 times faster and 1,000 times more concurrently, what becomes important at scale?

Jake [01:15:16]: Before I started Railway, I built a feature-flagging product and tried to sell it. It was an easier version of LaunchDarkly. I ran into a problem: anyone small enough to adopt your technology doesn't care about feature flags, and anyone large enough to need feature flags needs so much scale that you have to build out all the infrastructure. I scrapped it.

Jake [01:15:42]: But what is old is new again. Companies are trying to move quickly, but you can't YOLO a vibe-coded thing straight into production. You need to say, “Here's my blast radius, my impact, and I want to shadow it for these users.” Feature flags. You're going to need the tools larger companies built to maintain their structures. Everything gets compressed by 1,000x so everybody can build those structures quickly.

Jake [01:16:07]: That's exactly where we are: compressing the software development lifecycle, then expanding it and adding more new things.

Cattle, Pets, and Clonable Infrastructure

Swyx [01:16:15]: Another term that comes to mind for newer developers is “cattle, not pets.” People treat production like a pet. It has a name. You baby it and keep it alive. With cattle, you can mass farm, roll out, portion parts out, and kill them.

Jake [01:16:37]: I think that might change. You can move toward having pets as long as you have a cloning machine for your pets.

Swyx [01:16:52]: Yeah.

Jake [01:16:52]: If you can snapshot every single thing at every frame, it doesn't matter if something gets obliterated because you have a snapshot of it. The things we've built right now are designed to block changes from the hermetically sealed DevOps line. You have to write a Dockerfile because you nee

Hacker News Recap
May 19th, 2026 | I've joined Anthropic

May 20, 2026 15:35


This is a recap of the top 10 posts on Hacker News on May 19, 2026. This podcast was generated by wondercraft.ai.

(00:30): I've joined Anthropic
Original post: https://news.ycombinator.com/item?id=48194352&utm_source=wondercraft_ai

(01:59): The last six months in LLMs in five minutes
Original post: https://news.ycombinator.com/item?id=48188183&utm_source=wondercraft_ai

(03:28): Gemini 3.5 Flash
Original post: https://news.ycombinator.com/item?id=48196570&utm_source=wondercraft_ai

(04:57): I've built a virtual museum with nearly every operating system you can think of
Original post: https://news.ycombinator.com/item?id=48195009&utm_source=wondercraft_ai

(06:26): Apple unveils new accessibility features
Original post: https://news.ycombinator.com/item?id=48192224&utm_source=wondercraft_ai

(07:55): Minnesota becomes first state to ban prediction markets
Original post: https://news.ycombinator.com/item?id=48197980&utm_source=wondercraft_ai

(09:24): Show HN: Gaussian Splat of a Strawberry
Original post: https://news.ycombinator.com/item?id=48191602&utm_source=wondercraft_ai

(10:53): Tesla's lithium refinery discharges 231,000 gallons of polluted wastewater a day
Original post: https://news.ycombinator.com/item?id=48198551&utm_source=wondercraft_ai

(12:22): Google changes its search box
Original post: https://news.ycombinator.com/item?id=48197370&utm_source=wondercraft_ai

(13:51): CISA Admin Leaked AWS GovCloud Keys on GitHub
Original post: https://news.ycombinator.com/item?id=48190454&utm_source=wondercraft_ai

This is a third-party project, independent from HN and YC. Text and audio generated using AI, by wondercraft.ai. Create your own studio quality podcast with text as the only input in seconds at app.wondercraft.ai. Issues or feedback? We'd love to hear from you: team@wondercraft.ai

Tetragrammaton with Rick Rubin

Garry Tan is the president and CEO of Y Combinator, the startup accelerator behind companies like Airbnb, Reddit, Coinbase, and DoorDash. He previously co-founded the blogging platform Posterous, which was acquired by Twitter in 2012, and later founded the venture capital firm Initialized Capital alongside Alexis Ohanian. Before entering venture capital, Tan worked as an engineer at Palantir Technologies, where he helped develop early infrastructure and design systems. Now, he continues to make investment and product decisions as a General Partner, having read more than 6,000 YC applications, while overseeing programs for sourcing, advising, and scaling early-stage startups. ------ Thank you to the sponsors that fuel our podcast and our team: AG1 https://DrinkAG1.com/tetra ------ LMNT Electrolytes https://DrinkLMNT.com/tetra Use code 'TETRA' ------ Squarespace https://Squarespace.com/tetra Use code 'TETRA' ------ Lectio 365 https://Lectio365.com ------ Sign up to receive Tetragrammaton Transmissions https://www.tetragrammaton.com/join-newsletter

My First Million
I put 80% of my money in the S&P

May 11, 2026 66:28


Get our Wealth Guide (35+ insights from top investors): https://clickhubspot.com/ohkg Episode 822: Sam Parr ( https://x.com/theSamParr ) and Shaan Puri ( https://x.com/ShaanVP ) talk about how your genes could determine how much money you make and the startup ideas YC is betting on.  — Show Notes:  (0:00) money genes (5:07) your personality is your business (23:46) productive placebos (33:39) YC Request for Startups (34:57) IDEA: aesthetic data centers (42:15) IDEA: The company brain (51:10) IDEA: drone swarm defense (57:33) IDEA: personalized medicine — Links: • YC RFS - https://www.ycombinator.com/rfs  • Deep Personality - https://deeppersonality.app/ • Viktor - https://getviktor.com/  — Check Out Sam's Stuff: • Hampton (joinhampton.com): My community for founders. Average member does $25m/year. Many of the guests are members. Get after it...apply: http://joinhampton.com/mfm — Check Out Shaan's Stuff: • Shaan's weekly email - https://www.shaanpuri.com  • Visit https://www.somewhere.com/mfm to hire worldwide talent like Shaan and get $500 off for being an MFM listener. Hire developers, assistants, marketing pros, sales teams and more for 80% less than US equivalents. • Mercury - Need a bank for your company? Go check out Mercury (mercury.com). Shaan uses it for all of his companies! Mercury is a financial technology company, not an FDIC-insured bank. Banking services provided by Choice Financial Group, Column, N.A., and Evolve Bank & Trust, Members FDIC • I run all my newsletters on Beehiiv and you should too + we're giving away $10k to our favorite newsletter, check it out: beehiiv.com/mfm-challenge My First Million is a HubSpot Original Podcast // Brought to you by HubSpot Media // Production by Arie Desormeaux // Editing by Ezra Bakker Trupiano /

Techmeme Ride Home
Will AI Models Have To Be Reviewed By The Government?

May 5, 2026 20:53


The Trump administration discussed an EO to form an AI oversight working group, a stark reversal from its hands-off approach. Apple explored using Intel and Samsung to make chips in the US, Coinbase cut 14% of its workforce, and OpenAI fast-tracks an AI phone for 2027.

Sources: the Trump administration is discussing an EO to form an AI working group that would examine AI oversight procedures, like vetting models before release (NYT)

Sources: Apple held exploratory talks with Intel and Apple executives visited a Samsung plant in Texas to explore producing core chips for its devices in the US (Bloomberg)

Coinbase CEO Brian Armstrong announces the company is cutting ~700 jobs, or ~14% of its global workforce, to reduce costs, saying "AI is changing how we work" (Reuters)

Meta is using AI on Facebook and Instagram to detect under-13 users by analyzing bone structure, height, and visual cues, but says it's "not facial recognition" (The Verge)

Kuo: OpenAI appears to be fast-tracking its AI agent phone with two NPUs and a custom MediaTek Dimensity 9600 SoC, targeting mass production as early as H1 2027 (Ming-Chi Kuo)

ElevenLabs raised $550M+ in its Series D, up from a previously announced $500M, adding BlackRock, Nvidia, and others as investors; its ARR passed $500M in Q1 (Tech.eu)

Source: YC owns ~0.6% of OpenAI, which was seeded by a YC offshoot called YC Research in 2016; at OpenAI's current $852B valuation, the stake is worth $5B+ (Daring Fireball)

Learn more about your ad choices. Visit megaphone.fm/adchoices