Podcasts about Temporal

  • 2,536PODCASTS
  • 4,810EPISODES
  • 39mAVG DURATION
  • 1DAILY NEW EPISODE
  • Jun 15, 2026LATEST

POPULARITY

20192020202120222023202420252026

Categories



Best podcasts about Temporal

Show all podcasts related to temporal

Latest podcast episodes about Temporal

CityLight Church
The Indweller | Holy Spirit | John 14:18-24

CityLight Church

Play Episode Listen Later Jun 15, 2026 40:55


2026-06-14 | The Indweller | Holy Spirit | John 14:18-24 Join us this week as Pastor Nate helps us learn how to be better connected to the Holy Spirit. John 14:18-24 Vital and Verifiable God's presence and care is vital, not optional for our life. Jesus' resurrection appearance guarantees His daily presence. Practical and Spiritual Obedience is the greatest form of understanding. 1 John 4:19 “we love because he first loved us” Our love is not the cause of God's love, but God's love is the cause of our love and our obedience is the evidence of our love. What you need is not “out there somewhere” but ‘in here by the Spirit” Experiencing God's presence isn't a matter of vibes but verifiable steps. It is a practical and spiritual experience. Our everyday life can be infused with an active awareness of God's presence through normal acts of obedience. Temporal and Eternal His home in my heart now and my heart in his home forever. Revelation 21:1-4

Noticiário Nacional
2h Mundial. Temporal na Flórida obriga seleção a cancelar treino

Noticiário Nacional

Play Episode Listen Later Jun 15, 2026 7:25


See omnystudio.com/listener for privacy information.

Nova Ràdio Lloret
Junts, Tots i ERC denuncien el tancament temporal de la caserna de la policia per manca d’efectius

Nova Ràdio Lloret

Play Episode Listen Later Jun 15, 2026 6:35


Els grups municipals de Junts, Tots i ERC denuncien el tancament temporal de la comissaria de la Policia Local de Lloret de Mar per manca d'efectius disponibles, una situació que s’ha donat aquest cap de setmana, que qualifiquen d'“excepcional” i sense precedents. Segons un comunicat conjunt d’aquests tres grups de l’oposició, la falta de personal ha obligat a clausurar les instal·lacions i substituir la vigilància de l'edifici per un servei de seguretat privat. Els tres partits atribueixen aquesta situació a un conflicte laboral que fa mesos que s'arrossega i que, asseguren, el govern municipal no ha estat capaç de resoldre. L'oposició sosté que sindicats i representants dels treballadors havien advertit reiteradament sobre l'agreujament del conflicte i acusa l'executiu local d'haver ignorat aquests avisos. També afirmen que el malestar laboral s'ha estès a altres àrees municipals, amb mobilitzacions sindicals, dificultats per cobrir serveis i la marxa de diversos professionals. Per tot plegat, el portaveu de Junts, Jordi Martínez, en representació dels tres grups signants del manifest, lamenta que era previsible que la situació acabaria petant. «Ja hi havia molts indicadors que deien que la situació no estava anant bé, que les negociacions amb els sindicats, tant de la policia com dels diferents departaments de l’Ajuntament, no estaven funcionant correctament» Jordi Martínez Els grups de l’oposició recorden que a finals de maig es va celebrar un ple extraordinari sobre la gestió dels recursos humans municipals, en què, segons l'oposició, l'alcalde va ser reprovat i políticament desautoritzat. Jordi Martínez denuncia que comptar amb el servei de policia local és un dret i una necessitat, tant de la ciutadania com dels visitants. En aquest sentit, considera que una ciutat turística com Lloret de Mar no es pot permetre arribar a una situació en què la seva comissaria de Policia Local hagi de tancar per manca de personal. «El servei de Policia Local és un servei essencial i necessari per a la seguretat de les persones que hi vivim o que venen a passar les vacances aquí Lloret»Jordi Martínez Junts, Tots i ERC consideren que el problema és de lideratge polític i responsabilitzen l'alcalde, Adrià Lamelas, i la regidoria de Recursos Humans de la situació actual. Per això els reclamen una resposta immediata del govern municipal, sobretot tenint en compte que la setmana que ve se celebra la revetlla de Sant Joan, una nit on solen haver-hi moltes incidències. «Demanem, sobretot l’alcalde, que lideri, que faci aquestes negociacions immediatament, perquè recordem que la setmana que ve és Sant Joan, una de les nits més perilloses de tot l’any»Jordi Martínez Davant d'aquesta situació, l'Ajuntament ha emès un comunicat en què informa que el passat dissabte van registrar una 20a d’incidències entre baixes mèdiques i indisposicions dels agents, una situació que va deixar sense efectius el torn de nit. Davant d’aquest escenari, la prefectura va activar mesures d’urgència per reduir al màxim l’impacte sobre la ciutadania. Per reforçar la seguretat al municipi, el consistori manté una coordinació permanent amb la Conselleria d’Interior i els Mossos d’Esquadra, que han incrementat la seva presència a Lloret de Mar per compensar la manca d’agents locals. Malgrat les dificultats, la comissaria continua oberta les 24 hores i ofereix atenció presencial i telefònica. A més, s’ha incorporat personal de seguretat privada per garantir la custòdia de les instal·lacions i el funcionament dels serveis essencials. Pel que fa al conflicte laboral amb la plantilla policial, l’Ajuntament assegura que ja ha traslladat una proposta als representants sindicals i que està a l’espera d’una contraproposta. El govern municipal reitera la seva voluntat de diàleg i negociació.

Redeemer Broadcasting : A Plain Answer
A Plain Answer: Christ over all - Both Spiritual Kingdom and Temporal kingdom of Visible church and Commonwealth - Bill Shishko

Redeemer Broadcasting : A Plain Answer

Play Episode Listen Later Jun 13, 2026 27:49


jesus christ commonwealth plain temporal visible church two kingdom spiritual kingdom
Noticentro
Julian Barnes gana el Princesa de Asturias de las Letras

Noticentro

Play Episode Listen Later Jun 10, 2026 1:41 Transcription Available


Mantienen cerco de seguridad por fuga de gas en Puebla Inflación en México baja a 3.94%: SheinbaumExportaciones mexicanas rompen récord en EE. UU.Más información en nuestro podcast#grc

Noticentro
Remanentes de Boris mantienen temporal en el país

Noticentro

Play Episode Listen Later Jun 10, 2026 1:42 Transcription Available


Prevén fuertes tormentas en el Valle de México En Veracruz las lluvias dejan comunidades incomunicadasSenado y Michoacán ajustan actividades por torneo de Fútbol  Más informaciín en nuestro podast#grc

Líderes del Futuro
Las Diferencias entre Custodia Temporal y Carta Notario

Líderes del Futuro

Play Episode Listen Later Jun 9, 2026 13:26


Es importante que aprendamos que una carta de un Notario no tiene el mismo poder que una orden judicial cuando hablamos de custodia temporal de menores, ancianxs, o gente con discapacidades. #sonomacounty #custodiatemporal #notario #notarios #padres #padresdefamilia #madres #familia #migrantes #inmigrantes

Noticentro
Depresión tropical Dos-E podría impactar Acapulco como tormenta 

Noticentro

Play Episode Listen Later Jun 7, 2026 1:46 Transcription Available


Temporal de lluvias seguirá hasta el miércoles en la ZMVM Aseguran 1.3 toneladas de cocaína en GuerreroMuere el periodista Alan Riding a los 82 añosMás información en nuestro podcast#grc

Emmanuel Baptist Church of Longview
Journey Through Peter: Unshakeable Joy In Temporal Suffering - Wednesday Bible Study 06/03/2026 - Pastor Bob Gray II

Emmanuel Baptist Church of Longview

Play Episode Listen Later Jun 4, 2026 52:32


Send us Fan MailListen to a message from Emmanuel Baptist Church of Longview, TX. Church Bible Publishers produces high-quality King James Bibles that are not only beautiful, but durable enough for daily study, preaching, teaching, and life. These aren't flimsy, disposable Bibles. They're Smyth-sewn, carefully bound, and made to endure years of faithful use. If you want a Bible that feels solid in your hands and will still be standing long after trends fade, check out Church Bible Publishers today at churchbiblepublishers.com.  RG33 Candle Co. doesn't just make candles — they honor a life. Each hand-poured soy candle was created to celebrate the spirit and legacy of RG Gray III, a young man whose love, joy, and unforgettable personality inspired this company's mission.If you want a candle that feels personal, uplifting, and full of purpose — check out RG33 Candle Co. Visit rg33candleco.com and use code PODCAST10 for 10% off your purchase. Support the show

Physiotutors Podcast
Why Cervicogenic Dizziness Is a Misleading Diagnosis with Firat Kesgin

Physiotutors Podcast

Play Episode Listen Later Jun 3, 2026 52:16


Todo en Uno TV
Noticias de hoy 3 de junio 2026 | 13:00 horas

Todo en Uno TV

Play Episode Listen Later Jun 3, 2026 1:05


✅Investiga a gobernadores ✅Sheinbaum responde ✅Retiran estatuas ✅Temporal de lluvias ✅Princesa de Asturias

Autopod Decepticast: A Weekly Podcast Delivering a Minute-By-Minute Breakdown of the 1986 Transformers Movie.

Join your APDC hosts Caleb & Ryan plus special guest Bedigunz as they review the FINAL season 2 episode “The Agenda pt. 3” of the 1998 classic animated series Beast Wars: Transformers!Well that's just dandy! It's Bedigunz!!! Open Heart Magic!! Green for Machines!! Kazaa_Virus69.exe!! Fix our syntax!! Rattrap goes Knievel mode!!! Rhinox's John Woo dive!!! Eat ‘em and weep!! The Ark!! Diecast, it's a lost art!! Megatron enters these hallowed halls a conqueror!! SLAVES!!! The Fickle Finger of Fate!!! A whirlpool of time! Temporal buggery!! In the Real World! Script Deviations!! Rate the Scheme!!! Iconic Moments!!! Say goodbye to the Universe, Maximals!!!15:30 - SHOUT OUTS25:45 - COCKTAIL37:07 - REVIEW01:20:00 - REAL WORLD01:32:40 - SCRIPT DEVIATIONS01:37:30 - RATE THE SCHEME01:38:40 - ICONIC MOMENT01:39:20 - NEXT TIME

Todo en Uno TV
Noticias de hoy 1 de junio 2026 | 13:00 horas

Todo en Uno TV

Play Episode Listen Later Jun 1, 2026 1:16


✅Paro nacional ✅Acusan a EE. UU. ✅Petróleo se dispara ✅295 homicidios ✅Temporal de lluvias

Contador 4.0
SAT: Restricción temporal del CSD

Contador 4.0

Play Episode Listen Later May 27, 2026 11:18


Este episodio detalla normativas fiscales críticas emitidas por las autoridades tributarias mexicanas respecto al cumplimiento de obligaciones y beneficios para los contribuyentes. Se advierte sobre la suspensión del Certificado de Sello Digital debido a omisiones en la contabilidad electrónica, lo cual puede paralizar las operaciones comerciales y generar multas económicas severas. Por otro lado, se explica un criterio legal que permite la deducción de primas de seguros médicos incluso si el comprobante fiscal no está a nombre directo del beneficiario, bajo condiciones específicas de jubilación. En conjunto, el episodio subraya la importancia de regularizar la situación fiscal mediante aclaraciones en el portal oficial dentro de los plazos establecidos. Esta información sirve como guía para evitar sanciones administrativas y aprovechar las facilidades vigentes en materia de gastos deducibles.

Scripture Studies in Romans - A Verse-by-Verse Bible Study
Hebrews 11:1-3 - Faith Defined - Verse by Verse Bible Study

Scripture Studies in Romans - A Verse-by-Verse Bible Study

Play Episode Listen Later May 26, 2026 36:59


A verse-by-verse Bible study class. This study covers Hebrews 11:1-3. These studies focus on what the Bible says, and what it means. If you want to follow along, a written transcription of the study can be found here: https://www.mediafire.com/file_premium/jo456wgr2x1nkci/Hebrews_11_01-03.pdf/fileThe visual slides of this study can be found here: https://www.mediafire.com/file_premium/5yr0a3r58acsgbh/Hebrews_11_01-03_SLIDES.pdf/fileTopics: Varying translations and linguistic connotations of Hebrews 11:1 -- Objective versus subjective definitions of faith's foundation -- Linguistic analysis of faith as "evidence" versus "assurance" -- The perspective of Early Church Fathers on the objective sense of faith -- Foundational faith: Faith in God as Creator -- The creation of the universe through the spoken word of God -- The role of Christ as the Word in creation -- The concept of creation ex nihilo -- Temporal and spatial implications of the Greek word translated as universe (aionas) -- God as Creator of universe and Director of history -- General revelation and the universal inexcusability for denying the Creator.FYI, Scott Sperling hosts a bi-weekly, live, interactive, verse-by-verse Bible study via Zoom. If you're interested in attending, you can email me at ssper@scripturestudies.comFor more Bible studies, visit ScriptureStudies.com

Dead Robots' Society
Timelines and Other Temporal Mischief

Dead Robots' Society

Play Episode Listen Later May 23, 2026 61:19


Terry and Veronica wonder about timelines and whether or not Paul exists after his surgery. DRS caffeinates with Larry's Coffee. If you want to purchase fair-trade, organically grown beans that brew into smooth, balanced, never-bitter coffee, visit LarrysCoffee.com/drs, receive a free gift with your purchase, and help support The Dead Robots' Society. Our links: Paul's store: https://payhip.com/paulecooley Paul's site:  https://shadowpublications.com Terry's site: https://www.terrymixon.com/ Veronica: http://www.voicesbyveronica.com/ DRS Discord: https://discord.gg/pgmQxaVbGP Enjoy the show? Consider becoming a Patreon or Buy Me A Coffee supporter and for as little as $1 a month, you can help keep the podcast free and receive exclusive content. More information at https://patreon.com/drspodcast  and https://buymeacoffee.com/drspodcast. #writing #fiction #podcast #chat #live #novel #story #narrative #publishing #author #writer #discussion #podcast #talkshow  

Noticentro
Inicia registro para anfitriones de hospedaje temporal en CDMX

Noticentro

Play Episode Listen Later May 22, 2026 1:48 Transcription Available


Reportan violaciones en publicidad de fórmulas infantilesCongreso de la Unión discutirá reformas electoralesMeta evita juicio por adicción de menores a redes sociales Más información en nuestro Podcast#grc

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

Take the 2026 AI Engineering Survey and get >$2k in credits and AIE WF tickets!On the product side, everyone is getting Computer - Perplexity, Manus, Cursor, and so on. Meanwhile on the research side, agentic evals like TerminalBench and GDPVal are also assuming computer (Harbor). On both ends, the consolidating LLM OS stack has become a standard toolkit, and Daytona is one of a small set of AI Infra companies that are booming because of it.“The end of localhost” has been Ivan Burazin's obsession for more than a decade.Something that is all too familiar…Long before agents became the default way people talked about software development, Ivan was already chasing the idea that development should not depend on a fragile local machine. CodeAnywhere, one of the first browser-based IDEs, was an early attempt at that future: move the development environment into the cloud, make setup reproducible, and free developers from the endless “works on my machine” tax.The thesis was directionally right, but the market wasn't ready yet.However, agents changed that. They do not care about a laptop, desk setup, or favorite editor. They need a computer they can access through an API: something stateful enough to keep working, fast enough to spin up instantly, flexible enough to resize, isolated enough to be safe, and composable enough to run the messy real-world workflows that real software engineering actually requires.Daytona isn't just selling “sandboxes” in the narrow code-execution sense. It is the latest version of Ivan's original localhost thesis.In this episode, Daytona's CEO joins swyx to explain why AI agents need more than code execution boxes: they need composable computers, stateful sandboxes, instant startup, dynamic resources, and infrastructure that can survive workloads going from zero to 100,000 CPUs.We go deep on the new agent compute market: Daytona's hard pivot from human dev environments to AI sandboxes, the New Year's Eve MVP that customers begged for, why Daytona runs on bare metal with its own scheduler, how one customer runs almost 850,000 sandboxes a day, and why RL/eval workloads went from 0% to roughly 50% of usage in just months. Ivan also explains why agents need Windows and macOS machines, why CLI may matter more than MCP, why Kubernetes is painful for this workload, and why the future AI cloud may look more like Stripe than AWS.We discuss:* How Daytona grew out of CodeAnywhere, Shift, and the “end of localhost” thesis* Why Daytona pivoted from human dev environments to AI sandboxes* Why agents need composable computers instead of disposable code execution boxes* The New Year's Eve MVP that customers chased API keys for* Why Daytona chose bare metal, stateful snapshots, and its own scheduler* How Daytona spins up one sandbox in ~60ms and 50,000 sandboxes in ~75 seconds* Why Daytona's biggest customer runs ~850,000 sandboxes a day* How RL/eval workloads create zero-to-100,000 CPU spikes* Why RL workloads went from 0% to roughly 50% of Daytona usage* Why customers compare Daytona against EKS/GKS and say they're “never going back”* Why every AI agent may need a computer, including Windows and macOS environments* The Apple licensing constraints that make macOS sandboxes hard* Why CLI gives agents more power than MCP* How open source helps agents integrate Daytona* Why agent-generated PRs may break today's CI/CD assumptions* Why AI SaaS companies reselling tokens may face a cold shower* Why the AI cloud may look more like Stripe than AWSIvan Burazin* LinkedIn: https://www.linkedin.com/in/ivanburazin* X: https://x.com/ivanburazinDaytona* Website: https://www.daytona.io* X: https://x.com/daytonaioTimestamps* 00:00:00 Hook* 00:01:12 Introduction* 00:03:15 CodeAnywhere, Shift, and the end of localhost* 00:05:58 What Daytona is: composable computers for AI agents* 00:08:07 The pivot from dev environments to AI sandboxes* 00:10:17 The New Year's Eve MVP and customers begging for API keys* 00:12:56 Bare metal, stateful sandboxes, and Daytona's scheduler* 00:17:28 60ms startup, 50,000 sandboxes, and 850K daily runs* 00:21:53 Spiky RL/eval workloads and the new agent infra problem* 00:28:12 RL workloads, Kubernetes pain, and dynamic resizing* 00:33:31 Why every AI agent needs a computer* 00:38:48 macOS sandboxes and Apple's licensing problem* 00:44:28 Why CLI may matter more than MCP* 00:48:11 Open source, GitHub stars, and agent integration* 00:53:11 Git, CI/CD, and agent collaboration bottlenecks* 00:58:15 Founder life and building a 25-person infra company* 01:02:44 AI SaaS, token resale, and API-first business models* 01:06:10 GPU sandboxes, data centers, and compute growth* 01:09:48 Why the AI cloud may look more like Stripe than AWS* 01:11:26 Closing thoughtsTranscriptIntroduction: Daytona, CodeAnywhere, and the End of LocalhostSwyx [00:00:02]: Okay, we're in the studio with Ivan Burazin, CEO of Daytona. Welcome.Ivan [00:00:07]: Thanks for having me, man.Swyx [00:00:08]: Ivan, you and I go back.Ivan [00:00:10]: Way back.Swyx [00:00:11]: How I don't even know how, you found, did you reach out or, for Shift.Ivan [00:00:17]: I reached out to you. The reason was you - we were just - we were thinking about I was one of the co-founders of CodeAnywhere, the first browser-based IDE, and so we were thinking a long time of, localhost should die. And you had this article.Swyx [00:00:29]: End of localhost.Ivan [00:00:30]: Then I reached out to you because of that, and then we talked, and I was actually at a different job and learning about I was the head of, developer experience, and you were quite well-versed in that, and I actually reached out to you, among other people, how do we go about that? What are the key things and whatnot at this point in time? And you were nice enough to take the call, and I remember I was late on your call with you.Swyx [00:00:51]: I don't remember.Ivan [00:00:52]: I remember because I was with my then I'm thinking of a girlfriend or wife at that point in time, I'm not sure. It's the same person, so that's great, and I was late ‘cause we were, in, Italy on, vacation, and then I was late for something. I felt so bad, and you were so nice to be, good about.Swyx [00:01:10]: The reason I'm nice is because I'm also late to other people, so it's like, who's, who's without sin here, yeah, so I have to, for those who don't know, InfoBip Shift, there's this whole thing that, you did in the past, and, and that was basically one of the inspirations for me starting AI Engineer, which is like, I have to thank you for giving me that push to be like, “Oh, you can, you can build and sell conferences?”Ivan [00:01:34]: I remember you asked you asked me at the beginning to give me advisory shares, and I was so focused on what we were doing, I said no, and I should've took the advisory shares. So I'm sorry, dude. But anyway.Swyx [00:01:43]: We're not, we're not venture backed.Ivan [00:01:44]: No, it doesn't matter.Swyx [00:01:45]: It's Yeah, anyway, so I think what's impressive about you is that CodeAnywhere is the thing that you've been trying to build, and, you kind of put it on hold and then came back after InfoBip. Just give us the story, do you - the story and the origin story, going into Daytona.From CodeAnywhere and Shift to DaytonaIvan [00:02:05]: Sure. Like, really way back, me and my co-founder have been together. I say this, I've said this multiple times, it's like we were married and divorced and married. Some people actually ask me is my co-founder my partner. they thought it literally. It's not literally, but we have done multiple companies together, and to your point, we had this shift where we went from the CodeAnywhere to the conference called Shift, and then back to, Daytona. We originally started stacking servers, doing like virtualization in the early 2000s and, routers and doing basically all these things, at a foundational level, and that was a services company which we sold to focus on what my co-founder actually invented, which was the very first browser-based IDE, right, I say the first. Before us was actually Heroku. They did it for a very short time until they became Heroku. But outside of them, we were the only one, and it was called.Swyx [00:02:55]: There was Cloud9.Ivan [00:02:57]: Cloud9 came out slightly after us. There was Replit, which came out when we stopped doing it, Replit came out, and they have been successful since then, which is great. There was Nitrous.io. There was quite a few that existed at the time, but it was like too early. But the interesting part is that we, at that point in time, because there was no VS Code, there was no Kubernetes, and Docker had just started when we Or I'm not sure if it was even public at that point in time. And so we had to build everything to the whole stack ourselves and that was the key learning that we brought into and that we've been using in Daytona today. So it was super early. There's about 3 million people used CodeAnywhere. It was slightly, it was angel-backed more than venture-backed. We ended up paying everyone back because it didn't have that sort of scale. But, three years ago, we started something similar with Daytona, which is not what we are today, but it was automating dev environments for human engineers, the basically the underlying stack of CodeAnywhere. And then we did a hard pivot last January to sandboxes. And so here we are.Swyx [00:04:01]: Historic pivot, yeah, and, it's one of those things where, I had independently invested in CodeAnywhere, but also in E2B, and then both of you pivoted into the same thing, and I'm like, “F**k.”Ivan [00:04:12]: You invested, you invested in Daytona. You invested in Daytona. But you were the first If we had not got your check, we wouldn't have done it.Swyx [00:04:18]: No way.Ivan [00:04:19]: No, it was like, “We have to get him on board first,” and you were that kicker that we, that got us off the ground.Swyx [00:04:23]: No, because you were putting me on your pitch deck, man. I was like, “Man, this is like a good trip if I don't invest.”Ivan [00:04:29]: That's because it was your quote. It's like we.Swyx [00:04:30]: Yeah. It's the end of localhost.Ivan [00:04:31]: Did a bunch of research about end of localhost and who was interested in that,.Swyx [00:04:34]: No, that's like, I put, I wrote that blog post, and every single company in that field reached out to me, and then every VC who was receiving those pitches then also had to call me and, talk it, talk through it with me.Ivan [00:04:47]: It's finally happening though.Swyx [00:04:48]: It was really super interesting.Ivan [00:04:48]: It's finally happening.Swyx [00:04:49]: It's finally happening.Ivan [00:04:49]: Yeah, it's finally.Swyx [00:04:49]: It's finally happening, with maybe sort of non-human users. Yeah, so what is Daytona today? Let's get like a quick description. I'm wearing the shirt.What Daytona Is Today: Composable Computers for AI AgentsIvan [00:04:58]: You're wearing the shirt. Yes,.Swyx [00:04:59]: It says, I think your branding is very good. Like, it's very consistent. It runs AI code. Like, it cannot be simpler.Ivan [00:05:05]: Exactly, but we're gonna probably have to change that.Swyx [00:05:07]: Oh, s**t.Ivan [00:05:07]: It's also a subset of what we do. Unfortunately, we really love this, Run AI Code is super simple. People interpret it different ways. I think we've given out 5,000, 6,000 of these shirts. People wear them with pride because it doesn't really market about us.Swyx [00:05:21]: Yeah, Daytona's on the back.Ivan [00:05:22]: It markets the back. It markets to the person itself, so I think we did a really good job on that one. But it is also a subset of what we do, because people, when they think about Run AI Code, they just think about these small, let's call it isolates, code execution boxes that, you send some code, you get an output. Whereas what Daytona is today is essentially composable computers for AI agents. It is, the market calls them sandboxes which can be misleading.Swyx [00:05:44]: All these things. All these things on.Ivan [00:05:45]: Yeah, exactly, ‘cause it can be misleading ‘cause people usually think about sandboxes as a demo or a test environment versus a production-grade environment. But what Daytona does, if you think of the laptop that you have in front of you or the computer that's over there, or, my wife is an architect, so she has like a Windows with a 3D graphics card inside to do 3D rendering. Like, as humans, we have different computers or different compositions of computers. And our belief is strongly that agents today and going forward will need all these different compositions of computers to do different types of tasks. And so we offer that basically through an API.Swyx [00:06:19]: Yeah, to give people - I'm trying to sort of front-load all the aha moments or the wow moments so that people can, stay engaged and click like and subscribe. the market is exploding, right? Like, you have been reporting 74% month-on-month growth, and it also, it's just been growing for a while. Like, it's been going like this. And every single - It's not just you guys. It's every single.Ivan [00:06:41]: Everyone, yeah.Swyx [00:06:42]: Sort of, compute provider. I don't know if you agree with me saying compute provider or not.Ivan [00:06:48]: It's fine.Swyx [00:06:48]: Yeah. So like organically PLG-driven growth, but also enterprise is doing super well, I think I wanna rewind to January of last year when you did the pivot. Like, so you obviously called this market early, and you were positioned for it, and you are now one of the market leaders. But what was the insight that made you do the pivot?The Pivot: From Human Dev Environments to Agent SandboxesIvan [00:07:06]: The insight that made us do this pivot is the quarter before that, so end of 2024, when we had - Basically, we did a demo with - I don't I think we discussed this as well, Devin was not public. You actually gave me access to Devin at that time. So Devin.Swyx [00:07:25]: I did?Ivan [00:07:26]: Yeah, you gave me access.Swyx [00:07:26]: I don't think I was supposed.Ivan [00:07:27]: Yeah, exactly.Swyx [00:07:28]: Yeah, I.Ivan [00:07:28]: So it doesn't matter. You.Swyx [00:07:29]: Yeah. I gave like three friends access.Ivan [00:07:31]: Yeah, or it was a call and you showed it to me. It doesn't matter. but OpenDevin was available, which is now called OpenHands. And so we're like, “Oh, this seems to be a thing. This is not public. Let's take our for human automation of dev environments and take, OpenDevin and launch that as a SaaS.” And we did that. Not very many people signed up and used it, but a lot of people reached out that were building agents, and they were like, “Hey, my agent needs a compute sandbox runtime,” whatever you wanna call it. I forgot what it was called at that point. And then we were like, “Oh, amazing. This is a new market. Here is our infrastructure. Here's our product, and go.” And what we found really fast, soon, was that people did not like what we had built. It didn't work. And I remember talking to people at the beginning when we're doing this, the sandbox we're building for agents. People were like, “Oh, why is it different? It's the same thing. We have like EC2, we have VMs, we have all these things.” But we saw that everyone we gave it to, it was like 20, 30 people, they all said, “No.” Like, “This is not what we need. This sort of breaks.” And basically, me and my co-founder not knowing a lot about - ‘cause we're infra people. We're not AI people. So I basically took it upon myself to like watch every single podcast that exists, including all of, all of these and all that, and sort of get up to date, read all the blogs, like get, understand what's going on.Swyx [00:08:45]: Do you wanna shout out who else was useful, just in case people are also looking.Ivan [00:08:49]: Generally we -, I looked at There's a few of podcast, different segments and different types. So there's you guys, No Priors, Bill Gurley's was great while.Swyx [00:09:04]: VG2, yeah.Ivan [00:09:05]: Yeah, while it was around. So there's a few. 20VC is interesting from a different dynamic, and some are different dynamic. But there was, also Red Points.Swyx [00:09:14]: We're not really about the compute market.Ivan [00:09:15]: It was also already - Sorry?Swyx [00:09:16]: You're, you want - You're looking at the agent infra market.Ivan [00:09:19]: I was looking at the agent market and the AI market in general and sort of understanding who are the players, what the perception, and how that goes. And like obviously you complement this with like going to conferences, going to events, going to meetups, reading white papers, like doing all the things that you have to do to understand what's happening. And so when we figured, when we sort of had an idea of what we had to build, literally over the New Year's Eve, literally on New Year's Eve, I half vibe coded the first MVP, first minimal viable product of what Daytona is today. And I went to sleep at like 3:00 AM or something like that. I was doing - I just put my like baby daughter and wife to sleep and, Happy New Year's, and go back to just, doing this. And I sent it to my co-founder, my CTO, and he saw it in the morning. He's like, “This is absolute garbage.” “Do not show this to anybody at all, but the idea is good.” And so he took two weeks, and he rebuilt it.Swyx [00:10:09]: Did it like look like that? Listen, I - It was rough idea.Ivan [00:10:12]: Oh, not even, not even close. Like it was it was way worse. But it was like a very - It was a simplistic view of what it should be. Like, it worked, but it was not ideal. And so he went, we went down the whole, which is his job as CTO, to go, and he came back with this version. We then called all the people that had said like, “This is garbage,” a quarter ago. And we set up these calls, and we gave it to - We just demoed it to everyone. And all the calls went long, every single one. They were 15-minute calls, and they all went to like 25, 30 minutes or whatnot. And everyone said, “We need, we want access.” There was no login, just an API key, ‘cause it was just a beta or an alpha. And they said, “Oh, we want access.” And we're like, “Sure, yeah. Okay, thank you very much.” But after like the next day, if we'd not send it, every single one, like every call that we did, everyone came back, “Where is my API key?” Like everyone wanted it. We're like, “S**t.” Like this is it. Like I've never felt So one, the understanding to your point was like most people thought it was the same infrastructure for humans and agents. We understood a quarter ago it's not. We just didn't know what was the right primitive. And then when we came, and we can talk about what that is, and we gave it to these people, I've never seen, I've never experienced - I've done multiple companies in my life. I've never experienced this, that people literally call you if you do not give them access. Like they want access right now. And so it's like, okay, they don't want this. the thing that they want doesn't seem to exist, or they have not found it, and they really want what we want. And then when we understood that we're onto something, and then when you think about the size of the market, like the market for human engineers and enterprise is a very large market, so think GitLab or whatnot. But the market for every single agent that will exist ever in the future is just like, what is that market? How big is that? And we're like, “We are all in on this.” And so that is where we made sort of the cut between the old product and the new one.Bare Metal, Stateful Sandboxes, and the Lambda + EC2 ModelSwyx [00:12:02]: Yeah. But it wasn't composable at the time?Ivan [00:12:05]: It was very - It was basically just a Linux box that you could change, that you could define number of CPUs, disk, and RAM. Like that is what you could do, but you couldn't have multiple operating systems, you couldn't resize it on the fly, you couldn't add a GPU, you couldn't do like all the things. It was just the, just the first sort of variation of that, yeah.Swyx [00:12:22]: Was it bare metal from the start?Ivan [00:12:24]: It was bare metal from the start. And so the interesting thing that we thought about right away, so our.Swyx [00:12:29]: Which, give people the background, what is the normal path?Ivan [00:12:32]: Yeah, so, basically most providers run this on top of VMs. And also.Swyx [00:12:37]: Firecracker.Ivan [00:12:38]: Yeah, they run on Firecracker and VM. And so we also fire - We can get - We have multiple isolation layers and we can do that. But the common way to do it is that they, one, that the state of the machine, or the hard disk is not part of the sandbox itself. And the other thing is they're not meant to last forever. So most of them are preemptible, like they can There's a time that they can live. And so our thought was when we were going into this is, agents will be like humans in the sense of you don't want your laptop to be shut down until you're done with work. Like, and you want to close the lid and open the lid, it's the same state. So you - Agents would want that, like the pause and come back. They want those two things. But also agents really want speed, right? Can they get it? So when we thought about it's like we need something insanely fast, how to make it fast, how to make it long-running, and stateful. And so those two things, it's like combining a Lambda and an EC2, right? Those two things together. And so we didn't have an idea how others did it, ‘cause we didn't know too that there was a market around this. It was more like, okay, this is what we need, what they need. And we looked at Kubernetes, it wasn't wasn't good enough for that. We looked at Nomad, it didn't enable that. And so our history in rewriting our own scheduler at CodeAnywhere is basically what my CTO came up with. Like, he's like, “Oh, the learnings from there,” and he brought it. And the funny thing is, our third co-founder, when he saw it, he's like, “Dude, what is this? This is like 2008.” Like, we went back in time, and he's like, “Exactly.” And so the reason why Daytona is like super fast, and you see this on benchmarks, is we essentially, we run on bare metal. We have our own scheduler, we use the underlying, disk, CPU, and RAM of the underlying machine, which means your IOPS are insanely fast because there's no, there's no network between an EBS or something like that. But also the snapshot, the point in time, the templates, are also preloaded on the bare metal machines. So when you fire off a sandbox from a template or a snapshot, you're essentially directed to the bare metal machine where that snapshot is based on that NVMe drive, and then it literally just turns on that machine, and it's local. There's no network latency, anything on there. And so that is sort of the specificities that we, when we're thinking from first principles, what a computer would look like for an agent, that is what we came up with, and that's what we created.Benchmarks, 60ms Startup, and 50,000 SandboxesSwyx [00:15:02]: Yeah. I should maybe, I don't know if you endorse this, but there's someone that does compute SDK, you guys do very well on there, with like the TTI, right? I. is this a, is this a is this a relevant benchmark for you guys? I don't know.Ivan [00:15:16]: I don't know, and it changes every day. So today RKL is.Swyx [00:15:18]: I don't know what RKL is. Never heard of it.Ivan [00:15:20]: Yeah. RK, yeah, so it is there.Swyx [00:15:22]: You are, at least a third of the next tier of performance, and then, there's a lot of other better-known names that are very slow to start.Ivan [00:15:31]: Yeah. We've been the number one by far for a long time, and now there's different, there's different definitions also of sandboxes, different isolation patterns, different other things. So RKL runs it literally on the S3, the data, so it's very different, and they spin up a sandbox, spin up a container for that, so it's a different type of thing. So the definition of a sandbox is something that we can all, we all need to get along with. But yeah, we're insanely fast on getting these things, up and running. And so you can see even there that it's a zero point 0.10 to 0.11, so.Swyx [00:16:03]: Close enough. Yeah. what else do you need, right?Ivan [00:16:05]: Yeah. So the benchmarks itself, so, in this, in I don't think the benchmarks equate to market ownership or revenue or anything like that. and I've seen this with multiple benchmarks, not just in sandboxes, but in general benchmarks around.Swyx [00:16:20]: It's table stakes. It's just like.Ivan [00:16:21]: Exactly. But it doesn't hurt.Swyx [00:16:22]: Just roughly check.Ivan [00:16:22]: Like you definitely have to be up there and you have to be competing so that people know that, oh, this is definitely one of the top. Because this is only one dimension of what customers look for. There's other things like how many can you spin up consecutively? There's a feature set, there's support, there's like all different things that people look at, but you definitely have to be there, on the benchmarks.Swyx [00:16:40]: How many people do people spin up consecutively?Ivan [00:16:43]: So we have.Swyx [00:16:43]: Or concurrently, is the Concurrency, right?Ivan [00:16:45]: There's three metrics that we look at. And so one is like time to spin up one, and so our time to spin up one is 60 milliseconds with network latency. So request, spin up, reply, 60, the whole thing, 60 milliseconds. That is one. But if you wanna spin up 50,000 at once, we are now at about 75 seconds. So it takes about 75 seconds to spin up concurrently 50,000. Some others, there's public data around this, like take 2,000 seconds, which is 30 minutes. Like there's different variations of that. And then there is the so it is speed of one, speed of like multiple, and then how many can you consistently have up and running. And so we basically have right now no limit to how much we can add because we basically own our own metal. But the biggest customer of ours does like about 850,000 every single day is sort of where they're, where they're just shy of a million every single day that they're running, we do have a request for half a million concurrent, which is literally half a million CPUs somewhere running. So that's an interesting.Swyx [00:17:44]: They pay by like vCPU seconds.Ivan [00:17:47]: By seconds, yeah.Swyx [00:17:47]: Or whatever. Yeah. Okay, and so and then, and the other thing is, the sleeping and the resuming, ‘cause it's all the stateful resumption of all these things, how, what kind of workload are people putting through this, right? Like how is it Do we measure by gigabytes in memory, gigabytes in storage? I don't In like network attached storage. I, what are the costly ones of, out of all these features?Workload Economics: CPU, RAM, Network, and StorageIvan [00:18:15]: The most expensive thing are CPU.Swyx [00:18:18]: Okay. Yeah, of course.Ivan [00:18:18]: The second one, yeah Then it's RAM, then it's disk. We actually don't charge.Swyx [00:18:22]: Which is snapshotting, right?Ivan [00:18:23]: No, it's actually the, snapshotting's part of it, but basically the size of your hard disk, of your machine. So do you have 10 gigabytes, do you have 20, do you have 50, do you have whatever? And then the transference of that. Right now, currently we don't charge for, network at all at Polychron.Swyx [00:18:37]: Oh, you gotta, yeah, you gotta fix.Ivan [00:18:38]: Yeah. It is very much a it's a larger and larger part of our bill, so we're working around, that part there. Obviously, that is the least, expensive, so the hard disk is the least expensive, so it's basically CPU, RAM, for us network, ‘cause we don't charge the customer, and then hard disk, is how it's split up. But there's also different types of workloads, so we basically split it up into two types of workloads in Daytona. One is what we call background agents or long-running agents. and the other is, basically RLs and evals, which I put sort of together. And so they have very different patterns of usage, and if you look at the usage of a background And I'll just name names of companies, not specifically.Background Agents vs. RL/Evals: Two Usage ShapesSwyx [00:19:21]: Yeah, open, all hands.Ivan [00:19:23]: Yeah. So like a background agent's a Cognition, a Lovable, a like all these things are Harvey. These are all long-running, background agents. And so if you look at their usage patterns, their usage patterns are similar to human, which is like follow the sun. Basically, the usage patterns of that is like noon is probably the highest, and the midnight is the lowest, and then weekends are lower. weekday is higher.Swyx [00:19:42]: Yeah, that's a fun question. How global is it? Is it very US-centric or?Ivan [00:19:46]: The US is a large part, but we have currently, we have Asia, Europe, and the US regions.Swyx [00:19:52]: So it's quite global.Ivan [00:19:53]: Yeah, it's quite global. We have it all over. It's interesting that our I talked to you a bit about this. Our number one city by user.Swyx [00:20:01]: Hmm.Ivan [00:20:02]: Is Singapore.Swyx [00:20:04]: Oh, wow. Amazing.Ivan [00:20:05]: Which is an interesting one, right? Not by revenue, just by just like by individual head count.Swyx [00:20:09]: Really?Ivan [00:20:09]: Just like an interesting thing.Swyx [00:20:10]: Singapore is, Singapore is weirdly high in the adoption charts of AI for the population. It's like an, seven, eight million population. And it's like keeps showing up.Ivan [00:20:20]: No, it's quite interesting. We were quite shocked, and I was like, “Oh, this is interesting.” And also one that's up there.Swyx [00:20:24]: There's a reason I'm doing AI using Singapore. it's because I'm from there.Ivan [00:20:27]: We're there. We're gonna, we're gonna be there as well. and it's interesting that Japan is in the top or like Tokyo's in the top, which is in all the tech cycles it has never been. It has never been, so it's quite interesting that they're.Swyx [00:20:39]: I think the Japanese just love AI. Yeah. It's that, and then it's Brazil. That's it.Ivan [00:20:44]: Brazil has always been in.Swyx [00:20:45]: I think.Ivan [00:20:46]: Even when I look, if you look at like GitHub's data and ask historically with CodeAnywhere, it was always like US, Western Europe, and then you'd have like India, Brazil, China, like that would be there. But like Singapore was not in, specifically Japan was never in sort of that top, that top.Swyx [00:21:01]: Yeah. Weird pockets.Ivan [00:21:01]: Weird. Yeah, so it's very global.Swyx [00:21:02]: Okay, so actually that, but that's helps you to distribute your load through, all time?Ivan [00:21:08]: The interesting thing is like we have those kind of loads, but if you look at the researcher loads, they're quite different. So what they are is like if you give them concurrency of 10,000 or 50,000 or 100,000 CPUs at ARMb, when they fire off a run, it's just 100%. And then it just runs, and then it stops. So it's very, the usage pattern is squares basically, right? And it's also not follow the sun, because people will fire it off at midnight before they go to sleep but then wake up and so it's very unpredictable, so you don't know where that is. So the shapes of the usage are quite different than we have had before. And also what's interesting is when it's sort of a follow the sun, even if you have a high growth company, you can sort of predict your usage patterns and have enough capacity for that, because it's sort of, it grows in a, in a way you can project. When you have companies doing sort of like evals and RL, they're super spiky. So they're gonna come in, it's like, “We're gonna use nothing, then can we have 100,000?” Right? And then go back down. And then 100,000, go back down. So it's very different, right? And.Swyx [00:22:09]: Do you want to lock them into commits so.Ivan [00:22:11]: Yeah, we do.Swyx [00:22:12]: Yeah, okay.Ivan [00:22:12]: We so we have to lock them into some sort of commits to have that capacity, because we have to have, basically we have to have the capacity for peak. Right? And so right now, Daytona's mean utilization is 15%, 1-5.Swyx [00:22:25]: Oh my God.Ivan [00:22:26]: So it's very low.Swyx [00:22:27]: Because it's very spiky.Ivan [00:22:27]: It's very spiky, but we get up to 90%. so we have these things. And so what we're, what we're looking at right now as a company is similar to Cloudflare where you can like geo move things around, but that works really well for basically the background agent where it's follow the sun. But this, it's not. Like it's a very different shape. Obviously with scale you figure these things out, but that's an interesting new problem that we have, as a compute provider in the agent space. And when we were doing the conference recently, and so we talked to like Nikita from Neon and.Swyx [00:22:57]: I should bring it up.Ivan [00:22:58]: Parag from Parallel and whatnot, everyone has the same problem. Whereas the usage is super spiky, and this is something that has not happened before, that you have these types of like it was always, it the amplitudes were not this high, right? So it's quite interesting use case and problem solve.Compute Conference and Spiky Agent InfrastructureSwyx [00:23:12]: Yeah, I don't know if we're gonna bring this up again, but let's just talk about the conference, you had like 1,000 something people at the Warriors game, at the Sorry, where is it? What's.Ivan [00:23:22]: Chase Center.Swyx [00:23:23]: Chase Center.Ivan [00:23:23]: Chase Center.Swyx [00:23:24]: I went. It was, it was very impressive. Obviously, you can, how to throw a conference, what did you learn? you put, you pulled together all these impressive names.Ivan [00:23:33]: What I.Swyx [00:23:34]: What were you looking for?Ivan [00:23:35]: My thesis behind the Compute Conference was let's bring together people that are building infrastructure for AI agents. Because when I think of what we're building, it is the agent is the primary user, what are the ergonomics and usage patterns of agents, and so we can do that. And what I found, this was a theory, it wasn't proven, is that we all have these problems, as I touched onto. And I was, as I was talking on stage, it was like we all have the same underlying infra problems, which is this spiky workloads, unpredictable workloads that we've never had before, in human, compute or human infrastructure. And it's, again, it's the same when I was talking to Parag or when I was talking.Swyx [00:24:20]: Lynn. Nikita.Ivan [00:24:21]: Lynn, Nikita. Lynn especially, I was talking to her the other day as well. Like the It is a very interesting type of problem to solve because I can touch on Cloudflare because there's a lot of like talk about that recently as to how they solve that, which is they have a bunch of geos, and basically, as users work in different places, and depending on your tier, they can move you around the geos. And so that how, that's how they get the higher utilization. But you can sort of predict these, and it's If it's something in You'll rarely get a spike that is 10 orders of magnitude. Like you'll get a like let's say one of your customers has some like an exponential curve. What is that to I'm using Cloudflare as an example. 10%, 20%, whatever it is. I don't, I don't have this data, I'm just assessing. It's surely not 10x, right? It's surely not something there. And so how do you go out and solve this problem? And we're all solving this in different ways. So we have.Swyx [00:25:11]: She also has the same thing.Ivan [00:25:12]: Yeah, I know specifically that like Neon had that issue as well. Like how are we solving these spiky loads and things like that ‘cause we talked about it. And so the interesting thing for me to actually internalize was, yes, everyone that's building for agents first is going through this, and we're all solving similar problems, which is quite.Swyx [00:25:28]: Let me let me double-click on this. Okay. So for example, Neon, I happen to know that they're very sort of S3 oriented, right? so they're just like fully bet on S3. And you get to benefit from S3's distribution and infrastructure. So I would imagine that Neon doesn't have to care, whereas Lynn maybe has to care a bit more because obviously she's doing GPU inference. And, for listeners, we did an episode with her, one and a half years ago. And you have to care. But like, right?Ivan [00:25:54]: Parag cares for sure, and Nikita.Swyx [00:25:58]: And Parag is C of, Parallel.Ivan [00:25:59]: Parallel, yeah.Swyx [00:26:00]: Former CTO of Twitter.Ivan [00:26:01]: Twitter, yeah.Swyx [00:26:02]: They are the search.Ivan [00:26:03]: Yeah, they're search, yeah.Swyx [00:26:03]: I You and I know but the listeners don't know.Ivan [00:26:08]: Yeah, we can put it down in the screen, and so ‘cause we, when we were talking.Swyx [00:26:11]: I'll put it up on the, on the screen.Ivan [00:26:12]: Yeah, right.Swyx [00:26:12]: People can look it up if they need.Ivan [00:26:14]: Look it up. And, yes, but they still have CPU and RAM, allocation that you have to have up and running. And so CPU and RAM, you have to allocate that and have that ready. And so there's basically two ways to do it. One is you either over-provision and you can handle the bursts, or two, you basically have, I don't know if this is a term, just-in-time compute, which is like as your load becomes, as your usage comes in, you can fire off requests for VMs or bare metals at other cloud providers and then get them up and running.Swyx [00:26:43]: This is if you go above 100%, right?Ivan [00:26:45]: Yeah, this is.Swyx [00:26:46]: Like your overflow.Ivan [00:26:46]: If your overflow, like spillage or whatever you do.Swyx [00:26:48]: You probably lose money on it, but it doesn't matter, right?Ivan [00:26:50]: It, not Well, you might, you might not That is a more cost-effective way to do it but it's a slower way to do it. Because basically what you have to do is you have to like queue your requests, spin up these just-in-time compute, get it all ready, provision it, and then get your workload there. And so if the time isn't important that much, that's fine, and you can do that. But if your customer, and especially for, let's say, the RL training runs, the reason why a lot of people come to us is because GPUs are more expensive than CPUs, right? So you want your GPU running at, what, 100% the entire time. And so when you're running runs on CPUs, when the when the CPU cycle is like down and spinning up the next one, you want that to be instantaneous so that your GPU doesn't go down, right? And if you then have to like go out and provision machines, you're essentially telling the GPU that it has to wait, and that's incurring our cost. So there's things that you have to try to solve for there.RL Workloads, Declarative Images, and Kubernetes ReplacementSwyx [00:27:43]: Yeah, let's talk about the different workload, right? You said that, what was it? A few months ago, you had zero RL workload and now it's 50%.Ivan [00:27:52]: It will be this one, 50%, yeah.Swyx [00:27:54]: Let's talk about how different it is, right? Like I imagine, for example, a lot less dynamic code generation of like arbitrary code. Like here, it's probably all the same code. You're just doing parallel runs or something, I don't know.Ivan [00:28:05]: Yeah. So you'll have multiple Depends on the like for each run, you'll have a snapshot. And they, for the most part, they actually do use our declarative image builder, which is like, “Oh, we, the agent wants these dependencies, these env vars.”Swyx [00:28:17]: These ones, yeah.Ivan [00:28:18]: Yeah, the declarative image builder, it.Swyx [00:28:20]: Which is a very modal like thing that they.Ivan [00:28:22]: Yeah. And so we build it on the fly and then we propagate that snapshot, and you can spin up as many sandboxes as you want against that snapshot. And then if you have to do changes, the model can, or like it could be also be automated. It's like, “Oh, now for the next run, we need to install these things or remove these things or whatever to get, a task done,” and then it goes off and runs that. So yes, that is something that it seems that they prefer. The number one reason I found, or should I say, let's take a step back. What we are competing against in that environment is essentially managed Kubernetes. So EKS, GKE, whatever. That is what the vast majority run on. And anyone that has tried Daytona versus GKE, EKS is like, “I'm never going back.” That has always been. There's a few reasons. One is the ergonomics. So if you have, if you're using Kubernetes to spin that up, you have to essentially manage the interface interactions with that. Daytona, although as a compute provider, it's more akin to a Twilio and Stripe from a consumption perspective than it is an AWS. Like you have an API, an SDK, it's quite like easy and seamless to get these things up and running, that's one. The other is the speed to which we spin up, which we mentioned earlier, which is much faster, and the scale to which we can go to. We haven't got into features, but an interesting feature is that it's very hard to OOM, or out of memory, our sandboxes, because we can dynamically on the fly.Swyx [00:29:48]: Resize.Ivan [00:29:49]: Resize, which is like impossible on almost any other thing. There are some technologies that enable you to do that, but it's like a very hard thing. And so we actually saw this when, the Terminal Revenge team is, brought us actually. So thank you, Alex and the team, that brought us into this whole space.Swyx [00:30:05]: It's just very rare that, a framework would just say, “Guys, just use Daytona.”Ivan [00:30:11]: Yeah, I think it says it somewhere. Yeah.Swyx [00:30:13]: Yeah. I was like, “What is this?”Ivan [00:30:15]: There's all, there's multiple there, but they also mention a few other places. and so Daytona specifically-We have, the, just jumping on themes here We, I don't know where it says Data Center.Swyx [00:30:27]: I, there.Ivan [00:30:27]: Doesn't matter.Swyx [00:30:28]: There's a very strong recommendation, which is, very unusual. Which is, it's.Ivan [00:30:33]: We do not pay them for this, just.Swyx [00:30:34]: I know, yeah. They just like you.Ivan [00:30:35]: Yeah, they like us. yeah, and also a thing, so, Data Center has multiple isolation sets underneath. The customer doesn't have to know what they are. But basically we have Docker, which is a container, that's hardened with Sysbox. So it's Docker's, isolation that is a security equivalent to a VM, but it's still a container. And that is the default, and they, especially in these training workloads, really like that as an interface to be able to use just a basic Docker container, and we enable Docker and Docker. Which for these RL runs, if you need to do a Docker compose or Kubernetes, you can spin up a K3S inside of these things, which unlocks a huge amount of workloads that you can do that you cannot do on other providers. So just on that part is much more interesting. And so we went that, through that. We showed them that we could do that, and they enjoyed that quite a bit. They being the general venture people.Swyx [00:31:28]: Those people, yeah.Ivan [00:31:29]: And Harbor people.Swyx [00:31:29]: Harbor people, do are they, are they a company yet?Ivan [00:31:33]: As far, I do not know.Customer Pull, Slack Connect, and the Computer Use BetSwyx [00:31:35]: Okay. All right. Yeah. It's like super obvious that like, there's a lot of excitement and success around these things, okay, so yeah, tell us more, right? Like, this is an exploding workload, Harbor adopted you, which helped speed things along. But what are you learning as this new workload comes online?Ivan [00:31:53]: There's a couple things that we learned, which we chat about in the beginning. We, and this has led our story, as we mentioned, we like talked to a lot of customers along the way, and we add more features and more tool sets as we talk to customers. And it's interesting that And I think it's that the ecosystem is so small and/or the models get smarter, where when we see one user come with a request, we know it goes on a roadmap if like three to five customers come with the same request in that week. It's like very bizarre. It happens so many times, which is.Swyx [00:32:27]: Because they're all friends.Ivan [00:32:28]: Sorry?Swyx [00:32:28]: They all, they're all friends. They're all in the same group chat.Ivan [00:32:30]: Yeah, probably, yeah. ‘Cause and they're like, “Oh, can you do this?” And I'm like, “Okay, this is interesting. We'll put it on a feature request.” And then the next one's like, “Oh, can you do this?” “Okay.” It's all the same, right? It's always the same. And so what we try to do, and I personally try to do, I try to be on as many call, quote-unquote “sales calls” I can. I'm in every Slack channel. We literally have about 1,000 Slack Connect channels, something like that. It's an interesting, there's so many interesting things you find out when you have all the Slack channels. You can also see where people, transfer between companies. You see leave Slack channel, enter Slack channel. It's an interesting thing. Also, just I digress, I feel that Slack Connect is literally LinkedIn what it should be. You have a list.Swyx [00:33:08]: LinkedIn charges you to, use your own connections, but Slack doesn't, right? Slack is like, do it for free. It's more lock-in. It's great.Ivan [00:33:15]: Yeah. It's amazing. Yeah. It's one of the reasons.Swyx [00:33:17]: You're gonna pay Slack for life.Ivan [00:33:18]: Exactly. You're there for life. So that's interesting. And so one of the things, the newer things we were talking about earlier is we made a big bet and put a lot of investment on computer use. that is not seen publicly the light of day. We haven't GA'd that yet, but we have.Swyx [00:33:32]: Is there a thing I can pull up?Ivan [00:33:33]: There is computer use there. It's right up a bit.Swyx [00:33:36]: Oh, yeah. Okay.Ivan [00:33:38]: What we have, what we talked about and what we've seen publicly is there's this theme now about, the human emulator where And Elon from XAI has talked about this publicly, and if you think about the models today, they're actually quite sophisticated and they can do a lot of work, but they still don't have access to all the tools. Like, I'm a strong believer that the most efficient way for an agent to work is essentially headless or through, terminal or whatnot. But if we, if we look at knowledge work in general, there's about 100 million knowledge workers in the US, about a billion in the world, and knowledge workers, and the salaries of them aggregate to 10 trillion in the US 50 trillion worldwide.Swyx [00:34:24]: Wow.Ivan [00:34:25]: Something like that. And if we look at, the five most important sectors of that, so like healthcare and government and financial services and whatnot, that's about 56% of that. So let's say it's about half of that. So in the US it's about 25 trillion, and most of them, most of that work is actually still locked into legacy apps inside of Windows, which is not going anywhere for a very long time. Like, people just won't invest in that. How much of it? our assumption is the following: if, in the RPA market, which is similar market, well, not the same 25% of, these white collar, workers', work is automated. If an agent is more sophisticated, can go through more runs, figure stuff out, let's say it's, 40%, right? And so if you take 40% of that, you get to essentially, $10 trillion a year.Swyx [00:35:17]: That's a TAM.Ivan [00:35:18]: That is a that is a TAM. So that's the TAM of the models, right? That's not our, essentially ours. But you get to that size, and to be able to do that, you essentially have to give agents these computers with the legacy. So computer use, either Mac or Windows or Linux. Linux we also obviously have and others have. But Windows specifically is something very new, and the only option right now is an EC2 with, Windows or on Azure. Both of them take anywhere from three to five minutes to spin up. We've created an actual sandbox, so it's a second instead of milliseconds, but you have, point in time snapshots, you have, forking, you have all the things that you have from a sandbox, but essentially enables you to hopefully unlock all this value. And so that's been our big push and bet, but we've sort of, kept our ear to the ground. What is sort of the next things in the market?RPA Returns: Why Agents Still Need ComputersSwyx [00:36:06]: Yeah, knowledge work, and building, and sort of RPA, the next wave of RPA. I got very excited about RPA kind of during COVID times. The UI path was IPO-ing. And it was, a very hot Isn't it, Eastern European?Ivan [00:36:20]: It is, Romanian.Swyx [00:36:21]: Romanian?Yeah, it might be the only Romanian, big unicorn okay, yeah. This I don't I don't, I don't have like a I think there's, I think there's a stage being set for the resurgence of RPA, ‘cause everyone understands that, yeah, no one wants to deal with these shitty apps and no one's gonna rewrite them. Like, you just have to do, a remote operation and programmatic operation of them.Ivan [00:36:45]: If you wanna unlock it, my own setup was basically the following. So I was doing a board deck recently, last month, whatever, and I'm like, “Okay, let's just, let's just do automated.” So, all our data's in, ClickHouse and PostHog and QuickBooks, where everyone else's is, and I'm basically, connected that all to, my Cloud code, like go off and go Cloud code whatever. Go off and, here's the integrations, go do that. It pulled out the first report, which was great. It connected to Brex and all these things, pulled it, which was great, and then I say, “Okay, now pull out this, and this,” and I kept getting, really well McKinsey-style design reports, but the data said partial data. all the missing data, partial data. Like, it can't access all the things, and I got so frustrated, and so I got, I got, my Mac Mini virtual sandbox with OpenClaw. I gave it its own account in our company, and then I went to all these services and created a read-only account, so literally like an intern in your company. And so I would say, “Now go and do this report,” and it would get the same, or like, “I can't via the MCP or the API or whatever. I can't get all the information.” I'm like, “Go log in.” And it will log into the website, then go in, export the data. It'll export the data and do the thing end to end. So even for things that have today APIs, not all of it is exposed, and I to get value, I get immense value right now, but it has to be a computer usage, unfortunately, and so I spend a bunch of tokens just on that, but I get the job done. And so if even a startup like ours, and using all the hottest tools, still needs a computer agent what hope does, Goldman have to have a headless, right?Swyx [00:38:22]: Yeah, what a - Why isn't Microsoft doing this?Ivan [00:38:27]: I'm pretty sure, Satya had a post yesterday.Swyx [00:38:29]: Oh, okay. I see.Ivan [00:38:29]: Which was like, “Every agent needs a computer.”Swyx [00:38:31]: I see, I see.Ivan [00:38:32]: So they have launched something recently.Swyx [00:38:34]: Yeah, they have Microsoft Power Automate, I'm sure, I'm sure, they're gonna have their version.macOS Sandboxes, Apple Constraints, and the Windows OpportunityIvan [00:38:39]: Version of that, yeah.Swyx [00:38:39]: You're gonna try to do yours, and it - I always know there's always demand for Mac, but I know it's, tricky to host, macOS sandboxes.Ivan [00:38:49]: We will have macOS sandboxes fairly soon. The problem with macOS, OS sandboxes is, I'm deep in this, I don't know how much interesting is.Swyx [00:38:55]: No, it's.Ivan [00:38:56]: MacOS has this problem.Swyx [00:38:57]: It's a licensing thing, right?Ivan [00:38:58]: Licensing thing. So one, you're allowed to run only two parallel VMs per machine, so that's one. Two, you can only license to a different user every 24 hours. So if you come in and theoretically, if I wanna charge you per second and I charge you one second, I have to have it idle for the rest of the day. I can't have anyone else doing that. So the pricing will be different in the sense that I will have to - we would have to charge for 24 hours, and that's not even, that's not even the most difficult thing. But the, thing above that is, from a security perspective, they enable you to do memory snapshot, pause, resume, but only on the same physical drive, physical machine. And so what you can do in, Windows world or Linux world is that I can move in the background, your snapshot from one to the other and manage load, right? Here, if you wanna do that, you essentially have to have your.Swyx [00:39:49]: Yeah, snapshots. Yeah.Ivan [00:39:50]: Your.Swyx [00:39:51]: It's like.Ivan [00:39:51]: Physical machine.Swyx [00:39:52]: You can't break it up.Ivan [00:39:53]: You can't, you can't move things around that, and all of that is, that part is, from a security standpoint, if it is written. Like, I understand the security aspect of that, but it disables you from doing these agentic, like really scalable agentic workloads.Swyx [00:40:08]: You need to do a vibe-coded, clean room implementation on macOS that you can then - That's like Clean OS or something. I don't know.Ivan [00:40:17]: So. We have.Swyx [00:40:18]: ‘cause like Linux was originally like a clean room rewrite of Unix.Ivan [00:40:21]: Okay. Yeah.Swyx [00:40:21]: Or something like that, right? Like same thing to macOS. Someone needs to do it.Ivan [00:40:25]: Someone will do that, and someone will have some long-running agents for a few days to figure this stuff out. But yeah. So definitely we - we're really close to offering something ‘cause people do want it, but the pricing will be different, and the feature set will be sort of stringent.Swyx [00:40:38]: Yeah, nobody's gonna use this. like, the labs, the labs will because they want to automate macOS.Ivan [00:40:42]: They have to do RL. They have to do RL again. But even if you The - So the point is with the RL part, if you, if you do RL on macOS, then the next iteration of the model comes out, it will be able to use these tools significantly. Then you actually need to run those, that somewhere. So you're gonna have to have that, later on. And from, if anyone at Apple is listening, I very much feel that they are shooting themselves in the foot of the scale of the revenue of compute or licensing they could get if they would just enable a concurrency model similar to what you can get on a Windows and a, and Linux.Swyx [00:41:17]: Yeah. Yeah. And I'm sure they've heard this before. They just don't care. Yeah, it's And maybe they will change their mind with the new CEO.Ivan [00:41:24]: Yeah. We'll see.Swyx [00:41:25]: We'll see.Ivan [00:41:25]: High hopes.Swyx [00:41:26]: High hopes.Ivan [00:41:26]: High hopes.Swyx [00:41:27]: Okay. But I, it's very clear the market opportunity is huge in Windows, and you can go for a long time on just Windows, but your customers are gonna want both. and I think, it is interesting to me that, this is the sort of God application of agents, right? Like, I don't It was - How big was OpenClaw for you guys? Like, was it, was there, a significant bump.OpenClaw, Agent Labs, and the B2B2C Sandbox MarketIvan [00:41:54]: Not for us because we.Swyx [00:41:54]: Because you already.Ivan [00:41:55]: We're kind of positioned differently. Whereas although it's completely PLG and we have individual developers that use it, most of the users that use Daytona are sort of a B2B2C. Sort of it's either B2B or B2B2C. So, in the researcher world, it's B2B, so you're selling to, labs and neo labs and things like that. But on the long-running agents, it's mostly, from a scale revenue perspective, it's mostly B2B2C, where you have a app layer agent that uses you at a big scale.Swyx [00:42:26]: Like a Manus. Yeah.Ivan [00:42:28]: Like a Manus Lovable type of thing.Swyx [00:42:31]: Yeah. I think that's the question of, well how, um-Uh, yeah, B2B to C is basically to me what I've been calling an agent lab, which is kind of like you're not in a model lab, but you're making a very good wrapper that is a platform that other people can sign up so they don't have to code those things. Yeah, it sound, it sounds like a much better market than the direct OpenClaw market.Ivan [00:42:56]: I've like - We I've done multiple things. So the CodeAnywhere's part of our career path R in the calendar, was very much an end user developer product. And so that is great. It You can get a lot of developer love, and I feel that we do as a company have a bunch of developer love. But it's a different type, where it's people building these things. Again, it's more akin to a Twilio because you don't really run - As a person, you wouldn't run Twilio. I don't know how many people remember. It was like ask your developer billboard and whatnot. And people really love Twilio, but they only used it inside of like, “Oh, I'm building this app or service for thing.” And so we're very much directly to that. And you also know that I used to work for a competitor for Twilio, so it's kind of ingrained, in my DNA.Swyx [00:43:35]: People don't know InfoBip is that big.Ivan [00:43:38]: Yeah, it's.Swyx [00:43:39]: Because.Ivan [00:43:40]: It's a billion euro.Swyx [00:43:40]: They're all American. They're like, “Whatever's in Europe doesn't matter to me.” But like it's the, it's the same size or bigger? Same size?Ivan [00:43:46]: It's about half the size.Swyx [00:43:47]: Half the size?Ivan [00:43:48]: Yeah, about half the size.Swyx [00:43:48]: It's like, yeah.Ivan [00:43:48]: Still huge. Multiple billions a year. Yes.Swyx [00:43:51]: That's crazy.Ivan [00:43:51]: Exactly, and so that - These are like really interesting and large revenue-generating, very sticky businesses. Whereas when you're selling to the - When your focus is the end developer, it is a very hard sell because they're very price sensitive, very price conscious, very around that. And there's very It's very hard to scale. Your cap is the number of people that are willing to spin up - First of all, wanna spin that up, and then spin up multiple of these. Whereas if you're in the enterprise one, like we know everyone's talking about like how many tokens they're spending, I'm spending. Like a lot of companies today are like, “If this is our company, spend as much as you can.” Like basically that is where we're going. And so if you think about that paradigm, where you're selling to companies that say, “Spend as much as you can to generate, productivity,” versus, “Oh, I'm a single person. I have this much budget, and I'm doing this thing because it's fun or it's helping me out or whatever.” Like it is a different, it's a different go-to-market, I think, strategy.MCP, CLIs, and Sandboxes as the Agent RuntimeSwyx [00:44:50]: Yeah, there's a lot of discussion. I'm just kind of going through like the mental list of things that are in your favor, which is, for example, MCP versus CLI. Like obviously you want CLI. It's been very good for you. I feel like it's maybe a drop in the bucket or maybe it's huge. I'm just checking whether it's like these are big trends.Ivan [00:45:10]: Those things you - work well in our favor, to your point just because every.Swyx [00:45:13]: They're kind of drop in the bucket, right?Ivan [00:45:15]: I think it's like sort of all the things come together. And so there's so many things that impact that. To your point, like OpenClaw wasn't huge for us, but like having the agent SDK, from Anthropic, so or Cloud Claude Code was very interesting. The reason why it was interesting is that a lot of, let's call them app I don't know what to call them, app layer agent companies, essentially they are like, “Oh, I can create this new app, this new agent. All I need, I just use Claude Code, and I throw it into a sandbox, and then I have my interface to the human to that.” And so that enabled so many more companies to actually offer this, and then they would pull on sandbox. So that was, that was interesting. And to your point, like MCP, versus the CLI, the MCP is an interface against an API, whereas the CLI is like you can actually go do things. Like this is it. The difference between integrations and actually running scripts or data or analysis against a thing. So being able to use a CLI very well enables the agent to do more things, and it's because that people will invoke a sandbox, they'll run it in the CLI, and but it'll do anal-analysis on that data and then give you an actual result versus just, pulling data from an API source.Swyx [00:46:29]: Yeah, it's a layer of indirection basically, it's the same thing as agentic search versus RAG, which where you're.Ivan [00:46:34]: Exactly, yeah.Swyx [00:46:34]: Just like you just win whenever people put more agents into their workflow. And so like it doesn't really matter, but I'm just kinda teasing out like what else have people heard about that like it's sort of, “Oh yeah, this is another sandbox use case. Oh yeah, that's another one.” Am I, am I missing any big ones?Ivan [00:46:51]: The thing, the thing that people, which is the computer use stuff, which I think is probably the most interesting one, is, and to your point, we've talked to so many people over the last year. It's like, “Oh, like why do you need a sandbox? Why do you need this? Why this?” And to your point, it's like, “Oh, I need sandbox for this. I need sandbox for that. I need sandbox-” It's like, “Oh, I need it for every single thing.” And so basically what I, what I - and it sounds like a broken record, it's like you use a laptop every single day, right? And you are n of one. It's just you. But now imagine how And by the way, the laptop, the computer PC market, the PC market is about equal to the cloud market in total. So it's about 150, 180 billion a year. Something like that. It's about roughly the three cloud hyperscalers is about equal to like Apple, HP, Lenovo, whatever, It's a little bit less, but it's sort of like that. And now imagine And that's just like, so how big is the addressable market? What, how many people are there in the world now? What's the last data?Swyx [00:47:45]: Let's call it eight billion.Ivan [00:47:46]: Eight billion. And so let's say you can have two computer, like you have one personal and one business, whatever. Like so it's double that, right? and so that's 16 billion, right? How many agents are gonna be running in two years, in 10 years, in 100 years? Like And for every single task, they will need one of these. And so how big is that? That market is essentially quote unquote “infinite”. You will get to the point, and Dylan Patel was at the conference talking about, from SemiAnalysis, that talks usually about GPUs, was also talking about how CPUs will now be a bottleneck because it will be the constraint. You won't be able to grow, or we won't be able to have enough of these because there won't be enough CPUs to basically do.Swyx [00:48:23]: Yeah. Well, I actually had a really good podcast with Doug Oliphant, who, which was his president at SemiAnalysis, where they've basically been like, yeah, it's been a GPU shortage first, but then it's cascaded down to memory and now to CPUs.Ivan [00:48:35]: CPU, yeah.Swyx [00:48:35]: It-What's next? So networking. So, networking actually has been in shortage for a while if you're looking at, just GPU networking. But, yeah, it's really crazy the amount of computer use that's going on, yeah, cool. I, other questions are, just the one very big part is the open sourceness which you didn't have to do, your competitors don't do, like it's not, a lot of people are worried about keeping their projects open source because some competitor can just slot fork it. I don't know if there's any reflections on just being an open source company.Open Source, Trust, and Enterprise ProcurementIvan [00:49:15]: Yeah. There's a bunch. So we the original product that we did was open source.Swyx [00:49:19]: Yeah. CodeAnywhere.Ivan [00:49:20]: So doing that was actually very good for us. There's basically a saying of, What's the saying? Like, companies that are, that are doing really well, measure themselves against, free cashflow, that are kinda okay, it's EBITDA, then, it's, it goes all the way down.Swyx [00:49:36]: The worst is like GitHub stars.Ivan [00:49:37]: GitHub stars. GitHub stars are the worst, yeah. So you go all the way down to GitHub stars. And so our original one was GitHub stars. That's what we talked about, we're at the point we're talking about revenue, so we're we've gone up the stack on that. And so we started.Swyx [00:49:47]: No, profit.Ivan [00:49:48]: Yeah. We haven't, we're, we'll get there. We'll get there. But basically at that point we did stars and GitHub and it was useful, and the original variation that we did, it we split the core into its own repo and it was Apache 2.0, so very, permissive. And then we basically would bundl

Capital, la Bolsa y la Vida
La calificación concursal: el dilema temporal que determina la responsabilidad

Capital, la Bolsa y la Vida

Play Episode Listen Later May 21, 2026 15:36


Antonio González y Francisco Tato, director del área Mercantil y director del área económica de Tato & González Concursales, respectivamente, lo abordan en este espacio en colaboración Surus

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

Take the 2026 AI Engineering Survey and get >$2k in credits and AIE WF tickets!This was recorded before Railway suffered a major GCP outage on May 19, despite being a multi-AZ, multi-zone mesh ring, with HA fiber interconnects between their Metal GCP AWS, because workload discoverability was unintentionally still tied to GCP. All has been resolved with a post-mortem.Railway did not start as an AI infrastructure company.It was founded in 2020 years before agents became the default way people thought about deploying software. Jake Cooper, formerly at Bloomberg and Uber, started Railway with a simple obsession: the activation energy to ship something to production should be near zero. Push code, get a URL, iterate. No Docker files, no Kubernetes manifests, no Ansible scripts stacked on Ansible scripts.For years, this was a slow grind. Railway spent its first 18 months hand-acquiring its first 100 users with Jake personally greeting every Discord signup on a second monitor.Today, Railway has raised $124m and is growing very fast. A 35-person team supports 3 million users, adding roughly 100,000 signups a week. Their bare metal data centers have a 3-month payback period vs. renting in the cloud, with 70% margins funding aggressive cloud bursting when needed. The servers they own have actually appreciated in value as RAM prices have climbed basically meaning the value of their hardware now exceeds the capital they've raised.From rebuilding Railway's network overlay over a weekend to moving the vast majority of workloads onto its own bare metal data centers, Jake Cooper is trying to build a new cloud for an agent-native world. In this episode, Railway's founder and “conductor” joins swyx and Alessio to unpack why the next era of software infrastructure is not just “Heroku but newer,” what agents need that humans did not, and why the old deployment loop of Git, PRs, CI/CD, and static cloud resources may be heading for a rewrite.We go deep on Railway's infrastructure stack: own-metal data centers, three-month cloud payback periods, cloud bursting, data center debt, Railpack, Nixpacks, Temporal, feature flags, Central Station, content-addressable filesystems, agent-safe production forks, and why the CLI may become more important than the canvas in an agent world. Jake also shares the founder journey behind Railway, how the company survived losing $500K/month, why it now serves millions of users with only 35 people, and why he believes the pull request is dying.We discuss:* How Railway went from a slow six-year grind to adding 100,000 users a week* How Railway thinks about agents as the next dominant software species* Why agents need version control, observability, compute, storage, and orchestration at 1000x scale* The economics of Railway's own-metal data centers and three-month payback* How Railway uses cloud bursting while scaling its own infrastructure* Why data center debt can be a better tool than venture debt for infra startups* Central Station, Railway's internal system for clustering customer feedback and incidents* Why responsible disclosure and over-communication matter for platforms* Why feature flags, progressive rollouts, and shadow traffic are essential for agents* Temporal's strengths, pain points, and why workflows matter for agents* Railpack, Nixpacks, Nix, and lazy-loaded content-addressable filesystems* Why “cattle, not pets” may change if you can clone the pets* Why Railway is building a new cloud from scratch instead of copying hyperscalers* The solo founder path, focus, writing, and how Jake thinks about company buildingRailway:* Website: https://railway.com/* X: https://x.com/RailwayJake Cooper:* LinkedIn: https://www.linkedin.com/in/thejakecooper/* X: https://x.com/JustJakeTimestamps00:00:00 Introduction: What Is Railway?00:02:07 Jake's Path to Railway00:06:13 Railway's Six-Year Growth Story00:08:52 Rebuilding the Business After the Free Tier00:11:17 Agents as the Next Software Platform00:13:29 Railway's Infrastructure Philosophy00:15:42 Bare Metal, Cloud Economics, and the Compute Crunch00:17:22 Cloud Bursting and Five-Cloud Networking00:20:20 Data Center Debt and Infra Financing00:23:31 Data Centers in Space00:25:24 What Agents Need From Infrastructure00:28:24 CLIs, Canvas, and Agent-Native UX00:35:15 Central Station, Incidents, and Responsible Disclosure00:40:30 Safe Rollouts, SRE Agents, and Production Forks00:45:00 AI SRE, Specs, Code, and Tests00:48:24 Self-Replicating Infrastructure and the New Serverless00:53:18 Heroku, Temporal, and Workflow Engines01:04:07 Railpack, Nixpacks, and Lazy-Loaded Filesystems01:06:01 Coding Agents, Token Spend, and Roadmap Acceleration01:10:56 The Pull Request Is Dying01:12:28 Feature Flags and the Agent-Era SDLC01:16:15 Cattle, Pets, and Cloning Machines01:19:29 Solo Founder Lessons01:24:12 Focus, GPUs, and Building a New Cloud01:28:20 Closing ThoughtsTranscriptAlessio [00:00:00]: Hey, everyone. Welcome to the Latent Space Podcast. This is Alessio, founder of Kernel Labs, and I'm joined by Swyx, editor of Latent Space.Swyx [00:00:10]: Hey, hey, hey. Today we're in the studio with Jake Cooper of Railway.Alessio [00:00:14]: Conductor of Railway.Swyx [00:00:15]: Conductor at Railway. Yeah.Alessio [00:00:16]: Choo-choo.Swyx [00:00:17]: Do you actually have that anywhere, like on your business card?Jake [00:00:20]: We call some of our volunteer moderators conductors. I don't have a business card. We're not that big yet. At some point I will. I got handed a nice business card from the Supermicro folks, and I was like, “Damn, this is pretty official.”Swyx [00:00:30]: Business cards are coming back.Jake [00:00:32]: They're cool. They're hip. The conductor thing is good. We're trying to figure out what we want to call each other internally. Some people think it's super cringe and say, “You don't need a name for people internally.” Some people want to call each other something. We still don't have a really good one.Jake [00:00:55]: We've got New Railcrews, Trainiacs. Nothing has stuck yet.Swyx [00:01:00]: I like Trainiac. Trainiac sounds good. Railwayians. For those who don't know, what is Railway? Let's give people a crisp definition up front.Jake [00:01:09]: Railway is the easiest way to ship anything. You go to the canvas, or you talk with Claude, and you say, “Deploy a Postgres instance, deploy my GitHub repository, run this code,” and you're off to the races.Swyx [00:01:22]: You've got a nice animation on the landing page.Jake [00:01:24]: Thank you. None of my work, by the way. They don't let me touch the design stuff anymore.Jake [00:01:25]: We want to make it trivially easy not just to deploy things, but to evolve applications over time. Most tooling right now stacks entropy on top of entropy: Docker, Kubernetes, Ansible scripts, and all these other things. If we can version all of your software and keep track of all the changes, then we can make it trivial to clone environments, fork into a parallel universe, get copies of production data, get copies of any services, make changes, validate them, and collapse them back in without reproducing everything across a staging environment.The Railway Origin Story: From Uber Systems to a New CloudSwyx [00:02:07]: I was looking at your background: Bloomberg, Uber. Nothing immediately stands out as, “This guy is going to found the next great platform as a service.” What prepared you for Railway?Jake [00:02:21]: It was curiosity to keep going deeper. I started out on front-end stuff, working on Wolfram Mathematica and porting it over. Then I briefly moved to Bloomberg, then toward Uber and distributed systems, taking the Jump Bikes systems and moving them to a distributed system built on top of Cadence, the pre-Temporal Temporal.Swyx [00:02:44]: Which, by the way, I'm happy to talk about, pros and cons.Jake [00:02:48]: Totally.Swyx [00:02:51]: But let's do the Railway story.Jake [00:02:52]: It has been a continual step of wanting an experience. Whether it's walking up to a bike, unlocking it, and having it work frictionlessly, or something else, the depth required to make that happen follows from the experience. A lot of the work I do, and a lot of the team does, is in service of that experience. We fundamentally don't care how deep we have to go. We will swim to the bottom of the swimming pool to get the experience.Jake [00:03:17]: I don't have a physics PhD. I did an EECS degree. It has always been about figuring out the next step: how do we get there? That's what led to starting Railway for that experience and then moving all the way to bare metal data centers. I was adding patches to the kernel this week to get the experience there because I can see how much better it can be.Swyx [00:03:49]: Other patches to the Linux kernel this week?Jake [00:03:51]: Yeah. Not upstream. Our fork.Swyx [00:03:52]: That's a flex. Railpack? No, this is different. This is the OS on top of Railpack?Jake [00:03:57]: No, this is an actual kernel patch. It's always literally: what do we have to do to get that experience? Then figure it out. Anything is figureoutable.Swyx [00:04:10]: Would you send the patch upstream, or does it not fit other use cases?Jake [00:04:13]: Maybe. We have to work out the experience internally. It has to do with the storage layer we're building for some of the agentic stuff. Maybe it'll be useful upstream, but it's deeply useful for us internally.Open Source, Forks, and Non-Deterministic VersioningSwyx [00:04:29]: You mentioned open source before. How do you think about starting from open source, and then coding agents letting you do a lot more from forks of it?Jake [00:04:38]: GitHub's original sin is that it's almost a series of broken pointers. You have this thing, then you clone it, and now you've lost the whole upstream. How do we make it trivial for people to modify really small pieces of it?Jake [00:04:51]: We think of Git in a discrete sense: I've either made a change and merged upstream, or I haven't. What would it look like if it were percentage-based, a little more non-deterministic, or a stream of changes that users traverse as a percentage rolled out in general and then rolled all the way up?Jake [00:05:13]: We have the open-source kickback program and let you deploy templates because we want to make it trivial for people to version these shards over time. It solves a large problem around authentication, authorization, and security. NPM has a way to define, “Don't take any new packages.” The ideal end state is that you roll out progressively to users with the minimum impact zone and continue rolling up. JPMorgan should probably be the last one on the patch line, for all our sakes, because our money and livelihoods are there.Jake [00:05:53]: It's okay if Johnny Vibe Coder gets a broken patch because there's so much entropy in the system that the rubber has to meet the road at some point. You have to test at varying levels.The Long Grind: First Users, Free Tier, and Making the Business WorkSwyx [00:06:13]: I wanted to pull up this glorious chart, which is your usage or number of daily signups?Jake [00:06:22]: Daily signups, I think.Swyx [00:06:24]: You started six years ago. It was a slow grind, and now you're on a rocket ship. You say, “Don't doubt your fight and don't quit.” Maybe pick out certain points that were key inflections for the company.Jake [00:06:40]: At the start, it's about getting your first 100 users, hell or high water. We had a website and a support link. The support link was the Discord channel. I had notifications on with two monitors: the monitor I was working on and the other monitor with Discord. If anybody came in, I was immediately like, “Hey, how's it going?” It was rare, so getting those first 100 users to come back was the start.Jake [00:07:14]: Then you build a consultancy factory because users want all these things. You have to go back to the board and ask, “What is the actual product offering I want to build on top of this?”Jake [00:07:28]: VCs want charts that always go up and to the right, but in reality you don't necessarily want charts that look like that. For us, there have been periods of expansion where we add features to test use cases, and periods of compaction where we ask, “If the experience we have is good, how do we make it significantly better?” Maybe we strip out features that don't fit our ICP anymore.Jake [00:07:57]: The boom from 2022 to 2023 came from the free tier. Everybody under the sun was using it.Swyx [00:08:09]: A lot of Reddit bots and Discord bots.Jake [00:08:12]: And crypto miners. When you build an open product on the internet where anybody can sign up, the internet is a horrible place with so many things. You go through periods of asking, “How do I reach as many people as possible?” Then, “How do I fit the exact use case for the people who really matter and are really excited about this specific thing?”Jake [00:08:39]: Then there was a two-year period of making the actual business work. During the free-tier era, we were losing about half a million dollars a month.Swyx [00:08:59]: On a $20 million bank account.Jake [00:09:02]: On a $20 million bank account with maybe $50,000 a month in revenue. That's a horrible business. I don't know how anybody invested. But you have to go through it and say, “We have an experience people love, but the business has to work.”Jake [00:09:17]: There are two schools of thought. You can run the horrible business all the way up with bad margins, or you can go back and make it work. We've always wanted a super lean team. We're 35 people right now. It's very small.Swyx [00:09:36]: Supporting three million already?Jake [00:09:38]: Yeah. We're adding 100,000 users a week right now, so it's growing fast. We don't want to add headcount for the sake of headcount or throw bodies at problems. We want to build systems. It's hard to build systems during expansion because you're adding things to the system because people are asking for them or things are breaking.Jake [00:10:00]: We had to cut off the free users for a little while, rebuild the business, and make sure it worked. We want to reach as many people as possible because software is important. It's become difficult to create things in the physical world, so it's important to make it easy for people to build in the virtual world and have access to creation. But there are legs to that journey.Jake [00:10:30]: You can see divots in the charts. If you follow between 2025 and 2026, it's either summer or winter. People go on holiday with family.Swyx [00:10:50]: It affects that much?Jake [00:10:51]: Yeah. It's kind of B2C and kind of B2B. People are shipping constantly, then they stop. Our activation curve now shows more people activating on weekdays because we have more business users, so it smooths out over time.Agents as the New Interface to DeploymentSwyx [00:11:17]: Was there a point where you started prioritizing AI development or agent development?Jake [00:11:24]: We've prioritized agentic as a top-of-funnel thing. Over the last six months, we've deeply prioritized agentic as a mechanism to build and deploy things because we believe the curve is so steep and that is how people will build and deploy software.Jake [00:11:42]: It almost fundamentally doesn't matter whether this is dot-com or not because we're all on the internet anyway. If agents are going to deploy a bunch of things and we hit an inference wall at some point, we'll fix those problems. The dominant species over the next 10 years is that we've moved from assembly to C to C++ to JavaScript to words. You're going to need to close that loop.Swyx [00:12:13]: When you say this is dot-com, did you mean buying the domain, or the general case?Jake [00:12:17]: I mean the dot-com era, when companies had a huge run-up because people understood the internet was important. Then they hit bottlenecks, fundamental laws of physics, math didn't work, and everybody came back down to earth. But it didn't matter because the internet became so impactful. If you operate on a long enough time horizon, you should build these things anyway because you can see where it's going.Jake [00:12:45]: That's where I think a lot of agent stuff is. You get to a point where you're running thousands of agents in parallel. What is the inference cost? What is the compute cost? How do you make that efficient? How do you coordinate all this? We have issues coordinating humans; we don't even have good tooling for that. Now we have to figure out how to get agents to coordinate, safely version changes, and know when to raise their hand for someone to intervene. Otherwise it becomes an interrupt factory.Railway's Infrastructure Thesis: Network, Compute, Storage, and MetalSwyx [00:13:19]: Let's go right into the technical side. What are the core infrastructure or architectural beliefs of Railway that allow you to do what you do?Jake [00:13:29]: The primitives matter a lot for us. We need network, compute, storage, and orchestration around it. You need control over a lot of those things. We've talked a lot about how we don't really use Kubernetes because we want higher-order control to place workloads in very specific places.Jake [00:13:48]: The reason is that you have to be very efficient with agents: memory reuse and all these other things, or you're going to massively blow up your cost structure. Being able to rack and stack your own servers and build your own metal unlocks performance and cost. Experiences where you're running 1,000 agents in parallel are not massively cost prohibitive.Jake [00:14:13]: Token use and compute use are blowing up. Over time, those things have to get a lot more efficient. You can get a lot of margin to make those experiences solid by building your own metal. That's all in service of offering a differentiated experience to as many people as humanly possible.Swyx [00:14:51]: You have a data center in Singapore.Jake [00:14:53]: Yeah. We have two in every other region now. In Singapore, we're adding a second one in Q3.Swyx [00:14:58]: What's it like? I've never built a data center. Do you go to Equinix and say, “I want some slots?”Jake [00:15:05]: Yeah. Equinix. You basically go and say, “I want power and I want a cage.” They say, “Great, here's what it's going to be.” You rent the cage for a period of time, fill it with racks and servers, and hook up internet to it. That's all the pieces.Swyx [00:15:36]: Then you handle everything else.Jake [00:15:37]: You handle everything else.Swyx [00:15:39]: What's the math versus clouds doing it for you?Jake [00:15:43]: If we rented in the cloud, our payback period when we go to metal is about three months.Swyx [00:15:50]: Which is crazy.Jake [00:15:51]: It's nuts. That's four years of depreciated hardware. You're going to see a lot of this compute crunch because hyperscalers are buying up a lot of stuff. We're working directly with OEMs, resellers, and people building these machines: Supermicro, Dell, and others.Jake [00:16:11]: Upstream, there's a bunch of supply pressure. When we raised our last round, between deploying capital for servers and now, the amount of money we've raised is less than the amount of money we have in the bank plus the value of the servers because the servers have appreciated as RAM has gone up. It's nuts how valuable hardware has become.Jake [00:16:50]: If you look at hyperscalers, they deployed around $80 billion of capital expenditures this year, and next year will be more. That's a massive infrastructure build-out. You look at that and think it's crazy that they're spending way more than the Manhattan Project. But if every person is going to run dozens or hundreds of agents in parallel, you have no conceptual idea how much compute is required to make that experience happen, even if you're deeply efficient and sharing resources. And that doesn't even count inference.Swyx [00:17:22]: How do you plan the build-out? The growth chart is so vertical. Are you usually at 100% utilization as soon as racks are live? How far ahead are you planning?Jake [00:17:33]: We still maintain cloud presence for bursting. We work with AWS, GCP, and a few other clouds. We can rent, and then the moment we get space or power, we compact those workloads off the cloud. We started on the clouds, then built a system to migrate to our own metal. There's nothing that says you can't continually do that again, and that's exactly what we do. We never want to be compute constrained.Jake [00:18:09]: At the start of the year, we actually became compute constrained because one upstream provider wasn't able to give us quota at the rate we needed, and the hardware was slower. I spent a weekend rebuilding our entire network overlay so we could straddle five clouds: Oracle, AWS, ourselves, GCP, and one other one. We can do more than that now.Jake [00:18:38]: We got into a spot where we were trying to pack instances tight because we couldn't get enough compute. That led to a few reliability issues, which are now past us. I made a tweet pointing out that it's becoming harder and harder to acquire compute at the rate these models need to acquire compute. We got bit by it.Swyx [00:19:15]: How do you think about pricing knowing you might not have your own metal available at all times? Are you pricing assuming you need extra margin if you end up going into the cloud?Jake [00:19:26]: Because we've built out our metal data centers, our margins on metal are around 70%. We can deeply subsidize the cloud business if we want to scale at a reasonable rate. We have a few levers: metal, which makes the margins; cloud burst; debt to buy servers; and venture capital. It's an interesting operational problem: how much cash do we have, how much should we raise, how quickly can we deploy it, and can we scale revenue as quickly as we scale compute?Jake [00:20:05]: If we continue making it trivially easy for people to build and deploy, then the faster we close that loop and the more operationally excellent we are with capital, the faster the business can scale. It's almost a straight linear deployment rate.Financing Infrastructure: Hardware Debt, VC, and Operational LeverageSwyx [00:20:20]: I think infra startups raising debt is a tool people don't utilize enough or know enough about. What can you tell us about that? Is it secured against your CPUs?Jake [00:20:32]: It's secured against our hardware.Swyx [00:20:37]: What rates do you get? Who are the lenders?Jake [00:20:39]: We pay prime plus a spread, and we can refinance any of the debt as rates go down. The terms are pretty good. The unfortunate thing is that Twitter has no nuance, so people say, “Venture debt bad.” But as with all things, there are specific tools and areas where you can be deliberate instead of using one tool as a hammer. Venture capital is not the hammer for everything. You have to explore and figure out what works.Swyx [00:21:12]: VC is usually the most expensive financing you can get.Jake [00:21:15]: Yeah. I also think people think about VC incorrectly from a capital-raising perspective. Most people think, “How do I raise as much money as possible from whoever is probably the best I can get at that time?” That's close to right, but what we've tried to do is figure out what unfair advantage we can buy with that equity.Jake [00:21:34]: It's the most expensive equity you're going to give away at that point in time, assuming the company keeps getting better. How do you use it to work with someone stellar who complements you? In the seed stage, I had never started a company. Ray Tonsing had good advice, and I could text him all the time. He was really fast. Awesome.Jake [00:22:01]: Then with John and Erica at Unusual, they said, “You roughly know what you're doing building a product. We'll mostly leave you alone and be available for advice.” Amazing. Then we got to Series A and the business was an operational tire fire because we didn't know how to scale a business. Work with Erica, and Jordan is over at Redpoint, so bonus.Jake [00:22:28]: Now we've raised from TQ and FPV as we're moving into enterprises. Every step of the way, we've asked: who can we partner with at this specific time to unlock the next section of the journey? I don't know enterprise sales. As an engineer, I can eyeball what features we might need, and we have wonderful people internally who can help. But you want boardroom dynamics where everyone is aligned and asking, “How do we win this?” instead of bickering about strategy.Data Centers in Space and the Physics of ComputeSwyx [00:23:31]: You had a tweet about data centers in space. Why no data centers in space?Jake [00:23:37]: It's not “no data centers in space.” My hot take is that I think it is solvable. I've just never seen anybody solve it.Swyx [00:23:49]: You said, “How are you going to dissipate that much heat in a vacuum?” You're making a physics claim.Jake [00:23:55]: I haven't seen anybody prove how you're going to dissipate that much heat in a vacuum. It doesn't mean it's not possible. It just means nobody has brought it up yet.Swyx [00:24:05]: Astrophage.Jake [00:24:06]: I don't know what that is.Swyx [00:24:07]: The Martian thing. Okay, you're very logical.Jake [00:24:09]: It could work. A lot of people are putting the cart before the horse. They say, “We're going to put data centers in space.” Okay, but how? “We have time to figure it out.” It's like in The Martian where they ask how they're going to intercept something and say, “We'll figure it out.”Swyx [00:24:36]: Making a bet on human invention is weird because you blind trust that it can be solved. But with physics, there are first-principles bounds you can put on it. Maybe not. Maybe you're asking to travel time or break a fundamental thermodynamic law.Jake [00:24:57]: I don't know how VCs do this either. How do you know what's not possible and a grift versus what's possible but sounds completely insane? “We're going to put data centers in space.” Coin flip as to which it is, and I guess you'll know in 10 years. That's one cycle.What Agents Need: Versioning, Observability, and 1,000x ScaleSwyx [00:25:23]: Moving back to agents. The branching, fast spin-up, and orchestration you do feels like pre-work that happened to be exactly what agents want. What do agents want differently than humans?Jake [00:25:37]: They want the ability to version things. It's not that different; it materializes slightly differently. Agents want a way to test changes incrementally. Engineers have feature flags. Is there a reason agents can't use feature flags? I don't think so.Jake [00:25:54]: They want version control. Can we use Git or not Git? That one is up in the air. I think something outside Git will emerge for how we version these things over time. They need observability. You need to query what happened, when it happened, which steps failed, traces, logs, metrics, and all the rest. They need network, compute, and storage. They need to write files, save files, iterate on files, and snapshot file systems.Jake [00:26:25]: A lot of what humans needed is in line with what agents need. Branching and forking are not different; we're just moving 1,000 times quicker. It can look like you need something massively different, but what you need is something massively better than what existed. You need orchestration massively better than Kubernetes. You need networking probably better than Envoy. It goes all the way down the stack.Jake [00:26:55]: If the workload profile doesn't change so much as it gets massively compressed because you need thousands of these things, what assumptions change? etcd is going to melt. You need to replace it with something. You can go all the way down the stack and say, “That part has to change, that part has to change, and that part has to change.”Jake [00:27:19]: The interesting thing about the super-exponential curve is that you have to build systems where you can rip out those parts at any time because a new bottleneck might emerge. You get good at parallel agents, and a different part of the system breaks. So it's similar to what humans needed, but at 1,000x scale.Jake [00:27:55]: How do you do code review in the age of agents?Swyx [00:28:00]: You throw more agents at it.Jake [00:28:01]: You don't. But then who reviews for CVEs and all these other things?Swyx [00:28:07]: More agents.Jake [00:28:08]: And that's how we hit the inference wall. You can continually throw agents at the problem, but I think there's a limit to the number of agents you can throw at a problem.CLI, Agent Handles, and Closing the LoopSwyx [00:28:24]: You already had a CLI before it was cool. How is the shape of what you're exposing changing, if at all?Jake [00:28:28]: CLIs have always been cool. The CLI changes because we think about how to give Claude, Codex, ChatGPT, or any model a handhold.Jake [00:28:50]: A CLI is a single command: deploy, get logs, and so on. Things that were prohibitively annoying to humans are not annoying to agents. They're nice. If I handed you a CLI with 40 arguments and 600 flags, you'd think, “I'm never going to use all of this.” But if you hand it to an agent, it says, “This is excellent. I have so many handles to work with.”Jake [00:29:24]: If you're going to expose things to agents that way, you want as many handles as possible where they can get information, query dynamic information, and close the loop quickly. Most problems right now are about how to close the loop as quickly as possible. Where does the agent get stuck, and how can you remove that?Jake [00:29:49]: Telemetry is important. If you can tell where the agent gets stuck from the CLI and say, “12% of people deviate from the happy path because of this, and now I add this argument and drive it down to 2%,” you massively increase the rate of loop closure.Jake [00:30:03]: That's how we think about not just the CLI, but every point in the dashboard. It's a user journey: I hear about Railway. I get something deployed. I get my first green build or aha moment. I see an endpoint, logs, whatever. Then I iterate. The iteration loop is indefinite. The user wants to deploy a new thing, a Postgres instance, change code, and keep iterating.Jake [00:30:36]: If you focus on the iteration loops and what's blocking them from closing quickly, one thing we say internally is: you never want to be waiting on compute anymore. You always want to be waiting on intelligence. If you're waiting on compute, there's a bottleneck that needs to be destroyed because eventually that bottleneck becomes so large that another workflow emerges to change it.Jake [00:31:04]: We've built a product where you push code, build it, and so on. But I fundamentally believe the push-pull loop is going away. We'll get to a point where you make a small change in production, that change is versioned across your infrastructure, you're working alongside copy-on-write versions of your database and infrastructure, and then you merge it in and it's instantaneously live. That's the holy grail of loops. The push-pull-rebuild thing is a point of friction that we're removing entirely.Canvas as Output: Dashboards, Context Anchors, and HyperstructuresSwyx [00:31:43]: It's incredibly fast. If anyone hasn't tried it, that fast feedback is great. My hot take is that Railway was famous for its canvas, which visualizes your infrastructure and lets you manipulate it visually. But that was for humans. For the next phase of growth, Railway CLI is more important than canvas.Jake [00:32:05]: The canvas is funny because it's a mechanism to show changes over time. You're right that previously we used it a lot as an input. Moving forward, its goal is more like an output. You would go to the canvas, make changes, see them, and watch your infrastructure evolve. Now agents have access to the CLI and can make those changes. So the canvas becomes an output: what information does the human need at this moment to make suitable decisions about control requests? Do I approve this or not?Jake [00:32:57]: It also has to be an anchor for your context, a port in the storm. Think of it like layers in a file system. You start with a project, then drill down into services, then into a function or code, because you want to represent the entire thing not just in your head, but in the canvas. Other people can share that representation, think on the same wavelength, and move quickly.Jake [00:33:33]: A lot of organizations get in trouble as they scale because all the context lives in someone's head. “How does this microservice work?” “I have no idea; go ask this person.” Then you have whole categories of products built around context discovery. A lot of that melts away if you have a solid hierarchy and can infinitely nest services, code, context, and everything else all the way down. That's what lets you build these structures over time.Jake [00:34:18]: It's also what lets us build what I've called hyperstructures: things that are way bigger. You look at the Golden Gate Bridge and ask, “How did we build that?” There's a meme that we lost the technology. To some extent, yes, because the coordination that built those things evolved and changed. We lost some of the art of building structure as we jammed everything into Slack.Swyx [00:34:52]: But you jam everything in Discord.Jake [00:34:53]: Same point. It doesn't matter. It's message passing and interrupts, message passing and interrupts.Swyx [00:35:00]: So you're arguing there should be something better and more structured than Slack?Jake [00:35:04]: Yeah. For sure. I think Slack is awful, and Discord is awful too.Central Station: Context Routing, Support, and Incident ClustersSwyx [00:35:09]: This is the equivalent of my mom test. What have you done that has your solution to this?Jake [00:35:15]: Internally, we've built a tool called Central Station that aggregates all the context from our users. Every piece of feedback, every customer support item, everything gets aggregated into clusters. If an incident is brewing, we can determine how many users are affected and break off a discussion based on that.Jake [00:35:40]: That is more helpful than long-running channels where you're trying to decide which channel to put something in. If you can dynamically aggregate information and dynamically route it to the right person based on context, it works better. We know internally that these four people are close to networking. If we see a networking thing, we can drill it down to those four people. If it's with this part, we can look at the commits. This is no longer a manual process internally.Jake [00:36:13]: If you go to station or help.railway.com, that's why we built it. We wanted to scale with a massive amount of leverage by aggregating feedback.Swyx [00:36:27]: This is built in-house?Jake [00:36:28]: Yep.Swyx [00:36:29]: I remember helping out on this one with Angelo in 2023. You scale a lot with a very small team.Jake [00:36:38]: Yeah. We're about 10 times bigger now.Swyx [00:36:40]: You have your full developer code here? Very cool.Jake [00:36:44]: If you go to railway.com/stats, we expose this as a pub-sub-able thing. It's all real-time metrics. There's a way to get it as JSON somewhere if you care.Jake [00:37:01]: We're big on trying to build everything in public and talk about what we're working on. We've had issues in the past, and we'll say, “Here's how we're fixing these things.” We've gotten compliments and flak for incident reports. We're always trying to make them better and talk with people.Incidents, Disclosure, and Progressive RolloutsSwyx [00:37:20]: You had a big one recently. I liked that it was scoped to 3,000. You presumably used Central Station. Talk through what happened and how you address it internally as a team.Jake [00:37:38]: Internally, this one really sucked. It had to do with an upstream provider that didn't do the behavior it said it documented, which is unfortunate given they wrote the RFC for how the behavior should work. We rolled those things out, and Central Station caught it initially when a couple users said caches weren't invalidating. We turned it off immediately.Jake [00:38:03]: When you roll out to a large user base of three million people, you get a lot of disparate behaviors. We tested in staging and had tests, but we hit an edge case. We've hardened those systems, and now we can make that better. But it was a tough one.Swyx [00:38:39]: I always wonder how private disclosure is supposed to work if people find an issue. Are they supposed to contact you first? When you run a platform, these things will happen. What channels should people pursue to quietly resolve it before it becomes a bigger incident?Jake [00:38:59]: There's responsible disclosure. We err on the side of over-disclosing and letting you know something is wrong versus having your provider gaslight you. We've erred on sharing those things more publicly, even if they impact a small subset of users. That's a decision we've made internally. We have four values. One is honor. The honorable thing is to notify people to the widest degree at which they may have been affected or there was an issue, and then confront it head-on: why did it happen, what can we do better?Swyx [00:39:45]: Not the whole user base. That's because of incremental rollouts and other things?Jake [00:39:50]: Yeah. Progressive rollouts.Swyx [00:39:54]: That should be the norm at all large platforms.Jake [00:39:58]: It should. A variety of companies do this. There's the quote that Meta runs 10,000 different versions of Meta. To our earlier point about agents, they need the same thing. They need shadow traffic and all these other things. We've built so much ceremony around production being sacred that we need to make it trivially easy to test different behaviors in a safe environment. Then you can make mistakes in a safe environment.Safe AI SRE: Customer Agents, Forked Environments, and Production ParityAlessio [00:40:30]: Do you see a world where these things get automatically caught, not necessarily by your agent, but by your customer's agent? The cache invalidation issue seems easy to check if you know to look for it.Jake [00:40:44]: It's hard because to determine it, we almost need to hook into your observability infrastructure. That's why we have the template loop on the platform: so you can roll things out progressively. You can roll out to Johnny Vibe Coder initially, or push a shard that someone consumes at their own leisure. Or you can roll it out over weeks: 0.1% of people, 1% of people, early adopters, then all the way up. That's the non-deterministic version control we talked about earlier.Jake [00:41:30]: I believe that's where most things should go, because most companies end up building staged rollout systems in-house. It's the same thing built again and again at every company. There's a massive opportunity to consolidate developer debt.Alessio [00:41:45]: You should have a free tier. Model providers give free tokens if you let them use the data. You could give free compute if someone is the number-one shard that goes out and lets you plug into their observability.Jake [00:41:55]: We do that. That's why we talked about the impact on 3,000 people. We start with lower-impact people. Larger companies on the platform are last to receive those rollouts so they have a version of the platform that's deeply stable.Alessio [00:42:16]: I have three services, so I'm sure I get the first rollout. You can nuke my thing at any time. There are all these SRE agent companies. Observability people also want agents that fix upstream problems. You have your own agent in the canvas now. How do you see that playing out?Jake [00:42:39]: It's the stacking entropy problem. If you don't have primitives to make iteration in production safe, it becomes difficult. If you're an observability provider saying, “Here's the fix to this error,” assume 80% are good and make sense. But in the last 20% long tail of complex issues, if you let somebody stamp it, you create an opportunity for an incident.Jake [00:43:08]: That's why forked environments are important. People have staging, but it always drifts from production. You need primitives, workflows, and experience built first-party on the platform so you can fork any service at any point in time.Jake [00:43:33]: I think of the canvas as a sheet of transparency paper. The agent is a little guy you push up into the canvas. It should say, “I need to copy that service and that service so I can test these two things.” It gets a read-only copy of production. Anything that's PII gets marked as a transform when we clone the database, create a copy-on-write version, or read from it. Then the agent makes changes and asks, “Does this actually work?” as close to production as possible.Jake [00:44:22]: That's how close you have to be, or you get massive drift. The system becomes unstable. You see this with massive systems built on Docker for local, Kubernetes for production, and a specific thing for something else. That complexity slows developers and becomes unstable at scale, making it hard to iterate. We want to compress that way down and say, “As close to prod as possible is where we want to be.”From AISRE Skeptic to Agent BelieverSwyx [00:45:00]: I was texting Erica for questions, and she says you were originally not a believer in AISRE. Have you come around on it?Jake [00:45:10]: I flipped, but I'm still not a believer in AISRE if you don't have the primitives to make it safe. If you unleash AISRE on production infrastructure without safe primitives for copying volumes and making sure things are fine, it's going to nuke your production database. It's not a matter of if, but when. I'm a big believer in making those loops safe.Jake [00:45:33]: I was a deep AI skeptic until 2023. In 2024, I thought, “Maybe I can roughly make this thing do it.” In 2025, I thought, “Now I can hold this.” Over winter break, everybody came back saying, “It's almost impossible to hold this.”Swyx [00:46:01]: Did you see this on the Claude docs? CloudBot? OpenCloud?Jake [00:46:06]: It's gotten to a point where it's harder to hold it wrong than to hold it right. There's a scene in Avengers where Vision picks up Thor's hammer and says it's terribly well-balanced. It self-balances and works well. I'm a deep believer at this point that this will be the dominant species: assembly, C, C++, JavaScript, words.Swyx [00:46:35]: It feels like a big jump.Jake [00:46:37]: It is. But it's not like you abandon CPU-based discrete logic and move straight to fuzzy logic. You need both. Your skills should call code or applications or some static structure. You can use skills to distill what the procedure should be or how the code should act.Jake [00:47:02]: I'm coming to a thesis: you need three points. You need a clear spec defining the system, the code, and the tests. When you say it out loud, if you've been in engineering long enough, you're like, “Of course. That's an RFC, tests, and code.” But they all matter. Having them together lets them reinforce each other: the spec and tests match, but the code doesn't, so reconcile it. Or the tests and code match but the spec doesn't, so reconcile that. That's the iteration loop.Jake [00:47:41]: That's why you're seeing people talk about software factories, docs, and reconciliation. Some of that is architectural astronomy if you don't implement it, but that loop is where most things will end up.Swyx [00:48:07]: For listeners, we've been talking about this on the pod for three years: the holy trinity of specs and tests. Itamar Friedman from Qodo is the reference if people want to look it up.Self-Modifying Infrastructure and the End of Push-Pull-RebuildSwyx [00:48:18]: One thing I want to mention on the OpenCloud idea is self-modification. I don't know how Railway would support it, but I have my OpenClaw, and I just tell it it has the Railway CLI and can do whatever. In theory, whatever capabilities or new infra it needs, it can call the Railway CLI, provision it, and add it to itself. The agent can modify its own infra.Jake [00:48:45]: It's nuts. I have a loop set up where you put the Railway CLI on top of something that runs on Railway. You're authenticated as whatever the current box is, and you can make any changes to it. Then you call Railway deploy, and it deploys itself.Jake [00:49:04]: It's like: “I need to spin up this instance of this environment. I already exist in this environment. Excellent, I have access to a Postgres instance now.” That's where we want to go with agentic, self-replicating infrastructure. That's your loop: iterate in production. You continue making changes. If it works, merge it upstream. If it doesn't, throw it away.Jake [00:49:37]: How do you make throwaway copies trivial to spin up and super cheap? The era of “I have an AWS instance with four vCPU and 16 gigs of RAM” is going to get destroyed. If you do that for agents, you need a thousand of those machines. It's prohibitively expensive compared with what we've spent a ton of time figuring out: the atomic unit of deploy, whether you call it isolates, sandboxes, or something else. Only pay for what you use, spin up instantaneously, and close the loop as quickly as possible.Jake [00:50:15]: If the system can self-replicate safely and say, “This is my environment, I'm making these changes,” it can come back with, “Does this look good? This is a new state of infrastructure given this prompt. I think I've solved it.” Then you go back and say, “Actually, it looks different.” It does the loop again. Then you say, “Cool. Apply.”Swyx [00:50:38]: That's retroactively obvious, which is the most useful kind. Any other comments on agent deployment on Railway?Jake [00:50:51]: It's getting better every day. I'm on X or Twitter. You can always yell at me about the parts not working as well as they should, because plenty of things should work way better.The New Serverless: Stateful, Long-Running, Pay-for-What-You-Use LinuxSwyx [00:51:04]: At this stage, when people want massively or embarrassingly parallel compute, they usually talk serverless. I feel like there's a new serverless compared to the previous five years of serverless. You're in that new bucket. Do you have comparisons or philosophical differences you want to call out?Jake [00:51:31]: It's somewhere in between. It's the ability to run stateful, long-running workflows or executions.Swyx [00:51:42]: Vercel has Fluid Compute, Cloudflare has some container thing, Google has App Runner and others.Jake [00:51:55]: That's where everything is roughly going, and it's why we've been working on this for six years. We believe users need access to a computer: a box that speaks Linux. They need to deploy what they want. Other systems change the surface area of what you can build. For us, users need a computer and need to deploy anything they truly want. That's why we've focused on the primitives: network, compute, storage. If we give you those and expose them so you can run things indefinitely, that's where we believe it's going.Jake [00:52:43]: Twitter has no nuance, so everyone says “servers” or “serverless.” It's always somewhere in the middle: I want to run it for a long time, but I don't want to provision the resource statically or pay for things I'm not using. That's been our thesis from day one: pay only for what you use, run it indefinitely, and it is full Linux.Swyx [00:53:12]: That's why I like the naming of Fluid. It's fluid. Flexible.Heroku, Focus, and Carrying the Torch Without Becoming the PastSwyx [00:53:18]: Another milestone is the Heroku official deprecation. You're one of the presumptive new Herokus. “New Heroku” has been a category for as long as I've been in developer tooling. It's finally happening. What was that like? Any behind-the-scenes of, “This is the moment”?Jake [00:53:42]: You have people where you're like, “You were running stuff on here? You, as this company?” It's crazy that names you would know are running on it and now coming to us saying, “We want to move a lot of this off.”Swyx [00:54:00]: Any behind-the-scenes on why Salesforce let Heroku stagnate?Jake [00:54:05]: I can only guess. It's hard when it's not your business. Salesforce's business is to build a great CRM. That's their focus. Then you acquire a compute business as an offshoot. A lot of early Meta people talk about focus. Boz has a write-up about how in the early days of Meta they had no money, so they were forced to focus. Then they turned on the money tree and had no reason not to split their focus.Jake [00:54:52]: But that dilutes your product. You get offshoots where you ask, “Is this the focus of the business?” If it's not core, it languishes. A lot of companies get in trouble when they split focus because they're fighting a multi-front war, not just externally but internally for alignment. Where are we going? What are we doing? What is our purpose?Jake [00:55:24]: If you're Salesforce-built and mission-driven, you want to work on Salesforce. Heroku is off to the side. It's not core to the business. Getting resources, budget, focus, and alignment internally becomes hard. It was a matter of time.Swyx [00:56:06]: Kudos for them to call it out instead of leaving it unknown.Jake [00:56:12]: Their release was a little odd. They called it out, but they didn't say they were shutting it down. Behind the scenes, I think they issued messages to people saying they should close accounts and that they were going to deprecate and remove things over time.Jake [00:56:30]: It's crazy because some of my first deployment experiences were on Heroku. You start with dragging things into an FTP server, then you try to get a deploy working, and then it's Heroku. It was the on-ramp for us. But the wheel turns. New things emerge. We're happy to carry the torch for a lot of that. But we don't want to be the new Heroku. We want to be the way people build and deploy software, and ultimately the way people monetize software over time.Swyx [00:57:19]: It's still a big crown to be the new Heroku. There are 50 companies that fought for that.Jake [00:57:23]: Everybody is holding some portion of it. We're happy to support people and companies. The platform works differently. The game loop is similar, but we've been dogmatic about where these things are going: primitives, agents, fan-out. Some things fit; some workflows need to change. We have an approximation of Heroku pipelines with the environment system. It's exciting. We've got a ton of people we can support, and it's growing a lot.Temporal, Workflow Engines, and State MachinesSwyx [00:58:12]: I have one more technical question about Temporal. I've sold my shares. You're a power user and one of our earliest customers. I met you through Temporal. You built on Temporal. You have complaints. This may be the most neutral and informed conversation anyone will hear about Temporal without someone working at the company.Jake [00:58:39]: That's fair. I've used Temporal for almost 10 years because of Cadence at Uber.Swyx [00:58:52]: Give people a sense of what Cadence was at Uber.Jake [00:58:57]: Cadence was the precursor to Temporal. It powers trip actions, rides, when you rent a Jump bike or scooter or car. You're running workflows for a period of time and saying, “This ride will run indefinitely until it finishes.” You attach information: you paused in this zone, so add this charge to the bill. When you end the trip, the workflow is done. That experience was powered by Cadence at the time.Swyx [00:59:34]: I used to say it's like programming the entire user journey top-down as one function.Jake [00:59:39]: It's a powerful idea and important. It's also important for the next phase of the agentic journey. You want an agent to do a specific task, be complete or incomplete on that task, and move on to the next thing. You need a way to manage workflows dynamically.Jake [00:59:59]: Temporal was always great in theory, and great when you got it working the way you wanted in production. But it required you to model the entire journey in your head. If you didn't, you could cause issues where replaying the state of the workflow causes non-determinism.Swyx [01:00:25]: Because it works on deterministic workflow history.Jake [01:00:28]: Exactly. I describe it as a jet engine. If you know how to operate it and run it, it's great. But you can't hand it to people trying to build complicated things if they don't have the whole state in their head.Jake [01:00:48]: We run our whole deployment pipeline on top of it. That's a reasonably complicated workflow: pre-commit hooks, signaling, queuing, and all the rest. We ran into the same thing at Uber. As you express a large workflow, it gets more complicated, with more states in the state machine that you have to map back to the workflow.Swyx [01:01:15]: It's a lot of ifs.Jake [01:01:16]: Exactly. At Uber, we built a system for doing the state machine and testing it. We've started to build some of those things here because it's grown heavily. It's not quite love-hate. When it works well, it works super well. But if someone who doesn't have full context puts something into the system that invalidates state or causes non-determinism, or spins off a ton of activities, you have to keep track of underlying SRE knobs like activity slots. Those should scale with memory, vCPU, and so on. It becomes a bear to scale.Swyx [01:02:10]: You need a capable sysadmin running things behind the scenes. If you moved off, what would you do?Jake [01:02:19]: We'd build our own workflow engine. We have a few internally that we've worked on.Swyx [01:02:27]: This is one of those classes of things you typically wouldn't vibe code, but I'm wondering if you can.Jake [01:02:33]: I still don't think you should vibe code it. You still want to run decent tests to make sure it works.Swyx [01:02:39]: Timo didn't invent that from scratch either. There are libraries you can run. On top of that, it's just a state machine that you have to map out. Ultimately, you define the instructions you want and run them through a state machine.Jake [01:03:00]: It's very doable. Workflow stuff is interesting. Restate is doing neat stuff here.Swyx [01:03:10]: You're tied into JavaScript. Are you a JavaScript maxi?Jake [01:03:13]: Internally, we have TypeScript, Rust, and Go. We don't add more languages. Actually, we have a little C because we write BPF code and hooks. But those are the languages.Swyx [01:03:28]: Is this for sidecars?Jake [01:03:32]: No. It's for the networking stack, volumes, and things like that. We use TypeScript a lot because it powers the dashboard, but we're moving a lot of workflow stuff off the dashboard stack and into the infrastructure stack.Railpack, Nixpacks, and Content-Addressable FilesystemsSwyx [01:04:00]: Cool. Any other technical infrastructure stuff? Railpacks?Jake [01:04:07]: We built an engine for determining dependencies based on source code. It's called Railpack. We built the first version, Nixpacks, on top of Nix, and then we moved.Swyx [01:04:17]: People have been trying to get me to adopt Nix and NixOS for four years. Is it ever going to be a thing?Jake [01:04:23]: I don't know. We're excited about it, but it has pain points. Think of it as a stack of versioned binaries at specific slices in time. If you want version X and version Y, you bloat the package space, which blows up image size and makes real-world workloads difficult.Swyx [01:04:53]: But you content-address it and cache it. In theory, there are optimizations.Jake [01:05:00]: In theory, yes. But with a large enough user base and disparate enough machines, you run into a problem Meta described in the XFAAS paper, their internal serverless system. It becomes difficult at scale unless you break out specific runtimes.Jake [01:05:24]: We didn't want to do that because we wanted to truly allow you to deploy anything. That was our initial thing with Nix. But we've moved toward interesting work around content-addressable file systems that can lazy-load anything from any point and page it into memory.Swyx [01:05:48]: Amazing.Jake [01:05:49]: The future is very bright. It's crazy, and it's going to be nuts.Coding Agent Spend, Roadmaps, and Token ROISwyx [01:05:54]: Founder journey stuff?Alessio [01:05:56]: Your cloud usage: you tweeted you're going to spend $300K this month?Jake [01:06:01]: I think we got to $200K.Alessio [01:06:02]: Coding agents?Jake [01:06:03]: Yeah.Swyx [01:06:04]: Across the company?Alessio [01:06:05]: You only have 35 people, so I'm sure they're not all spending $10K a month. What's the distribution?Jake [01:06:10]: I think I'm at about $25K. We have power users all the way down. We came back from winter break, and I basically said, “If you're writing code by hand, you're doing this wrong.” The tools are good enough now that you can move extremely quickly. There are issues and pain points, but you should be reviewing the code you are writing instead of writing it by hand.Jake [01:06:40]: Architectural patterns matter more now than ever, but you shouldn't spend your time generating code you would write. If you know how to write it, ask the agent to write it and reconcile it until it looks like you would have written it yourself.Jake [01:06:58]: People misconstrue my propensity to push people toward agents as connected to our growth and some reliability bumps. They're not necessarily related. The tools are good enough to move extremely quickly and build things way larger than you could before.Jake [01:07:19]: To the earlier point about cooling data centers in space: I don't know. But with software, you can ask, “How would I build block storage from scratch? How would I do these things?” I have ideas because I have history and have read papers. Let me work them out and build massive test benches with thousands of tests, because those are now free to author. If you're not using AI systems to speed-run your roadmap and reconcile your existing system onto the future, you're missing a large point of what's happening.Alessio [01:08:12]: What's the path to spending $3 million a month? Is it bound by ideas and things customers can absorb?Jake [01:08:19]: For most companies, it's bound by deployment at this point. That's why we've seen a massive boom in users and companies, from Fortune 50s down, asking how to get developers to move faster. You'll probably hit your CFO before any technical limits because they'll look at the eye-watering amount of money spent on tokens. Inference costs have to come down, but we're inference constrained now. There will be price discovery around what makes sense for an org to adopt.Jake [01:09:06]: I think you'll end up with the F1 driver concept. If someone is really adept at these things, it makes sense to put them in a $3 million car. If they're not, it probably doesn't make sense. You'll take a few people and say, “You can drive the F1 car. We need to go in this direction. Figure out if it works and prototype it.”Jake [01:09:33]: We've done some of that and vastly accelerated our roadmap. We thought we'd ship something in a few years; now we can probably ship it in a few months because we validated it and don't have to build it incrementally. We can skip steps and move toward our vision.Alessio [01:09:58]: A lot of people are realizing the roadmap doesn't always have a business impact, so they say tokens are too expensive. But if your roadmap were built to make more money by the time you built it, you'd have token pricing for it, the same way you do with sales. You'd spend a billion dollars on sales if you knew you would get $2 billion of revenue.Jake [01:10:19]: Exactly. A naive way to measure this is the percentage of tokens that end up in production. If you can measure impact because those tokens end up in production, that's awesome. But the burden of proof will rise. Internally, we have a growing number of pull requests that haven't merged. The question becomes: how do you get this into production? It's about how quickly you can build and deploy software, which is exciting because that's our whole thing.The SDLC Shift: Prompt Requests, Feature Flags, and Safe RolloutsSwyx [01:10:56]: The SDLC is changing. One thesis is that the pull request is dying. It's going to be the prompt request. Beyond that, code review is also kind of dying if you have all the other systems in place. What else is changing about the SDLC?Jake [01:11:19]: The AISRE and the tools to make it happen. AISRE is pie-in-the-sky aspirational. What does it take to get an AISRE? What tools do you need to build?Swyx [01:11:32]: You should expose your tooling to customers at some point. The Central Station command center.Jake [01:11:39]: We have it for template maintainers. Template maintainers can deploy and maintain templates, and they get feedback. We're going to expose those things incrementally.Swyx [01:11:51]: Clustering around incidents. Everyone has a version of that, but I don't think anyone has solved it.Jake [01:11:56]: I won't say we've solved it internally, but it's gotten so good that we can see incidents forming pretty quickly. At some point, those will be things either someone else builds or we build. We've always built things purpose-built for us. If it makes sense to make it useful for users, monetize it, or turn that loop into a profit center instead of a cost center, we want to do that.Jake [01:12:28]: Pull request is definitely dying.Swyx [01:12:29]: Do you do first-party feature flagging and incremental rollout stuff?Jake [01:12:34]: We have a feature-flagging engine we built internally and will eventually roll out.Swyx [01:12:38]: I don't see it as a user. How come you didn't give us what you have?Jake [01:12:43]: We have to beta test it. We care a lot about the quality of the things. There's plenty we've used internally that doesn't make it all the way through the journey because it fails. It works for one service but not multiple services. We'd have to build it for multiple services and know that if we released it, we'd rebuild it again and again. Some things are worth that, but many inform the roadmap.Jake [01:13:18]: We don't want to dilute the experience by saying, “This works, but only for this service,” unless it's a core initiative. Over the next few months, we'll roll out things that work for a single service, then multiple services, then multiple services across the environment. You have to be deliberate. Otherwise you create broken disparate experiences and support load because people ask how to use the feature.Jake [01:13:52]: It's the earlier expansion and compaction pattern. You expand the company to get features, then compact and smooth them out so the experience is stellar. You told me in the hallway, “It's gotten so much better.” Internally we're saying, “This part really sucks. We need to make it significantly better.”Swyx [01:14:11]: I can attest to that over the last three years watching you build Railway. For listeners, feature flagging is a huge part of Uber culture. So much so that they have too many feature flags and another thing to remove feature flags. Facebook has Gatekeeper. Agents are going to need this. It's fundamental to incremental rollouts. OpenAI acquired Statsig. GPT-5 is routing and flagging through different models.Jake [01:14:56]: It's super important. If the software development lifecycle is going to change because we're doing things 1,000 times faster and 1,000 times more concurrently, what becomes important at scale?Jake [01:15:16]: Before I started Railway, I built a feature-flagging product and tried to sell it. It was an easier version of LaunchDarkly. I ran into a problem: anyone small enough to adopt your technology doesn't care about feature flags, and anyone large enough to need feature flags needs so much scale that you have to build out all the infrastructure. I scrapped it.Jake [01:15:42]: But what is old is new again. Companies are trying to move quickly, but you can't YOLO a vibe-coded thing straight into production. You need to say, “Here's my blast radius, my impact, and I want to shadow it for these users.” Feature flags. You're going to need the tools larger companies built to maintain their structures. Everything gets compressed by 1,000x so everybody can build those structures quickly.Jake [01:16:07]: That's exactly where we are: compressing the software development lifecycle, then expanding it and adding more new things.Cattle, Pets, and Clonable InfrastructureSwyx [01:16:15]: Another term that comes to mind for newer developers is “cattle, not pets.” People treat production like a pet. It has a name. You baby it and keep it alive. With cattle, you can mass farm, roll out, portion parts out, and kill them.Jake [01:16:37]: I think that might change. You can move toward having pets as long as you have a cloning machine for your pets.Swyx [01:16:52]: Yeah.Jake [01:16:52]: If you can snapshot every single thing at every frame, it doesn't matter if something gets obliterated because you have a snapshot of it. The things we've built right now are designed to block changes from the hermetically sealed DevOps line. You have to write a Dockerfile because you nee

Noticentro
Alertan por temporal de lluvias en CDMX

Noticentro

Play Episode Listen Later May 16, 2026 1:40 Transcription Available


Controlan fuga y alistan reparaciones en Xochimilco Museo Regional de Puebla cumple 50 añosTrump, analiza levantar sanciones a empresas chinas Más información en nuestro Podcast#grc

MLOps.community
Agents are Just While Loops

MLOps.community

Play Episode Listen Later May 15, 2026 41:11


Hamza Tahir, co-founder of ZenML, joins the show to cut through the hype around long-running agents — arguing that at the end of the day, an agent is just a while loop that talks to a model, calls a tool, and writes to a file system. He covers the architecture of agent harnesses (inner and outer), what durable execution actually guarantees (and what it doesn't), and why the ML pipeline paradigm is a cleaner mental model than transactions for most agent workloads.Hamza also announces Kitaru — ZenML's new open-source execution runtime for async Python agents — built on five years of running ML workloads in enterprise environments.What we get into:Agents are while loops: The surprising simplicity under all the tooling: a brain (LLM), hands (tool calls), and a file system, stacked recursivelyInner harness vs outer harness: Why Pydantic AI owns the inner loop while production deployment needs a separate runtime layerWhat "long-running" actually means: Why the infrastructure we need to build is about extrapolating the future, not defining a time window todayDurable execution demystified: What checkpointing actually guarantees (infra failures, pod death, network drops) vs. what it never will (external state, bad LLM outputs, Snowflake rollbacks)ML pipelines vs transactions: Why bursty containers in Kubernetes map more naturally to agent workloads than microsecond-latency queue workers — and why Hamza argues against the complexity taxAnthropic opening the harness: Why letting other models run Claude Cowork is a "boss move," and what it means for the one-harness vs one-model debateHuman-in-the-loop, done right: The pod-kill-and-resume pattern, and why warm pools matter less when your agent runs for daysKitaru: ZenML's new open source durable execution runtime: zero-config local, Kubernetes/SageMaker/Vertex in production, built on Pydantic AI integrationArguing with Claude about Temporal: Hamza's story of spending hours getting an LLM to admit ZenML and Temporal solves the same problemIf you're architecting agents for production, picking between Pydantic AI, LangGraph, and Temporal, or just want to understand what "durable execution" actually means — this is the episode.// LINKS & RESOURCESKitaru on GitHub: https://github.com/zenml-io/kitaruKitaru launch blog post: https://www.zenml.io/blog/kitaru-launchKitaru on Hacker News: https://news.ycombinator.com/item?id=47520115Hamza Tahir on LinkedIn: https://www.linkedin.com/in/hamzatahirofficial/ZenML: https://www.zenml.io/ Timestamps[00:00] While Loop Checkpointing[00:24] Long-Running Agents Explained[01:28] Agent Harness Model Definitions[06:30] Durability and State Recovery[11:03] Agent Systems Layers[18:45] Durability in Agent Systems[22:07] ML Pipeline vs Transactions[29:23] Durability vs Guarantees[33:13] Durability vs Chaos Engineering[39:50] Kitaru Naming and Purpose[40:38] Wrap up#AIAgents #DurableExecution #OpenSource

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0
AI-Native Healthcare: 100M Doctor Visits, 10–20 Hours Saved, Prior Auth in Minutes — Janie Lee & Chai Asawa, Abridge

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

Play Episode Listen Later May 14, 2026 65:20


Special discounts up for AIE Melbourne (LS discount) and AIE World's Fair (group discounts up to 25% - CFPs still open for Autoresearch and Vertical AI) Cya there!Abridge did not start as an “GPT wrapper”. It was founded in 2018, years before the Cambrian explosion of AI application layer companies. OpenAI launched ChatGPT publicly on November 30, 2022 and by then, Abridge had already spent years doing the unglamorous work of building trust for one of the highest context, most important workflows in healthcare: the conversation between a patient and a clinician.Abridge's original wedge was clinical documentation. Listen to the visit, generate the note, reduce the clerical burden, and let clinicians spend more time with patients instead of the EHR. By focusing on how doctors actually document, how health systems actually buy, how EHR integration actually works, how clinicians verify outputs, and how missing context during a visit turns into downstream friction across billing, prior authorization, quality, and follow-up, the adoption of LLMs became a force multiplier on a workflow already optimized for sensitive context gathering.The company has scaled fast: Abridge says it is projected to support 80M+ patient-clinician conversations this year across 250 large and complex U.S. health systems, with support for 28+ languages and 50+ specialties. It raised $300M at a $5.3B valuation in June 2025, after a $250M round earlier that year.Today, Janie Lee and Chaitanya “Chai” Asawa of Abridge join us for another crossover pod with Redpoint's Jacob Effron (who is on the board of Abridge) to dive into how Abridge is building the clinical intelligence layer for healthcare starting with ambient documentation, then expanding into clinical decision support, prior authorization, payer/provider/pharma workflows, and eventually real-time agents that act before, during, and after the patient conversation. We go inside the product, data, infra, evals, workflow, privacy, and org design choices behind bringing AI into one of the highest-stakes enterprise environments from 100M+ medical conversations and specialty-specific evals to real-time alerts, EHR integration, de-identification, clinician-scientist teams, and why healthcare may solve some of the hardest AI problems first.We discuss:* Why Abridge started with clinical documentation, “pajama time,” and saving clinicians 10–20 hours a week* The transition from ambient scribe to clinical intelligence layer: save time, save money, and save lives* Why conversations between patients and clinicians may be the most important workflow in healthcare (patient visit summary feature)* Chai's “healthcare-coded Glean” framing: context is king, but healthcare raises the stakes on safety, evals, and rollout* Why Abridge wants AI to feel like “air conditioning”: always in the background, but only interrupting when it truly matters* The prior authorization example: turning a denied MRI weeks later into real-time guidance while the patient is still in the room* Why payer policies, EHR data, medical literature, and hospital-specific guidelines make the problem hard, and also create the moat* How Abridge thinks about ambient form factors: mobile, desktop, in-room devices, nursing workflows, multimodality, and future AR* The multi-sided healthcare customer: CMIOs, CFOs, CIOs, clinicians, patients, payers, and pharma* The hardest AI problem at Abridge: high-quality, low-latency, low-cost real-time support in a high-stakes clinical setting* When Abridge uses frontier models vs proprietary models, and why its unique data from medical conversations matters* Why “every agent is a coding agent underneath,” and how the EHR can be thought of as a filesystem for healthcare agents* How Abridge approaches personalization across individual doctors, specialties, and health systems* Why “AI slop” is AI without context, and how edits, memories, and clinician preferences create a data flywheel* Abridge's eval stack: LFDs, LLM judges, in-house clinicians, third-party evaluators, specialty-specific evals, and progressive rollout* HIPAA, PHI, de-identification, one-way anonymization, customer contracts, and learning from healthcare data safely* What changes when you operate at 100M+ conversations: reliability, cost, post-training, model routing, and infrastructure optimization* Why the same clinical conversation can serve doctors, patients, payers, pharma, and future clinical-trial workflows* How Abridge works with EHRs, and why deep interoperability is table stakes for clinician adoption* Why healthcare AI has regulatory tailwinds, why 80/20 does not work here, and why high-stakes domains may drive AI forward* Why Abridge embeds “clinician scientists” into product and eval teams* What Chai learned from Glean about search, quality, and durable AI infrastructure* Why the future of AI infra may look like context layers, event-driven systems, Kafka, Temporal, sockets, CRDTs, and tools built for humans* Why Janie changed her mind on “PRDs are dead,” and why crisp written clarity matters more in complex AI products* How Abridge uses Claude Code, Cursor, and coding agents internallyAbridge:* Website: https://www.abridge.com/* X: https://x.com/AbridgeHQJanie Lee:* LinkedIn: https://www.linkedin.com/in/janiejleeChaitanya “Chai” Asawa:* LinkedIn: https://www.linkedin.com/in/casawaTimestamps00:00:00 Introduction and what Abridge does00:02:05 From ambient documentation to clinical intelligence00:04:04 Clinical decision support and context as king00:06:57 Alert fatigue, proactive intelligence, and prior authorization00:12:36 Ambient AI form factors and healthcare customers00:16:59 The hardest AI problems in healthcare00:18:26 Frontier models, proprietary data, and model strategy00:21:07 The EHR as a filesystem for agents00:24:03 Personalization, memory, and clinician preferences00:30:40 Evals, LLM judges, and progressive rollout00:36:47 HIPAA, de-identification, and privacy00:39:21 100M conversations and operating at scale00:44:10 EHR integration and the clinical intelligence layer00:46:39 Healthcare regulation, latency, and high-stakes AI00:50:11 Clinician scientists and long-tail quality00:53:04 Lessons from Glean and durable AI infrastructure00:57:03 The future of agentic healthcare workflows00:57:34 PRDs, product clarity, and building serious AI products01:03:11 AI coding tools at Abridge01:04:06 OutroTranscriptIntroduction: Abridge, Clinical Intelligence, and the Latent Space x Unsupervised Learning CrossoverSwyx [00:00:00]: Okay. This is a special crossover Latent Space Unsupervised Learning pod.Jacob [00:00:07]: Very excited to do this.Jacob [00:00:08]: At this point, we get together once a year.Swyx [00:00:10]: Once a yearJacob [00:00:11]: And this is a fun occasion to get to do it on.Swyx [00:00:13]: I really wanted to talk to Abridge but I felt very underqualified because healthcare is not something we cover very intensely. It just so happens that Redpoint's our big investors and supporters of Abridge.Jacob [00:00:27]: Anytime you want to have a portfolio company on your podcastJacob [00:00:29]: Please, by all means.Swyx [00:00:31]: So we'll introduce our guests. Chai and Janie, welcome to the pod.Janie [00:00:34]: Thanks for having us.Chai [00:00:35]: Thank you.Janie [00:00:35]: We're excited to be here.Chai [00:00:36]: Thank you.Swyx [00:00:36]: So for listeners, what do you guys do, just to situate you guys in the company?Janie [00:00:42]: Abridge is a clinical intelligence layer for health systems. We really started with documentation and building for clinicians and as we think about reducing the burden that clinicians have, they're spending 10 to 20 hours a week on documentation. There's a massive doctor shortage in the country. We also think that conversations between patients and clinicians are probably the most important workflow in healthcare. It's where care is given and received but if you think about the 20% of our GDP that goes towards healthcare, almost everything is a derivative of that conversation, whether it's the claim, the payment, the actual diagnosis given, the treatment. And we've started with a conversation to reduce the burden for doctors on documentation but we're really excited about the path ahead as we become this broader clinical intelligence layer.Chai [00:01:34]: I'm Chai. I work on clinical decision support at Abridge.Swyx [00:01:37]: Yes.Chai [00:01:37]: And so as Janie said, we're uniquely situated where we started off with the clinical note. What I'm really excited about and where we're expanding towards is what are all the things you can do before the conversation, during the conversation and after the conversation if you did have access to all the context about patients, payer guidelines, medical literature and put that together and to serve, how healthcare could look fundamentally different.Swyx [00:02:01]: And that's the context engine that you guys have?Chai [00:02:04]: Yes.Swyx [00:02:04]: Is that what it's called? Okay.Swyx [00:02:05]: So historically, as I understand it, the company started in 2018. A lot of people would be familiar with the AI voice notes form factor that doctors would be “Well, do you consent to being recorded?” It replaces handwriting and what have you. But it sounds like more recently there's been a big transition in the company. Tell me about the broader transition.From Documentation to Clinical Intelligence: Save Time, Save Money, Save LivesJanie [00:02:26]: So from a transition perspective, we really think about our journey as The first act was: how do we help save time? And that's where a lot of that original product was.Swyx [00:02:37]: By the way, one of those interesting statsSwyx [00:02:39]: On your landing page was, doctors spend time after hours.Janie [00:02:43]: They call it pajama time.Swyx [00:02:44]: Why is that pajama time?Janie [00:02:46]: Doctors after work in their pajamasSwyx [00:02:48]: In their pajamas. OhJanie [00:02:49]: At home are just writing and catching up on their notes every day.Janie [00:02:53]: Some of our favorite customer love stories, we have a Slack channel called Love Stories. We have clinicians telling us, “Abridge has helped us, from retiring early or we're now finally able toJanie [00:03:06]: go home and eat dinner with our kids for the first time.”Chai [00:03:08]: Save the marriage in some cases.Swyx [00:03:10]: One of the quotes was “We're not divorcing anymore.”Swyx [00:03:12]: I'm asking, “Why?”Swyx [00:03:14]: Because they're working too much.Janie [00:03:16]: But, in terms of where we're going and where we're expanding, we really think about our second and third acts around how do we help health systems save and make more money. Health systems are operating with record-low operating margins. It's getting harder and harder to serve patients and they have regulatory, some tailwinds but also a lot of headwinds coming their way and AI is ripe for helping on the saving and make-more-money piece. And then ultimately, how do we help save lives? The fact that our software and our product is open millions of times a week before, during and after a patient walks in the room, gives us massive opportunity with products like clinical decision support, which Chai is building but so many others to improve patient outcomes and probably one of the most important workflows and problems to be going after right now.From Glean to Healthcare: Context Is KingJacob [00:04:04]: One thing that's interesting, Chai, is you came over to Abridge from Glean and clinical decision support, which for our listeners is, in the context of a visit, helping a doctor figure out the right type of care. It's really a search problem in many ways, going through lots of different data sources. Very analogous to your previous role as one of the earliest engineers over at Glean. I'm sure a lot of our listeners are curious what's similar about the problems that you're going after now and what feels different, now that you're in healthcare.Chai [00:04:33]: Very similar. Taking a step back, with every wave, there's a lot of very similar patterns that happen across different products. A lot of social networking products look the same. A lot of credit-based products look the same. And we're seeing that very similar in the agent era with many companies, of course, in Redpoint's portfolio and so forth. And the key insight between both companies is that you have amazing models but context is king. Context is what puts them to work. So I see it in a lot of ways, a lot of similarities in this is a healthcare-coded version of Glean but the differences are really interesting. A couple things that come to mind. First and foremost, the rigor of the setting we're in. The downside risk is extremely high here in healthcare. It can be fatal in some cases. You prescribe something that the patient is allergic to for example. Whereas at Glean, it's “Oh, you got the question wrong.” It wasn't the end of the world in most cases. And so what does that mean? That shapes our evaluation strategy, both offline evaluation, progressive rollout and there's a lot more we could go into there. Second thing that comes to mind is, vertical versus horizontal. In both cases, there's a large variance but when Glean is, it's a much more horizontal company, there's a variance of personas, companies that you're working with. We also have a variance of personas, different types of specialties, different hospital systems. But the variance is a little more narrow. So from a product perspective, you're able to focus far more, especially when you have a maturing technology and you're building new products that never existed before. It lets you go after them much more easily and especially in healthcare where so many problems were solved with labor and process, that it's extremely ripe for AI to keep helping augment and enable. And the final thing that's really interesting, Abridge specifically compared to many other companies in the AI area, is the modality we started with where we're ambient and we're always listening in the background. And many more AI products will go that way but it's how we started. And that's the greatest form of AI we can create, AI that's seamless. You're not looking at your screen. It's always there. It's always helping you out and being proactive. The Jarvis vision that, every hackathon I went to over the past decade, there was always a Jarvis competitor. But Abridge very much started from the opportunity and continues to go that way.Ambient AI and Alert Fatigue: When Should the Product Interrupt?Jacob [00:06:57]: One thing that is super interesting then from a product perspective is you have this always-on seamless in the background and then you have to decide when you break the wall almost and say, “Hey, clinician, you might not have thought about X,” or whatever it is that you want to do. And in healthcare traditionally there's been this idea of alert fatigue and a million pop-ups and then a doctor just ignores all of them. It's probably a pattern that a lot of builders are thinking through now. How do you think about the right way to intervene or to pop up in a doctor visit?Janie [00:07:26]: It's such a good question. Alerts are notorious in healthcare specifically. Over 90% of alerts are ignored. The first and most important thing is context is everything, as Chai alluded to and I also think about how do we go from being reactive alerting to really proactive intelligence at the point at which it matters most. One thing we like to say is we want our product to feel like air conditioning. It should be in the background just making things better and if there is something that has great clinical risk and we're acutely aware that intervening now and not later is incredibly important, we should decide to act. But if you think about proactive versus reactive, instead of alerting a clinician during a visit when they're with their patient having a pretty serious and sensitive conversation, how do we prep a clinician before they walk into the room with that patient? And so historically, clinicians might have to manually go through charts with a patient that they've had over the course of months or years and they'll try to suss out what are the things they should be doing. You can imagine a world with Abridge. We'll summarize all of the most recent context for you, tell you based on the reason for a visit the patient is coming in for the types of things you should be discussing. And so you're going into that conversation prepped rather than walking in cold to that patient visit and then having this product interrupt you five or 10 times throughout the visit. And there might be times where it's really important to interrupt. We have a product called Prior Authorization and so this is when you may go into a doctor's office with knee pain. They'll prescribe you an MRI and so many of us have had this experience before, where in four weeks you'll get a call saying, “Hey, Sean, that MRI that you were prescribed wasn't approved and why don't you come back in? We'll figure it out.” In a world with Abridge, we might choose to quietly but still alert a doctor in that visit. And alert is probably not even the word we would want to use. Before a patient leaves, we would want to tell the doctor, “Hey, Doctor, before Sean leaves, you should ask him, has he had physical therapy and has his pain lasted for more than six weeks? Because the Aetna plan that he's on in California requires six things. We've already confirmed four of them have been met ‘cause we have all the context. But these two last criteria, if you can address with Sean before he leaves the room, we could guarantee that your MRI is approved before you leave.” And so when you think about clinical usefulness, impact to the patient, there are instances in which if we can catch a doctor while the patient is still in the room, as we think about save time, save money, save lives, we get to check all of those boxes. But when doctors have 15 minutes between visits, we have to be really thoughtful about when it matters.Prior Authorization: Reducing Latency in CareChai [00:10:23]: There's this interesting product opportunity AI has is reducing latency in the world. For example, prior authorization is an example of where care gets delayed and so great AI can reduce that. And the problem with alerts before partially is a technical problem: the quality of your alerts really matters. They're going to get ignored if you get alerts that... Similarly in engineering, where they're noisy alerts that you can't act on. But if you can make really high-quality alerts with both the context, as Janie said, and really high-quality models, then you can create a whole other game.Janie [00:10:53]: And I really like that experience because it starts to tease apart, what makes this so hard and unique. One, to make that prior authorization example possible, think about all the data that you need to have. You need to integrate with the electronic health record to know all of the patient context. Do we have access to your previous labs, previous imaging? And then to match you and to know that you're on Aetna, we have to collect all of the different payer policies and they vary by state. Some of these payer policies live on websites. Some of them live in unstructured 50-page PDF files.Jacob [00:11:31]: I thought this episode wasJacob [00:11:31]: To make sure we didn't scare people from healthcare.Janie [00:11:34]: But when you think about the things that make it hard, it also gives you the moat.Janie [00:11:39]: And then the second is the AI and the model quality we need to be able to hang our hat on. And so the bar, similarly when I worked at Opendoor, I worked on pricing models. Every outlier wiped out the margins of 30 and so similarly here in healthcare, the bar for accuracy is so high. And then I'd say the last is workflow is everything. If insurance companies deploy AI, it typically happens too late and this is when you have the notorious comical examples of AI just fighting each other when it's too late. But if we can pull forward the use of both the AI but also the ability to solve problems when the patient's in the room, you can start to collapse what typically takes weeks or months after your visit, ideally down to minutes or real-time. And it's where healthcare is both very difficult but also extremely rewarding if you can crack it.Product Form Factors: Mobile, Desktop, In-Room Devices, and ARSwyx [00:12:36]: Just to get some baseline on the form factors, because I've seen some videos on your website and stuff. You guys talk a lot about ambient AI. Is it primarily on the phone? Is there any other form factor that people get Abridge in? Is there an Abridge room setup where it's always on? I don't know.Jacob [00:12:55]: An Abridge podcast studio.Janie [00:12:58]: Primary form factor is mobile and desktop. UsuallyJanie [00:13:00]: Clinicians are walking in and out of rooms with mobile but at the end of the day, when they're closing out their notes or wanting to prep for the day ahead, they might use desktop. We have been having a lot of really interesting partnership conversations with a lot of these in-room device companies as you think about the power of multimodality and even more data, as you think about all of what is not captured today. It is fascinating to think about, especially even as we go into building and scaling our nursing product. It's one where nurses constantly, as they're walking in to check in on a patient for two minutes or maybe even 30 seconds,Janie [00:13:43]: Starting an Abridge experience is probably going to take longer than the visit. And so what can we do with in-room devices that are always on starts to raise really interesting and fun product questions.Swyx [00:13:54]: I was thinking, the way in tech companies we have all these Google MeetSwyx [00:13:58]: And other things, we might as well set up entire rooms with just Abridge tech.Chai [00:14:02]: Very much. AR glasses and related form factors are also relevant: how do we bring the information to the clinician in real-time without a screen, while still letting them focus on the patient?Swyx [00:14:18]: Do you think they want that? I'm skeptical of AR, but I'm curious what you've tried.Chai [00:14:26]: Admittedly, it's not a near-term product roadmapChai [00:14:29]: By any means. I'm being far-fetched.Jacob [00:14:31]: There's some sick AR stuff for surgeries.Swyx [00:14:33]: Really?Jacob [00:14:33]: When people are trying to visualize, you're about to make an incision but you want to see, what the cut might look or what the body might look like inside and they can layer in imaging.Swyx [00:14:43]: That's cool.Chai [00:14:45]: At some point in the future.Janie [00:14:46]: But there are a lot of our largest customers and at the largest health systems integrating already and so even as we think about building into it, unlocks a lot of product capabilities.Swyx [00:14:57]: And just to establish the terminology. Sorry, and I know I'm asking basic questions somewhat for myself but also for the audience who might beHealth Systems, Buyers, Clinicians, Patients, and PayersSwyx [00:15:05]: Less integrated. When you say health systems, it's like the Johns Hopkins, the Kaiser Permanentes.Janie [00:15:09]: Mayos, the Kaisers of the world.Swyx [00:15:10]: These are your customers, right? And the outcome that you deliver for them is happier doctors, reduced cost of processing, reduced mistakes. It's weird in a sense that I feel like there's also, a secondary customer, the customer of the customer and I don't know if you — do you think about it that way?Janie [00:15:28]: The other interesting and complex part of building product is we have our buyers, who are the chief medical information officersJanie [00:15:39]: The chief financial officers, the CIOs of these large health systems. Our users today are clinicians but if you think about who downstream is impacted, it's patients. And so as we build, with every product in mind, we think about who we're building for, who the secondary user is and what does that mean either in terms of experience, security compliance, ROI that we have to make tangible. And so like you said, time savings is one of them. But for CFOs, they care a lot more than just time savings. We have to show for every dollar you put into Abridge, because you have more compliant documentation or because you have fewer queries coming from your billing team, we save or add real dollars to your bottom line or top line, are things that we're constantly thinking about because of the dynamic across all three sets of users.Chai [00:16:32]: There's a whole other axis too with the payers and pharmaChai [00:16:35]: as well. Connecting all these three big stakeholders in healthcare isSwyx [00:16:39]: Do the payers ever see your data? Sorry, the payers meaning the insurers, right?Chai [00:16:44]: Yes.Swyx [00:16:44]: They also see Abridge data?Chai [00:16:47]: NoSwyx [00:16:47]: Like the direct integration to you guysChai [00:16:48]: They wouldn't see the raw Abridge data but when you're working together on something like prior authorization, whatever information they need, we'd communicate to them.Jacob [00:16:59]: That's cool. I would love to dig into the AI side. You still have a lot of problems on the AI side. And so maybe to start at the highest level, what's one of the hardest problems you have to solve in AI at Abridge today?The Hardest AI Problems: Quality, Latency, and CostChai [00:17:11]: To make things simple, let's take, building off the prior auth example. So one thing Janie talked about is okay, this data is all over the place and there's this combinatorial explosion of procedures, payer policies and even sometimes different health systems. There can be some cross-product of all of these different considerations you have to take into account. But what's really hard about this problem is doing it real-time in the conversation. So, in any AI product, usually the three KPIs you care about are quality, latency and cost. Now, what we're saying is we want you to do this real-time in the conversation, guiding the clinician. How do we do it in a way that does not break the bank? But we're using — But we also need very intelligent models because you're working with this cross-product of data and this, all this context layer as well. So you need high intelligence and high-quality because you don't want the alert fatigue but you also need to be fast and cost-effective. And so that's where a lot of clever engineering goes. It's okay, without getting into all the details here, can you model these policies in some intermediate representation or other things that you can do that can make this problem tractable? And of course, the Pareto frontier is always changing but we are also trying to do this now.Model Strategy: Third-Party Models, Proprietary Data, and Medical ConversationsJacob [00:18:26]: What implications has that had for what you take off-the-shelf and say, “ what? We don't need to be world-class at X. We'll just take this from the model providers or from some infrastructure player,” and what you're “No, this is where we spend most of our time focused on”?Chai [00:18:38]: This is, the fun challenge in AI?Jacob [00:18:42]: It changes every three months? SoChai [00:18:42]: Of course, with the shifting landscape, we try to be extremely thoughtful on predicting the trends of where third-party models are going and where we can uniquely go. And, sometimes when you talk about AI models, we're the models are just going to get infinitely better. But I don't think... It may be in the grandness of time you could say that but, within every month, every quarter, there's specific ways they're getting better. They're training on a lot more, coding data to be better coding agents, for example. And soChai [00:19:14]: We have to think about where are the things that won't — unique data that we're uniquely training on or to step back a little, where is a proprietary model bringing advantage to us is if it can give higher quality or lower cost and latency for similar quality, very similar to many other companies. And when we can do that is when we have proprietary data. So, for example, we have on the order of eighty million or hundreds of millions now getting close to of medical conversations.Jacob [00:19:44]: It's insane.Chai [00:19:45]: This is a unique data set. And this data set, it's very interesting because this data set is effectively a large part of the trace between the patient and the provider. That's where the quote-unquote debugging happens in healthcare. We have these traces at scale, as in as, our CEOs even called it, an exhaust that comes out of our product. And so when you have these traces, that's how you can train better agents on certain use cases, whether it's your transcription diarization use cases or so on or like note generation models and we can do that much cheaper and faster. But we're always also working with these third-party model providers. We closely collaborate with them and that's how we predict where the trends are going. The thing that I think about a lot is that, I know that the model providers are going to train much more on agentic workflows and so forth, so that's great, so that you have a better agentic harness. But the other thing that's interesting is that the model providers, because a large class of the consumer model providers is healthcare queries, that they might, optimize to train a lot of healthcare data to encode the knowledge in its weights. And this is just a great thing for us as well, where the off-the-shelf models can keep bett-getting better at general healthcare information, such that what our strategy is, we have a constellation of models, we can use something for this, that and, we only care about, at the end of the day, the best product experience.EHR as File System: Agentic Workflows and Real-Time InterfacesJacob [00:21:07]: And, you have, overall capabilities improving. I'm curious, as these models get better, is there something you look at and you're “, three months ago, we really couldn't do that but God, the the latest models really allow us to do it”?Chai [00:21:19]: So here's something interesting that I've, been toying with. So all models are... This wasn't super obvious a year ago but now it's become clear and clear that almost every agent is a coding agent underneath the hood? So you give it whatever file system, it can write its own code and so forth. So when you think about within healthcare and the use case that we have, you can think of the EHR effectively like a file system. It's just — it's a storage of all this information. It's a lot of information there that cannot fit into the context window, at least of today's models and you want to use that context effectively for all these product use cases we're talking about. And so if you have better agents that can, manipulate data, read that data, treat it as a file system as we see they're going and we know model companies are investing this way, then that very directly benefits us.Swyx [00:22:09]: Yeah. Okay, cool. Again, just establishing basic things. But we're going back to the model stuff. I'm really interested in double-clicking more on the real-time, element, which is pretty important for both of you. Is it — Is real-time just batches of every one minute, every five minutes? Is that how we do it? Or is there some more native, genuinely real-time in the sense that OpenAI has a real-time API or Gemini has a real-time API?Chai [00:22:35]: Yeah. Yeah. So today it is more on the on the batch basis but there's interestingChai [00:22:41]: Prototypes that we have that we're still not fully, full time, voice in text out or in that sense. But, can you trigger your models, your agents or agentic workflows, depending on the right times in the conversation?Chai [00:22:58]: And so you can imagine, different techniques to bring this latency down and, you want to bring the feedback loop down as much as you can. And so a lot of clever engineering there without fully... Maybe one day we'll do full voice in and text out, train a model to do something like that.Swyx [00:23:15]: You do — People don't want voice in voice out?Chai [00:23:18]: Now we aren't creating experiences that are, during the conversation, inter — It's almost likeSwyx [00:23:25]: Might be too disruptiveChai [00:23:26]: Too disruptive until, who knows, maybe eventually you could have full voice agents once we — the quality and we improve the comfort of the technology. But right now gra — that change is much more gradual and it's more text focus, text out.Janie [00:23:42]: And so much of currently what our product is trying to do is allow a clinician to focus on their patient and maybe at some point but right now patients, clinicians don't want a third voice, at least in a literal voice in that room. And so how do we be there with all the contacts and information ready at hand when there's the right moment?Personalization: Individual Doctors, Specialties, and Health SystemsJacob [00:24:03]: Jenny, one thing I'm curious about is how you think about, personalization in the product. I imagine, every doctor is a special snowflake in their own way, has their own way they like to do things. There are probably a bunch of different approaches you could take to doing that, both within the model layer itself but then also just with clever prompting or engineering. How do youJacob [00:24:20]: Deliver on that?Janie [00:24:21]: It's such a good question. Personalization is massive for us. We think about personalization at three levels. The first is at the individual, the second is at the specialty level and then the third is at the health system or the organization level. To your point, there are a lot of individual preferences. You-When a note is produced, it almost is a reflection that is so deeply personal of a doctor's work and how they give care. And so do they have preferences on things like style? They might want bullets versus paragraphs, really concise versus comprehensive. They also might have phrases that they really like to use or the templates that they want every note to be structured. And, we see it in our feedback all the time. We want two spaces in between sentences or I refuse to use this tool. And so that's something that we've had to build in. And the tricky part is how do you make sure that stylistic preferences don't interrupt accuracy and quality and that's something that we've really had to refine and hone over time. Second is at the specialty level. A cardiologist note or workflow is going to look very different from a dermatologist workflow.Jacob [00:25:32]: I assume cardiology notes are the highest stakes for you guys, given your CEO is a cardiologist.Jacob [00:25:36]: It's “Oh my God, make sure we get this one.”Janie [00:25:37]: Shiv, our CEO, is still a practicing cardiologist. He rounds once a month. And so, first call when we want just quick and easy user feedback too.Janie [00:25:46]: But, specialties require a lot of personalization, both in terms of what does the product look and so we make sure that as new users onboard, we catch that and the product proportionally reflects that. But also on the back end, evals at the specialty level, they are hard-earned to calibrate and get. What does a really great dermatology note look like? What makes it complete? What makes it compliant and billable is very different than a primary care doctor. And so it's not just about what does the product experience look but on the back end tuning and really deepening our understanding for the specialists. What does great output look like? And that's, a problem that we need to calibrate internally, externally, online, offline but, takes lots of cycles but is necessary in a high-stakes environment. And then at the health system level, for products like clinical decision support, you have health systems who've spent years or decades refining their best practices and they want to know, “Hey, we love your clinical decision support product but how do we embed our own hospital guidelines into them to inform clinicians before, during or after a visit what brest — best practices should look like?” And as you think about, deepening moats as well, when health systems, trust us with that data, allow us to productize it and directly into the clinical workflow, makes us a really great partner to health systems who want to build something that truly meets their needs, their practicing guidelines.AI Slop, Memory, and Product Data FlywheelsChai [00:27:23]: And I want to add onto that. The for the clinical documentation problem, it's very similar to AI writing that doesn't feel like your own and then we call that slop. But the way I describe one framing of slop is like AI without context. But we have all that context and both the clinicians, can have it and can guide it. And so part of the other interesting exhaust for us is, memory is, one of these new systems recordsChai [00:27:49]: Almost.Janie [00:27:50]: And we also have all the edits people make on our product and when you think about a data flywheel and how we get better over time becomes really powerful as a mechanism to just going deeper in personalization.Jacob [00:28:04]: It's interesting. I love this idea of working with systems on the guidelines they built up over a long time. I feel like so many of the best AI app companies today are... The question is: How do you take the expertise that a law firm or a bank has built up over many years and then add that as context and also a special sauce over, a an AI tool? And so seems like y'all are really doing that very effectively.Janie [00:28:24]: We're now starting to have our customers ask, “What are other customers doing?”Janie [00:28:28]: “And how are they doing it?”Janie [00:28:30]: And as we think about having visibility across such a large set of care being delivered right now, a really interesting place we could also partner.Swyx [00:28:40]: I'm just curious. I — This may be a nothing question but, how different are health system guidelines from each other? Don't they all converge to the same thing? And if not, where do they differ?Chai [00:28:52]: At a really high level, they're going to talk about very similar things but the difference is probably in some more of the details. “Oh, you should refer to specialists only when XYZ conditions are met,” or so forth and maybe different organizations have different practices and guidelines around that. But high level, talking about similar things but the details are what, of course, that shapes the context and the decisions you make.Swyx [00:29:15]: And this all goes into the context engine and it might affect the notes but maybe not.Chai [00:29:21]: The — For these local pathways, we're definitely thinking about it a little more for our clinical decision support product.Chai [00:29:26]: So yeah.Swyx [00:29:27]: Which is your stuff, yeah.Swyx [00:29:28]: And then the memory which you raised, let's just tell us more about that. What have you tried in memory? What's the structure of the memory? What works? What doesn't work?Chai [00:29:38]: There's, of course, many different ways you could do memory, where it's okay, can you bake it into the model weights or can you do it in some external store? For us, what's interesting is, of course, when you think the models are rapidly changing, whether it's in-house or third-party, baking into the model weights, sometimes you worry that it could be a little throwaway. And so, how do you... You need to find a way that you decompose the problem, the preferences from the underlying models and so forth. The thing we're right now most both that's easiest to start with and we're excited about is having, a separate store for memory, where you have, for example, a memory sub-agent that's, working in the background, figuring out what are the important parts of the clinician's actions that we want to remember for the long term. And then you can also imagine, other things where in the — you have background jobs that are running that are collating these, memories similar to Sleep, of course and what other pattern, patterns products do as well. Learning over all these action, all the action data we have, again, note edits, the conversations they did and the actual transcripts.Evals: LFD, LLM Judges, and Clinical SafetyJacob [00:30:40]: What about evals? How in the world do you... It is such a complex product surface area. We would love to hear you riff on that and also how has that evolved? I'm sure you've gotten better at it, so any learnings along the way.Janie [00:30:50]: From an evals perspective, we, from day one when we build any new product or feature, we think about, what does good look like? And there are table stakes things like clinical safety but then you start to get deeper into what does good quality look like. And when you go into something like our core product, there's stuff like style and completeness and there's things like does this note become something that can be billable, which is very high stakes for a health system. We have a number of ways in which we get confidence for this. We have, internal in-house clinicians who do what we call an LFD process to give us our very first pass at is this or isn't this a good enough output, look at the effing data.Jacob [00:31:41]: LFD?Chai [00:31:42]: That's why I was smiling. I was “Is Janie going to mention what it stands for?”Jacob [00:31:46]: I was not... There's like a million acronyms.Jacob [00:31:48]: How am I supposed to know that I don't? So “Oh yeah, of course, an LFD.”Swyx [00:31:51]: I've never heard of LFDs.Chai [00:31:53]: It's a bridge for sure.Janie [00:31:55]: I got through three days and then I had to ask someone.Janie [00:31:58]: I thought it was just me that didn't knowJanie [00:32:01]: It's our internal process.Swyx [00:32:02]: But look at the data as a meme in ML, ‘cause you tend to not look at it. You just want to look at number go up.Chai [00:32:06]: Exactly.Swyx [00:32:07]: But yes.Janie [00:32:08]: But so, we make sure we look at the data and then as we think about all of the components of good output, we, one, create LLM judges across all of these and we make sure with annotated data and either internal or external evaluators, we feel like these judges are calibrated. And then depending on the stakes, we also work with in-house and third-party evaluators across all of these before we ship any big change. And the goal is, in terms of evolution, how do you go from this process taking months, down to weeks, down to days? Some of it is, a true science and ML problem. A lot of it's also just, hard operational work. Have you planned ahead in terms of what you need? Have you really optimized the capacity that you need across all of the different specialties you need? Have you gotten a really good sense of which third parties are great to work with for what use cases? This takes a lot of domain, expertise and, lots of mistakes and errors in figuring that out. And so as much of it is an ML problem, so much of it has also been operational gains that are hugely important, where domain-specific expertise is everything.Specialty-Level Evaluation and Progressive RolloutsJacob [00:33:23]: But it's funny, ‘cause I feel like people talk about healthcare like it's one giant market and the reality isJacob [00:33:26]: It's, dozens and dozens of sub-markets. And so it feels like in your evals you have to build that up across the board, probably.Swyx [00:33:34]: And is specialization the primary cardinality at... That's the word that comes to mind.Janie [00:33:40]: Sometimes, depending on the product or the use case. And so if we're making a note improvement or feature for a particular specialty, definitely but we have products that are for nurses. We have products that, are really aimed at making the document or the output a lot more billable. And so we'll want to work with coding teams and not necessary clinicians. And so likeJacob [00:34:05]: Coding meaning healthcare coding.Janie [00:34:06]: Yes. Yes.Jacob [00:34:07]: NotChai [00:34:07]: Yes. I see you.Swyx [00:34:07]: Other kinds.Janie [00:34:09]: But is this output proportional to the work that was delivered? Is there sufficient documentation to justify the amount that a health system may end up charging? And so, specialty sometimes but also domain, very different across all of the different products that we're working for. And building out that network is, not easy and is where a lot of our operational investments have gone into.Chai [00:34:35]: And I view a lot of analogies to self-driving cars here, where, part of it is we really want progressive rollout of features to test in the real world is this useful? Is this going to work? One big difference compared to past lives is before I'd build a product, maybe I'd alpha it and then I'd like GA it the next week, ‘cause I'm “Go, move fast, ship,” and whatnot. But the mentality is like you... I want to make contact with the reality as quick as possible but I want a progressive rollout. Because as much as I get as large of an offline eval set, I want the distribution of that to match real-life distribution. And over time, by rolling out early, similar to Waymo has a tagline, “The world's most experienced driver,” another thing that can, at least linearly increase for us is, both the size of our evaluation offline and online, that and it all feeds back.Janie [00:35:25]: Something that's been earned over time, speaking of evolution, is just the trust we've gotten with customers. Historically, a lot of these health systems, when they bring on new vendors, their release cycles are quarters, sometimes twice a year. We've gotten our customers onto monthly release cycles, which is pretty fast for health systems but what is more exciting over the last, call it, few quarters, has been, a subset of our customers have said, “We want to innovate with you. We trust you,” and we have a pretty, decent chunk of our customers who say, “We'll develop with you outside of these monthly release cycles. We have a higher tolerance. We know that the stakes are very high but we want to be the first ones using these products, giving you feedback.” And so for a pretty substantial set of our customers, we've been able to convince them to be able to ship, in this gradual way before GA. Something we talk about a lot internally is, trust is earned in drops, earned in buckets and so we still can't do what I used to do when I worked at Loom. We had 30 million users. I'd just be, rolling out experiments left and. The bar is still quite high for iterative rollout but because of the trust we've earned, we're able to learn at pretty high volume very quickly.Privacy, HIPAA, and De-IdentificationSwyx [00:36:45]: Your scale is still pretty huge.Swyx [00:36:47]: One thing I want to... We were going to go into scale? In a sec. One thing I wanted to call up, follow up on evals, which, again, just coming from a generalist engineer point of view, just thinking through what would people be scared of in doing this, the privacy and HIPAAJacob [00:37:00]: Elements of this. I have zero experience in that. What do you have to do? What is surprisingly not that bad?Chai [00:37:06]: So one thing that's really important here from a compliance perspective is very much that any of the data we use needs to be de-identified, any real-world data we use as a basis of online eval sets we're learning from. And so you have to — And there's, very clear, government guidelines, what counts as PHI. And so we've even have built models that can take, for example, a clinical transcript and remove all the key PHI indicators and so you have a scrubbed/de-identified version. And then once you... And so one thing that's important is first you've got to get confidence in that model in the first place? And prove that out. Because, now you have, multiple probabilistic systems on top of each other.Chai [00:37:46]: But once you have that, then you can train on it use it for evaluation and so forth, provided one of the cool things also that you can do from a business side is the right data contracting as well with your partners.Jacob [00:37:57]: Is the anonymization one way? Once it's done, you cannot undo it? Or is there someoneChai [00:38:01]: YesJacob [00:38:02]: Who holds the master key that can... Yeah, okay. So it's one way.Chai [00:38:05]: It's one way. Yeah.Jacob [00:38:06]: That's how it works. I just wanted to... Because, there's a lot of this, learning from feedback and everything that, you would want to debug more but you can't because you just physically don't allow yourself to.Janie [00:38:17]: Some of it's also written in our customer contracts in terms of who can or can't access PHI data, how long do we retain it,Jacob [00:38:27]: Very goodJanie [00:38:27]: Before it gets de-identified. And so we have a pretty high bar for who can access that PHI data, just to make sure that we always respect our customer data and privacy. But that's something that we partner with our customers on too, to make sure that as we want full, as close to precision as possible in that qualityJanie [00:38:48]: We can still use it.Jacob [00:38:50]: But it'll be fascinating to see how that space evolves? Because you think about, I used to work at a company that, did a lot of healthcare data in the cancer space and if you asked, the average cancer patient, “Hey, do you want people, do you want other patients to be able to learn-”Chai [00:39:03]: Take it.Jacob [00:39:03]: “... Learn from your experience?”Chai [00:39:04]: Take it all.Jacob [00:39:05]: They're “Please.”Jacob [00:39:06]: “I'd love, nothing more than for other people to be able to learn fromJacob [00:39:10]: The experience that I had.” And so in the past it was a lot harder to do that learning. But with this technology, that might really be practical and so it'll be fascinating to see how that continues to evolve.Chai [00:39:21]: There's so much in our data set of 100 million conversations.Chai [00:39:26]: You can imagine things like insights that you can give to the clinician. How could you, oh, how could you have reacted to this? In coaching or insights around, which treatments are effective or, like... Because you have this, again, this data source that was never captured before but that's, where, intuition or experience is created from, going back to this idea that the conversation is the agent of truth.Operating at Scale: Reliability, Cost, and Token EfficiencyJacob [00:39:46]: Back to the 100 million conversations, I feel like you have this insane scale that maybe only a few other AI app companies have and everyone else dreams of. So not everyone has had to confront this yet but maybe just talk about some of the challenges of operating at that scale and what, our listeners have to look forward to if they ever get to this level of scale.Chai [00:40:05]: At large and larger in scale, so of course there's a general, infrastructure reliability. When you... In any given startup, you're building the plane while it's flying. So there's some notion of that. But what gets interesting on the AI and ML side for sure is this, as you get at more and more scale, so one, you have the data to first and foremost do this. But, you start thinking about costs or infrastructure in a whole different way at scale versus, a prototype.Chai [00:40:34]: You can use the most expensive model, you can burn as many tokens as you want but when you're doing 100 million conversationsJacob [00:40:41]: Token max on leaderboards are less upsetting than that context.Chai [00:40:45]: . When you're doing that and so that comes for we have the data and we also have the team that's able to post-train based on this and you can optimize for efficiency, especially in areas where you believe that maybe a lot of the quality headroom is less so and you don't expect the other off-the-shelf models to go that way, such that you want to do, efficiency maximization, in terms of compute and tokens.Jacob [00:41:08]: I feel like you guys live in the future in some way where most use cases today are really just in use case discovery mode, where it's “God, I really hope I can find something that can get to scale,” and so you're always going to use the most powerful model. And then the few things that do get to this level of scale, you start to do those optimizations.Chai [00:41:22]: It's a natural trajectory where it's like zero-to-one, we're not talking about any of these optimizations.Chai [00:41:26]: But when maybe we're in the one-to-100 or so forth, then we're in optimization mode and, what works out really well is you've got all this data from zero-to-one that lets you do this.What Comes Next: The Conversation as the Shared Healthcare PlatformJacob [00:41:36]: That's fascinating. I feel like one thing that's so interesting about the Abridge footprint is that you're in the doctor-patient visit in real-time. I always like to say, there's like probably 50 years' worth of product you could build on top of that. What gets each of you, I don't know, what are you most excited about building, either in the short term or medium term or even, long down the line?Janie [00:41:53]: Something that I get really excited about is that the same conversation can serve so many stakeholders. If you think about the conversation, a doctor needs to know what is the documentation, how do I make sure that this fully represent the care I gave? A patient needs to know, “What the heck just happened? This was really overwhelming. What are my next steps?” A payer needs to know, was this the proper and appropriate care given? A pharma company might want to know why isn't this drug being properly used or is there a good candidate for this clinical trial that I'm about to run? And where I get excited is that our product and our platform and our infrastructure can be the same product across all of those things and start to what's today, separate, very expensive, complex systems that serve each one of these stakeholders in very different ways, start to collapse all of that into a singular platform that enables not just more efficiency across the board but also better outcomes for everyone. And, all of us experience healthcare in probably very painful ways and knowing that there is a world in which we can simplify a lot is really exciting to me and it all starts with the conversation.Chai [00:43:15]: It's interesting. Of it very similar to going back to the KPIs that any AI product cares about. How do you increase quality of care? How do you reduce latency to care? And how do you reduce costs? Which is a huge, in healthcareJacob [00:43:28]: They call it the triple aim in healthcare.Chai [00:43:30]: But very similar to building AI products and the thing that really excites me is when we talk about that latency piece, we talked about one example earlier of prior authorization, can you reduce the latency to care? But you can imagine so much more. Oh, as soon as the lab value gets updated, do you have like a background agent that, kicks off and uses all the context to be “Oh, hey, the patient should do this next,” for example. And of flagging that to the clinician who's always in the loop but reducing that latency, to care. And then you can imagine this is much further down the road but it's like even connecting that to the direct patient and the consumer. And so how can you, how can you build a bridge to all of these things?EHR Partnerships and the Clinical Intelligence LayerJacob [00:44:10]: Very cool. The connections piece is just an ever-growing thing. And one of the key partners is the EHR and I wonder what that relationship is like. Will they, look at this as, something that is valuable enough that they want to own someday?Janie [00:44:29]: Our partnerships with the EHR is, we know that we have to be extremely close partners with all the EHRs who we partner with. Being able to not only pull and push all of the data into the right places is, not only table stakes, if we can't do that, health systems don't want to use us. The second and the reality of today is clinicians spend a lot of their days in the EHR. So much of what allowed us to win in the largest health systems was pretty direct and, very close partnerships with some of the largest electronic health records that allowed us to pull and push data with APIs that weren't ready out of the box. And clinicians want to save clicks. Anytime we introduce a new product that, adds two clicks for them in their day, they're “We're not going to use it.”Janie [00:45:21]: They have 15-minute back-to-back appointments with their patients. They're spending, hours during pajama time doing documentation. Every second and every minute counts and so we really think about being deeply integrated into the EHR as also table stakes to getting real usage and adoption. And anything that we build or introduce, we really talk about earn the right internally a lot, which is we have to provide so much value or save so much time that people will use us. But those are the two things that are close to us, is we know that the product won't be used unless it is deeply interoperable.Chai [00:46:01]: And strategically, to your point, it's like what does EHR want to own versus us? EHRs are really focused on the clinical workflows and so forth but some of the things that we're talking about here, I do these traditionally are outside of the domain where it's oh, connecting pairs and providers together with provider policies or the clinical trial matching, as Janie brought up. And so these are, entirely — we position ourselves as building this entirely new intelligence, clinical intelligence layer across, again, providers, pharma and, payers.Chai [00:46:33]: And so that's a it's a whole different ballgame that we try to playChai [00:46:36]: In combination with them.Jacob [00:46:37]: But it's like a different layer of scope.Healthcare AI Regulation, Technical Depth, and What Changed Their MindsJacob [00:46:39]: I'm curious, you are both relatively newcomers to healthcare. People have these, there's lots of futuristic healthcare AI takes of “Oh, everything will look different.”, now that you've been in healthcare for a bit, you live at the edge of AI, what have you, changed your mind on around this, as you think about what healthcare looks like in ten, 20 years? Any updates to your mental model from the time being close to the problems?Chai [00:47:02]: One thing that IChai [00:47:04]: Was hesitant about before and it's a common thing when I'm trying to recruit engineers that people ask me around, is definitely oh, healthcare, heavily regulated space. And it is, rightfully so. You want to keep, the patients at the end of the day safe. But one of the interesting things that, is a that surprised me how much it is coming to the company is there's a lot of really favorable regulatory tailwinds as well. Where you think about, government really wants interoperability between all these systems that we talked about and so agents can access this information. The government just in January, the FDA released updated guidance on clinical decision support, what I work on in such a way that they used to have guidance from like 2022 that required you to have, mention all these options and do all these other things but it's a very forward and forward-looking way. And so for me, what's been really cool to work on is this, there's this very special moment both in AI in general, we all know that but there's a special moment also regulatory in healthcare as well.Janie [00:48:05]: One thing I would call out is for the very reasons things are higher stakes or, potentially considered more difficult in healthcare, it's where some of the hardest AI problems will get solved first, just because the bar is so high. When I first joined, I was “Oh, this is where we'll be on the tail end of where, all of the AI innovation will be able to be applied.” But when you think about, zero error evals or multi-step workflows that have really low tolerance, a lot of the innovation will happen here just because we have to or else we can't ship.Jacob [00:48:42]: ‘Cause like in other domains, you'd much rather just solve the 80%-is-good-enough problems firstJanie [00:48:46]: 80/20 doesn't work hereChai [00:48:48]: And building off that, traditionally, there was a bit of stigma that, oh, healthcare companies are not that interesting from a technical perspective or I've seen that or faced that myself. But these are really hard and fun problems from a pure technical perspective beyond just the impact. How do you bring the latency of this thing down and make it really high-quality?Reducing Latency: Clinical Workflows, Agents, and Implementation RealityJacob [00:49:07]: How do you bring the latency of things down?Chai [00:49:10]: Yeah. Yeah. Yeah. So okay, let's answer the latency question. And maybe hopefully not too redundant with some of the things I've said earlier but some part of it is with any latency, you have to like what is, what is really your bottleneck. In a lot of workflows, it's sometimes it's the model itself. And so that's where like our data flywheel, our post-training team and so forth come in so that can you make the models far more efficient. So that's one aspect of latency. But there's whole other aspects of latency where it's okay, on top of that, if you use a constellation of different models, can you use — can you first use like a — it's like thinking fast and slow. Can you use a cheap, fast model that triages and hands it off to a larger model where you get more intelligence and so forth and so all theseChai [00:49:56]: Clever tricks to make it work.Chai [00:49:58]: And by the way, we are totally — we also realize that the parameter frontier is changing and so these tricks will — may not get us to where we want to be in five years but we need to if we want to build a useful product right now.Jacob [00:50:11]: Should we go to the quick-fire or you want to ask more about Abridge? We can stuff everything that's not Abridge into the quick-fireSwyx [00:50:16]: I don't mind. I was — I feel like Janie was on the topic of more long tail stuff, which isSwyx [00:50:21]: Not the eighty/twenty thing and that really matters. And I'll —, if you have any tips or cool stories or just general approaches that have worked for you that's interesting to dig into.Janie [00:50:32]: One of them is even just how we staff our teams looks different than a traditional software engineering team, I'd say.Swyx [00:50:40]: Let's go.Clinician Scientists, Edge Cases, and Evals at ScaleJanie [00:50:41]: We have a bunch of folks with different roles who are clinicians and so we have this role called the clinician scientist and I heard one of our leaders refer to them as mutants recently. But they are people who've had clinical backgrounds, so MDs typically, who are also deeply technical, somewhere, on the spectrum of like a full stack engineer all the way to like extremely scrappy prompter. But having each of these people embedded within our teams instantly raises the bar for everything that we build because not only are they determining, is this product clinically useful but they're deeply embedded in our whole evals process. And so when we talk about LFDs, when we talk about what is our actual evaluation criteria, you don't want Chai or me creating what those are because we don't have clinical background. But is probably unique to Abridge but has been game changing. And when you think about where the puck is going, you have people build with clinical backgrounds who are technical and where AI tools are going, they just becomeJanie [00:51:53]: More and more, critical and like the killers of the team. And so that's one. And then the second is just the scale at which we do evals to catch that long tail up front before anything ever gets into production is something that we've pretty much like really started to fine-tune, both from a scale but when do we know we need to get several hundred versus several thousand offline responses, what helps us make that quick decision and make this less of an art and as much of a science as possible. But that's also been something we've had to tune over time.Swyx [00:52:27]: And you have partners who opted in to give you those evals.Janie [00:52:31]: So we work either internally or with third-party for offline evals and then we have customers who also agree to give us, whether it's like thumbs up, thumbs down to like choose this or that, a lot of data to get us to what is as close to fully confident as possible.Swyx [00:52:51]: The term that comes to mind isSwyx [00:52:53]: Like active learning on things where you're weak. I feel like it's a lost artSwyx [00:52:58]: Is a lot of the polish that comes into doing something like this.Janie [00:53:02]: Really.Chai [00:53:03]: Hundred percent.Lessons from Glean: Technical Foundations and AI App InfrastructureJacob [00:53:04]: Maybe, on a totally unrelated note, Chai, you had a very, storied run at Glean b

Mañanas BLU 10:30 - con Camila Zuluaga
“No es un tema de campaña política”: alcalde de Tierralta sobre Zonas de Ubicación Temporal

Mañanas BLU 10:30 - con Camila Zuluaga

Play Episode Listen Later May 14, 2026 30:26


See omnystudio.com/listener for privacy information.

Mañanas BLU con Néstor Morales
Gobierno defiende modelo de "dejación de armas gradual" y zonas de ubicación temporal en Putumayo

Mañanas BLU con Néstor Morales

Play Episode Listen Later May 14, 2026 14:04


Inicialmente, la zona recibirá a 100 integrantes de la organización, aunque los informes de inteligencia estiman que el grupo cuenta con aproximadamente 1.800 hombres en armas entre Nariño y Putumayo.See omnystudio.com/listener for privacy information.

Stuff You Missed in History Class
Carrington Event

Stuff You Missed in History Class

Play Episode Listen Later May 13, 2026 43:31 Transcription Available


The Carrington Event was a massive geomagnetic storm that happened in 1859. It led to expanded understanding of solar phenomena. Research: “Great Aurora of 1859. Art. XLII – The Great Auroral Exhibition of August 28th to September 4th, 1859.” American Journal of Science. Ser. 2. Vol. 28. July-November 1859. Cardenas, Freddy Moreno et al. “The Grand Aurorae Borealis Seen in Colombia in 1859.” Preprint submitted to Advances in Space Research. August 21, 2015. Cliver, E.W. “The 1859 space weather event: Then and now.” Advances in Space Research. 38 (2006) 119-129. Cliver, E.W. and L. Svalgaard. “The 1859 Solar-Terrestrial Disturbance and the Current Limits of Extreme Space Weather Activity.” Solar Physics. (2004) 224: 407–422. Cliver, Edward W. and William F. Dietrich. “The 1859 space weather event revisited: limits of extreme activity.” J. Space Weather Space Clim. 3 (2013) A31 DOI:10.1051/swsc/2013053 Dobrijevic, Daisy and Andrew May. “The Carrington Event: History's greatest solar storm.” Space.com. 5/20/2022. https://www.space.com/the-carrington-event Giegengack, Robert. “The Carrington Coronal Mass Ejection of 1859.” Proceedings of the American Philosophical Society , DECEMBER 2015, Vol. 159, No. 4. Via JSTOR.https://www.jstor.org/stable/26159195 Green, James L, and Scott Boardsen. “Duration and extent of the great auroral storm of 1859.” Advances in space research : the official journal of the Committee on Space Research (COSPAR) vol. 38,2 (2006): 130-135. doi:10.1016/j.asr.2005.08.054 Green, James L. et al. “Eyewitness Reports of the Great Auroral Storm of 1859.” Submitted to Advances in Space Research. NASA Technical Reports Server (NTRS) 20050210157. 8/5/2005. Haeberle, Tom. “The Carrington Affair!” Amateur Astronomers Association Eyepiece. 9/1/2018. https://aaa.org/2018/09/01/the-carrington-affair/ Hayakawa, Hisashi et al. “Temporal and Spatial Evolutions of a Large Sunspot Group and Great Auroral Storms Around the Carrington Event in 1859.” Space Weather. 8/29/2019. https://agupubs.onlinelibrary.wiley.com/doi/full/10.1029/2019SW002269 Hodgson, R. “On a Curious Appearance Seen in the Sun.” Monthly notices of the Royal Astronomical Society vol. 19-20 (1858-1860). https://academic.oup.com/mnras/article/20/1/15/983497 Hodžić, Jasna. “The Carrington Event of 1859 Disrupted Telegraph Lines. A ‘Miyake Event’ Would Be Far Worse.” JSTOR Daily. 9/7/2023. https://daily.jstor.org/the-carrington-event-of-1859-disrupted-telegraph-lines/ Howard, R.A. (2006). A Historical Perspective on Coronal Mass Ejections. In Solar Eruptions and Energetic Particles (eds N. Gopalswamy, R. Mewaldt and J. Torsti). https://doi.org/10.1029/165GM03 Josefowicz, Diane. “The British Magnetic Scheme (1839-1851): People and Institutions.” Victorian Web. https://victorianweb.org/science/geomagnetism/magneticcrusade.html Kaminski, Isabella. “'The fate of nations and the fall of kingdoms': History's epic theories of what causes aurora.” BBC. 11/16/2025. https://www.bbc.com/future/article/20251114-historys-epic-theories-of-what-causes-aurora Kimball, D.S. “A Study of the Aurora of 1859.” Scientific Report No. 6. NSF Grant No. Y/22.6/327. April 1960. Klein, Christopher. “A Perfect Solar Superstorm: The 1859 Carrington Event.” History. 1/29/2025. https://www.history.com/articles/a-perfect-solar-superstorm-the-1859-carrington-event Marinus Anthony van der Sluijs, Hisashi Hayakawa. “A candidate auroral report in the Bamboo Annals, indicating a possible extreme space weather event in the early 10th century BCE.” Advances in Space Research. Volume 72, Issue 12. 2023. https://doi.org/10.1016/j.asr.2022.01.01 Mills, Virginia. “A message from Alexander von Humboldt.” The Royal Society. 9/23/2019. https://royalsociety.org/blog/2019/09/a-message-from-alexander-von-humboldt/ Muller, C. “The Carrington solar flares of 1859: consequences on life.” Origins of life and evolution of the biosphere : the journal of the International Society for the Study of the Origin of Life vol. 44,3 (2014): 185-95. doi:10.1007/s11084-014-9368-3 Phillips, Tony. “A Warning from History: The Carrington Event Was Not Unique.” Space Weather Archive. 9/1/2020. https://spaceweatherarchive.com/2020/08/30/a-warning-from-history-the-carrington-event-was-not-unique/ Phillips, Tony. “Near Miss: The Solar Superstorm of July 2012.” NASA. 12/22/2014. https://science.nasa.gov/science-research/planetary-science/23jul_superstorm/ C. Carrington, Description of a Singular Appearance seen in the Sun on September 1, 1859, Monthly Notices of the Royal Astronomical Society, Volume 20, Issue 1, November 1859, Pages 13–15, https://doi.org/10.1093/mnras/20.1.13 Starmans, Barbara J. “Carrington Solar Flare of 1859.” The Social Historian. 11/27/2016. https://www.thesocialhistorian.com/carrington-solar-flare-of-1859/ Thompson, D. (2009) The Carrington Event and the Electric Telegraph in Victoria in Museums Victoria Collections https://collections.museumsvictoria.com.au/articles/2880 See omnystudio.com/listener for privacy information.

Hablemos sobre Dios, sobre ti y sobre mí.
No te dejes gobernar por algo temporal

Hablemos sobre Dios, sobre ti y sobre mí.

Play Episode Listen Later May 13, 2026 11:58


Hay temporadas difíciles que revelan mucho de nosotros.Y aunque el cansancio, la presión o la saturación son reales; Dios también quiere enseñarnos a tener dominio sobre cómo reaccionamos en medio de ellas.

Never Left: Our Flag Means Death
104 Temporal_Discounting

Never Left: Our Flag Means Death

Play Episode Listen Later May 12, 2026 81:00


Welcome aboard the Safe Space Ship!   Ariana Perry will be hosting this completely spoiled, totally unofficial, deep dive into Our Flag Means Death every Tuesday!   This week I'm talking to Temporal_Discounting about her amazing fics! Episode Mentions:  Never Left Patreon Temporal_discounting on Ao3   Don't forget to follow us on social media (@NeverLeftPodcast on BlueSky, @NeverLeftPod on Twitter, NeverLeftPodcast on Ig, Never Left on FB), and check out our Pateron.. The links are in our linktree!  Feel free to contact us at neverleftofmd@gmail.com with any thoughts or questions   Please remember to #DontStreamOnMax and #FireDavidZaslav If you want you can also let Netflix, Amazon Prime and Apple + know that you would still love to see Our Flag Means Death on their platforms. #SaveOFMD #AdoptOurCrew   Our artwork was created by Amy Gleason, you can see more of her art @AmysBirdHouse on instagram and in the comic series Mighty Mascots.   Our theme music is Gnossienne 5 by Erik Satie, preformed by La Pianista   Image Description: A lighthouse stands above the inn, wrapped in a purple Kraken tentacle. The text reads "Never Left: Our Flag Means Death"        

mozaic.fm
ep204 Temporal

mozaic.fm

Play Episode Listen Later May 11, 2026 157:15


第 204 回のテーマは Temporal です。9 年近くかけて策定し、やっと Stage 4 になったこの仕様は、これまでの JS における日付処理の積年の課題を解決する API として、注目されています。ところが、仕様が膨大で把握するのも難しく、何が具体的に良くなるのか、自分のプロジェクトではこの API とどう向き合うべきかという点は、これからのエコシステムの課題になるでしょう。今回は「この API はなんで必要だったのか」「これから日付処理はどうしていけばよいのか」について整理し、これからの Web における日付時刻処理の未来について議論しました。 Show Note はこちら: https://mozaic.fm/episodes/204/temporal.html

EM360 Podcast
Why Are Companies Struggling to Integrate AI Models into Business Workflows?

EM360 Podcast

Play Episode Listen Later May 11, 2026 27:18


Podcast: Tech TransformedGuests: Maxim Fateev, Co-Founder and CTO, Temporal Technologies and Cornelia Davis, Developer Advocate, Temporal TechnologiesHost: Kevin Petrie, VP of Research at BARCArtificial Intelligence (AI) models have been breaking ground in the last three years. In the race to boost capabilities month by month among platforms like OpenAI, Anthropic, and Google's Gemini models. However, for many enterprises, the main challenge is not creating AI prototypes; it's ensuring they can reliably support real business processes.In a recent episode of the Tech Transformed podcast, Kevin Petrie, VP of Research at BARC, hosted a discussion with Maxim Fateev, Co-Founder and CTO, Temporal Technologies and Cornelia Davis, Developer Advocate, Temporal Technologies. They talked about why enterprises find it hard to transition AI from experimentation to production and how infrastructure must change to support autonomous systems.Why AI Demos Break in the Real WorldAccording to Davis, many organisations make a common mistake: they focus on the "happy path" during experiments and overlook real-world operational challenges. “We have always ignored the non-functional requirements until we go to prod at our peril,” Davis said. “A lot of our experimentation is so focused on the models that we forget about the non-functional requirements.”This means developers often prioritise model performance but neglect reliability, scaling, and system resilience. Agent frameworks used in experiments—usually lightweight Python or TypeScript libraries—add to the issue.“What you're really building is a highly distributed system that's calling Large Language Models (LLMs) that will be rate-limited… networks are going to go down,” Davis explained. “When we move into prod, we haven't considered scale or instability.”As enterprises expand AI into their workflows, these overlooked details become imperative. A single outage, rate limit, or infrastructure failure can disrupt a complicated workflow that involves multiple AI steps.Also Watch: Developer Productivity 5X to 10X: Is Durable Execution the Answer to AI Orchestration Challenges?What Risks are Surfacing Since the Rise of Agentic Systems?The transition from simple AI workflows to autonomous agents adds a new layer of complexity. Traditional AI applications have predictable flows—such as summarising documents, tagging data, or creating recommendations. In contrast, agentic systems choose tools and decide on actions dynamically.“When we move from non-agentic to agentic, we introduce unpredictability,” Davis said. “The tools and the order they run in are unpredictable. Whether we go through the agentic loop once or a hundred times is unpredictable.”Such unpredictability creates new governance and compliance challenges, especially in regulated industries. “Enterprises are still responsible for predictable outcomes,” Davis noted. “We need stronger audit trails to understand why the agent made the decisions it did.”For enterprises, this means AI systems must ensure traceability, accountability, and compliance, even when decision paths differ from one interaction to another.Why is Durable Execution the New Foundation for Enterprise AIFateev argues that to manage such newly surfacing risks, enterprises need a new architectural layer focused on reliability. His concept, “Durable Execution,” aims to ensure that complex workflows keep running even when infrastructure fails.“You write code as if failures don't exist,” Fateev explained. “If a process crashes, we recover all the state and continue executing.” In practical terms, Durable Execution allows long-running AI workflows to survive interruptions—from network outages to system crashes—without losing progress or data.This is essential as agents start interacting with real systems and taking real actions. “The moment agents start acting on the external world—changing files, submitting orders—you absolutely don't want those things to get lost,” Fateev said.The Temporal co-founder further emphasised that enterprise AI will not completely replace traditional software systems.“You will always have deterministic code,” he said. “You can't imagine banks dynamically deciding what a money transfer means.”Instead, the future architecture will combine deterministic software with agents that interact through controlled tools and reliable communication layers.Also Watch: How Do You Make AI Agents Reliable at Scale?Key TakeawaysAI projects fail in production when non-functional requirements are ignoredAgentic systems bring unpredictability, making governance, traceability, and auditability essential.Lightweight experimentation frameworks aren't suited for enterprise workloads.Durable execution enables reliable AI workflows, ensuring processes continue despite infrastructure failures.Enterprise AI will blend deterministic software with agents.Chapters00:00 Introduction to AI's Impact on Business03:53 Challenges in Integrating AI into Business Workflows13:00 Understanding Non-Functional Requirements in AI19:14 The Role of Orchestration in AI Systems24:26 Exploring Durable Execution in AI Workflows30:28 Future Architectures for Autonomous AI Systems36:05 Key Takeaways for Executives in AI ImplementationFor more information, please visit em360tech.com and temporal.io.To learn more about Temporal and Durable Execution, follow:Temporal LinkedIn: Temporal TechnologiesTemporal X: @TemporalioTemporal YouTube: @TemporalioEM360Tech YouTube: @enterprisemanagement360EM360Tech LinkedIn: @EM360TechEM360Tech X: @EM360Tech#DurableExecution #EnterpriseAI #AIToProduction #AIOrchestration #TemporalTech #AutonomousAgents #SystemReliability #LLMs #TechTransformed #AIWorkflows

Noticentro
Metrobús modificará rutas en líneas 3, 4 y 7 este 10 de mayo

Noticentro

Play Episode Listen Later May 10, 2026 1:38 Transcription Available


Día de las Madres dejará derrama de 2 mil 350 mdp en Yucatán Brugada inaugura dos nuevas canchas en IztapalapaReanudan operaciones 193 buques en Argentina Más información en nuestro Podcast#grc

Sausage of Science
SoS 278: Using a biocultural approach to understand food allergies, consumption patterns, and guidelines with Erin Maxwell (Hosein)

Sausage of Science

Play Episode Listen Later May 9, 2026 43:30


In this episode, Mecca chats with Erin Maxwell (Hosein) about her research on food allergen consumption patterns in the U.S. using NHANES data, gaps in current research, and the value of anthropological approaches for contributing to a more holistic understanding and informing policy/guidelines. They also discuss the evolutionary dual-allergen exposure hypothesis and new, exciting methods for testing the theory. Erin Maxwell (Hosein) is a registered dietitian and human-biology PhD student in Anthropology at the University of North Carolina at Chapel Hill whose work centers on the rising prevalence of food allergies in the United States. Drawing on training in nutrition, food studies, and evolutionary perspectives on health, she studies how early-life feeding practices may shape the development of allergic disease. Her research focuses on maternal and infant nutrition and the early-life origins of allergic conditions using biosocial and nutritional epidemiology approaches. More broadly, she examines how food policy and shifting public health recommendations influence not only nutritional status but also everyday food practices, customs, and beliefs. Contact Erin at hosein@email.unc.edu, https://www.linkedin.com/in/erinhoseinnutrition/ ------------------------------ Find the papers discussed in this episode: Hosein, E. A., Virkud, Y. V., Kim, E. H., Hoke, M. K., Thompson, A. L., & Keet, C. A. (2025). Temporal, Age, and Racial and Ethnic Trends in Allergen Consumption from 2-Day 24-Hour Recalls, NHANES 2003-2023. The journal of allergy and clinical immunology. In practice, 13(10), 2795–2805. https://doi.org/10.1016/j.jaip.2025.07.028 Comment on Stanislaw J. Gabryszewski, Jesse Dudley, Jennifer A. Faerber, Robert W. Grundmeier, Alexander G. Fiks, Jonathan M. Spergel, David A. Hill; Guidelines for Early Food Introduction and Patterns of Food Allergy. Pediatrics November 2025; 156 (5): e2024070516. 10.1542/peds.2024-070516 ------------------------------ Contact the Sausage of Science Podcast and the Human Biology Association: Facebook: facebook.com/groups/humanbiologyassociation/, Website: humbio.org Mecca E. Howe, Host, E-mail: howemecca@gmail.com, LinkedIn: https://www.linkedin.com/in/mecca-howe/

Noticentro
¡Lluvias, granizo y fuertes vientos llegarán a la CDMX!

Noticentro

Play Episode Listen Later May 8, 2026 1:25 Transcription Available


Mexicable III avanza rumbo a Cuatro CaminoEspaña detecta nuevo caso vinculado a hantavirus en crucero Más información en nuestro Podcast#grc

TheTop.VC
($650M+ raised) Temporal Founder, Samar Abbas: #1 Startup Insight – Give the Problem a Name Before You Solve It (Andreessen Horowitz, Sequoia, Index Ventures, Sequoia invested).

TheTop.VC

Play Episode Listen Later May 6, 2026 27:01


Sponsored by Chargebee, subscription and revenue management → check out their startup offer: https://www.chargebee.com/startups - Samar Abbas, Founder of temporal.io https://www.linkedin.com/in/samar-abbas-381997/   - Samar Abbas, co-founder of Temporal.io, shares the journey of building an open-source platform that ensures durable execution of code, allowing developers to focus on business logic instead of handling failures and reliability. - Temporal.io originated from years of experience at companies like Amazon, Microsoft, and Uber, where Samar and his co-founder iterated on workflow and state management systems, eventually creating a new category called "durable execution." - The company's open-source approach led to rapid community adoption, with major companies like Snap using Temporal for mission-critical workloads, validating the product's value and scalability. - Temporal.io monetizes by offering a fully managed cloud service with a consumption-based pricing model, aligning customer costs with the value delivered. - The company has raised significant funding, including a $300M Series D led by Andreessen Horowitz (a16z), with participation from Lightspeed Venture Partners and Sapphire Ventures, reaching a $5B valuation.

php[podcast] episodes from php[architect]
The PHP Podcast 2026.04.30

php[podcast] episodes from php[architect]

Play Episode Listen Later May 1, 2026 72:07


PHP Podcast – April 30, 2026 Hosts: Eric Van Johnson & John Congdon Another fun episode of the PHP Podcast! Here’s what we covered: The Drone Slayer Strikes Eric and John wrapped up a Padres game at beautiful Petco Park in downtown San Diego — and things got weird on the way out. A rogue drone started buzzing around a busy intersection, lingering on a guy on a scooter, before making a fateful attempt to fly in front of Eric’s car. It did not make it. The controller came running out, Eric kept driving, and John has already dubbed him “the drone slayer.” Eric still hasn’t looked at whether his wife’s car got scratched, which feels like the bravest choice of all. Baseball Week Never Ends The reason today’s episode started an hour early? Baseball. John’s week was wall-to-wall: a Tuesday night little league game, the Padres game with Eric on Wednesday, practice Thursday night, the playoff draft reveal Friday, a little league game Saturday, and another Padres game Sunday. Eric pointed out John was wearing his own last name on a jersey to a Padres game, which opened up a whole sidebar on why anyone buys a $200 jersey with a player’s name on it when players change teams every two years anyway. Walking Pneumonia and the Power of the Right Antibiotic John’s week was also scrambled because his son had been diagnosed with regular pneumonia — but after not getting better, a second doctor visit revealed it was actually atypical (walking) pneumonia, which requires a completely different antibiotic. Once on the correct medication, his son bounced back almost immediately. The kid had been pushing himself trying to feel well enough for sixth grade camp, but there’s really no faking it with the wrong treatment. The Archie Situation — AI Standups Gone Sideways Eric has had a rough stretch after Anthropic shut down OpenClaw, the platform that powered their internal Discord bot Archie (a.k.a. Alfred). Archie had been running daily team standups, generating weekly summaries, letting team members tag it with updates throughout the day, and even setting reminders. Everyone got spoiled by it. Since then, attempts to migrate to Ollama — both locally and through the web service — have been plagued by slow response times and dropped messages. Eric is close to pulling the plug and going back to the old manual method, and he’s not happy about it. Claude SSH’d Into Eric’s Server and Fixed Everything For weeks, Eric had been fighting a broken Postiz Docker container — a self-hosted social media scheduling tool he uses to post across platforms. After updates broke it and multiple attempts at a fresh install still left it broken, he dropped the problem in Claude’s lap and explained the whole situation. Claude asked for permission to SSH into the remote server on Eric’s Tailscale network, and Eric said sure. Thirty minutes later, Claude had identified the culprit — a Temporal workflow engine losing its configuration on restart — wrote a fix script, configured the service to reconfigure properly on boot, and even set up a cron job to restart the container on reboot. Eric’s still trying to find that chat to review exactly what it did, but the service is running. GitHub is Getting Hammered by AI Agents GitHub has had a rough patch of outages, and the numbers tell the story: 20 million new repos per month, 1.4 billion commits, 90 million pull requests — with a dramatic spike right at the start of 2026. Part of the culprit? AI agents being unleashed on codebases to automatically open pull requests from backlog tickets. Eric has a client doing exactly this, and while it sounds impressive from the owner’s perspective (“look at all this work getting done!”), the developers on the ground report that a high percentage of those AI-generated PRs require significant human correction before they’re anywhere close to mergeable. The comparison to Reddit’s early explosion — and the one engineer who basically didn’t sleep for two years — felt pretty apt. The GitHub Security Vulnerability Nobody Talked About As if the outages weren’t enough, GitHub quietly disclosed a serious security vulnerability: a specially crafted git push — using malformed options in the push metadata — could allow arbitrary code execution on GitHub’s own servers. Eric had to dig to find the blog post because GitHub was not exactly shouting about it. To their credit, they state that their investigation found no evidence the vulnerability was ever exploited in the wild. But knowing that a specific sequence of bytes in a git push could have handed someone the keys to GitHub’s servers is genuinely unsettling. The Creator of Ghosty Is Leaving GitHub Mitchell Hashimoto — creator of the Ghostty terminal and formerly of HashiCorp — announced he’s leaving GitHub, where he’s been a user since 2008 (user #1299). This comes shortly after the Zig programming language made the same move, also citing reliability concerns. Eric was mildly skeptical of the “announcing I’m leaving” genre of posts, pointing out that GitHub doesn’t especially need your permission to stop using it. Notably, Hashimoto’s post doesn’t say what he plans to use instead. John joined GitHub in 2009, which led to a fun live expedition through his commit history — turns out he got serious about coding right around July 2013, roughly when DiegoDev landed its first client. Update Composer. Like, Right Now. PHP developers tend to set Composer up and forget about it — but there’s been a serious security vulnerability patched in a recent release that you absolutely want. The fix is simple: just run composer self-update. It updates in place and keeps a rollback copy in case anything breaks. While you’re at it, if you have global Composer packages installed, run composer global update to catch those too. Eric noted that Composer should really warn you when you’re significantly behind versions, the way Claude Code does. Until it does, just make a habit of it. Linux Kernel Exploit — Patch Your Servers A CVE was shared in the phparch Discord that affects Ubuntu, Amazon Linux, and Red Hat: a Linux kernel exploit that lets an attacker gain root access with a remarkably small payload — around 732 bytes targeting setuid. It’s a good reminder that the old sysadmin badge of honor (“my server has 5-year uptime, never rebooted”) is the wrong mentality now. With tools like Terraform and infrastructure-as-code, spinning up a freshly patched machine is the move. Keep your operating systems current, especially Linux servers running in production. Holly Built a PHP Tek App — And It’s Already Good Community member Holly built a native attendee app for PHP Tek, available now in beta on iOS (via TestFlight) and Android. You can browse the schedule, select the talks you want to attend, and it’ll warn you if two of your picks are in conflict — a “merge conflict,” as Eric put it. Best of all, it sends push notifications when sessions you’ve favorited get moved or rescheduled, which happens constantly at tech conferences. Eric’s wife installed it without being told anything about it and figured it out on her own — about as good a usability test as you can get. The app is built natively in Swift and Kotlin. Be kind to Holly — this is a gift to the community. PHP Tek in 19 Days + New PHP Architect Merch PHP Tek is nearly here — 19 days out in Chicago. A brand new PHP Architect elephant is coming (tentatively named Holly, after a live-stream vote). Eric also walked through new merch at store.phparch.com: a v-neck version of the classic rainbow PHP Architect shirt, and his personal labor of love — the “I have standards, specifically PSR 0, 1” tee — which he admits has sold exactly zero copies. If the hotel room block is sold out by the time you read this, reach out to the team directly and they’ll see what they can do. Links from the show: Postiz — Open Source Social Media Scheduling GitHub Security Advisory: Remote Code Execution via Git Push Options PHP Tek 2026 — Chicago PHP Architect Store PHP Architect Discord An update on GitHub availability Migrating from GitHub to Codeberg Ghostty Is Leaving GitHub Securing the git push pipeline: Responding to a critical remote code execution vulnerability Composer 2.9.6 fixes Perforce Driver Command Injection Vulnerabilities (CVE-2026-40261, CVE-2026-40176) Copy Fail: 732 Bytes to Root on Every Major Linux Distribution. Host: Eric Van Johnson X: @shocm Mastodon: @eric@phparch.social Bluesky: @ericvanjohnson.bsky.social PHPArch.me: @eric John Congdon X: @johncongdon Mastodon: @john@phparch.social Bluesky: @johncongdon.bsky.social PHPArch.me: @john Streams: Youtube Channel Twitch Connect & Hire PHP Architect Website Twitter/X Mastodon Hire PHP Developers Looking to hire PHP developers? Email support@phparch.com – Joe and the team are available for consulting, infrastructure work, Ansible playbooks, and code review. Partner This podcast is made a little better thanks to our partners Displace Infrastructure Management, Simplified Automate Kubernetes deployments across any cloud provider or bare metal with a single command. Deploy, manage, and scale your infrastructure with ease. https://displace.tech/ PHPScore Put Your Technical Debt on Autopay with PHPScore CodeRabbit Cut code review time & bugs in half instantly with CodeRabbit. Music Provided by Epidemic Sound https://www.epidemicsound.com/ Join Us Live Next Week Youtube Channel Got feedback? Join us on Discord at discord.phparch.com The post The PHP Podcast 2026.04.30 appeared first on PHP Architect.

Galactic Horrors
The Agency Sent A Time Travel Detective 20 Years Forward. He Brought Back A Temporal Parasite

Galactic Horrors

Play Episode Listen Later May 1, 2026 62:18


Part-Time Fanboy Podcast
Part-Time Fanboy Podcast Ep 598 Stephanie Williams Tells a Family Focused Time Travel Tale in Temporal!

Part-Time Fanboy Podcast

Play Episode Listen Later Apr 30, 2026 59:40


Stephanie Williams is a writer who has worked for publishers like Marvel,DC, and IDW on characters such as Moon Girl and Devil Dinosaur, Nubia, My Little Pony, and many, many others. This April saw the debut of her creator owned graphic novel, Temporal, from Mad Cave Studios. Temporal tells the story of a mother who […]

The BCC Club with Sarah Schauer and Kendahl Landreth
Let's Talk About Betrayal Trauma! (Pt. 2)

The BCC Club with Sarah Schauer and Kendahl Landreth

Play Episode Listen Later Apr 29, 2026 105:01


It's time to park it! This week we're tackling betrayal trauma (yet again) and we'll be covering the neuroscience, neurobiology, and psychology as well as some tangentially related research on betrayal, betrayal blindness, and the freeze response. I hope you enjoy this week's communal Schauer, it's a long one - you may leave this episode a bit pruney. We have fun here.

Troubled Minds Radio
The Other Sphinx - One Lion Short of a Threshold

Troubled Minds Radio

Play Episode Listen Later Apr 29, 2026 105:19 Transcription Available


Italian researchers recently claimed 80% confidence in a buried second sphinx at Giza, and the mainstream dismissed it before the news cycle finished. But the question they raised is not new. It goes back to the oldest religious texts we have. The god the sphinx was built to embody was Aker, one of the most ancient deities in the Egyptian pantheon. He was always depicted as two lions named Yesterday and Tomorrow, guarding the eastern and western horizons, holding the boundary between the world of the living and the world of the dead. The Dream Stele placed between the sphinx's paws shows two sphinxes facing each other. A fragmented Middle Kingdom papyrus mentions a second one. Every major ancient civilization that built a cosmology around thresholds built twin guardians at those thresholds: the Assyrian Lamassu, the Sumerian Divine Twins, the scorpion-beings at Mashu, the Zoroastrian Gopatha.Why does the human imagination keep arriving at the same answer when it contemplates the place where worlds meet? What does it mean that the sphinx may be one half of a paired mechanism that was either never completed or was completed and then lost? And what are the implications of four thousand years of civilization oriented around a cosmological instrument that may have been broken before any of us got here? Call in live during the show: 702-857-6939 Full archive of 1,100+ episodes:troubledminds.org

In Focus by The Hindu
India's temporal inequality: Why the poorer you are, longer the queue

In Focus by The Hindu

Play Episode Listen Later Apr 29, 2026 32:14


We all know that India is one of the most unequal societies on the planet. But most debates around inequality are focussed on wealth and income inequality. But there is another form of inequality that isn't talked about much – temporal inequality. How much time a person spends waiting – determines how much time she has for other life-critical activities. In India, it is the poor who spend more time waiting in queues – waiting for essential services like healthcare, rations, and in government offices. Digitisation was supposed to fix this by cutting waiting time. But has it done so? Or has ‘Digital India' benefited the already time-rich, while further depleting the resources of the poor? Is this lopsided digital governance by accident, or by design? Guest: Ankush Pal, a sociologist trained at the London School of Economics, who works on urban spatiality, caste epistemology, and social movements. Learn more about your ad choices. Visit megaphone.fm/adchoices

Life Logic
Echo of Static | The Device That Records Tomorrow

Life Logic

Play Episode Listen Later Apr 24, 2026 24:23


What if your smart home didn't just listen… it remembered your future? In this chilling 23-minute horror story, a woman fleeing her past moves into a perfect smart home that begins predicting — and pulling — something terrifying through time itself. Smart home horror meets psychological terror in this original tale of digital dread, temporal glitches, and the ultimate invasion of privacy.Listener-submitted concept brought to terrifying life. If you love Creepypasta, NoSleep, and stories about technology gone wrong, this one will leave your devices feeling unsafe.Timestamps below Subscribe for weekly horror stories that blur the line between our world and what waits beyond the static.The Black Book is your gateway into the unknown. We craft chilling horror stories, bone-chilling thrillers, spine-tingling mysteries, and terrifying true-inspired tales that linger long after the video ends. Every story pulls you deep into the darkness—where shadows whisper, secrets unravel, and nothing is ever quite what it seems. From psychological horror and supernatural nightmares to suspenseful crime mysteries and unsettling urban legends, we deliver immersive narrated stories designed to keep you on the edge of your seat… or wide awake at 3 AM. If you love getting lost in the macabre, the eerie, and the unexplained, you've found your new favorite channel. New stories every week. #CreepypastaPerfect for late-night listening… if you dareTurn off the lights.Lean in close. And step From The Other Side.Subscribe and join the fear.

FM Mundo
NotiMundo Estelar - Wilson Merino, Suspensión temporal del Metro de Quito, ¿boicot o falta de previsiones?

FM Mundo

Play Episode Listen Later Apr 21, 2026 15:13


NotiMundo Estelar - Wilson Merino, Suspensión temporal del Metro de Quito, ¿boicot o falta de previsiones? by FM Mundo 98.1

Ogletree Deakins Podcasts
Cross-Border Catch-Up: OECD's New Temporal Test for PE Increases Flexibility for Remote Workers

Ogletree Deakins Podcasts

Play Episode Listen Later Apr 20, 2026 9:50


In this episode of our Cross-Border Catch-Up podcast series, Shirin Aboujawde (New York/London) and Maya Barba (San Francisco) discuss the 2025 Model Tax Convention update from the Organization for Economic Cooperation and Development (OECD) and its implications for employers managing cross-border remote work. Maya and Shirin explain the new two-part framework for determining when an employee's remote work location creates a taxable permanent establishment, including the 50 percent working time safe harbor and the qualitative commercial reason test. The speakers also provide practical compliance steps for multinational employers as they address tax, social security, and global mobility issues in an increasingly borderless workforce.

Healthi Talks
Mindful Moments #41 - Designing Your Destiny: The 4 Environments of Success

Healthi Talks

Play Episode Listen Later Apr 20, 2026 8:58


Why discipline isn't enough. Coach Delicia breaks down the Physical, Digital, Emotional, and Temporal environments and how to audit them for a healthier life.

Microbiome Medics
Biological Liquid Gold: Why Breast Milk Isn't Actually for the Baby (It's for the Microbes) with Prof. Chris Stewart

Microbiome Medics

Play Episode Listen Later Apr 15, 2026 99:48


Join Dr. Siobhan McCormack as she welcomes one of the top early-life microbiome scientists in the field, Professor Chris Stewart. Discover the origins of your gut microbiome, the biological superpowers of breast milk, and how lab-grown "organoids" are uncovering surprising ways to protect preterm babies.What We Cover:The Genesis of You: How birth mode and early feeding shape our microbial foundations.Inside the Stewart Lab: Using lab-grown human "organoids" to study host-microbe interactions in real-time.The Magic of HMOs: Why mothers produce complex sugars (Human Milk Oligosaccharides) that feed gut microbes instead of the baby.The "Good" Clostridium: Groundbreaking research on a specific Clostridium perfringens strain that thrives on HMOs to protect preterm infants from gut diseases.Probiotics & Preemies: Navigating FDA regulations and the future of personalized medicine in neonatal care.About Professor Stewart leads human microbiome research at Newcastle University. His lab combines computational biology with innovative wet-lab human organoid models to study global health issues, focusing primarily on the early-life gut microbiome and protecting vulnerable infants.Connect with the Stewart Lab:Newcastle University ProfileGoogle ScholarTwitterNewcastle Neonatal Nutrition and NEC Research (N4)Scientific References & Further Reading:Stewart CJ, et al. (2018). Temporal development of the gut microbiome in early childhood from the TEDDY study. Nature, 562(7728):583–8.Chapman JA, et al. (2026). Clostridia from preterm infants metabolize human milk oligosaccharides to suppress pathobionts and modulate intestinal function in organoids. Nat Microbiol, 1–20.Masi AC, et al. (2021). Human milk oligosaccharide DSLNT and gut microbiome in preterm infants predicts necrotising enterocolitis. Gut, 70(12):2273–82.Beck LC, et al. (2022). Strain-specific impacts of probiotics are a significant driver of gut microbiome development in very preterm infants. Nat Microbiol, 7(10):1525–35.UNICEF UK: Breastfeeding in the UKThis podcast is brought to you in collaboration with the British Society of Lifestyle Medicine.Disclaimer:The content in this podcast is not intended to be a substitute for professional medical advice, diagnosis, or treatment. Always seek the advice of your doctor or qualified healthcare provider. Never disregard professional medical advice or delay in seeking it because of something you have heard on this podcast.

Robert Lewis Sermons
Becoming Single-Minded

Robert Lewis Sermons

Play Episode Listen Later Apr 13, 2026 54:15


Guided Question: How does Paul's teaching in 1 Corinthians 7 shape our understanding of singleness, marriage, and undistracted devotion to Christ, and what practical applications does it have for Christians today? Summary: Dr. Robert Lewis explores 1 Corinthians 7 with a focus on singleness, showing how Paul's guidance offers both theological insight and practical wisdom. The single life, Paul argues, has distinct advantages over married life, particularly in the ability to devote oneself fully to Christ without divided interests. While marriage is not condemned, the single life allows greater flexibility, opportunity, and focus for spiritual service, and may even be the ideal lifestyle for some. Dr. Lewis emphasizes that singleness should be viewed as a gift and opportunity, not a restriction, and encourages both singles and married Christians to live with eternity in mind, subordinating temporal concerns to their devotion to God. The sermon also touches on widows, showing that older widows who dedicate their lives to Christ serve as examples of purpose and vitality, inspiring the church. Throughout, practical illustrations, historical examples, and anecdotes highlight how undistracted devotion can impact individuals and the broader Christian community. Outline: Introduction Recognition of the increasing number of singles in the church. Importance of addressing singles directly (1 Corinthians 7). Paul's Instructions on Singleness “Now concerning virgins…”: Paul's opinion is trustworthy, guided by the Spirit. Conditions for remaining single (verses 25–35): Temporal considerations: “present distress” in the first-century church. Theological considerations: “time has been shortened,” eternal perspective prioritizes Christ over marriage. Advantages of Singleness Undistracted devotion to the Lord (verses 32–34). Flexibility and freedom to serve. Opportunity to focus on prayer, study, ministry, and service without divided interests. Practical Illustrations Daily life comparisons between singles and married individuals. Anecdotes emphasizing the freedom and productivity of single life. Historical examples of influential singles (Joseph, C.S. Lewis, John R. Stott, Billy Graham's mentors, etc.). Conditions for Choosing Singleness Verse 36–38: Control: ability to resist sinful desires. Conviction: firm decision in one's heart to remain single for undistracted devotion. Marriage is permissible if control or conviction is lacking. Widows and Dedication to Christ 1 Timothy 5:9: Recognition of widows committed to service. Older widows serve as examples of purpose, vitality, and ongoing mission. Conclusion & Practical Application Singles: embrace advantages, focus on God, use time and energy wisely. Married individuals: remember marriage is temporary in light of eternity. Life's ultimate focus should be Jesus Christ, not temporal concerns. Encouragement to serve God undistractedly, impacting church and world. Key Takeaways: Singleness is a gift and an opportunity for spiritual focus and ministry. Marriage is not wrong, but it inherently divides attention between family and God. Temporal concerns (career, wealth, social expectations) should not overshadow devotion to Christ. Undistracted devotion requires both control over desires and conviction in one's heart. Widows who dedicate their lives to God exemplify purposeful living beyond marriage. Historical examples demonstrate the lasting impact of single individuals in God's kingdom. Scripture References: 1 Corinthians 7:1–40 – Paul's instructions on marriage and singleness. Hebrews 10:32–34 – The “present distress” of first-century Christians. Matthew 22:30 – Marriage does not exist in the resurrection. 2 Peter 3:15 – Paul's letters recognized as Scripture. 1 Timothy 5:9 – Guidelines for recognizing widows dedicated to Christ. Recorded 10/11/81

No Sharding - The Solana Podcast
The Constellation Debate, Part 1 with Ben Coverston (Temporal)

No Sharding - The Solana Podcast

Play Episode Listen Later Apr 9, 2026 86:49


In this first part in a series of debates on Constellation (Anza's MCP proposal), Austin chats with Ben from Temporal. The conversation focuses on Ben's measured critical perspective, highlighting his concerns around the proposal's trade-offs, including added complexity, potential unintended effects on market structure, and whether alternative approaches could achieve similar gains in censorship resistance and throughput. They dig into the proposal's underlying motivation, debate whether current limitations in Solana's architecture are being correctly diagnosed, and explore what these design decisions could mean for the next generation of on-chain trading and market infrastructure. Hosted by Simplecast, an AdsWizz company. See pcm.adswizz.com for information about our collection and use of personal data for advertising.

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0
Extreme Harness Engineering for Token Billionaires: 1M LOC, 1B toks/day, 0% human code, 0% human review — Ryan Lopopolo, OpenAI Frontier & Symphony

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

Play Episode Listen Later Apr 7, 2026 72:43


We're proud to release this ahead of Ryan's keynote at AIE Europe. Hit the bell, get notified when it is live! Attendees: come prepped for Ryan's AMA with Vibhu after.Move over, context engineering. Now it's time for Harness engineering and the age of the token billionaires.Ryan Lopopolo of OpenAI is leading that charge, recently publishing a lengthy essay on Harness Eng that has become the talk of the town:In it, Ryan peeled back the curtains on how the recently announced OpenAI Frontier team have become OpenAI's top Codex users, running a >1m LOC codebase with 0 human written code and, crucially for the Dark Factory fans, no human REVIEWED code before merge. Ryan is admirably evangelical about this, calling it borderline “negligent” if you aren't using >1B tokens a day (roughly $2-3k/day in token spend based on market rates and caching assumptions):Over the past five months, they ran an extreme experiment: building and shipping an internal beta product with zero manually written code. Through the experiment, they adopted a different model of engineering work: when the agent failed, instead of prompting it better or to “try harder,” the team would look at “what capability, context, or structure is missing?”The result was Symphony, “a ghost library” and reference Elixir implementation (by Alex Kotliarskyi) that sets up a massive system of Codex agents all extensively prompted with the specificity of a proper PRD spec, but without full implementation:The future starts taking shape as one where coding agents stop being copilots and start becoming real teammates anyone can use and Codex is doubling down on that mission with their Superbowl messaging of “you can just build things”.Across Codex, internal observability stacks, and the multi-agent orchestration system his team calls Symphony, Ryan has been pushing what happens when you optimize an entire codebase, workflow, and organization around agent legibility instead of human habit.We sat down with Ryan to dig into how OpenAI's internal teams actually use Codex, why the real bottleneck in AI-native software development is now human attention rather than tokens, how fast build loops, observability, specs, and skills let agents operate autonomously, why software increasingly needs to be written for the model as much as for the engineer, and how Frontier points toward a future where agents can safely do economically valuable work across the enterprise.We discuss:* Ryan's background from Snowflake, Brex, Stripe, and Citadel to OpenAI Frontier Product Exploration, where he works on new product development for deploying agents safely at enterprise scale* The origin of “harness engineering” and the constraint that kicked off the whole experiment: Ryan deliberately refused to write code himself so the agent had to do the job end to end* Building an internal product over five months with zero lines of human-written code, more than a million lines in the repo, and thousands of PRs across multiple Codex model generations* Why early Codex was painfully slow at first, and how the team learned to decompose tasks, build better primitives, and gradually turn the agent into a much faster engineer than any individual human* The obsession with fast build times: why one minute became the upper bound for the inner loop, and how the team repeatedly retooled the build system to keep agents productive* Why humans became the bottleneck, and how Ryan's team shifted from reviewing code directly to building systems, observability, and context that let agents review, fix, and merge work autonomously* Skills, docs, tests, markdown trackers, and quality scores as ways of encoding engineering taste and non-functional requirements directly into context the agent can use* The shift from predefined scaffolds to reasoning-model-led workflows, where the harness becomes the box and the model chooses how to proceed* Symphony, OpenAI's internal Elixir-based orchestration layer for spinning up, supervising, reworking, and coordinating large numbers of coding agents across tickets and repos* Why code is increasingly disposable, why worktrees and merge conflicts matter less when agents can resolve them, and what it really means to fully delegate the PR lifecycle* “Ghost libraries”, spec-driven software, and the idea that a coding agent can reproduce complex systems from a high-fidelity specification rather than shared source code* The broader future of Frontier: safely deploying observable, governable agents into enterprises, and building the collaboration, security, and control layers needed for real-world agentic workRyan Lopopolo* X: https://x.com/_lopopolo* Linkedin: https://www.linkedin.com/in/ryanlopopolo/* Website: https://hyperbo.la/contact/Timestamps00:00:00 Introduction: Harness Engineering and OpenAI Frontier00:02:20 Ryan's background and the “no human-written code” experiment00:08:48 Humans as the bottleneck: systems thinking, observability, and agent workflows00:12:24 Skills, scaffolds, and encoding engineering taste into context00:17:17 What humans still do, what agents already own, and why software must be agent-legible00:24:27 Delegating the PR lifecycle: worktrees, merge conflicts, and non-functional requirements00:31:57 Spec-driven software, “ghost libraries,” and the path to Symphony00:35:20 Symphony: orchestrating large numbers of coding agents00:43:42 Skill distillation, self-improving workflows, and team-wide learning00:50:04 CLI design, policy layers, and building token-efficient tools for agents00:59:43 What current models still struggle with: zero-to-one products and gnarly refactors01:02:05 Frontier's vision for enterprise AI deployment01:08:15 Culture, humor, and teaching agents how the company works01:12:29 Harness vs. training, Codex model progress, and “you can just do things”01:15:09 Bellevue, hiring, and OpenAI's expansion beyond San FranciscoTranscriptRyan Lopopolo: I do think that there is an interesting space to explore here with Codex, the harness, as part of building AI products, right? There's a ton of momentum around getting the models to be good at coding. We've seen big leaps in like the task complexity with each incremental model release where if you can figure out how to collapse a product that you're trying to.Build a user journey that you're trying to solve into code. It's pretty natural to use the Codex Harness to solve that problem for you. It's done all the wiring and lets you just communicate in prompts. To let the model cook, you have to step back, right? Like you need to take a systems thinking mindset to things and constantly be asking, where is the Asian making mistakes?Where am I spending my time? How can I not spend that time going forward? And then build confidence in the automation that I'm putting in place. So I have solved this part of the SDLC.swyx: [00:01:00] All right.[00:01:03] Meet Ryan swyx: We're in the studio with Ryan from OpenAI. Welcome.Ryan Lopopolo: Hi,swyx: Thanks for visiting San Francisco and thanks for spending some time with us.Ryan Lopopolo: Yeah, thank you. I'm super excited to be here.swyx: You wrote a blockbuster article on harness engineering. It's probably going to be the defining piece of this emerging discipline, huh?Ryan Lopopolo: Thank you. It is it's been fun to feel like we've defined the discourse in some sense.swyx: Let's contextualize a little bit, this first podcast you've ever done. Yes. And thank you for spending with us. What is, where is this coming from? What team are you in all that jazz?Ryan Lopopolo: Sure, sure.Ryan Lopopolo: I work on Frontier Product Exploration, new product development in the space of OpenAI Frontier, which is our enterprise platform for deploying agents safely at scale, with good governance in any business. And. The role of VMI team has been to figure out novel ways to deploy our models into package and products that we can sell as solutions to enterprises.swyx: And you have a background, I'll just squeeze it in there. Snowflake, brick, [00:02:00] stripe, citadel.Ryan Lopopolo: Yes. Yes. Same. Any kind of customerswyx: entire life. Yes. The exact kind of customer that you want to,Vibhu: so I'll say, I was actually, I didn't expect the background when I looked at your Twitter, I'm seeing the opposite.Stuff like this. So you've got the mindset of like full send AI, coding stuff about slop, like buckling in your laptop on your Waymo's. Yes. And then I look at your profile, I'm like, oh, you're just like, you're in the other end too. Oh, perfect. Makes perfect.Ryan Lopopolo: I it's quite fun to be AI maximalist if you're gonna live that persona.Open eye is the place to do it. And it'sswyx: token is what you say.Ryan Lopopolo: Yeah. Certainly helps that we have no rate limits internally. And I can go, like you said, full send at this stay.swyx: Yeah. Yeah. So the Frontier, and you're a special team within O Frontier.Ryan Lopopolo: We had been given some space to cook, which has been super, super exciting.[00:02:47] Zero Code ExperimentRyan Lopopolo: And this is why I started with kind of a out there constraint to not write any of the code myself. I was figuring if we're trying to make agents that can be deployed into end to enterprises, they should be [00:03:00] able to do all the things that I do. And having worked with these coding models, these coding harnesses over 6, 7, 8 months, I do feel like the models are there enough, the harnesses are there enough where they're isomorphic to me in capability and the ability to do the job.So starting with this constraint of I can't write the code meant that the only way I could do my job was to get the agent to do my job.Vibhu: And like a, just a bit of background before that. This is basically the article. So what you guys did is five months of working on an internal tool, zero lines of code over a mi, a million lines of code in the total code base.You say it was cenex, more like it was cenex faster than you would've. If you had done it by end. SoRyan Lopopolo: yeah, thatVibhu: was the mindset going into this, right?Ryan Lopopolo: That's right.[00:03:46] Model Upgrades LessonsRyan Lopopolo: Started with some of the very first versions of Codex CLI, with the Codex Mini model, which was obviously much less capable than the ones we have today.Which was also a very good constraint, right? Quite a visceral feeling to ask the [00:04:00] model to build you a product feature. And it just not being able to assemble the pieces together.Which kind of defined one of the mindsets we had for going into this, which is whenever the model just cannot, you always pop open at the task, double click into it, and build smaller building blocks that then you can reassemble into the broader objective.And it was quite painful to do this. Honestly, the first month and a half was. 10 times slower than I would be. But because we paid that cost, we ended up getting to something much more productive than any one engineer could be because we built the tools, the assembly station for the agent to do the whole thing.[00:04:43] Model Generations, Build Systems & Background ShellsRyan Lopopolo: But yeah, so onward to G BT 5, 5, 1, 5, 2, 5, 3, 5 4. To go through all these model generations and see their kind of corks and different working styles also meant we had to adapt the code base to change things up when the model was revved. [00:05:00] One interesting thing here is five two, the Codex harness at the time did not have background shells in it, which means we were able to rely on blocking scripts to perform long horizon work.But with five, three and background shells, it became less patient, less willing to block. So we had to retool the entire build system to complete in under a minute and. This is not a thing I would expect to be able to do in a code base where people have opinions. But because the only goal was to make the Asian productive over the course of a week, we went from a bespoke make file build to Basil, to turbo to nx and just left it there because builds were fast at that point.swyx: Interesting. Talk more about Turbo TenX. That's interesting ‘cause that's the other direction that other people have been doing.Ryan Lopopolo: Ultimately I have. Not a lot of experience with actual frontend repo architecture.swyx: You're talking that Jessica built the sky. So I'm like, I know the NX team. I know Turbo from Jared [00:06:00] Palmer.And I'm like, yeah, that's an interesting comparison.[00:06:02] One Minute Build LoopRyan Lopopolo: The hill we were climbing right, was make it fast.swyx: Is there a micro front end involved? Is it how how complex reactRyan Lopopolo: electron base single app sort of thingswyx: And must be under a minute. That's an interesting limitation. I'm actually not super familiar with the background shelf stuff.Probably was talked about in the fight three release.Ryan Lopopolo: BA basically means that codex is able to spawn commands in the background and then go continue to work while it waits for them to finish. So it can spawn an expensive build and then continue reviewing the code, for example.swyx: Yeah.Ryan Lopopolo: And this helps it be more time efficient for the user invoking the harness.swyx: And I guess and just to really nail this, like what does one minute matter? Like why not five, okay, good. We want no. WeRyan Lopopolo: want the inner loop to be as fast as possible. Okay. One minute was just a nice round number and we were able to hit it.swyx: And if it doesn't complete, it kills it or some something,Ryan Lopopolo: No.We just take that as a signal that we need to stop what we're doing, double click, decompose a build graph a bit to get us to high back under so that we [00:07:00] can able the agent continue to operate.swyx: It's almost like you're, it's like a ratchet. It's like you're forcing build time discipline, because if you don't, it'll just grow and grow.That's right. And you mentioned that my current, like the software I work on currently is at 12 minutes. It sucks.Ryan Lopopolo: This has been my experience with platform teams in the past, where you have an envelope of acceptable build times and you let it go up to breach and then you spend two, three weeks to bring it back down to the lower end of the average low bed stop.But because tokens are so cheap Yeah. And we're so insanely parallel with the model, we can just constantly be gardening this thing to make sure that we maintain these in variants, which means. There's way less dispersion in the code and the SDLC, which means we can simplify in a way and rely on a lot more in variance as we write the software.[00:07:45] Observability, Traces & Local Dev StackVibhu: Lovely.[00:07:46] Humans Are BottleneckVibhu: You mentioned in your article, like humans became the bottleneck, right? You kicked off as a team of three people. You're putting out a million line of code, like 1500 prs, basically. What's the mindset there? So as much as code is disposable, you're doing a lot of review. A lot [00:08:00] of the article talks about how you wanna rephrase everything is prompting everything, is what the agent can't see.It's kind of garbage, right? You shouldn't have it in there. So what's like the high level of how you went about building it, and then how you address okay, humans are just PR review. Like how is human in the loop for this?Ryan Lopopolo: We've moved beyond even the humans reviewing the code as well.[00:08:19] Human Review, PR Automation & Agent Code ReviewRyan Lopopolo: Most of the human review is post merge at this point.But post, post merge, that's not even reviewed. That's justswyx: Oh, let's just make ourselves happy by YouRyan Lopopolo: haven't used fundamentally. The model is trivially paralyzable, right? As many GPUs and tokens as I am willing to spend, I can have capacity to work with my hood base.The only fundamentally scarce thing is the synchronous human attention of my team. There's only so many hours in the day we have to eat lunch. I would like to sleep, although it's quite difficult to, stop poking the machine because it makes me want to feed it. You have to step back, right?Like you need to take a systems thinking mindset to things and [00:09:00] constantly be asking where is the agent making mistakes? Where am I spending my time? How can I not spend that time going forward? And then build confidence in the automation that I'm putting in place. So I have solved this part of the SDLC, and usually what that has looked like is like we started needing to pay very close attention to the code because the agent did not have the right building blocks to produce.Modular software that decomposed appropriately that was reliable and observable and actually accrued a working front end in these things, right?[00:09:35] Observability First SetupRyan Lopopolo: So in order to not spend all of our time sitting in front of a terminal at most, doing one or two things at a time, invested in giving the model that observability, which is that that graph in the post here.swyx: Yeah. Let's walk through this traces and which existed firstRyan Lopopolo: we started with just the app and the whole rest of it. From vector through to all these login metrics, APIs was, I dunno, half an [00:10:00] afternoon of my time. We have intentionally chosen very high level fast developer tools. There's a ton of great stuff out there now.We use me a bunch, which makes it trivial to pull down all these go written Victoria Stack binaries in our local development. Tiny little bit of python glue to spin all these up. And off you go. One neat thing here is we have tried to invert things as much as possible, which is instead of setting up an environment to spawn the coding agent into, instead we spawn the coding agent, like that's the entry point.It's just Codex. And then we give Codex via skills and scripts the ability to boot the stack if it chooses to, and then tell it how to set some end variables. So the app and local Devrel points at this stack that it has chosen to spin up. And this I think is like the fundamental difference between reasoning models and the four ones and four ohs of the past, where these models could not think so you had to put them in [00:11:00] boxes with a predefined set of state transitions.Whereas here we have the model, the harness be the whole box. And give it a bunch of options for how to proceed with enough context for it to make intelligent choices. SoVibhu: sales, so like a lot of that is around scaffolding, right? Yes. Previous agents, you would define a scaffold. It would operate in that.Lube, try again. That's pivoted off from when we've had reasoning models. They're seeming to perform better when you don't have a scaffold, right? That's right.[00:11:28] Docs Skills GuardrailsVibhu: And you go into like niches here too, like your SPEC MD and like having a very short agent MG Agent md.swyx: Yes. Yes.Vibhu: Yeah. So you even lay out what it is here, but I likeswyx: the table contents.Vibhu: Yeah.swyx: Like stuff like this, it really helps guide people because everyone's trying to do this.Ryan Lopopolo: This structure also makes it super cheap to put new content into the repository to steer both the humans and the agents.swyx: You, you reinvented skills, right?Vibhu: One big agents andswyx: skills from first princip holdsRyan Lopopolo: all skills did not exist when we started doing this.Vibhu: You have a short [00:12:00] one 100 line overall table of contents and then you have little skills, right? Core beliefs, MD tech tracker. Yeah. Yeah. The scale is overRyan Lopopolo: The tech jet tracker and the quality score are pretty interesting because this is basically a tiny little scaffold, like a markdown table, which is a hook for Codex to review all the business logic that we have defined in the app, assess how it matches all these documented guardrails and propose follow up work for itself.Before beads and all these ticketing systems, we were just tracking follow up work as notes in a markdown file, which, we could spa an agent on Aron to burn down. There's this really neat thing that like the models fundamentally crave text. So a lot of what we have done here is figure out ways to inject textswyx: intoRyan Lopopolo: the system right when we get a page, because we're missing a timeout, for example.I can just add Codex in Slack on that page and say, I'm gonna fix this by adding a timeout. Please update our reliability documentation. To require that all network calls have [00:13:00] timeouts. So I have not only made a point in time fix, but also like durably encoded this process knowledge around what good looks like.swyx: Yeah.Ryan Lopopolo: And we give that to the root coding agent as it goes and does the thing. But you can also use that to distill tests out of, or a code review agent, which is pointed at the same things to narrow the acceptable universe of the code that's produced.swyx: I think one of the concerns I have with that kind of stuff is you think you're making the right call by making, it's persisted for all time across everything.Yes. But then you didn't think about the exceptions that you need to make, right? And that you have to roll it back.Vibhu: Part of it isswyx: also sometimes it can follow your s instructions too.Vibhu: It's somewhat a skill, right? So it determines when it uses the tools, right? Like it's not like it'll run outta every call.It'll determine when it wants to check quality score, right?Ryan Lopopolo: Yeah. And we do in the prompts we give these agents, allow them to push back,[00:13:51] Agent Code Review RulesRyan Lopopolo: When we first started adding code review agents to the pr, it would be Codex, CLI. Locally writes the change, pushes up a PR on [00:14:00] those PR synchronizations of review agent fires.It posts a comment. We instruct Codex that it has to at least acknowledge and respond to that feedback. And initially the Codex driving the code author was willing to be bullied by the PR reviewer, which meant you could end up in a situation where things were not converging. So yeah, we had to,swyx: he's just a thrash.Ryan Lopopolo: We had to add more optionality to the prompts on both of these things, right? The reviewer agents were instructed to bias toward merging the thing to not surface anything greater than a P two in priority. We didn't really define P two, but we gave it, youswyx: did define P two.Ryan Lopopolo: We gave it a framework within which to score its outputswyx: and then greater than P zero is worse, right?Yes. P two is very good.Ryan Lopopolo: P zero is you will mute the code place ifswyx: you merch thisRyan Lopopolo: thing, right?swyx: Yeah.Ryan Lopopolo: But also on the code authoring agent side, we also gave it the flexibility to either defer or push back against review feedback, right? This happens all the time, right? Like I happen to notice something and leave a code review, [00:15:00] which.Could blow up the scope by a factor of two. I usually don't mean for that to be addressed Exactly. In the moment. It's more of an FYI file it to the backlog, pick it up in the next fix it week sort of thing. And without the context that this is permissible, the coding agents are gonna bias toward what they do, which is following instructions.swyx: Yeah.[00:15:19] Autonomous Merging Flowswyx: I do wanted to check in on a couple things, right? Sure. All the coding review agent, it can merge autonomously. I think that's something that a lot of people aren't comfortable with. And you have a list here of how much agents do they do Product code and tests, CI configuration and release tooling, internal Devrel tools, documentation eval, harness review, comments, scripts that manage the repository itself, production dashboard definition files, like everything.Yes. And so they're just all churning at the same time, is there like a record that, that any human on the team pulls to stop everythingRyan Lopopolo: Because we are building a native application here. We're not doing continuous deploy. So there's still a human in the loop for cutting the release branch.I see. We require a blessed [00:16:00] human approved smoke test of the app before we promote it to distribution, these sort of things.swyx: So you're working on the app, you're not building like infrastructure where you have like nines of reliability, that kinda stuff?Ryan Lopopolo: That's correct. That's correct. Okay. And also like full recognition here that all of this activity took in a completely greenfield repository.There's. Should be no script that this applies generally toswyx: this is a production thing, you're gonna shipRyan Lopopolo: toswyx: customers. Of course. Yeah, of course. So this is realVibhu: And like one of the things there is, you mentioned you started this as a repo from scratch. The onboarding first month or so was pretty, it was like working backwards, right?Yeah. And then you had to work with the system and now you're at that point where you know, you're very autonomous. I'm curious like, okay, so what, how human in the loop is it? So what are the bottlenecks that you wish you could still automate? And part of that is also like, where do you see the model trajectory improving and offloading more human in the loop?We just got 5.4. It's a really good,Ryan Lopopolo: fantastic model, by the way.Vibhu: Yeah. Yeah. It's the first one that's merged. Top tier coding. So it's codex level coding and reasoning. So general reasoning both in one model. SoRyan Lopopolo: andVibhu: computer [00:17:00] use vision.Ryan Lopopolo: Now we now with five four, I can just have Codex write the blog post, whereas for this one I had to balance between chat.swyx: Oh, I need to, I might be out of a job. Oh my God.Ryan Lopopolo: Oh,swyx: I know. You just gave me an idea for a completely AI newsletter that five four could do. Yeah, I get it Now.Ryan Lopopolo: This sort of thing is just one example of closing the loop, right? Like the dashboard thing you mentioned. We have Codex authoring the Js ON, for the Grafana dashboards and publishing them and also responding to the pages, which means when it gets the page, it knows exactly which dashboards are defined and what alerts.What alert was triggered by which exact log in the code base. ‘cause all of this stuff is collated together.swyx: It has to own everything.Yes. Yeah. Yeah.Ryan Lopopolo: And it means that if we have an outage that did not result in a page. It has the existing set of dashboards available to it. It has the existing set of metrics and logs and can figure out where the gaps in the dashboard are or [00:18:00] in the underlying metrics and fix them in one go.In the same way, you would have a full stack engineer be able to drive a feature from the backend all the way to the front end.Vibhu: So it, it seems like a lot of the work you guys had to do was you as a small team are fully working for a way that the model wants the software to be written. It's like less human legible for better. Code legibility, agent legibility. How do you think that affects broader teams? So one at OpenAI, do liaison, like this is how software should be written. Like I can imagine, say you join a new team with this methodology, this mindset there's ways that, teams do code review, teams write code, like teams are structured and a lot of it is for human legibility.So should we all swap? Like how does this play back one broader into OpenAI and then like broader into the software engineering, right? Is it like teams that pick this up will it's pretty drastic, right? You have to make a pretty big switch. Should they just full send Yeah.Ryan Lopopolo: The mindset is very much that I'm removed from the process, right? I can't really have deep code level opinions about [00:19:00] things. It's as if I'm. Group tech leading a 500 person organization.Vibhu: Yeah.Ryan Lopopolo: Like it's not appropriate for me to be in the weeds on every pr. This is why that post merge code review thing is like a good analog here, right?Like I have some representative sample of the code as it is written, and I have to use that to infer what the teams are struggling with, where they could use help, where they're already moving quickly and I can pivot my focus elsewhere.Vibhu: Yeah.Ryan Lopopolo: So I don't really have too many opinions around the code as it is written.I do, however, have a command based class, which is used to have repeatable chunks of business logic that comes with tracing and metrics and observability for free. And the thing to focus on is not how that business logic is structured, but that it uses this primitive ‘cause I know that's gonna give leverage by default.Vibhu: Yeah.Ryan Lopopolo: Yeah, back to that sort of systems stinking,Vibhu: and you have part of that in your blog post, enforcing architecture and ta taste how you set boundaries for what's used. There's also a section on redefining [00:20:00] engineering and stuff, but yeah, it's just, it's interesting to hear,Ryan Lopopolo: and as the models have gotten better, they have gotten better at proposing these abstractions to unblock themselves, which again, lets me move higher and higher up the stack to look deeper into the future on what ultimately blocked the team from shipping.swyx: Yeah. You mentioned so you, this is primarily a, it is like a 1 million line of code base electron app. But it manages its own services as well, so it's like a backend for front end type thing.Ryan Lopopolo: We do have a backend in there, but that's hosted in the cloud.Yeah. This sort of structure is actually within the separate main and render processesWithin theswyx: electric.That's just how electronic works.Ryan Lopopolo: Yeah, of course. So have also treated like. MVC style decomposition with the same level of rigor, which has been very fun.swyx: I have a fun pun. This is a tangent, NVC is model view controller. Any sort of full stack web Devrel knows that.But my AI native version of this is Model view Claw, the clause the harness.Ryan Lopopolo: That's right. That's right. I do think that there is an interesting space to [00:21:00] explore here with Codex, the harness as part of building AI products, right? There's a ton of momentum around getting the models to be good at coding.We've seen big leaps in like the task complexity with each incremental model release where if you can figure out how to collapse a product that you're trying to build, a user journey that you're trying to solve into code, it's pretty natural to use the Codex Harness to solve that problem for you. It's done all the wiring and lets you just communicate and prompts to let the model cook.Yeah. It's been very fun. And there's also a very engineering legible way of increasing capabil. It's fantastic, right? Yeah. Just give you, just give the model scripts, the same scripts you would already build for yourself.swyx: Yeah.Yeah. So for listeners, this is Ryan saying that software engineering or coding against will eat knowledge work like the non-coding parts that you would normally think.Oh, you have to build a separate agent for it. No, start a coding agent and go out from there. Which open Claw has like it's pie Underhood.Ryan Lopopolo: [00:22:00] Yes.Vibhu: Basically define your task in code. Everything is a codingswyx: agent by the way. Since I brought it up, it's probably the only place we bring it up. Is any open claw usage from you?Any?Ryan Lopopolo: No. No. Not for me. I don't have any spare Mac Minis rattling around my house.swyx: You can afford it? No. I just, I'm curious if it's changed anything in opening eye yet, but it's probably early days. And then the other, the other thing I, I wanna pull on here is like you mentioned ticketing systems and you mentioned prs and I'm wondering if both those things have to go away or be reinvented for this kind of coding.So the git itself and is like very hostile to multi-agent.Ryan Lopopolo: Yeah. We make very heavy use of work trees.swyx: But like even then, like I just did a, dropped a podcast yesterday with Cursors saying, and they said they're getting rid of work trees ‘cause it still has too many merge conflicts.It's still un too un unintuitive. But go ahead.Ryan Lopopolo: The models are really great at resolving merge conflicts. Yeah. And to get to a state where I'm not synchronously in the loop in my terminal, I almost don't care that there are mergeswyx: with disposable.[00:23:00] Yeah.Ryan Lopopolo: We invoke a dollar land skill and that coaches codex to push the PR Wait for human and agent reviewers Wait for CI to be green.Fix the flakes if there are any merged upstream. If the PR comes into conflict, wait for everything to pass. Put it in the merge queue. Deal with flakes until it's in Maine. End. This is what it means to delegate fully, right? This is in a, very large model re probably a significant tax on humans to get PRS merged, but the agent is more than capable of doing this and I really don't have to think about it other than keep my laptop open.swyx: Yeah. I used to be much more of a control freak, but now I'm like, yeah, actually you could do a better job of this than me. Yeah. With the right context. Yes.[00:23:47] Encoding Requirementsswyx: Anything else in harness in general? Just this piece, I just wanna make sure we,Ryan Lopopolo: I think one thing that I maybe didn't make super clear in the article that I heard on Twitter as an interesting, that's respond [00:24:00]swyx: to them.What's the chatter and then what's your response?Ryan Lopopolo: Ultimately, all the things that we have encoded in docs and tests and review agents and all these things are ways to put all the non-functional requirements of building high scale, high quality, reliable software into a space that prompt injects the agent.We either write it down as docs, we add links where the error messages tell how to do the right thing. So the whole meta of the thing is to basically tease out of the heads of all the engineers on my team, what they think good looks like, what they would do by default, or what they would coach a new hire on the team to do to get things to merch.And that's why we pay attention to all the mistakes, mistakes that the agent makes, right? This is code being written that is misaligned with some as yet not written down, non-functional requirement.swyx: Sorry, what? Did the online people misunderstand orRyan Lopopolo: No,swyx: whatyouRyan Lopopolo: responded to? Somebody just literally said that.I was like, oh yeah,swyx: okay,Ryan Lopopolo: This is the [00:25:00] thing. This is what I've been doing. Oh, youswyx: agree? Yeah. I see. Interesting.Ryan Lopopolo: One other neat thing, which I did totally did not expect is folks were just. Taking the link to the article and giving it to pi or Codex and say, make my repo this,Vibhu: you achi a whole recursion.Ryan Lopopolo: And it was wildly effective. Really? It was wildly effective. NoVibhu: way. It just actually is something I tried with five, four yesterday. I didn't have time. Last time I was like out speaking of something, and this is one of my things, I was like, okay, I have this article. Can we just scaffold out what it would be like to run this?And I, I did it first as that and then I was like, okay, let me take another little side repo and say okay, if I was to fully automate this like this because I haven't written a line of code, it'sRyan Lopopolo: like over full, setVibhu: it right. The side thing I'm doing of voice. TTS I'm just like, slobbing out, whatever.It's nothing production. I'm like, how would I make this like this? And it's actually like a really good way. It's like a good way to learn what could be changed, what could be like, it's just a good analyzing, right? You give it all the codes, you give it all the context, you give it the article and it walks you through it very well.That's right. That's right.[00:25:57] Inlining Dependencies[00:25:57] Dependencies Going Away & Brett Taylor's Responseswyx: I guess one more thing before we go to Symphony is I wanted to cover [00:26:00] Brett Taylor's response. We had him on the show. He is your chairman, which is wild. Yeah. That he's reading your articles as well and like getting engaged in it. He says software dependencies are going away.Basically they can just be like vendored. Yes. Response.Ryan Lopopolo: Aswyx: hundred percent. A hundred percent agree. You still pro qr, you still pay Datadog. You still pay Temporal. Thank you.Ryan Lopopolo: Yep. The level of complexity of the dependencies that we can internalize is, I would say low, medium right now. Just based on model capability.What does the,swyx: what is medium?Ryan Lopopolo: I would say like a. A couple thousand line dependency is a thing that we could in-house No problem. Call in an afternoon of time. One neat thing about it is like probably most of that code you don't even need. Like by in-house and abstraction, you can strip away all the generic parts of it and only focus on what you need to enable the specific thing.Yes. You're building,swyx: I've been calling this the end of b******t plugins.Ryan Lopopolo: Yeah.swyx: Because there's so much when I published an open source thing, I want to accept everything, be liberal. I want to accept, this is post's law, but that means there's so much bloat. Yes. There's so much overhead.Ryan Lopopolo: One other neat thing about [00:27:00] this too is when we deploy Codex Security on the repo, it is able to deeply review and change. The internalized dependencies in a much lower friction way than it would be to like, push patches upstream, wait for them to be released, pull them down, make sure that's compatible with all the transitive I have in my repo and things like that.So it's also much lower friction to internalize some of these things if code is free. ‘cause the tokens are cheap sort of thing.swyx: Yeah. Yeah. I think like the only argument I have against this is basically scale testing, which obviously the larger pieces of software like Linux, MySQL, he calls up even the Datadog and Temporals and then maybe security testing where Yes.Classically, I think, is it linis tos, it said security open source is the best disinfectant.Ryan Lopopolo: Many eyes.swyx: Many eyes. And if inline your dependencies and code them up, you're gonna have to relearn mistakes from other people that Yep.Ryan Lopopolo: Yep. And to internalize that dependency, you're back to zero and you have to start.Reassembling all those bits and pieces to Yeah. Have [00:28:00] high confidence in the code as it is written. Yeah.Vibhu: Even part of the first intro of this, you basically mentioned like everything was written by codex, including internal tooling, right? So internal tooling, like when you're visualizing what's going on it's writing it for itself.swyx: Yeah. I'm built internal tools way I now, and like I just show them off and they're like, how long did you spend? And I didn't spend any time. I just prompted it,Ryan Lopopolo: very funny story here.swyx: Yeah, go ahead.Ryan Lopopolo: We had deployed our app to the first dozen users internally had some performance issues, so we asked them to export a trace for us get a tar ball, gave it to our on-call engineer, and he did a fantastic job of working with Codex to build this beautiful local Devrel tool, next JS app, the drag and drop the tar ball in, and it visualizes the entire trace.It's fantastic. Took an afternoon, but none of this was necessary. Because you could just spin up codex and give it the tar ball and ask the same thing and get the response immediately. So in a way, optimizing for human [00:29:00] legibility of that debugging process was wrong. It kept him in the loop unnecessarily when instead he could have just like Codex cooked for five minutes and gotten this same.swyx: Yeah, you verify your instincts here of this is how we used to do it. Or this is how I would have used to solve it.Ryan Lopopolo: Yeah. In this local observability stack. Like sure, you can de deploy Yeager to visualize the traces, but I wouldn't expect to be looking at the traces in the first place because I'm not gonna write the code to fix them.swyx: Yeah. So basically there needs to be like this kind of house stack and owning the whole loop. I think that is very well established. And it sounds like you might be like sharing more about that in the future, right?Ryan Lopopolo: Yeah. I think we're excited to do[00:29:36] Ghost Libraries Specs[00:29:36] Ghost Libraries & Distributing Software as SpecsRyan Lopopolo: We're gonna talk about Symphony in a little bit, but like the way we distribute it as a spec, which I think folks are calling Ghost Libraries on Twitter.This is like a such a cool name. It does mean it becomes much cheaper to share software with the world, right? You define a spec, how you could build your own specifying as much as is required for a coding agent to reassemble it [00:30:00] locally. The flow here is very cool. Like we have taken. All the scaffolding that has existed in our proprietary repo spun up a new one.Ask Codex with our repo as a reference. Write the spec. We tell it. Spin up a team ox spawn a disconnected codex to implement the spec. Wait for it to be done. Spawn another codex and another team ox to review the spec com or review the implementation compared to upstream and update the spec so it diverges less.And then you just loop over and over Ralph style until you get a spec that is with high fidelity able to reproduce the system as it is. It's fantastic.Vibhu: And you're basically, you're not really adding any of your human bias in there, right? That's correct. A lot of times people write a spec and be like, okay, I think it should be done this way, and you'll riff on something.And it's no, the agent could have just handled it like you're still scaffolding in a sense, right? I want it done this way. It can determine its spec better.swyx: That's right. That's right. Part of me it, I'm, I've been working a lot on evals recently, and part of me is wondering if [00:31:00] an agent can produce a spec that it cannot solve.Is it always capable of things that he can imagine or can you imagine things that it is impossible to do?Ryan Lopopolo: I think with Symphony, we, there's like this there's this axis where you have things that are easier, hard, or established or new, right? And I think things that are hard and new is still something that the models need humans.Yeah. Drive.swyx: Yeah. Yeah.Ryan Lopopolo: But I think those other quadrants are largely salt. Given the right scaffold and the right thing that's gonna drive the agent to completion,swyx: it's crazy that it solved,Ryan Lopopolo: but it means that the humans, the ones with limited time and attention get to work on the hardest stuff, like the problems where it's pure white space out in front. Or like the deepest refactorings where you don't know what the proper shape of the interfaces are. And this is where I wanna spend my time. ‘cause it lets me set up for the next level of scale.swyx: Yeah. Yeah. Amazing. Let's introduce Symphony.I think we've been mentioning it every now and then. Elixir. Interesting option.Ryan Lopopolo: Yeah.swyx: Yeah. I'm not,Ryan Lopopolo: again, like the [00:32:00] elixir manifestation here is just a derivative. Is it a modelswyx: chosen? Yeah.Ryan Lopopolo: Yeah. Yeah. And it chose that because the process supervision and the gen servers are super amenable to the type of process orchestration that we're doing here.You are essentially spinning up little Damons for every task that is in execution and driving it to completion, which. Means the mall gets a ton of stuff for free by using Elixir and the Beam.swyx: I had to go do a crash course in Beam and Elixir, and I think most people are not operating at that scale of concurrency where you need that.But it is a good mental model for Resum ability and all those things. And these are things I care about. But tell me the story, the origin story of Symphony. What do you use it for? Is this, how did it form maybe any abandoned paths that you didn't take?[00:32:46] Terminal Free Orchestration[00:32:46] Symphony: Removing Humans from the LoopRyan Lopopolo: At the end of December we were at about three and a half PRS per engineer per day.This was before five two came out in the beginning of January. Everyone gets back from holiday with five two and no other work [00:33:00] on the repository. We were up in the five to 10 PRS per day per engineer. And I don't know about y'all, but like it's very taxing to constantly be switching like that. Like I was pretty tapped out at the end of the day, again, where are the humans spending their time? They're spending their time context switching between all these active tmox pains to drive the agent forward.swyx: Yeah. No way. Yeah.Ryan Lopopolo: So let's again, build something to remove ourselves from the loop. And this is what frantic sprinted adapt here to find a way to remove the need for the human to sit in front of their terminal.So a lot of experimentation with Devrel boxes and, automatically spinning up agents, like it seems like a fantastic end state here, where my life is beach. I open live twice a day and say yes no to these things. Yeah. And this is again, a super, super interesting framing for how the work is done.Because I become more latency and sensitive. I have [00:34:00] way less attachment to the code as it is written. Like I've had close to zero investment in the actual authorship experience. So if it's garbage. I can just throw it away and not care too much about it. In Symphony, there's this like rework state where once the PR is proposed and it's escalated to the human for review, it should be a cheap review.It is either mergeable or it is not. And if it's not, you move it to rework. The elixir service will completely trash the entire work tree NPR and start it again from scratch. Okay. And this is that opportunity again to say, why was it trash right? What did the agent do that wasswyx: bad. Yeah.Ryan Lopopolo: Fix that before moving the ticket toswyx: endRyan Lopopolo: of progress again.swyx: Yeah. Why is this not in codex app? I guess this, you guys are ahead of Codex app,Ryan Lopopolo: yeah, so the way the team has been working is basically to be as AI pilled as possible and spread ahead. And a lot of the things we have worked on have fallen out [00:35:00] into a lot of the products that we have.Like we were in deep consultation with the Codex team to. Have the Codex app be a thing that exists, right? To have skills be a thing that Codex is able to use. So we didn't have to roll our own to put automations into the product. So all of our automatic refactoring agents didn't have to be these hand rolled control loops.It has been really fantastic to be, in a way, un anchored to the product development of Frontier and Codex and just very quickly try to figure out what works and then later find the scalable thing that can be deployed widely. It's been a very fun way to operate. It's certainly chaotic. I have lost track very often of what the actual state of the code looks like.‘cause I'm not in the loop. There was. One point where we had wired playwright directly up to the Electron app. With MCPM CCPs, I'm pretty bearish on because the harness forcibly injects all those tokens in the [00:36:00] context, and I don't really get a say over it. They mess with auto compaction. The agent can forget how to use the tool.There's probably only what three calls in playwright that I actually ever want to use. So I pay the cost for a ton of things. Somebody vibed a local Damon that boots playwright and exposes a tiny little shim CLI to drive it. And I had zero idea that this had occurred because to me, I run Codex and it's able to, it's oh, it's better.Yeah. Like no knowledge of this at all. Uhhuh.[00:36:30] Multi Human ChaosRyan Lopopolo: So we have had like in human space to spend a lot of time doing synchronous knowledge sharing. We have a daily standup that's 45 minutes long because we almost have to. Fan out the understanding of the current state.swyx: Yeah, I was gonna say this is good for a single human multi-agent, but multi human, multi-agent is a whole like po like explosion of stuff.Ryan Lopopolo: Yeah. And that this is fundamentally why we have such a rigid, like 10,000 [00:37:00] engineer level architecture in the app because we have to find ways to carve up the space so people are not trampling on each other.swyx: Sorry, I don't get the 10,000 thing. Did I miss that?Ryan Lopopolo: The structure of the repository is like 500 NPM packages.It's like architecture to the excess for what you would consider, I think normal for a seven person team. But if every person is actually like 10 to 50. Then the like numbers on being super, super deep into decomposition and sharding and like proper interface boundaries make a lot more sense.swyx: Yeah. To me, that's why I talked about Microfund ends and I, an anex is from that world, but Cool. It is just coming back to, to, to this I dunno if you have other, thoughts on. Orchestrating so much work coin going through this. Is this enough? Is this like any aha moments?Vibhu: It'll be interesting to see like where, okay, so right now you pick linear as your issue tracker, right?swyx: Or it's like a is it actually linear? This is actually linear.[00:37:55] Linear vs Slack WorkflowVibhu: Oh, that's linear. It's linear.swyx: Oh I never looked atVibhu: video. The demo video I had to download to [00:38:00] run.swyx: So I, because I'm a Slack maxie, but Yeah, linear. Linear is also really good. Yes,Ryan Lopopolo: we do make a good use of Slack. We we fire off codex to do all these lotion, elasticity, fix ups, the things that like sync that knowledge into the repository.It's super cheap. Yeah.swyx: Yeah.Ryan Lopopolo: Just do it in Codex.swyx: My biggest plug is OpenAI needs to build Slack. You need to own Slack. Build yours. Turn this into Slack.Ryan Lopopolo: I did read about it. Youswyx: did?Ryan Lopopolo: Yeah.[00:38:25] Collaboration Tools for AgentsRyan Lopopolo: I would say that if we think that we want these agents to do economically valuable work, which is like this is the mission, right?We want AI to be deployed widely, to do economically valuable work, then we need to find ways for them to naturally collaborate with humans, which means collaboration tooling, I think, is an interesting space to explore.swyx: Yeah, totally. Yeah. GitHub, slack, linear.Vibhu: Yeah, that was my thing. Okay, where do we see right now Codex has started Codex Model, then CLI, now there's an app, app can let me shoot off multiple Codex is in parallel, but there's no great team collaboration for Codex.And it [00:39:00] seems like your team had some say into what comes out, right? So you talked to ‘em, codex kind of was a thing. From there, if you guys are on the bound, what stuff that like, you might not focus on, but what do you expect other people to be building, right? So people that are like five x 50 Xing.Should you build stuff that's like very niche for your workflow, for your team? Should it be more general so other people can adopt? Is there a niche there? ‘Cause part of it is just okay, is everything just internal tooling? Do we have everything our own way? Like the way our team operates has our own ways that we like to communicate or is there a broader way to do it?Is it something like a issue tracker? Just thoughts if you wanna riff on that.[00:39:35] Standardizing Skills and CodeRyan Lopopolo: I think TBD we have not figured this out in a general way. I do think that there is leverage to be had in making the code and the processes as much the same as possible. If you think that code is context, code is prompts, it's better from the agent behavior perspective to be able to look in a package in directory X, Y, Z, and it not to have to page so [00:40:00] deeply into directory if you C, because they have the same structure, use the same language, they have the same patterns internally.And that same like leverage comes from aligning on a single set of skills that you're pouring every engineer's taste into to make sure that the agent is effective. So like in our code base, we have, I think, six skills. That's it. And if some part of the software development loop is not being covered, our first attempt is to encode it in one of the existing setup skills, which means that we can change the agent behavior.Yeah. More cheaply than changing the human driver behavior.swyx: Yeah.[00:40:39] Self Improvement via Logsswyx: Have you ever, have you experimented with agents changing their own behavior?Ryan Lopopolo: We do.swyx: Yeah. Or parent agent changing a subagents, behavior or something like that.Ryan Lopopolo: We have some bits for skill distillation. So for example, there's one neat thing you can do with Codex, which is just point it at its own session logs to ask it to tell you how you can use [00:41:00] the tool pedal better.swyx: It's like introspectionRyan Lopopolo: or ask it to do things. I useVibhu: this session better. What skills should Iswyx: high? I like the modification of, you can do, just do things to you can just ask agent to do things.Ryan Lopopolo: Yeah. You can just codex things. This is like a, this is like a silly emoji that we have, right? You can just codex things, you can just prompt things.It's really glorious future we live in, but okay, you can do that one-on-one. But we're actually slurping these up for the entire team into blob storage and. Running agent loops over them every day to figure out where as a team can we do better and how do we reflect that back into the repositories?Yes, though everybody benefits from everybody else's behavior for free. Same for like PR comments, right? These are all feedback. That means the code as written, deviated from what was good, a PR comment, a failed build. These are all signals that mean at some point the agent was missing context. We gotta figure out how toswyx: Yeah.Ryan Lopopolo: Slurp it up and put it back in the reboot.swyx: By the way, I do this exactly right. I used to, when I use cloud code for [00:42:00] knowledge work, cloud cowork is like a nice product, right? Yes. In I think you would agree. I always have it tell me what do I do better next time? And that's the meta programming reflection thing.So I almost think like you have six reflection extraction levels in symphony and almost like the zero of layer. So the six levels are PO policy, configuration, coordination, execution, integration, observability. We've talked about a couple of these, but the zero layer is like the, okay, are we working well?Can we improve how we work? Yes. Can I modify my own workflow without MD or something? I don't know.Ryan Lopopolo: Yeah, of course. Yeah, of course you can. Like this thing is also able to cut its own tickets ‘cause we give it full access.Yeah. Make it a ticket to have it cut. Tickets you can.Put in the ticket that you expect it to file as on follow up work,swyx: like Yeah. Self-modifying. Yeah.Ryan Lopopolo: Yeah.[00:42:44] Tool Access and CLI FirstRyan Lopopolo: Put, don't put the agent in a box. Give the agent full accessibility over it. Domain.swyx: I had a mental reaction when you said don't put the agent in a box. So I think you should put it in a box. Like it's just that you're giving the box everything it needs.Ryan Lopopolo: Yeah. Context and tools.swyx: But we're like, as developers, we're used to calling [00:43:00] out to different systems, but here you use the open source things like the Prometheus, whatever, and you run it locally so that you can have the full loop. I assume.Ryan Lopopolo: Yep.Vibhu: I think likeRyan Lopopolo: another, you wanna minimize cloud, cloud dependencies.Vibhu: You also want to make sure that you think about what the agent has access to. What does it see? Does it go back into the loop, like from the most basic sense of you let it see its own like calls, traces it can determine where it went wrong. But are you feeding that back in? So you know, just the most basic level of you wanna see exactly what's input output, like does the agent have access to.What is being outputted, right? It can self-improve a lot of these things. It's allRyan Lopopolo: text, right? My job is to figure out ways to funnel text from one agent to the other.swyx: It's so strange like way back at the start of this whole AI wave Andre was like, English is the hottest day programming language.It's here, it's just Yeah. The feature as well.Vibhu: A lot of, okay. Like a lot of software, a lot of stuff. There's a gui, it's made for the human. We're seeing the evolution of CLI for everything, right? All tools have CLIs. Your agents can use [00:44:00] them well, do we get good vision? Do we get good little sandboxes?Like right now? It's a really effective way, right? Models love to use tools. They love the best. They love to read through text. So slap a CLI let it go loose. That works for everything.Ryan Lopopolo: It does. Yeah. Yeah.[00:44:14] UI Perception and RasterizingRyan Lopopolo: We've also been adapting nont, textual things to that shape in order to improve model behavior in some ways, right?We want the agent to be able to see the UI agents do not perceive visually in the same way that we do. They don't see a red box, they see red box button, right? They see these things in latent space. So if we want, Hey, yeah, I do. We haveswyx: a ding if that goes off every time. Alien spaceRyan Lopopolo: ding.Anyway if we wanna actually make it see the layout, it's almost easier to rasterize that image to ask EOR and feed it in to the agent. Ha. And there's no reason you can't do both, right? To like further refine how the model perceives the object it's [00:45:00] manipulating.swyx: Cool. Could we, you wanna talk about a couple more of these layers that might bear more introspection or that you have personal passion for?[00:45:07] Coordination Layer with ElixirRyan Lopopolo: I will say that the coordination layer here was a really tricky piece to get right.swyx: Let's do it. Yep. I'm all about that. And this is Temporal core.Ryan Lopopolo: This is where when we turn the spec into Elixir, where like the model takes a shortcut, right? Like it's oh, I have all these primitives that I can make use of in this lovely runtime that has native process supervision.Which is I think, a neat way to have taken the spec and made it more choices achievable by making choices that naturally mapswyx: Yeah.Ryan Lopopolo: To the domain, right? In the same way that like you would prefer to have a TypeScript model repo if you are doing full stack web development, right? Because the ability to share types across the front end and backend reduces a lot of complexity.And becauseswyx: that's what graph kill used to be.Ryan Lopopolo: That's right. Andswyx: I don't know if it's still alive, butRyan Lopopolo: [00:46:00] no humans in the loop here. So like my own personal ability to write or not write elixir. Doesn't really have to bias us away from using the right tool for the job. It is just wild.swyx: Love it. I love it.Yeah. I wonder if any languages struggle more than others because of this? I feel like everyone has their own abstractions. That would make sense. But maybe it might be slower, it might be more faulty where like you'd have to just kick the server every now and then. I, I don't know. I think observability layer is really well understood.Integration layer, CP is dead. I think all these just like a really interesting hierarchy to travel up and down. It's common language for people working on the system to understandRyan Lopopolo: The policy stuff is really cool, right? Yeah. You don't really have to build a bunch of code to make sure the system wait for the, to passswyx: it's institutional knowledge.Ryan Lopopolo: Yeah. You just give it the G-H-C-L-I with some text that say CI has to pass. It makes the maintenance of these systems a lot easier.[00:46:57] Agent Friendly CLI Outputswyx: Do you think that CLI maintainers need to be [00:47:00] do anything special for agents or just as is? It's good because like I don't think when people made the G GitHub, CLI, they anticipated this happening.Ryan Lopopolo: That's correct. The GH CLI is fantastic. It's great super industry.swyx: Everyone go try GH repo create GH pull and then pull request number, right? GH HPR, like 1 53, whatever. And then it like pullsRyan Lopopolo: basically my only interaction with the GitHub web UI at this point is GH PR view dash web.Exactly. Glanceswyx: at the diffRyan Lopopolo: and be like Sure thing. Send it. Yeah. But the CLI are nice ‘cause they're super token efficient and they can be made more token efficient really easily. Like I'm sure you all have seen like I go to build Kite or Jenkins and I could just get this massive wall of build output.And in order to unblock the humans, your developer productivity team is almost certainly gonna write some code that parses the actual exception out of the build logs and sticks it in a sticky note at the top of the page. And you basically [00:48:00] want CLI to be structured in a similar way, right? You're gonna want to patch dash silent to prettier because the agent doesn't care that every file was already formatted.Just wants to know it's either formatted or not. So it can then go run a right command. Similarly, like in our PNPM distributed script runner, when we had one, when you do dash recursive, like it produces a absolute mountain of text. But all of that is for passing. Test suites. So we ended up wrapping all of this in another scriptswyx: to suppress the,Ryan Lopopolo: which you can vibe the channel only output the failing parts of the tests.swyx: You make a pipe errors versus the standard, standard out. I don't know. Okay. Whatever. Too much thinking have to do that. The CII used to maintain SCLI for my company and yeah, this is like core, very core to my heart. But you're vibing my job.Ryan Lopopolo: That's right.swyx: Cool. Any other things?This is a long spec. [00:49:00] I appreciate that. It's got a lot of strong opinions in here. Any other things that we should highlight? I think obviously you can spend the whole day going through some of these, but I do think that some of these have a lot of care or some of this you might wanna tell people, Hey, take this, but, make it your own.[00:49:15] Blueprint Spec and GuardrailsRyan Lopopolo: Fundamentally, software is made more flexible when it's able to adapt to the environment in which it is deployed, which means that things like linear or GitHub even are specified within the spec, but not required pieces of it. There's like a more platonic ideal of the thing that you could swap in like Jira or Bitbucket, for example.But being able to tightly specify things like the ID formats or how the Ralph Loop works for the individual agents. Basically means you can get up and running with a fully specified system quickly that you then evolve later on. I think we never intended for this to be a static spec that you can [00:50:00] never change.It's more like a blueprint to get something worth a starting point up and running.swyx: Yeah.Ryan Lopopolo: For you then to vibe later to your heart's content,swyx: you have like code and scripts in here where it's oh, I think this is a really good prompt. It's just a very long prompt.Ryan Lopopolo: Fundamentally, the agents are good at following instructions, so give them instructions.And it will, improve the reliability of the result. We, much like the way we use Symphony, we don't want folks to have to monitor the agent as it is vibing the system into existence. So being very opinionatedVery strict around what these success criteria are means that our deployment success rate goes up. Yeah. It means we don't have to get tickets on this thing.Vibhu: Think it all goes back to that like code to disposable, right? Like early on when you had CLI or you'd kick off a Codex run, it would take two hours. You would wanna monitor okay, I'm in the workflow of just using one.I don't want it to go down the wrong path. I'll cut it off and, just shoot off four, like that was my favorite thing of the Codex app, right? Yeah. Just Forex it like, [00:51:00] it's okay. One of them will probably be right, one of them might be better. Stop overthinking it. Like my first example was probably like deep research.When you put out deep research and I'd ask it something like, I asked it something about LLM, it thought it was legal something and spent an hour, came back with a report completely off the rails. And I was like, okay, I gotta monitor this thing a bit. No don't monitor it. Just you want to build it so it's that it, it goes the right way.And you don't wanna, you don't wanna sit there and babysit, right? You don't want to babysit your agentsRyan Lopopolo: with that deep research query that you made. Looking at the bad result, you probably figured out you needed to tweak your prompt Yeah. A bit, right? That's that guardrail that you fed back into the code base for the task, your prompt to further align the agent's execution.Same sort of concept supply there too.swyx: When you talk, how are the customers feelingRyan Lopopolo: for Symphony? I think we have none, right? This is a thing we have put out into theswyx: world. Symphony's internal, right? As long as you are happy, you are the customer. That'

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0
Moonlake: Causal World Models should be Multimodal, Interactive, and Efficient — with Chris Manning and Fan-yun Sun

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

Play Episode Listen Later Apr 2, 2026 66:47


We've been on a bit of a mini World Models series over the last quarter: from introducing the topic with Yi Tay, to exploring Marble with World Labs' Fei-Fei Li and Justin Johnson, to previewing World Models learned from massive gaming datasets with General Intuition's Pim de Witte (who has now written down their approach to World Models with Not Boring), to discussing the Cosmos World Model with with Andrew White of Edison Scientific on our new Science pod, to writing up our own theses on Adversarial World Models. Meanwhile Nvidia, Waymo and Tesla have published their own approaches, Google has released Genie 3, and Yann LeCun has raised $1B for AMI and published LeWorldModel.Today's guests have a radically different approach to World Modeling to every player we just mentioned — while Genie 3 is impressive, its many flaws demonstrate the issues with their approach - terrain clipping, noninteractivity (single player, no physics/no objects other than the player move), and maximum of 60 second immersion. Moonlake AI (inspired by the Dreamworks logo) is the diametric opposite - immediately multiplayer, incredibly interactive, indefinite lifetime, capable of MANY different kinds of world models by simulating environments, predicting outcomes, and planning over long horizons. This is enabled by bootstrapping from game engines and training custom agents: In Towards Efficient World Models, Chris Manning and Ian Goodfellow join Fan-Yun in explaining why their approach to efficiency with structure and casuality instead of just blind scaling is sorely needed:SOTA models still show physical or spatial understanding glitches, such as solid objects floating in mid-air or moving “inside” other solid objects.If the goal is to plan for the next action, how often is a high-resolution pixel view necessary for modeling the world? Our bet is that there is a disproportionately large share of economically valuable tasks where such detail is not required. After all, humans with a wide variety of sensory limitations have little difficulty doing almost everything in the world. Furthermore, for a large number of purposes, describing a scene or a situation in a few words of language (“the car's tires squealed as it cornered sharply”) is sufficient for understanding and planning.Experiments also show that humans only partially process visual input in a top-down, task-directed way, often making use of abstracted object-level modeling. In almost all cases, partial representations combined with semantic understanding are sufficient.…If the goal is to facilitate the understanding of causality in multimodal environments, then the world model—whether it is used in the virtual world or the physical world—must prioritize properties such as spatial and physical state consistency maintained over long time periods, and an ability to evolve the world that accurately reflects the consequences of actions. That's what Moonlake is building.Game engines are the right starting point abstraction to efficiently extract causal relationships, and building the interfaces and community (including their new $30,000 Creator Cup) to kickstart the flywheel of actions-to-observations.We were fortunate enough to attend their sessions at GDC 2026 (the Mecca of Game Devs), and were impressed by the huge variety and flexibility of the worlds people were building with Moonlake's tools already! Live videos on the pod.Full Video Pod on YouTube!Timestamps00:00 Benchmarking Gets Hard00:47 Meet Moonlake Founders01:26 Why Build World Models03:12 Structure Not Just Scale05:37 Defining Action Conditioned Worlds07:32 Abstraction Versus Bitter Lesson14:39 Language Versus JEPA Debate20:27 Reasoning Traces And Rendering Layer37:00 Gameplay Over Graphics38:02 Fiction Rules And World Tweaks39:15 Code Engines Beat Learned Priors41:10 Diffusion Scaling Limits43:23 Symbolic Versus Diffusion Boundary46:14 Platform Vision Beyond Games50:24 Spatial Audio And Multimodal Latents54:23 NLP Roots Hiring And Moon Lake NameTranscript[00:00:00] Cold Open[00:00:00] Chris Manning: Think this whole space is extremely difficult as things are emerging now. And I mean, it's not only for world models, I think it's for everything including text-based models, right? ‘cause in the early days it seemed very easy to have good benchmarks ‘cause we could do things like question answering benchmarks.[00:00:20] But these days so much of what people are wanting to do is nothing like that, right? You're wanting to get some recommendations about which backpack would be best for you for your trip in Europe next month. It's not so easy to come up with a benchmark, and it's the same problem with these world models.[00:00:41] Meet the Founders[00:00:41] swyx: Okay. We're back in the studio with Moon Lake's, two leads. I, I guess there's other founders as well, but, sun and Chris Manning. Welcome to the studio.[00:00:54] Fan-yun Sun: Thanks. Thanks, Chris. Thanks for having us.[00:00:56] swyx: You've got, you guys have, come burst onto the scene with a really refreshing [00:01:00] new take of mold models.[00:01:01] I would just want to, I guess ask how you, the two of you came together. Chris, you're a legend in NLP and just AI in, in, in general. You're, you're his grad student, I guess[00:01:10] Fan-yun Sun: Actually my co-founder.[00:01:11] swyx: Oh, yeah.[00:01:12] Fan-yun Sun: I should give a lot of credit to my co-founder, Sharon. Yeah. She was, she was actually working with Professor Fe Androgyn and then she ended up working with, Ron and Chris Manning here.[00:01:22] And then, so I got connected through to Chris initially, actually through my co-founder,[00:01:26] What is Moon Lake?[00:01:26] swyx: what is Moon Lake? What, what is, actually, I'm also very curious about the name, but like why going into world models?[00:01:33] Fan-yun Sun: So I was working a lot. With actually Nvidia research during my PhD years on essentially generating interactive worlds to train reinforcement learning agents or embody EA agents.[00:01:44] And then there's two observations. One in academia and one in industry. An industry like folks at Nvidia are actually paying a lot of dollars to purchase these types of interactive worlds, whether it's for the sake of evaluation or training the robots, or policies or models. And [00:02:00] then, in academia, same thing is happening.[00:02:02] And more specifically, when I was actually working with Nvidia on the synthetic data foundation model training project, we were actually generating a lot of these synthetic data and showing that, hey, you can actually, these synthetic data are actually as useful as real world data when it comes to multimodal pre-training.[00:02:16] But then, like I said, there's a lot of dollars being paid out to like external vendors or, or like. Other folks to manually curate these types of data. It was very clear to us that, okay, on our way to, let's call it embody general intelligence models need to learn the consequences behind their actions, which means that they need interactive data and the demand for those types of data are growing exponentially.[00:02:38] But everybody's sort of thinking about it from a pure, say, video generation perspective or something else. But we feel like the true actually opportunity is actually building reasoning models that can do these things, like how humans do these things today. So that's a little bit on the genesis of Moon Lake, and I think the reason I got into world models was partly.[00:02:59] A philosophical [00:03:00] take of the on the world where I like, believe the simulation theory and stuff like that. But on the other, on the other hand, it's really just like, oh, like there's an opportunity there that I feel like nobody's doing it the way I think should be done.[00:03:10] Structure, Not Scale: The Vision[00:03:10] Chris Manning: I can say a little bit about that.[00:03:12] Yeah. So of the overall goal is the pursuit of artificial intelligence and most of my career has been doing that in the language space and that's been just extremely productive. As we all know, the story of the last few years, I don't have to tell about how much we've achieved with large language models, but, uh.[00:03:31] Although they have been extremely effective for ramping language and general intelligence, it's clearly not the whole world. There's this multimodal world of vision, sound, taste that you'd like to be dealing with more than just, language. And then the question is how to do it. And despite, a huge investment in the computer vision space, right, as the research field computer [00:04:00] vision has been for decades, far, far larger than the language space, actually.[00:04:05] I think it's fair. Say that, vision, understanding sort of stalled out, right? You got to object recognition and then progress just wasn't being made right? If you look at any of these, vision language models, it's the language that's doing 90% of the work and the vision barely works. And so there's really an interesting research question as to why that is and at heart, the ideas behind Moon Lake are an attempt to answer that, believing that there can be a really rich connection between a more symbolic layer of abstracted understanding of visual domains, which aren't in the mainstream vision models, which are still trying to operate on the surface level of pixels.[00:04:50] swyx: I think one of your blog posts, you put it as structure, not scale. Is that, a general thesis?[00:04:57] Chris Manning: Yeah. Well, scale is good too.[00:04:58] swyx: Yeah. Scale is good. Too[00:04:59] lot,[00:04:59] Chris Manning: [00:05:00] lots of data is good as well and scale, but nevertheless, you want the structure Yeah. To be able to much more efficiently learn.[00:05:07] swyx: Yeah. The other thing I really liked also is you put out an example of what your kind of reasoning traces look like.[00:05:12] Right. Which you would distill is the word that comes to mind. I don't even think that's a good, good description, but it would involve, for example, geometry, physics, affordances, symbolic logic, perceptual mappings, and what, what have you. But like that, that is the kind of example that involves, let's call it spatial reasoning, role model reasoning as as compared to normal LM reasoning.[00:05:35] Yeah.[00:05:36] Defining World Models vs Video Generation[00:05:36] Vibhu: But also like taking it a step back. So how do you guys define world models? A lot of people see okay, you can do diffusion, you can do video generation. But, you guys put out quite a few blog posts. You put out a essay recently, we can even pull it up about efficient world models. You have a pretty like structural definition here, but for the general audience that don't super follow the space, right.[00:05:55] What's, what's the difference in what we see from like a video generation model to [00:06:00] a world gen A simulator? How do you kind of paint that last[00:06:02] Chris Manning: year? Yeah, so I think this is actually a little bit subtle because, people look at these amazing generative AI video models, SAWA VO three, one of these things, and they think Genie, they think, oh, this is amazing.[00:06:17] This is we've solved understanding the world because you can produce these generative AI videos, but. The reality is that although the visuals do look fantastic, those visuals actually are accompanied by an understanding of the 3D world, understanding how objects can move, what the consequences of different actions are, and that's what's really needed for spatial intelligence.[00:06:49] So I mean, a term we sometimes use is that you need action condition, world models. That you only actually have a world model if you can predict, [00:07:00] given some action is taken, what is going to change in the world because of it. And in particular, that becomes hard over longer time scales. So if you're simply, trying to.[00:07:12] Predict the next video frame. That's not so difficult. But what you actually want to do is understand the consequences, likely consequences of actions minutes into the future. And to do that, you actually much more of an abstracted semantic model of the world.[00:07:32] The Bitter Lesson & Data Abstraction[00:07:32] swyx: Yeah, the question comes where you want to have more structure than is available in just predicting the next token.[00:07:41] And typically, well, let's, let's call it the experience of the last five years has been that is just washed away by scale, right? So what is the right middle ground here that, you don't ignore the bitter lesson, but also you. Can be more efficient than what we're doing today.[00:07:57] Chris Manning: One possibility [00:08:00] is, look, if we just collect masses and masses and masses and masses of video data, this problem will be solved.[00:08:11] Under certain assumptions that could be true, but there are sort of multiple avenues in which it could not be true. The first is what's really essential is understanding the, the consequences of actions producing an action conditioned world model. And if you are simply, collecting observational video data, which is the easy stuff to collect, when you're sort of mining online videos, you don't actually.[00:08:41] Know the actions that are being taken to see how the video is changing. And so if you are never collecting directly actions and you are having to try and infer them from what happened in the observed video, that's not impossible. But it's very [00:09:00] hard and it's not really established that you can get that to work at any scale yet.[00:09:05] And so there's a lot of premium on collecting action condition video data, which is part of why there's been a lot of interest in using simulation so that you can be collecting data where you do know the actions, which isn't quite limited supply, but there's also in the limit of as much data as you could possibly have.[00:09:28] Maybe the problem is eventually solvable, but. Even though we collect huge amounts of text data is always at a great level of abstraction, right? Language is a human designed, abstracted representation where there's meaning in each token and it's representing and abstraction of the world, right?[00:09:51] As soon as you are describing someone as a professor, and as soon as you are saying that they're condescending, right? These are very [00:10:00] abstracted descriptions of the world. It's not at what you're observing as pixel level, and to get to that kind of degree of abstraction, starting from pixels is orders and magnitude of extra data and processing.[00:10:14] And so, although, we absolutely want to exploit, get as much data as possible, use the bitter lesson. Nevertheless, if there are ways in which you can work with five orders of magnitude less data than people working purely from pixels, you're gonna be able to make a lot more progress, a lot more quickly.[00:10:34] And that's the bet here. And so you could just say that's only wanting to be able to, do it more efficiently, do it more quickly, do it more cheaply. But I think it's actually more than that, I think. One should be making the analogy to how human beings work at one level. You know? Yes, we have these high [00:11:00] resolution eyes and we can look and see a scene like a video, but all of the evidence from neuroscience and psychology is that most of what comes into people's eyes is never processed.[00:11:13] Right. That you are doing fairly fine ated processing of exactly what you're focusing on. But as soon as it's away from that of yeah, there's another guy over there that you've sort of only processing top down this very abstracted semantic description of the world around you. And so, that's what human beings are doing.[00:11:33] They're working with semantic abstractions and so. I think it is just the right representation. ‘cause we also have other goals we want to be able to do, real time worlds. So that means there's a limit to how much processing you can do and we want to do long-term planning and consistency. And again, that favors abstraction.[00:11:55] I mean, I guess there was actually a recent. Blog posts that [00:12:00] came out from our Friends of physical intelligence and, they were sort of heading in the same direction they were saying Oh, to the pay[00:12:06] swyx: pay model.[00:12:07] Chris Manning: Yeah. Yeah. To maintain a long term memory of what's happening in the world. So we can, do longer term we actually storing text of what is, been happening in the world.[00:12:19] Right. It is not such a successful strategy of trying to keep it all at a pixel level.[00:12:24] Vibhu: And yeah, I mean, you can see it in video models like that Temporal consistency. We're at a scale of train on, all the video data we have. We have it for maybe 30 seconds, a few minutes. That's not the same as a game state played for half an hour.[00:12:37] Right. I thought you guys break it down pretty well. You have a, you have a blog post about. Building multimodal worlds with an agent. I dunno if you guys wanna talk about this. This is one of the things I read, I[00:12:48] swyx: thought, yeah, it's the thing I talked about with the reasoning chain. Yeah.[00:12:51] Vibhu: So there's like different phases to this.[00:12:53] It seems like it's more of an agent, a scaffold, very different approach than just, type in a prompt and you, you don't have the same consistency. [00:13:00] It also, like, for people that are listening, I, I would highly recommend reading it. It breaks down the problem in a different light, right?[00:13:06] So like, what do you need to consider when you're talking about video, like world game models, right? How would, what do you need to consider? What are the factors? What are the elements? What's the state? So I don't know if you guys have stuff to talk about for this one.[00:13:19] Fan-yun Sun: Yeah. Actually, I wanted to add on a little bit Yeah.[00:13:22] On our previous point, which is just like, change topics so quickly. I, I do feel like sometimes people confuse like, oh, like we're taking an an, an method with abstraction. That means they don't believe in bitter lesson. Like that's just false, right? Like we are believed is a bitter lesson. But then I feel like the question that we always discuss is like, what is the right abstraction level today?[00:13:42] The analogy I like to make is like, let's just say we can encode and decode. Represent all of images, videos, audio and bytes. Then the most bitter lesson approached is to train a next byte prediction model as opposed to the next token prediction model where it's just like, okay, it's natively multimodal, can just, but it's like, yeah, like [00:14:00] to, to Chris's point, it's like the scale and computing you need to achieve that.[00:14:03] So that's why we always come back to like, okay, what is the most efficient way to do it? And reasoning models to the point of this blog post is a showcase of like, Hey, we're actually just like reasoning about the world and reasoning about. The aspects of the world that CAGR that matter for me to learn what I want to learn from this role model.[00:14:21] swyx: Yeah, it's like you're improving the en encoder of whatever you're, trying to model. And like a better representation would just represent the important things in less space. Yeah. Which would just be more efficient.[00:14:33] Fan-yun Sun: Yeah.[00:14:34] swyx: So yeah, I, I, I fully agree that it is not, antagonistic to, bitter lesson.[00:14:38] I do wanna wanna mention one more thing. Is there any philosophical differences with the JPA stuff that, Yun is working on? I gotta go there. You, you, you, you're, you're imagining like some latent abstraction. I'm like, okay, fine. Let's, let's talk about it, right? Like it's an elephant in the room.[00:14:52] Chris Manning: Yeah.[00:14:53] JEPA & Philosophical Differences with LeCun[00:14:53] Chris Manning: There are philosophical differences. Jan Lacoon is a dear friend of mine, but. [00:15:00] He has never appreciated the power of language in particular, or symbolic representations in general. Yarn is a very visual thinker. He always wants to claim that he thinks visually and there are no words, symbols, or math in his head.[00:15:21] Maybe that's true of yarn. It's certainly not the way I think. Um. But at any rate, the world according to yarn is the basic stuff of the, the world and of intelligence is visual and language is just. This low bit rate communication mechanism between humans and it doesn't have much other utility and it's far inferior to the high bit rate video, that comes into your eyes.[00:15:53] And I think he's fundamentally missing a number of important things [00:16:00] there. Think of this evolutionary argument looking at animals, right? That the closest analogies, the things with chimps, right? So chimpanzees, have fairly similar brains to human beings. They have great vision systems, they have great memory systems.[00:16:18] They've got, better memory than we do of short term memories. They can plan, they can build primitive tools that, humans. Massively ahead in what we understand about the world, what we can plan, what we can build. And essentially what took off for us was that humans managed to develop language and that gave a symbolic knowledge, representation, and reasoning level, which just, okay if this sort of vaulting of what could be done with the intelligence in brains.[00:16:59] So the [00:17:00] philosopher Dan de refers to language as a cognitive tool and argues that, humans unique among the creatures in the world have managed to build their own cognitive tools and language is the famous first example. But other things like, mathematics and programming languages are also cognitive tools.[00:17:21] They give you an ability to. Think in abstractions, in extended causal reasoning chains. And that allows you to do much more. And we use that for spatial representation and intelligence and planning and gameplay as well. So we believe, and this is, underlying the specific technologies that Moon Lake is making, that symbolic representations are powerful.[00:17:50] And you want to use that in your understanding of the visual world when you want a causal understanding, when you want to maintain long-term [00:18:00] consistency and prediction. And as I understand it, that's just not in ya Koon's worldview. So I think that's the fundamental philosophical difference. Then there's the specific model.[00:18:11] He's been advancing jpa, that's a reasonable. Research bed is a direction as to, to head for building out a model of the visual world. To my mind, it's sort of one reasonable research bed. It's not really established. It's the best one that everyone should be following,[00:18:32] swyx: at least developed at scale, at Meta.[00:18:34] But it's not just vision, right? Like, I mean, JPA is a, just joint admitting prediction can be applied to anything really. And people have done it. The argument is that there is a latent representation or that is probably more. Suited to the task, then why not let machines do it for us instead of predefining it at all?[00:18:50] And isn't something like a JPA shaped thing the right answer? And if not, why not?[00:18:55] Chris Manning: So I think there's a part of jpa that's right, which is [00:19:00] you do want to have a joint. Embedding that gives you a consistent model of the world. And Jan's argument is you can never get that from auto aggressive language models ‘cause they're sort of left to right churning out one token at a time.[00:19:22] I guess this is where we're the research arguments of the field, I'm not actually convinced that's right. ‘cause although the token production is this auto aggressive, process that's heading, left to right, I guess don't have to be left to right. But anyway, in sequence of tokens we could have right to left Arabic.[00:19:40] But although that's true, all of the weights of the model that are internal to the transformer, they are a joint model of the model's understanding of the world. And so I think you can think of the weights of the model as a form of. Joint representation, [00:20:00] and therefore it is plausible to think that could be the basis of a world model, which avoids, ya's objections.[00:20:10] swyx: I think I follow, and obviously that would touch on what Moon Lake eventually ends up doing as well. Right. Like, which it's hard to tell because you put out the end results, but we don't know the inputs that go into it. So it's, it's, that's something that we have to figure out over time.[00:20:25] Vibhu: Yeah. I mean, I guess this kind of breaks down some of the outputs. Do you wanna walk us through it?[00:20:31] Reasoning Traces & Interactive Worlds[00:20:31] Fan-yun Sun: Yeah. So this, this really just walks us through the reasoning traces of like, okay. So that just say, if we wanna build a world in this context, it's really just a game demo that, that shows the, the variety of interactions that this world model can build.[00:20:45] And yeah, it's really just a reasoning traces of like, okay it prompted to create a bowling game. Like how did it achieve what you saw? That level of causality, interaction and consistency, right? So yeah, this is almost just like a, an example of [00:21:00] like a reasoning traces. Very[00:21:01] swyx: detailed.[00:21:01] Fan-yun Sun: Yeah.[00:21:01] Vibhu: Very, very detailed.[00:21:02] You gotta you don't even realize it, right? Like when a video is generated, what happens when a ball strikes a pin, right? So first, like you, there's audio in that, like audio triggers happens, score increments, the world changes. Like pins have to start dropping. There's a timer that goes on. It's just like very similar to how now we're used to reasoning for language models.[00:21:20] There's a whole state of what happens. So geometry, physics, all this stuff. And then yeah, there's kind of that single prompt. So asset, ation all this stuff. It's like a, it's a nice view to see what's going on.[00:21:32] swyx: I think Sun is also too polite to point out that, both like Google's genie, demos as well as world Labs is marble, do not have interactive worlds.[00:21:41] Fan-yun Sun: That's the benefit of having a reasoning model, right? Like, because you can, you can say, oh, like maybe in this particular context, I want to learn how to bowl. And then you can say, okay, then what is it important when it comes to learning how to bowl? Okay, maybe it's like I need to understand the, the basic of like, physics and I want to throw it over [00:22:00] them.[00:22:00] I wanna know that when I, when it resets it's a new game. So I know that yeah, basically, you know to pick up the ball, you know that ball's gonna cause the pins to fall down. You know that what's important to this particular bowling game is to score and you know that the score corresponds to the number of pins that fell down.[00:22:19] So it's just like, if it's a model that sort of knows what it. Looks like, knows what a bowling game looks like, but doesn't actually allows you to practice over and over again and to understand that, oh, like what it takes to actually get a high score. Then it sort of doesn't actually allow you to learn what you set out to learn within the world model.[00:22:38] And I think this is really just one example of showing like the advantages of the approach that we're taking over most the, let's call it the zeitgeist, is today, when people talk about clinical role models,[00:22:51] Chris Manning: right? So it sort of seems like the question to ask when there's a world model is.[00:22:58] Can I not [00:23:00] only just wander around the world and look at the beautiful graphics, can I interact with the objects in the world and see the right consequences of actions?[00:23:11] Vibhu: And you also understand what the consequences would be if you do something right. So it's not just like, okay, there's one thing if I pick it up, something will happen.[00:23:19] But, there's 50 options and I know I can expect, I can infer what would happen if I do any of them. Right. So very different when you can actually see it play around with it.[00:23:28] swyx: There,[00:23:28] Beyond Unity: Cognitive Tools for World Building[00:23:31] swyx: there's two cheeky elements of that. I mean, the, the, the I guess, less ambitious one is, let's really establish for listeners, why is this fundamentally different than writing Unity code, right?[00:23:40] Like just creating a model to translate a prompt into Unity code[00:23:44] Fan-yun Sun: so there is an underlying physics engine. Yeah. In that sense, there's some overlapping things to Unity, but the way we think about it is like physics engine. Tools or code are cognitive tools like borrowing Chris's term, right? Like tools [00:24:00] that the model can employ as means to an end.[00:24:04] So today maybe you say, okay, in this particular context we care about physics, we care about the long-term causality consequences. Then yes, we deploy it, employ physics engine, and then maybe tomorrow we say, okay, we're we're training that. Just say drones where we only care about really fluid dynamics and the visual aspect of the world.[00:24:25] Then, then yeah, maybe we don't actually, the model actually doesn't have to use a physics engine. Or maybe it employs other types of representation or physics engine to achieve the task. So yes, writing code for Unity is sort of similar to a tool that our A model can employ, but our goal is for a model to take a representation conditioned reasoning.[00:24:46] Approach or process.[00:24:47] swyx: Yeah,[00:24:47] Fan-yun Sun: internally.[00:24:48] swyx: Yeah. Using these things as just like general two calls. Right. Which I think is very interesting. The other more ambitious one is, some kind of recursive element where it becomes multiplayer, right? Like here, there's a single player element, you're not [00:25:00] modeling any other people involved.[00:25:01] And that is a whole other thing.[00:25:04] Fan-yun Sun: But in fact, we can really do multiplayers. Oh yeah, okay. I haven't seen any double situations. So just actually just like prompt our, our model to say, Hey, like configure to multiplayer. Then it'll do like this. You'll be able to configure multiplayer[00:25:16] swyx: great[00:25:17] Fan-yun Sun: persistency database for you.[00:25:18] Easy. Yeah.[00:25:19] Vibhu: So what, what are like some of the current limitations in where we're at? So there's one approach of like, okay, scale up video predictors. Obviously there's data issues. With approaches like this, is it data constraints? What are like the next steps? Is it real time? Like, so there's one side of, write an agent to write Unity code, but okay, I want to be streaming a game real time.[00:25:38] I want to have characters being also like agent, but where, where do we kinda see this scaling up? Right?[00:25:44] Fan-yun Sun: Yeah, there's definitely a data constraint. Like the more data, the, the better. This reasoning model can almost basically act as humans to like operate a variety of tools and softwares to build whatever's necessary.[00:25:57] And then there's a sort [00:26:00] of fidelity constraint, which we're actually solving with another model, which we can talk about later. But it's like, it's not as easy to get to photorealism with the approach that we're taking. But we think there are better solutions to that, which is we can dive into later.[00:26:14] Later.[00:26:15] Vibhu: The one one thing you note here is it's a diffusion model, right? So there's, there's a few approaches, diffusion caution, splatting, yeah, so Ry diffusion model, you guys wanna[00:26:25] Fan-yun Sun: Yeah.[00:26:25] Vibhu: Introduce,[00:26:26] Fan-yun Sun: yeah, totally.[00:26:26] Rie: Neural Rendering & Skins for Worlds[00:26:26] Fan-yun Sun: So within our world modeling framework, we think there are two models that we train, right?[00:26:31] Like, there's the multimodal reasoning model that we just talked about that essentially handles. Mainly the, the causality, the persistency and logic determinism of the world. And then RY is our bet on saying, okay, like while all those model, can take care of all these things that we just talked about, it's limitations compared to existing, say, video models, is that it doesn't have as high of a pixel [00:27:00] ality right off the gate, right?[00:27:02] And EE is to say, Hey, we can actually take whatever persistent representation that we generate with our multimodal reasoning model and learn to restyle it into photo photorealistic styles or arbitrary styles you want. So this model is almost to say, Hey, I'm going to respect the persistency and interactivity of the world that you created, but my only job is to make sure that its pixel distribution is close to what we want.[00:27:29] Vibhu: Yeah.[00:27:30] swyx: Great example right there. You kept the KL divergence.[00:27:33] Fan-yun Sun: Oh. Where,[00:27:34] swyx: no, no. I mean this, this is a, a classic like, how you don't stray too far from the source material as you, you kept the kl, which is Oh yeah. Kind of cool. Yeah.[00:27:43] Fan-yun Sun: Yeah.[00:27:44] swyx: I mean, and the[00:27:44] Chris Manning: difference is, and I mean sun was pointing at this, where sort of saying it's in one way a more difficult path, but a better path that, typically the diffusion models are producing the whole scene and it looks lovely, [00:28:00] but there isn't spatial understanding behind it, which is allowing for the real time graphics gameplay, the spatial intelligence, understanding the consequences of worlds where this is, taking a path where it is assuming an abstracted semantic model of the world's state.[00:28:20] And then the diffusion model is then being used on top of that to produce the high quality graphics.[00:28:27] swyx: Is there an intended practical, or business use for this, or is it like a, like a demonstration of capabilities?[00:28:34] Fan-yun Sun: We actually believe that this is gonna be the next paradigm of rendering. So it's gonna replace how ra raizer, it's gonna replace DLSS today because it not only has these pixel prior that's learned from the world such that you can literally play any game in photo realistic styles, which is a lot of people's desire when they do GTA, right?[00:28:51] Like,[00:28:51] Vibhu: all the mods, all the people adding perfect lighting and all this.[00:28:54] swyx: So[00:28:54] Fan-yun Sun: skins[00:28:55] swyx: for worlds, let's call it[00:28:56] Fan-yun Sun: skins, let's call it skin for worlds. I,[00:28:58] Vibhu: it's also like, you can call it skin, you can call it [00:29:00] customization. You can play it how you want, right?[00:29:01] Fan-yun Sun: Yeah, exactly. And I think another thing that we really pointed out specific specifically in this blog is the programmability of it, right?[00:29:09] So what this means is that this render historically render is always a derivative of the game state, right? You're saying, oh, here's the game state, I'm rendering out a frame. But here I'm saying actually this render can be part of the gameplay loop. I can say something along the lines of, if upon getting 10.[00:29:26] Apples, I'm gonna, my weapon of choice, my bullet's gonna turn into apples. And that's, that's possible because we can say, we can basically dynamically have certain game state trigger the, the preconditions to the render such that the rendering is now part of the game loop too. One thing is to just say, okay, it's, it's, it's the appearance.[00:29:47] But the second thing is also to say there's these novel interactions that are possible because this render now has actually priors of the world.[00:29:57] swyx: It is up to the artist to figure out what to do with it.[00:29:59] Fan-yun Sun: It [00:30:00] is up to the creators. Yes.[00:30:01] swyx: Yeah.[00:30:01] Fan-yun Sun: And I also think that's actually another big argument that we're making and the reason that we're picking, taking the bet we're baking is that a lot of the times, whether it's for embody AI gaming, like you want a layer where human can inject their intentions.[00:30:15] So, for example, let's just say in the context of gaming, it's obviously like my creative intent, but maybe in the context of embodied ai, it's like, oh, like I take this foundational policy and I want to actually fine tune it to deploy in my house. So you want to almost say, inject, have a layer where human can say, oh, here's the distribution of things I want to create to achieve my goal.[00:30:35] And I think 3D graphics as it as it is today, is basic, the layer for people to say, Hey, what do I care about in this world? And it allows, basically human intent to be expressed in these worlds much more explicitly and distributionally as opposed to just saying, Hey, I'm gonna generate like, arbitrary.[00:30:54] And it's like just prompts,[00:30:55] swyx: it's one of those things where like, I think you, you're going to build up a series of models, right? [00:31:00] This is just one of, this is probably like the highest utility or heaviest, frequency one, I don't dunno what to call this. Where like you Yeah. You can immediately drop this in on any game and you don't need anything else that.[00:31:10] That you guys do. But, I, I could see, I could see that I think the, the human intent is something that people are not even used to because we're so used to static worlds or, worlds that just don't react, or, I don't know. It's, it, you're kind of blowing my mind right now with like, I'm, I wonder if you've talked to people at GDC Hmm.[00:31:27] And what are they gonna do with it?[00:31:30] Fan-yun Sun: Yeah. Now the stance that we take on this front is like, we're not gonna be more creative than our users to ship[00:31:35] swyx: it out.[00:31:35] Fan-yun Sun: Yeah. But we wanna make sure that we're building things in a way that really allows them to express their intent.[00:31:41] swyx: The thing that you said about, here's the distribution that I want.[00:31:45] I think text may be too low of a bandwidth to. To really demonstrate, because I, I, there, I'm, I'm probably just gonna want to drop in a bunch of, reference assets and then you can figure it out from[00:31:58] Vibhu: there. But you probably wanna do a, a mixture of [00:32:00] both, right? Like you throw in a few images. I wanted this style.[00:32:02] Yeah. I want it to look like this. So it, it's, it's a mixture, right?[00:32:05] Chris Manning: I, I think it's a mixture. I mean, yeah, I mean there's clearly a visual component of this, and it's not that, everything can be text. ‘cause of course you want to give a visual look, but there's also a massive amount of giving the overall picture of the look of the world and the behavior of things that you can express in a few words of text.[00:32:32] And it be very time consuming and difficult to do via visual means. So I think, yeah, you want a combination of both.[00:32:40] Evaluating World Models[00:32:40] Vibhu: So one question I kind of have is, how do we go about evaluating world models? So like, there's many axes, right? One is like, okay. I have preferences. How well do we adhere to prompts? One is the simulation.[00:32:50] One is like do things, is there core logic that's broken? So coming from we know how to evaluate diffusion, there's fidelity, there's [00:33:00] stuff like that. But what are some of the challenges that most people probably aren't thinking about?[00:33:04] Fan-yun Sun: Yeah, I think this is like a great question and probably one of the hardest questions in role models because like, I think it always comes back to what are you building this role model for?[00:33:13] And depending on your end goal and purpose, the evaluation should defer. So in the context of games, then the most direct way of measuring is how much behind are people actually spending in this world that you create? And if your goal is to say, for example, in the context that we just talked about, like, hey, deploying, deploying action in body, a agent, then your, your end.[00:33:33] Metric is then, okay, after training in these worlds that you generate how robust it is to when you actually deploy to the target environment. But then, it's, it's hard to measure these end metrics. So today people have like these proxy metrics that I call that basically try to measure what we really care about, which is the end metrics, but then frankly it's different for every use case.[00:33:57] Yeah,[00:33:57] Vibhu: which seems like quite a challenge, right? Like in [00:34:00] in language models or video models. Image models, your benchmarks are proxies, right? People aren't actually asking instruction, following tool use questions. They're proxies of how well it will do downstream. But for this, so like, should teams, should companies have their own individual benchmarks outside of games?[00:34:16] If you think of stuff like, okay, video production, movies, stuff like that, that also want to use world models. Should, should they sort of internalize like. Their own proxy. Is this something you guys do? Where, where does that connect[00:34:28] Chris Manning: go? Yeah, I think this whole space is extremely difficult as things are emerging now.[00:34:35] And I mean, it's not only for world models, I think it's for everything including text-based models, right? ‘cause in the early days it seemed very easy to have good benchmarks ‘cause we could do things like question answering benchmarks and could you answer the question based on these documents and the various other kinds of, do pieces of logical reasoning or math.[00:34:58] But again, these are sort of. [00:35:00] And there were sort of visual equivalents of things like object recognition, right? For these small component tasks. These days so much of what people are wanting to do also with language models is nothing like that, right? You're wanting to, have an interaction with the language model and get some recommendations about which backpack would be best for you for your trip in Europe next month.[00:35:25] And it's not the same kind of thing, right? And it's not so easy to come up with a benchmark as to does this large language model give you an effective interaction for guiding you in a good way for shopping, right? So, and it's the same problem with these world models. So if we take the game design case, well success is that a game designer can.[00:35:57] Produce what they are [00:36:00] imagining in a reasonable amount of time. And that's really the kind of macro task. That's a very hard thing to turn into a benchmark and I think a lot of this is actually going to turn into people walking, walking with their feet. Right? I mean, I guess that's what's happening, at the large language model level, right?[00:36:23] When people are choosing to use, GPT five or Gemini or clawed, individuals are trying out these different models and deciding, oh, I like the kind of answers that GT five gives me, or no, I feel like I get more accurate detail from Claude, right?[00:36:43] Vibhu: It's a lot of[00:36:43] Chris Manning: vitech, a lot of people just using it.[00:36:45] It's vibe checking. I realize that, but it's actually whether. People feel it's giving them utility in what they want. Right.[00:36:52] Vibhu: And the the interesting thing there is like a lot of people prefer the visual, right? This looks pretty, which is not the objective of what this is [00:37:00] for, right? It's if a, if a game designer is working on something, they care about the game engine, right?[00:37:04] The state, it's, it can look whatever. You can fix that up later. Or you can have a really good game state and you can quickly edit it to 20. 20 different versions, like Keep State,[00:37:14] Chris Manning: right?[00:37:14] Vibhu: So[00:37:14] Chris Manning: that's a really important distinction, for and for speaking to Moon Lake strength, right? So, yeah, great visuals are lovely to look at for a few seconds, but gains are really all about the concept, the game play.[00:37:33] And a lot of the time that doesn't actually even require great visuals. I mean, there are just lots of very successful games which have relatively primitive visuals, and there are other games where people have spent millions producing photo realistic, visuals, and the game sucks, right? So, keeping those two axes apart is really important in thinking about what's important in a [00:38:00] world model for different uses.[00:38:02] swyx: This conversation is reminding me of some game review and fiction discussions I've, had in my sort of non-AI related life. Some, for some people might know Brandon Sanderson, who's a very famous, fiction author, had, is is a big game reviewer. And he, he's a big fan of video games where you change one thing about a normal what you might assume about, about the world.[00:38:22] For example, Baba is you, I don't know if you might have come across that, where like the rules change as you play the game. And also like where, you can do things like reverse time selectively or like change gravity selectively. And I think this is also reminds, reminds me of other kinds of world models that are created by authors.[00:38:38] Where Ted Chang is, is my typical example where he'll take the world that, you know today, but change one thing about it and, but then create a consistent world based on that. Which is long-winded answer of me to, of. For me to say is it's it easy to create alternative roles that don't exist, but you change one thing and then let's, let's run a whole bunch of people through it to see if it works.[00:38:58] Chris Manning: My first dance will [00:39:00] be, that seems a lot easier and more conceivable to do using Techn technology like Moon Lakes than with some of the other world models out there, where the sun can actually make it happen. I'll let him give a second answer.[00:39:15] swyx: If I guess for you, you're constrained by the game engine tool, right?[00:39:18] Like at the end of the day, that's the, that's the thought, partner that you have. If I ask for something where like, if it never is allowed to reverse time or if gravity only ever works one way, then well that's it. But sometimes gravity might change,[00:39:33] Fan-yun Sun: but it's a lot easier to change with code as opposed to a model that is learned primarily on data of.[00:39:42] Real world and virtual worlds that are, I guess, like for example, junior, like there's actually trained on a lot of real world data and a lot of virtual gaming data, and it's hard to say maybe it's easier to say, okay, I wanna change the visuals in like the time period of, of the world. Like, you can't change gravity, for [00:40:00] example.[00:40:00] Vibhu: I feel like you can to light bounds, right? Everything comes down to like, code is a better way to execute it, but the models aren't that diverse and creative, right? You can say, okay, make gravity slower. It can do that, but it's limited to your representation of how you text it out, right? Like they're, they're only gonna do a few iterations, whereas programmatically, if there's a game engine under the hood, you can kind of go wild, right?[00:40:22] So one of the, I dunno, one of the limitations of most models is that they're very overtrained to one style. Right. And extracting diversity is pretty difficult. At least that's something we've seen.[00:40:35] Fan-yun Sun: I mean, are there examples you have in mind where you Existing models? Yeah. Like it would be easier to do that's not using code.[00:40:43] Certain types of creative intent or like transition state transitions,[00:40:47] swyx: Clipping, other models, other wo models are very good at clipping through things. Clipping my, my, my legs clipping through a rock because it's, it's just, it's just bad. [00:41:00] Like, you would have to struggle very hard with your stuff to actually make that happen.[00:41:04] Which I think is maybe a topic that you actually prepared on, Gian Splatting versus, the other stuff.[00:41:09] Vibhu: Yeah. Yeah. It's just for those not super familiar, right? There's a, there's gian splatting, there is diffusion. Like what works, what scales up. I feel like in February when Soro one came out the blog post was literally titled like,[00:41:21] swyx: you bring it up.[00:41:22] You never know.[00:41:23] Vibhu: World, world, video generation models are world simulators. It's super bitter lesson pilled. Yeah, emer, a lot of it is emergence, right? So, not to go through their blog post, basically their whole thing was as you scale up all this consistency, all this stuff just kind of solves, it's a very simple premise, right?[00:41:41] They just scaled up, diffusion, and from there, this is, this is Feb 2024, how much can we, it's already been two years, which is basically five years. How much more in AI time do we need to just scale up or, or do we hit a data cap? But I think we already talked about this a lot, right? Like this is back to the beginning discussion of what's [00:42:00] appropriate for the time.[00:42:01] And that seems like your approach, right?[00:42:03] Fan-yun Sun: Yeah. The point I'm trying to make is that they're very many, many different types of world simulators and like having a world simulator that can produce pixel coherency is very, very useful for games and, marketing and all these things, but it's not as useful as people think when it comes to causal reasoning.[00:42:25] When it comes to embodied ai. Yeah, like it this title is true. We're not saying that it's, it's like, not a great world simulator, but actually in the blog that we, we, we, we wrote, the bet is more so that there are gonna be disproportionately large share of value of real world tasks or, and virtual tasks where high resolution pixel fidelity is not needed.[00:42:47] Yes. Video models have their values.[00:42:50] swyx: Yeah. This is at the absolute limit of my physics understanding, but one example that comes to mind is basically having to solve like ba the equivalent of a three [00:43:00] body problem in a deterministic Well, where the video models, which is approximated good enough. Yeah.[00:43:08] Right. Like there's, there's some point at which your approach kind of runs into like the you now have to simulate the world. Please, thank you very much. And like you're trying to do that, but only to the extent that the game engine lets you and like game engines cannot do some things.[00:43:23] Fan-yun Sun: Yeah, no, I mean, I think the interesting or more technical question here actually is where do you draw the boundary between.[00:43:32] What's handled with, let's say, diffusion prior and what, when? What's handled with symbolic priors?[00:43:38] swyx: Yes.[00:43:38] Fan-yun Sun: Okay.[00:43:38] swyx: Okay.[00:43:39] Fan-yun Sun: Right. Let's go there. Because this, this boundary can actually be fluid. Like I think like maybe what you're trying to get at is like, okay, people are saying pixel prior, everything. But what we're saying is, okay, there's a boundary that we draw where this is where we think provides the most economical value for the domains and things that we care about today.[00:43:59] [00:44:00] And I actually do think, and it's something that we do internally all the time, which is like, okay, given new equations that we learn or new elements of the world and that we, we learn, or maybe some other knowledge that we acquire in the process of developing the models. Should we still be maintaining this line exactly as it is today?[00:44:22] Or should we move it a little bit left or a little bit right? Right. Like sometimes that we realize that, oh, like maybe customers or, or folks like want certain things that are better handled with preop pryor as opposed to, symbolic prior than,[00:44:34] swyx: yeah. Your, your skin thing is a, is a example moving it, right.[00:44:37] Yeah.[00:44:37] Or left. Yeah,[00:44:37] Fan-yun Sun: exactly.[00:44:38] swyx: I dunno what the, the left right is.[00:44:39] Fan-yun Sun: Yeah, yeah, yeah. No the, the model.[00:44:42] swyx: Yes.[00:44:42] Fan-yun Sun: Actually we have a few iterations of them. They're actually at slightly different[00:44:45] swyx: I know boundaries. You should, you should do that. That's a cool dimension to show.[00:44:49] Fan-yun Sun: Yeah.[00:44:50] swyx: Is quantum mechanics the diffusion prior of our world?[00:44:55] Right. It's like that's the boundary of classical mechanics versus quantum. Right? Like, that's it. At one [00:45:00] point God plays dice and the other point doesn't.[00:45:02] Fan-yun Sun: I dunno if Chris, you wanna say it, but I think, I think generally I feel like physics is better with symbol P priors.[00:45:08] Chris Manning: Even quantum physics.[00:45:09] Fan-yun Sun: Even quantum physics.[00:45:11] swyx: Yeah. This is starts against to, MLST territory is, is what I call it, where, he, he likes to get philosophical. We, we we're quite friendly.[00:45:18] Vibhu: I mean, we need to get, we need to get singularity. I heard some of that.[00:45:23] swyx: No, no, I think that is actually really helpful and man, I just want you to productize this like, as a product guy, I'm just like, oh, also[00:45:32] Vibhu: a gamer, I[00:45:33] swyx: wanna, it's like a researcher, like, it's cool.[00:45:35] Like this is a, the theoretical, like you have a very good, I don't know, like the way of thinking about these things, but I just wanna see you like, express it. I do think like your fundamentally things when, when you leave open new tools, like, okay, use, use human intent to incorporate it into how you render.[00:45:52] Artists are gonna have to take like two to three years to figure out what to do with this. And you just don't know.[00:45:57] Chris Manning: Right. But I think, this is, [00:46:00] gives a much more approachable and controllable world for the society, which is the beauty, the beauty of, NLP, that that will enable it to be adopted and used.[00:46:10] And we are very hopeful about that. Yeah,[00:46:13] Fan-yun Sun: yeah. Yeah. I mean, we are, we are very focused actually on commercialization in the sense that like we do, we do really believe in the data flywheel app approach. Yeah. Where, we put this in the hands of the creators and the users and then they will teach us when, what capability our model should improve.[00:46:27] And that's why we are, we are actually, like products and beta[00:46:31] swyx: Yeah. Focusing on gaming. What, what's like the adjacent thing to gaming[00:46:34] Fan-yun Sun: embody adjacent, basically. So maybe we can, we can I'll maybe start with where we see the platform in three years. Yeah. Which is like, okay. The users would tell us what they want to achieve.[00:46:45] The end goal could be, Hey, I just, I wanna make something to teach my kids the value of humility. Or it could be, Hey, I wanna fine tune my, drones to be really good at rescue situations. I could be vacuum robots. I want to like train [00:47:00] my manipulation or like vacuum robot to be very robust to my office, right?[00:47:04] But it's like, whatever it is, scenario robust to[00:47:06] swyx: my office[00:47:07] Fan-yun Sun: or like navigate very robustly in my office. But then it's like, whatever end goal that you want, our role model will say, okay, given what you want to achieve, let me generate a distribution of environments such that I can train and evaluate whatever it is you want.[00:47:24] Yeah. Right. Maybe for the purpose of games, it's just the end simulation and that's the end product for certain policies. It's like I can train it within these environments and then help you see where your policy is failing or not. Yeah. And then, so I think,[00:47:37] swyx: so in that case, much more of a training tool.[00:47:40] Than in other training[00:47:41] Vibhu: evaluation? Both. Right?[00:47:43] swyx: Sure. Same. Same thing.[00:47:43] Fan-yun Sun: Yeah, same thing. I think it's just this role model that allows people to train any policy that can act in any multimodal environments.[00:47:51] swyx: Would it be harder to reward hack? Is there an angle here where it is harder to reward hack? Like it's just, I'll just put it generally because I think that's a, that's obviously a key [00:48:00] problem that a lot of people face when in training agents in these environments, and I don't know, can you solve it?[00:48:07] Chris Manning: I think not necessarily. To the extent that there's a mis specified reward that. It seems like it could be hacked in a more symbolic world or in a more pixel based world. I dunno if Sun's got any thoughts, but I don't think that's really being solved.[00:48:26] swyx: The other thing that comes to mind is just you could just build a better sawa as a video generator model, right?[00:48:31] Because then you, you would move the diffusion, side a bit more further to the right. I think if I got the directionality correct. And that's it.[00:48:40] Vibhu: It's better on domains, right? Like on consistency over now, or for sure it exists versus something doesn't, right.[00:48:46] Chris Manning: So[00:48:46] swyx: yeah. Yeah. Is[00:48:49] Vibhu: is a question more like, like[00:48:51] swyx: I'm just riffing on like, how do you, what can you build, you know?[00:48:54] Oh, with the stuff that you have. I do think that the minor, the academic does go immediately to training [00:49:00] and in eval evaluation, but like art tends to take unusual directions. Like you might end up,[00:49:06] Chris Manning: okay. Yeah. But the question is, can you use this piece of software to develop compelling gameplay and. I don't think you can take SOAR and produce compelling gameplay, right?[00:49:19] If you want to have a world that you can wander around in a bit, you are good. But what are your abilities to have gameplay mechanics implemented the way you'd like them to be and to have things stay, with the long-term history of your gameplay that influences future actions. I think there's just nothing there for that.[00:49:39] swyx: Yeah, I do tend to agree. I, I'm just trying to sort of test the boundaries. I would also make the observation that as AAA games industry has developed the line between what is a movie and what is a game has blurred. And you, you, you do end up basically producing a two hour movie as part of your game.[00:49:57] Fan-yun Sun: No, honestly, there, there's so many actually [00:50:00] applications in adjacent markets that our world model can go into. Yeah. But yeah, it, it's sort of fun to riff, riff on. Although on the execution side, we we, we need to stay focused with like, okay, what are the capabilities we want to unlock over time?[00:50:11] And there's a roadmap for that. But yeah, if we're just riffing on sort of like the possibilities, I feel like, whether it's endless Yeah, it's like classic[00:50:18] swyx: and the embedding for a possibility and endless in my mind, it's very close. Yeah. I do wanna, focus on one, like weird choice. I, I don't know if it's weird.[00:50:28] Maybe I'm, I got something here. Audio, right? You could have just said no audio And audio in my mind has a lot of recursion, whereas in video you can just do recasting and that's much computationally much simpler. Audio just seems way harder. I don't know if you wanna just comment on just the special 3D audio.[00:50:46] Problem. Did you really have to do it? I guess you do to be immersive, but like a lot of people do treat it as like, well, you just stick a, a tt S model on top of[00:50:57] Vibhu: Well, there's a lot more to game audio than [00:51:00] just speech. Right. It's not just[00:51:01] swyx: tts. Yeah. Tts. S Fxt, GM Spatial in my mind Echoes[00:51:06] Chris Manning: Yeah.[00:51:06] swyx: And reflections.[00:51:07] And I, I don't even know what's, what else? I don't know what, what other problems in this space.[00:51:13] Fan-yun Sun: Yeah, I think this point like the, it's sort of a more, more pointing to the benefits of using an game engine as a tool that's available to the model, right? Because like part of the spatial audio is from the code that is underlying the simulation.[00:51:32] And while we do give our model access to other types of audio models as. Tools.[00:51:39] swyx: None of them would be spatial, I think.[00:51:41] Fan-yun Sun: But that's exactly sort of more 0.2. We're giving our model an abstraction or a suite of tools such that it's able to achieve that. And you can argue that sort of spatial is like a, like a emergence out of the, the tools that we and abstraction that we provide to the agents.[00:51:59] And I think that's the beauty of [00:52:00] this, this, this approach is like there's a lot of things kind of like how human's built technology and they're like Lego blocks that build on top of each other. And it's the same thing here. There's gonna be things that sort of just sort of emerges from being able to put these things together in like combinatorially interesting ways,[00:52:14] Chris Manning: right?[00:52:15] So this integrated audio model exploits the understanding and semantics of the Moon Lake world, right? And whereas in general for the Gen AI video models. There's no actual integration across to audio at all, right? That someone might stick some music or stick a soundscape or whatever else on top of their video.[00:52:44] So it's not a silent video, but they're in no way connected into a consistent world model. And there's nothing that's okay. An action is happening in the video. Therefore there should be a sound that's [00:53:00] coming from this part of the visual field.[00:53:03] swyx: Yeah.[00:53:03] Vibhu: Is that different than Sora too? Does it not have audio?[00:53:06] Not to say it's not like[00:53:08] swyx: amazing[00:53:08] Vibhu: isn't a spatial[00:53:09] swyx: audio.[00:53:09] Vibhu: It doesn't,[00:53:10] swyx: no. I've played around it with it enough. It just sounds like someone put an 11 laps voice on top of it and just tried to do the lip sync.[00:53:18] Vibhu: Oh, yeah. I've seen, okay. Generate a dog at the beach and reactions to big wave and move[00:53:23] swyx: around.[00:53:23] It's definitely like, so have the dog, have the dog move away from camera and see if the, the song goes down. It doesn't. ‘Cause they don't have facial audio.[00:53:32] Fan-yun Sun: We do want to basically like we, our moral model, like the one we're training is basically towards the goal of having a combined latent representation across all these different modalities.[00:53:42] Right? Such that it can like reason across these different modalities. So for example, if I close my eyes and like you play a video, you play a sound of like a car skidding away from me. I almost can like, visually extrapolate that trajectory in my mind. And I think that type of capability, we want our model to be able to reason, right?[00:53:59] And that's the reason that [00:54:00] we're sort of taking this multimodal reasoning approach. It's like we want this combine late in space that can[00:54:05] swyx: Yeah. Oh, you said late in space. We like that. Here we have to play the, the bell Every time that someone says late in space, no, you gotta train daredevil one. Where you, you, you, it's only audio, but you have to work out.[00:54:15] Where everything is.[00:54:19] Cool. I I think that that was, that was about it for our Moon Lake coverage. I do think that we have like a couple of, Chris Madden questions on, on IR and, just any, any other sort of attention topics or n NLP topics.[00:54:31] Vibhu: Okay.[00:54:31] swyx: Go ahead.[00:54:32] Chris Manning's Journey: From NLP to World Models[00:54:32] Vibhu: Well, no, I mean, yeah, it's just fun. We talked a bit about how you guys met, but you basically, you, you were like the godfather of NLP per se, right?[00:54:39] You spent the whole career from early embeddings, early early attention. You did 2015 attention for machine translation, everything. You, you had information retrieval, so RAG before rag, we just wanna shout that out and admire a lot of that. Right? So what prompted the switch over to world models?[00:54:56] How, how'd all that come about?[00:54:58] Chris Manning: To some answer it [00:55:00] is, the enthusiasms and creativity of students, but there's a bit of a history there, right? So, yeah. So clearly most of my career has been doing stuff with language and how I got into research was thinking, ah, this is just so amazing how humans can produce speech and understand each other in real time.[00:55:21] And somehow they managed to learn languages from their kids. How could this possibly happen? And so, yeah, starting off I was very focused on language, but as it sort of got into the 2000 and tens, I started, going, I'd been working on question answering, and then I started to get, interest in visual question answering.[00:55:42] And that was an area where it was very noticeable. That the visual understanding was bad. Right. These were the days when like, it sort of seemed like there's almost no visual [00:56:00] understanding. You were just getting answers that came from priors. So, if you asked how many people are sitting at the table, it'd always answer two regardless of how many, how many people you could see in the picture.[00:56:11] And so it seemed like, oh, these models actually aren't able to get semantic information outta

Earthfiles Podcast with Linda Moulton Howe
Ep 177: Which E.T.s Claim They Are Closest Genetically to Human Descendants?

Earthfiles Podcast with Linda Moulton Howe

Play Episode Listen Later Mar 26, 2026 69:04


Ep 177: Which E.T.s Claim They Are Closest Genetically to Human Descendants?   Rebroadcast as Linda recovers from a bout of COVID. US Space Force USS Curtis LeMay USS Hoyt Vandenberg, “tip of the spear” USS Roscoe Hillenkoetter Tall Nordics near Sirius A & Sirius B Tall White Pales live near 82 G Eridani where numerous earth-like planets reside Inside US Space Force source details interactions with humans Linda's brother Jim Moulton fallen ill, but recovering at home Highly placed source: Original greys are very ancient race spanning millions of years Blonde Nordics are oldest race from Sirius B system Tall whites or “tall pales” working with us from “Creech Air Force Base” near Las Vegas, Working on Security Access Protocols Hybrid greys “Our true allies are the tall whites” “Nordic types claim we humans are their closest descendants” “..directly related” “These 3 et species occupy the majority” “I don't believe we originated from Earth” “Alliances exist…Eliminating the Trantalod insect..species” “AI biological cloned Grey are…abducting humans and other species” “Temporal technology..used by Trantaloid species” USS Hoyt Vandenberg stationed in Procyon system USS Roscoe Hillenkoetter investigating radio signal source We have had teleportation since 1971 The main purpose of Apollo was to investigate alien sites ====   #LindaMoultonHowe #Earthfiles — For more incredible science stories, Real X-Files, environmental stories and so much more. Please visit my site https://www.earthfiles.com — Be sure to subscribe to this Earthfiles Channel the official channel for Linda Moulton Howe https://www.youtube.com/Earthfiles. — To stay up to date on everything Earthfiles, follow me on FaceBook@EarthfilesNews and Twitter @Earthfiles.  To purchase books and merchandise from Linda Moulton Howe, be sure to only shop at my official Earthfiles store at https://www.earthfiles.com/earthfiles-shop/ — Countdown Clock Piano Music:  Ashot Danielyan, Composer:  https://www.pond5.com/stock-music/100990900/emotional-piano-melancholic-drama.html