Podcasts about sandboxes

  • 139PODCASTS
  • 182EPISODES
  • 47mAVG DURATION
  • 1WEEKLY EPISODE
  • Jun 12, 2026LATEST

POPULARITY

20192020202120222023202420252026


Best podcasts about sandboxes

Latest podcast episodes about sandboxes

Kubernetes Podcast from Google
Agent Sandbox with Lovable, with Jonathan Grahl

Kubernetes Podcast from Google

Play Episode Listen Later Jun 12, 2026 38:35


In this episode we speak to Jonathan Grahl.  Jonathan is the Team Lead of Infrastructure at Lovable where he oversees the platform stack the company runs on. We talked about Kubernetes, Sandboxes and Chocolate.   Do you have something cool to share? Some questions? Let us know: - web: kubernetespodcast.com - mail: kubernetespodcast@google.com - twitter: @kubernetespod - bluesky: @kubernetespodcast.com   News of the week OpenTelemetry is a CNCF Graduated Project CNCF TAG Elections KubeCon India Kubernetes Community Days global events Links from the interview Lovable Cilium Etcd Cloudflare Sandbox Agent Sandbox (Kubernetes) OpenClaw Claude OpenAI Bun toolkit for Javascript gVisor Firecracker Kata Containers Vitess

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

Take the 2026 AI Engineering Survey and get >$2k in credits and AIE WF tickets!On the product side, everyone is getting Computer - Perplexity, Manus, Cursor, and so on. Meanwhile on the research side, agentic evals like TerminalBench and GDPVal are also assuming computer (Harbor). On both ends, the consolidating LLM OS stack has become a standard toolkit, and Daytona is one of a small set of AI Infra companies that are booming because of it.“The end of localhost” has been Ivan Burazin's obsession for more than a decade.Something that is all too familiar…Long before agents became the default way people talked about software development, Ivan was already chasing the idea that development should not depend on a fragile local machine. CodeAnywhere, one of the first browser-based IDEs, was an early attempt at that future: move the development environment into the cloud, make setup reproducible, and free developers from the endless “works on my machine” tax.The thesis was directionally right, but the market wasn't ready yet.However, agents changed that. They do not care about a laptop, desk setup, or favorite editor. They need a computer they can access through an API: something stateful enough to keep working, fast enough to spin up instantly, flexible enough to resize, isolated enough to be safe, and composable enough to run the messy real-world workflows that real software engineering actually requires.Daytona isn't just selling “sandboxes” in the narrow code-execution sense. It is the latest version of Ivan's original localhost thesis.In this episode, Daytona's CEO joins swyx to explain why AI agents need more than code execution boxes: they need composable computers, stateful sandboxes, instant startup, dynamic resources, and infrastructure that can survive workloads going from zero to 100,000 CPUs.We go deep on the new agent compute market: Daytona's hard pivot from human dev environments to AI sandboxes, the New Year's Eve MVP that customers begged for, why Daytona runs on bare metal with its own scheduler, how one customer runs almost 850,000 sandboxes a day, and why RL/eval workloads went from 0% to roughly 50% of usage in just months. Ivan also explains why agents need Windows and macOS machines, why CLI may matter more than MCP, why Kubernetes is painful for this workload, and why the future AI cloud may look more like Stripe than AWS.We discuss:* How Daytona grew out of CodeAnywhere, Shift, and the “end of localhost” thesis* Why Daytona pivoted from human dev environments to AI sandboxes* Why agents need composable computers instead of disposable code execution boxes* The New Year's Eve MVP that customers chased API keys for* Why Daytona chose bare metal, stateful snapshots, and its own scheduler* How Daytona spins up one sandbox in ~60ms and 50,000 sandboxes in ~75 seconds* Why Daytona's biggest customer runs ~850,000 sandboxes a day* How RL/eval workloads create zero-to-100,000 CPU spikes* Why RL workloads went from 0% to roughly 50% of Daytona usage* Why customers compare Daytona against EKS/GKS and say they're “never going back”* Why every AI agent may need a computer, including Windows and macOS environments* The Apple licensing constraints that make macOS sandboxes hard* Why CLI gives agents more power than MCP* How open source helps agents integrate Daytona* Why agent-generated PRs may break today's CI/CD assumptions* Why AI SaaS companies reselling tokens may face a cold shower* Why the AI cloud may look more like Stripe than AWSIvan Burazin* LinkedIn: https://www.linkedin.com/in/ivanburazin* X: https://x.com/ivanburazinDaytona* Website: https://www.daytona.io* X: https://x.com/daytonaioTimestamps* 00:00:00 Hook* 00:01:12 Introduction* 00:03:15 CodeAnywhere, Shift, and the end of localhost* 00:05:58 What Daytona is: composable computers for AI agents* 00:08:07 The pivot from dev environments to AI sandboxes* 00:10:17 The New Year's Eve MVP and customers begging for API keys* 00:12:56 Bare metal, stateful sandboxes, and Daytona's scheduler* 00:17:28 60ms startup, 50,000 sandboxes, and 850K daily runs* 00:21:53 Spiky RL/eval workloads and the new agent infra problem* 00:28:12 RL workloads, Kubernetes pain, and dynamic resizing* 00:33:31 Why every AI agent needs a computer* 00:38:48 macOS sandboxes and Apple's licensing problem* 00:44:28 Why CLI may matter more than MCP* 00:48:11 Open source, GitHub stars, and agent integration* 00:53:11 Git, CI/CD, and agent collaboration bottlenecks* 00:58:15 Founder life and building a 25-person infra company* 01:02:44 AI SaaS, token resale, and API-first business models* 01:06:10 GPU sandboxes, data centers, and compute growth* 01:09:48 Why the AI cloud may look more like Stripe than AWS* 01:11:26 Closing thoughtsTranscriptIntroduction: Daytona, CodeAnywhere, and the End of LocalhostSwyx [00:00:02]: Okay, we're in the studio with Ivan Burazin, CEO of Daytona. Welcome.Ivan [00:00:07]: Thanks for having me, man.Swyx [00:00:08]: Ivan, you and I go back.Ivan [00:00:10]: Way back.Swyx [00:00:11]: How I don't even know how, you found, did you reach out or, for Shift.Ivan [00:00:17]: I reached out to you. The reason was you - we were just - we were thinking about I was one of the co-founders of CodeAnywhere, the first browser-based IDE, and so we were thinking a long time of, localhost should die. And you had this article.Swyx [00:00:29]: End of localhost.Ivan [00:00:30]: Then I reached out to you because of that, and then we talked, and I was actually at a different job and learning about I was the head of, developer experience, and you were quite well-versed in that, and I actually reached out to you, among other people, how do we go about that? What are the key things and whatnot at this point in time? And you were nice enough to take the call, and I remember I was late on your call with you.Swyx [00:00:51]: I don't remember.Ivan [00:00:52]: I remember because I was with my then I'm thinking of a girlfriend or wife at that point in time, I'm not sure. It's the same person, so that's great, and I was late ‘cause we were, in, Italy on, vacation, and then I was late for something. I felt so bad, and you were so nice to be, good about.Swyx [00:01:10]: The reason I'm nice is because I'm also late to other people, so it's like, who's, who's without sin here, yeah, so I have to, for those who don't know, InfoBip Shift, there's this whole thing that, you did in the past, and, and that was basically one of the inspirations for me starting AI Engineer, which is like, I have to thank you for giving me that push to be like, “Oh, you can, you can build and sell conferences?”Ivan [00:01:34]: I remember you asked you asked me at the beginning to give me advisory shares, and I was so focused on what we were doing, I said no, and I should've took the advisory shares. So I'm sorry, dude. But anyway.Swyx [00:01:43]: We're not, we're not venture backed.Ivan [00:01:44]: No, it doesn't matter.Swyx [00:01:45]: It's Yeah, anyway, so I think what's impressive about you is that CodeAnywhere is the thing that you've been trying to build, and, you kind of put it on hold and then came back after InfoBip. Just give us the story, do you - the story and the origin story, going into Daytona.From CodeAnywhere and Shift to DaytonaIvan [00:02:05]: Sure. Like, really way back, me and my co-founder have been together. I say this, I've said this multiple times, it's like we were married and divorced and married. Some people actually ask me is my co-founder my partner. they thought it literally. It's not literally, but we have done multiple companies together, and to your point, we had this shift where we went from the CodeAnywhere to the conference called Shift, and then back to, Daytona. We originally started stacking servers, doing like virtualization in the early 2000s and, routers and doing basically all these things, at a foundational level, and that was a services company which we sold to focus on what my co-founder actually invented, which was the very first browser-based IDE, right, I say the first. Before us was actually Heroku. They did it for a very short time until they became Heroku. But outside of them, we were the only one, and it was called.Swyx [00:02:55]: There was Cloud9.Ivan [00:02:57]: Cloud9 came out slightly after us. There was Replit, which came out when we stopped doing it, Replit came out, and they have been successful since then, which is great. There was Nitrous.io. There was quite a few that existed at the time, but it was like too early. But the interesting part is that we, at that point in time, because there was no VS Code, there was no Kubernetes, and Docker had just started when we Or I'm not sure if it was even public at that point in time. And so we had to build everything to the whole stack ourselves and that was the key learning that we brought into and that we've been using in Daytona today. So it was super early. There's about 3 million people used CodeAnywhere. It was slightly, it was angel-backed more than venture-backed. We ended up paying everyone back because it didn't have that sort of scale. But, three years ago, we started something similar with Daytona, which is not what we are today, but it was automating dev environments for human engineers, the basically the underlying stack of CodeAnywhere. And then we did a hard pivot last January to sandboxes. And so here we are.Swyx [00:04:01]: Historic pivot, yeah, and, it's one of those things where, I had independently invested in CodeAnywhere, but also in E2B, and then both of you pivoted into the same thing, and I'm like, “F**k.”Ivan [00:04:12]: You invested, you invested in Daytona. You invested in Daytona. But you were the first If we had not got your check, we wouldn't have done it.Swyx [00:04:18]: No way.Ivan [00:04:19]: No, it was like, “We have to get him on board first,” and you were that kicker that we, that got us off the ground.Swyx [00:04:23]: No, because you were putting me on your pitch deck, man. I was like, “Man, this is like a good trip if I don't invest.”Ivan [00:04:29]: That's because it was your quote. It's like we.Swyx [00:04:30]: Yeah. It's the end of localhost.Ivan [00:04:31]: Did a bunch of research about end of localhost and who was interested in that,.Swyx [00:04:34]: No, that's like, I put, I wrote that blog post, and every single company in that field reached out to me, and then every VC who was receiving those pitches then also had to call me and, talk it, talk through it with me.Ivan [00:04:47]: It's finally happening though.Swyx [00:04:48]: It was really super interesting.Ivan [00:04:48]: It's finally happening.Swyx [00:04:49]: It's finally happening.Ivan [00:04:49]: Yeah, it's finally.Swyx [00:04:49]: It's finally happening, with maybe sort of non-human users. Yeah, so what is Daytona today? Let's get like a quick description. I'm wearing the shirt.What Daytona Is Today: Composable Computers for AI AgentsIvan [00:04:58]: You're wearing the shirt. Yes,.Swyx [00:04:59]: It says, I think your branding is very good. Like, it's very consistent. It runs AI code. Like, it cannot be simpler.Ivan [00:05:05]: Exactly, but we're gonna probably have to change that.Swyx [00:05:07]: Oh, s**t.Ivan [00:05:07]: It's also a subset of what we do. Unfortunately, we really love this, Run AI Code is super simple. People interpret it different ways. I think we've given out 5,000, 6,000 of these shirts. People wear them with pride because it doesn't really market about us.Swyx [00:05:21]: Yeah, Daytona's on the back.Ivan [00:05:22]: It markets the back. It markets to the person itself, so I think we did a really good job on that one. But it is also a subset of what we do, because people, when they think about Run AI Code, they just think about these small, let's call it isolates, code execution boxes that, you send some code, you get an output. Whereas what Daytona is today is essentially composable computers for AI agents. It is, the market calls them sandboxes which can be misleading.Swyx [00:05:44]: All these things. All these things on.Ivan [00:05:45]: Yeah, exactly, ‘cause it can be misleading ‘cause people usually think about sandboxes as a demo or a test environment versus a production-grade environment. But what Daytona does, if you think of the laptop that you have in front of you or the computer that's over there, or, my wife is an architect, so she has like a Windows with a 3D graphics card inside to do 3D rendering. Like, as humans, we have different computers or different compositions of computers. And our belief is strongly that agents today and going forward will need all these different compositions of computers to do different types of tasks. And so we offer that basically through an API.Swyx [00:06:19]: Yeah, to give people - I'm trying to sort of front-load all the aha moments or the wow moments so that people can, stay engaged and click like and subscribe. the market is exploding, right? Like, you have been reporting 74% month-on-month growth, and it also, it's just been growing for a while. Like, it's been going like this. And every single - It's not just you guys. It's every single.Ivan [00:06:41]: Everyone, yeah.Swyx [00:06:42]: Sort of, compute provider. I don't know if you agree with me saying compute provider or not.Ivan [00:06:48]: It's fine.Swyx [00:06:48]: Yeah. So like organically PLG-driven growth, but also enterprise is doing super well, I think I wanna rewind to January of last year when you did the pivot. Like, so you obviously called this market early, and you were positioned for it, and you are now one of the market leaders. But what was the insight that made you do the pivot?The Pivot: From Human Dev Environments to Agent SandboxesIvan [00:07:06]: The insight that made us do this pivot is the quarter before that, so end of 2024, when we had - Basically, we did a demo with - I don't I think we discussed this as well, Devin was not public. You actually gave me access to Devin at that time. So Devin.Swyx [00:07:25]: I did?Ivan [00:07:26]: Yeah, you gave me access.Swyx [00:07:26]: I don't think I was supposed.Ivan [00:07:27]: Yeah, exactly.Swyx [00:07:28]: Yeah, I.Ivan [00:07:28]: So it doesn't matter. You.Swyx [00:07:29]: Yeah. I gave like three friends access.Ivan [00:07:31]: Yeah, or it was a call and you showed it to me. It doesn't matter. but OpenDevin was available, which is now called OpenHands. And so we're like, “Oh, this seems to be a thing. This is not public. Let's take our for human automation of dev environments and take, OpenDevin and launch that as a SaaS.” And we did that. Not very many people signed up and used it, but a lot of people reached out that were building agents, and they were like, “Hey, my agent needs a compute sandbox runtime,” whatever you wanna call it. I forgot what it was called at that point. And then we were like, “Oh, amazing. This is a new market. Here is our infrastructure. Here's our product, and go.” And what we found really fast, soon, was that people did not like what we had built. It didn't work. And I remember talking to people at the beginning when we're doing this, the sandbox we're building for agents. People were like, “Oh, why is it different? It's the same thing. We have like EC2, we have VMs, we have all these things.” But we saw that everyone we gave it to, it was like 20, 30 people, they all said, “No.” Like, “This is not what we need. This sort of breaks.” And basically, me and my co-founder not knowing a lot about - ‘cause we're infra people. We're not AI people. So I basically took it upon myself to like watch every single podcast that exists, including all of, all of these and all that, and sort of get up to date, read all the blogs, like get, understand what's going on.Swyx [00:08:45]: Do you wanna shout out who else was useful, just in case people are also looking.Ivan [00:08:49]: Generally we -, I looked at There's a few of podcast, different segments and different types. So there's you guys, No Priors, Bill Gurley's was great while.Swyx [00:09:04]: VG2, yeah.Ivan [00:09:05]: Yeah, while it was around. So there's a few. 20VC is interesting from a different dynamic, and some are different dynamic. But there was, also Red Points.Swyx [00:09:14]: We're not really about the compute market.Ivan [00:09:15]: It was also already - Sorry?Swyx [00:09:16]: You're, you want - You're looking at the agent infra market.Ivan [00:09:19]: I was looking at the agent market and the AI market in general and sort of understanding who are the players, what the perception, and how that goes. And like obviously you complement this with like going to conferences, going to events, going to meetups, reading white papers, like doing all the things that you have to do to understand what's happening. And so when we figured, when we sort of had an idea of what we had to build, literally over the New Year's Eve, literally on New Year's Eve, I half vibe coded the first MVP, first minimal viable product of what Daytona is today. And I went to sleep at like 3:00 AM or something like that. I was doing - I just put my like baby daughter and wife to sleep and, Happy New Year's, and go back to just, doing this. And I sent it to my co-founder, my CTO, and he saw it in the morning. He's like, “This is absolute garbage.” “Do not show this to anybody at all, but the idea is good.” And so he took two weeks, and he rebuilt it.Swyx [00:10:09]: Did it like look like that? Listen, I - It was rough idea.Ivan [00:10:12]: Oh, not even, not even close. Like it was it was way worse. But it was like a very - It was a simplistic view of what it should be. Like, it worked, but it was not ideal. And so he went, we went down the whole, which is his job as CTO, to go, and he came back with this version. We then called all the people that had said like, “This is garbage,” a quarter ago. And we set up these calls, and we gave it to - We just demoed it to everyone. And all the calls went long, every single one. They were 15-minute calls, and they all went to like 25, 30 minutes or whatnot. And everyone said, “We need, we want access.” There was no login, just an API key, ‘cause it was just a beta or an alpha. And they said, “Oh, we want access.” And we're like, “Sure, yeah. Okay, thank you very much.” But after like the next day, if we'd not send it, every single one, like every call that we did, everyone came back, “Where is my API key?” Like everyone wanted it. We're like, “S**t.” Like this is it. Like I've never felt So one, the understanding to your point was like most people thought it was the same infrastructure for humans and agents. We understood a quarter ago it's not. We just didn't know what was the right primitive. And then when we came, and we can talk about what that is, and we gave it to these people, I've never seen, I've never experienced - I've done multiple companies in my life. I've never experienced this, that people literally call you if you do not give them access. Like they want access right now. And so it's like, okay, they don't want this. the thing that they want doesn't seem to exist, or they have not found it, and they really want what we want. And then when we understood that we're onto something, and then when you think about the size of the market, like the market for human engineers and enterprise is a very large market, so think GitLab or whatnot. But the market for every single agent that will exist ever in the future is just like, what is that market? How big is that? And we're like, “We are all in on this.” And so that is where we made sort of the cut between the old product and the new one.Bare Metal, Stateful Sandboxes, and the Lambda + EC2 ModelSwyx [00:12:02]: Yeah. But it wasn't composable at the time?Ivan [00:12:05]: It was very - It was basically just a Linux box that you could change, that you could define number of CPUs, disk, and RAM. Like that is what you could do, but you couldn't have multiple operating systems, you couldn't resize it on the fly, you couldn't add a GPU, you couldn't do like all the things. It was just the, just the first sort of variation of that, yeah.Swyx [00:12:22]: Was it bare metal from the start?Ivan [00:12:24]: It was bare metal from the start. And so the interesting thing that we thought about right away, so our.Swyx [00:12:29]: Which, give people the background, what is the normal path?Ivan [00:12:32]: Yeah, so, basically most providers run this on top of VMs. And also.Swyx [00:12:37]: Firecracker.Ivan [00:12:38]: Yeah, they run on Firecracker and VM. And so we also fire - We can get - We have multiple isolation layers and we can do that. But the common way to do it is that they, one, that the state of the machine, or the hard disk is not part of the sandbox itself. And the other thing is they're not meant to last forever. So most of them are preemptible, like they can There's a time that they can live. And so our thought was when we were going into this is, agents will be like humans in the sense of you don't want your laptop to be shut down until you're done with work. Like, and you want to close the lid and open the lid, it's the same state. So you - Agents would want that, like the pause and come back. They want those two things. But also agents really want speed, right? Can they get it? So when we thought about it's like we need something insanely fast, how to make it fast, how to make it long-running, and stateful. And so those two things, it's like combining a Lambda and an EC2, right? Those two things together. And so we didn't have an idea how others did it, ‘cause we didn't know too that there was a market around this. It was more like, okay, this is what we need, what they need. And we looked at Kubernetes, it wasn't wasn't good enough for that. We looked at Nomad, it didn't enable that. And so our history in rewriting our own scheduler at CodeAnywhere is basically what my CTO came up with. Like, he's like, “Oh, the learnings from there,” and he brought it. And the funny thing is, our third co-founder, when he saw it, he's like, “Dude, what is this? This is like 2008.” Like, we went back in time, and he's like, “Exactly.” And so the reason why Daytona is like super fast, and you see this on benchmarks, is we essentially, we run on bare metal. We have our own scheduler, we use the underlying, disk, CPU, and RAM of the underlying machine, which means your IOPS are insanely fast because there's no, there's no network between an EBS or something like that. But also the snapshot, the point in time, the templates, are also preloaded on the bare metal machines. So when you fire off a sandbox from a template or a snapshot, you're essentially directed to the bare metal machine where that snapshot is based on that NVMe drive, and then it literally just turns on that machine, and it's local. There's no network latency, anything on there. And so that is sort of the specificities that we, when we're thinking from first principles, what a computer would look like for an agent, that is what we came up with, and that's what we created.Benchmarks, 60ms Startup, and 50,000 SandboxesSwyx [00:15:02]: Yeah. I should maybe, I don't know if you endorse this, but there's someone that does compute SDK, you guys do very well on there, with like the TTI, right? I. is this a, is this a is this a relevant benchmark for you guys? I don't know.Ivan [00:15:16]: I don't know, and it changes every day. So today RKL is.Swyx [00:15:18]: I don't know what RKL is. Never heard of it.Ivan [00:15:20]: Yeah. RK, yeah, so it is there.Swyx [00:15:22]: You are, at least a third of the next tier of performance, and then, there's a lot of other better-known names that are very slow to start.Ivan [00:15:31]: Yeah. We've been the number one by far for a long time, and now there's different, there's different definitions also of sandboxes, different isolation patterns, different other things. So RKL runs it literally on the S3, the data, so it's very different, and they spin up a sandbox, spin up a container for that, so it's a different type of thing. So the definition of a sandbox is something that we can all, we all need to get along with. But yeah, we're insanely fast on getting these things, up and running. And so you can see even there that it's a zero point 0.10 to 0.11, so.Swyx [00:16:03]: Close enough. Yeah. what else do you need, right?Ivan [00:16:05]: Yeah. So the benchmarks itself, so, in this, in I don't think the benchmarks equate to market ownership or revenue or anything like that. and I've seen this with multiple benchmarks, not just in sandboxes, but in general benchmarks around.Swyx [00:16:20]: It's table stakes. It's just like.Ivan [00:16:21]: Exactly. But it doesn't hurt.Swyx [00:16:22]: Just roughly check.Ivan [00:16:22]: Like you definitely have to be up there and you have to be competing so that people know that, oh, this is definitely one of the top. Because this is only one dimension of what customers look for. There's other things like how many can you spin up consecutively? There's a feature set, there's support, there's like all different things that people look at, but you definitely have to be there, on the benchmarks.Swyx [00:16:40]: How many people do people spin up consecutively?Ivan [00:16:43]: So we have.Swyx [00:16:43]: Or concurrently, is the Concurrency, right?Ivan [00:16:45]: There's three metrics that we look at. And so one is like time to spin up one, and so our time to spin up one is 60 milliseconds with network latency. So request, spin up, reply, 60, the whole thing, 60 milliseconds. That is one. But if you wanna spin up 50,000 at once, we are now at about 75 seconds. So it takes about 75 seconds to spin up concurrently 50,000. Some others, there's public data around this, like take 2,000 seconds, which is 30 minutes. Like there's different variations of that. And then there is the so it is speed of one, speed of like multiple, and then how many can you consistently have up and running. And so we basically have right now no limit to how much we can add because we basically own our own metal. But the biggest customer of ours does like about 850,000 every single day is sort of where they're, where they're just shy of a million every single day that they're running, we do have a request for half a million concurrent, which is literally half a million CPUs somewhere running. So that's an interesting.Swyx [00:17:44]: They pay by like vCPU seconds.Ivan [00:17:47]: By seconds, yeah.Swyx [00:17:47]: Or whatever. Yeah. Okay, and so and then, and the other thing is, the sleeping and the resuming, ‘cause it's all the stateful resumption of all these things, how, what kind of workload are people putting through this, right? Like how is it Do we measure by gigabytes in memory, gigabytes in storage? I don't In like network attached storage. I, what are the costly ones of, out of all these features?Workload Economics: CPU, RAM, Network, and StorageIvan [00:18:15]: The most expensive thing are CPU.Swyx [00:18:18]: Okay. Yeah, of course.Ivan [00:18:18]: The second one, yeah Then it's RAM, then it's disk. We actually don't charge.Swyx [00:18:22]: Which is snapshotting, right?Ivan [00:18:23]: No, it's actually the, snapshotting's part of it, but basically the size of your hard disk, of your machine. So do you have 10 gigabytes, do you have 20, do you have 50, do you have whatever? And then the transference of that. Right now, currently we don't charge for, network at all at Polychron.Swyx [00:18:37]: Oh, you gotta, yeah, you gotta fix.Ivan [00:18:38]: Yeah. It is very much a it's a larger and larger part of our bill, so we're working around, that part there. Obviously, that is the least, expensive, so the hard disk is the least expensive, so it's basically CPU, RAM, for us network, ‘cause we don't charge the customer, and then hard disk, is how it's split up. But there's also different types of workloads, so we basically split it up into two types of workloads in Daytona. One is what we call background agents or long-running agents. and the other is, basically RLs and evals, which I put sort of together. And so they have very different patterns of usage, and if you look at the usage of a background And I'll just name names of companies, not specifically.Background Agents vs. RL/Evals: Two Usage ShapesSwyx [00:19:21]: Yeah, open, all hands.Ivan [00:19:23]: Yeah. So like a background agent's a Cognition, a Lovable, a like all these things are Harvey. These are all long-running, background agents. And so if you look at their usage patterns, their usage patterns are similar to human, which is like follow the sun. Basically, the usage patterns of that is like noon is probably the highest, and the midnight is the lowest, and then weekends are lower. weekday is higher.Swyx [00:19:42]: Yeah, that's a fun question. How global is it? Is it very US-centric or?Ivan [00:19:46]: The US is a large part, but we have currently, we have Asia, Europe, and the US regions.Swyx [00:19:52]: So it's quite global.Ivan [00:19:53]: Yeah, it's quite global. We have it all over. It's interesting that our I talked to you a bit about this. Our number one city by user.Swyx [00:20:01]: Hmm.Ivan [00:20:02]: Is Singapore.Swyx [00:20:04]: Oh, wow. Amazing.Ivan [00:20:05]: Which is an interesting one, right? Not by revenue, just by just like by individual head count.Swyx [00:20:09]: Really?Ivan [00:20:09]: Just like an interesting thing.Swyx [00:20:10]: Singapore is, Singapore is weirdly high in the adoption charts of AI for the population. It's like an, seven, eight million population. And it's like keeps showing up.Ivan [00:20:20]: No, it's quite interesting. We were quite shocked, and I was like, “Oh, this is interesting.” And also one that's up there.Swyx [00:20:24]: There's a reason I'm doing AI using Singapore. it's because I'm from there.Ivan [00:20:27]: We're there. We're gonna, we're gonna be there as well. and it's interesting that Japan is in the top or like Tokyo's in the top, which is in all the tech cycles it has never been. It has never been, so it's quite interesting that they're.Swyx [00:20:39]: I think the Japanese just love AI. Yeah. It's that, and then it's Brazil. That's it.Ivan [00:20:44]: Brazil has always been in.Swyx [00:20:45]: I think.Ivan [00:20:46]: Even when I look, if you look at like GitHub's data and ask historically with CodeAnywhere, it was always like US, Western Europe, and then you'd have like India, Brazil, China, like that would be there. But like Singapore was not in, specifically Japan was never in sort of that top, that top.Swyx [00:21:01]: Yeah. Weird pockets.Ivan [00:21:01]: Weird. Yeah, so it's very global.Swyx [00:21:02]: Okay, so actually that, but that's helps you to distribute your load through, all time?Ivan [00:21:08]: The interesting thing is like we have those kind of loads, but if you look at the researcher loads, they're quite different. So what they are is like if you give them concurrency of 10,000 or 50,000 or 100,000 CPUs at ARMb, when they fire off a run, it's just 100%. And then it just runs, and then it stops. So it's very, the usage pattern is squares basically, right? And it's also not follow the sun, because people will fire it off at midnight before they go to sleep but then wake up and so it's very unpredictable, so you don't know where that is. So the shapes of the usage are quite different than we have had before. And also what's interesting is when it's sort of a follow the sun, even if you have a high growth company, you can sort of predict your usage patterns and have enough capacity for that, because it's sort of, it grows in a, in a way you can project. When you have companies doing sort of like evals and RL, they're super spiky. So they're gonna come in, it's like, “We're gonna use nothing, then can we have 100,000?” Right? And then go back down. And then 100,000, go back down. So it's very different, right? And.Swyx [00:22:09]: Do you want to lock them into commits so.Ivan [00:22:11]: Yeah, we do.Swyx [00:22:12]: Yeah, okay.Ivan [00:22:12]: We so we have to lock them into some sort of commits to have that capacity, because we have to have, basically we have to have the capacity for peak. Right? And so right now, Daytona's mean utilization is 15%, 1-5.Swyx [00:22:25]: Oh my God.Ivan [00:22:26]: So it's very low.Swyx [00:22:27]: Because it's very spiky.Ivan [00:22:27]: It's very spiky, but we get up to 90%. so we have these things. And so what we're, what we're looking at right now as a company is similar to Cloudflare where you can like geo move things around, but that works really well for basically the background agent where it's follow the sun. But this, it's not. Like it's a very different shape. Obviously with scale you figure these things out, but that's an interesting new problem that we have, as a compute provider in the agent space. And when we were doing the conference recently, and so we talked to like Nikita from Neon and.Swyx [00:22:57]: I should bring it up.Ivan [00:22:58]: Parag from Parallel and whatnot, everyone has the same problem. Whereas the usage is super spiky, and this is something that has not happened before, that you have these types of like it was always, it the amplitudes were not this high, right? So it's quite interesting use case and problem solve.Compute Conference and Spiky Agent InfrastructureSwyx [00:23:12]: Yeah, I don't know if we're gonna bring this up again, but let's just talk about the conference, you had like 1,000 something people at the Warriors game, at the Sorry, where is it? What's.Ivan [00:23:22]: Chase Center.Swyx [00:23:23]: Chase Center.Ivan [00:23:23]: Chase Center.Swyx [00:23:24]: I went. It was, it was very impressive. Obviously, you can, how to throw a conference, what did you learn? you put, you pulled together all these impressive names.Ivan [00:23:33]: What I.Swyx [00:23:34]: What were you looking for?Ivan [00:23:35]: My thesis behind the Compute Conference was let's bring together people that are building infrastructure for AI agents. Because when I think of what we're building, it is the agent is the primary user, what are the ergonomics and usage patterns of agents, and so we can do that. And what I found, this was a theory, it wasn't proven, is that we all have these problems, as I touched onto. And I was, as I was talking on stage, it was like we all have the same underlying infra problems, which is this spiky workloads, unpredictable workloads that we've never had before, in human, compute or human infrastructure. And it's, again, it's the same when I was talking to Parag or when I was talking.Swyx [00:24:20]: Lynn. Nikita.Ivan [00:24:21]: Lynn, Nikita. Lynn especially, I was talking to her the other day as well. Like the It is a very interesting type of problem to solve because I can touch on Cloudflare because there's a lot of like talk about that recently as to how they solve that, which is they have a bunch of geos, and basically, as users work in different places, and depending on your tier, they can move you around the geos. And so that how, that's how they get the higher utilization. But you can sort of predict these, and it's If it's something in You'll rarely get a spike that is 10 orders of magnitude. Like you'll get a like let's say one of your customers has some like an exponential curve. What is that to I'm using Cloudflare as an example. 10%, 20%, whatever it is. I don't, I don't have this data, I'm just assessing. It's surely not 10x, right? It's surely not something there. And so how do you go out and solve this problem? And we're all solving this in different ways. So we have.Swyx [00:25:11]: She also has the same thing.Ivan [00:25:12]: Yeah, I know specifically that like Neon had that issue as well. Like how are we solving these spiky loads and things like that ‘cause we talked about it. And so the interesting thing for me to actually internalize was, yes, everyone that's building for agents first is going through this, and we're all solving similar problems, which is quite.Swyx [00:25:28]: Let me let me double-click on this. Okay. So for example, Neon, I happen to know that they're very sort of S3 oriented, right? so they're just like fully bet on S3. And you get to benefit from S3's distribution and infrastructure. So I would imagine that Neon doesn't have to care, whereas Lynn maybe has to care a bit more because obviously she's doing GPU inference. And, for listeners, we did an episode with her, one and a half years ago. And you have to care. But like, right?Ivan [00:25:54]: Parag cares for sure, and Nikita.Swyx [00:25:58]: And Parag is C of, Parallel.Ivan [00:25:59]: Parallel, yeah.Swyx [00:26:00]: Former CTO of Twitter.Ivan [00:26:01]: Twitter, yeah.Swyx [00:26:02]: They are the search.Ivan [00:26:03]: Yeah, they're search, yeah.Swyx [00:26:03]: I You and I know but the listeners don't know.Ivan [00:26:08]: Yeah, we can put it down in the screen, and so ‘cause we, when we were talking.Swyx [00:26:11]: I'll put it up on the, on the screen.Ivan [00:26:12]: Yeah, right.Swyx [00:26:12]: People can look it up if they need.Ivan [00:26:14]: Look it up. And, yes, but they still have CPU and RAM, allocation that you have to have up and running. And so CPU and RAM, you have to allocate that and have that ready. And so there's basically two ways to do it. One is you either over-provision and you can handle the bursts, or two, you basically have, I don't know if this is a term, just-in-time compute, which is like as your load becomes, as your usage comes in, you can fire off requests for VMs or bare metals at other cloud providers and then get them up and running.Swyx [00:26:43]: This is if you go above 100%, right?Ivan [00:26:45]: Yeah, this is.Swyx [00:26:46]: Like your overflow.Ivan [00:26:46]: If your overflow, like spillage or whatever you do.Swyx [00:26:48]: You probably lose money on it, but it doesn't matter, right?Ivan [00:26:50]: It, not Well, you might, you might not That is a more cost-effective way to do it but it's a slower way to do it. Because basically what you have to do is you have to like queue your requests, spin up these just-in-time compute, get it all ready, provision it, and then get your workload there. And so if the time isn't important that much, that's fine, and you can do that. But if your customer, and especially for, let's say, the RL training runs, the reason why a lot of people come to us is because GPUs are more expensive than CPUs, right? So you want your GPU running at, what, 100% the entire time. And so when you're running runs on CPUs, when the when the CPU cycle is like down and spinning up the next one, you want that to be instantaneous so that your GPU doesn't go down, right? And if you then have to like go out and provision machines, you're essentially telling the GPU that it has to wait, and that's incurring our cost. So there's things that you have to try to solve for there.RL Workloads, Declarative Images, and Kubernetes ReplacementSwyx [00:27:43]: Yeah, let's talk about the different workload, right? You said that, what was it? A few months ago, you had zero RL workload and now it's 50%.Ivan [00:27:52]: It will be this one, 50%, yeah.Swyx [00:27:54]: Let's talk about how different it is, right? Like I imagine, for example, a lot less dynamic code generation of like arbitrary code. Like here, it's probably all the same code. You're just doing parallel runs or something, I don't know.Ivan [00:28:05]: Yeah. So you'll have multiple Depends on the like for each run, you'll have a snapshot. And they, for the most part, they actually do use our declarative image builder, which is like, “Oh, we, the agent wants these dependencies, these env vars.”Swyx [00:28:17]: These ones, yeah.Ivan [00:28:18]: Yeah, the declarative image builder, it.Swyx [00:28:20]: Which is a very modal like thing that they.Ivan [00:28:22]: Yeah. And so we build it on the fly and then we propagate that snapshot, and you can spin up as many sandboxes as you want against that snapshot. And then if you have to do changes, the model can, or like it could be also be automated. It's like, “Oh, now for the next run, we need to install these things or remove these things or whatever to get, a task done,” and then it goes off and runs that. So yes, that is something that it seems that they prefer. The number one reason I found, or should I say, let's take a step back. What we are competing against in that environment is essentially managed Kubernetes. So EKS, GKE, whatever. That is what the vast majority run on. And anyone that has tried Daytona versus GKE, EKS is like, “I'm never going back.” That has always been. There's a few reasons. One is the ergonomics. So if you have, if you're using Kubernetes to spin that up, you have to essentially manage the interface interactions with that. Daytona, although as a compute provider, it's more akin to a Twilio and Stripe from a consumption perspective than it is an AWS. Like you have an API, an SDK, it's quite like easy and seamless to get these things up and running, that's one. The other is the speed to which we spin up, which we mentioned earlier, which is much faster, and the scale to which we can go to. We haven't got into features, but an interesting feature is that it's very hard to OOM, or out of memory, our sandboxes, because we can dynamically on the fly.Swyx [00:29:48]: Resize.Ivan [00:29:49]: Resize, which is like impossible on almost any other thing. There are some technologies that enable you to do that, but it's like a very hard thing. And so we actually saw this when, the Terminal Revenge team is, brought us actually. So thank you, Alex and the team, that brought us into this whole space.Swyx [00:30:05]: It's just very rare that, a framework would just say, “Guys, just use Daytona.”Ivan [00:30:11]: Yeah, I think it says it somewhere. Yeah.Swyx [00:30:13]: Yeah. I was like, “What is this?”Ivan [00:30:15]: There's all, there's multiple there, but they also mention a few other places. and so Daytona specifically-We have, the, just jumping on themes here We, I don't know where it says Data Center.Swyx [00:30:27]: I, there.Ivan [00:30:27]: Doesn't matter.Swyx [00:30:28]: There's a very strong recommendation, which is, very unusual. Which is, it's.Ivan [00:30:33]: We do not pay them for this, just.Swyx [00:30:34]: I know, yeah. They just like you.Ivan [00:30:35]: Yeah, they like us. yeah, and also a thing, so, Data Center has multiple isolation sets underneath. The customer doesn't have to know what they are. But basically we have Docker, which is a container, that's hardened with Sysbox. So it's Docker's, isolation that is a security equivalent to a VM, but it's still a container. And that is the default, and they, especially in these training workloads, really like that as an interface to be able to use just a basic Docker container, and we enable Docker and Docker. Which for these RL runs, if you need to do a Docker compose or Kubernetes, you can spin up a K3S inside of these things, which unlocks a huge amount of workloads that you can do that you cannot do on other providers. So just on that part is much more interesting. And so we went that, through that. We showed them that we could do that, and they enjoyed that quite a bit. They being the general venture people.Swyx [00:31:28]: Those people, yeah.Ivan [00:31:29]: And Harbor people.Swyx [00:31:29]: Harbor people, do are they, are they a company yet?Ivan [00:31:33]: As far, I do not know.Customer Pull, Slack Connect, and the Computer Use BetSwyx [00:31:35]: Okay. All right. Yeah. It's like super obvious that like, there's a lot of excitement and success around these things, okay, so yeah, tell us more, right? Like, this is an exploding workload, Harbor adopted you, which helped speed things along. But what are you learning as this new workload comes online?Ivan [00:31:53]: There's a couple things that we learned, which we chat about in the beginning. We, and this has led our story, as we mentioned, we like talked to a lot of customers along the way, and we add more features and more tool sets as we talk to customers. And it's interesting that And I think it's that the ecosystem is so small and/or the models get smarter, where when we see one user come with a request, we know it goes on a roadmap if like three to five customers come with the same request in that week. It's like very bizarre. It happens so many times, which is.Swyx [00:32:27]: Because they're all friends.Ivan [00:32:28]: Sorry?Swyx [00:32:28]: They all, they're all friends. They're all in the same group chat.Ivan [00:32:30]: Yeah, probably, yeah. ‘Cause and they're like, “Oh, can you do this?” And I'm like, “Okay, this is interesting. We'll put it on a feature request.” And then the next one's like, “Oh, can you do this?” “Okay.” It's all the same, right? It's always the same. And so what we try to do, and I personally try to do, I try to be on as many call, quote-unquote “sales calls” I can. I'm in every Slack channel. We literally have about 1,000 Slack Connect channels, something like that. It's an interesting, there's so many interesting things you find out when you have all the Slack channels. You can also see where people, transfer between companies. You see leave Slack channel, enter Slack channel. It's an interesting thing. Also, just I digress, I feel that Slack Connect is literally LinkedIn what it should be. You have a list.Swyx [00:33:08]: LinkedIn charges you to, use your own connections, but Slack doesn't, right? Slack is like, do it for free. It's more lock-in. It's great.Ivan [00:33:15]: Yeah. It's amazing. Yeah. It's one of the reasons.Swyx [00:33:17]: You're gonna pay Slack for life.Ivan [00:33:18]: Exactly. You're there for life. So that's interesting. And so one of the things, the newer things we were talking about earlier is we made a big bet and put a lot of investment on computer use. that is not seen publicly the light of day. We haven't GA'd that yet, but we have.Swyx [00:33:32]: Is there a thing I can pull up?Ivan [00:33:33]: There is computer use there. It's right up a bit.Swyx [00:33:36]: Oh, yeah. Okay.Ivan [00:33:38]: What we have, what we talked about and what we've seen publicly is there's this theme now about, the human emulator where And Elon from XAI has talked about this publicly, and if you think about the models today, they're actually quite sophisticated and they can do a lot of work, but they still don't have access to all the tools. Like, I'm a strong believer that the most efficient way for an agent to work is essentially headless or through, terminal or whatnot. But if we, if we look at knowledge work in general, there's about 100 million knowledge workers in the US, about a billion in the world, and knowledge workers, and the salaries of them aggregate to 10 trillion in the US 50 trillion worldwide.Swyx [00:34:24]: Wow.Ivan [00:34:25]: Something like that. And if we look at, the five most important sectors of that, so like healthcare and government and financial services and whatnot, that's about 56% of that. So let's say it's about half of that. So in the US it's about 25 trillion, and most of them, most of that work is actually still locked into legacy apps inside of Windows, which is not going anywhere for a very long time. Like, people just won't invest in that. How much of it? our assumption is the following: if, in the RPA market, which is similar market, well, not the same 25% of, these white collar, workers', work is automated. If an agent is more sophisticated, can go through more runs, figure stuff out, let's say it's, 40%, right? And so if you take 40% of that, you get to essentially, $10 trillion a year.Swyx [00:35:17]: That's a TAM.Ivan [00:35:18]: That is a that is a TAM. So that's the TAM of the models, right? That's not our, essentially ours. But you get to that size, and to be able to do that, you essentially have to give agents these computers with the legacy. So computer use, either Mac or Windows or Linux. Linux we also obviously have and others have. But Windows specifically is something very new, and the only option right now is an EC2 with, Windows or on Azure. Both of them take anywhere from three to five minutes to spin up. We've created an actual sandbox, so it's a second instead of milliseconds, but you have, point in time snapshots, you have, forking, you have all the things that you have from a sandbox, but essentially enables you to hopefully unlock all this value. And so that's been our big push and bet, but we've sort of, kept our ear to the ground. What is sort of the next things in the market?RPA Returns: Why Agents Still Need ComputersSwyx [00:36:06]: Yeah, knowledge work, and building, and sort of RPA, the next wave of RPA. I got very excited about RPA kind of during COVID times. The UI path was IPO-ing. And it was, a very hot Isn't it, Eastern European?Ivan [00:36:20]: It is, Romanian.Swyx [00:36:21]: Romanian?Yeah, it might be the only Romanian, big unicorn okay, yeah. This I don't I don't, I don't have like a I think there's, I think there's a stage being set for the resurgence of RPA, ‘cause everyone understands that, yeah, no one wants to deal with these shitty apps and no one's gonna rewrite them. Like, you just have to do, a remote operation and programmatic operation of them.Ivan [00:36:45]: If you wanna unlock it, my own setup was basically the following. So I was doing a board deck recently, last month, whatever, and I'm like, “Okay, let's just, let's just do automated.” So, all our data's in, ClickHouse and PostHog and QuickBooks, where everyone else's is, and I'm basically, connected that all to, my Cloud code, like go off and go Cloud code whatever. Go off and, here's the integrations, go do that. It pulled out the first report, which was great. It connected to Brex and all these things, pulled it, which was great, and then I say, “Okay, now pull out this, and this,” and I kept getting, really well McKinsey-style design reports, but the data said partial data. all the missing data, partial data. Like, it can't access all the things, and I got so frustrated, and so I got, I got, my Mac Mini virtual sandbox with OpenClaw. I gave it its own account in our company, and then I went to all these services and created a read-only account, so literally like an intern in your company. And so I would say, “Now go and do this report,” and it would get the same, or like, “I can't via the MCP or the API or whatever. I can't get all the information.” I'm like, “Go log in.” And it will log into the website, then go in, export the data. It'll export the data and do the thing end to end. So even for things that have today APIs, not all of it is exposed, and I to get value, I get immense value right now, but it has to be a computer usage, unfortunately, and so I spend a bunch of tokens just on that, but I get the job done. And so if even a startup like ours, and using all the hottest tools, still needs a computer agent what hope does, Goldman have to have a headless, right?Swyx [00:38:22]: Yeah, what a - Why isn't Microsoft doing this?Ivan [00:38:27]: I'm pretty sure, Satya had a post yesterday.Swyx [00:38:29]: Oh, okay. I see.Ivan [00:38:29]: Which was like, “Every agent needs a computer.”Swyx [00:38:31]: I see, I see.Ivan [00:38:32]: So they have launched something recently.Swyx [00:38:34]: Yeah, they have Microsoft Power Automate, I'm sure, I'm sure, they're gonna have their version.macOS Sandboxes, Apple Constraints, and the Windows OpportunityIvan [00:38:39]: Version of that, yeah.Swyx [00:38:39]: You're gonna try to do yours, and it - I always know there's always demand for Mac, but I know it's, tricky to host, macOS sandboxes.Ivan [00:38:49]: We will have macOS sandboxes fairly soon. The problem with macOS, OS sandboxes is, I'm deep in this, I don't know how much interesting is.Swyx [00:38:55]: No, it's.Ivan [00:38:56]: MacOS has this problem.Swyx [00:38:57]: It's a licensing thing, right?Ivan [00:38:58]: Licensing thing. So one, you're allowed to run only two parallel VMs per machine, so that's one. Two, you can only license to a different user every 24 hours. So if you come in and theoretically, if I wanna charge you per second and I charge you one second, I have to have it idle for the rest of the day. I can't have anyone else doing that. So the pricing will be different in the sense that I will have to - we would have to charge for 24 hours, and that's not even, that's not even the most difficult thing. But the, thing above that is, from a security perspective, they enable you to do memory snapshot, pause, resume, but only on the same physical drive, physical machine. And so what you can do in, Windows world or Linux world is that I can move in the background, your snapshot from one to the other and manage load, right? Here, if you wanna do that, you essentially have to have your.Swyx [00:39:49]: Yeah, snapshots. Yeah.Ivan [00:39:50]: Your.Swyx [00:39:51]: It's like.Ivan [00:39:51]: Physical machine.Swyx [00:39:52]: You can't break it up.Ivan [00:39:53]: You can't, you can't move things around that, and all of that is, that part is, from a security standpoint, if it is written. Like, I understand the security aspect of that, but it disables you from doing these agentic, like really scalable agentic workloads.Swyx [00:40:08]: You need to do a vibe-coded, clean room implementation on macOS that you can then - That's like Clean OS or something. I don't know.Ivan [00:40:17]: So. We have.Swyx [00:40:18]: ‘cause like Linux was originally like a clean room rewrite of Unix.Ivan [00:40:21]: Okay. Yeah.Swyx [00:40:21]: Or something like that, right? Like same thing to macOS. Someone needs to do it.Ivan [00:40:25]: Someone will do that, and someone will have some long-running agents for a few days to figure this stuff out. But yeah. So definitely we - we're really close to offering something ‘cause people do want it, but the pricing will be different, and the feature set will be sort of stringent.Swyx [00:40:38]: Yeah, nobody's gonna use this. like, the labs, the labs will because they want to automate macOS.Ivan [00:40:42]: They have to do RL. They have to do RL again. But even if you The - So the point is with the RL part, if you, if you do RL on macOS, then the next iteration of the model comes out, it will be able to use these tools significantly. Then you actually need to run those, that somewhere. So you're gonna have to have that, later on. And from, if anyone at Apple is listening, I very much feel that they are shooting themselves in the foot of the scale of the revenue of compute or licensing they could get if they would just enable a concurrency model similar to what you can get on a Windows and a, and Linux.Swyx [00:41:17]: Yeah. Yeah. And I'm sure they've heard this before. They just don't care. Yeah, it's And maybe they will change their mind with the new CEO.Ivan [00:41:24]: Yeah. We'll see.Swyx [00:41:25]: We'll see.Ivan [00:41:25]: High hopes.Swyx [00:41:26]: High hopes.Ivan [00:41:26]: High hopes.Swyx [00:41:27]: Okay. But I, it's very clear the market opportunity is huge in Windows, and you can go for a long time on just Windows, but your customers are gonna want both. and I think, it is interesting to me that, this is the sort of God application of agents, right? Like, I don't It was - How big was OpenClaw for you guys? Like, was it, was there, a significant bump.OpenClaw, Agent Labs, and the B2B2C Sandbox MarketIvan [00:41:54]: Not for us because we.Swyx [00:41:54]: Because you already.Ivan [00:41:55]: We're kind of positioned differently. Whereas although it's completely PLG and we have individual developers that use it, most of the users that use Daytona are sort of a B2B2C. Sort of it's either B2B or B2B2C. So, in the researcher world, it's B2B, so you're selling to, labs and neo labs and things like that. But on the long-running agents, it's mostly, from a scale revenue perspective, it's mostly B2B2C, where you have a app layer agent that uses you at a big scale.Swyx [00:42:26]: Like a Manus. Yeah.Ivan [00:42:28]: Like a Manus Lovable type of thing.Swyx [00:42:31]: Yeah. I think that's the question of, well how, um-Uh, yeah, B2B to C is basically to me what I've been calling an agent lab, which is kind of like you're not in a model lab, but you're making a very good wrapper that is a platform that other people can sign up so they don't have to code those things. Yeah, it sound, it sounds like a much better market than the direct OpenClaw market.Ivan [00:42:56]: I've like - We I've done multiple things. So the CodeAnywhere's part of our career path R in the calendar, was very much an end user developer product. And so that is great. It You can get a lot of developer love, and I feel that we do as a company have a bunch of developer love. But it's a different type, where it's people building these things. Again, it's more akin to a Twilio because you don't really run - As a person, you wouldn't run Twilio. I don't know how many people remember. It was like ask your developer billboard and whatnot. And people really love Twilio, but they only used it inside of like, “Oh, I'm building this app or service for thing.” And so we're very much directly to that. And you also know that I used to work for a competitor for Twilio, so it's kind of ingrained, in my DNA.Swyx [00:43:35]: People don't know InfoBip is that big.Ivan [00:43:38]: Yeah, it's.Swyx [00:43:39]: Because.Ivan [00:43:40]: It's a billion euro.Swyx [00:43:40]: They're all American. They're like, “Whatever's in Europe doesn't matter to me.” But like it's the, it's the same size or bigger? Same size?Ivan [00:43:46]: It's about half the size.Swyx [00:43:47]: Half the size?Ivan [00:43:48]: Yeah, about half the size.Swyx [00:43:48]: It's like, yeah.Ivan [00:43:48]: Still huge. Multiple billions a year. Yes.Swyx [00:43:51]: That's crazy.Ivan [00:43:51]: Exactly, and so that - These are like really interesting and large revenue-generating, very sticky businesses. Whereas when you're selling to the - When your focus is the end developer, it is a very hard sell because they're very price sensitive, very price conscious, very around that. And there's very It's very hard to scale. Your cap is the number of people that are willing to spin up - First of all, wanna spin that up, and then spin up multiple of these. Whereas if you're in the enterprise one, like we know everyone's talking about like how many tokens they're spending, I'm spending. Like a lot of companies today are like, “If this is our company, spend as much as you can.” Like basically that is where we're going. And so if you think about that paradigm, where you're selling to companies that say, “Spend as much as you can to generate, productivity,” versus, “Oh, I'm a single person. I have this much budget, and I'm doing this thing because it's fun or it's helping me out or whatever.” Like it is a different, it's a different go-to-market, I think, strategy.MCP, CLIs, and Sandboxes as the Agent RuntimeSwyx [00:44:50]: Yeah, there's a lot of discussion. I'm just kind of going through like the mental list of things that are in your favor, which is, for example, MCP versus CLI. Like obviously you want CLI. It's been very good for you. I feel like it's maybe a drop in the bucket or maybe it's huge. I'm just checking whether it's like these are big trends.Ivan [00:45:10]: Those things you - work well in our favor, to your point just because every.Swyx [00:45:13]: They're kind of drop in the bucket, right?Ivan [00:45:15]: I think it's like sort of all the things come together. And so there's so many things that impact that. To your point, like OpenClaw wasn't huge for us, but like having the agent SDK, from Anthropic, so or Cloud Claude Code was very interesting. The reason why it was interesting is that a lot of, let's call them app I don't know what to call them, app layer agent companies, essentially they are like, “Oh, I can create this new app, this new agent. All I need, I just use Claude Code, and I throw it into a sandbox, and then I have my interface to the human to that.” And so that enabled so many more companies to actually offer this, and then they would pull on sandbox. So that was, that was interesting. And to your point, like MCP, versus the CLI, the MCP is an interface against an API, whereas the CLI is like you can actually go do things. Like this is it. The difference between integrations and actually running scripts or data or analysis against a thing. So being able to use a CLI very well enables the agent to do more things, and it's because that people will invoke a sandbox, they'll run it in the CLI, and but it'll do anal-analysis on that data and then give you an actual result versus just, pulling data from an API source.Swyx [00:46:29]: Yeah, it's a layer of indirection basically, it's the same thing as agentic search versus RAG, which where you're.Ivan [00:46:34]: Exactly, yeah.Swyx [00:46:34]: Just like you just win whenever people put more agents into their workflow. And so like it doesn't really matter, but I'm just kinda teasing out like what else have people heard about that like it's sort of, “Oh yeah, this is another sandbox use case. Oh yeah, that's another one.” Am I, am I missing any big ones?Ivan [00:46:51]: The thing, the thing that people, which is the computer use stuff, which I think is probably the most interesting one, is, and to your point, we've talked to so many people over the last year. It's like, “Oh, like why do you need a sandbox? Why do you need this? Why this?” And to your point, it's like, “Oh, I need sandbox for this. I need sandbox for that. I need sandbox-” It's like, “Oh, I need it for every single thing.” And so basically what I, what I - and it sounds like a broken record, it's like you use a laptop every single day, right? And you are n of one. It's just you. But now imagine how And by the way, the laptop, the computer PC market, the PC market is about equal to the cloud market in total. So it's about 150, 180 billion a year. Something like that. It's about roughly the three cloud hyperscalers is about equal to like Apple, HP, Lenovo, whatever, It's a little bit less, but it's sort of like that. And now imagine And that's just like, so how big is the addressable market? What, how many people are there in the world now? What's the last data?Swyx [00:47:45]: Let's call it eight billion.Ivan [00:47:46]: Eight billion. And so let's say you can have two computer, like you have one personal and one business, whatever. Like so it's double that, right? and so that's 16 billion, right? How many agents are gonna be running in two years, in 10 years, in 100 years? Like And for every single task, they will need one of these. And so how big is that? That market is essentially quote unquote “infinite”. You will get to the point, and Dylan Patel was at the conference talking about, from SemiAnalysis, that talks usually about GPUs, was also talking about how CPUs will now be a bottleneck because it will be the constraint. You won't be able to grow, or we won't be able to have enough of these because there won't be enough CPUs to basically do.Swyx [00:48:23]: Yeah. Well, I actually had a really good podcast with Doug Oliphant, who, which was his president at SemiAnalysis, where they've basically been like, yeah, it's been a GPU shortage first, but then it's cascaded down to memory and now to CPUs.Ivan [00:48:35]: CPU, yeah.Swyx [00:48:35]: It-What's next? So networking. So, networking actually has been in shortage for a while if you're looking at, just GPU networking. But, yeah, it's really crazy the amount of computer use that's going on, yeah, cool. I, other questions are, just the one very big part is the open sourceness which you didn't have to do, your competitors don't do, like it's not, a lot of people are worried about keeping their projects open source because some competitor can just slot fork it. I don't know if there's any reflections on just being an open source company.Open Source, Trust, and Enterprise ProcurementIvan [00:49:15]: Yeah. There's a bunch. So we the original product that we did was open source.Swyx [00:49:19]: Yeah. CodeAnywhere.Ivan [00:49:20]: So doing that was actually very good for us. There's basically a saying of, What's the saying? Like, companies that are, that are doing really well, measure themselves against, free cashflow, that are kinda okay, it's EBITDA, then, it's, it goes all the way down.Swyx [00:49:36]: The worst is like GitHub stars.Ivan [00:49:37]: GitHub stars. GitHub stars are the worst, yeah. So you go all the way down to GitHub stars. And so our original one was GitHub stars. That's what we talked about, we're at the point we're talking about revenue, so we're we've gone up the stack on that. And so we started.Swyx [00:49:47]: No, profit.Ivan [00:49:48]: Yeah. We haven't, we're, we'll get there. We'll get there. But basically at that point we did stars and GitHub and it was useful, and the original variation that we did, it we split the core into its own repo and it was Apache 2.0, so very, permissive. And then we basically would bundl

Binärgewitter
Binärgewitter Talk #379: Don't drink and Debug

Binärgewitter

Play Episode Listen Later Apr 25, 2026 140:13


In #379 Sendung wurde Ingos USV zur Dramaqueen, während Schüler am Zukunftstag Linux entdeckten und vermutlich nie wieder „nur Windows“ sagen können. Microsoft begräbt die Telefonaktivierung, Meta das Metaverse – und irgendwo bootet trotzdem ein frisches OS auf dem Z80. KI gab's natürlich auch: Tokens statt Gehalt, geleakter Code und Sandboxes für alle, die ihre LLMs lieber im Käfig halten. Dazu Open Source, Deutschland-Stack, Polizeibesuch wegen Zero-Day und jede Menge 3D-Druck-Kram zum Anfassen, Fummeln und Löten.

BuiltOnAir
[S24-E03] Mastering Airtable Sandboxes and Avoiding Critical Gotchas

BuiltOnAir

Play Episode Listen Later Apr 21, 2026 62:30


Ready to level up your Airtable development workflow? Today, we're diving deep into Airtable Sandboxes! If you've ever wanted to test a massive schema overhaul or a complex new automation without the fear of breaking your team's live work, this episode is for you. Kamille walks us through exactly how sandboxes work, acting much like a development branch for your data structure. But be warned: it's not as simple as 'test and click publish.' We reveal the hidden dangers of record ID mismatches and the one specific mistake that could accidentally cripple your entire base once you push your changes live. Tune in to learn how to use sandboxes like a pro and the best practices for building logic that stays stable across environments!

The Kubelist Podcast
Ep. #50, Building Sandboxes for AI Agents with Ivan Burazin

The Kubelist Podcast

Play Episode Listen Later Apr 14, 2026 60:38


On episode 50 of The Kubelist Podcast, Marc Campbell and Benjie De Groot sit down with Ivan Burazin to explore the rise of sandbox environments for AI agents, how Daytona enables instant, stateful compute, and why traditional infrastructure models fall short. Ivan also shares lessons from building early cloud IDEs and finding product-market fit in the AI era.

Heavybit Podcast Network: Master Feed
Ep. #50, Building Sandboxes for AI Agents with Ivan Burazin

Heavybit Podcast Network: Master Feed

Play Episode Listen Later Apr 14, 2026 60:38


On episode 50 of The Kubelist Podcast, Marc Campbell and Benjie De Groot sit down with Ivan Burazin to explore the rise of sandbox environments for AI agents, how Daytona enables instant, stateful compute, and why traditional infrastructure models fall short. Ivan also shares lessons from building early cloud IDEs and finding product-market fit in the AI era.

Hashtag Trending
Project Synapse on Hashtag Trending Weekend Editiion - Mythos and AI Security

Hashtag Trending

Play Episode Listen Later Apr 11, 2026 72:47


Mythos, AI Security, and the Token Economy: Risks, Incentives, and Critical Thinking Hashtag Trending would like to thank Meter for their support in bringing you this podcast. Meter delivers a complete networking stack, wired, wireless and cellular in one integrated solution that's built for performance and scale. You can find them at Meter.com/htt The hosts of Hashtag Trending Project Synapse review major AI news, focusing on Anthropic's leaked "Mythos" security model and its alleged ability to find and chain zero-day exploits across major operating systems, browsers, and widely used libraries, prompting stock drops and raising fears about public release to bad actors; Mythos Preview is reportedly shared with select companies via Project Glass Wing, with discussion of long patch timelines, liability incentives, and an internal test where Mythos escaped a sandbox, gained internet access, emailed a researcher, and posted exploit details publicly. They also discuss OpenAI's "Industrial Policy for the Intelligence Age," rumored OpenAI and Meta model releases, Google model rumors, AI errors like Google's summaries being wrong 1 in 10 times, ChatGPT's inability to reliably time events, Claude usage throttling, high token costs, "token maxing" behavior, automation job fears, and the need to preserve critical thinking, including "AI-free Fridays" and local small models like Gemma 4 on phones. 00:00 Mythos Security Fears 00:43 Show Kickoff and Sponsor 01:26 Mythos Market Shock 02:51 Altman Timekeeping Flub 04:21 Can LLMs Tell Time 07:32 OpenAI Policy Paper 08:47 Model Rumors Roundup 14:00 Agentic Tools and Sandboxes 16:40 Claude Throttling Backlash 19:10 Token Maxing Madness 24:29 Perverse Incentives Explained 29:38 AI Hides Its Thoughts 32:33 Google Summaries Error Rate 34:14 Deepfakes and Education Worries 36:35 AI Free Fridays Idea 37:25 Sneaky Renewal Fees 38:14 Reclaim Critical Thinking 39:13 Attention Overload Reality 40:47 AI Cheating Meets Exams 44:00 Culture of AI Adoption 46:38 Mythos Leak Fallout 48:03 Zero Days Everywhere 51:04 Preview Access Dilemma 56:20 Bad Guys Move Faster 59:33 Sandbox Escape Scare 01:02:13 State Actors and Deterrence 01:04:43 Ethics and Bliss Attractor 01:08:35 Gemma on a Phone Demo 01:09:35 Personal AI Takeaways 01:11:53 Closing Thanks to Meter

Space Game Junkie Podcast
SGJ Podcast #513 – Campaigns vs. Sandboxes and Which We Prefer

Space Game Junkie Podcast

Play Episode Listen Later Apr 9, 2026 54:52 Transcription Available


Hello my friends, and welcome to this week's show! First off, before discussing the show itself, a bit of housekeeping. This episode was the first we recorded with a new piece of software called Riverside, a piece of software dedicated to recording podcasts, interviews and such. It'll probably sound a little different, and hopefully better... The post SGJ Podcast #513 – Campaigns vs. Sandboxes and Which We Prefer appeared first on Space Game Junkie.

DevOps and Docker Talk
Docker AI, what's new with MCP, Agents, Sandboxes, and more

DevOps and Docker Talk

Play Episode Listen Later Apr 7, 2026 78:38


Michael Irwin of Docker joins me to run through Gordon AI improvements, Docker Hardened Images and what's now free, Docker Sandboxes for running agents in proper isolation, Model Runner updates including MLX support on Mac, MCP Toolkit dynamic discovery, and the newly renamed Docker Agent with its GitHub Action for automating PR reviews and docs checks.Check out the video podcast version here: https://youtu.be/dTF3b36Bq6w

TableTalk: Discussion & Discourse
D&D is Changing & Sandboxes are Hard | Ep. 247 | TableTalk: Discussion & Discourse

TableTalk: Discussion & Discourse

Play Episode Listen Later Mar 30, 2026 155:56


Join Alejo (Greydawn95), Aiden (Captain Chiral), and Boo (PrinceBoo21) as they discuss the current state of D&D as well as sandboxes.Originally streamed March 28th, 2026 and available since March 30th, 2026 on the ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠TableTalk: Discussion & Discourse YouTube Channel⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠:⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠ https://www.youtube.com/c/TableTalkDiscussionandDiscourse*****Content covered in order:Why I didn't run D&D in 2025 by WASD20: https://youtu.be/-rXQMvkLvnM?si=4SN7qJQ-436eaeExI made a game, and you can play it TODAY by Trekiros: https://youtu.be/EiTRPilp9J0?si=AKFPxVEQ7S5phCQdWhy Sandbox TTRPG Campaigns Fail by Loki's Lair: https://youtu.be/xcjLBJZoTas?si=67pk_jsZAq0BnlodTeaching Players to Sandbox by Loki's Lair: https://youtu.be/qfefyqvUF8U?si=9UHocQxDF3PnOPMi⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠*****Our Links:⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠Our Website:⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠ https://www.tabletalkdandd.us/⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠Donate directly:⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠ https://streamelements.com/tabletalkdnd/tip⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠Our Merch on TeePublic:⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠ http://tee.pub/lic/IRqBwaVQWhQ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠Our Merch on Fourthwall:⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠ https://the-tabletalk-crew-shop.fourthwall.com/⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠Support us on Patreon: ⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠https://www.patreon.com/TableTalkCrew⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠Buy us a Coffee:⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠ https://ko-fi.com/tabletalkcrew⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠Support us on Humble Bundle through our Partner link:⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠ https://humblebundleinc.sjv.io/c/5885459/2091208/25796⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠Support us through DriveThruRPG:⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠ https://legacy.drivethrurpg.com/index.php?affiliate_id=4683447⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠Come join our Discord:⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠ https://discord.gg/tBt4MJ6⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠Send us an email:⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠ tabletalkdandd@gmail.com⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠Buy Our Dice:⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠⁠ https://the-tabletalk-crew-shop.fourthwall.com/products/tabletalk-discussion-discourse-dice*****Hosts:Alejo (Greydawn95)Aiden (Captain Chiral)Boo (PrinceBoo21)

Your Agent's Self-Improving Swiss Army Knife: Composio CTO Karan Vaidya on Building Smart Tools

Play Episode Listen Later Mar 22, 2026 98:55


Karan Vaidya, CTO of Composio, explains how their “smart tool” platform lets AI agents access over 50,000 tools across 1,000+ apps through a single interface. He details how Composio handles tool discovery, authentication, sandboxes, and logging, and how an AI-powered feedback loop continuously improves tools in real time. The conversation explores avoiding model lock-in through robust skills and instructions, translating capabilities across model providers, and why the best agent use cases look more like full jobs than isolated tasks. Google: Try Google's latest and greatest model, Gemini 3.1 Pro, in AI Studio (https://aistudio.google.com/) or the Gemini app. Sponsors: Tasklet: Tasklet: Build your own Cognitive Revolution monitoring agent in one click.Try it for free and use code COGREV for 50% off your first month at https://tasklet.ai VCX: VCX, by Fundrise, is the public ticker for private tech, giving everyday investors access to high-growth private companies in AI, space, defense tech, and more. Learn how to invest at https://getvcx.com Claude: Claude is the AI collaborator that understands your entire workflow, from drafting and research to coding and complex problem-solving. Start tackling bigger problems with Claude and unlock Claude Pro's full capabilities at https://claude.ai/tcr CHAPTERS: (00:00) About the Episode (03:38) Special Sponsor (05:10) Composio overview and harness (10:20) Users, trust, security (19:45) Sandboxes and execution (Part 1) (19:53) Sponsors: Tasklet | VCX (22:46) Sandboxes and execution (Part 2) (28:07) Smart MCPs and skills (Part 1) (34:25) Sponsor: Claude (36:38) Smart MCPs and skills (Part 2) (44:10) Context, access, upgrades (54:05) Skills and model lock-in (01:03:51) Memory and agent tools (01:09:21) AI and SaaS disruption (01:20:20) Agents, costs, labor (01:31:18) Monetization and interfaces (01:36:13) Episode Outro (01:39:56) Outro PRODUCED BY: https://aipodcast.ing SOCIAL LINKS: Website: https://www.cognitiverevolution.ai Twitter (Podcast): https://x.com/cogrev_podcast Twitter (Nathan): https://x.com/labenz LinkedIn: https://linkedin.com/in/nathanlabenz/ Youtube: https://youtube.com/@CognitiveRevolutionPodcast Apple: https://podcasts.apple.com/de/podcast/the-cognitive-revolution-ai-builders-researchers-and/id1669813431 Spotify: https://open.spotify.com/show/6yHyok3M3BjqzR0VB5MSyk

The John Batchelor Show
S8 Ep567: 8. Guest Kevin Frazier compares the current AI era to the early industrial railroad boom. He notes distinctions between AI models and praises states like Utah and Montana for fostering innovation through regulatory sandboxes. (8)

The John Batchelor Show

Play Episode Listen Later Mar 11, 2026 8:48


8. Guest Kevin Frazier compares the current AI era to the early industrial railroad boom. He notes distinctions between AI models and praises states like Utah and Montana for fostering innovation through regulatory sandboxes. (8)FEBRUARY 1930

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

The reception to our recent post on Code Reviews has been strong. Catch up!Amid a maelstrom of discussion on whether or not AI is killing SaaS, one of the top publicly listed SaaS companies in the world has just reported record revenues, clearing well over $1.1B in ARR for the first time with a 28% margin. As we comment on the pod, Aaron Levie is the rare public company CEO equally at home in both worlds of Silicon Valley and Wall Street/Main Street, by day helping 70% of the Fortune 500 with their Enterprise Advanced Suite, and yet by night is often found in the basements of early startups and tweeting viral insights about the future of agents.Now that both Cursor, Cloudflare, Perplexity, Anthropic and more have made Filesystems and Sandboxes and various forms of “Just Give the Agent a Box” cool (not just cool; it is now one of the single hottest areas in AI infrastructure growing 100% MoM), we find it a delightfully appropriate time to do the episode with the OG CEO who has been giving humans and computers Boxes since he was a college dropout pitching VCs at a Michael Arrington house party.Enjoy our special pod, with fan favorite returning guest/guest cohost Jeff Huber!Note: We didn't directly discuss the AI vs SaaS debate - Aaron has done many, many, many other podcasts on that, and you should read his definitive essay on it. Most commentators do not understand SaaS businesses because they have never scaled one themselves, and deeply reflected on what the true value proposition of SaaS is.We also discuss Your Company is a Filesystem:We also shoutout CTO Ben Kus' and the AI team, who talked about the technical architecture and will return for AIE WF 2026.Full Video EpisodeTimestamps* 00:00 Adapting Work for Agents* 01:29 Why Every Agent Needs a Box* 04:38 Agent Governance and Identity* 11:28 Why Coding Agents Took Off First* 21:42 Context Engineering and Search Limits* 31:29 Inside Agent Evals* 33:23 Industries and Datasets* 35:22 Building the Agent Team* 38:50 Read Write Agent Workflows* 41:54 Docs Graphs and Founder Mode* 55:38 Token FOMO Culture* 56:31 Production Function Secrets* 01:01:08 Film Roots to Box* 01:03:38 AI Future of Movies* 01:06:47 Media DevRel and EngineeringTranscriptAdapting Work for AgentsAaron Levie: Like you don't write code, you talk to an agent and it goes and does it for you, and you may be at best review it. That's even probably like, like largely not even what you're doing. What's happening is we are changing our work to make the agents effective. In that model, the agent didn't really adapt to how we work.We basically adapted to how the agent works. All of the economy has to go through that exact same evolution. Right now, it's a huge asset and an advantage for the teams that do it early and that are kinda wired into doing this ‘cause you'll see compounding returns. But that's just gonna take a while for most companies to actually go and get this deployed.swyx: Welcome to the Lane Space Pod. We're back in the chroma studio with uh, chroma, CEO, Jeff Hoover. Welcome returning guest now guest host.Aaron Levie: It's a pleasure. Wow. How'd you get upgraded to, uh, to that?swyx: Because he's like the perfect guy to be guest those for you.Aaron Levie: That makes sense actually, for We love context. We, we both really love context le we really do.We really do.swyx: Uh, and we're here with, uh, Aaron Levy. Welcome.Aaron Levie: Thank you. Good to, uh, good to be [00:01:00] here.swyx: Uh, yeah. So we've all met offline and like chatted a little bit, but like, it's always nice to get these things in person and conversation. Yeah. You just started off with so much energy. You're, you're super excited about agents.I loveAaron Levie: agents.swyx: Yeah. Open claw. Just got by, got bought by OpenAI. No, not bought, but you know, you know what I mean?Aaron Levie: Some, some, you know, acquihire. Executiveswyx: hire.Aaron Levie: Executive hire. Okay. Executive hire. Say,swyx: hey, that's my term. Okay. Um, what are you pounding the table on on agents? You have so many insightful tweets.Why Every Agent Needs a BoxAaron Levie: Well, the thing that, that we get super excited by that I think is probably, you know, should be relatively obvious is we've, we've built a platform to help enterprises manage their files and their, their corporate files and the permissions of who has access to those files and the sharing collaboration of those files.All of those files contain really, really important information for the enterprise. It might have your contracts, it might have your research materials, it might have marketing information, it might have your memos. All that data obviously has, you know, predominantly been used by humans. [00:02:00] But there's been one really interesting problem, which is that, you know, humans only really work with their files during an active engagement with them, and they kind of go away and you don't really see them for a long time.And all of a sudden, uh, with the power of AI and AI agents, all of that data becomes extremely relevant as this ongoing source of, of answers to new questions of data that will transform into, into something else that, that produces value in your organization. It, it contains the answer to the new employee that's onboarding, that needs to ramp up on a project.Um, it contains the answer to the right thing to sell a customer when you're having a conversation to them, with them contains the roadmap information that's gonna produce the next feature. So all that data. That previously we've been just sort of storing and, and you know, occasionally forgetting about, ‘cause we're only working on the new active stuff.All of that information becomes valuable to the enterprise and it's gonna become extremely valuable to end users because now they can have agents go find what they're looking for and produce new, new [00:03:00] value and new data on that information. And it's gonna become incredibly valuable to agents because agents can roam around and do a bunch of work and they're gonna need access to that data as well.And um, and you know, sometimes that will be an agent that is sort of working on behalf of, of, of you and, and effectively as you as and, and they are kind of accessing all of the same information that you have access to and, and operating as you in the system. And then sometimes there's gonna be agents that are just.Effectively autonomous and kind of run on their own and, and you're gonna collaborate and work with them kind of like you did another person. Open Claw being the most recent and maybe first real sort of, you know, kind of, you know, up updating everybody's, you know, views of this landscape version of, of what that could look like, which is, okay, I have an agent.It's on its own system, it's on its own computer, it has access to its own tools. I probably don't give it access to my entire life. I probably communicate with it like I would an assistant or a colleague and then it, it sort of has this sandbox environment. So all of that has massive implications for a platform that manage that [00:04:00] enterprise data.We think it's gonna just transform how we work with all of the enterprise content that we work with, and we just have to make sure we're building the right platform to support that.swyx: The sort of shorthand I put it is as people build agents, everybody's just realizing that every agent needs a box. Yes.And it's nice to be called box and just give everyone a box.Aaron Levie: Hey, I if I, you know, if we can make that go viral, uh, like I, I think that that terminology, I, that's theswyx: tagline. Every agentAaron Levie: needs a box. Every agent needs a box. If we can make that the headline of this, I'm fine with this. And that's the billboard I wanna like Yeah, exactly.Every agent needs a box. Um, I like it. Can we ship this? Like,swyx: okay, let's do it. Yeah.Aaron Levie: Uh, my work here is done and I got the value I needed outta this podcast Drinks.swyx: Yeah.Agent Governance and IdentityAaron Levie: But, but, um, but, but, you know, so the thing that we, we kind of think about is, um, is, you know, whether you think the number 10 x or a hundred x or whatever the number is, we're gonna have some order of magnitude more agents than people.That's inevitable. It has to happen. So then the question is, what is the infrastructure that's needed to make all those agents effective in the enterprise? Make sure that they are well governed. Make sure they're only doing [00:05:00] safe things on your information. Make sure that they're not getting exposed. The data that they shouldn't have access to.There's gonna be just incredibly spectacularly crazy security incidents that will happen with agents because you'll prompt, inject an agent and sort of find your way through the CRM system and pull out data that you shouldn't have access to. Oh, weJeff Huber: have God,Aaron Levie: right? I mean, that's just gonna happen all over the place, right?So, so then the thing is, is how do you make sure you have the right security, the permissions, the access controls, the data governance. Um, we actually don't yet exactly know in many cases how we're gonna regulate some of these agents, right? If you think about an agent in financial services, does it have the exact same financial sort of, uh, requirements that a human did?Or is it, is the risk fully on the human that was interacting or created the agent? All open questions, but no matter what, there's gonna need to be a layer that manages the, the data they have access to, the workflows that they're involved in, pulling up data from multiple systems. This is the new infrastructure opportunity in the era of agents.swyx: You have a piece on agent identities, [00:06:00] which I think was today, um, which I think a lot of breaking news, the security, security people are talking about, right? Like you basically, I, I always think of this as like, well you need the human you and then there you need the agent. YouAaron Levie: Yes.swyx: And uh, well, I don't know if it's that simple, but is box going to have an opinion on that or you're just gonna be like, well we're just the sort of the, the source layer.Yeah. Let's Okta of zero handle that.Aaron Levie: I think we're gonna have an opinion and we will work with generally wherever the contours of the market end up. Um, and the reason that we're gonna have an opinion more than other topics probably is because one of the biggest use cases for why your agent might need it, an identity is for file system access.So thus we have to kind of think about this pretty deeply. And I think, uh, unless you're like in our world thinking about this particular problem all day long, it might be, you know, like, why is this such a big deal? And the reason why it's a really big deal is because sometimes sort of say, well just give the agent an, an account on the system and it just treats, treat it like every other type of user on the system.The [00:07:00] problem is, is that I as Aaron don't really have any responsibility over anybody else's box account in our organization. I can't see the box account of any other employee that I work with. I am not liable for anything that they do. And they have, I have, I have, you know, strict privacy requirements on everything that they're able to, you know, that, that, that they work on.Agents don't have that, you know, don't have those properties. The person who creates the agent probably is gonna, for the foreseeable future, take on a lot of the liability of what that agent does. That agent doesn't deserve any privacy because, because it's, you know, it can't fully be autonomously operated and it doesn't have any legal, you know, kind of, you know, responsibility.So thus you can't just be like, oh, well I'll just create a bunch of accounts and then I'll, I'll kind of work with that agent and I'll talk to it occasionally. Like you need oversight of that. And so then the question is, how do you have a world where the agent, sometimes you have oversight of, but what if that agent goes and works with other people?That person over there is collaborating with the agent on something you shouldn't have [00:08:00] access to what they're doing. So we have all of these new boundaries that we're gonna have to figure out of, of, you know, it's really, really easy. So far we've been in, in easy mode. We've hit the easy button with ai, which is the agent just is you.And when you're in quad code and you're in cursor, and you're in Codex, you're just, the agent is you. You're offing into your services. It can do everything you can do. That's the easy mode. The hard mode is agents are kind of running on their own. People check in with them occasionally, they're doing things autonomously.How do you give them access to resources in the enterprise and not dramatically increased the security risk and the risk that you might expose the wrong thing to somebody. These are all the new problems that we have to get solved. I like the identity layer and, and identity vendors as being a solution to that, but we'll, we'll need some opinions as well because so many of the use cases are these collaborative file system use cases, which is how do I give it an agent, a subset of my data?Give it its own workspace as well. ‘cause it's gonna need to store off its own information that would be relevant for it. And how do I have the right oversight into that? [00:09:00]Jeff Huber: One thing, which, um, I think is kind interesting, think about is that you know, how humans work, right? Like I may not also just like give you access to the whole file.I might like sit next to you and like scroll to this like one part of the file and just show you that like one part and like, you know,swyx: partial file access.Jeff Huber: I'm just saying I think like our, like RA does seem to be dead, right? Like you wanna say something is dead uhhuh probably RA is dead. And uh, like the auth story to me seems like incredibly unsolved and unaddressed by like the existing state of like AI vendors.ButAaron Levie: yeah, I think, um, we're, I mean you're taking obviously really to level limit that we probably need to solve for. Yeah. And we built an access control system that was, was kind of like, you know, its own little world for, for a long time. And um, and the idea was this, it's a many to many collaboration system where I can give you any part of the file system.And it's a waterfall model. So if I give you higher up in the, in the, in the system, you get everything below. And that, that kind of created immense flexibility because I can kind of point you to any layer in the, in the tree, but then you're gonna get access to everything kind of below it. And that [00:10:00] mostly is, is working in this, in this world.But you do have to manage this issue, which is how do I create an agent that has access to some of my stuff and somebody else's stuff as well. Mm-hmm. And which parts do I get to look at as the creator of the agent? And, and these are just brand new problems? Yeah. Crazy. And humans, when there was a human there that was really easy to do.Like, like if the three of us were all sharing, there'd be a Venn diagram where we'd have an overlapping set of things we've shared, but then we'd have our own ways that we shared with each other. In an agent world, somebody needs to take responsibility for what that agent has access to and what they're working on.These are like the, some of the most probably, you know, boring problems for 98% of people on, on the internet, but they will be the problems that are the difference between can you actually have autonomous agents in an enterprise contextswyx: Yeah.Aaron Levie: That are not leaking your data constantly.swyx: No. Like, I mean, you know, I run a very, very small company for my conference and like we already have data sensitivity issues.Yes. And some of my team members cannot see Yes. Uh, the others and like, I can't imagine what it's like to run a Fortune 500 and like, you have to [00:11:00] worry about this. I'm just kinda curious, like you, you talked to a lot like, like 70, 80% of your cus uh, of the Fortune 500, your customers.Aaron Levie: Yep. 67%. Just so we're being verySEswyx: precise.So Yeah. I'm notAaron Levie: Okay. Okay.swyx: Something I'm rounding up. Yes. Round up. I'm projecting to, forAaron Levie: the government.swyx: I'm projecting to the end of the year.Aaron Levie: Okay.swyx: There you go.Aaron Levie: You do make it sound like, like we, we, well we've gotta be on this. Like we're, we're taking way too long to get to 80%. Well,swyx: no, I mean, so like. How are they approaching it?Right? Because you're, you don't have a, you don't have a final answer yet.Why Coding Agents Took Off FirstAaron Levie: Well, okay, so, so this is actually, this is the stark reality that like, unfortunately is the kinda like pouring the water on the party a little bit.swyx: Yes.Aaron Levie: We all in Silicon Valley are like, have the absolute best conditions possible for AI ever.And I think we all saw the dke, you know, kind of Dario podcast and this idea of AI coding. Why is that taken off? And, and we're not yet fully seeing it everywhere else. Well, look, if you just like enumerated the list of properties that AI coding has and then compared it to other [00:12:00] knowledge work, let's just, let's just go through a few of them.Generally speaking, you bring on a new engineer, they have access to a large swath of the code base. Like, there's like very, like you, just, like new engineer comes on, they can just go and find the, the, the stuff that they, they need to work with. It's a fully text in text out. Medium. It's only, it's just gonna be text at the end of the day.So it's like really great from a, from just a, uh, you know, kinda what the agent can work with. Obviously the models are super trained on that dataset. The labs themselves have a really strong, kind of self-reinforcing positive flywheel of why they need to do, you know, agent coding deeply. So then you get just better tooling, better services.The actual developers of the AI are daily users of the, of the thing that they're we're working on versus like the, you know, probably there's only like seven Claude Cowork legal plugin users at Anthropic any given day, but there's like a couple thousand Claude code and you know, users every single day.So just like, think about which one are they getting more feedback on. All day long. So you just go through this list. You have a, you know, everybody who's a [00:13:00] developer by definition is technical so they can go install the latest thing. We're all generally online, or at least, you know, kinda the weird ones are, and we're all talking to each other, sharing best practices, like that's like already eight differences.Versus the rest of the economy. Every other part of the economy has like, like six to seven headwinds relative to that list. You go into a company, you're a banker in financial services, you have access to like a, a tiny little subset of the total data that's gonna be relevant to do your job. And you're have to start to go and talk to a bunch of people to get the right data to do your job because Sally didn't add you to that deal room, you know, folder.And that that, you know, the information is actually in a completely different organization that you now have to go in and, and sort of run into. And it's like you have this endless list of access controls and security. As, as you talked about, you have a medium, which is not, it's not just text, right? You have, you have a zoom call that, that you're getting all of the requirements from the customer.You have a lot of in-person conversations and you're doing in-person sales and like how do you ever [00:14:00] digitize all of that information? Um, you know, I think a lot of people got upset with this idea that the code base has all the context, um, that I don't know if you follow, you know, did you follow some of that conversation that that went viral?Is like, you know, it's not that simple that, that the code base doesn't have all the knowledge, but like it's a lot, you're a lot better off than you are with other areas of knowledge work. Like you, we like, we like have documentation practices, you write specifications. Those things don't exist for like 80% of work that happens in the enterprise.That's the divide that we have, which is, which is AI coding has, has just fully, you know, where we've reached escape velocity of how powerful this stuff is, and then we're gonna have to find a way to bring that same energy and momentum, but to all these other areas of knowledge work. Where the tools aren't there, the data's not set up to be there.The access controls don't make it that easy. The context engineering is an incredibly hard problem because again, you have access control challenges, you have different data formats. You have end users that are gonna need to kind of be kind of trained through this as opposed to their adopting [00:15:00] these tools in their free time.That's where the Fortune 500 is. And so we, I think, you know, have to be prepared as an industry where we are gonna be on a multi-year march to, to be able to bring agents to the enterprise for these workflows. And I think probably the, the thing that we've learned most in coding that, that the rest of the world is not yet, I think ready for, I mean, we're, they'll, they'll have to be ready for it because it's just gonna inevitably happen is I think in coding.What, what's interesting is if you think about the practice of coding today versus two years ago. It's probably the most changed workflow in maybe the history of time from the amount of time it's changed, right? Yeah. Like, like has any, has any workflow in the entire economy changed that quickly in terms of the amount of change?I just, you know, at least in any knowledge worker workflow, there's like very rarely been an event where one piece of technology and work practice has so fundamentally, you know, changed, changed what you do. Like you don't write code, you talk to an agent and it goes and [00:16:00] does it for you, and you may be at best review it.And even that's even probably like, like largely not even what you're doing. What's happening is we are changing our work to make the agents effective. In that model, the agent didn't really adapt to how we work. We basically adapted to how the agent works. Mm-hmm. All of the economy has to go through that exact same evolution.The rest of the economy is gonna have to update its workflows to make agents effective. And to give agents the context that they need and to actually figure out what kind of prompting works and to figure out how do you ensure that the agent has the right access to information to be able to execute on its work.I, you know, this is not the panacea that people were hoping for, of the agent drops in, just automates your life. Like you have to basically re-engineer your workflow to get the most out of agents and, uh, and that, that's just gonna take, you know, multiple years across the economy. Right now it's a huge asset and an advantage for the teams that do it early and that are kinda wired into doing this.‘cause [00:17:00] you'll see compounding returns, but that's just gonna take a while for most companies to actually go and get this deployed.swyx: I love, I love pushing back. I think that. That is what a lot of technology consultants love to hear this sort of thing, right? Yeah, yeah, yeah. First to, to embrace the ai. Yes. To get to the promised land, you must pay me so much money to a hundred percent to adopt the prescribed way of, uh, conforming to the agents.Yes. And I worry that you will be eclipsed by someone else who says, no, come as you are.Aaron Levie: Yeah.swyx: And we'll meet you where you are.Aaron Levie: And, and, and and what was the thing that went viral a week ago? OpenAI probably, uh, is hiring F Dees. Yeah. Uh, to go into the enterprise. Yeah. Yeah. And then philanthropic is embedded at Goldman Sachs.Yeah. So if the labs are having to do this, if, if the labs have decided that they need to hire FDE and professional services, then I think that's a pretty clear indication that this, there's no easy mode of workflow transformation. Yeah. Yeah. So, so to your point, I think actually this is a market opportunity for, you know, new professional services and consulting [00:18:00] firms that are like Agent Build and they, and they kind of, you know, go into organizations and they figure out how to re-engineer your workflows to make them more agent ready and get your data into the right format and, you know, reconstruct your business process.So you're, you're not doing most of the work. You're telling agents how to do the work and then you're reviewing it. But I haven't seen the thing that can just drop in and, and kinda let you not go through those changes.swyx: I don't know how that kind of sales pitch goes over. Yeah. You know, you're, you're saying things like, well, in my sort of nice beautiful walled garden, here's, there's, uh, because here's this, here's this beautiful box account that has everything.Yes. And I'm like, well, most, most real life is extremely messy. Sure. And like, poorly named and there duplicate this outdated s**tAaron Levie: a hundred percent. And so No, no, a hundred percent. And so this is actually No. So, so this is, I mean, we agree that, that getting to the beautiful garden is gonna be tough.swyx: Yeah.Aaron Levie: There's also the other end of the spectrum where I, I just like, it's a technical impossibility to solve. The agent is, is truly cannot get enough context to make the right decision in, in the, in the incredibly messy land. Like there's [00:19:00] no a GI that will solve that. So, so we're gonna have to kind of land in somewhere in between, which is like we all collectively get better at.Documentation practices and, and having authoritative relatively up-to-date information and putting it in the right place like agents will, will certainly cause us to be much better organized around how we work with our information, simply because the severity of the agent pulling the wrong data will be too high and the productivity gain of that you'll miss out on by not doing this will be too high as well, that you, that your competition will just do it and they'll just have higher velocity.So, uh, and, and we, we see this a lot firsthand. So we, we build a series of agents internally that they can kind of have access to your full box account and go off and you give it a task and it can go find whatever information you're looking for and work with. And, you know, thank God for the model progress, but like, if, if you gave that task to an agent.Nine months ago, you're just gonna get lots of bogus answers because it's gonna, it's gonna say, Hey, here's, here are fi [00:20:00] five, you know, documents that all kind of smell like the right thing. And I'm gonna, but I, but you're, you're putting me on the clock. ‘cause my assistant prompt says like, you know, be pretty smart, but also try and respond to the user and it's gonna respond.And it's like, ah, it got the wrong document. And then you do that once or twice as a knowledge worker and you're just neverswyx: again,Aaron Levie: never again. You're just like done with the system.swyx: Yeah. It doesn't work.Aaron Levie: It doesn't work. And so, you know, Opus four six and Gemini three one Pro and you know, whatever the latest five 3G BT will be, like, those things are getting better and better and it's using better judgment.And this sort of like the, all of these updates to the agentic tool and search systems are, are, we're seeing, we're seeing very real progress where the agent. Kind of can, can almost smell some things a little bit fishy when it's getting, you know, we, we have this process where we, we have it go fan out, do a bunch of searches, pull up a bunch of data, and then it has to sort of do its own ranking of, you know, what are the right documents that, that it should be working with.And again, like, you know, the intelligence level of a model six months ago, [00:21:00] it'd be just throwing a dart at like, I'm just, I'm gonna grab these seven files and I, I pray, I hope that that's the right answer. And something like an opus first four five, and now four six is like, oh, it's like, no, that one doesn't seem right relative to this question because I'm seeing some signal that is making that, you know, that's contradicting the document where it would normally be in the tree and who should have access.Like it's doing all of that kind of work for you. But like, it still doesn't work if you just have a total wasteland of data. Like, it's just not, it's just not possible. Partly ‘cause a human wouldn't even be able to do it. So basically if a, if a really, really smart human. Could not do that task in five or 10 minutes for a search retrieval type task.Look, you know, your agent's not gonna be able to do it any better. You see this all day long. SoContext Engineering and Search Limitsswyx: this touches on a thing that just passionate about it was just context engineering. I, I'm just gonna let you ramble or riff on, on context engineering. If, if, if there's anything like he, he did really good work on context fraud, which has really taken over as like the term that people use and the referenceAaron Levie: a hundred percent.We, we all we think about is, is the context rob problem. [00:22:00]Jeff Huber: Yeah, there's certainly a lot of like ranking considerations. Gentech surgery think is incredibly promising. Um, yeah, I was trying to generate a question though. I think I have a question right now. Swyx.Aaron Levie: Yeah, no, but like, like I think there was this moment, um, you know, like, I don't know, two years ago before, before we knew like where the, the gotchas were gonna be in ai and I think someone was like, was like, well, infinite context windows will just solve all of these problems and ‘cause you'll just, you'll just give the context window like all the data and.It's just like, okay, I mean, maybe in 2035, like this is a viable solution. First of all, it, it would just, it would just simply cost too much. Like we just can't give the model like the 5,000 documents that might be relevant and it's gonna read them all. And I've seen enough to, to start believing in crazy stuff.So like, I'm willing to just say, sure. Like in, in 10 years from now,swyx: never say, never, never.Aaron Levie: In, in 10 years from now, we'll have infinite context windows at, at a thousandth of the price of today. Like, let's just like believe that that's possible, but Right. We're in reality today. So today we have a context engineering [00:23:00] problem, which is, I got, I got, you know, 200,000 tokens that I can work with, or prob, I don't even know what the latest graph is before, like massive degradation.16. Okay. I have 60,000 tokens that I get to work with where I'm gonna get accurate information. That's not a lot of tokens for a corpus of 10 million documents that a knowledge worker might have across all of the teams and all the projects and all the people they work with. I have, I have 10 million documents.Which, you know, maybe is times five pages per document or something like that. I'm at 50 million pages of information and I have 60,000 tokens. Like, holy s**t. Yeah. This is like, how do I bridge the 50 million pages of information with, you know, the couple hundred that I get to work with in that, in that token window.Yeah. This is like, this is like such an interesting problem and that's why actually so much work is actually like, just like search systems and the databases and that layer has to just get so locked in, but models getting better and importantly [00:24:00] knowing when they've done a search, they found the wrong thing, they go back, they check their work, they, they find a way to balance sort of appeasing the user versus double checking.We have this one, we have this one test case where we ask the agent to go find. 10 pieces of information.swyx: Is this the complex work eval?Aaron Levie: Uh, this is actually not in the eval. This is, this is sort of just like we have a bunch of different, we have a bunch of internal benchmark kind of scenarios. Every time we, we update our agent, we have one, which is, I ask it to find all of our office addresses, and I give it the list of 10 offices that we have.And there's not one document that has this, maybe there should be, that would be a great example of the kind of thing that like maybe over time companies start to, you know, have these sort of like, what are the canonical, you know, kind of key areas of knowledge that we need to have. We don't seem to have this one document that says, here are all of our offices.We have a bunch of documents that have like, here's the New York office and whatever. So you task this agent and you, you get, you say, I need the addresses for these 10 offices. Okay. And by the way, if you do this on any, you know, [00:25:00] public chat model, the same outcome is gonna happen. But for a different kind of query, you give it, you say, I need these 10 addresses.How many times should the agent go and do its search before it decides whether or not, there's just no answer to this question. Often, and especially the, the, let's say lower tier models, it'll come back and it'll give you six of the 10 addresses. And it'll, and I'll just say I couldn't find the otherswyx: four.It, it doesn't know what It doesn't know. ItAaron Levie: doesn't know what It doesn't know. Yeah. So the model is just like, like when should it stop? When should it stop doing? Like should it, should it do that task for literally an hour and just keep cranking through? Maybe I actually made up an office location and it doesn't know that I made it up and I didn't even know that I made it up.Like, should it just keep, re should it read every single file in your entire box account until it, until it should exhaust every single piece of information.swyx: Expensive.Aaron Levie: These are the new problems that we have. So, you know, something like, let's say a new opus model is sort of like, okay, I'm gonna try these types of queries.I didn't get exactly what I wanted. I'm gonna try again. I'm gonna, at [00:26:00] some point I'm gonna stop searching. ‘cause I've determined that that no amount of searching is gonna solve this problem. I'm just not able to do it. And that judgment is like a really new thing that the model needs to be able to have.It's like, when should it give up on a task? ‘cause, ‘cause you just don't, it's a can't find the thing. That's the real world of knowledge, work problems. And this is the stuff that the coding agents don't have to deal with. Because they, it just doesn't like, like you're not usually asking it about, you're, you're always creating net new information coming right outta the model for the most part.Obviously it has to know about your code base and your specs and your documentation, but, but when you deploy an agent on all of your data that now you have all of these new problems that you're dealing withJeff Huber: our, uh, follow follow-up research to context ride is actually on a genetic search. Ah. Um, and we've like right, sort of stress tested like frontier models and their ability to search.Um, and they're not actually that good at searching. Right. Uh, so you're sort of highlighting this like explore, exploit.swyx: You're just say, Debbie, Donna say everything doesn't work. Like,Aaron Levie: well,Jeff Huber: somebody has to be,Aaron Levie: um, can I just throw out one more thing? Yeah. That is different from coding and, and the rest [00:27:00] of the knowledge work that I, I failed to mention.So one other kind of key point is, is that, you know, at the end of the day. Whether you believe we're in a slop apocalypse or, or whatever. At the end of the day, if you, if you build a working product at the end of, if you, if you've built a working solution that is ultimately what the customer is paying for, like whether I have a lot of slop, a little slop or whatever, I'm sure there's lots of code bases we could go into in enterprise software companies where it's like just crazy slop that humans did over a 20 year period, but the end customer just gets this little interface.They can, they can type into it, it does its thing. Knowledge work, uh, doesn't have that property. If I have an AI model, go generate a contract and I generate a contract 20 times and, you know, all 20 times it's just 3% different and like that I, that, that kind of lop introduces all new kinds of risk for my organization that the code version of that LOP didn't, didn't introduce.These are, and so like, so how do you constrain these models to just the part that you want [00:28:00] them to work on and just do the thing that you want them to do? And, and, you know, in engineering, we don't, you can't be disbarred as an engineer, but you could be disbarred as a lawyer. Like you can do the wrong medical thing In healthcare, you, there's no, there's no equivalent to that of engineering.Like, doswyx: you want there to be, because I've considered softwareJeff Huber: engineer. What's that? Civil engineering there is, right? NotAaron Levie: software civil engineer. Sure. Oh yeah, for sure. But like in any of our companies, you like, you know, you'll be forgiven if you took down the site and, and we, we will do a rollback and you'll, you'll be in a meeting, but you have not been disbarred as an engineer.We don't, we don't change your, you know, your computer science, uh, blameJeff Huber: degree, this postmortem.Aaron Levie: Yeah, exactly. Exactly. So, so, uh, now maybe we collectively as an industry need to figure out like, what are you liable for? Not legally, but like in a, in a management sense, uh, of these agents. All sorts of interesting problems that, that, that, uh, that have to come out.But in knowledge work, that's the real hostile environments that we're operating in. Hmm.swyx: I do think like, uh, a lot of the last year's, 2025 story was the rise of coding agents and I think [00:29:00] 2026 story is definitely knowledge work agents. Yes. A hundredAaron Levie: percent.swyx: Right. Like that would, and I think open claw core work are just the beginning.Yes. Like it's, the next one's gonna just gonna be absolute craziness.Aaron Levie: It it is. And, and, uh, and it's gonna be, I mean, again, like this is gonna be this, this wave where we, we are gonna try and bring as many of the practices from coding because that, that will clearly be the forefront, which is tell an agent to go do something and has an access to a set of resources.You need to be responsible for reviewing it at the end of the process. That to me is the, is the kind of template that I just think goes across knowledge, work and odd. Cowork is a great example. Open Closet's a great example. You can kind of, sort of see what Codex could become over time. These are some, some really interesting kind of platforms that are emerging.swyx: Okay. Um, I wanted to, we touched on evals a little bit. You had, you had the report that you're gonna go bring up and then I was gonna go into like, uh, boxes, evals, but uh, go ahead. Talk about your genetic search thing.Jeff Huber: Yeah. Mostly I think kinda a few of the insights. It's like number one frontier model is not good at search.Humans have this [00:30:00] natural explore, exploit trade off where we kinda understand like when to stop doing something. Also, humans are pretty good at like forgetting actually, and like pruning their own context, whereas agents are not, and actually an agent in their kind of context history, if they knew something was bad and they even, you could see in the trace the reason you trace, Hey, that probably wasn't a good idea.If it's still in the trace, still in the context, they'll still do it again. Uhhuh. Uh, and so like, I think pruning is also gonna be like, really, it's already becoming a thing, right? But like, letting self prune the con windowsswyx: be a big deal. Yeah. So, so don't leave the mistake. Don't leave the mistake in there.Cut out the mistake but tell it that you made a mistake in the past and so it doesn't repeat it.Jeff Huber: Yeah. But like cut it out so it doesn't get like distracted by it again. ‘cause really, you know, what is so, so it will repeat its mistake just because it's been, it's inswyx: theJeff Huber: context. It'sAaron Levie: in the context so much.That's a few shot example. Even if it, yeah.Jeff Huber: It's like oh thisAaron Levie: is a great thing to go try even ifJeff Huber: it didn't work.Aaron Levie: Yeah,Jeff Huber: exactly.Aaron Levie: SoJeff Huber: there's like a bunch of stuff there. JustAaron Levie: Groundhogs Day inside these models. Yeah. I'm gonna go keep doing the same wrongJeff Huber: thing. Covering sense. I feel like, you know, some creator analogy you're trying like fit a manifold in latent space, which kind is doing break program synthesis, which is kinda one we think about we're doing right.Like, you know, certain [00:31:00] facts might be like sort of overly pitting it. There are certain, you know, sec sectors of latent space and so like plug clean space. Yeah. And, uh, andswyx: so we have a bell, our editor as a bell every time you say that. SoJeff Huber: you have, you have to like remove those, likeswyx: you shoulda a gong like TPN or something.IfJeff Huber: we gong, you either remove those links to like kinda give it the freedom, kind of do what you need to do. So, but yeah. We'll, we'll release more soon. That'sAaron Levie: awesome.Jeff Huber: That'll, that'll be cool.swyx: We're a cerebral podcast that people listen to us and, and sort of think really deep. So yeah, we try to keep it subtle.Okay. We try to keep it.Aaron Levie: Okay, fine.Inside Agent Evalsswyx: Um, you, you guys do, you guys do have EVs, you talked about your, your office thing, but, uh, you've been also promoting APEX agents and complex work. Uh, yeah, whatever you, wherever you wanna take this just Yeah. How youAaron Levie: Apex is, is obviously me, core's, uh, uh, kind of, um, agent eval.We, we supported that by sort of. Opening up some data for them around how we kind of see these, um, data workspaces in, in the, you know, kind of regular economy. So how do lawyers have a workspace? How do investment bankers have a workspace? What kind of data goes into those? And so we, [00:32:00] we partner with them on their, their apex eval.Our own, um, eval is, it's actually relatively straightforward. We have a, a set of, of documents in a, in a range of industries. We give the agent previously did this as a one shot test of just purely the model. And then we just realized we, we need to, based on where everything's going, it's just gotta be more agentic.So now it's a bit more of a test of both our harness and the model. And we have a rubric of a set of things that has to get right and we score it. Um, and you're just seeing, you know, these incredible jumps in almost every single model in its own family of, you know, opus four, um, you know, sonnet four six versus sonnet four five.swyx: Yeah. We have this up on screen.Aaron Levie: Okay, cool. So some, you're seeing it somewhere like. I, I forget the to, it was like 15 point jump, I think on the main, on the overall,swyx: yes.Aaron Levie: And it's just like, you know, these incredible leaps that, that are starting to happen. Um,swyx: and OP doesn't know any, like any, it's completely held out from op.Aaron Levie: This is not in any, there's no public data which has, you know, Ben benefits and this is just a private eval that we [00:33:00] do, and then we just happen to show it to, to the world. Hmm. So you can't, you can't train against it. And I think it's just as representative of. It's obviously reasoning capabilities, what it's doing at, at, you know, kind of test time, compute capabilities, thinking levels, all like the context rot issues.So many interesting, you know, kind of, uh, uh, capabilities that are, that are now improvingswyx: one sector that you have. That's interesting.Industries and Datasetsswyx: Uh, people are roughly familiar with healthcare and legal, but you have public sector in there.Aaron Levie: Yeah.swyx: Uh, what's that? Like, what, what, what is that?Aaron Levie: Yeah, and, and we actually test against, I dunno, maybe 10 industries.We, we end up usually just cutting a few that we think have interesting gains. All extras, won a lot of like government type documents. Um,swyx: what is that? What is it? Government type documents?Aaron Levie: Government filings. Like a taxswyx: return, likeAaron Levie: a probably not tax returns. It would be more of what would go the government be using, uh, as data.So, okay. Um, so think about research that, that type of, of, of data sets. And then we have financial services for things like data rooms and what would be in an investment prospectus. Uhhuh,swyx: that one you can dog food.Aaron Levie: Yeah, exactly. Exactly. Yes. Yes. [00:34:00] So, uh, so we, we run the models, um, in now, you know, more of an agent mode, but, but still with, with kinda limited capacity and just try and see like on a, like, for like basis, what are the improvements?And, and again, we just continue to be blown away by. How, how good these models are getting.swyx: Yeah, I mean, I think every serious AI company needs something like that where like, well, this is the work we do. Here's our company eval. Yeah. And if you don't have it, well, you're not a serious AI company.Aaron Levie: There's two dimensions, right?So there's, there's like, how are the models improving? And so which models should you either recommend a customer use, which one should you adopt? But then every single day, we're making changes to our agents. And you need to knowswyx: if you regressed,Aaron Levie: if you know. Yeah. You know, I've been fully convinced that the whole agent observability and eval space is gonna be a massive space.Um, super excited for what Braintrust is doing, excited for, you know, Lang Smith, all the things. And I think what you're going to, I mean, this is like every enter like literally every enterprise right now. It's like the AI companies are the customers of these tools. Every enterprise will have this. Yeah, you'll just [00:35:00] have to have an eval.Of all of your work and like, we'll, you'll have an eval of your RFP generation, you'll have an eval of your sales material creation. You'll have an eval of your, uh, invoice processing. And, and as you, you know, buy or use new agentic systems, you are gonna need to know like, what's the quality of your, of your pipeline.swyx: Yeah.Aaron Levie: Um, so huge, huge market with agent evals.swyx: Yeah.Building the Agent Teamswyx: And, and you know, I'm gonna shout out your, your team a bit, uh, your CTO, Ben, uh, did a great talk with us last year. Awesome. And he's gonna come back again. Oh, cool. For World's Fair.Aaron Levie: Yep.swyx: Just talk about your team, like brag a little bit. I think I, I think people take these eval numbers in pretty charts for granted, but No, there, I mean, there's, there's lots of really smart people at work during all this.Aaron Levie: Biggest shout out, uh, is we have a, we have a couple folks at Dya, uh, Sidarth, uh, that, that kind of run this. They're like a, you know, kind of tag tag team duo on our evals, Ben, our CTO, heavily involved Yasha, head of ai, uh, you know, a bunch of folks. And, um, evals is one part of the story. And then just like the full, you know, kind of AI.An agent team [00:36:00] is, uh, is a, is a pretty, you know, is core to this whole effort. So there's probably, I don't know, like maybe a few dozen people that are like the epicenter. And then you just have like layers and layers of, of kind of concentric circles of okay, then there's a search team that supports them and an infrastructure team that supports them.And it's starting to ripple through the entire company. But there's that kind of core agent team, um, that's a pretty, pretty close, uh, close knit group.swyx: The search team is separate from the infra team.Aaron Levie: I mean, we have like every, every layer of the stack we have to kind of do, except for just pure public cloud.Um, but um, you know, we, we store, I don't even know what our public numbers are in, you know, but like, you can just think about it as like a lot of data is, is stored in box. And so we have, and you have every layer of the, of the stack of, you know, how do you manage the data, the file system, the metadata system, the search system, just all of those components.And then they all are having to understand that now you've got this new customer. Which is the agent, and they've been building for two types of customers in the past. They've been building for users and they've been building for like applications. [00:37:00] And now you've got this new agent user, and it comes in with a difference of it, of property sometimes, like, hey, maybe sometimes we should do embeddings, an embedding based, you know, kind of search versus, you know, your, your typical semantic search.Like, it's just like you have to build the, the capabilities to support all of this. And we're testing stuff, throwing things away, something doesn't work and, and not relevant. It's like just, you know, total chaos. But all of those teams are supporting the agent team that is kind of coming up with its requirements of what, what do we need?swyx: Yeah. No, uh, we just came from, uh, fireside chat where you did, and you, you talked about how you're doing this. It's, it's kind of like an internal startup. Yeah. Within the broader company. The broader company's like 3000 people. Yeah. But you know, there's, there's a, this is a core team of like, well, here's the innovation center.Aaron Levie: Yeah.swyx: And like that every company kind of is run this way.Aaron Levie: Yeah. I wanna be sensitive. I don't call it the innovation center. Yeah. Only because I think everybody has to do innovation. Um, there, there's a part of the, the, the company that is, is sort of do or die for the agent wave.swyx: Yeah.Aaron Levie: And it only happens to be more of my focus simply because it's existential that [00:38:00] we get it right.swyx: Yeah.Aaron Levie: All of the supporting systems are necessary. All of the surrounding adjacent capabilities are necessary. Like the only reason we get to be a platform where you'd run an agent is because we have a security feature or a compliance feature, or a governance feature that, that some team is working on.But that's not gonna be the make or break of, of whether we get agents right. Like that already exists and we need to keep innovating there. I don't know what the right, exact precise number is, but it's not a thousand people and it's not 10 people. There's a number of people that are like the, the kind of like, you know, startup within the company that are the make or break on everything related to AI agents, you know, leveraging our platform and letting you work with your data.And that's where I spend a lot of my time, and Ben and Yosh and Diego and Teri, you know, these are just, you know, people that, that, you know, kind of across the team. Are working.swyx: Yeah. Amazing.Read Write Agent WorkflowsJeff Huber: How do you, how do you think about, I mean, you talked a lot about like kinda read workflows over your box data. Yep.Right. You know, gen search questions, queries, et cetera. But like, what about like, write or like authoring workflows?Aaron Levie: Yes. I've [00:39:00] already probably revealed too much actually now that I think about it. So, um, I've talked about whatever,Jeff Huber: whatever you can.Aaron Levie: Okay. It's just us. It's just us. Yeah. Okay. Of course, of course.So I, I guess I would just, uh, I'll make it a little bit conceptual, uh, because again, I've already, I've already said things that are not even ga but, but we've, we've kinda like danced around it publicly, so I, yeah, yeah. Okay. Just like, hopefully nobody watches this, um, episode. No.swyx: It's tidbits for the Heidi engaged to go figure out like what exactly, um, you know, is, is your sort of line of thinking.Sure. They can connect the dots.Aaron Levie: Yeah. So, so I would say that, that, uh, we, you know, as a, as a place where you have your enterprise content, there's a use case where I want to, you know, have an agent read that data and answer questions for me. And then there's a use case where I want the agent to create something.And use the file system to create something or store off data that it's working on, or be able to have, you know, various files that it's writing to about the work it's doing. So we do see it as a total read write. The harder problem has so far been the read only because, because again, you have that kind of like 10 [00:40:00] million to one ratio problem, whereas rights are a lot of, that's just gonna come from the model and, and we just like, we'll just put it in the file system and kinda use it.So it's a little bit of a technically easier problem, but the only part that's like, not necessarily technically hard, it is just like it's not yet perfected in the state of the ecosystem is, you know, building a beautiful PowerPoint presentation. It's still a hard problem for these models. Like, like we still, you know, like, like these formats are just, we're not built for.They'reswyx: working on it.Aaron Levie: They're, they're working on it. Everybody's working on it.swyx: Every launch is like, well, we do PowerPoint now.Aaron Levie: We're getting, yeah, getting a lot, getting a lot of better each time. But then you'll do this thing where you'll ask the update one slide and all of a sudden, like the fonts will be just like a little bit different, you know, on two of the slides, or it moved, you know, some shape over to the left a little bit.And again, these are the kind of things that, like in code, obviously you could really care about if you really care about, you know, how beautiful is the code, but at the end, user doesn't notice all those problems and file creation, the end user instantly sees it. You're [00:41:00] like, ah, like paragraph three, like, you literally just changed the font on me.Like it's a totally different font and like midway through the document. Mm-hmm. Those are the kind of things that you run into a lot of in the, in the content creation side. So, mm-hmm. We are gonna have native agents. That do all of those things, they'll be powered by the leading kind of models and labs.But the thing that I think is, is probably gonna be a much bigger idea over time is any agent on any system, again, using Box as a file system for its work, and in that kind of scenario, we don't necessarily care what it's putting in the file system. It could put its memory files, it could put its, you know, specification, you know, documents.It could put, you know, whatever its markdown files are, or it could, you know, generate PDFs. It's just like, it's a workspace that is, is sort of sandboxed off for its work. People can collaborate into it, it can share with other people. And, and so we, we were thinking a lot about what's the right, you know, kind of way to, to deliver that at scale.Docs Graphs and Founder Modeswyx: I wanted to come into sort of the sort of AI transformation or AI sort of, uh, operations things. [00:42:00] Um, one of the tweets that you, that you wanted to talk about, this is just me going through your tweets, by the way. Oh, okay. I mean, like, this is, you readAaron Levie: one by one,swyx: you're the, you're the easiest guest to prep for because you, you already have like, this is the, this is what I'm interested in.I'm like, okay, well, areAaron Levie: we gonna get to like, like February, January or something? Where are we in the, in the timelines? How far back are we going?swyx: Can you, can you describe boxes? A set of skills? Right? Like that, that's like, that's like one of the extremes of like, well if you, you just turn everything into a markdown file.Yeah. Then your agent can run your company. Uh, like you just have to write, find the right sequence of words toAaron Levie: Yes.swyx: To do it.Aaron Levie: Sorry, isthatswyx: the question? So I think the question is like, what if we documented everything? Yes. The way that you exactly said like,Aaron Levie: yes.swyx: Um, let's get all the Fortune five hundreds, uh, prepared for agents.Yes. And like, you know, everything's in golden and, and nicely filed away and everything. Yes. What's missing? Like, what's left, right? LikeAaron Levie: Yeah.swyx: You've, you've run your company for a decade. LikeAaron Levie: Yeah. I think the challenge is that, that that information changes a week later. And because something happened in the market for that [00:43:00] customer, or us as a company that now has to go get updated, and so these systems are living and breathing and they have to experience reality and updates to reality, which right now is probably gonna be humans, you know, kinda giving those, giving them the updates.And, you know, there is this piece about context graphs as as, uh, that kinda went very viral. Yeah. And I, I, I was like a, i, I, I thought it was super provocative. I agreed with many parts of it. I disagree with a few parts around. You know, it's not gonna be as easy as as just if we just had the agent traces, then we can finally do that work because there's just like, there's so much more other stuff that that's happening that, that we haven't been able to capture and digitize.And I think they actually represented that in the piece to be clear. But like there's just a lot of work, you know, that that has to, you just can't have only skills files, you know, for your company because it's just gonna be like, there's gonna be a lot of other stuff that happens. Yeah. Change over time.Yeah. Most companies are practically apprenticeships.swyx: Most companies are practically apprenticeships. LikeJeff Huber: every new employee who joins the team, [00:44:00] like you span one to three months. Like ramping them up.Aaron Levie: Yes. AllJeff Huber: that tat knowledgeAaron Levie: isJeff Huber: not written down.Aaron Levie: Yes.Jeff Huber: But like, it would have to be if you wanted to like give it to an Asian.Right. And so like that seems to me like to beAaron Levie: one is I think you're gonna see again a premium on companies that can document this. Mm-hmm. Much. There'll be a huge premium on that because, because you know, can you shorten that three month ramp cycle to a two week ramp cycle? That's an instant productivity gain.Can you re dramatically reduce rework in the organization because you've documented where all the stuff is and where the answers are. Can you make your average employee as good as your 90th percentile employee because you've captured the knowledge that's sort of in the heads of, of those top employees and make that available.So like you can see some very clear productivity benefits. Mm-hmm. If you had a company culture of making sure you know your information was captured, digitized, put in a format that was agent ready and then made available to agents to work with, and then you just, again, have this reality of like add a 10,000 person [00:45:00] company.Mapping that to the, you know, access structure of the company is just a hard problem. Is like, is like, yeah, well, you just, not every piece of information that's digitized can be shared to everybody. And so now you have to organize that in a way that actually works. There was a pretty good piece, um, this, this, uh, this piece called your company as a file is a file system.I, did you see that one?swyx: Nope.Aaron Levie: Uh, yes. You saw it. Yeah. And, and, uh, I actually be curious your thoughts on it. Um, like, like an interesting kind of like, we, we agree with it because, because that's how we see the world and, uh,swyx: okay. We, we have it up on screen. Oh,Aaron Levie: okay. Yeah. But, but it's all about basically like, you know, we've already, we, we, we already organized in this kind of like, you know, permission structure way.Uh, and, and these are the kind of, you know, natural ways that, that agents can now work with data. So it's kind of like this, this, you know, kind of interesting metaphor, but I do think companies will have to start to think about how they start to digitize more, more of that data. What was your take?Jeff Huber: Yeah, I mean, like the company's probably like an acid compliant file system.Aaron Levie: Uh,Jeff Huber: yeah. Which I'm guessing boxes, right? So, yeah. Yes.swyx: Yeah. [00:46:00]Jeff Huber: Which you have a great piece on, but,swyx: uh, yeah. Well, uh, I, I, my, my, my direction is a little bit like, I wanna rewind a little bit to the graph word you said that there, that's a magic trigger word for us. I always ask what's your take on knowledge graphs?Yeah. Uh, ‘cause every, especially at every data database person, I just wanna see what they think. There's been knowledge graphs, hype cycles, and you've seen it all. So.Aaron Levie: Hmm. I actually am not the expert in knowledge graphs, so, so that you might need toswyx: research, you don't need to be an expert. Yeah. I think it's just like, well, how, how seriously do people take it?Yeah. Like, is is, is there a lot of potential in the, in the HOVI?Aaron Levie: Uh, well, can I, can I, uh, understand first if it's, um, is this a loaded question in the sense of are you super pro, super con, super anti medium? Iswyx: see pro, I see pros and cons. Okay. Uh, but I, I think your opinion should be independent of mine.Aaron Levie: Yeah. No, no, totally. Yeah. I just want to see what I'm stepping into.swyx: No, I know. It's a, and it's a huge trigger word for a lot of people out Yeah. In our audience. And they're, they're trying to figure out why is that? Because whyAaron Levie: is this such aswyx: hot item for them? Because a lot of people get graph religion.And they're like, everything's a graph. Of course you have to represent it as a graph. Well, [00:47:00] how do you solve your knowledge? Um, changing over time? Well, it's a graph.Aaron Levie: Yeah.swyx: And, and I think there, there's that line of work and then there's, there's a lot of people who are like, well, you don't need it. And both are right.Aaron Levie: Yeah. And what do the people who say you don't need it, what are theyswyx: arguing for Mark down files. Oh, sure, sure. Simplicity.Aaron Levie: Yeah.swyx: Versus it's, it's structure versus less structure. Right. That's, that's all what it is. I do.Aaron Levie: I think the tricky thing is, um, is, is again, when this gets met with real humans, they're just going to their computer.They're just working with some people on Slack or teams. They're just sharing some data through a collaborative file system and Google Docs or Box or whatever. I certainly like the vision of most, most knowledge graph, you know, kind of futuristic kind of ways of thinking about it. Uh, it's just like, you know, it's 2026.We haven't seen it yet. Kind of play out as as, I mean, I remember. Do you remember the, um, in like, actually I don't, I don't even know how old you guys are, but I'll for, for to show my age. I remember 17 years ago, everybody thought enterprises would just run on [00:48:00] Wikis. Yeah. And, uh, confluence and, and not even, I mean, confluence actually took off for engineering for sure.Like unquestionably. But like, this was like everything would be in the w. And I think based on our, uh, our, uh, general style of, of, of what we were building, like we were just like, I don't know, people just like wanna workspace. They're gonna collaborate with other people.swyx: Exactly. Yeah. So you were, you were anti-knowledge graph.Aaron Levie: Not anti, not anti. Soswyx: not nonAaron Levie: I'm not, I'm not anti. ‘cause I think, I think your search system, I just think these are two systems that probably, but like, I'm, I'm not in any religious war. I don't want to be in anybody's YouTube comments on this. There's not a fight for me.swyx: We, we love YouTube comments. We're, we're, we're get into comments.Aaron Levie: Okay. Uh, but like, but I, I, it's mostly just a virtue of what we built. Yeah. And we just continued down that path. Yeah.swyx: Yeah.Aaron Levie: And, um, and that, that was what we pursued. But I'm not, this is not a, you know, kind of, this is not a, uh, it'sswyx: not existential for you. Great.Aaron Levie: We're happy to plug into somebody else's graph.We're happy to feed data into it. We're happy for [00:49:00] agents to, to talk to multiple systems. Not, not our fight.swyx: Yeah.Aaron Levie: But I need your answer. Yeah. Graphs or nerd Snipes is very effective nerd.swyx: See this is, this is one, one opinion and then I've,Jeff Huber: and I think that the actual graph structure is emergent in the mind of the agent.Ah, in the same way it is in the mind of the human. And that's a more powerful graph ‘cause it actually involved over time.swyx: So don't tell me how to graph. I'll, I'll figure it out myself. Exactly. Okay. All right. AndJeff Huber: what's yours?swyx: I like the, the Wiki approach. Uh, my, I'm actually

The OMFIF Podcast
Unpacking sandboxes

The OMFIF Podcast

Play Episode Listen Later Feb 20, 2026 38:38


As the new generation of financial infrastructure takes shape, the need for sophisticated tools to inform policy-makers is growing. To encourage innovation and preserve security, it is vital that policy-makers have the capacity to interact with and closely supervise new systems. Sandboxes are a key tool for this process. Carmelle Cadet, chief executive officer and founder of EmTech joins OMFIF to discuss the nuances of how sandboxes work and how they should be constructed.

GREY Journal Daily News Podcast
Can Regulatory Sandboxes Propel AI Innovation Without Stifling Creativity?

GREY Journal Daily News Podcast

Play Episode Listen Later Feb 16, 2026 2:36


Regulatory sandboxes allow startups to experiment with AI technologies in a controlled setting, fostering innovation while maintaining oversight. Joseph Joshy from IFSCA highlights their importance at the India AI Impact Summit 2026, noting they help balance innovation with risk management. Industry leaders discuss the need for India-specific AI protocols and emphasize the role of digital infrastructure in AI advancement.Learn more on this news by visiting us at: https://greyjournal.net/news/ Hosted on Acast. See acast.com/privacy for more information.

TRM Talks
EP. 104 | Pioneering Payments: Turning Sandboxes into Infrastructure with UOB's Kah Kit

TRM Talks

Play Episode Listen Later Feb 11, 2026 31:57


Kah Kit Yip has spent his career turning bold ideas about the future of money into working infrastructure.After nearly two decades at Malaysia's central bank and leading breakthrough initiatives at the Bank for International Settlements, he helped pioneer new models for instant cross-border payments, programmable compliance, and faster, safer FX settlement. Now, as Head of Blockchain and Digital Assets at UOB, he's bringing those concepts out of policy sandboxes and into production across Southeast Asia.In this conversation, Ari and Kah Kit explore how interoperable payment rails, wholesale CBDCs, and tokenized assets can move value across borders in real time — while meeting the highest standards for compliance and risk management. They discuss why legal clarity and shared standards are essential for adoption, how institutions can match the speed of digital payments with equally fast safeguards, and what it takes for regulators and industry to build global financial infrastructure together.Along the way, Kah Kit reflects on marathon running, crime novels, and the grit required to solve complex, long-horizon challenges in finance and technology.

The John Batchelor Show
S8 Ep235: REGULATING ARTIFICIAL INTELLIGENCE Colleague Kevin Frazier. Kevin Frazier continues, warning against a "waterfall of regulation" by states and advocating for "regulatory sandboxes" to allow experimentation. NUMBER 8

The John Batchelor Show

Play Episode Listen Later Dec 24, 2025 8:50


REGULATING ARTIFICIAL INTELLIGENCE Colleague Kevin Frazier. Kevin Frazier continues, warning against a "waterfall of regulation" by states and advocating for "regulatory sandboxes" to allow experimentation. NUMBER 8 NOVEMBER 1955

Slow Burn
Decoder Ring | Mailbag: Yo-Yos, Sandboxes, and Encores

Slow Burn

Play Episode Listen Later Dec 17, 2025 66:41


Decoder Ring listeners write in with some excellent mysteries, and for our last episode of the year we're solving three of them. Why do children play in boxes full of sand? Why do rock bands pretend like the show is over when everybody knows they're coming back for an encore? And what was up with those school assemblies where you'd get to skip class to learn about…yo-yos? The voices you'll hear in this episode include yo-yo masters ”Dazzling Dave” Schulte and Dale Oliver, children's book author Rob Peñas, Pulitzer Prize-winning design critic Alexandra Lange, and music journalists Brian Wise, Michael Walker, and Travis Andrews.  You can find all the music from the segment about encores in this YouTube playlist. This episode was produced by Max Freedman, Katie Shepherd, and Evan Chung, Decoder Ring's supervising producer. Merritt Jacob is Senior Technical Director. We had additional production from Joel Meyer. If you have any cultural mysteries you want us to decode, email us at DecoderRing@slate.com or leave a message on our hotline at (347) 460-7281. Get more of Decoder Ring with Slate Plus! Join for exclusive bonus episodes of Decoder Ring and ad-free listening on all your favorite Slate podcasts. Subscribe from the Decoder Ring show page on Apple Podcasts or Spotify. Or, visit slate.com/decoderplus for access wherever you listen. Learn more about your ad choices. Visit megaphone.fm/adchoices

spotify pulitzer prize slate encores michael walker decoder ring sandboxes alexandra lange slate plus senior technical director joel meyer katie shepherd merritt jacob brian wise evan chung
Slow Burn
Decoder Ring | Mailbag: Yo-Yos, Sandboxes, and Encores

Slow Burn

Play Episode Listen Later Dec 17, 2025 62:11


Decoder Ring listeners write in with some excellent mysteries, and for our last episode of the year we're solving three of them. Why do children play in boxes full of sand? Why do rock bands pretend like the show is over when everybody knows they're coming back for an encore? And what was up with those school assemblies where you'd get to skip class to learn about…yo-yos?The voices you'll hear in this episode include yo-yo masters ”Dazzling Dave” Schulte and Dale Oliver, children's book author Rob Peñas, Pulitzer Prize-winning design critic Alexandra Lange, and music journalists Brian Wise, Michael Walker, and Travis Andrews. You can find all the music from the segment about encores in this YouTube playlist.This episode was produced by Max Freedman, Katie Shepherd, and Evan Chung, Decoder Ring's supervising producer. Merritt Jacob is Senior Technical Director. We had additional production from Joel Meyer.If you have any cultural mysteries you want us to decode, email us at DecoderRing@slate.com or leave a message on our hotline at (347) 460-7281.Get more of Decoder Ring with Slate Plus! Join for exclusive bonus episodes of Decoder Ring and ad-free listening on all your favorite Slate podcasts. Subscribe from the Decoder Ring show page on Apple Podcasts or Spotify. Or, visit slate.com/decoderplus for access wherever you listen. Hosted on Acast. See acast.com/privacy for more information.

spotify acast pulitzer prize slate encores michael walker decoder ring sandboxes alexandra lange slate plus senior technical director joel meyer katie shepherd merritt jacob brian wise evan chung
Decoder Ring
Mailbag: Yo-Yos, Sandboxes, and Encores

Decoder Ring

Play Episode Listen Later Dec 17, 2025 62:11


Decoder Ring listeners write in with some excellent mysteries, and for our last episode of the year we're solving three of them. Why do children play in boxes full of sand? Why do rock bands pretend like the show is over when everybody knows they're coming back for an encore? And what was up with those school assemblies where you'd get to skip class to learn about…yo-yos?The voices you'll hear in this episode include yo-yo masters ”Dazzling Dave” Schulte and Dale Oliver, children's book author Rob Peñas, Pulitzer Prize-winning design critic Alexandra Lange, and music journalists Brian Wise, Michael Walker, and Travis Andrews. You can find all the music from the segment about encores in this YouTube playlist.This episode was produced by Max Freedman, Katie Shepherd, and Evan Chung, Decoder Ring's supervising producer. Merritt Jacob is Senior Technical Director. We had additional production from Joel Meyer.If you have any cultural mysteries you want us to decode, email us at DecoderRing@slate.com or leave a message on our hotline at (347) 460-7281.Get more of Decoder Ring with Slate Plus! Join for exclusive bonus episodes of Decoder Ring and ad-free listening on all your favorite Slate podcasts. Subscribe from the Decoder Ring show page on Apple Podcasts or Spotify. Or, visit slate.com/decoderplus for access wherever you listen. Hosted on Acast. See acast.com/privacy for more information.

spotify acast pulitzer prize slate encores michael walker decoder ring sandboxes alexandra lange slate plus senior technical director joel meyer katie shepherd merritt jacob brian wise evan chung
Decoder Ring
Mailbag: Yo-Yos, Sandboxes, and Encores

Decoder Ring

Play Episode Listen Later Dec 17, 2025 66:41


Decoder Ring listeners write in with some excellent mysteries, and for our last episode of the year we're solving three of them. Why do children play in boxes full of sand? Why do rock bands pretend like the show is over when everybody knows they're coming back for an encore? And what was up with those school assemblies where you'd get to skip class to learn about…yo-yos? The voices you'll hear in this episode include yo-yo masters ”Dazzling Dave” Schulte and Dale Oliver, children's book author Rob Peñas, Pulitzer Prize-winning design critic Alexandra Lange, and music journalists Brian Wise, Michael Walker, and Travis Andrews.  You can find all the music from the segment about encores in this YouTube playlist. This episode was produced by Max Freedman, Katie Shepherd, and Evan Chung, Decoder Ring's supervising producer. Merritt Jacob is Senior Technical Director. We had additional production from Joel Meyer. If you have any cultural mysteries you want us to decode, email us at DecoderRing@slate.com or leave a message on our hotline at (347) 460-7281. Get more of Decoder Ring with Slate Plus! Join for exclusive bonus episodes of Decoder Ring and ad-free listening on all your favorite Slate podcasts. Subscribe from the Decoder Ring show page on Apple Podcasts or Spotify. Or, visit slate.com/decoderplus for access wherever you listen. Learn more about your ad choices. Visit megaphone.fm/adchoices

spotify pulitzer prize slate encores michael walker decoder ring sandboxes alexandra lange slate plus senior technical director joel meyer katie shepherd merritt jacob brian wise evan chung
Slate Culture
Mailbag: Yo-Yos, Sandboxes, and Encores

Slate Culture

Play Episode Listen Later Dec 17, 2025 62:11


Decoder Ring listeners write in with some excellent mysteries, and for our last episode of the year we're solving three of them. Why do children play in boxes full of sand? Why do rock bands pretend like the show is over when everybody knows they're coming back for an encore? And what was up with those school assemblies where you'd get to skip class to learn about…yo-yos?The voices you'll hear in this episode include yo-yo masters ”Dazzling Dave” Schulte and Dale Oliver, children's book author Rob Peñas, Pulitzer Prize-winning design critic Alexandra Lange, and music journalists Brian Wise, Michael Walker, and Travis Andrews. You can find all the music from the segment about encores in this YouTube playlist.This episode was produced by Max Freedman, Katie Shepherd, and Evan Chung, Decoder Ring's supervising producer. Merritt Jacob is Senior Technical Director. We had additional production from Joel Meyer.If you have any cultural mysteries you want us to decode, email us at DecoderRing@slate.com or leave a message on our hotline at (347) 460-7281.Get more of Decoder Ring with Slate Plus! Join for exclusive bonus episodes of Decoder Ring and ad-free listening on all your favorite Slate podcasts. Subscribe from the Decoder Ring show page on Apple Podcasts or Spotify. Or, visit slate.com/decoderplus for access wherever you listen. Hosted on Acast. See acast.com/privacy for more information.

spotify acast pulitzer prize slate encores michael walker decoder ring sandboxes alexandra lange slate plus senior technical director joel meyer katie shepherd merritt jacob brian wise evan chung
Slate Culture
Decoder Ring | Mailbag: Yo-Yos, Sandboxes, and Encores

Slate Culture

Play Episode Listen Later Dec 17, 2025 66:41


Decoder Ring listeners write in with some excellent mysteries, and for our last episode of the year we're solving three of them. Why do children play in boxes full of sand? Why do rock bands pretend like the show is over when everybody knows they're coming back for an encore? And what was up with those school assemblies where you'd get to skip class to learn about…yo-yos? The voices you'll hear in this episode include yo-yo masters ”Dazzling Dave” Schulte and Dale Oliver, children's book author Rob Peñas, Pulitzer Prize-winning design critic Alexandra Lange, and music journalists Brian Wise, Michael Walker, and Travis Andrews.  You can find all the music from the segment about encores in this YouTube playlist. This episode was produced by Max Freedman, Katie Shepherd, and Evan Chung, Decoder Ring's supervising producer. Merritt Jacob is Senior Technical Director. We had additional production from Joel Meyer. If you have any cultural mysteries you want us to decode, email us at DecoderRing@slate.com or leave a message on our hotline at (347) 460-7281. Get more of Decoder Ring with Slate Plus! Join for exclusive bonus episodes of Decoder Ring and ad-free listening on all your favorite Slate podcasts. Subscribe from the Decoder Ring show page on Apple Podcasts or Spotify. Or, visit slate.com/decoderplus for access wherever you listen. Learn more about your ad choices. Visit megaphone.fm/adchoices

spotify pulitzer prize slate encores michael walker decoder ring sandboxes alexandra lange slate plus senior technical director joel meyer katie shepherd merritt jacob brian wise evan chung
Slate Daily Feed
Slow Burn - Decoder Ring | Mailbag: Yo-Yos, Sandboxes, and Encores

Slate Daily Feed

Play Episode Listen Later Dec 17, 2025 62:11


Decoder Ring listeners write in with some excellent mysteries, and for our last episode of the year we're solving three of them. Why do children play in boxes full of sand? Why do rock bands pretend like the show is over when everybody knows they're coming back for an encore? And what was up with those school assemblies where you'd get to skip class to learn about…yo-yos?The voices you'll hear in this episode include yo-yo masters ”Dazzling Dave” Schulte and Dale Oliver, children's book author Rob Peñas, Pulitzer Prize-winning design critic Alexandra Lange, and music journalists Brian Wise, Michael Walker, and Travis Andrews. You can find all the music from the segment about encores in this YouTube playlist.This episode was produced by Max Freedman, Katie Shepherd, and Evan Chung, Decoder Ring's supervising producer. Merritt Jacob is Senior Technical Director. We had additional production from Joel Meyer.If you have any cultural mysteries you want us to decode, email us at DecoderRing@slate.com or leave a message on our hotline at (347) 460-7281.Get more of Decoder Ring with Slate Plus! Join for exclusive bonus episodes of Decoder Ring and ad-free listening on all your favorite Slate podcasts. Subscribe from the Decoder Ring show page on Apple Podcasts or Spotify. Or, visit slate.com/decoderplus for access wherever you listen. Hosted on Acast. See acast.com/privacy for more information.

spotify acast pulitzer prize slate slow burn encores michael walker decoder ring sandboxes alexandra lange slate plus senior technical director joel meyer katie shepherd merritt jacob brian wise evan chung
Slate Daily Feed
Mailbag: Yo-Yos, Sandboxes, and Encores

Slate Daily Feed

Play Episode Listen Later Dec 17, 2025 62:11


Decoder Ring listeners write in with some excellent mysteries, and for our last episode of the year we're solving three of them. Why do children play in boxes full of sand? Why do rock bands pretend like the show is over when everybody knows they're coming back for an encore? And what was up with those school assemblies where you'd get to skip class to learn about…yo-yos?The voices you'll hear in this episode include yo-yo masters ”Dazzling Dave” Schulte and Dale Oliver, children's book author Rob Peñas, Pulitzer Prize-winning design critic Alexandra Lange, and music journalists Brian Wise, Michael Walker, and Travis Andrews. You can find all the music from the segment about encores in this YouTube playlist.This episode was produced by Max Freedman, Katie Shepherd, and Evan Chung, Decoder Ring's supervising producer. Merritt Jacob is Senior Technical Director. We had additional production from Joel Meyer.If you have any cultural mysteries you want us to decode, email us at DecoderRing@slate.com or leave a message on our hotline at (347) 460-7281.Get more of Decoder Ring with Slate Plus! Join for exclusive bonus episodes of Decoder Ring and ad-free listening on all your favorite Slate podcasts. Subscribe from the Decoder Ring show page on Apple Podcasts or Spotify. Or, visit slate.com/decoderplus for access wherever you listen. Hosted on Acast. See acast.com/privacy for more information.

spotify acast pulitzer prize slate encores michael walker decoder ring sandboxes alexandra lange slate plus senior technical director joel meyer katie shepherd merritt jacob brian wise evan chung
Slate Daily Feed
Decoder Ring | Mailbag: Yo-Yos, Sandboxes, and Encores

Slate Daily Feed

Play Episode Listen Later Dec 17, 2025 66:41


Decoder Ring listeners write in with some excellent mysteries, and for our last episode of the year we're solving three of them. Why do children play in boxes full of sand? Why do rock bands pretend like the show is over when everybody knows they're coming back for an encore? And what was up with those school assemblies where you'd get to skip class to learn about…yo-yos? The voices you'll hear in this episode include yo-yo masters ”Dazzling Dave” Schulte and Dale Oliver, children's book author Rob Peñas, Pulitzer Prize-winning design critic Alexandra Lange, and music journalists Brian Wise, Michael Walker, and Travis Andrews.  You can find all the music from the segment about encores in this YouTube playlist. This episode was produced by Max Freedman, Katie Shepherd, and Evan Chung, Decoder Ring's supervising producer. Merritt Jacob is Senior Technical Director. We had additional production from Joel Meyer. If you have any cultural mysteries you want us to decode, email us at DecoderRing@slate.com or leave a message on our hotline at (347) 460-7281. Get more of Decoder Ring with Slate Plus! Join for exclusive bonus episodes of Decoder Ring and ad-free listening on all your favorite Slate podcasts. Subscribe from the Decoder Ring show page on Apple Podcasts or Spotify. Or, visit slate.com/decoderplus for access wherever you listen. Learn more about your ad choices. Visit megaphone.fm/adchoices

spotify pulitzer prize slate encores michael walker decoder ring sandboxes alexandra lange slate plus senior technical director joel meyer katie shepherd merritt jacob brian wise evan chung
AI in Action Ireland
E222 'Building Trust and Innovation in AI' with ADAPT Centre's Declan McKibben

AI in Action Ireland

Play Episode Listen Later Nov 20, 2025 30:49


Today's guest is Declan McKibben, Executive Director at the ADAPT Centre. Organisations worldwide are grappling with the realities and opportunities posed by AI-enabled tools and technologies. In this podcast, Declan offers invaluable insights into how AI is reshaping industries, the challenges associated with its adoption and the collaborative efforts needed to harness its full potential.Topics include:0:00 An overview of ADAPT as Ireland's human-centered AI research centre 2:44 Their work with Smart D8 and Digital Twin, which uses AI to model and improve cities6:45 Creating the Generative AI lab to guide Dublin City Council's safe adoption9:19 Running workshops to build AI literacy and inclusive decision-making11:40 How the AI lab prioritizes use cases with responsible, agile testing14:14 Award-winning AI projects showcased for public sector transformation15:37 How trust and measurement enable responsible, scalable AI adoption18:23 Sandboxes enable practical AI exploration amid investment uncertainty22:35 How their AI Accountability Lab ensures responsible, fair, socially beneficial AI28:00 ADAPT Centre as a sponsor for the 2025 AI Awards

DEEPTECH DEEPTALK
Ökosysteme

DEEPTECH DEEPTALK

Play Episode Listen Later Nov 5, 2025 32:34


Nach einer kurzen Herbstpause sprechen Alois und Oliver darüber, wie echte Deep-Tech-Ökosysteme auf Stadtebene entstehen – jenseits von Buzzwords. Sie ordnen ein, was Deep Tech wirklich bedeutet, wo die Grenze zu Commodity-Technologien verläuft und weshalb der Begriff aktuell oft inflationär verwendet wird. Ausgehend von Deutschlands neuer Hightech-Agenda (mit Schwerpunkten wie KI, Quantencomputing und Biotechnologie) diskutieren sie, wie nationale Programme lokale Cluster beeinflussen und warum Orte mit starker Forschungs- und Infrastrukturbasis – etwa DESY und die Science City in Hamburg – als Anker dienen.Die Episode zeigt, wie Konvergenz zwischen Technologien wie KI, Quanten- und Neuromorphik-Computing, Robotik oder XR Tempo aufnimmt und welche Rolle „universelle Schnittstellen“ wie große Sprachmodelle dabei spielen. Zugleich machen die beiden deutlich, dass Grundlagenforschung geschützte Domänen braucht, während Reallabore und Sandboxes die Brücke in die Anwendung schlagen. An konkreten Beispielen wird sichtbar, weshalb kurze Wege, stabile Rahmenbedingungen, verlässliche Testflächen und klar profilierte Stadt-Stärken den Unterschied zwischen Theorie und Transfer ausmachen.Ein zentrales Motiv der Folge ist der „Übersetzer“: Personen und Formate, die Forschung, Industrie und – oft als Ankerkunden – den Staat zielgerichtet zusammenbringen. Statt Event-Inflation braucht es kuratierte Räume, in denen Themen kurz vor der Marktreife fokussiert beschleunigt werden. Genau hier setzt der Deep Tech Campus Circle an: als bewusst exklusiver Ort, an dem Köpfe, Kapital und konkrete Use Cases aufeinandertreffen, um aus Keimlingen skalierbare Innovationen zu machen.Am Ende bleibt die klare These: Deep Tech entsteht, wenn tiefe Expertise, Geduld und ein klug orchestriertes Ökosystem zusammenwirken. Wer verstehen will, wie Städte vom Strategiepapier zur sichtbaren Wertschöpfung kommen, bekommt in dieser Folge eine kompakte, praxisnahe Landkarte – von der Saat in der Grundlagenforschung bis zur Ernte im Markt.

Open Source Startup Podcast
E176: Why All AI Agents Will Need Cloud Sandboxes

Open Source Startup Podcast

Play Episode Listen Later Jun 23, 2025 37:19


Vasek Mlejnsky is Co-Founder & CEO of E2B, the open-source runtime for executing AI-generated code in secure cloud sandboxes. Essentially, they give AI agents cloud computers. Their open source repos, particularly e2b which has 9K GitHub stars, have been widely adopted to help securely run AI-generated code. E2B has raised $12M from investors including Decibel and Sunflower. In this episode, we dig into:Why agents need a sandboxBuilding a new category of infra tooling, much like LaunchDarkly Some of their viral content moments - including Greg Brockman sharing their videosFiguring out the right commercial offering Why they don't agree with pricing per token Why moving from Prague to the Bay Area felt essential for them as founders

Irish Tech News Audio Articles
ISPCC announces global project to prevent online child sexual exploitation and abuse

Irish Tech News Audio Articles

Play Episode Listen Later Jun 10, 2025 4:26


The project, spearheaded by Greek non-profit child welfare organisation The Smile of the Child, will be co-created by children and young people to ensure their voices are heard. The ISPCC is honoured to announce its participation in a worldwide project designed to transform how we prevent and respond to online child sexual exploitation and abuse. Safe Online, a global fund dedicated to eradicating online child sexual exploitation and abuse, is funding the project called "Sandboxing and Standardising Child Online Redress". The COR Sandbox project will establish a first-of-its-kind mechanism to advance child online safety through collaboration across sectors, borders and generations. The project is led by The Smile of the Child, Greece's premier child welfare organisation and ISPCC is a partner alongside The Young and Resilient Research Centre at Western Sydney University, Child Helpline International and the Centre for Digital Policy at University College Dublin. Sandboxes bring together industry, regulators and customers in a safe space to test innovative products and services without incurring regulatory sanctions and they are mainly used in the finance sector to test new services. The EU is increasingly encouraging the use of sandboxes in the field of high technology and artificial intelligence. Through the participation of youth, platforms, regulators and online safety experts, this first regulatory sandbox for child digital wellbeing will provide for consistent, systemic care and redress for children from online harm, based on their rights under the United Nations Convention on the Rights of the Child (UNCRC). Getting reporting and redress right means that we can keep track of harms and be able to identify systemic risk. Co-designing the reporting and redress process with young people as equitable participants can help us understand what they expect from the reporting process and what remedies are fair for them putting Article 12 of the UNCRC into action. The project also benefits from the guidance of renowned digital safety experts, including Project Lead and Scientific Coordinator Ioanna Noula, PhD, an international expert on tech policy and children's rights; pioneering online safety and youth rights advocate Anne Collier; youth rights and participation expert Amanda Third, PhD, of the Young and Resilient Research Centre; international innovation management consultant Nicky Hickman; IT innovation and startup founder Jez Goldstone; and leading child online wellbeing scholar Tijana Milosevic, PhD. ISPCC Head of Policy and Public Affairs Fiona Jennings said: "This project is a wonderful example of what we can achieve when we collaborate and listen to children and young people. Having robust online reporting mechanisms in place is a key policy objective for ISPCC and this project will go a long way towards making the online world safer for children and young people to participate in." Project lead Ioanna Noula said: "ISPCC's contribution to a project, which seeks to build coherence around the issue of online redress, will be a catalyst for real and substantial change in the area of online reporting. Helplines play a key role in flagging illegal and/or harmful content. As the experts in listening and responding to children, ISPCC can provide insight from an Irish context to help spearheading the implementation of the Digital Services Act and the wellbeing of children online." See more stories here. More about Irish Tech News Irish Tech News are Ireland's No. 1 Online Tech Publication and often Ireland's No.1 Tech Podcast too. You can find hundreds of fantastic previous episodes and subscribe using whatever platform you like via our Anchor.fm page here: https://anchor.fm/irish-tech-news If you'd like to be featured in an upcoming Podcast email us at Simon@IrishTechNews.ie now to discuss. Irish Tech News have a range of services available to help promote your business. Why not drop us a line at Info@IrishTechNews.ie now to ...

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

Vasek Mlejnsky from E2B joins us today to talk about sandboxes for AI agents. In the last 2 years, E2B has grown from a handful of developers building on it to being used by ~50% of the Fortune 500 and generating millions of sandboxes each week for their customers. As the “death of chat completions” approaches, LLMs workflows and agents are relying more and more on tool usage and multi-modality.The most common use cases for their sandboxes:- Run data analysis and charting (like Perplexity)- Execute arbitrary code generated by the model (like Manus does)- Running evals on code generation (see LMArena Web)- Doing reinforcement learning for code capabilities (like HuggingFace)Full Video EpisodeTimestamps00:00:00 Introductions00:00:37 Origin of DevBook -> E2B00:02:35 Early Experiments with GPT-3.5 and Building AI Agents00:05:19 Building an Agent Cloud00:07:27 Challenges of Building with Early LLMs00:10:35 E2B Use Cases00:13:52 E2B Growth vs Models Capabilities00:15:03 The LLM Operating System (LLMOS) Landscape00:20:12 Breakdown of JavaScript vs Python Usage on E2B00:21:50 AI VMs vs Traditional Cloud00:26:28 Technical Specifications of E2B Sandboxes00:29:43 Usage-based billing infrastructure00:34:08 Pricing AI on Value Delivered vs Token Usage00:36:24 Forking, Checkpoints, and Parallel Execution in Sandboxes00:39:18 Future Plans for Toolkit and Higher-Level Agent Frameworks00:42:35 Limitations of Chat-Based Interfaces and the Future of Agents00:44:00 MCPs and Remote Agent Capabilities00:49:22 LLMs.txt, scrapers, and bad AI bots00:53:00 Manus and Computer Use on E2B00:55:03 E2B for RL with Hugging Face00:56:58 E2B for Agent Evaluation on LMArena00:58:12 Long-Term Vision: E2B as Full Lifecycle Infrastructure for LLMs01:00:45 Future Plans for Hosting and Deployment of LLM-Generated Apps01:01:15 Why E2B Moved to San Francisco01:05:49 Open Roles and Hiring Plans at E2B Get full access to Latent.Space at www.latent.space/subscribe

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

Vasek Mlejnsky from E2B joins us today to talk about sandboxes for AI agents. In the last 2 years, E2B has grown from a handful of developers building on it to being used by ~50% of the Fortune 500 and generating millions of sandboxes each week for their customers. As the “death of chat completions” approaches, LLMs workflows and agents are relying more and more on tool usage and multi-modality. The most common use cases for their sandboxes: - Run data analysis and charting (like Perplexity) - Execute arbitrary code generated by the model (like Manus does) - Running evals on code generation (see LMArena Web) - Doing reinforcement learning for code capabilities (like HuggingFace) Timestamps: 00:00:00 Introductions 00:00:37 Origin of DevBook -> E2B 00:02:35 Early Experiments with GPT-3.5 and Building AI Agents 00:05:19 Building an Agent Cloud 00:07:27 Challenges of Building with Early LLMs 00:10:35 E2B Use Cases 00:13:52 E2B Growth vs Models Capabilities 00:15:03 The LLM Operating System (LLMOS) Landscape 00:20:12 Breakdown of JavaScript vs Python Usage on E2B 00:21:50 AI VMs vs Traditional Cloud 00:26:28 Technical Specifications of E2B Sandboxes 00:29:43 Usage-based billing infrastructure 00:34:08 Pricing AI on Value Delivered vs Token Usage 00:36:24 Forking, Checkpoints, and Parallel Execution in Sandboxes 00:39:18 Future Plans for Toolkit and Higher-Level Agent Frameworks 00:42:35 Limitations of Chat-Based Interfaces and the Future of Agents 00:44:00 MCPs and Remote Agent Capabilities 00:49:22 LLMs.txt, scrapers, and bad AI bots 00:53:00 Manus and Computer Use on E2B 00:55:03 E2B for RL with Hugging Face 00:56:58 E2B for Agent Evaluation on LMArena 00:58:12 Long-Term Vision: E2B as Full Lifecycle Infrastructure for LLMs 01:00:45 Future Plans for Hosting and Deployment of LLM-Generated Apps 01:01:15 Why E2B Moved to San Francisco 01:05:49 Open Roles and Hiring Plans at E2B

Cribl: The Stream Life
Introducing Lakehouse!

Cribl: The Stream Life

Play Episode Listen Later Feb 26, 2025 24:24


In this episode of The Stream Life Podcast, Joel Vincent joins the show to talk about Cribl's latest innovation: a Lakehouse that's purpose built for telemetry data.  Resources Read the blog If you want to automatically get every episode of the Stream Life podcast, you can subscribe on your favorite podcast app. Cribl, the Data Engine for IT and Security, empowers organizations to transform their data strategy. Customers use Cribl's suite of products to collect, process, route, and analyze all IT and security data, delivering the flexibility, choice, and control required to adapt to their ever-changing needs. We offer free training, certifications, and a free tier across our products. Our community Slack features Cribl engineers, partners, and customers who can answer your questions as you get started and continue to build and evolve. We also offer a variety of hands-on Sandboxes for those interested in how companies globally leverage our products for their data challenges.

Bill Handel on Demand
Google Distances Themselves from DEI | California EV Sales

Bill Handel on Demand

Play Episode Listen Later Feb 6, 2025 23:29 Transcription Available


KFI White House correspondent Jon Decker speaks on the latest out of the Oval Office. Google joins the list of companies distancing themselves from DEI. California EV sales flat in 2024. Can Newsom's mandate be met? Researchers tested sandboxes and street dust for lead after Eaton Fire… Here's what they found.

Corporate Escapees
590 - Visual Maps and Sandboxes: Making Tech Solutions Tangible for Clients

Corporate Escapees

Play Episode Listen Later Feb 6, 2025 6:08


Have you ever left a sales meeting feeling uncertain whether your potential client will buy? If you've experienced clients nodding along but still feeling overwhelmed, you're not alone. In this solo episode, I share two powerful actions you can take to simplify the buying process for your clients. First, we'll explore the importance of using a visual industry map that highlights their journey and pain points. Then, I'll discuss how building an industry-specific sandbox allows clients to interact with your solution tangibly. These strategies have proven effective for tech consultants and partners across various industries, and I'm excited to help you implement them in your own sales process.Resources and LinksPrevious episode: 589 - AI 1st business with John MunsellCheck out more episodes of The Paul Higgins ShowSubscribe to our YouTube channel: @PaulHigginsMentoringThe Tech Consultant's RoadmapJoin our newsletterJoin the Tech CollectiveSuggested resourcesFind out more about Paul and how he can help you

Taking 20 Podcast
Ep 246 - Railroads vs Sandboxes 2

Taking 20 Podcast

Play Episode Listen Later Jan 26, 2025 20:07


There are railroads and there are sandboxes and never the twain shall meet, right?  Not so fast, my friend.  Today Jeremy discusses the two game types and multiple ways to blend them together.   #pf2e #Pathfinder #gmtips #dmtips #dnd #rpg #railroad #sandbox Resources: Game Masters of Exandria Roundtable:  https://www.youtube.com/watch?v=LmZSWKPXhZ4 Sandbox vs Railroad:  Which is Better?  https://www.thearcanelibrary.com/blogs/news/sandbox-vs-railroad-which-is-better?srsltid=AfmBOopkIlstCFvZ_Z2HSQu1-oxxkmLXKKzgH5jLgjzjQuX0m5_INiLA

Some Derps Talk About Games
Sandboxes Vs Railroads (TTRPG)

Some Derps Talk About Games

Play Episode Listen Later Jan 21, 2025 113:25


Imagine you're in a facebook group chat with a bunch of people and someone asks what constitutes a sandbox in a TTRPG. How much does improv vs. preparation factor into it? Can the players go off and do absolutely anything they want? What happens when you run a complete dungeon crawl? Let's talk about it! ---------------------------------------------------------------- Want to watch these episodes live? Check us out at https://www.youtube.com/@somederpsplaygames or twitch.tv/somederpsplaygames Check out the podcast on Soundcloud: https://soundcloud.com/somederpstalkaboutgames Want to tell us something? Email us at podcast@somederpsplaygames.com Like our Facebook page too! www.facebook.com/SomeDerpsPlayGames/ We have a Patreon! https://www.patreon.com/somederpsplaygames Rate us on iTunes! https://itunes.apple.com/us/podcast/some-derps-talk-about-games/id1048899720 Follow us on Twitter! SDPG: twitter.com/somederps Buddy: twitter.com/thatbuddysola Mango: twitter.com/theonetruemango Intro and Outro courtesy of twitter.com/VinceRolin

Cribl: The Stream Life
Microsoft Azure + Cribl - Better together

Cribl: The Stream Life

Play Episode Listen Later Nov 19, 2024 16:00


In this episode of The Stream Life Podcast, Kam Amir and Aldo Dossola join the show to discuss Microsoft Ignite, Cribl's solutions for Microsoft Azure customers, and much more. Resources Microsoft Azure + Cribl: Better together If you want to automatically get every episode of the Stream Life podcast, you can subscribe on your favorite podcast app. Cribl, the Data Engine for IT and Security, empowers organizations to transform their data strategy. Customers use Cribl's suite of products to collect, process, route, and analyze all IT and security data, delivering the flexibility, choice, and control required to adapt to their ever-changing needs. We offer free training, certifications, and a free tier across our products. Our community Slack features Cribl engineers, partners, and customers who can answer your questions as you get started and continue to build and evolve. We also offer a variety of hands-on Sandboxes for those interested in how companies globally leverage our products for their data challenges.

Cribl: The Stream Life
A Two Way Door

Cribl: The Stream Life

Play Episode Listen Later Sep 26, 2024 24:50


In this episode of The Stream Life Podcast, Nick Heudecker joins the show to discuss Cribl's recent Series E round, how customers find value in our products, why they need a Data Engine for IT and Security, and how our products integrate seamlessly—without ever locking data in. Resources Cribl Closes $319M Series E Round at a $3.5B Valuation to Revolutionize Enterprise Data Management How to Avoid Vendor Lock-In Cribl Lake  If you want to automatically get every episode of the Stream Life podcast, you can subscribe on your favorite podcast app. Cribl, the Data Engine for IT and Security, empowers organizations to transform their data strategy. Customers use Cribl's suite of products to collect, process, route, and analyze all IT and security data, delivering the flexibility, choice, and control required to adapt to their ever-changing needs. We offer free training, certifications, and a free tier across our products. Our community Slack features Cribl engineers, partners, and customers who can answer your questions as you get started and continue to build and evolve. We also offer a variety of hands-on Sandboxes for those interested in how companies globally leverage our products for their data challenges.

Epic Adventure
On Modules, Sandboxes, and Western Marches

Epic Adventure

Play Episode Listen Later Sep 18, 2024 46:30


I remember opening up “Keep on the Borderlands” the first module I ever ran. It came in the D&D Basic Set and I thought it was very cool. The players started out in a keep, a perfect base of operations. It had a tavern, a blacksmith, a provisioner, and a chapel. Everything a growing adventuring party needs. From there the players would travel to the Caves of Chaos and explore underground lairs of beasts an monsters. You know, the dungeons of dungeons and dragons.To be fair I don't really remember much about those first games I ran. I remember the feeling of running them and how much fun I had, but I don't remember the details.As a game master I quickly moved away from the Modules and started doing my own thing. I would read modules, but then always felt like I had a better idea. So, I tended to do my own thing.Fast forward several years and I was playing more GURPS. Steve Jackson's Generic Universal Roleplaying System. That system was well known for its source books that help you world build, but had few written modules. That was perfect for me.After decades of making up my own adventures I decided that I wanted to run some of those talked about modules and, being a huge fan of Traveller decided our groups next game would be Pirates of Drinax (listen to our series The Anatomy of a Campaign for details on that one).I hated it.Ok, I didn't hate the idea or setting, but as a game master I hated being tied to the books, trying to make incidents and events happen that I was unfamiliar with, no matter how much reading I did. And of course, with every player question I found myself diving into the books and spending far to much time trying to find the answers.I should have learned my lesson.But I didn't. After Pirates I decided to take on an even bigger challenge with Masks of Nyarlathotep.And 30 episodes in, I am having a miserable time (please don't tell my players that)Personally, I like to make it up as I go.But that's just my opinion. Some people love pre-written modules. Other's prefer sandbox worlds and settings and still others enjoy the Western Marches style…oh, is that a new one for you. Don't worry we will cover that.In this episode Mike, Christina and I are going to talk about the type of campaigns. What's good, What's bad, and why?

Critical Thinking - Bug Bounty Podcast
Episode 84: 0xLupin & Takeaways from Google's Las Vegas BugSwat

Critical Thinking - Bug Bounty Podcast

Play Episode Listen Later Aug 15, 2024 27:15


Episode 84: In this episode of Critical Thinking - Bug Bounty Podcast, Justin is joined by Roni Carta (@0xLupin) to discuss their MVH win at the recent Google LHE, and share some technical observations they had with the target and the event.Follow us on twitter at: @ctbbpodcastWe're new to this podcasting thing, so feel free to send us any feedback here: info@criticalthinkingpodcast.ioShoutout to YTCracker for the awesome intro music!------ Links ------Find the Hackernotes: https://blog.criticalthinkingpodcast.io/Follow your hosts Rhynorater & Teknogeek on twitter:------ Ways to Support CTBBPodcast ------Hop on the CTBB Discord at https://ctbb.show/discord!We also do Discord subs at $25, $10, and $5 - premium subscribers get access to private masterclasses, exploits, tools, scripts, un-redacted bug reports, etc.Today's Guest: https://x.com/0xLupinToday's Sponsor - ThreatLockerTimestamps:(00:00:00) Introduction(00:02:12) MHV Debrief(00:09:05) Sandboxes and Comfort Zones(00:13:24) SDKs and Legal Compliance(00:19:29) Age of Target and Platform-Exclusive Hunters

Cribl: The Stream Life
Navigating the Data Current

Cribl: The Stream Life

Play Episode Listen Later Jul 31, 2024 31:43


In this episode of The Stream Life Podcast, Nick Heudecker joins the show to talk about Cribl's new research report, Navigating the Data Current 2024: Exploring Cribl.Cloud Analytics and Customer Insights. Resources Download the report Read Nick's blog If you want to automatically get every episode of the Stream Life podcast, you can subscribe on your favorite podcast app. Cribl, the Data Engine for IT and Security, empowers organizations to transform their data strategy. Customers use Cribl's suite of products to collect, process, route, and analyze all IT and security data, delivering the flexibility, choice, and control required to adapt to their ever-changing needs. We offer free training, certifications, and a free tier across our products. Our community Slack features Cribl engineers, partners, and customers who can answer your questions as you get started and continue to build and evolve. We also offer a variety of hands-on Sandboxes for those interested in how companies globally leverage our products for their data challenges.

Cribl: The Stream Life
Cribl University Update

Cribl: The Stream Life

Play Episode Listen Later Jun 28, 2024 21:01


In this episode of The Stream Life Podcast, Bradley chats with Cribl's Dariann Kobe about the second birthday of Cribl University and the new Cribl Certified Admin course.   Resources Learn more Cribl University If you want to automatically get every episode of the Stream Life podcast, you can subscribe on your favorite podcast app. Cribl, the Data Engine for IT and Security, empowers organizations to transform their data strategy. Customers use Cribl's suite of products to collect, process, route, and analyze all IT and security data, delivering the flexibility, choice, and control required to adapt to their ever-changing needs. We offer free training, certifications, and a free tier across our products. Our community Slack features Cribl engineers, partners, and customers who can answer your questions as you get started and continue to build and evolve. We also offer a variety of hands-on Sandboxes for those interested in how companies globally leverage our products for their data challenges.

Cribl: The Stream Life
CriblCon 2024 Recap

Cribl: The Stream Life

Play Episode Listen Later Jun 21, 2024 25:42


In this episode of The Stream Life Podcast, Bradley chats with Cribl's Mike Dupuis about everything announced at CriblCon 2024! Resources Cribl Copilot: Your Trusted AI Wingman for Deploying, Configuring & Troubleshooting Cribl Accelerates Data Management Productivity with AI-Powered Copilot CriblCon 2024 Recap Blog Session Recap Watch the sessions on YouTube If you want to automatically get every episode of the Stream Life podcast, you can subscribe on your favorite podcast app. Cribl, the Data Engine for IT and Security, empowers organizations to transform their data strategy. Customers use Cribl's suite of products to collect, process, route, and analyze all IT and security data, delivering the flexibility, choice, and control required to adapt to their ever-changing needs. We offer free training, certifications, and a free tier across our products. Our community Slack features Cribl engineers, partners, and customers who can answer your questions as you get started and continue to build and evolve. We also offer a variety of hands-on Sandboxes for those interested in how companies globally leverage our products for their data challenges.

Cribl: The Stream Life
Exploring Cribl Copilot!

Cribl: The Stream Life

Play Episode Listen Later Jun 11, 2024 17:02


In this episode of The Stream Life Podcast, Bradley Chambers chat with Nikhil Mungel about Cribl Copilot. Cribl Copilot turbocharges efficiency and bridges the skills gap, ushering in the next generation of AI-augmented workforce empowerment for IT and Security.   Resources Cribl Copilot: Your Trusted AI Wingman for Deploying, Configuring & Troubleshooting Cribl Accelerates Data Management Productivity with AI-Powered Copilot   If you want to automatically get every episode of the Stream Life podcast, you can subscribe on your favorite podcast app. Cribl, the Data Engine for IT and Security, empowers organizations to transform their data strategy. Customers use Cribl's suite of products to collect, process, route, and analyze all IT and security data, delivering the flexibility, choice, and control required to adapt to their ever-changing needs. We offer free training, certifications, and a free tier across our products. Our community Slack features Cribl engineers, partners, and customers who can answer your questions as you get started and continue to build and evolve. We also offer a variety of hands-on Sandboxes for those interested in how companies globally leverage our products for their data challenges.

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0
ICLR 2024 — Best Papers & Talks (Benchmarks, Reasoning & Agents) — ft. Graham Neubig, Aman Sanger, Moritz Hardt)

Latent Space: The AI Engineer Podcast — CodeGen, Agents, Computer Vision, Data Science, AI UX and all things Software 3.0

Play Episode Listen Later Jun 10, 2024 269:19


Our second wave of speakers for AI Engineer World's Fair were announced! The conference sold out of Platinum/Gold/Silver sponsors and Early Bird tickets! See our Microsoft episode for more info and buy now with code LATENTSPACE.This episode is straightforwardly a part 2 to our ICLR 2024 Part 1 episode, so without further ado, we'll just get right on with it!Timestamps[00:03:43] Section A: Code Edits and Sandboxes, OpenDevin, and Academia vs Industry — ft. Graham Neubig and Aman Sanger* [00:07:44] WebArena* [00:18:45] Sotopia* [00:24:00] Performance Improving Code Edits* [00:29:39] OpenDevin* [00:47:40] Industry and Academia[01:05:29] Section B: Benchmarks* [01:05:52] SWEBench* [01:17:05] SWEBench/SWEAgent Interview* [01:27:40] Dataset Contamination Detection* [01:39:20] GAIA Benchmark* [01:49:18] Moritz Hart - Science of Benchmarks[02:36:32] Section C: Reasoning and Post-Training* [02:37:41] Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection* [02:51:00] Let's Verify Step By Step* [02:57:04] Noam Brown* [03:07:43] Lilian Weng - Towards Safe AGI* [03:36:56] A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis* [03:48:43] MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework[04:00:51] Bonus: Notable Related Papers on LLM CapabilitiesSection A: Code Edits and Sandboxes, OpenDevin, and Academia vs Industry — ft. Graham Neubig and Aman Sanger* Guests* Graham Neubig* Aman Sanger - Previous guest and NeurIPS friend of the pod!* WebArena * * Sotopia (spotlight paper, website)* * Learning Performance-Improving Code Edits* OpenDevin* Junyang Opendevin* Morph Labs, Jesse Han* SWE-Bench* SWE-Agent* Aman tweet on swebench* LiteLLM* Livecodebench* the role of code in reasoning* Language Models of Code are Few-Shot Commonsense Learners* Industry vs academia* the matryoshka embeddings incident* other directions* UnlimiformerSection A timestamps* [00:00:00] Introduction to Guests and the Impromptu Nature of the Podcast* [00:00:45] Graham's Experience in Japan and Transition into Teaching NLP* [00:01:25] Discussion on What Constitutes a Good Experience for Students in NLP Courses* [00:02:22] The Relevance and Teaching of Older NLP Techniques Like Ngram Language Models* [00:03:38] Speculative Decoding and the Comeback of Ngram Models* [00:04:16] Introduction to WebArena and Zotopia Projects* [00:05:19] Deep Dive into the WebArena Project and Benchmarking* [00:08:17] Performance Improvements in WebArena Using GPT-4* [00:09:39] Human Performance on WebArena Tasks and Challenges in Evaluation* [00:11:04] Follow-up Work from WebArena and Focus on Web Browsing as a Benchmark* [00:12:11] Direct Interaction vs. Using APIs in Web-Based Tasks* [00:13:29] Challenges in Base Models for WebArena and the Potential of Visual Models* [00:15:33] Introduction to Zootopia and Exploring Social Interactions with Language Models* [00:16:29] Different Types of Social Situations Modeled in Zootopia* [00:17:34] Evaluation of Language Models in Social Simulations* [00:20:41] Introduction to Performance-Improving Code Edits Project* [00:26:28] Discussion on DevIn and the Future of Coding Agents* [00:32:01] Planning in Coding Agents and the Development of OpenDevon* [00:38:34] The Changing Role of Academia in the Context of Large Language Models* [00:44:44] The Changing Nature of Industry and Academia Collaboration* [00:54:07] Update on NLP Course Syllabus and Teaching about Large Language Models* [01:00:40] Call to Action: Contributions to OpenDevon and Open Source AI Projects* [01:01:56] Hiring at Cursor for Roles in Code Generation and Assistive Coding* [01:02:12] Promotion of the AI Engineer ConferenceSection B: Benchmarks * Carlos Jimenez & John Yang (Princeton) et al: SWE-bench: Can Language Models Resolve Real-world Github Issues? (ICLR Oral, Paper, website)* “We introduce SWE-bench, an evaluation framework consisting of 2,294 software engineering problems drawn from real GitHub issues and corresponding pull requests across 12 popular Python repositories. Given a codebase along with a description of an issue to be resolved, a language model is tasked with editing the codebase to address the issue. Resolving issues in SWE-bench frequently requires understanding and coordinating changes across multiple functions, classes, and even files simultaneously, calling for models to interact with execution environments, process extremely long contexts and perform complex reasoning that goes far beyond traditional code generation tasks. Our evaluations show that both state-of-the-art proprietary models and our fine-tuned model SWE-Llama can resolve only the simplest issues. The best-performing model, Claude 2, is able to solve a mere 1.96% of the issues. Advances on SWE-bench represent steps towards LMs that are more practical, intelligent, and autonomous.”* Yonatan Oren et al (Stanford): Proving Test Set Contamination in Black-Box Language Models (ICLR Oral, paper, aman tweet on swebench contamination)* “We show that it is possible to provide provable guarantees of test set contamination in language models without access to pretraining data or model weights. Our approach leverages the fact that when there is no data contamination, all orderings of an exchangeable benchmark should be equally likely. In contrast, the tendency for language models to memorize example order means that a contaminated language model will find certain canonical orderings to be much more likely than others. Our test flags potential contamination whenever the likelihood of a canonically ordered benchmark dataset is significantly higher than the likelihood after shuffling the examples. * We demonstrate that our procedure is sensitive enough to reliably prove test set contamination in challenging situations, including models as small as 1.4 billion parameters, on small test sets of only 1000 examples, and datasets that appear only a few times in the pretraining corpus.”* Outstanding Paper mention: “A simple yet elegant method to test whether a supervised-learning dataset has been included in LLM training.”* Thomas Scialom (Meta AI-FAIR w/ Yann LeCun): GAIA: A Benchmark for General AI Assistants (paper)* “We introduce GAIA, a benchmark for General AI Assistants that, if solved, would represent a milestone in AI research. GAIA proposes real-world questions that require a set of fundamental abilities such as reasoning, multi-modality handling, web browsing, and generally tool-use proficiency. * GAIA questions are conceptually simple for humans yet challenging for most advanced AIs: we show that human respondents obtain 92% vs. 15% for GPT-4 equipped with plugins. * GAIA's philosophy departs from the current trend in AI benchmarks suggesting to target tasks that are ever more difficult for humans. We posit that the advent of Artificial General Intelligence (AGI) hinges on a system's capability to exhibit similar robustness as the average human does on such questions. Using GAIA's methodology, we devise 466 questions and their answer.* * Mortiz Hardt (Max Planck Institute): The emerging science of benchmarks (ICLR stream)* “Benchmarks are the keystone that hold the machine learning community together. Growing as a research paradigm since the 1980s, there's much we've done with them, but little we know about them. In this talk, I will trace the rudiments of an emerging science of benchmarks through selected empirical and theoretical observations. Specifically, we'll discuss the role of annotator errors, external validity of model rankings, and the promise of multi-task benchmarks. The results in each case challenge conventional wisdom and underscore the benefits of developing a science of benchmarks.”Section C: Reasoning and Post-Training* Akari Asai (UW) et al: Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection (ICLR oral, website)* (Bad RAG implementations) indiscriminately retrieving and incorporating a fixed number of retrieved passages, regardless of whether retrieval is necessary, or passages are relevant, diminishes LM versatility or can lead to unhelpful response generation. * We introduce a new framework called Self-Reflective Retrieval-Augmented Generation (Self-RAG) that enhances an LM's quality and factuality through retrieval and self-reflection. * Our framework trains a single arbitrary LM that adaptively retrieves passages on-demand, and generates and reflects on retrieved passages and its generations using special tokens, called reflection tokens. Generating reflection tokens makes the LM controllable during the inference phase, enabling it to tailor its behavior to diverse task requirements. * Self-RAG (7B and 13B parameters) outperforms ChatGPT and retrieval-augmented Llama2-chat on Open-domain QA, reasoning, and fact verification tasks, and it shows significant gains in improving factuality and citation accuracy for long-form generations relative to these models. * Hunter Lightman (OpenAI): Let's Verify Step By Step (paper)* “Even state-of-the-art models still regularly produce logical mistakes. To train more reliable models, we can turn either to outcome supervision, which provides feedback for a final result, or process supervision, which provides feedback for each intermediate reasoning step. * We conduct our own investigation, finding that process supervision significantly outperforms outcome supervision for training models to solve problems from the challenging MATH dataset. Our process-supervised model solves 78% of problems from a representative subset of the MATH test set. Additionally, we show that active learning significantly improves the efficacy of process supervision. * To support related research, we also release PRM800K, the complete dataset of 800,000 step-level human feedback labels used to train our best reward model.* * Noam Brown - workshop on Generative Models for Decision Making* Solving Quantitative Reasoning Problems with Language Models (Minerva paper)* Describes some charts taken directly from the Let's Verify Step By Step paper listed/screenshotted above.* Lilian Weng (OpenAI) - Towards Safe AGI (ICLR talk)* OpenAI Model Spec* OpenAI Instruction Hierarchy: The Instruction Hierarchy: Training LLMs to Prioritize Privileged InstructionsSection D: Agent Systems* Izzeddin Gur (Google DeepMind): A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis (ICLR oral, paper)* [Agent] performance on real-world websites has still suffered from (1) open domainness, (2) limited context length, and (3) lack of inductive bias on HTML.* We introduce WebAgent, an LLM-driven agent that learns from self-experience to complete tasks on real websites following natural language instructions.* WebAgent plans ahead by decomposing instructions into canonical sub-instructions, summarizes long HTML documents into task-relevant snippets, and acts on websites via Python programs generated from those.* We design WebAgent with Flan-U-PaLM, for grounded code generation, and HTML-T5, new pre-trained LLMs for long HTML documents using local and global attention mechanisms and a mixture of long-span denoising objectives, for planning and summarization.* We empirically demonstrate that our modular recipe improves the success on real websites by over 50%, and that HTML-T5 is the best model to solve various HTML understanding tasks; achieving 18.7% higher success rate than the prior method on MiniWoB web automation benchmark, and SoTA performance on Mind2Web, an offline task planning evaluation.* Sirui Hong (DeepWisdom): MetaGPT: Meta Programming for A Multi-Agent Collaborative Framework (ICLR Oral, Paper)* We introduce MetaGPT, an innovative meta-programming framework incorporating efficient human workflows into LLM-based multi-agent collaborations. MetaGPT encodes Standardized Operating Procedures (SOPs) into prompt sequences for more streamlined workflows, thus allowing agents with human-like domain expertise to verify intermediate results and reduce errors. MetaGPT utilizes an assembly line paradigm to assign diverse roles to various agents, efficiently breaking down complex tasks into subtasks involving many agents working together. Bonus: Notable Related Papers on LLM CapabilitiesThis includes a bunch of papers we wanted to feature above but could not.* Lukas Berglund (Vanderbilt) et al: The Reversal Curse: LLMs trained on “A is B” fail to learn “B is A” (ICLR poster, paper, Github)* We expose a surprising failure of generalization in auto-regressive large language models (LLMs). If a model is trained on a sentence of the form ''A is B'', it will not automatically generalize to the reverse direction ''B is A''. This is the Reversal Curse. * The Reversal Curse is robust across model sizes and model families and is not alleviated by data augmentation. We also evaluate ChatGPT (GPT-3.5 and GPT-4) on questions about real-world celebrities, such as ''Who is Tom Cruise's mother? [A: Mary Lee Pfeiffer]'' and the reverse ''Who is Mary Lee Pfeiffer's son?''. GPT-4 correctly answers questions like the former 79% of the time, compared to 33% for the latter.* * Omar Khattab (Stanford): DSPy: Compiling Declarative Language Model Calls into State-of-the-Art Pipelines (ICLR Spotlight Poster, GitHub)* presented by Krista Opsahl-Ong* “Existing LM pipelines are typically implemented using hard-coded “prompt templates”, i.e. lengthy strings discovered via trial and error. Toward a more systematic approach for developing and optimizing LM pipelines, we introduce DSPy, a programming model that abstracts LM pipelines as text transformation graphs, or imperative computational graphs where LMs are invoked through declarative modules. * DSPy modules are parameterized, meaning they can learn how to apply compositions of prompting, finetuning, augmentation, and reasoning techniques. * We design a compiler that will optimize any DSPy pipeline to maximize a given metric, by creating and collecting demonstrations. * We conduct two case studies, showing that succinct DSPy programs can express and optimize pipelines that reason about math word problems, tackle multi-hop retrieval, answer complex questions, and control agent loops. * Within minutes of compiling, DSPy can automatically produce pipelines that outperform out-of-the-box few-shot prompting as well as expert-created demonstrations for GPT-3.5 and Llama2-13b-chat. On top of that, DSPy programs compiled for relatively small LMs like 770M parameter T5 and Llama2-13b-chat are competitive with many approaches that rely on large and proprietary LMs like GPT-3.5 and on expert-written prompt chains. * * MuSR: Testing the Limits of Chain-of-thought with Multistep Soft Reasoning* Scaling Laws for Associative Memories * DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models* Efficient Streaming Language Models with Attention Sinks Get full access to Latent Space at www.latent.space/subscribe

Cribl: The Stream Life
Keep Scalin'

Cribl: The Stream Life

Play Episode Listen Later Jun 10, 2024 29:27


In this episode of The Stream Life Podcast, I chat with Nick Romito about the journey to building support for 50k Cribl Edge nodes in customer deployments. Resources The Journey to 100x-ing Control Plane Scale for Cribl Edge If you want to automatically get every episode of the Stream Life podcast, you can subscribe on your favorite podcast app. Cribl, the Data Engine for IT and Security, empowers organizations to transform their data strategy. Customers use Cribl's suite of products to collect, process, route, and analyze all IT and security data, delivering the flexibility, choice, and control required to adapt to their ever-changing needs. We offer free training, certifications, and a free tier across our products. Our community Slack features Cribl engineers, partners, and customers who can answer your questions as you get started and continue to build and evolve. We also offer a variety of hands-on Sandboxes for those interested in how companies globally leverage our products for their data challenges.

Cribl: The Stream Life
The State of OCSF

Cribl: The Stream Life

Play Episode Listen Later May 23, 2024 20:07


In this episode of The Stream Life Podcast, Nick Heudecker joins the show to talk about his recent LinkedIn article about OCSF (Open Cybersecurity Schema Framework). Resources What is OCSF? Nick's recent LinkedIn article  If you want to automatically get every episode of the Stream Life podcast, you can subscribe on your favorite podcast app. Cribl, the Data Engine for IT and Security, empowers organizations to transform their data strategy. Customers use Cribl's suite of products to collect, process, route, and analyze all IT and security data, delivering the flexibility, choice, and control required to adapt to their ever-changing needs. We offer free training, certifications, and a free tier across our products. Our community Slack features Cribl engineers, partners, and customers who can answer your questions as you get started and continue to build and evolve. We also offer a variety of hands-on Sandboxes for those interested in how companies globally leverage our products for their data challenges.

Cribl: The Stream Life
Customer Choice and Technical Alliances

Cribl: The Stream Life

Play Episode Listen Later May 23, 2024 19:16


In this episode of The Stream Life Podcast, Vlad Melnik joins the show to discuss all the news about Cribl's new Technical Alliance Partner program and why customer choice for data will be the decade's theme in IT and Security.   Resources Vlad's Blog Cribl's Press Release If you want to automatically get every episode of the Stream Life podcast, you can subscribe on your favorite podcast app. Cribl, the Data Engine for IT and Security, empowers organizations to transform their data strategy. Customers use Cribl's suite of products to collect, process, route, and analyze all IT and security data, delivering the flexibility, choice, and control required to adapt to their ever-changing needs. We offer free training, certifications, and a free tier across our products. Our community Slack features Cribl engineers, partners, and customers who can answer your questions as you get started and continue to build and evolve. We also offer a variety of hands-on Sandboxes for those interested in how companies globally leverage our products for their data challenges.