POPULARITY
William and Eyvonne discuss recent tech news, including the growing political and community opposition to AI data centers driven by fears over power and water usage. They also analyze the “AI Chip War” as hyperscalers such as AWS and Google invest in specialized silicon for training and inference. Episode Links: Amid backlash, O'Leary Digital CEO... Read more »
Send us Fan MailWhat's New in Cloud FinOps: May 2026 Monthly RecapIn this combined monthly recap for May 2026, Frank Contrepois and Stephen Old dive into a vast array of updates across AWS, Google Cloud, and Azure, with a special focus on the evolving landscape of AI FinOps, hybrid cloud challenges, and a barrage of storage news.The Expanding Scope of FinOps: From Data Centre to AIThe discussion opens by exploring the expansion of FinOps beyond the public cloud to encompass on-premise data centres, software, AI, and sustainability. A central theme is the application of the FinOps Open Cost and Usage Specification (FOCUS) to on-premise environments. Stephen shares firsthand experience transposing software data into FOCUS to create a converged platform, highlighting the fundamental data challenges, from ingesting contract data to managing the high velocity of cloud data.The conversation then shifts to the burgeoning role of AI, noting its inclusion alongside SaaS and professional services in the modern FinOps scope. This introduces new forecasting challenges, as traditional 18-month budget cycles clash with the rapid pace of weekly AI model releases.A critical point is also raised regarding sustainability. The hosts discuss Amazon's board rejecting a shareholder proposal for detailed climate disclosures, which poses a significant challenge for companies needing granular data for CSRD and SEC compliance.Major Cloud Updates: April 2026AI & FinOps Visibility:A major theme is the improvement in attributing AI spend. A game-changing update from AWS means Bedrock API calls now automatically record the IAM identity (user or role) of the caller directly into CUR 2.0 and Cost Explorer. This eliminates the complex need to reconcile CloudTrail logs to determine who is driving Bedrock costs.Similarly, Amazon Q is now embedded in the AWS Cost Explorer, allowing users to ask natural language questions about their spending (e.g., "Why did my RDS costs spike last month?"). This conversational analysis approach comes with a free tier of 50 queries per month.On the Google Cloud side, a new billing overview widget for Gemini and Vertex AI spend is now in preview. Google is also introducing a "FinOps Explainability Agent," an autonomous AI agent to investigate AI cost drivers, and "Spend Caps" (Private Preview) for services like AI Studio and Vertex AI, which provide crucial cost control by pausing API traffic when a budget is hit.For those managing GPU workloads, Amazon ECS managed instances now support NVIDIA GPU metrics in CloudWatch Container Insights, enabling real-time visibility into GPU utilisation and health to optimise expensive accelerated computing.Cost & Usage Reporting (CUR) Enhancements:There are hints of a potential enhancement to AWS CUR 2.0, which could see new columns added to directly link API calls with costs, revolutionising cost allocation. AWS has also introduced:Scheduled Email Delivery for Billing Dashboards: Securely send reports to stakeholders without console access.Billing Conductor Pass-Through Plan: Simplifies centralised billing for billing transfer users.Cost Optimization Hub CSV Downloads: Easily export savings recommendations.Find out how to leverage CUR for security: "Identifying security risks using AWS cost and usage report data"Compute & Database Innovations:AWS: Released a wave of 8th Generation Intel Instances (C8i, M8i, R8i and network-optimised versions) powered by custom 6th Gen Xeon processors. EC2 Capacity Manager also now supports tag-based dimensions, allowing for more granular capacity optimisation. Amazon Aurora Serverless now boasts up to 30% better performance and, crucially, scales down to zero, a cost-effective option for unpredictable agentic AI workloads.Google Cloud: At Google Cloud Next, they announced both ends of the performance spectrum. The 8th Generation TPUs (v8t for training, v8i for inference) offer massive scale and performance-per-dollar improvements. In a move to democratise access, Google also made fractional GPUs (1/2, 1/4, or 1/8) on the G4 series generally available, a game-changer for cost-effectively running smaller workloads. The GKE workload recommender is also now integrated into the FinOps Hub.Azure: Now supports NVIDIA's powerful H100 and H200 GPUs on Azure Red Hat OpenShift (ARO) for large-scale AI/HPC workloads. For database users, the GA of Premium SSD v2 for Azure Database for PostgreSQL promises significantly higher IOPS and better price-performance.A Deep Dive into Azure Storage:The episode covers an "overload" of Azure storage updates with significant FinOps implications:Minimum Billable Object Size: From 1st July 2026 for new accounts (and 2027 for all), objects smaller than 128KB in cool, cold, and archive tiers will be billed as if they are 128KB.Smart Tier for Azure Blob & ADLS (GA): To mitigate the above, this feature automatically tiers data based on access patterns but introduces a monitoring fee for objects over 128KB, creating a new optimisation puzzle.Azure NetApp Files (ANF) Ransomware Protection: Now GA and included as part of the service at no extra charge.Finally, the hosts tackle "The Big Silence on Memory Prices," noting that despite DDR memory prices soaring 300-400% from mid-2025 lows, the hyperscalers have remained silent, absorbing the cost and making it difficult for smaller providers to compete.Explore the official announcements:AI Bill of Materials Whitepaper: www.wiz.io/go/ai-security/ai-bill-of-materialsAWS Article on Amazon Q: https://aws.amazon.com/blogs/aws-cloud-financial-management/transforming-finops-with-the-latest-amazon-q-cost-capabilities/
Take the 2026 AI Engineering Survey and get >$2k in credits and AIE WF tickets!On the product side, everyone is getting Computer - Perplexity, Manus, Cursor, and so on. Meanwhile on the research side, agentic evals like TerminalBench and GDPVal are also assuming computer (Harbor). On both ends, the consolidating LLM OS stack has become a standard toolkit, and Daytona is one of a small set of AI Infra companies that are booming because of it.“The end of localhost” has been Ivan Burazin's obsession for more than a decade.Something that is all too familiar…Long before agents became the default way people talked about software development, Ivan was already chasing the idea that development should not depend on a fragile local machine. CodeAnywhere, one of the first browser-based IDEs, was an early attempt at that future: move the development environment into the cloud, make setup reproducible, and free developers from the endless “works on my machine” tax.The thesis was directionally right, but the market wasn't ready yet.However, agents changed that. They do not care about a laptop, desk setup, or favorite editor. They need a computer they can access through an API: something stateful enough to keep working, fast enough to spin up instantly, flexible enough to resize, isolated enough to be safe, and composable enough to run the messy real-world workflows that real software engineering actually requires.Daytona isn't just selling “sandboxes” in the narrow code-execution sense. It is the latest version of Ivan's original localhost thesis.In this episode, Daytona's CEO joins swyx to explain why AI agents need more than code execution boxes: they need composable computers, stateful sandboxes, instant startup, dynamic resources, and infrastructure that can survive workloads going from zero to 100,000 CPUs.We go deep on the new agent compute market: Daytona's hard pivot from human dev environments to AI sandboxes, the New Year's Eve MVP that customers begged for, why Daytona runs on bare metal with its own scheduler, how one customer runs almost 850,000 sandboxes a day, and why RL/eval workloads went from 0% to roughly 50% of usage in just months. Ivan also explains why agents need Windows and macOS machines, why CLI may matter more than MCP, why Kubernetes is painful for this workload, and why the future AI cloud may look more like Stripe than AWS.We discuss:* How Daytona grew out of CodeAnywhere, Shift, and the “end of localhost” thesis* Why Daytona pivoted from human dev environments to AI sandboxes* Why agents need composable computers instead of disposable code execution boxes* The New Year's Eve MVP that customers chased API keys for* Why Daytona chose bare metal, stateful snapshots, and its own scheduler* How Daytona spins up one sandbox in ~60ms and 50,000 sandboxes in ~75 seconds* Why Daytona's biggest customer runs ~850,000 sandboxes a day* How RL/eval workloads create zero-to-100,000 CPU spikes* Why RL workloads went from 0% to roughly 50% of Daytona usage* Why customers compare Daytona against EKS/GKS and say they're “never going back”* Why every AI agent may need a computer, including Windows and macOS environments* The Apple licensing constraints that make macOS sandboxes hard* Why CLI gives agents more power than MCP* How open source helps agents integrate Daytona* Why agent-generated PRs may break today's CI/CD assumptions* Why AI SaaS companies reselling tokens may face a cold shower* Why the AI cloud may look more like Stripe than AWSIvan Burazin* LinkedIn: https://www.linkedin.com/in/ivanburazin* X: https://x.com/ivanburazinDaytona* Website: https://www.daytona.io* X: https://x.com/daytonaioTimestamps* 00:00:00 Hook* 00:01:12 Introduction* 00:03:15 CodeAnywhere, Shift, and the end of localhost* 00:05:58 What Daytona is: composable computers for AI agents* 00:08:07 The pivot from dev environments to AI sandboxes* 00:10:17 The New Year's Eve MVP and customers begging for API keys* 00:12:56 Bare metal, stateful sandboxes, and Daytona's scheduler* 00:17:28 60ms startup, 50,000 sandboxes, and 850K daily runs* 00:21:53 Spiky RL/eval workloads and the new agent infra problem* 00:28:12 RL workloads, Kubernetes pain, and dynamic resizing* 00:33:31 Why every AI agent needs a computer* 00:38:48 macOS sandboxes and Apple's licensing problem* 00:44:28 Why CLI may matter more than MCP* 00:48:11 Open source, GitHub stars, and agent integration* 00:53:11 Git, CI/CD, and agent collaboration bottlenecks* 00:58:15 Founder life and building a 25-person infra company* 01:02:44 AI SaaS, token resale, and API-first business models* 01:06:10 GPU sandboxes, data centers, and compute growth* 01:09:48 Why the AI cloud may look more like Stripe than AWS* 01:11:26 Closing thoughtsTranscriptIntroduction: Daytona, CodeAnywhere, and the End of LocalhostSwyx [00:00:02]: Okay, we're in the studio with Ivan Burazin, CEO of Daytona. Welcome.Ivan [00:00:07]: Thanks for having me, man.Swyx [00:00:08]: Ivan, you and I go back.Ivan [00:00:10]: Way back.Swyx [00:00:11]: How I don't even know how, you found, did you reach out or, for Shift.Ivan [00:00:17]: I reached out to you. The reason was you - we were just - we were thinking about I was one of the co-founders of CodeAnywhere, the first browser-based IDE, and so we were thinking a long time of, localhost should die. And you had this article.Swyx [00:00:29]: End of localhost.Ivan [00:00:30]: Then I reached out to you because of that, and then we talked, and I was actually at a different job and learning about I was the head of, developer experience, and you were quite well-versed in that, and I actually reached out to you, among other people, how do we go about that? What are the key things and whatnot at this point in time? And you were nice enough to take the call, and I remember I was late on your call with you.Swyx [00:00:51]: I don't remember.Ivan [00:00:52]: I remember because I was with my then I'm thinking of a girlfriend or wife at that point in time, I'm not sure. It's the same person, so that's great, and I was late ‘cause we were, in, Italy on, vacation, and then I was late for something. I felt so bad, and you were so nice to be, good about.Swyx [00:01:10]: The reason I'm nice is because I'm also late to other people, so it's like, who's, who's without sin here, yeah, so I have to, for those who don't know, InfoBip Shift, there's this whole thing that, you did in the past, and, and that was basically one of the inspirations for me starting AI Engineer, which is like, I have to thank you for giving me that push to be like, “Oh, you can, you can build and sell conferences?”Ivan [00:01:34]: I remember you asked you asked me at the beginning to give me advisory shares, and I was so focused on what we were doing, I said no, and I should've took the advisory shares. So I'm sorry, dude. But anyway.Swyx [00:01:43]: We're not, we're not venture backed.Ivan [00:01:44]: No, it doesn't matter.Swyx [00:01:45]: It's Yeah, anyway, so I think what's impressive about you is that CodeAnywhere is the thing that you've been trying to build, and, you kind of put it on hold and then came back after InfoBip. Just give us the story, do you - the story and the origin story, going into Daytona.From CodeAnywhere and Shift to DaytonaIvan [00:02:05]: Sure. Like, really way back, me and my co-founder have been together. I say this, I've said this multiple times, it's like we were married and divorced and married. Some people actually ask me is my co-founder my partner. they thought it literally. It's not literally, but we have done multiple companies together, and to your point, we had this shift where we went from the CodeAnywhere to the conference called Shift, and then back to, Daytona. We originally started stacking servers, doing like virtualization in the early 2000s and, routers and doing basically all these things, at a foundational level, and that was a services company which we sold to focus on what my co-founder actually invented, which was the very first browser-based IDE, right, I say the first. Before us was actually Heroku. They did it for a very short time until they became Heroku. But outside of them, we were the only one, and it was called.Swyx [00:02:55]: There was Cloud9.Ivan [00:02:57]: Cloud9 came out slightly after us. There was Replit, which came out when we stopped doing it, Replit came out, and they have been successful since then, which is great. There was Nitrous.io. There was quite a few that existed at the time, but it was like too early. But the interesting part is that we, at that point in time, because there was no VS Code, there was no Kubernetes, and Docker had just started when we Or I'm not sure if it was even public at that point in time. And so we had to build everything to the whole stack ourselves and that was the key learning that we brought into and that we've been using in Daytona today. So it was super early. There's about 3 million people used CodeAnywhere. It was slightly, it was angel-backed more than venture-backed. We ended up paying everyone back because it didn't have that sort of scale. But, three years ago, we started something similar with Daytona, which is not what we are today, but it was automating dev environments for human engineers, the basically the underlying stack of CodeAnywhere. And then we did a hard pivot last January to sandboxes. And so here we are.Swyx [00:04:01]: Historic pivot, yeah, and, it's one of those things where, I had independently invested in CodeAnywhere, but also in E2B, and then both of you pivoted into the same thing, and I'm like, “F**k.”Ivan [00:04:12]: You invested, you invested in Daytona. You invested in Daytona. But you were the first If we had not got your check, we wouldn't have done it.Swyx [00:04:18]: No way.Ivan [00:04:19]: No, it was like, “We have to get him on board first,” and you were that kicker that we, that got us off the ground.Swyx [00:04:23]: No, because you were putting me on your pitch deck, man. I was like, “Man, this is like a good trip if I don't invest.”Ivan [00:04:29]: That's because it was your quote. It's like we.Swyx [00:04:30]: Yeah. It's the end of localhost.Ivan [00:04:31]: Did a bunch of research about end of localhost and who was interested in that,.Swyx [00:04:34]: No, that's like, I put, I wrote that blog post, and every single company in that field reached out to me, and then every VC who was receiving those pitches then also had to call me and, talk it, talk through it with me.Ivan [00:04:47]: It's finally happening though.Swyx [00:04:48]: It was really super interesting.Ivan [00:04:48]: It's finally happening.Swyx [00:04:49]: It's finally happening.Ivan [00:04:49]: Yeah, it's finally.Swyx [00:04:49]: It's finally happening, with maybe sort of non-human users. Yeah, so what is Daytona today? Let's get like a quick description. I'm wearing the shirt.What Daytona Is Today: Composable Computers for AI AgentsIvan [00:04:58]: You're wearing the shirt. Yes,.Swyx [00:04:59]: It says, I think your branding is very good. Like, it's very consistent. It runs AI code. Like, it cannot be simpler.Ivan [00:05:05]: Exactly, but we're gonna probably have to change that.Swyx [00:05:07]: Oh, s**t.Ivan [00:05:07]: It's also a subset of what we do. Unfortunately, we really love this, Run AI Code is super simple. People interpret it different ways. I think we've given out 5,000, 6,000 of these shirts. People wear them with pride because it doesn't really market about us.Swyx [00:05:21]: Yeah, Daytona's on the back.Ivan [00:05:22]: It markets the back. It markets to the person itself, so I think we did a really good job on that one. But it is also a subset of what we do, because people, when they think about Run AI Code, they just think about these small, let's call it isolates, code execution boxes that, you send some code, you get an output. Whereas what Daytona is today is essentially composable computers for AI agents. It is, the market calls them sandboxes which can be misleading.Swyx [00:05:44]: All these things. All these things on.Ivan [00:05:45]: Yeah, exactly, ‘cause it can be misleading ‘cause people usually think about sandboxes as a demo or a test environment versus a production-grade environment. But what Daytona does, if you think of the laptop that you have in front of you or the computer that's over there, or, my wife is an architect, so she has like a Windows with a 3D graphics card inside to do 3D rendering. Like, as humans, we have different computers or different compositions of computers. And our belief is strongly that agents today and going forward will need all these different compositions of computers to do different types of tasks. And so we offer that basically through an API.Swyx [00:06:19]: Yeah, to give people - I'm trying to sort of front-load all the aha moments or the wow moments so that people can, stay engaged and click like and subscribe. the market is exploding, right? Like, you have been reporting 74% month-on-month growth, and it also, it's just been growing for a while. Like, it's been going like this. And every single - It's not just you guys. It's every single.Ivan [00:06:41]: Everyone, yeah.Swyx [00:06:42]: Sort of, compute provider. I don't know if you agree with me saying compute provider or not.Ivan [00:06:48]: It's fine.Swyx [00:06:48]: Yeah. So like organically PLG-driven growth, but also enterprise is doing super well, I think I wanna rewind to January of last year when you did the pivot. Like, so you obviously called this market early, and you were positioned for it, and you are now one of the market leaders. But what was the insight that made you do the pivot?The Pivot: From Human Dev Environments to Agent SandboxesIvan [00:07:06]: The insight that made us do this pivot is the quarter before that, so end of 2024, when we had - Basically, we did a demo with - I don't I think we discussed this as well, Devin was not public. You actually gave me access to Devin at that time. So Devin.Swyx [00:07:25]: I did?Ivan [00:07:26]: Yeah, you gave me access.Swyx [00:07:26]: I don't think I was supposed.Ivan [00:07:27]: Yeah, exactly.Swyx [00:07:28]: Yeah, I.Ivan [00:07:28]: So it doesn't matter. You.Swyx [00:07:29]: Yeah. I gave like three friends access.Ivan [00:07:31]: Yeah, or it was a call and you showed it to me. It doesn't matter. but OpenDevin was available, which is now called OpenHands. And so we're like, “Oh, this seems to be a thing. This is not public. Let's take our for human automation of dev environments and take, OpenDevin and launch that as a SaaS.” And we did that. Not very many people signed up and used it, but a lot of people reached out that were building agents, and they were like, “Hey, my agent needs a compute sandbox runtime,” whatever you wanna call it. I forgot what it was called at that point. And then we were like, “Oh, amazing. This is a new market. Here is our infrastructure. Here's our product, and go.” And what we found really fast, soon, was that people did not like what we had built. It didn't work. And I remember talking to people at the beginning when we're doing this, the sandbox we're building for agents. People were like, “Oh, why is it different? It's the same thing. We have like EC2, we have VMs, we have all these things.” But we saw that everyone we gave it to, it was like 20, 30 people, they all said, “No.” Like, “This is not what we need. This sort of breaks.” And basically, me and my co-founder not knowing a lot about - ‘cause we're infra people. We're not AI people. So I basically took it upon myself to like watch every single podcast that exists, including all of, all of these and all that, and sort of get up to date, read all the blogs, like get, understand what's going on.Swyx [00:08:45]: Do you wanna shout out who else was useful, just in case people are also looking.Ivan [00:08:49]: Generally we -, I looked at There's a few of podcast, different segments and different types. So there's you guys, No Priors, Bill Gurley's was great while.Swyx [00:09:04]: VG2, yeah.Ivan [00:09:05]: Yeah, while it was around. So there's a few. 20VC is interesting from a different dynamic, and some are different dynamic. But there was, also Red Points.Swyx [00:09:14]: We're not really about the compute market.Ivan [00:09:15]: It was also already - Sorry?Swyx [00:09:16]: You're, you want - You're looking at the agent infra market.Ivan [00:09:19]: I was looking at the agent market and the AI market in general and sort of understanding who are the players, what the perception, and how that goes. And like obviously you complement this with like going to conferences, going to events, going to meetups, reading white papers, like doing all the things that you have to do to understand what's happening. And so when we figured, when we sort of had an idea of what we had to build, literally over the New Year's Eve, literally on New Year's Eve, I half vibe coded the first MVP, first minimal viable product of what Daytona is today. And I went to sleep at like 3:00 AM or something like that. I was doing - I just put my like baby daughter and wife to sleep and, Happy New Year's, and go back to just, doing this. And I sent it to my co-founder, my CTO, and he saw it in the morning. He's like, “This is absolute garbage.” “Do not show this to anybody at all, but the idea is good.” And so he took two weeks, and he rebuilt it.Swyx [00:10:09]: Did it like look like that? Listen, I - It was rough idea.Ivan [00:10:12]: Oh, not even, not even close. Like it was it was way worse. But it was like a very - It was a simplistic view of what it should be. Like, it worked, but it was not ideal. And so he went, we went down the whole, which is his job as CTO, to go, and he came back with this version. We then called all the people that had said like, “This is garbage,” a quarter ago. And we set up these calls, and we gave it to - We just demoed it to everyone. And all the calls went long, every single one. They were 15-minute calls, and they all went to like 25, 30 minutes or whatnot. And everyone said, “We need, we want access.” There was no login, just an API key, ‘cause it was just a beta or an alpha. And they said, “Oh, we want access.” And we're like, “Sure, yeah. Okay, thank you very much.” But after like the next day, if we'd not send it, every single one, like every call that we did, everyone came back, “Where is my API key?” Like everyone wanted it. We're like, “S**t.” Like this is it. Like I've never felt So one, the understanding to your point was like most people thought it was the same infrastructure for humans and agents. We understood a quarter ago it's not. We just didn't know what was the right primitive. And then when we came, and we can talk about what that is, and we gave it to these people, I've never seen, I've never experienced - I've done multiple companies in my life. I've never experienced this, that people literally call you if you do not give them access. Like they want access right now. And so it's like, okay, they don't want this. the thing that they want doesn't seem to exist, or they have not found it, and they really want what we want. And then when we understood that we're onto something, and then when you think about the size of the market, like the market for human engineers and enterprise is a very large market, so think GitLab or whatnot. But the market for every single agent that will exist ever in the future is just like, what is that market? How big is that? And we're like, “We are all in on this.” And so that is where we made sort of the cut between the old product and the new one.Bare Metal, Stateful Sandboxes, and the Lambda + EC2 ModelSwyx [00:12:02]: Yeah. But it wasn't composable at the time?Ivan [00:12:05]: It was very - It was basically just a Linux box that you could change, that you could define number of CPUs, disk, and RAM. Like that is what you could do, but you couldn't have multiple operating systems, you couldn't resize it on the fly, you couldn't add a GPU, you couldn't do like all the things. It was just the, just the first sort of variation of that, yeah.Swyx [00:12:22]: Was it bare metal from the start?Ivan [00:12:24]: It was bare metal from the start. And so the interesting thing that we thought about right away, so our.Swyx [00:12:29]: Which, give people the background, what is the normal path?Ivan [00:12:32]: Yeah, so, basically most providers run this on top of VMs. And also.Swyx [00:12:37]: Firecracker.Ivan [00:12:38]: Yeah, they run on Firecracker and VM. And so we also fire - We can get - We have multiple isolation layers and we can do that. But the common way to do it is that they, one, that the state of the machine, or the hard disk is not part of the sandbox itself. And the other thing is they're not meant to last forever. So most of them are preemptible, like they can There's a time that they can live. And so our thought was when we were going into this is, agents will be like humans in the sense of you don't want your laptop to be shut down until you're done with work. Like, and you want to close the lid and open the lid, it's the same state. So you - Agents would want that, like the pause and come back. They want those two things. But also agents really want speed, right? Can they get it? So when we thought about it's like we need something insanely fast, how to make it fast, how to make it long-running, and stateful. And so those two things, it's like combining a Lambda and an EC2, right? Those two things together. And so we didn't have an idea how others did it, ‘cause we didn't know too that there was a market around this. It was more like, okay, this is what we need, what they need. And we looked at Kubernetes, it wasn't wasn't good enough for that. We looked at Nomad, it didn't enable that. And so our history in rewriting our own scheduler at CodeAnywhere is basically what my CTO came up with. Like, he's like, “Oh, the learnings from there,” and he brought it. And the funny thing is, our third co-founder, when he saw it, he's like, “Dude, what is this? This is like 2008.” Like, we went back in time, and he's like, “Exactly.” And so the reason why Daytona is like super fast, and you see this on benchmarks, is we essentially, we run on bare metal. We have our own scheduler, we use the underlying, disk, CPU, and RAM of the underlying machine, which means your IOPS are insanely fast because there's no, there's no network between an EBS or something like that. But also the snapshot, the point in time, the templates, are also preloaded on the bare metal machines. So when you fire off a sandbox from a template or a snapshot, you're essentially directed to the bare metal machine where that snapshot is based on that NVMe drive, and then it literally just turns on that machine, and it's local. There's no network latency, anything on there. And so that is sort of the specificities that we, when we're thinking from first principles, what a computer would look like for an agent, that is what we came up with, and that's what we created.Benchmarks, 60ms Startup, and 50,000 SandboxesSwyx [00:15:02]: Yeah. I should maybe, I don't know if you endorse this, but there's someone that does compute SDK, you guys do very well on there, with like the TTI, right? I. is this a, is this a is this a relevant benchmark for you guys? I don't know.Ivan [00:15:16]: I don't know, and it changes every day. So today RKL is.Swyx [00:15:18]: I don't know what RKL is. Never heard of it.Ivan [00:15:20]: Yeah. RK, yeah, so it is there.Swyx [00:15:22]: You are, at least a third of the next tier of performance, and then, there's a lot of other better-known names that are very slow to start.Ivan [00:15:31]: Yeah. We've been the number one by far for a long time, and now there's different, there's different definitions also of sandboxes, different isolation patterns, different other things. So RKL runs it literally on the S3, the data, so it's very different, and they spin up a sandbox, spin up a container for that, so it's a different type of thing. So the definition of a sandbox is something that we can all, we all need to get along with. But yeah, we're insanely fast on getting these things, up and running. And so you can see even there that it's a zero point 0.10 to 0.11, so.Swyx [00:16:03]: Close enough. Yeah. what else do you need, right?Ivan [00:16:05]: Yeah. So the benchmarks itself, so, in this, in I don't think the benchmarks equate to market ownership or revenue or anything like that. and I've seen this with multiple benchmarks, not just in sandboxes, but in general benchmarks around.Swyx [00:16:20]: It's table stakes. It's just like.Ivan [00:16:21]: Exactly. But it doesn't hurt.Swyx [00:16:22]: Just roughly check.Ivan [00:16:22]: Like you definitely have to be up there and you have to be competing so that people know that, oh, this is definitely one of the top. Because this is only one dimension of what customers look for. There's other things like how many can you spin up consecutively? There's a feature set, there's support, there's like all different things that people look at, but you definitely have to be there, on the benchmarks.Swyx [00:16:40]: How many people do people spin up consecutively?Ivan [00:16:43]: So we have.Swyx [00:16:43]: Or concurrently, is the Concurrency, right?Ivan [00:16:45]: There's three metrics that we look at. And so one is like time to spin up one, and so our time to spin up one is 60 milliseconds with network latency. So request, spin up, reply, 60, the whole thing, 60 milliseconds. That is one. But if you wanna spin up 50,000 at once, we are now at about 75 seconds. So it takes about 75 seconds to spin up concurrently 50,000. Some others, there's public data around this, like take 2,000 seconds, which is 30 minutes. Like there's different variations of that. And then there is the so it is speed of one, speed of like multiple, and then how many can you consistently have up and running. And so we basically have right now no limit to how much we can add because we basically own our own metal. But the biggest customer of ours does like about 850,000 every single day is sort of where they're, where they're just shy of a million every single day that they're running, we do have a request for half a million concurrent, which is literally half a million CPUs somewhere running. So that's an interesting.Swyx [00:17:44]: They pay by like vCPU seconds.Ivan [00:17:47]: By seconds, yeah.Swyx [00:17:47]: Or whatever. Yeah. Okay, and so and then, and the other thing is, the sleeping and the resuming, ‘cause it's all the stateful resumption of all these things, how, what kind of workload are people putting through this, right? Like how is it Do we measure by gigabytes in memory, gigabytes in storage? I don't In like network attached storage. I, what are the costly ones of, out of all these features?Workload Economics: CPU, RAM, Network, and StorageIvan [00:18:15]: The most expensive thing are CPU.Swyx [00:18:18]: Okay. Yeah, of course.Ivan [00:18:18]: The second one, yeah Then it's RAM, then it's disk. We actually don't charge.Swyx [00:18:22]: Which is snapshotting, right?Ivan [00:18:23]: No, it's actually the, snapshotting's part of it, but basically the size of your hard disk, of your machine. So do you have 10 gigabytes, do you have 20, do you have 50, do you have whatever? And then the transference of that. Right now, currently we don't charge for, network at all at Polychron.Swyx [00:18:37]: Oh, you gotta, yeah, you gotta fix.Ivan [00:18:38]: Yeah. It is very much a it's a larger and larger part of our bill, so we're working around, that part there. Obviously, that is the least, expensive, so the hard disk is the least expensive, so it's basically CPU, RAM, for us network, ‘cause we don't charge the customer, and then hard disk, is how it's split up. But there's also different types of workloads, so we basically split it up into two types of workloads in Daytona. One is what we call background agents or long-running agents. and the other is, basically RLs and evals, which I put sort of together. And so they have very different patterns of usage, and if you look at the usage of a background And I'll just name names of companies, not specifically.Background Agents vs. RL/Evals: Two Usage ShapesSwyx [00:19:21]: Yeah, open, all hands.Ivan [00:19:23]: Yeah. So like a background agent's a Cognition, a Lovable, a like all these things are Harvey. These are all long-running, background agents. And so if you look at their usage patterns, their usage patterns are similar to human, which is like follow the sun. Basically, the usage patterns of that is like noon is probably the highest, and the midnight is the lowest, and then weekends are lower. weekday is higher.Swyx [00:19:42]: Yeah, that's a fun question. How global is it? Is it very US-centric or?Ivan [00:19:46]: The US is a large part, but we have currently, we have Asia, Europe, and the US regions.Swyx [00:19:52]: So it's quite global.Ivan [00:19:53]: Yeah, it's quite global. We have it all over. It's interesting that our I talked to you a bit about this. Our number one city by user.Swyx [00:20:01]: Hmm.Ivan [00:20:02]: Is Singapore.Swyx [00:20:04]: Oh, wow. Amazing.Ivan [00:20:05]: Which is an interesting one, right? Not by revenue, just by just like by individual head count.Swyx [00:20:09]: Really?Ivan [00:20:09]: Just like an interesting thing.Swyx [00:20:10]: Singapore is, Singapore is weirdly high in the adoption charts of AI for the population. It's like an, seven, eight million population. And it's like keeps showing up.Ivan [00:20:20]: No, it's quite interesting. We were quite shocked, and I was like, “Oh, this is interesting.” And also one that's up there.Swyx [00:20:24]: There's a reason I'm doing AI using Singapore. it's because I'm from there.Ivan [00:20:27]: We're there. We're gonna, we're gonna be there as well. and it's interesting that Japan is in the top or like Tokyo's in the top, which is in all the tech cycles it has never been. It has never been, so it's quite interesting that they're.Swyx [00:20:39]: I think the Japanese just love AI. Yeah. It's that, and then it's Brazil. That's it.Ivan [00:20:44]: Brazil has always been in.Swyx [00:20:45]: I think.Ivan [00:20:46]: Even when I look, if you look at like GitHub's data and ask historically with CodeAnywhere, it was always like US, Western Europe, and then you'd have like India, Brazil, China, like that would be there. But like Singapore was not in, specifically Japan was never in sort of that top, that top.Swyx [00:21:01]: Yeah. Weird pockets.Ivan [00:21:01]: Weird. Yeah, so it's very global.Swyx [00:21:02]: Okay, so actually that, but that's helps you to distribute your load through, all time?Ivan [00:21:08]: The interesting thing is like we have those kind of loads, but if you look at the researcher loads, they're quite different. So what they are is like if you give them concurrency of 10,000 or 50,000 or 100,000 CPUs at ARMb, when they fire off a run, it's just 100%. And then it just runs, and then it stops. So it's very, the usage pattern is squares basically, right? And it's also not follow the sun, because people will fire it off at midnight before they go to sleep but then wake up and so it's very unpredictable, so you don't know where that is. So the shapes of the usage are quite different than we have had before. And also what's interesting is when it's sort of a follow the sun, even if you have a high growth company, you can sort of predict your usage patterns and have enough capacity for that, because it's sort of, it grows in a, in a way you can project. When you have companies doing sort of like evals and RL, they're super spiky. So they're gonna come in, it's like, “We're gonna use nothing, then can we have 100,000?” Right? And then go back down. And then 100,000, go back down. So it's very different, right? And.Swyx [00:22:09]: Do you want to lock them into commits so.Ivan [00:22:11]: Yeah, we do.Swyx [00:22:12]: Yeah, okay.Ivan [00:22:12]: We so we have to lock them into some sort of commits to have that capacity, because we have to have, basically we have to have the capacity for peak. Right? And so right now, Daytona's mean utilization is 15%, 1-5.Swyx [00:22:25]: Oh my God.Ivan [00:22:26]: So it's very low.Swyx [00:22:27]: Because it's very spiky.Ivan [00:22:27]: It's very spiky, but we get up to 90%. so we have these things. And so what we're, what we're looking at right now as a company is similar to Cloudflare where you can like geo move things around, but that works really well for basically the background agent where it's follow the sun. But this, it's not. Like it's a very different shape. Obviously with scale you figure these things out, but that's an interesting new problem that we have, as a compute provider in the agent space. And when we were doing the conference recently, and so we talked to like Nikita from Neon and.Swyx [00:22:57]: I should bring it up.Ivan [00:22:58]: Parag from Parallel and whatnot, everyone has the same problem. Whereas the usage is super spiky, and this is something that has not happened before, that you have these types of like it was always, it the amplitudes were not this high, right? So it's quite interesting use case and problem solve.Compute Conference and Spiky Agent InfrastructureSwyx [00:23:12]: Yeah, I don't know if we're gonna bring this up again, but let's just talk about the conference, you had like 1,000 something people at the Warriors game, at the Sorry, where is it? What's.Ivan [00:23:22]: Chase Center.Swyx [00:23:23]: Chase Center.Ivan [00:23:23]: Chase Center.Swyx [00:23:24]: I went. It was, it was very impressive. Obviously, you can, how to throw a conference, what did you learn? you put, you pulled together all these impressive names.Ivan [00:23:33]: What I.Swyx [00:23:34]: What were you looking for?Ivan [00:23:35]: My thesis behind the Compute Conference was let's bring together people that are building infrastructure for AI agents. Because when I think of what we're building, it is the agent is the primary user, what are the ergonomics and usage patterns of agents, and so we can do that. And what I found, this was a theory, it wasn't proven, is that we all have these problems, as I touched onto. And I was, as I was talking on stage, it was like we all have the same underlying infra problems, which is this spiky workloads, unpredictable workloads that we've never had before, in human, compute or human infrastructure. And it's, again, it's the same when I was talking to Parag or when I was talking.Swyx [00:24:20]: Lynn. Nikita.Ivan [00:24:21]: Lynn, Nikita. Lynn especially, I was talking to her the other day as well. Like the It is a very interesting type of problem to solve because I can touch on Cloudflare because there's a lot of like talk about that recently as to how they solve that, which is they have a bunch of geos, and basically, as users work in different places, and depending on your tier, they can move you around the geos. And so that how, that's how they get the higher utilization. But you can sort of predict these, and it's If it's something in You'll rarely get a spike that is 10 orders of magnitude. Like you'll get a like let's say one of your customers has some like an exponential curve. What is that to I'm using Cloudflare as an example. 10%, 20%, whatever it is. I don't, I don't have this data, I'm just assessing. It's surely not 10x, right? It's surely not something there. And so how do you go out and solve this problem? And we're all solving this in different ways. So we have.Swyx [00:25:11]: She also has the same thing.Ivan [00:25:12]: Yeah, I know specifically that like Neon had that issue as well. Like how are we solving these spiky loads and things like that ‘cause we talked about it. And so the interesting thing for me to actually internalize was, yes, everyone that's building for agents first is going through this, and we're all solving similar problems, which is quite.Swyx [00:25:28]: Let me let me double-click on this. Okay. So for example, Neon, I happen to know that they're very sort of S3 oriented, right? so they're just like fully bet on S3. And you get to benefit from S3's distribution and infrastructure. So I would imagine that Neon doesn't have to care, whereas Lynn maybe has to care a bit more because obviously she's doing GPU inference. And, for listeners, we did an episode with her, one and a half years ago. And you have to care. But like, right?Ivan [00:25:54]: Parag cares for sure, and Nikita.Swyx [00:25:58]: And Parag is C of, Parallel.Ivan [00:25:59]: Parallel, yeah.Swyx [00:26:00]: Former CTO of Twitter.Ivan [00:26:01]: Twitter, yeah.Swyx [00:26:02]: They are the search.Ivan [00:26:03]: Yeah, they're search, yeah.Swyx [00:26:03]: I You and I know but the listeners don't know.Ivan [00:26:08]: Yeah, we can put it down in the screen, and so ‘cause we, when we were talking.Swyx [00:26:11]: I'll put it up on the, on the screen.Ivan [00:26:12]: Yeah, right.Swyx [00:26:12]: People can look it up if they need.Ivan [00:26:14]: Look it up. And, yes, but they still have CPU and RAM, allocation that you have to have up and running. And so CPU and RAM, you have to allocate that and have that ready. And so there's basically two ways to do it. One is you either over-provision and you can handle the bursts, or two, you basically have, I don't know if this is a term, just-in-time compute, which is like as your load becomes, as your usage comes in, you can fire off requests for VMs or bare metals at other cloud providers and then get them up and running.Swyx [00:26:43]: This is if you go above 100%, right?Ivan [00:26:45]: Yeah, this is.Swyx [00:26:46]: Like your overflow.Ivan [00:26:46]: If your overflow, like spillage or whatever you do.Swyx [00:26:48]: You probably lose money on it, but it doesn't matter, right?Ivan [00:26:50]: It, not Well, you might, you might not That is a more cost-effective way to do it but it's a slower way to do it. Because basically what you have to do is you have to like queue your requests, spin up these just-in-time compute, get it all ready, provision it, and then get your workload there. And so if the time isn't important that much, that's fine, and you can do that. But if your customer, and especially for, let's say, the RL training runs, the reason why a lot of people come to us is because GPUs are more expensive than CPUs, right? So you want your GPU running at, what, 100% the entire time. And so when you're running runs on CPUs, when the when the CPU cycle is like down and spinning up the next one, you want that to be instantaneous so that your GPU doesn't go down, right? And if you then have to like go out and provision machines, you're essentially telling the GPU that it has to wait, and that's incurring our cost. So there's things that you have to try to solve for there.RL Workloads, Declarative Images, and Kubernetes ReplacementSwyx [00:27:43]: Yeah, let's talk about the different workload, right? You said that, what was it? A few months ago, you had zero RL workload and now it's 50%.Ivan [00:27:52]: It will be this one, 50%, yeah.Swyx [00:27:54]: Let's talk about how different it is, right? Like I imagine, for example, a lot less dynamic code generation of like arbitrary code. Like here, it's probably all the same code. You're just doing parallel runs or something, I don't know.Ivan [00:28:05]: Yeah. So you'll have multiple Depends on the like for each run, you'll have a snapshot. And they, for the most part, they actually do use our declarative image builder, which is like, “Oh, we, the agent wants these dependencies, these env vars.”Swyx [00:28:17]: These ones, yeah.Ivan [00:28:18]: Yeah, the declarative image builder, it.Swyx [00:28:20]: Which is a very modal like thing that they.Ivan [00:28:22]: Yeah. And so we build it on the fly and then we propagate that snapshot, and you can spin up as many sandboxes as you want against that snapshot. And then if you have to do changes, the model can, or like it could be also be automated. It's like, “Oh, now for the next run, we need to install these things or remove these things or whatever to get, a task done,” and then it goes off and runs that. So yes, that is something that it seems that they prefer. The number one reason I found, or should I say, let's take a step back. What we are competing against in that environment is essentially managed Kubernetes. So EKS, GKE, whatever. That is what the vast majority run on. And anyone that has tried Daytona versus GKE, EKS is like, “I'm never going back.” That has always been. There's a few reasons. One is the ergonomics. So if you have, if you're using Kubernetes to spin that up, you have to essentially manage the interface interactions with that. Daytona, although as a compute provider, it's more akin to a Twilio and Stripe from a consumption perspective than it is an AWS. Like you have an API, an SDK, it's quite like easy and seamless to get these things up and running, that's one. The other is the speed to which we spin up, which we mentioned earlier, which is much faster, and the scale to which we can go to. We haven't got into features, but an interesting feature is that it's very hard to OOM, or out of memory, our sandboxes, because we can dynamically on the fly.Swyx [00:29:48]: Resize.Ivan [00:29:49]: Resize, which is like impossible on almost any other thing. There are some technologies that enable you to do that, but it's like a very hard thing. And so we actually saw this when, the Terminal Revenge team is, brought us actually. So thank you, Alex and the team, that brought us into this whole space.Swyx [00:30:05]: It's just very rare that, a framework would just say, “Guys, just use Daytona.”Ivan [00:30:11]: Yeah, I think it says it somewhere. Yeah.Swyx [00:30:13]: Yeah. I was like, “What is this?”Ivan [00:30:15]: There's all, there's multiple there, but they also mention a few other places. and so Daytona specifically-We have, the, just jumping on themes here We, I don't know where it says Data Center.Swyx [00:30:27]: I, there.Ivan [00:30:27]: Doesn't matter.Swyx [00:30:28]: There's a very strong recommendation, which is, very unusual. Which is, it's.Ivan [00:30:33]: We do not pay them for this, just.Swyx [00:30:34]: I know, yeah. They just like you.Ivan [00:30:35]: Yeah, they like us. yeah, and also a thing, so, Data Center has multiple isolation sets underneath. The customer doesn't have to know what they are. But basically we have Docker, which is a container, that's hardened with Sysbox. So it's Docker's, isolation that is a security equivalent to a VM, but it's still a container. And that is the default, and they, especially in these training workloads, really like that as an interface to be able to use just a basic Docker container, and we enable Docker and Docker. Which for these RL runs, if you need to do a Docker compose or Kubernetes, you can spin up a K3S inside of these things, which unlocks a huge amount of workloads that you can do that you cannot do on other providers. So just on that part is much more interesting. And so we went that, through that. We showed them that we could do that, and they enjoyed that quite a bit. They being the general venture people.Swyx [00:31:28]: Those people, yeah.Ivan [00:31:29]: And Harbor people.Swyx [00:31:29]: Harbor people, do are they, are they a company yet?Ivan [00:31:33]: As far, I do not know.Customer Pull, Slack Connect, and the Computer Use BetSwyx [00:31:35]: Okay. All right. Yeah. It's like super obvious that like, there's a lot of excitement and success around these things, okay, so yeah, tell us more, right? Like, this is an exploding workload, Harbor adopted you, which helped speed things along. But what are you learning as this new workload comes online?Ivan [00:31:53]: There's a couple things that we learned, which we chat about in the beginning. We, and this has led our story, as we mentioned, we like talked to a lot of customers along the way, and we add more features and more tool sets as we talk to customers. And it's interesting that And I think it's that the ecosystem is so small and/or the models get smarter, where when we see one user come with a request, we know it goes on a roadmap if like three to five customers come with the same request in that week. It's like very bizarre. It happens so many times, which is.Swyx [00:32:27]: Because they're all friends.Ivan [00:32:28]: Sorry?Swyx [00:32:28]: They all, they're all friends. They're all in the same group chat.Ivan [00:32:30]: Yeah, probably, yeah. ‘Cause and they're like, “Oh, can you do this?” And I'm like, “Okay, this is interesting. We'll put it on a feature request.” And then the next one's like, “Oh, can you do this?” “Okay.” It's all the same, right? It's always the same. And so what we try to do, and I personally try to do, I try to be on as many call, quote-unquote “sales calls” I can. I'm in every Slack channel. We literally have about 1,000 Slack Connect channels, something like that. It's an interesting, there's so many interesting things you find out when you have all the Slack channels. You can also see where people, transfer between companies. You see leave Slack channel, enter Slack channel. It's an interesting thing. Also, just I digress, I feel that Slack Connect is literally LinkedIn what it should be. You have a list.Swyx [00:33:08]: LinkedIn charges you to, use your own connections, but Slack doesn't, right? Slack is like, do it for free. It's more lock-in. It's great.Ivan [00:33:15]: Yeah. It's amazing. Yeah. It's one of the reasons.Swyx [00:33:17]: You're gonna pay Slack for life.Ivan [00:33:18]: Exactly. You're there for life. So that's interesting. And so one of the things, the newer things we were talking about earlier is we made a big bet and put a lot of investment on computer use. that is not seen publicly the light of day. We haven't GA'd that yet, but we have.Swyx [00:33:32]: Is there a thing I can pull up?Ivan [00:33:33]: There is computer use there. It's right up a bit.Swyx [00:33:36]: Oh, yeah. Okay.Ivan [00:33:38]: What we have, what we talked about and what we've seen publicly is there's this theme now about, the human emulator where And Elon from XAI has talked about this publicly, and if you think about the models today, they're actually quite sophisticated and they can do a lot of work, but they still don't have access to all the tools. Like, I'm a strong believer that the most efficient way for an agent to work is essentially headless or through, terminal or whatnot. But if we, if we look at knowledge work in general, there's about 100 million knowledge workers in the US, about a billion in the world, and knowledge workers, and the salaries of them aggregate to 10 trillion in the US 50 trillion worldwide.Swyx [00:34:24]: Wow.Ivan [00:34:25]: Something like that. And if we look at, the five most important sectors of that, so like healthcare and government and financial services and whatnot, that's about 56% of that. So let's say it's about half of that. So in the US it's about 25 trillion, and most of them, most of that work is actually still locked into legacy apps inside of Windows, which is not going anywhere for a very long time. Like, people just won't invest in that. How much of it? our assumption is the following: if, in the RPA market, which is similar market, well, not the same 25% of, these white collar, workers', work is automated. If an agent is more sophisticated, can go through more runs, figure stuff out, let's say it's, 40%, right? And so if you take 40% of that, you get to essentially, $10 trillion a year.Swyx [00:35:17]: That's a TAM.Ivan [00:35:18]: That is a that is a TAM. So that's the TAM of the models, right? That's not our, essentially ours. But you get to that size, and to be able to do that, you essentially have to give agents these computers with the legacy. So computer use, either Mac or Windows or Linux. Linux we also obviously have and others have. But Windows specifically is something very new, and the only option right now is an EC2 with, Windows or on Azure. Both of them take anywhere from three to five minutes to spin up. We've created an actual sandbox, so it's a second instead of milliseconds, but you have, point in time snapshots, you have, forking, you have all the things that you have from a sandbox, but essentially enables you to hopefully unlock all this value. And so that's been our big push and bet, but we've sort of, kept our ear to the ground. What is sort of the next things in the market?RPA Returns: Why Agents Still Need ComputersSwyx [00:36:06]: Yeah, knowledge work, and building, and sort of RPA, the next wave of RPA. I got very excited about RPA kind of during COVID times. The UI path was IPO-ing. And it was, a very hot Isn't it, Eastern European?Ivan [00:36:20]: It is, Romanian.Swyx [00:36:21]: Romanian?Yeah, it might be the only Romanian, big unicorn okay, yeah. This I don't I don't, I don't have like a I think there's, I think there's a stage being set for the resurgence of RPA, ‘cause everyone understands that, yeah, no one wants to deal with these shitty apps and no one's gonna rewrite them. Like, you just have to do, a remote operation and programmatic operation of them.Ivan [00:36:45]: If you wanna unlock it, my own setup was basically the following. So I was doing a board deck recently, last month, whatever, and I'm like, “Okay, let's just, let's just do automated.” So, all our data's in, ClickHouse and PostHog and QuickBooks, where everyone else's is, and I'm basically, connected that all to, my Cloud code, like go off and go Cloud code whatever. Go off and, here's the integrations, go do that. It pulled out the first report, which was great. It connected to Brex and all these things, pulled it, which was great, and then I say, “Okay, now pull out this, and this,” and I kept getting, really well McKinsey-style design reports, but the data said partial data. all the missing data, partial data. Like, it can't access all the things, and I got so frustrated, and so I got, I got, my Mac Mini virtual sandbox with OpenClaw. I gave it its own account in our company, and then I went to all these services and created a read-only account, so literally like an intern in your company. And so I would say, “Now go and do this report,” and it would get the same, or like, “I can't via the MCP or the API or whatever. I can't get all the information.” I'm like, “Go log in.” And it will log into the website, then go in, export the data. It'll export the data and do the thing end to end. So even for things that have today APIs, not all of it is exposed, and I to get value, I get immense value right now, but it has to be a computer usage, unfortunately, and so I spend a bunch of tokens just on that, but I get the job done. And so if even a startup like ours, and using all the hottest tools, still needs a computer agent what hope does, Goldman have to have a headless, right?Swyx [00:38:22]: Yeah, what a - Why isn't Microsoft doing this?Ivan [00:38:27]: I'm pretty sure, Satya had a post yesterday.Swyx [00:38:29]: Oh, okay. I see.Ivan [00:38:29]: Which was like, “Every agent needs a computer.”Swyx [00:38:31]: I see, I see.Ivan [00:38:32]: So they have launched something recently.Swyx [00:38:34]: Yeah, they have Microsoft Power Automate, I'm sure, I'm sure, they're gonna have their version.macOS Sandboxes, Apple Constraints, and the Windows OpportunityIvan [00:38:39]: Version of that, yeah.Swyx [00:38:39]: You're gonna try to do yours, and it - I always know there's always demand for Mac, but I know it's, tricky to host, macOS sandboxes.Ivan [00:38:49]: We will have macOS sandboxes fairly soon. The problem with macOS, OS sandboxes is, I'm deep in this, I don't know how much interesting is.Swyx [00:38:55]: No, it's.Ivan [00:38:56]: MacOS has this problem.Swyx [00:38:57]: It's a licensing thing, right?Ivan [00:38:58]: Licensing thing. So one, you're allowed to run only two parallel VMs per machine, so that's one. Two, you can only license to a different user every 24 hours. So if you come in and theoretically, if I wanna charge you per second and I charge you one second, I have to have it idle for the rest of the day. I can't have anyone else doing that. So the pricing will be different in the sense that I will have to - we would have to charge for 24 hours, and that's not even, that's not even the most difficult thing. But the, thing above that is, from a security perspective, they enable you to do memory snapshot, pause, resume, but only on the same physical drive, physical machine. And so what you can do in, Windows world or Linux world is that I can move in the background, your snapshot from one to the other and manage load, right? Here, if you wanna do that, you essentially have to have your.Swyx [00:39:49]: Yeah, snapshots. Yeah.Ivan [00:39:50]: Your.Swyx [00:39:51]: It's like.Ivan [00:39:51]: Physical machine.Swyx [00:39:52]: You can't break it up.Ivan [00:39:53]: You can't, you can't move things around that, and all of that is, that part is, from a security standpoint, if it is written. Like, I understand the security aspect of that, but it disables you from doing these agentic, like really scalable agentic workloads.Swyx [00:40:08]: You need to do a vibe-coded, clean room implementation on macOS that you can then - That's like Clean OS or something. I don't know.Ivan [00:40:17]: So. We have.Swyx [00:40:18]: ‘cause like Linux was originally like a clean room rewrite of Unix.Ivan [00:40:21]: Okay. Yeah.Swyx [00:40:21]: Or something like that, right? Like same thing to macOS. Someone needs to do it.Ivan [00:40:25]: Someone will do that, and someone will have some long-running agents for a few days to figure this stuff out. But yeah. So definitely we - we're really close to offering something ‘cause people do want it, but the pricing will be different, and the feature set will be sort of stringent.Swyx [00:40:38]: Yeah, nobody's gonna use this. like, the labs, the labs will because they want to automate macOS.Ivan [00:40:42]: They have to do RL. They have to do RL again. But even if you The - So the point is with the RL part, if you, if you do RL on macOS, then the next iteration of the model comes out, it will be able to use these tools significantly. Then you actually need to run those, that somewhere. So you're gonna have to have that, later on. And from, if anyone at Apple is listening, I very much feel that they are shooting themselves in the foot of the scale of the revenue of compute or licensing they could get if they would just enable a concurrency model similar to what you can get on a Windows and a, and Linux.Swyx [00:41:17]: Yeah. Yeah. And I'm sure they've heard this before. They just don't care. Yeah, it's And maybe they will change their mind with the new CEO.Ivan [00:41:24]: Yeah. We'll see.Swyx [00:41:25]: We'll see.Ivan [00:41:25]: High hopes.Swyx [00:41:26]: High hopes.Ivan [00:41:26]: High hopes.Swyx [00:41:27]: Okay. But I, it's very clear the market opportunity is huge in Windows, and you can go for a long time on just Windows, but your customers are gonna want both. and I think, it is interesting to me that, this is the sort of God application of agents, right? Like, I don't It was - How big was OpenClaw for you guys? Like, was it, was there, a significant bump.OpenClaw, Agent Labs, and the B2B2C Sandbox MarketIvan [00:41:54]: Not for us because we.Swyx [00:41:54]: Because you already.Ivan [00:41:55]: We're kind of positioned differently. Whereas although it's completely PLG and we have individual developers that use it, most of the users that use Daytona are sort of a B2B2C. Sort of it's either B2B or B2B2C. So, in the researcher world, it's B2B, so you're selling to, labs and neo labs and things like that. But on the long-running agents, it's mostly, from a scale revenue perspective, it's mostly B2B2C, where you have a app layer agent that uses you at a big scale.Swyx [00:42:26]: Like a Manus. Yeah.Ivan [00:42:28]: Like a Manus Lovable type of thing.Swyx [00:42:31]: Yeah. I think that's the question of, well how, um-Uh, yeah, B2B to C is basically to me what I've been calling an agent lab, which is kind of like you're not in a model lab, but you're making a very good wrapper that is a platform that other people can sign up so they don't have to code those things. Yeah, it sound, it sounds like a much better market than the direct OpenClaw market.Ivan [00:42:56]: I've like - We I've done multiple things. So the CodeAnywhere's part of our career path R in the calendar, was very much an end user developer product. And so that is great. It You can get a lot of developer love, and I feel that we do as a company have a bunch of developer love. But it's a different type, where it's people building these things. Again, it's more akin to a Twilio because you don't really run - As a person, you wouldn't run Twilio. I don't know how many people remember. It was like ask your developer billboard and whatnot. And people really love Twilio, but they only used it inside of like, “Oh, I'm building this app or service for thing.” And so we're very much directly to that. And you also know that I used to work for a competitor for Twilio, so it's kind of ingrained, in my DNA.Swyx [00:43:35]: People don't know InfoBip is that big.Ivan [00:43:38]: Yeah, it's.Swyx [00:43:39]: Because.Ivan [00:43:40]: It's a billion euro.Swyx [00:43:40]: They're all American. They're like, “Whatever's in Europe doesn't matter to me.” But like it's the, it's the same size or bigger? Same size?Ivan [00:43:46]: It's about half the size.Swyx [00:43:47]: Half the size?Ivan [00:43:48]: Yeah, about half the size.Swyx [00:43:48]: It's like, yeah.Ivan [00:43:48]: Still huge. Multiple billions a year. Yes.Swyx [00:43:51]: That's crazy.Ivan [00:43:51]: Exactly, and so that - These are like really interesting and large revenue-generating, very sticky businesses. Whereas when you're selling to the - When your focus is the end developer, it is a very hard sell because they're very price sensitive, very price conscious, very around that. And there's very It's very hard to scale. Your cap is the number of people that are willing to spin up - First of all, wanna spin that up, and then spin up multiple of these. Whereas if you're in the enterprise one, like we know everyone's talking about like how many tokens they're spending, I'm spending. Like a lot of companies today are like, “If this is our company, spend as much as you can.” Like basically that is where we're going. And so if you think about that paradigm, where you're selling to companies that say, “Spend as much as you can to generate, productivity,” versus, “Oh, I'm a single person. I have this much budget, and I'm doing this thing because it's fun or it's helping me out or whatever.” Like it is a different, it's a different go-to-market, I think, strategy.MCP, CLIs, and Sandboxes as the Agent RuntimeSwyx [00:44:50]: Yeah, there's a lot of discussion. I'm just kind of going through like the mental list of things that are in your favor, which is, for example, MCP versus CLI. Like obviously you want CLI. It's been very good for you. I feel like it's maybe a drop in the bucket or maybe it's huge. I'm just checking whether it's like these are big trends.Ivan [00:45:10]: Those things you - work well in our favor, to your point just because every.Swyx [00:45:13]: They're kind of drop in the bucket, right?Ivan [00:45:15]: I think it's like sort of all the things come together. And so there's so many things that impact that. To your point, like OpenClaw wasn't huge for us, but like having the agent SDK, from Anthropic, so or Cloud Claude Code was very interesting. The reason why it was interesting is that a lot of, let's call them app I don't know what to call them, app layer agent companies, essentially they are like, “Oh, I can create this new app, this new agent. All I need, I just use Claude Code, and I throw it into a sandbox, and then I have my interface to the human to that.” And so that enabled so many more companies to actually offer this, and then they would pull on sandbox. So that was, that was interesting. And to your point, like MCP, versus the CLI, the MCP is an interface against an API, whereas the CLI is like you can actually go do things. Like this is it. The difference between integrations and actually running scripts or data or analysis against a thing. So being able to use a CLI very well enables the agent to do more things, and it's because that people will invoke a sandbox, they'll run it in the CLI, and but it'll do anal-analysis on that data and then give you an actual result versus just, pulling data from an API source.Swyx [00:46:29]: Yeah, it's a layer of indirection basically, it's the same thing as agentic search versus RAG, which where you're.Ivan [00:46:34]: Exactly, yeah.Swyx [00:46:34]: Just like you just win whenever people put more agents into their workflow. And so like it doesn't really matter, but I'm just kinda teasing out like what else have people heard about that like it's sort of, “Oh yeah, this is another sandbox use case. Oh yeah, that's another one.” Am I, am I missing any big ones?Ivan [00:46:51]: The thing, the thing that people, which is the computer use stuff, which I think is probably the most interesting one, is, and to your point, we've talked to so many people over the last year. It's like, “Oh, like why do you need a sandbox? Why do you need this? Why this?” And to your point, it's like, “Oh, I need sandbox for this. I need sandbox for that. I need sandbox-” It's like, “Oh, I need it for every single thing.” And so basically what I, what I - and it sounds like a broken record, it's like you use a laptop every single day, right? And you are n of one. It's just you. But now imagine how And by the way, the laptop, the computer PC market, the PC market is about equal to the cloud market in total. So it's about 150, 180 billion a year. Something like that. It's about roughly the three cloud hyperscalers is about equal to like Apple, HP, Lenovo, whatever, It's a little bit less, but it's sort of like that. And now imagine And that's just like, so how big is the addressable market? What, how many people are there in the world now? What's the last data?Swyx [00:47:45]: Let's call it eight billion.Ivan [00:47:46]: Eight billion. And so let's say you can have two computer, like you have one personal and one business, whatever. Like so it's double that, right? and so that's 16 billion, right? How many agents are gonna be running in two years, in 10 years, in 100 years? Like And for every single task, they will need one of these. And so how big is that? That market is essentially quote unquote “infinite”. You will get to the point, and Dylan Patel was at the conference talking about, from SemiAnalysis, that talks usually about GPUs, was also talking about how CPUs will now be a bottleneck because it will be the constraint. You won't be able to grow, or we won't be able to have enough of these because there won't be enough CPUs to basically do.Swyx [00:48:23]: Yeah. Well, I actually had a really good podcast with Doug Oliphant, who, which was his president at SemiAnalysis, where they've basically been like, yeah, it's been a GPU shortage first, but then it's cascaded down to memory and now to CPUs.Ivan [00:48:35]: CPU, yeah.Swyx [00:48:35]: It-What's next? So networking. So, networking actually has been in shortage for a while if you're looking at, just GPU networking. But, yeah, it's really crazy the amount of computer use that's going on, yeah, cool. I, other questions are, just the one very big part is the open sourceness which you didn't have to do, your competitors don't do, like it's not, a lot of people are worried about keeping their projects open source because some competitor can just slot fork it. I don't know if there's any reflections on just being an open source company.Open Source, Trust, and Enterprise ProcurementIvan [00:49:15]: Yeah. There's a bunch. So we the original product that we did was open source.Swyx [00:49:19]: Yeah. CodeAnywhere.Ivan [00:49:20]: So doing that was actually very good for us. There's basically a saying of, What's the saying? Like, companies that are, that are doing really well, measure themselves against, free cashflow, that are kinda okay, it's EBITDA, then, it's, it goes all the way down.Swyx [00:49:36]: The worst is like GitHub stars.Ivan [00:49:37]: GitHub stars. GitHub stars are the worst, yeah. So you go all the way down to GitHub stars. And so our original one was GitHub stars. That's what we talked about, we're at the point we're talking about revenue, so we're we've gone up the stack on that. And so we started.Swyx [00:49:47]: No, profit.Ivan [00:49:48]: Yeah. We haven't, we're, we'll get there. We'll get there. But basically at that point we did stars and GitHub and it was useful, and the original variation that we did, it we split the core into its own repo and it was Apache 2.0, so very, permissive. And then we basically would bundl
Today our Packet Pushers team assembles to discuss whether the grass is greener on the NetOps or DevOps side of the telemetry fence. William of The Cloud Gambit, Scott of Total Network Operations, and Ned and Kyler of Day Two DevOps discuss the difficulties and differences of getting telemetry and state from devices across different... Read more »
“Google Cloud Next - nudno jak nie wiem co.” Szymon o 260 ogłoszeniach, w których słowo agent pada częściej niż litera D. Rebranding Vertexa w Gemini Enterprise Agent Platform brzmi jak generator nazw na pełnych obrotach, ale pod marketingowym szumem są konkrety: TPU Gen 8 z natywnym PyTorch bez przerabiania kodu (“po latach zrozumieli, że nie każdy jest Googlem”) i Apache Iceberg do query'owania danych z innych cloudów.
Amie Wei is a Sr. Solutions Engineer at HashiCorp and was the winner of last year's GKE Turns 10 Hackathon. It was Amie's first time entering a hackathon and she ended up bringing the prize home with a Cart-To-Kitchen AI Assistant. Do you have something cool to share? Some questions? Let us know: - web: kubernetespodcast.com - mail: kubernetespodcast@google.com - twitter: @kubernetespod - bluesky: @kubernetespodcast.com News of the week Kubernetes 1.36 codename Haru is here LLM-d is a CNCF Sandbox project CNCF Certification Advancement & Recertification Experience program (CARE) Agones is a CNCF Sandbox project Links from the interview Amie Wei on LinkedIn GKE turns 10 Hackathon Hackathon winner announcement Online boutique sample Bank of Anthos sample The Cart-to-Kitchen AI Assistant on GKE
Eyvonne and William sit down with Joseph Nicholson, a Network Operations Engineer with NTT DATA, to share how public speaking transformed his career and technical experience. Joseph went from a terrifying ten minute lightning talk at AutoCon 2 to presenting 45-minute sessions at conferences like NANOG. Together they discuss how conversations in conference halls influenced... Read more »
Eyvonne Sharp and William Collins speak with Sif Baksh, Principal Solutions Architect at Tines, to discuss the power of automation. Sif shares some personal stories of how he has been able to use automation to innovate and modernize networking operations. They also discuss the importance of learning AI and using it as a tool, how... Read more »
At KubeCon Europe, Google Cloud's Jago Macleod and Abdel Sghiouar argued that adopting Arm for Kubernetes workloads has shifted from a complex migration to a practical, low-friction choice. After a year of production use, Google's custom Arm-based Axion processors—powering C4A and N4A instances—are positioned as broadly viable for most containerized applications, offering strong gains in performance, cost efficiency, and energy usage compared to x86. Rather than requiring a full overhaul, moving to Arm typically involves recompiling containers for a multi-architecture target and gradually rolling out via Kubernetes practices like canary deployments. While edge cases exist, they are relatively uncommon. A key enabler is GKE's compute classes, which allow workloads to express preferences across VM types, turning infrastructure decisions into automated scheduling choices rather than manual provisioning. Ultimately, the conversation points to a larger constraint: energy. As AI workloads grow, efficiency—measured in “tokens per watt”—is emerging as the defining metric, with cost savings translating directly into greater compute capacity. Learn more from The New Stack about the latest developments around Google's work with Axion: Arm: See a Demo About Migrating a x86-Based App to ARM64 Do All Your AI Workloads Actually Require Expensive GPUs? Join our community of newsletter subscribers to stay on top of the news and at the top of your game.
Vibe coding: give AI a description of what you want, the model writes the code, you ship it, and then you hope for the best. It works great for side projects, but it can fall apart the moment you point an AI agent at production infrastructure. Today, William and Eyvonne sit down with John Capobianco,... Read more »
William Collins and Eyvonne Sharp invite Skylar Sands, Senior Automation Engineer at World Wide Technology, to discuss what it means to integrate AI into the daily workflow in a meaningful way. Together they break down the shift in the automation engineer's role now that AI can instantly generate the “toolkit” of Python, Ansible, and Bash,... Read more »
In this sponsored episode, FluidCloud co-founders Sharad Kumar and Harshit Omar sit down with William and Eyvonne to discuss how FluidCloud tackles multi-cloud portability. They detail how FluidCloud acts as a cloning platform that scans an existing cloud or VMware environment, extracts complex infrastructure configurations (including compute and storage, as well as firewall rules and... Read more »
The tech industry is split between two fantasies – that AI writes production software while you get coffee, and that everything AI touches is slop. The reality is messier and more interesting: AI tools are force multipliers for people who already know what good looks like, and an expertise amplifier disguised as an easy button. ... Read more »
William and Eyvonne tackle the biggest AI stories of early 2026. They dissect Matt Schumer’s viral “Something Big is Happening” essay – agreeing professionals need to skill up now while pushing back on the doomsday framing with real-world examples from engineering disciplines. The conversation takes a fascinating turn as Eyvonne draws a parallel between AI-assisted... Read more »
We’ve spent a decade figuring out how to (more or less) securely authenticate humans. Now AI agents are crashing the party, and identity just got a whole lot more complicated. Today we sit down with Dan Moore, Senior Director of CIAM Strategy and Identity Standards at FusionAuth, to explore the collision course between artificial intelligence... Read more »
In this episode, we sit down with Adam Zimman, author and VC advisor, to explore the world of progressive delivery and why shipping software is only the beginning. Adam shares his fascinating journey through tech—from his early days as a fire juggler to leadership roles at EMC, VMware, GitHub, and LaunchDarkly – and how those... Read more »
The industry has pivoted from scripting to automation to orchestration – and now to systems that can reason. Today we explore what AI agents mean for infrastructure with Chris Wade, Co-Founder and CTO of Itential. We also dive into the brownfield reality, the potential for vendor-specific LLMs trained on proprietary knowledge, and advice for the... Read more »
In this year-end episode, William and Eyvonne recap their experiences at AutoCon 4 in Austin, Texas. They discuss the conference’s new multi-track format, including Eyvonne’s presentation in the leadership track on why technical projects fail. The conversation dives into how AI tools like Google Gemini can augment – not replace – human creativity, from research... Read more »
In this sponsored episode recorded live at AutoCon 4 in Austin, we sit down with Peter Sprygada, Chief Architect at Itential, to discuss Itential’s on-stage announcement of FlowAI. Peter shares his journey from network engineering skeptic to AI advocate, explaining how Itential securely connects AI agents to infrastructure with enterprise-grade governance and traceability. We dive... Read more »
Recorded live at AutoCon4, William Collins and Eyvonne Sharp join forces with John Capobianco for some in the moment thoughts and reflections on the AutoCon experience – from the in-person connections to the workshops to the stage presentations. John gives us the inside story on his very own workshop and the latest version releases in... Read more »
Today we delve into the tech expertise deficit and why technical depth and decades of doing the work matter more than social media followers and content creation hype. Our guest is Russ White, engineer, author, teacher, and certification developer. We begin with current events in AI, and then investigate the differences between career and influence... Read more »
Join William Collins and Evyonne Sharp as they catch up on all things AI. They discuss the AI bubble and how it relates to venture capital, stock, and company evaluations. They talk about the AI experience for the average person, the adoption rate of AI tools, and how the AI infrastructure buildout might affect the... Read more »
GKE turned 10 in 2025! In this episode, we talk with GKE PM Gari Singh about GKE's journey from early container orchestration to AI-driven ops. Discover Autopilot, IPPR, and a bold vision for the future of Kubernetes. Do you have something cool to share? Some questions? Let us know: web: kubernetespodcast.com mail: kubernetespodcast@google.com X: @kubernetespod bluesky: @kubernetespodcast.com News of the week Cloud Native Computing Foundation Announces Knative's Graduation llm-d 0.3: Wider Well-Lit Paths for Scalable Inference vllm-project/semantic-router on github Announcing the Certified Meshery Contributor (CMC) Introducing Headlamp Plugin for Karpenter - Scaling and Visibility Links from the interview Kelsey Hightower's Kubernetes the Hard Way MiniKube Kind Docker Compose Docker Swarm GKE Autopilot Dynamic Resource Allocation (DRA) Google Cloud TPUs Node Auto Provisioning (GKE) Jax (Machine Learning Framework) Horizontal Pod Autoscaling (HPA) Serverless on Google Cloud Grafana Prometheus Kubectl-ai Vertical Pod Autoscaler (VPA) Kubernetes v1.33: In-Place Pod Resize Graduated to Beta In-place Vertical Scaling of Pods - Resize CPU and Memory Resources assigned to Containers GKE under the hood: Container-optimized compute delivers fast autoscaling for Autopilot
Michael Reid went from studying aerospace engineering to becoming CEO at Megaport, a global network-as-a-service platform. How did he get there, and what can we learn from his journey? We walk his career path, including a pivotal role scaling ThousandEyes from 74 million to over 2.4x ARR post-acquisition, and how those experiences shaped his approach to... Read more »
Network automation has a data problem. Traditional tools may hit limitations when managing complex infrastructure relationships. We explore how OpsMill’s InfraHub uses graph databases and temporal versioning to create what our guest calls “the knowledge graph of infrastructure” – enabling true version control at the database level while maintaining the flexibility to model anything from... Read more »
Shannon Kularathna is a technical writer working on the GKE docs. He contributes regularly to the upstream Kubernetes documentation. Do you have something cool to share? Some questions? Let us know: - web: kubernetespodcast.com - mail: kubernetespodcast@google.com - twitter: @kubernetespod - bluesky: @kubernetespodcast.com News of the week Reddit Post KubeCon NA KubeCon NA Colo events Jeager v2 Kubernetes 1.34 on GKE Links from the interview Shannon Kularathna Kubernetes SIG Docs Kubernetes Documentation Working in Public by Nadia Eghbal Kubernetes Code of Conduct SIG Contributor Experience Kubernetes Blog Join the Kubernetes Slack Kubernetes New Contributor Orientation Events SIG Docs Meetings Kubernetes/website GitHub Repository Good First Issues for kubernetes/website Help Wanted Issues for kubernetes/website
In this deep dive episode, we explore the evolution of networking with Avery Pennarun, Co-Founder and CEO of Tailscale. Avery shares his extensive journey through VPN technologies, from writing his first mesh VPN protocol in 1997 called “Tunnel Vision” to building Tailscale, a zero-trust networking solution. We discuss how Tailscale reimagines the OSI stack by... Read more »
John Capobianco is back! Just months after our first Model Context Protocol (MCP) discussion, John returns to showcase how this “USB-C of software” has transformed from experimental technology to an enterprise-ready solutions. We explore the game-changing OAuth 2.1 security updates, witness live demonstrations of packet analysis through natural language with Gemini CLI, and discover how... Read more »
Vyom Yadav is a software engineer in the security team at Canonical and a member of the Kubernetes Security Response Committee. We talked about the new Release theme and what major updates, deprecations and removals to expect in this version. Do you have something cool to share? Some questions? Let us know: - web: kubernetespodcast.com - mail: kubernetespodcast@google.com - twitter: @kubernetespod - bluesky: @kubernetespodcast.com News of the week GKE 10 years Hackathon Golden Kubestronauts 100 members Open Policy Agent (OPA) joined Apple DocumentDB joined the Linux Foundation Solo.io donated the agentgateway to the Linux Foundation Kubecrash.io: A platform Eng conference with a purpose Links from the interview Vyom Yadav Kubernetes 1.34 sneak peak
In this unplanned and unfiltered conversation, we dive deep into network automation realities with Ivan Pepelnjak, networking’s long standing and independent voice from ipSpace.net. We explore why automation projects fail, dissect the tooling landscape (Ansible vs. Terraform vs. Python), and discuss the cultural barriers preventing enterprises from modernizing their networks. Ivan delivers hard truths about... Read more »
Guests are Clayton Coleman and Rob Shaw. Clayton is a Core contributor to Kubernetes, the containerized cluster manager, and founding architect for OpenShift, the open source platform as a service. Clayton helped launch the shift to cloud native applications and the platforms that enable them. At Google my mission is to make Kubernetes and GKE the best place to run workloads, especially accelerated AI/ML workloads, and especially especially very large model inference at scale with the inference gateway and llm-d. Rob Shaw is an Engineering Director at Redhat and is a contributor to the vLLM project. Do you have something cool to share? Some questions? Let us know: - web: kubernetespodcast.com - mail: kubernetespodcast@google.com - twitter: @kubernetespod - bluesky: @kubernetespodcast.com News of the week Kubernetes 1.34 is expected to release end of August Kubecrash.io: A platform Eng conference with a purpose CNCF top 30 project of 2025 Links from the interview LLM-D KubeCon EU 25 Keynote: LLM-Aware Load Balancing in Kubernetes WG Serving vLLM Disaggregated Prefilling LWS: LeaderWorkerSet
Welcome to episode 315 of The Cloud Pod, where the forecast is always cloudy! Your hosts, Justin and Matt, are here to bring you the latest in cloud and AI news, including news about AI from the White House, the newest hacker exploits, and news from CloudWatch, CrowdStrike, and GKE – plus so much more. Let's get into it! Titles we almost went with this week: SharePoint and Tell: Government Secrets at Risk Zero-Day Hero: How Hackers Found SharePoint’s Achilles’ Heel Amazon Q Gets an F in Security Class Spark Joy: GitHub’s Marie Kondo Approach to App Development No Code? No Problem! GitHub Lights a Spark Under App Creation GKE Turns 10: Still Not Old Enough to Deploy Itself A Decade of Containers: Pokémon GO Caught Them All Kubernetes Engine Hits Double Digits, Still Can’t Count Past 9 Pods Account Names: The Missing Link in AWS Cost Optimization Flash Gordon Saves Your VMs from the Azure-verse The Flash: Fastest VM Monitor in the Multiverse Ctrl+AI+Delete: Rebooting America’s Artificial Intelligence Strategy The AImerican Dream: White House Plots Path to Silicon Supremacy CrowdStrike’s Year of Living Resiliently Kernel Panic at the Disco: A Recovery Story The Search is Over (But Your Copilot License Isn’t) Ground Control to Major Tom: You’re Fired GPU Booking.com: Reserve Your Neural Network’s Next Vacation Calendar Man Strikes Again: This Time He’s Scheduling Your TPUs AirBnB for AI: Short-Term Rentals for Your Machine Learning Models Claude’s World Tour: Now Playing in Every Region Going Global: Claude Gets Its Passport Stamped on Vertex AI SQS Finally Learns to Share: No More Queue Hogging The Noisy Neighbor Gets Shushed: Amazon’s Fair Play for Queues CloudWatch Gets Its AI Degree in Observability Teaching Old Logs New Tricks: CloudWatch Goes GenAI The Agent Whisperer: CloudWatch’s New AI Monitoring Powers NotebookLM Gets Its PowerPoint License Slides, Camera, AI-ction: NotebookLM Goes Visual The SSL-ippery Slope: Azure’s Managed Certs Go Public or Go Home Breaking Bad Certificates: DigiCert’s New Rules Leave Some Apps High and Dry Firewall Rules: Now with a Rough Draft Feature Azure’s New Policy: Think Before You Deploy General News 00:50 Hackers exploiting a SharePoint zero-day are seen targeting government agencies | TechCrunch Microsoft SharePoint servers are being actively exploited through a zero-day vulnerability (CVE-2025-53770), with initial attacks primarily targeting government agencies, universities, and energy companies, according to security researchers. The vulnerability affects on-premises SharePoint installations only, not cloud versions, with researchers identifying 9,000-10,000 vulnerable instances accessible from the internet that require immediate patching or disconnection. Initial exploitation appears t
Today we explore how to build sustainable tech companies with Brian Pontarelli, Founder of FusionAuth. Brian shares his path from early programming on an Apple IIe to creating innovative solutions in the complex world of customer identity and access management (CIAM). Brian argues that single-tenancy and local development capabilities are crucial for developers. He also... Read more »
How is Infrastructure-as-Code (IaC) evolving? How does user experience fit in? Today on The Cloud Gambit, Cory O'Daniel, Co-Founder & CEO of Massdriver, lends his experience as a coder, architect, and founder to help us answer these questions. Cory also discusses what it was like building and funding a startup in the 2021-2022 market, the... Read more »
The Cloud Gambit is joining the Packet Pushers network! Launched in 2023 as an independent podcast, The Cloud Gambit cuts through the hype to deliver what actually matters in cloud and AI. Hosts William Collins and Eyvonne Sharp decode the strategies, technologies, and market forces reshaping enterprise infrastructure. Built for engineers who lead, leaders who... Read more »
Send us a textApril 2025 news. A lot of news for you, dear listener, from Google, AWS and AzureTakeaway by the aiThe FinOps News podcast targets hardcore Phenops enthusiasts.Conflict can lead to better team dynamics and outcomes.Azure's VM hibernation feature offers cost-efficient workload management.Amazon EC2 introduces high-performance storage optimized instances.Bare metal instances provide significant performance improvements.Prompt optimization in Amazon Bedrock enhances AI model performance.AWS Database Migration Service now supports automatic storage scaling.Cloud gaming may benefit from new GPU instance offerings.The importance of feedback in improving cloud services is emphasized.The podcast aims to provide in-depth insights into cloud technology. Amazon S3 has significantly reduced its storage and request prices.Google Cloud's FinOps Hub 2.0 offers new tools for cost management.GKE now provides insights to optimize resource requests and limits.Azure AKS cost recommendations help identify savings opportunities.Google Cloud's backup services now support DB2 databases.Amazon Redshift introduces serverless reservations for cost predictability.AWS CodeBuild enhancements allow for better resource configuration.Microsoft Cost Management has improved export functionalities.Microsoft Copilot in Azure offers tailored prompts for cost analysis.Azure Static Web Apps will discontinue dedicated pricing plans.
In this episode, we're bringing you a curated selection of conversations from the KubeCon EU 2025 showfloor. We'll be diving into the rise of platform engineering, exploring some cutting-edge technologies, getting updates on core Kubernetes components, and hearing some truly unique user stories, like using Kubernetes on a dairy farm! Do you have something cool to share? Some questions? Let us know: - web: kubernetespodcast.com - mail: kubernetespodcast@google.com - twitter: @kubernetespod - bluesky: @kubernetespodcast.com News of the week CNCF Blog - Announcing the Automated Governance Maturity Model Kubernetes Blog CNCF Blog - Understanding Kubernetes Gateway API: A Modern Approach to Traffic Management Open Observability Summit Links from the interview NAIS at NAV, with Hans Kristian Flaatten and Audun Fauchald Strand Audun Fauchald Strand Hans Kristian Flaatten NAV (Norwegian Labor and Welfare Administration) Kubernetes Podcast 216: NAIS, with Johnny Horvi and Frode Sundby NAIS KubeCon EU 2025 Keynote: Adventures of Building a Platform as a Service for the Government - Hans Kristian Flaatten, Lead Platform Engineer, NAV & Audun Fauchald Strand, Principal Software Engineer, NAV GKE release notes Platform Engineering, with Max Körbächer and Andreas (Andi) Grabner Max Körbächer Andreas (Andi) Grabner Book: “Platform Engineering for Architects: Crafting modern platforms as a product” by Max Körbächer, Andreas Grabner, and Hilliary Lipsig Cloud Native Summit Munich Kubernetes at LinkedIn, with Ahmet Alp Balkan and Ronak Nathani Ahmet Alp Balkan Ronak Nathani Kubernetes Podcast 249: Kubernetes at LinkedIn, with Ahmet Alp Balkan and Ronak Nathani Ahmet's Blog Introducing Multi-Cluster Orchestrator: Scale your Kubernetes workloads across regions LLMs on Kubernetes, with Mofi and Abdel KubeCon EU 2025 talk: Yes You Can Run LLMs on Kubernetes - Abdel Sghiouar & Mofi Rahman, Google Cloud About the Gateway API Gateway API Inference Extension Deploy GKE Inference Gateway SIG etcd with Ivan Valdes Ivan Valdes etcd.io SIG etcd on GitHub Open Source Kubernetes, with Jago Macleod Jago Macleod Google Open Source: Kubernetes Schedmd Slurm Ray Run:ai from Nvidia Medium blog: “Deploy Slurm on GKE” by Abdel Sghiouar AI-Hypercomputer, xpk XPK (Accelerated Processing Kit, pronounced x-p-k) is a command line interface that simplifies cluster creation and workload execution on Google Kubernetes Engine (GKE). XPK generates preconfigured, training-optimized clusters and allows easy workload scheduling without any Kubernetes expertise. Cursor AI Editor Dairy Farm Automation & Banking with Kubernetes, with Clément Nussbaumer Clément Nussbaumer Talos Linux Cluster-api Cluster API is a Kubernetes subproject focused on providing declarative APIs and tooling to simplify provisioning, upgrading, and operating multiple Kubernetes clusters. KubeCon EU 2025 Talk: “Day-2'000 - Migration From Kubeadm+Ansible To ClusterAPI+Talos: A Swiss Bank's Journey” - Clément Nussbaumer, PostFinance Kubeadm Kubeadm is a tool built to provide kubeadm init and kubeadm join as best-practice "fast paths" for creating Kubernetes clusters. Being a First-Time KubeCon Attendee, with Nick Taylor Kubernetes The Hard Way K3s - “The certified Kubernetes distribution built for IoT & Edge computing” Kubernetes Ingress Controllers Kubernetes Up and Running Kubernetes Docs KubeCon EU 2025 Sponsored Keynote: The Science of Winning: Oracle Red Bull Racing's Formula with Open Source, Kubernetes and AI - Sudha Raghavan, SVP of OCI Developer Platform, Oracle
At Google Cloud Next, Bobby Allen, Group Product Manager for Google Kubernetes Engine (GKE), emphasized GKE's foundational role in supporting AI platforms. While AI dominates current tech conversations, Allen highlighted that cloud-native infrastructure like Kubernetes is what enables AI workloads to function efficiently. GKE powers key Google services like Vertex AI and is trusted by organizations including DeepMind, gaming companies, and healthcare providers for AI model training and inference. Allen explained that GKE offers scalability, elasticity, and support for AI-specific hardware like GPUs and TPUs, making it ideal for modern workloads. He noted that Kubernetes was built with capabilities—like high availability and secure orchestration—that are now essential for AI deployment. Looking forward, GKE aims to evolve into a model router, allowing developers to access the right AI model based on function, not vendor, streamlining the development experience. Allen described GKE as offering maximum control with minimal technical debt, future-proofed by Google's continued investment in open source and scalable architecture.Learn more from The New Stack about the latest insights with Google Cloud: Google Kubernetes Engine Customized for Faster AI WorkKubeCon Europe: How Google Will Evolve Kubernetes in the AI EraApache Ray Finds a Home on the Google Kubernetes EngineJoin our community of newsletter subscribers to stay on top of the news and at the top of your game.
Welcome to episode 300 of The Cloud Pod – where the forecast is always cloudy! According to the title, this week's show is taking place inside of a Dr. Suess book, but don't despair – we're not going to make you eat green eggs and ham, but we WILL give you the low down on all things Vegas. Well, Google's Next event which recently took place in Vegas anyway. Did you make any Next predictions? Titles we almost went with this week: This is the CLOUDPOD Episode 300 Tonight we dine in the Cloud The Next Chapter Now in Preview: Episode 300 A big thanks to this week's sponsor: We're sponsorless! Want to get your brand, company, or service in front of a very enthusiastic group of cloud news seekers? You've come to the right place! Send us an email or hit us up on our slack channel for more info. GCP Pre-Next 02:35 Google shakes up Gemini leadership, Google Labs head taking the reins There was a lot of Gemini news at Next – but we'll get to all that. In this particular case, there's an employee shakeup. Sissie Hsiao is stepping down from leading the Google team, and is being replaced by Josh Woodward, who is currently leading the Google Labs. 04:35 Filestore instance replication now available GCP says customers have been asking for help in meeting business and regulatory goals, and so they are releasing Filestore instance replication. This new feature offers an efficient replication point objective (RPO) that can reach 30 minutes for data change rates of 100 MB/sec. 05:16 Multi-Cluster Orchestrator for cross-region Kubernetes workloads The public preview of Multi-Cluster Orchestrator was recently announced. This lets platform and application teams optimize resource utilization, enhance application resilience, and accelerate innovation in complex, multi-cluster environments. The need for effective multi-cluster management has become essential as organizations increasingly use Kubernetes to deploy and manage their applications; Challenges such as resource scarcity, ensuring high availability, and managing deployments across diverse environments create significant operational overhead. Multi-Cluster Orchestrator addresses these challenges by providing a centralized orchestration layer that abstracts away the complexities of underlying Kubernetes infrastructure matching workloads with capacity across regions. 06:26 GKE at 65,000 nodes: Evaluating performance for simulated mixed AI workloads Recently GKE announced it can now support up to 65,000 nodes (up from 15,000.) Saint Carrie be with your CFO. 09:15
Ever tried solving DNS security across a multi-cloud, multi-cluster Kubernetes setup? In this episode recorded live at KubeCon, Ashish chats with Nimisha Mehta and Alvaro Aleman from Confluent's Kubernetes Platform Team.Together, they break down the complex journey of migrating to Cilium from default CNI plugins across Azure AKS, AWS EKS, and Google GKE. You'll hear:How Confluent manages Kubernetes clusters across cloud providers.Real-world issues encountered during DNS security migration.Deep dives into cloud-specific quirks with Azure's overlay mode, GKE's Cilium integration, and AWS's IP routing limitations.Race conditions, IP tables, reverse path filters, and practical workarounds.Lessons they'd share for any platform team planning a similar move.Guest Socials: Alvaro's Linkedin + Nimisha's Linkedin Podcast Twitter - @CloudSecPod If you want to watch videos of this LIVE STREAMED episode and past episodes - Check out our other Cloud Security Social Channels:-Cloud Security Podcast- Youtube- Cloud Security Newsletter - Cloud Security BootCampIf you are interested in AI Cybersecurity, you can check out our sister podcast - AI Cybersecurity PodcastQuestions asked:(00:00) Introduction(01:55) A bit about Alvaro(02:41) A bit about Nimisha(03:11) About their Kubecon NA talk(03:51) The Cilium use case(05:16) Using Kubernetes Native tools in all 3 cloud providers(011:41) Lessons learnt from the projectResources spoken about during the interviewConfluent's Multi-Cloud Journey to Cilium: Pitfalls and Lessons Lea... Nimisha Mehta & Alvaro Aleman
Send us a textWhat happens when you get Eyvonne, William, and our special guest Nick Eberts in the same conversation? You get a GKE party! In this episode, we dive deep into the world of multi-cluster Kubernetes management with Nick Eberts, Product Manager for GKE Fleets & Teams at Google. Nick shares his expertise on platform engineering, the evolution from traditional infrastructure to cloud-native platforms, and the challenges of managing multiple Kubernetes clusters at scale. We explore the parallels between enterprise architecture and modern platform teams, discuss the future of multi-cluster orchestration, and unpack Google's innovative work with Spanner database integration for GKE. Nick also shares his passion for contributing to open source through SIG Multi-Cluster and provides valuable guidance for those interested in getting involved with the Kubernetes community.Where to Find Nick EbertsLinkedIn: https://www.linkedin.com/in/nicholasebertsTwitter: https://twitter.com/nicholasebertsBluesky: @nickeberts.devShow LinksSIG Multi-Cluster: https://github.com/kubernetes/community/tree/master/sig-multiclusterGoogle Kubernetes Engine (GKE): https://cloud.google.com/kubernetes-engineSpanner Database: https://cloud.google.com/spannerKubernetes: https://kubernetes.io/KubeCon: https://events.linuxfoundation.org/kubecon-cloudnativecon-north-america/Argo CD: https://argoproj.github.io/cdFlux: https://fluxcd.io/CNCF: https://www.cncf.io/Follow, Like, and Subscribe!Podcast: https://www.thecloudgambit.com/YouTube: https://www.youtube.com/@TheCloudGambitLinkedIn: https://www.linkedin.com/company/thecloudgambitTwitter: https://twitter.com/TheCloudGambitTikTok: https://www.tiktok.com/@thecloudgambit
Guests are Maciej Rozacki, Product Manager on GKE for AI Training, and Wojciech Tyczyński, Software Engineer on the GKE team at Google. We explore what it means for GKE to support 65k nodes, and the open source contributions that made this possible Do you have something cool to share? Some questions? Let us know: - web: kubernetespodcast.com - mail: kubernetespodcast@google.com - twitter: @kubernetespod News of the week The Kubernetes Podcast is on Bluesky OpenTelemetry expanding into CI/CD observability Gitpod is moving away from Kubernetes OpenCost is a CNCF Incubated project Links from the interview Guests: Maciek Wojciech Kubernetes OSS Scalability thresholds PGS on the Kubernetes Podcast Batch Working Group Serving Working Group episode on the podcast Dynamic Resource Allocation Kueue Multitenancy and Fairness at Scale with Kueue SIG Scalability Links from the post-interview chat Consistent Reads from Cache Kubernetes Scalability: A Multi-Dimensional Analysis
In this episode, Frank and Steve discuss the July 2024 FinOps announcements from AWS, Azure, and Google Cloud. They cover topics such as committed use discounts, fully licensed commitments, hibernation for VMs, cost management features, and service discontinuations. The conversation is filled with insights and analysis of the latest developments in the cloud industry.There are new machine types available on GCP, such as the C3 bare metal machine types and the sixth generation Intel-based VMs.Serverless compute options, like Azure Databricks and OpenSearch serverless, provide efficient and cost-effective ways to run workloads without managing infrastructure.Storage enhancements, such as the Convert Azure Premium SSD v2 disk and Amazon FSX for Open ZFS, offer improved performance and cost optimization.Log management tools, like Azure Monitor Logs and AWS Cost Categories, help users collect, analyze, and act on telemetry data from various resources.Cost optimization features, like AWS Focus 1.0 and GCP backup commitment-based use discounts, provide ways to optimize cloud spending. Committed use discounts are available for backup for GKE, allowing users to commit to a consistent amount of usage in exchange for a discounted rate.AWS now offers fully licensed commitments and portable license commitments for purchasing VMware engine commitments, providing customers with more options for using VMware in the cloud.VM hibernation is now generally available in Azure and AWS, allowing users to save compute costs by persisting the VM state in memory and deallocating the VM.Azure has introduced a cost card in the portal, allowing engineers to estimate costs and adjust as needed before deploying VMs.Google Cloud's FinOps Hub now allows users to view their carbon footprint and estimated greenhouse gas emissions for their Google Cloud usage.AWS has announced the discontinuation of several services, including Cloud9, SimpleDB, Forecast, and CodeCommit, indicating a shift in focus and priorities.
In this episode, we talk to three active leaders who have been around since the very beginning of Kubernetes. We explore how Kubernetes has changed since its inception, with a particular focus on current efforts in Open source Kubernetes to support AI/ML style workloads. Maciej Szulik is currently taking a seat in the Kubernetes Steering Committee. He's also leading Special Interests Groups responsible for kubectl, workload and batch controllers. Maciej has been contributing to Kubernetes since the early days, jumping from one area to another where help was needed. He authored the first version of audit and helped shape its current one, as well as touched multiple other places in apimachinery. He was also responsible for designing and implementing Job and CronJob controllers. In kubectl he was responsible for the plugin mechanism and several major refactors to simplify the code. Since May 2024 he joined the ranks of Production Readiness Review (PRR) approvers helping ensure high production standards for the future of Kubernetes releases. Clayton Coleman is a long-time Kubernetes contributor, having helped launch Kubernetes as open source, being on the bootstrap steering committee, and working across a number of SIGs to make Kubernetes a reliable and powerful foundation for workloads. At Red Hat he led OpenShift's pivot onto Kubernetes and its growth across on-premise, edge, and into cloud. At Google he is now focused on enabling the next generation of key workloads, especially AI/ML in Kubernetes and on GKE. Dawn Chen has been a Principal Software Engineer at Google cloud since May 2007. Dawn has worked on an open source project called Kubernetes before the project was founded. She has been one of tech leads in both Kubernetes and GKE, and founded SIG Node from scratch. She also led Anthos platform team for the last 4 years, and mainly focuses on the core infrastructure. Prior to Kubernetes, she was the one of the tech leads for Google internal container infrastructure -- Borg for about 7 years. Outside of work, she is a wife, a mother of a 16-year old boy and a good friend. She enjoys reading, cooking, hiking and traveling. Do you have something cool to share? Some questions? Let us know: - web: kubernetespodcast.com - mail: kubernetespodcast@google.com - twitter: @kubernetespod News of the week Kubernetes 1.31 Code Freeze is on July 9th Links from the interview Kubernetes Working Group Batch Kubernetes Working Group Serving Blog: Introducing Indexed Jobs (2021) Docs: Kubernetes Jobs KEP: Elastic Indexed Jobs Docs: Kubernetes CronJobs KubeCon EU 2021: The Long, Winding and Bumpy Road to CronJob's GA - Maciej Szulik, Red Hat & Alay Patel, Red Hat KubeCon EU 2018: Writing Kube Controllers for Everyone - Maciej Szulik, Red Hat (Beginner Skill Level) Kubernetes Working Group Device Management Kubernetes Enhancement Proposal process README DockerCon 2014: The announcement of Kubernetes at DockerCon Blog: AI & Kubernetes (by Kaslin) Kueue - “Kueue is a cloud-native job queueing system for batch, HPC, AI/ML, and similar applications in a Kubernetes cluster.” Whitepaper: Large-scale cluster management at {Google} with {Borg} Email: “Containers: Introduction” - An email introducing the concept of Linux containers to the Linux community Links from the post-interview chat Blog - “Scaling Kubernetes to 7,500 nodes” - OpenAI Ray on Kubernetes
Welcome to episode 264 of the Cloud Pod Podcast – where the forecast is always cloudy! Justin, Jonathan, Ryan (and eventually) Matthew are all on hand this week – and *announcement noise* this week it's the return of the Cloud Journey Series! There's also a lot of news from Re:inforce, a ground-breaking partnership between Oracle and Google Cloud, and updates to GKE. The guys also look ahead to Finops ‘24. Titles we almost went with this week: First, AI came for Writers/Artists, then it came for Developers, and now it comes for Security… What’s Next? Amazon Reinforces my Lack of Interest in Attending – JPB rl Object Storage Malware protection, everyone, please copy it! Amazon is the last man out in Oracle next-gen partnerships Dear Google, A partnership with Oracle is not Groundbreaking when Azure already did it AWS Announces some “We finally got around to it feature updates” Protect your S3 buckets from themselves with Amazon Guard Duty The CloudPod and AI play Guess Who? with IAM Access Analyzer. A big thanks to this week's sponsor: We're sponsorless! Want to reach a dedicated audience of cloud engineers? Send us an email, or hit us up on our Slack Channel and let's chat! AWS 01:04 Simplify risk and compliance assessments with the new common control library in AWS Audit Manager AWS Audit Manager is introducing a common control library that provides common controls with predefined and pre-mapped AWS data sources. This makes it easy for the GRC teams to use the common control library to save time when mapping enterprise controls into Audit Manager for evidence collection, reducing their dependence on IT teams. You can view the compliance requirements for multiple frameworks such as PCI or HIPAA, associated with the same common control in one place, making it easier to understand your audit readiness across multiple frameworks simultaneously. Interested in pricing? You can find that info here. 01:37 Ryan – “It’s the dream! Automated evidence generation. And now with the context of known frameworks. Yeah; because that’s always the challenge, you know, are the last step of the translation – this is the control. Hey, we need all these controls to do this level of compliance.” 04:36 Centrally manage member account root email addresses across your AWS Organization 2017 Justin is really digging all these quality-of-life features coming out, and we like to think that AWS has just finally gotten to our pile of feature requests from back then. This week, it’s now easier for AWS Organizations customers to centrally manage the root email address of member accounts across their organization using the CLI, SDK and Organizations Console.
In this episode, Corey chats with Google's Nick Eberts about how Kubernetes helps manage applications across different cloud environments. They cover the benefits and challenges of using Kubernetes, especially in Google's cloud (GKE), and discuss its role in making applications more flexible and scalable. The conversation also touches on how Kubernetes supports a multi-cloud approach, simplifies the deployment process, and can potentially save costs while avoiding being tied down to one cloud provider. They wrap up by talking about best practices in cloud infrastructure and the future of cloud-native technologies.Show Highlights: (00:00) - Introduction to the episode(03:28) - Google Cloud's approach to egress charges and its impact on Kubernetes(04:33) - Data transfer costs and Kubernetes' verbose telemetry(07:23) - The nature of Kubernetes and its relationship with cloud-native principles.. (11:14) - Challenges Nick's faced managing a Kubernetes cluster in a home lab setting(13:25) - Simplifying Kubernetes with Google's Fleets(17:34) - Introduction to GKE Fleets for managing Kubernetes clusters (20:39) - Building Kubernetes-like systems for complex application portfolios (24:06) - Internal company platforms and the utility of Kubernetes for CI/CD (27:49) - Challenges and strategies of updating old systems for today's cloud environment(32:43) - The dividing line between Kubernetes and GKE from a product perspective. (35:07) - Where to find Nick (36:48) - Closing remarks About Nick:Nick is an absolute geek who would prefer to spend his time building systems but has succumbed to capitalism and moved into product management at Google. For the last 20 years he has worked as a systems engineer, solution architect, and an outbound product manager. He is currently the product manager for GKE Fleets & Teams focusing on multi-cluster capabilities that streamline GCP customers experience while building platforms on GKE. Links referenced: Duck Bill Group's website:http://www.duckbillgroup.com Nick on Twitter/X : @nicholasebertsNicholas Eberts on instagram: https://www.instagram.com/neberts1/Nick on Linkedin: https://www.linkedin.com/in/nicholaseberts/
Guest is Bill Mulligan. Bill is Community Pollinator at Isovalent working on Cilium and eBPF. We learned how to properly pronounce Isovalent and what it actually means. We also spoke in depth about eBPF, Cilium, network function in Kubernetes and more. Do you have something cool to share? Some questions? Let us know: - web: kubernetespodcast.com - mail: kubernetespodcast@google.com - twitter: @kubernetespod News of the week The Kubernetes legacy Linux package repositories are going away in January 2024 Kubernetes 1.29 is now available on GKE in the Rapid Channel The Vmware Tanzu Application Catalog is fully compliant with the SLSA Level 3 AWS extended support for Kubernetes minor versions pricing update The Kubernetes Contributor Summit Paris CFP is Open, closes Feb 4th KubeCon and CloudNativeCon EU 2024 co-located events agenda is live The Cloud Native Glossary is now available in French Blixt a new experimental LoadBalancer based on the Gateway API and eBPF Links from the interview Bill Mulligan: LinkedIn Twitter/X Covalent bonds on Wikipedia Isovalent Hybridization on Wikipedia Isovalent company site BPF - Berkeley Packet Filtering eBPF project site Fast by Friday: Why eBPF is Essential - Brendan Gregg GKE Dataplane V2 Cilium project site Hubble documentation Cilium Service Mesh Cilium annual report Cilium Certified Associate (CCA) CCA Study Guide from Isovalent on GitHub Istio Certified Associate (ICA) Certified Kubernetes Administrator (CKA) Certified Kubernetes Application Developer (CKAD) Kubernetes and Cloud Native Associate (KCNA) Resources to prepare for the CCA certification Isovalent library The World of Cilium Cisco acquired Isovalent Developing eBPF Apps in Java BGP in eBPF
Not Escaping Containers but escaping Clusters - Managed Kubernetes distributions such as Amazon EKS, Google Kubernetes Engine (GKE) and Azure Kubernetes Service (AKS) attack vectors can allow you to reach the underlying AWS Account etc. In conversation with Christophe Tafani-Dereeper & Nick Frichette, from Datadog on how this is possible in Amazon EKS and achieving potentially the same in GKE & AKS too. Thank you to our episode sponsor Sagetap Guest Socials: Nick's and Christophe's Linkedin (Nick Frichette + Christophe Tafani-Dereeper) Podcast Twitter - @CloudSecPod If you want to watch videos of this LIVE STREAMED episode and past episodes - Check out our other Cloud Security Social Channels: - Cloud Security Newsletter - Cloud Security BootCamp Questions asked: (00:00) Introduction (04:11) A bit about Christophe (04:37) A bit about Nick (05:03) What is managed Kubernetes? (06:26) Security of managed Kubernetes (09:02) Comparison between different managed Kubernetes (10:41) Service accounts and managed Kubernetes (14:22) What is container escape? (18:20) IMDSv2 for EKS (19:51) IMDSv2 in EKS vs AKES and GKE (22:01) Benchmark compliance for Kubernetes architecture (24:49) Low hanging fruits for container escape (27:17) Shared responsibility for managed Kubernetes (29:34) Fargate for Managed Kubernetes (32:00) Different ways to run containers (33:37) Escaping Managed Kubernetes cluster (38:39) Find more about this attack path (42:38) Escalation priviledge in EKS cluster (44:19) Reducing the Kubernetes attack service (44:58) MKAT for Kubernetes Security (48:23) Preventing AWS AuthConfig (50:11) Propagation Security (54:55) The fun section (57:47) Resources for latest Kubernetes updates Resources spoken about during the episode Nick Frichette's Blog - Hacking the Cloud Christophe Tafani-Dereeper' Blog Corey Quinn's - 17 ways to run containers on AWS MKAT cloudseclist newsletter
Welcome episode 226 of the Cloud Pod podcast - where the forecast is always cloudy! This week Justin, Matt and Ryan chat about all the news and announcements from Google Next, including - surprise surprise - the hot topic of AI, GKE Enterprise, Duet, Co-Pilot, Code Whisperer and more! There's even some non-Next news thrown into the episode. So whether you're interested in BART or Bard, we've got the news from SF just for you. Titles we almost went with this week:
Kendall Miller is the Co-Founder and COO of CTO Lunches, a network of engineering leaders to get trusted advice and connections. The first half of the conversation with host Victoria Guido and special guest host, Joe Ferris, CTO of thoughtbot revolves around the use, adoption, and growth of Kubernetes within the technology industry. The discussion explores Kubernetes' history, influence, and its comparison with other platforms like Heroku and WordPress, emphasizing its adaptability and potential. The second half focuses on more practical aspects of Kubernetes, including its adoption and scalability. It centers on the appropriateness of adopting Kubernetes for different projects and how it can future-proof infrastructure. The importance of translating technical language into business speak is emphasized to influence executives and others in the decision-making process and Kendall also discuss communication and empathy in tech, particularly the skill of framing questions and understanding others' emotional states. __ CTO Lunches (http://ctolunches.com/) Follow CTO Lunches on LinkedIn (https://www.linkedin.com/company/ctolunches/) or Twitter (https://twitter.com/cto_lunches). Follow Kendall Miller on LinkedIn (https://www.linkedin.com/in/kendallamiller/). Follow thoughtbot on Twitter (https://twitter.com/thoughtbot) or LinkedIn (https://www.linkedin.com/company/150727/). Become a Sponsor (https://thoughtbot.com/sponsorship) of Giant Robots! Transcript: VICTORIA: This is the Giant Robots Smashing Into Other Giant Robots Podcast, where we explore the design, development, and business of great products. I'm your host, Victoria Guido. And with me today is Kendall Miller, Co-Founder and COO of CTO Lunches, a network of engineering leaders to get trusted advice and connections. Kendall, thank you for joining me. KENDALL: Thanks for having me. I'm excited. VICTORIA: And today, we have a special guest host, Joe Ferris, CTO of thoughtbot. Joe, thank you for joining us. JOE: Hello there. Thank you for having me. KENDALL: Hi, Joe. Thanks for being here. It's exciting. VICTORIA: Yes. It's so exciting. I think this is going to be a great episode. So, Kendall, I met you at a San Diego CTO lunch recently, and I know that's not the only thing that you do. So, you're also an advisor, a board member, and CXO. So, maybe tell us a little bit more about your background. KENDALL: Gosh, my background is complicated. I've been involved in tech for a very long time. In college, I worked for a company that started Twitter about five years too soon, and then worked in the nonprofit space in China for ten years, then came back, got back involved in tech. Today, I'm usually the business guy. So, when technical founders start technical products and want help turning them into successful technical businesses, that's when they call me. So, I have the technical background. I have never been paid to write code, which is probably a good thing. But I can hang in the technical conversations for the most part, but I'm much more interested in the business side and the people leadership side of business. So that tends to be where I play. Every organization hires me to do something different. VICTORIA: Thank you for that. And I'm just curious about the CTO Lunches. Just tell me a little bit more about that. And what's the idea behind it that led you to co-found it? KENDALL: CTO Lunches has actually been around for about eight years. And I didn't start the initial incarnation of it. It was two people that got us started, and I was trying to hire one of them; one thing led to another. Actually, originally, they did not want me to join. I think, at the time, my title was COO at a company that I was working with. About six months later, I took over engineering as VP of engineering, and then they're like, you can join the group now. We're less strict about that [laughs] now. Although it is highly focused on senior engineering leaders, it's not exclusively CTOs. But the group's been in place for a very long time, just intended as a place to network, have conversation with people who are in that senior-most technical position at technical organization. So, the CTO role is a lonely role. CTOs get fired all the time. There's not a technical person at the company that doesn't think they can do the job better than them. So, the CTO is always getting feedback. You're doing this wrong. The trade-offs you're making are wrong. This isn't going where it should be going. We should automate that. Why haven't we automated that? We should switch to this other tool. I've used it before; it's 100 times better. Joe, let me know if I'm getting any of this wrong. But that's the experience that I've had. Having a place where people can get together and, you know, half the time just complain to each other, hey, this is hard, is really why the networking group exists. So, it's a listserv. And there are local lunches that started in Boulder, Colorado. It's gotten pretty global. About a year ago, a little over a year ago, I was talking with one of the people who'd gotten it started. I've been involved in the Denver chapter for most of those eight years. And I was suggesting to him that he change a few things about it, to monetize it so that he could invest in it further. And he came back a few months later and said, "I want to take your advice and do this, but I want you to come do it with me." So, we founded the company officially...I think in December is when paperwork went into place. And we started investing in it a little bit more heavily. I was living in Europe last year, so we went and put on lunches in Paris, and Lisbon, and London, and, gosh, all over the place. I'm sure I'm missing some, Amsterdam. But there's been chapters all over the U.S. and a couple of other parts of the world for a long time. VICTORIA: That reflects my experience attending a CTO lunch. It's just very casual, like, just get together and eat food and talk about what you've worked on recently, issues you're having, just get ideas and make some friends. So, I really appreciated the group, and I'm going to personally plug the San Diego Chapter has picked up again. And we're meeting next Friday down in Del Mar. And we're going to be meeting on the last Friday of every month through October. So, I'm super excited to be a part of the group. And Joe, yeah, I'm curious about your perspective. As a CTO with thoughtbot, just what are your thoughts about that kind of thing? KENDALL: Yeah. How right am I about how lonely you are, Joe? JOE: [laughs] You know, I've been lonelier since we went remote. I used to work in the office, and I was a CTO, but also, I had lunch with people, which was nice. So, I'm lonelier. But yeah, I think everybody needs a group like that, like, senior developer therapy just to talk about your woes together, drown your sorrows. KENDALL: Well, I think years ago, I heard that CTOs are the most fired C-level executive. JOE: You're making me nervous now. KENDALL: [laughs] You've been there a long time, Joe. I know you've been there a long time. If you haven't been fired yet, you probably got a little while longer in you. This will be really awkward if it's published and you've already been fired. VICTORIA: We can always edit that out afterwards. [laughter] KENDALL: Yeah, no, I think it is a particularly lonely position. And, again, I think a lot of it is the average engineer in a technical company doesn't look at the COO or the CFO or even the CEO and think I could do that. But they're all looking at the CTO and thinking, what does that person do that I can't do? It's ridiculous because most of them would make terrible CTOs because it does require some of the business sense. Or, you know, right out of the gate, they might make terrible CTOs. It actually is quite a skill to be the most technical person and speak the business language. I mean, am I right about that, Joe? Like, was that hard for you to learn? JOE: Yeah, I definitely think...so, my background is also technical. I have a background in consulting. So, I always did a lot of metaprogramming, if you will. But making that transition to thinking about organizations that way, thinking about how all the other pieces play into it, was a pretty big step for me, even before I became a CTO as a consultant. KENDALL: Well, because you can't just chase the newest, hottest technology. You have to make business trade-offs. And not everything can be resume-driven development, right? Even if that technology over there is newer or hotter, it doesn't mean you have a business model that supports it. And it doesn't mean that migrating to it can be done, right? JOE: Yeah. I mean, even beyond choosing technologies, just choosing where to invest in your software stack, like, what needs to be reworked, what doesn't, and trying to explain those trade-offs, I think, is a rare skill. Being able to explain why something would be harder than something else when you're working with the leadership to prioritize a backlog it's a puzzle. KENDALL: Well, and I think when I'm in an executive conversation, and the CTO says, "Here's the thing that I think is the best decision technically, and I think it's the wrong decision for the business because of X, Y, or Z," I'm always super impressed, right? Like, this is the right technical solution for what we want. However, we shouldn't pursue that for business reasons right now. Maybe we can in six months, but right now, we need to prioritize this other thing. I don't know, that's always when I feel like, oh, this person knows what they're doing. JOE: There's nothing more dangerous to software than a bored developer. [laughs] One nice thing about being a consultant is that I don't have to invent problems to solve with technology at my company because sooner or later, I'll run across a company that has those problems, and I'll get to use that technology. But I think a lot of people are mostly happy...they might be happy in their role. They might be happy with our team. But they're very interested in whatever is hot right now, like machine learning, AI. And so, suddenly, that surreptitiously makes its way into the tech stack. And then, years later, it's somebody's problem to maintain. KENDALL: [laughs] Well, I have a specific memory of a firm in New York City that was, you know, this is relevant to y'all as thoughtbot is that, you know, at least historically, it was, to me, the premier Ruby on Rails consulting shop. I think that's still largely y'alls focus. Am I right about that? JOE: We still do a ton of Rails, yeah. KENDALL: Okay. Well, so this organization was all Ruby on Rails. It was a big organization. They had a very large customer base. And they hired a new CTO who came in, told everybody in the company they were stupid, laid off 70% of the engineering organization, and told the CEO he was going to completely rewrite the product from scratch in .NET, and he could do it in three weeks. And I'm pretty sure the business went under about three months later [laughs] because that was just so outrageously nuts to me. JOE: It's too bad he laid everybody off beforehand. I've been in that situation where somebody tells me, "I'm going to rewrite this. It'll be ready in three weeks." And I could fight with them and try and convince them they're wrong. But I feel like somebody who's approaching that with that attitude they're missing all of the nuance and context that would make it possible to explain to them why it's not going to work. And so, it's easier to just say, "You know, take the three weeks. I'll talk to you in three weeks." But if you've already laid off your development team, that's hard [laughs] to recover from. KENDALL: That's exactly right. VICTORIA: There's got to be a name for that kind of CTO who just wants to come in and blow everything up [laughs]. Yeah, so you spend a lot of time talking to different CTOs and doing this social networking aspect. I'm wondering if there's, like, patterns that you see. You've mentioned already one about just, like, the most often getting fired. [laughs] But what are the patterns you see, like, in challenges, and then what makes someone successful in that CTO role? KENDALL: Well, oh gosh, I have so many thoughts about this. First of all, I run into a couple of different categories of CTOs. There's a lot of people who come to CTO Lunches who are small company CTOs. I mean, it makes sense that there's a lot more small company CTOs than there are big company CTOs. But the small company CTO who maybe it's their first gig in the role or they're a serial CTO. There's the fractional CTOs that come that are doing it across several different organizations at the same time, and then there's the big company CTO who shows up. And honestly, all of their problems are very different. The thing that they have in common is even at a very large organization, in that position, they can make a decision that causes the company to go under. So, there is a significant amount of volatility in the amount of power that they wield. So, what's interesting about that is not everybody understands that. And so, first of all, there's the kind of CTO that just doesn't get that, and that doesn't matter if they're fractional, or a small company CTO, or a big company CTO. If they don't understand that, they're going to cause significant problems, right? Like the person I just mentioned who said, "I can just re-platform this in three weeks in .NET." There's that. I mean, I think, as with any senior leadership position, the comfort with volatility, the ability to know what to communicate down versus across and versus up, and then the ability to speak the business language. For everybody, the CFO's job is to communicate the financial needs alongside of the business leads, right? If the CFO's sole goal is to cut costs or make sure we're running as lean as possible, they're a bad CFO. But they're not as good of a CFO as the CFO who can say, "Hey, we're underspending right here. And I can look at the numbers and know we should invest more there. How can we invest more there and invest it well?" And it's the same thing for a technology executive to be able to look at the business context and communicate it back. And there are so many CTOs that I've worked with who they're the most technical person in the room, and they know it. And as a result, they're just a jerk to everyone around them, like, everything you did here was wrong. You know, that's where they fail. And so, if they can communicate the business needs, navigate the volatility, and support a team that's going to make decisions that aren't always the same decision they're going to make, they're going to be successful. Honestly, there's very, very few CTOs that I've met like that. People who are excited to meet you at work, excited to see you succeed, excited to see that you went and built a thing is great. I mean, the reason I was VP of engineering is the CTO that I was working with at the time...it's a terrible story. There was an engineer who had seen something that we were doing on repeat all the time and, in his spare time, spent about 40 hours outside of work, not during work hours, automating this task that we were doing regularly. And it was related to standing up a whole bunch of things in our standard infrastructure. He brings it to the CTO and says, "Look what I built." And the CTO, instead of saying, "Hey, this is incredible. Thank you. This is going to save us a bunch of time. Let's iterate on it. Here's some things I'd like to tweak. Can we bring it in this direction? Can we..." you know, whatever, said, "Why is this in Python? It should be in Ansible," something like that. I can't remember. And the engineer literally burst into tears. [laughs] JOE: Oh my God. KENDALL: [laughs] Well, I mean, yeah, it was like; literally, that's why the CTO stopped managing people that day. There's a lot of examples that I have like that. Joe, I appreciate that your response is, "Oh my God." Because I think there's a lot of people who'd be like, wait, what was wrong with that? Shouldn't it have been in Ansible? JOE: [laughs] Yeah, I've seen CTOs come into primarily two groups. One is the CTO who just tells, you know, like, they make the decisions, and they tell everybody what to do. They obviously don't have all of the information because you can't be in every room all the time. And the other is the CTO, who just wants to be one of the team members and doesn't make any decisions and tries to get people to make decisions collectively on their own without any particular guidance or structure. And finding that middle spot of, like, not just saying, "Hey, everything's in Ansible," allowing for the creativity and initiative, but also coalescing the group into a single direction, I think, is what makes a good CTO. KENDALL: Well, yeah, because the CTO does have to say no, sometimes, right? Like, the best product, people say, "No." Good CTOs say, "No." There is some amount of, hey, I need you to come to me with trade-offs about this. Why are you going to make that decision? And I'm sorry, you still didn't convince me, right? Like, I mean, those are appropriate things to say. But yeah, I'm with you on that. You said they fall into two categories. But you really mean the third and that middle ground. Is it easy for you to walk that middle ground, Joe? JOE: I wouldn't say it's easy. [laughs] KENDALL: Yeah. Well, I'm always nervous to say something. I'm doing well because I know there's a report out there that can point at every time I failed at it, right? So... MID-ROLL AD: Are you an entrepreneur or start-up founder looking to gain confidence in the way forward for your idea? At thoughtbot, we know you're tight on time and investment, which is why we've created targeted 1-hour remote workshops to help you develop a concrete plan for your product's next steps. Over four interactive sessions, we work with you on research, product design sprint, critical path, and presentation prep so that you and your team are better equipped with the skills and knowledge for success. Find out how we can help you move the needle at: tbot.io/entrepreneurs. VICTORIA: Yeah, what I'm getting from what you're saying, too, is this communication ability and not just, like, to communicate clearly but with a high level of empathy. So, if you say like, "Well, why is it in Python and not Ansible?" is different than being like, "What makes Python the best solution here?" Like, it's a different way to frame the question that could put someone on the defensive that just really requires, like, a high level of emotional intelligence. And also, if they've just worked, like, an 80-hour week, [laughs] I probably would maybe choose a different time to bring those questions up and notice that they have been burning the candle at both ends and prioritize getting them some rest. So, speaking of, like, communication and getting prioritization for [inaudible 15:34], especially on, like, infrastructure teams, maybe we could talk a little bit about Kubernetes, like, when that comes up as an appropriate solution, and how you talk about it with the business. KENDALL: My background with Kubernetes is long because a company that I still work with, Fairwinds, used to be called ReactiveOps, has been in the Kubernetes space for a very long time. I think we were one of the very first companies working with Kubernetes. It was coming up that people were running into the limits of something like Heroku, right? And I think it's Kelsey Hightower who said every company wants a PaaS. They just want the Paas that they built themselves. And that's really accurate. And I think Kubernetes isn't quite a framework for building your own PaaS or isn't quite a foundation where I think of a foundation for a house. Instead, it's more like rebar and cement and somebody saying, "Good luck, buddy." You know, you still have to know how to put the rebar and cement together to even make the foundation, but it is the building blocks that help get you to a custom-built PaaS. And it's become something that a lot of people have landed on as, you know, the broadly accepted way to build cloud-native infrastructure. The reason I've been in the Kubernetes space and the space that I see Kubernetes still filling is we need to standardize on something. We can choose a cloud provider's PaaS. We can choose a third-party PaaS, or we can standardize on something like Kubernetes. And even though we're not going to migrate from AWS to Azure, the flexibility that Kubernetes gives us as a broadly adopted pattern is going to give us some ability to be future-proofed in our infrastructure in a way that previous stacks were not, you know, it was Puppet, and it was Ansible. And it was SaltStack. And it was all Terraform all the time. I'm not saying those things don't exist anymore. I'm saying Kubernetes kind of has won that battle. Joe, since you're here and I know y'all are doing some Kubernetes work now at thoughtbot, I'm curious if you agree with that characterization. JOE: Yeah, I think that's true. I think it's the center for people to coalesce around. Like, for an effort in the industry to move forward, there needs to be some common language, some common ground. And I think Kubernetes struck the right balance of being abstract. So, you can use it in different environments but still making some decisions, so you don't have to make them all. And so, like, all of the things you had to do with containers like figuring out what your data solution is going to be, what your networking solution is going to be, Kubernetes didn't even really make those decisions. [laughs] They just made a platform where those decisions can be made in a common way. And that allowed the community and the ecosystem to grow. KENDALL: I mean, I think of it a lot like WordPress; you know, WordPress is hated by many. When WordPress came out, it was hot, right? And it was PHP, which everybody was super excited about at the time. Kubernetes is going to reach a point where it's as long in the tooth and terrible as people think WordPress is, but it has become the standard. And the advantage of the standard is you can use the not standard. You can go build a website in Jekyll instead of WordPress, and there's going to be some things that are nicer about Jekyll. But because WordPress is so broadly adopted, there's a plugin for everything. And I think that's where Kubernetes sits is because it's become so widely adopted everybody's building for it. Everybody's adapting for it. If you run into a problem, you're going to find somebody else out there who has that problem. In fact, I think of one organization that I know that was on HashiCorp's Nomad. And they said, "Actually, we think Nomad has better technology through and through. But we think we're the only company at this size and scale using Nomad. And so, when we run into a problem, we can't Google for it. There's no such thing as a plugin that exists to solve this. Nobody has ever run into this before on Nomad. But there's 100 companies dealing with the same problem in Kubernetes, and there's ten solutions." And I think that's the power that it brings. VICTORIA: So, it's not just a trend that CTOs are moving towards, you think. KENDALL: I mean, I think it's already won the battle and the hockey stick of adoption. We're still right at the very bottom of that tick-up because it takes people a long time to adapt new technology like this, especially in their infrastructure. It's a big migration, to move. So, I don't think it's the widely adopted infrastructure technology even yet. I think a lot of the biggest organizations are still running on things that predate Kubernetes. But I think it has won the battle, and it is winning the battle and is going to be the thing going forward, so yeah. JOE: I think it also has a lot of room to grow still. Like, there are other technologies that I used previously, like Docker, and they were a big step up from some of the things I was doing at the time. But you quickly hit the ceiling, or it was, like, I don't know where to go with this next. I don't know what else is going to happen. Whereas with Kubernetes, there are so many directions it can go in. Like, the serverless Kubernetes offerings that are starting to pop up are extremely interesting, where, you know, you don't actually maintain a cluster or anything. You just deploy things to this ethereal cluster that always exists. And so, that sort of combination of platform as a service, function as a service, Kubernetes, as that evolves, I think there are a lot of exciting things that have yet to come in the Kubernetes space. KENDALL: Well, so say more about that, Joe, because I've been going to KubeCon for a very long time, maybe...I don't know if it's 2016 or so when I first went. And it felt for a number of years...maybe those first four-ish years it was always the people at KubeCon were the, like, big dreamers and thinkers and, like, we're here to change the future of cloud infrastructure. And this is going places, and we're excited to be here and be a part of it. And here's what I'm going to do that changes the next thing. And I feel like now if I go to KubeCon, it's a lot of people from, you know, IBM and some big bank that are, like, deep sigh, well, I have to adopt Kubernetes. I need to know what the vendors are. What do you guys do, and how does this work? Can you please teach it to me? Because I'm being told by my boss, I have to do it. I don't see that excitement around Kubernetes anymore. The excitement I see is all around further up the stack, you know, things like Wasm, WebAssembly, or eBPF, the networking things and tracing things that are possible. Maybe that's further down the stack. I guess it depends on how you think about it, but different part of the stack. So, I'm curious, touching on the serverless components of Kubernetes; sure, I get that. And I do think, increasingly, the PaaSs of the future are all going to be Kubernetes-based, whether that's exposed or not. But where are the places that you think it's still going to go? Because I feel like it's already gotten boring, maybe in a positive way. But I don't see the excitement around it like I saw a few years back. And I'm curious what else you think is going to happen. JOE: Yeah, I mean, I don't think I disagree. I think Kubernetes itself, the core concept, is, like, it's still changing. But you're right that the excitement about Kubernetes existing has gone down because it's been there for a while. But I feel like the ecosystem is still growing pretty rapidly. Like, the things you mentioned, like Wasm and Istio, and all the tools in that ecosystem that continue to grow, is where I think the interesting things will happen. Like, it's created this new lower-level layer of abstraction that makes it possible to build concepts and technology that could not have existed before. KENDALL: Yeah, well, and I'm, you know, talking to people who are working really hard at making short-run ephemeral workloads work better on things like GPUs for the sake of AI, right? Like, I mean, there is some really interesting things happening, and people are doing this in Kubernetes. So, I get that. I agree with that. It is interesting that Kubernetes has become sort of the stable thing, and now it's about who can build the interesting add-ons. It's almost like, okay, we've built Half-Life. What is Counter-Strike going to look like? You know. That's a terrible (I'm aging myself.) example. But still. VICTORIA: I think it's interesting, I mean, to look at the size of the market for platform engineering right now. In 2022, was 4.8 billion, and it's estimated to be in 10 years $41 billion. So, there is this emerging trend of different platform engineering products, different abstractions on top of Kubernetes. And I wonder what advice you would have for a technical founder who's looking to build and solve some of these interesting issues in Kubernetes and create a business around it. KENDALL: Well, okay, let me clarify that question. Are you thinking, I'm a startup, and I need to build my infrastructure, and I'm going to choose Kubernetes. What advice do I need? Or are you thinking, I am founder, and I want to go build on the Kubernetes ecosystem. What advice do you have? VICTORIA: Now I want to know the answer to both. But my question was the second one to start. KENDALL: One of the things that is hard about the Kubernetes ecosystem is there's not a ton of companies that have made a whole bunch of money in Kubernetes because, as I said, I still think we're actually really early in the adoption curve. The kinds of companies that have adopted Kubernetes are the kinds of companies that don't spend lots and lots of money on an infrastructure. [laughs] They're the kinds of companies that are fast-moving, early adopters, or, you know, those first followers, and so they're under $100 million companies for the most part. Where the JP Morgans and Chase are running Kubernetes somewhere in their stack, but they haven't adopted it across the stack to need the biggest, best tools about it. So, the first piece of advice that I'd give is, be a little wary. It's still very early to the market. Maybe now is the time to build the thing. When ReactiveOps pivoted to Kubernetes, I think it was six months of having conversations with companies who were just, like, so excited about it, and this is definitely what we want to do. But nobody was doing it yet. You know, it was, we have, like, six solid months of just excitement and nobody actually pulling the trigger. And, you know, we were a little too early to that market. And that was just the people adopting it. So, I think there is some nervousness that cloud-native solutions the only people who are really making money in Kubernetes are named Amazon, Google, and Microsoft because it's the cloud providers that are making a ton off of it. Now, there's Rancher. There is StackPointCloud. There's a few others that have had big exits in this space. But I don't think it's actually as big of a booming economy as a lot of people think, in part because EKS is an incredibly amazing product. Like, eight years ago, the thing people paid us the most to do at ReactiveOps was just stand up Kubernetes because it was so stinking hard to just get it up and working. And now you click some buttons. Anybody can go do that. So, it's changed a lot, right? And I think be wary when you're entering that ecosystem. And then, my advice to the founder that's not building on the ecosystem but just looking to adopt a technology that's going to be a future-proofed infrastructure is just adopt one of the cloud-native platforms. And there are a whole bunch of sort of default best-in-class add-ons out there that you need to throw in. Don't adopt too many because then you have to maintain them forever. That's the easiest way to get started. You can figure out all the rest of it later. But if you go use EKS, or GKE, AKS, you can get started pretty easily and build something that is going to be future-proofed. I don't know, Joe; I'm curious if you disagree with any of that. JOE: Well, I think it's interesting to think about who's making money in Kubernetes. Like, I think there might not be as many companies who are doing only Kubernetes and Kubernetes-focused products that are massively successful. But I think because it has had a good amount of adoption and because it's easier to work with something that's standardized, it has helped companies sell things that they wanted to sell anyway. Like, all the Datadog, all the Scalas, the logging companies, they all have Kubernetes add-ons. And now everybody is paying Datadog [laughs] to have a dashboard for their Kubernetes cluster. I think they're making more money than they would have been without targeting the market. And so, I think that's really...if you want to get into the market, it's not, like, I'm going to build a Kubernetes product. It's if I'm building operations and an infrastructure product, I should definitely have it work with Kubernetes, and people will want to click and install it. KENDALL: So, to be clear, you know, one of the companies that I work with is called Axiom, and they play in the same, you know, monitoring, observability space as Datadog does. And part of what makes Kubernetes interesting in that space is in a microservice environment; there's so much happening. Where are problems being caused? We don't live in a day where I can just run my code, and it tells me that there's an unexpected semicolon on line 23, right? Like, that still happens. You're still doing those things. But this microservice talking to that microservice is where things tend to break down. Did I communicate this correctly? What was sent? What was received? Where did it break down? What was the latency? And if you were doing things in the old way back when you were standing it up with, say, Ansible, or Puppet, or something like that, and you were orchestrating all of these cloud virtual machines, you had to really work hard to instrument the tracing and logging and everything involved in order to track what was going on. Whereas that's one of the magic things about Kubernetes is with a few of the add-ons or some of the things out of the box with Kubernetes, it's a couple of clicks to get so, so much of the data and have insight into where things are going and what's going wrong. And so, I 100% agree with that. Kubernetes is generating a tremendous amount of data. And if you're a data company, it's really nice to have all that come in, and it helps them make money, helps the user of Kubernetes in that situation understand where problems are happening and breaking down. Yeah, there's definitely some network effects of what Kubernetes is doing in that. I completely agree. JOE: I think there are also some interesting companies, like, where they make...Emissary, Ambassador, and they have that sort of dual -- KENDALL: Komodor, is that -- JOE: Yeah, maybe. They have open source, but then they have a product. KENDALL: You're thinking of Ambassador Labs. JOE: Yeah. Ambassador Labs, yeah. I guess I don't really know how much money they're making. But I think that's a really interesting concept as people who make open-source things then make a well-supported product built around it. KENDALL: Sure. What's interesting is, I think in the VC world, at least right now, and it may pick up again, but post-Silicon Valley Bank nearly caving in, I think that the VC tolerance for, yeah, just go get a billion open-source adopters, and we'll figure out how to monetize later I think that the tolerance for that is a lot lower than it was even six months ago. JOE: Yeah, I think you have to have a dual model right from the beginning now. KENDALL: Yeah. Agreed. VICTORIA: You got to figure out how to make money on Kubernetes before you can. [laughs] KENDALL: You know, minor detail. That's why I think services companies in this space still have a lot going for it. Because in order to even be able to sell software to a company using Kubernetes, you half the time have to go stand up Kubernetes for them because it is still that hard for so many people to really adopt it. VICTORIA: Yeah. And maybe, like, talking more about, like, when it is the right decision to start on Kubernetes because I think the question I get sometimes is just, is it overkill? Is it too much for what we're building? Especially, like, if you're building a brand-new product, you're not even sure if it's going to get adopted that widely. KENDALL: I mean, and I'm [laughs] curious your thought on this, Joe, but there's a good argument to be made that Heroku was enough for the vast majority of founders early on. But the thing is, Kubernetes isn't as hard as it used to be. Going and clicking a couple of buttons on GKE and deploying something into Kubernetes with GKE Autopilot running it's not as easy as Heroku, but it's not wildly far off. And it does substantially future-proof you. So, when is it too early? I'm not sure it's ever too early if you have an intention of scaling if you're planning on running some kind of legacy workload, like, things that are going to be stateful. Or maybe WordPress, for example, you don't probably need to deploy your WordPress blog onto Kubernetes. You can do that in your cPanel on Bluehost. I don't actually know if Bluehost even exists anymore, but I assume it's still a thing. I don't know, what would you say, Joe? JOE: I agree with that. I think it's a hard first pill to swallow. But I think the reality is that it's very easy to underestimate the infrastructure needs of even an early product. Like, it doesn't really matter what you're building. You're still going to have things like secrets management. You're still going to have to worry about networking. They just don't go away. There's no way you have a product without them. And so, rather than slowly solving all those problems from scratch on a platform that isn't designed for it, I think it's easier to just bite the bullet and use one of the managed solutions, especially, as you said, I think it's getting easier and easier. The activation energy from going from credit card to Kubernetes cluster is just getting lower. KENDALL: And so, the role of the CTO is just getting easier and easier because they can just adopt the one technology, and it's obviously Kubernetes. And it's obviously Rust, right? [laughter] Yeah, no, I'm with you. And I think if you find somebody who knows Kubernetes inside and out, it's really not going to take them long to get started. VICTORIA: Yeah, once again, change management is the biggest challenge for any new innovation coming into adoption. So, I'm curious to talk more about the influence that you need and how you influence others to come around to these types of ideas, like, in the executive suite and with the leadership of a company, especially on these types of topics, which can feel maybe a little abstract for people. KENDALL: How you influence them specifically to use Kubernetes, or just how you talk with them about technology adoption in general? Or what are you asking? VICTORIA: Yeah, like, how do I get people to not just turn their ears off when I say the word Kubernetes? [laughs] KENDALL: Yeah, I mean, I think...so I think that's where it's the technologist's job and the role of the CTO to translate these things into business speak. And that's why I'm using words like future-proofing your infrastructure is because there are companies that...I know one company that made a conscious decision that they were going to try to re-platform every single year, and that is not a good idea or sustainable for the vast majority [laughs] of companies. In fact, I can't think of a single situation where that makes sense. But if you can say to the CFO, "Hey, it's going to cost us a little bit more right now. It's going to save us substantially in the long term because this is the thing that's winning. And if we go standardize on Heroku right now, every company does eventually have to migrate off of Heroku. They either go out of business, or they get too big for it." That's the kind of thing that needs to be communicated in order to get people to adopt it. They don't care what the word is. They don't care if you're saying Kubernetes; you know, most CFOs understand it about as well as my mom does. My mom tries to bring it up in conversation because she's heard me use it. And she thinks it makes her sound smart, which maybe it does in the right climate. VICTORIA: My partner does the same thing. He says DevOps and Kubernetes all the time. I'm like; you don't know what you're talking about. [laughter] JOE: Those words do not come up in my house. KENDALL: One of my kids asked me to explain Kubernetes. And I do a whole talk, particularly at organizations where understanding Kubernetes is essential to the salespeople's role. And I give a whole talk about the background of how we got here from deploying on some servers in our back room. And, you know, what's different about the cloud, what containerization did, et cetera. And I have this long explanation. And I remember taking a deep breath and saying to my kids, "Do you really want to hear this?" And I had one son say, "Yes, absolutely." And my wife and three of the other kids all stood up and said, "No way," and left the room. So, when somebody asks me, "What do you do?" Actually, one of the key relationships I built with some of the early people at GCP when we were partnering closely with them was a person that I met, and I asked, "What do you do for a living?" And he said, "I can tell you, but it's not going to mean anything to you." And I was like, "That's what I say to people." And it turned out he was in charge of, you know, Kubernetes partnerships for Google. I can explain to you what it means and why it's important. But you're not going to be happy that I spent that time explaining it to you. VICTORIA: [laughs] That sounds awesome, though. It sounds like you built a server rack just to demo to your children what it was. KENDALL: No, no. I just talked back through the history of...that company that I mentioned that built Twitter about five years too early; we had a, you know, we had a server rack in the...literally physically in our closet that was serving up our product at the time. VICTORIA: Probably the best demo I ever saw was at Google headquarters in Herndon, and someone had built...They had 3D-printed a little mini server rack that they had put Raspberry Pis onto, and then they had Kubernetes deployed on it. And they did an automatic failover of a node to just demo how it works and had little lights that went with it. It was pretty fun. So maybe you should get one for yourself. [laughter] It's a fun project. KENDALL: They remember the things that it enables. They don't remember what it does. And so, when I say so, and so is a client that's using this technology, then they get real excited because they're like, "My dad makes that work." And I'm like, well, okay, that's kind of a stretch, but you get the idea. VICTORIA: Yeah, you got to lean into that kind of reputation in your house. KENDALL: That's right. VICTORIA: And you're like, yes, that's correct. KENDALL: That's right. [laughs] VICTORIA: I do make Kubernetes. I make all the clouds work, yeah. KENDALL: Actually, my most common explanation is Kubernetes is the plumbing of the internet. Unless you're a plumber, you don't care about the pipes. You just want your shit to flush when you use the toilet. You want the things to load when you click your buttons. You don't actually care what's going on behind the scenes, but this is what's orchestrating it increasingly across the internet. VICTORIA: So far, we've called Kubernetes WordPress or the toilet. [laughs] KENDALL: The plumbing. [laughter] VICTORIA: You are really good at selling it. [laughter] KENDALL: Hey, if you want to build a nice, clean city, you need good plumbing. You might not care what the pipes are made of, but you need good plumbing. [laughs] VICTORIA: Works for me. On that note -- [laughs] KENDALL: Yeah. Right? Right? VICTORIA: That's [inaudible 36:41] on a high note. Is there anything else that you'd like to promote? KENDALL: With regards to CTO Lunches, we have a free listserv. There are local lunches. If there isn't a local lunch where you are, it's very lightweight to start up a chapter. We often have folks who are willing to sponsor that first lunch to get you going. We do have a paid tier of CTO Lunches. If you want a small back room Slack channel of people to discuss, I think it's $99 a month. Yeah, if you're a CTO and/or a senior engineering leader and you want a community of people to process with, be it our free tier or our paid tier, we've got something for you. We're trying to invest in this to build community around it. And it's something we enjoy doing more than almost anything. Come take part. VICTORIA: You can subscribe to the show and find notes along with a complete transcript for this episode at giantrobots.fm. If you have questions or comments, email us at hosts@giantrobots.fm. And you can find me on Twitter @victori_ousg. This podcast is brought to you by thoughtbot and produced and edited by Mandy Moore. Thanks for listening. See you next time. ANNOUNCER: This podcast is brought to you by thoughtbot, your expert strategy, design, development, and product management partner. We bring digital products from idea to success and teach you how because we care. Learn more at thoughtbot.com. Special Guest: Kendall Miller.