POPULARITY
In this episode, Bill Kennedy interviews Ryan Ryke, founder of CloudLife Consulting, focusing on AWS and cloud computing. They discuss the challenges of understanding AWS billing, the importance of managing cloud storage, and the benefits of using services like Cloud Run and Fargate. The conversation also touches on the evolution of engineering perspectives on complexity, the shift towards simpler infrastructure solutions, and personal experiences with technology. 00:00 Introduction00:30 What is Ryan Doing Today?9:00 Cloud Run Experience13:00 Handling Complexity21:00 Running Local LLMs25:30 First Memory of a Computer33:20 Entering University36:30 Relevant Education42:00 Early Industry53:00 Trading Stocks1:05:00 Discovering AWS 1:10:00 Starting a Business1:16:00 Maintaining Steady Clients1:22:00 Contact Info Connect with Ryan: Linkedin: https://www.linkedin.com/in/ryanrykeX: https://x.com/itsacloudlife99Email: ryan@cloudlife.ioMentioned in this Episode:CloudLife Consulting: https://www.cloudlife.io/Fargate: https://aws.amazon.com/fargate/Want more from Ardan Labs? You can learn Go, Kubernetes, Docker & more through our video training, live events, or through our blog!Online Courses : https://ardanlabs.com/education/ Live Events : https://www.ardanlabs.com/live-training-events/ Blog : https://www.ardanlabs.com/blog Github : https://github.com/ardanlabs
Picture this. You've got a web app built with Rust and Solid.js. It started life running on a dusty on-prem server, but now it's time to move it to the cloud. The clock is ticking. You could take the well-worn AWS path: set up a VPC, configure subnets, attach an ALB, define IAM roles, and deploy with Fargate. Or you could try something different. In this episode of AWS Bites, we share the real story of migrating a monolithic containerized app to AWS App Runner. It promises to take your code, build it, deploy it, and scale it with minimal effort. But does it really deliver? We compare App Runner with Fargate based on hands-on experience. You'll learn where App Runner shines, where it gets in the way, and how we handled everything from custom domains to background job processing. You'll also hear when we would still choose Fargate, and why. If you've ever hoped for a Heroku-like experience on AWS, or you want to simplify your container deployments without giving up too much control, this episode is for you.AWS Bites is brought to you in association with fourTheorem. At fourTheorem, we believe serverless should be simple, scalable, and cost-effective — and we help teams do just that. Whether you're diving into containers, stepping into event-driven architecture, or scaling a global SaaS platform on AWS, our team has your back. Visit https://fourTheorem.com to see how we can help you build faster, better, and with more confidence using AWS cloud!
W/C 3rd March 2025Todays notes come courtesy of Wikipedia;Coles Corner is the name given to the corner of Fargate and Church Street in Sheffield, in sight of the cathedral. It was the site of the old Cole Brothers department store until it moved to Barker's Pool in 1963.The corner is famous in Sheffield as a place to meet for a first date. A plaque put up by the Rotary Club now marks the spot and ensures its local history is not forgotten.I mention this of course because of the 20th anniversary of the album by Richard Hawley, and because as I am off to Sheffield today anyway (the Blades are taking on Preston at BDBL) I might try and go and seek out that plaque. You can find some more information and pictures hereStay safe.Coles Corner - Richard HawleyCraig JoinerSea of SoulsTherapy For Me (or TFM as I now refer to it) is a bit of an audio curiosity. It started out as a mechanism for me to clear my head, with the hope that by saying stuff out loud it would act as a little bit of self-help. It's remains loose in style, fluid in terms of content and raw - it's a one take, press record and see what happens, affair.If you want to keep in touch with TFM and the other stuff I do then please follow me on Facebook, Insta, Twitter or Patreon. Thanks for getting this far.
Today's episode is with Paul Klein, founder of Browserbase. We talked about building browser infrastructure for AI agents, the future of agent authentication, and their open source framework Stagehand.* [00:00:00] Introductions* [00:04:46] AI-specific challenges in browser infrastructure* [00:07:05] Multimodality in AI-Powered Browsing* [00:12:26] Running headless browsers at scale* [00:18:46] Geolocation when proxying* [00:21:25] CAPTCHAs and Agent Auth* [00:28:21] Building “User take over” functionality* [00:33:43] Stagehand: AI web browsing framework* [00:38:58] OpenAI's Operator and computer use agents* [00:44:44] Surprising use cases of Browserbase* [00:47:18] Future of browser automation and market competition* [00:53:11] Being a solo founderTranscriptAlessio [00:00:04]: Hey everyone, welcome to the Latent Space podcast. This is Alessio, partner and CTO at Decibel Partners, and I'm joined by my co-host Swyx, founder of Smol.ai.swyx [00:00:12]: Hey, and today we are very blessed to have our friends, Paul Klein, for the fourth, the fourth, CEO of Browserbase. Welcome.Paul [00:00:21]: Thanks guys. Yeah, I'm happy to be here. I've been lucky to know both of you for like a couple of years now, I think. So it's just like we're hanging out, you know, with three ginormous microphones in front of our face. It's totally normal hangout.swyx [00:00:34]: Yeah. We've actually mentioned you on the podcast, I think, more often than any other Solaris tenant. Just because like you're one of the, you know, best performing, I think, LLM tool companies that have started up in the last couple of years.Paul [00:00:50]: Yeah, I mean, it's been a whirlwind of a year, like Browserbase is actually pretty close to our first birthday. So we are one years old. And going from, you know, starting a company as a solo founder to... To, you know, having a team of 20 people, you know, a series A, but also being able to support hundreds of AI companies that are building AI applications that go out and automate the web. It's just been like, really cool. It's been happening a little too fast. I think like collectively as an AI industry, let's just take a week off together. I took my first vacation actually two weeks ago, and Operator came out on the first day, and then a week later, DeepSeat came out. And I'm like on vacation trying to chill. I'm like, we got to build with this stuff, right? So it's been a breakneck year. But I'm super happy to be here and like talk more about all the stuff we're seeing. And I'd love to hear kind of what you guys are excited about too, and share with it, you know?swyx [00:01:39]: Where to start? So people, you've done a bunch of podcasts. I think I strongly recommend Jack Bridger's Scaling DevTools, as well as Turner Novak's The Peel. And, you know, I'm sure there's others. So you covered your Twilio story in the past, talked about StreamClub, you got acquired to Mux, and then you left to start Browserbase. So maybe we just start with what is Browserbase? Yeah.Paul [00:02:02]: Browserbase is the web browser for your AI. We're building headless browser infrastructure, which are browsers that run in a server environment that's accessible to developers via APIs and SDKs. It's really hard to run a web browser in the cloud. You guys are probably running Chrome on your computers, and that's using a lot of resources, right? So if you want to run a web browser or thousands of web browsers, you can't just spin up a bunch of lambdas. You actually need to use a secure containerized environment. You have to scale it up and down. It's a stateful system. And that infrastructure is, like, super painful. And I know that firsthand, because at my last company, StreamClub, I was CTO, and I was building our own internal headless browser infrastructure. That's actually why we sold the company, is because Mux really wanted to buy our headless browser infrastructure that we'd built. And it's just a super hard problem. And I actually told my co-founders, I would never start another company unless it was a browser infrastructure company. And it turns out that's really necessary in the age of AI, when AI can actually go out and interact with websites, click on buttons, fill in forms. You need AI to do all of that work in an actual browser running somewhere on a server. And BrowserBase powers that.swyx [00:03:08]: While you're talking about it, it occurred to me, not that you're going to be acquired or anything, but it occurred to me that it would be really funny if you became the Nikita Beer of headless browser companies. You just have one trick, and you make browser companies that get acquired.Paul [00:03:23]: I truly do only have one trick. I'm screwed if it's not for headless browsers. I'm not a Go programmer. You know, I'm in AI grant. You know, browsers is an AI grant. But we were the only company in that AI grant batch that used zero dollars on AI spend. You know, we're purely an infrastructure company. So as much as people want to ask me about reinforcement learning, I might not be the best guy to talk about that. But if you want to ask about headless browser infrastructure at scale, I can talk your ear off. So that's really my area of expertise. And it's a pretty niche thing. Like, nobody has done what we're doing at scale before. So we're happy to be the experts.swyx [00:03:59]: You do have an AI thing, stagehand. We can talk about the sort of core of browser-based first, and then maybe stagehand. Yeah, stagehand is kind of the web browsing framework. Yeah.What is Browserbase? Headless Browser Infrastructure ExplainedAlessio [00:04:10]: Yeah. Yeah. And maybe how you got to browser-based and what problems you saw. So one of the first things I worked on as a software engineer was integration testing. Sauce Labs was kind of like the main thing at the time. And then we had Selenium, we had Playbrite, we had all these different browser things. But it's always been super hard to do. So obviously you've worked on this before. When you started browser-based, what were the challenges? What were the AI-specific challenges that you saw versus, there's kind of like all the usual running browser at scale in the cloud, which has been a problem for years. What are like the AI unique things that you saw that like traditional purchase just didn't cover? Yeah.AI-specific challenges in browser infrastructurePaul [00:04:46]: First and foremost, I think back to like the first thing I did as a developer, like as a kid when I was writing code, I wanted to write code that did stuff for me. You know, I wanted to write code to automate my life. And I do that probably by using curl or beautiful soup to fetch data from a web browser. And I think I still do that now that I'm in the cloud. And the other thing that I think is a huge challenge for me is that you can't just create a web site and parse that data. And we all know that now like, you know, taking HTML and plugging that into an LLM, you can extract insights, you can summarize. So it was very clear that now like dynamic web scraping became very possible with the rise of large language models or a lot easier. And that was like a clear reason why there's been more usage of headless browsers, which are necessary because a lot of modern websites don't expose all of their page content via a simple HTTP request. You know, they actually do require you to run this type of code for a specific time. JavaScript on the page to hydrate this. Airbnb is a great example. You go to airbnb.com. A lot of that content on the page isn't there until after they run the initial hydration. So you can't just scrape it with a curl. You need to have some JavaScript run. And a browser is that JavaScript engine that's going to actually run all those requests on the page. So web data retrieval was definitely one driver of starting BrowserBase and the rise of being able to summarize that within LLM. Also, I was familiar with if I wanted to automate a website, I could write one script and that would work for one website. It was very static and deterministic. But the web is non-deterministic. The web is always changing. And until we had LLMs, there was no way to write scripts that you could write once that would run on any website. That would change with the structure of the website. Click the login button. It could mean something different on many different websites. And LLMs allow us to generate code on the fly to actually control that. So I think that rise of writing the generic automation scripts that can work on many different websites, to me, made it clear that browsers are going to be a lot more useful because now you can automate a lot more things without writing. If you wanted to write a script to book a demo call on 100 websites, previously, you had to write 100 scripts. Now you write one script that uses LLMs to generate that script. That's why we built our web browsing framework, StageHand, which does a lot of that work for you. But those two things, web data collection and then enhanced automation of many different websites, it just felt like big drivers for more browser infrastructure that would be required to power these kinds of features.Alessio [00:07:05]: And was multimodality also a big thing?Paul [00:07:08]: Now you can use the LLMs to look, even though the text in the dome might not be as friendly. Maybe my hot take is I was always kind of like, I didn't think vision would be as big of a driver. For UI automation, I felt like, you know, HTML is structured text and large language models are good with structured text. But it's clear that these computer use models are often vision driven, and they've been really pushing things forward. So definitely being multimodal, like rendering the page is required to take a screenshot to give that to a computer use model to take actions on a website. And it's just another win for browser. But I'll be honest, that wasn't what I was thinking early on. I didn't even think that we'd get here so fast with multimodality. I think we're going to have to get back to multimodal and vision models.swyx [00:07:50]: This is one of those things where I forgot to mention in my intro that I'm an investor in Browserbase. And I remember that when you pitched to me, like a lot of the stuff that we have today, we like wasn't on the original conversation. But I did have my original thesis was something that we've talked about on the podcast before, which is take the GPT store, the custom GPT store, all the every single checkbox and plugin is effectively a startup. And this was the browser one. I think the main hesitation, I think I actually took a while to get back to you. The main hesitation was that there were others. Like you're not the first hit list browser startup. It's not even your first hit list browser startup. There's always a question of like, will you be the category winner in a place where there's a bunch of incumbents, to be honest, that are bigger than you? They're just not targeted at the AI space. They don't have the backing of Nat Friedman. And there's a bunch of like, you're here in Silicon Valley. They're not. I don't know.Paul [00:08:47]: I don't know if that's, that was it, but like, there was a, yeah, I mean, like, I think I tried all the other ones and I was like, really disappointed. Like my background is from working at great developer tools, companies, and nothing had like the Vercel like experience. Um, like our biggest competitor actually is partly owned by private equity and they just jacked up their prices quite a bit. And the dashboard hasn't changed in five years. And I actually used them at my last company and tried them and I was like, oh man, like there really just needs to be something that's like the experience of these great infrastructure companies, like Stripe, like clerk, like Vercel that I use in love, but oriented towards this kind of like more specific category, which is browser infrastructure, which is really technically complex. Like a lot of stuff can go wrong on the internet when you're running a browser. The internet is very vast. There's a lot of different configurations. Like there's still websites that only work with internet explorer out there. How do you handle that when you're running your own browser infrastructure? These are the problems that we have to think about and solve at BrowserBase. And it's, it's certainly a labor of love, but I built this for me, first and foremost, I know it's super cheesy and everyone says that for like their startups, but it really, truly was for me. If you look at like the talks I've done even before BrowserBase, and I'm just like really excited to try and build a category defining infrastructure company. And it's, it's rare to have a new category of infrastructure exists. We're here in the Chroma offices and like, you know, vector databases is a new category of infrastructure. Is it, is it, I mean, we can, we're in their office, so, you know, we can, we can debate that one later. That is one.Multimodality in AI-Powered Browsingswyx [00:10:16]: That's one of the industry debates.Paul [00:10:17]: I guess we go back to the LLMOS talk that Karpathy gave way long ago. And like the browser box was very clearly there and it seemed like the people who were building in this space also agreed that browsers are a core primitive of infrastructure for the LLMOS that's going to exist in the future. And nobody was building something there that I wanted to use. So I had to go build it myself.swyx [00:10:38]: Yeah. I mean, exactly that talk that, that honestly, that diagram, every box is a startup and there's the code box and then there's the. The browser box. I think at some point they will start clashing there. There's always the question of the, are you a point solution or are you the sort of all in one? And I think the point solutions tend to win quickly, but then the only ones have a very tight cohesive experience. Yeah. Let's talk about just the hard problems of browser base you have on your website, which is beautiful. Thank you. Was there an agency that you used for that? Yeah. Herb.paris.Paul [00:11:11]: They're amazing. Herb.paris. Yeah. It's H-E-R-V-E. I highly recommend for developers. Developer tools, founders to work with consumer agencies because they end up building beautiful things and the Parisians know how to build beautiful interfaces. So I got to give prep.swyx [00:11:24]: And chat apps, apparently are, they are very fast. Oh yeah. The Mistral chat. Yeah. Mistral. Yeah.Paul [00:11:31]: Late chat.swyx [00:11:31]: Late chat. And then your videos as well, it was professionally shot, right? The series A video. Yeah.Alessio [00:11:36]: Nico did the videos. He's amazing. Not the initial video that you shot at the new one. First one was Austin.Paul [00:11:41]: Another, another video pretty surprised. But yeah, I mean, like, I think when you think about how you talk about your company. You have to think about the way you present yourself. It's, you know, as a developer, you think you evaluate a company based on like the API reliability and the P 95, but a lot of developers say, is the website good? Is the message clear? Do I like trust this founder? I'm building my whole feature on. So I've tried to nail that as well as like the reliability of the infrastructure. You're right. It's very hard. And there's a lot of kind of foot guns that you run into when running headless browsers at scale. Right.Competing with Existing Headless Browser Solutionsswyx [00:12:10]: So let's pick one. You have eight features here. Seamless integration. Scalability. Fast or speed. Secure. Observable. Stealth. That's interesting. Extensible and developer first. What comes to your mind as like the top two, three hardest ones? Yeah.Running headless browsers at scalePaul [00:12:26]: I think just running headless browsers at scale is like the hardest one. And maybe can I nerd out for a second? Is that okay? I heard this is a technical audience, so I'll talk to the other nerds. Whoa. They were listening. Yeah. They're upset. They're ready. The AGI is angry. Okay. So. So how do you run a browser in the cloud? Let's start with that, right? So let's say you're using a popular browser automation framework like Puppeteer, Playwright, and Selenium. Maybe you've written a code, some code locally on your computer that opens up Google. It finds the search bar and then types in, you know, search for Latent Space and hits the search button. That script works great locally. You can see the little browser open up. You want to take that to production. You want to run the script in a cloud environment. So when your laptop is closed, your browser is doing something. The browser is doing something. Well, I, we use Amazon. You can see the little browser open up. You know, the first thing I'd reach for is probably like some sort of serverless infrastructure. I would probably try and deploy on a Lambda. But Chrome itself is too big to run on a Lambda. It's over 250 megabytes. So you can't easily start it on a Lambda. So you maybe have to use something like Lambda layers to squeeze it in there. Maybe use a different Chromium build that's lighter. And you get it on the Lambda. Great. It works. But it runs super slowly. It's because Lambdas are very like resource limited. They only run like with one vCPU. You can run one process at a time. Remember, Chromium is super beefy. It's barely running on my MacBook Air. I'm still downloading it from a pre-run. Yeah, from the test earlier, right? I'm joking. But it's big, you know? So like Lambda, it just won't work really well. Maybe it'll work, but you need something faster. Your users want something faster. Okay. Well, let's put it on a beefier instance. Let's get an EC2 server running. Let's throw Chromium on there. Great. Okay. I can, that works well with one user. But what if I want to run like 10 Chromium instances, one for each of my users? Okay. Well, I might need two EC2 instances. Maybe 10. All of a sudden, you have multiple EC2 instances. This sounds like a problem for Kubernetes and Docker, right? Now, all of a sudden, you're using ECS or EKS, the Kubernetes or container solutions by Amazon. You're spending up and down containers, and you're spending a whole engineer's time on kind of maintaining this stateful distributed system. Those are some of the worst systems to run because when it's a stateful distributed system, it means that you are bound by the connections to that thing. You have to keep the browser open while someone is working with it, right? That's just a painful architecture to run. And there's all this other little gotchas with Chromium, like Chromium, which is the open source version of Chrome, by the way. You have to install all these fonts. You want emojis working in your browsers because your vision model is looking for the emoji. You need to make sure you have the emoji fonts. You need to make sure you have all the right extensions configured, like, oh, do you want ad blocking? How do you configure that? How do you actually record all these browser sessions? Like it's a headless browser. You can't look at it. So you need to have some sort of observability. Maybe you're recording videos and storing those somewhere. It all kind of adds up to be this just giant monster piece of your project when all you wanted to do was run a lot of browsers in production for this little script to go to google.com and search. And when I see a complex distributed system, I see an opportunity to build a great infrastructure company. And we really abstract that away with Browserbase where our customers can use these existing frameworks, Playwright, Publisher, Selenium, or our own stagehand and connect to our browsers in a serverless-like way. And control them, and then just disconnect when they're done. And they don't have to think about the complex distributed system behind all of that. They just get a browser running anywhere, anytime. Really easy to connect to.swyx [00:15:55]: I'm sure you have questions. My standard question with anything, so essentially you're a serverless browser company, and there's been other serverless things that I'm familiar with in the past, serverless GPUs, serverless website hosting. That's where I come from with Netlify. One question is just like, you promised to spin up thousands of servers. You promised to spin up thousands of browsers in milliseconds. I feel like there's no real solution that does that yet. And I'm just kind of curious how. The only solution I know, which is to kind of keep a kind of warm pool of servers around, which is expensive, but maybe not so expensive because it's just CPUs. So I'm just like, you know. Yeah.Browsers as a Core Primitive in AI InfrastructurePaul [00:16:36]: You nailed it, right? I mean, how do you offer a serverless-like experience with something that is clearly not serverless, right? And the answer is, you need to be able to run... We run many browsers on single nodes. We use Kubernetes at browser base. So we have many pods that are being scheduled. We have to predictably schedule them up or down. Yes, thousands of browsers in milliseconds is the best case scenario. If you hit us with 10,000 requests, you may hit a slower cold start, right? So we've done a lot of work on predictive scaling and being able to kind of route stuff to different regions where we have multiple regions of browser base where we have different pools available. You can also pick the region you want to go to based on like lower latency, round trip, time latency. It's very important with these types of things. There's a lot of requests going over the wire. So for us, like having a VM like Firecracker powering everything under the hood allows us to be super nimble and spin things up or down really quickly with strong multi-tenancy. But in the end, this is like the complex infrastructural challenges that we have to kind of deal with at browser base. And we have a lot more stuff on our roadmap to allow customers to have more levers to pull to exchange, do you want really fast browser startup times or do you want really low costs? And if you're willing to be more flexible on that, we may be able to kind of like work better for your use cases.swyx [00:17:44]: Since you used Firecracker, shouldn't Fargate do that for you or did you have to go lower level than that? We had to go lower level than that.Paul [00:17:51]: I find this a lot with Fargate customers, which is alarming for Fargate. We used to be a giant Fargate customer. Actually, the first version of browser base was ECS and Fargate. And unfortunately, it's a great product. I think we were actually the largest Fargate customer in our region for a little while. No, what? Yeah, seriously. And unfortunately, it's a great product, but I think if you're an infrastructure company, you actually have to have a deeper level of control over these primitives. I think it's the same thing is true with databases. We've used other database providers and I think-swyx [00:18:21]: Yeah, serverless Postgres.Paul [00:18:23]: Shocker. When you're an infrastructure company, you're on the hook if any provider has an outage. And I can't tell my customers like, hey, we went down because so-and-so went down. That's not acceptable. So for us, we've really moved to bringing things internally. It's kind of opposite of what we preach. We tell our customers, don't build this in-house, but then we're like, we build a lot of stuff in-house. But I think it just really depends on what is in the critical path. We try and have deep ownership of that.Alessio [00:18:46]: On the distributed location side, how does that work for the web where you might get sort of different content in different locations, but the customer is expecting, you know, if you're in the US, I'm expecting the US version. But if you're spinning up my browser in France, I might get the French version. Yeah.Paul [00:19:02]: Yeah. That's a good question. Well, generally, like on the localization, there is a thing called locale in the browser. You can set like what your locale is. If you're like in the ENUS browser or not, but some things do IP, IP based routing. And in that case, you may want to have a proxy. Like let's say you're running something in the, in Europe, but you want to make sure you're showing up from the US. You may want to use one of our proxy features so you can turn on proxies to say like, make sure these connections always come from the United States, which is necessary too, because when you're browsing the web, you're coming from like a, you know, data center IP, and that can make things a lot harder to browse web. So we do have kind of like this proxy super network. Yeah. We have a proxy for you based on where you're going, so you can reliably automate the web. But if you get scheduled in Europe, that doesn't happen as much. We try and schedule you as close to, you know, your origin that you're trying to go to. But generally you have control over the regions you can put your browsers in. So you can specify West one or East one or Europe. We only have one region of Europe right now, actually. Yeah.Alessio [00:19:55]: What's harder, the browser or the proxy? I feel like to me, it feels like actually proxying reliably at scale. It's much harder than spending up browsers at scale. I'm curious. It's all hard.Paul [00:20:06]: It's layers of hard, right? Yeah. I think it's different levels of hard. I think the thing with the proxy infrastructure is that we work with many different web proxy providers and some are better than others. Some have good days, some have bad days. And our customers who've built browser infrastructure on their own, they have to go and deal with sketchy actors. Like first they figure out their own browser infrastructure and then they got to go buy a proxy. And it's like you can pay in Bitcoin and it just kind of feels a little sus, right? It's like you're buying drugs when you're trying to get a proxy online. We have like deep relationships with these counterparties. We're able to audit them and say, is this proxy being sourced ethically? Like it's not running on someone's TV somewhere. Is it free range? Yeah. Free range organic proxies, right? Right. We do a level of diligence. We're SOC 2. So we have to understand what is going on here. But then we're able to make sure that like we route around proxy providers not working. There's proxy providers who will just, the proxy will stop working all of a sudden. And then if you don't have redundant proxying on your own browsers, that's hard down for you or you may get some serious impacts there. With us, like we intelligently know, hey, this proxy is not working. Let's go to this one. And you can kind of build a network of multiple providers to really guarantee the best uptime for our customers. Yeah. So you don't own any proxies? We don't own any proxies. You're right. The team has been saying who wants to like take home a little proxy server, but not yet. We're not there yet. You know?swyx [00:21:25]: It's a very mature market. I don't think you should build that yourself. Like you should just be a super customer of them. Yeah. Scraping, I think, is the main use case for that. I guess. Well, that leads us into CAPTCHAs and also off, but let's talk about CAPTCHAs. You had a little spiel that you wanted to talk about CAPTCHA stuff.Challenges of Scaling Browser InfrastructurePaul [00:21:43]: Oh, yeah. I was just, I think a lot of people ask, if you're thinking about proxies, you're thinking about CAPTCHAs too. I think it's the same thing. You can go buy CAPTCHA solvers online, but it's the same buying experience. It's some sketchy website, you have to integrate it. It's not fun to buy these things and you can't really trust that the docs are bad. What Browserbase does is we integrate a bunch of different CAPTCHAs. We do some stuff in-house, but generally we just integrate with a bunch of known vendors and continually monitor and maintain these things and say, is this working or not? Can we route around it or not? These are CAPTCHA solvers. CAPTCHA solvers, yeah. Not CAPTCHA providers, CAPTCHA solvers. Yeah, sorry. CAPTCHA solvers. We really try and make sure all of that works for you. I think as a dev, if I'm buying infrastructure, I want it all to work all the time and it's important for us to provide that experience by making sure everything does work and monitoring it on our own. Yeah. Right now, the world of CAPTCHAs is tricky. I think AI agents in particular are very much ahead of the internet infrastructure. CAPTCHAs are designed to block all types of bots, but there are now good bots and bad bots. I think in the future, CAPTCHAs will be able to identify who a good bot is, hopefully via some sort of KYC. For us, we've been very lucky. We have very little to no known abuse of Browserbase because we really look into who we work with. And for certain types of CAPTCHA solving, we only allow them on certain types of plans because we want to make sure that we can know what people are doing, what their use cases are. And that's really allowed us to try and be an arbiter of good bots, which is our long term goal. I want to build great relationships with people like Cloudflare so we can agree, hey, here are these acceptable bots. We'll identify them for you and make sure we flag when they come to your website. This is a good bot, you know?Alessio [00:23:23]: I see. And Cloudflare said they want to do more of this. So they're going to set by default, if they think you're an AI bot, they're going to reject. I'm curious if you think this is something that is going to be at the browser level or I mean, the DNS level with Cloudflare seems more where it should belong. But I'm curious how you think about it.Paul [00:23:40]: I think the web's going to change. You know, I think that the Internet as we have it right now is going to change. And we all need to just accept that the cat is out of the bag. And instead of kind of like wishing the Internet was like it was in the 2000s, we can have free content line that wouldn't be scraped. It's just it's not going to happen. And instead, we should think about like, one, how can we change? How can we change the models of, you know, information being published online so people can adequately commercialize it? But two, how do we rebuild applications that expect that AI agents are going to log in on their behalf? Those are the things that are going to allow us to kind of like identify good and bad bots. And I think the team at Clerk has been doing a really good job with this on the authentication side. I actually think that auth is the biggest thing that will prevent agents from accessing stuff, not captchas. And I think there will be agent auth in the future. I don't know if it's going to happen from an individual company, but actually authentication providers that have a, you know, hidden login as agent feature, which will then you put in your email, you'll get a push notification, say like, hey, your browser-based agent wants to log into your Airbnb. You can approve that and then the agent can proceed. That really circumvents the need for captchas or logging in as you and sharing your password. I think agent auth is going to be one way we identify good bots going forward. And I think a lot of this captcha solving stuff is really short-term problems as the internet kind of reorients itself around how it's going to work with agents browsing the web, just like people do. Yeah.Managing Distributed Browser Locations and Proxiesswyx [00:24:59]: Stitch recently was on Hacker News for talking about agent experience, AX, which is a thing that Netlify is also trying to clone and coin and talk about. And we've talked about this on our previous episodes before in a sense that I actually think that's like maybe the only part of the tech stack that needs to be kind of reinvented for agents. Everything else can stay the same, CLIs, APIs, whatever. But auth, yeah, we need agent auth. And it's mostly like short-lived, like it should not, it should be a distinct, identity from the human, but paired. I almost think like in the same way that every social network should have your main profile and then your alt accounts or your Finsta, it's almost like, you know, every, every human token should be paired with the agent token and the agent token can go and do stuff on behalf of the human token, but not be presumed to be the human. Yeah.Paul [00:25:48]: It's like, it's, it's actually very similar to OAuth is what I'm thinking. And, you know, Thread from Stitch is an investor, Colin from Clerk, Octaventures, all investors in browser-based because like, I hope they solve this because they'll make browser-based submission more possible. So we don't have to overcome all these hurdles, but I think it will be an OAuth-like flow where an agent will ask to log in as you, you'll approve the scopes. Like it can book an apartment on Airbnb, but it can't like message anybody. And then, you know, the agent will have some sort of like role-based access control within an application. Yeah. I'm excited for that.swyx [00:26:16]: The tricky part is just, there's one, one layer of delegation here, which is like, you're authoring my user's user or something like that. I don't know if that's tricky or not. Does that make sense? Yeah.Paul [00:26:25]: You know, actually at Twilio, I worked on the login identity and access. Management teams, right? So like I built Twilio's login page.swyx [00:26:31]: You were an intern on that team and then you became the lead in two years? Yeah.Paul [00:26:34]: Yeah. I started as an intern in 2016 and then I was the tech lead of that team. How? That's not normal. I didn't have a life. He's not normal. Look at this guy. I didn't have a girlfriend. I just loved my job. I don't know. I applied to 500 internships for my first job and I got rejected from every single one of them except for Twilio and then eventually Amazon. And they took a shot on me and like, I was getting paid money to write code, which was my dream. Yeah. Yeah. I'm very lucky that like this coding thing worked out because I was going to be doing it regardless. And yeah, I was able to kind of spend a lot of time on a team that was growing at a company that was growing. So it informed a lot of this stuff here. I think these are problems that have been solved with like the SAML protocol with SSO. I think it's a really interesting stuff with like WebAuthn, like these different types of authentication, like schemes that you can use to authenticate people. The tooling is all there. It just needs to be tweaked a little bit to work for agents. And I think the fact that there are companies that are already. Providing authentication as a service really sets it up. Well, the thing that's hard is like reinventing the internet for agents. We don't want to rebuild the internet. That's an impossible task. And I think people often say like, well, we'll have this second layer of APIs built for agents. I'm like, we will for the top use cases, but instead of we can just tweak the internet as is, which is on the authentication side, I think we're going to be the dumb ones going forward. Unfortunately, I think AI is going to be able to do a lot of the tasks that we do online, which means that it will be able to go to websites, click buttons on our behalf and log in on our behalf too. So with this kind of like web agent future happening, I think with some small structural changes, like you said, it feels like it could all slot in really nicely with the existing internet.Handling CAPTCHAs and Agent Authenticationswyx [00:28:08]: There's one more thing, which is the, your live view iframe, which lets you take, take control. Yeah. Obviously very key for operator now, but like, was, is there anything interesting technically there or that the people like, well, people always want this.Paul [00:28:21]: It was really hard to build, you know, like, so, okay. Headless browsers, you don't see them, right. They're running. They're running in a cloud somewhere. You can't like look at them. And I just want to really make, it's a weird name. I wish we came up with a better name for this thing, but you can't see them. Right. But customers don't trust AI agents, right. At least the first pass. So what we do with our live view is that, you know, when you use browser base, you can actually embed a live view of the browser running in the cloud for your customer to see it working. And that's what the first reason is the build trust, like, okay, so I have this script. That's going to go automate a website. I can embed it into my web application via an iframe and my customer can watch. I think. And then we added two way communication. So now not only can you watch the browser kind of being operated by AI, if you want to pause and actually click around type within this iframe that's controlling a browser, that's also possible. And this is all thanks to some of the lower level protocol, which is called the Chrome DevTools protocol. It has a API called start screencast, and you can also send mouse clicks and button clicks to a remote browser. And this is all embeddable within iframes. You have a browser within a browser, yo. And then you simulate the screen, the click on the other side. Exactly. And this is really nice often for, like, let's say, a capture that can't be solved. You saw this with Operator, you know, Operator actually uses a different approach. They use VNC. So, you know, you're able to see, like, you're seeing the whole window here. What we're doing is something a little lower level with the Chrome DevTools protocol. It's just PNGs being streamed over the wire. But the same thing is true, right? Like, hey, I'm running a window. Pause. Can you do something in this window? Human. Okay, great. Resume. Like sometimes 2FA tokens. Like if you get that text message, you might need a person to type that in. Web agents need human-in-the-loop type workflows still. You still need a person to interact with the browser. And building a UI to proxy that is kind of hard. You may as well just show them the whole browser and say, hey, can you finish this up for me? And then let the AI proceed on afterwards. Is there a future where I stream my current desktop to browser base? I don't think so. I think we're very much cloud infrastructure. Yeah. You know, but I think a lot of the stuff we're doing, we do want to, like, build tools. Like, you know, we'll talk about the stage and, you know, web agent framework in a second. But, like, there's a case where a lot of people are going desktop first for, you know, consumer use. And I think cloud is doing a lot of this, where I expect to see, you know, MCPs really oriented around the cloud desktop app for a reason, right? Like, I think a lot of these tools are going to run on your computer because it makes... I think it's breaking out. People are putting it on a server. Oh, really? Okay. Well, sweet. We'll see. We'll see that. I was surprised, though, wasn't I? I think that the browser company, too, with Dia Browser, it runs on your machine. You know, it's going to be...swyx [00:30:50]: What is it?Paul [00:30:51]: So, Dia Browser, as far as I understand... I used to use Arc. Yeah. I haven't used Arc. But I'm a big fan of the browser company. I think they're doing a lot of cool stuff in consumer. As far as I understand, it's a browser where you have a sidebar where you can, like, chat with it and it can control the local browser on your machine. So, if you imagine, like, what a consumer web agent is, which it lives alongside your browser, I think Google Chrome has Project Marina, I think. I almost call it Project Marinara for some reason. I don't know why. It's...swyx [00:31:17]: No, I think it's someone really likes the Waterworld. Oh, I see. The classic Kevin Costner. Yeah.Paul [00:31:22]: Okay. Project Marinara is a similar thing to the Dia Browser, in my mind, as far as I understand it. You have a browser that has an AI interface that will take over your mouse and keyboard and control the browser for you. Great for consumer use cases. But if you're building applications that rely on a browser and it's more part of a greater, like, AI app experience, you probably need something that's more like infrastructure, not a consumer app.swyx [00:31:44]: Just because I have explored a little bit in this area, do people want branching? So, I have the state. Of whatever my browser's in. And then I want, like, 100 clones of this state. Do people do that? Or...Paul [00:31:56]: People don't do it currently. Yeah. But it's definitely something we're thinking about. I think the idea of forking a browser is really cool. Technically, kind of hard. We're starting to see this in code execution, where people are, like, forking some, like, code execution, like, processes or forking some tool calls or branching tool calls. Haven't seen it at the browser level yet. But it makes sense. Like, if an AI agent is, like, using a website and it's not sure what path it wants to take to crawl this website. To find the information it's looking for. It would make sense for it to explore both paths in parallel. And that'd be a very, like... A road not taken. Yeah. And hopefully find the right answer. And then say, okay, this was actually the right one. And memorize that. And go there in the future. On the roadmap. For sure. Don't make my roadmap, please. You know?Alessio [00:32:37]: How do you actually do that? Yeah. How do you fork? I feel like the browser is so stateful for so many things.swyx [00:32:42]: Serialize the state. Restore the state. I don't know.Paul [00:32:44]: So, it's one of the reasons why we haven't done it yet. It's hard. You know? Like, to truly fork, it's actually quite difficult. The naive way is to open the same page in a new tab and then, like, hope that it's at the same thing. But if you have a form halfway filled, you may have to, like, take the whole, you know, container. Pause it. All the memory. Duplicate it. Restart it from there. It could be very slow. So, we haven't found a thing. Like, the easy thing to fork is just, like, copy the page object. You know? But I think there needs to be something a little bit more robust there. Yeah.swyx [00:33:12]: So, MorphLabs has this infinite branch thing. Like, wrote a custom fork of Linux or something that let them save the system state and clone it. MorphLabs, hit me up. I'll be a customer. Yeah. That's the only. I think that's the only way to do it. Yeah. Like, unless Chrome has some special API for you. Yeah.Paul [00:33:29]: There's probably something we'll reverse engineer one day. I don't know. Yeah.Alessio [00:33:32]: Let's talk about StageHand, the AI web browsing framework. You have three core components, Observe, Extract, and Act. Pretty clean landing page. What was the idea behind making a framework? Yeah.Stagehand: AI web browsing frameworkPaul [00:33:43]: So, there's three frameworks that are very popular or already exist, right? Puppeteer, Playwright, Selenium. Those are for building hard-coded scripts to control websites. And as soon as I started to play with LLMs plus browsing, I caught myself, you know, code-genning Playwright code to control a website. I would, like, take the DOM. I'd pass it to an LLM. I'd say, can you generate the Playwright code to click the appropriate button here? And it would do that. And I was like, this really should be part of the frameworks themselves. And I became really obsessed with SDKs that take natural language as part of, like, the API input. And that's what StageHand is. StageHand exposes three APIs, and it's a super set of Playwright. So, if you go to a page, you may want to take an action, click on the button, fill in the form, etc. That's what the act command is for. You may want to extract some data. This one takes a natural language, like, extract the winner of the Super Bowl from this page. You can give it a Zod schema, so it returns a structured output. And then maybe you're building an API. You can do an agent loop, and you want to kind of see what actions are possible on this page before taking one. You can do observe. So, you can observe the actions on the page, and it will generate a list of actions. You can guide it, like, give me actions on this page related to buying an item. And you can, like, buy it now, add to cart, view shipping options, and pass that to an LLM, an agent loop, to say, what's the appropriate action given this high-level goal? So, StageHand isn't a web agent. It's a framework for building web agents. And we think that agent loops are actually pretty close to the application layer because every application probably has different goals or different ways it wants to take steps. I don't think I've seen a generic. Maybe you guys are the experts here. I haven't seen, like, a really good AI agent framework here. Everyone kind of has their own special sauce, right? I see a lot of developers building their own agent loops, and they're using tools. And I view StageHand as the browser tool. So, we expose act, extract, observe. Your agent can call these tools. And from that, you don't have to worry about it. You don't have to worry about generating playwright code performantly. You don't have to worry about running it. You can kind of just integrate these three tool calls into your agent loop and reliably automate the web.swyx [00:35:48]: A special shout-out to Anirudh, who I met at your dinner, who I think listens to the pod. Yeah. Hey, Anirudh.Paul [00:35:54]: Anirudh's a man. He's a StageHand guy.swyx [00:35:56]: I mean, the interesting thing about each of these APIs is they're kind of each startup. Like, specifically extract, you know, Firecrawler is extract. There's, like, Expand AI. There's a whole bunch of, like, extract companies. They just focus on extract. I'm curious. Like, I feel like you guys are going to collide at some point. Like, right now, it's friendly. Everyone's in a blue ocean. At some point, it's going to be valuable enough that there's some turf battle here. I don't think you have a dog in a fight. I think you can mock extract to use an external service if they're better at it than you. But it's just an observation that, like, in the same way that I see each option, each checkbox in the side of custom GBTs becoming a startup or each box in the Karpathy chart being a startup. Like, this is also becoming a thing. Yeah.Paul [00:36:41]: I mean, like, so the way StageHand works is that it's MIT-licensed, completely open source. You bring your own API key to your LLM of choice. You could choose your LLM. We don't make any money off of the extract or really. We only really make money if you choose to run it with our browser. You don't have to. You can actually use your own browser, a local browser. You know, StageHand is completely open source for that reason. And, yeah, like, I think if you're building really complex web scraping workflows, I don't know if StageHand is the tool for you. I think it's really more if you're building an AI agent that needs a few general tools or if it's doing a lot of, like, web automation-intensive work. But if you're building a scraping company, StageHand is not your thing. You probably want something that's going to, like, get HTML content, you know, convert that to Markdown, query it. That's not what StageHand does. StageHand is more about reliability. I think we focus a lot on reliability and less so on cost optimization and speed at this point.swyx [00:37:33]: I actually feel like StageHand, so the way that StageHand works, it's like, you know, page.act, click on the quick start. Yeah. It's kind of the integration test for the code that you would have to write anyway, like the Puppeteer code that you have to write anyway. And when the page structure changes, because it always does, then this is still the test. This is still the test that I would have to write. Yeah. So it's kind of like a testing framework that doesn't need implementation detail.Paul [00:37:56]: Well, yeah. I mean, Puppeteer, Playwright, and Slenderman were all designed as testing frameworks, right? Yeah. And now people are, like, hacking them together to automate the web. I would say, and, like, maybe this is, like, me being too specific. But, like, when I write tests, if the page structure changes. Without me knowing, I want that test to fail. So I don't know if, like, AI, like, regenerating that. Like, people are using StageHand for testing. But it's more for, like, usability testing, not, like, testing of, like, does the front end, like, has it changed or not. Okay. But generally where we've seen people, like, really, like, take off is, like, if they're using, you know, something. If they want to build a feature in their application that's kind of like Operator or Deep Research, they're using StageHand to kind of power that tool calling in their own agent loop. Okay. Cool.swyx [00:38:37]: So let's go into Operator, the first big agent launch of the year from OpenAI. Seems like they have a whole bunch scheduled. You were on break and your phone blew up. What's your just general view of computer use agents is what they're calling it. The overall category before we go into Open Operator, just the overall promise of Operator. I will observe that I tried it once. It was okay. And I never tried it again.OpenAI's Operator and computer use agentsPaul [00:38:58]: That tracks with my experience, too. Like, I'm a huge fan of the OpenAI team. Like, I think that I do not view Operator as the company. I'm not a company killer for browser base at all. I think it actually shows people what's possible. I think, like, computer use models make a lot of sense. And I'm actually most excited about computer use models is, like, their ability to, like, really take screenshots and reasoning and output steps. I think that using mouse click or mouse coordinates, I've seen that proved to be less reliable than I would like. And I just wonder if that's the right form factor. What we've done with our framework is anchor it to the DOM itself, anchor it to the actual item. So, like, if it's clicking on something, it's clicking on that thing, you know? Like, it's more accurate. No matter where it is. Yeah, exactly. Because it really ties in nicely. And it can handle, like, the whole viewport in one go, whereas, like, Operator can only handle what it sees. Can you hover? Is hovering a thing that you can do? I don't know if we expose it as a tool directly, but I'm sure there's, like, an API for hovering. Like, move mouse to this position. Yeah, yeah, yeah. I think you can trigger hover, like, via, like, the JavaScript on the DOM itself. But, no, I think, like, when we saw computer use, everyone's eyes lit up because they realized, like, wow, like, AI is going to actually automate work for people. And I think seeing that kind of happen from both of the labs, and I'm sure we're going to see more labs launch computer use models, I'm excited to see all the stuff that people build with it. I think that I'd love to see computer use power, like, controlling a browser on browser base. And I think, like, Open Operator, which was, like, our open source version of OpenAI's Operator, was our first take on, like, how can we integrate these models into browser base? And we handle the infrastructure and let the labs do the models. I don't have a sense that Operator will be released as an API. I don't know. Maybe it will. I'm curious to see how well that works because I think it's going to be really hard for a company like OpenAI to do things like support CAPTCHA solving or, like, have proxies. Like, I think it's hard for them structurally. Imagine this New York Times headline, OpenAI CAPTCHA solving. Like, that would be a pretty bad headline, this New York Times headline. Browser base solves CAPTCHAs. No one cares. No one cares. And, like, our investors are bored. Like, we're all okay with this, you know? We're building this company knowing that the CAPTCHA solving is short-lived until we figure out how to authenticate good bots. I think it's really hard for a company like OpenAI, who has this brand that's so, so good, to balance with, like, the icky parts of web automation, which it can be kind of complex to solve. I'm sure OpenAI knows who to call whenever they need you. Yeah, right. I'm sure they'll have a great partnership.Alessio [00:41:23]: And is Open Operator just, like, a marketing thing for you? Like, how do you think about resource allocation? So, you can spin this up very quickly. And now there's all this, like, open deep research, just open all these things that people are building. We started it, you know. You're the original Open. We're the original Open operator, you know? Is it just, hey, look, this is a demo, but, like, we'll help you build out an actual product for yourself? Like, are you interested in going more of a product route? That's kind of the OpenAI way, right? They started as a model provider and then…Paul [00:41:53]: Yeah, we're not interested in going the product route yet. I view Open Operator as a model provider. It's a reference project, you know? Let's show people how to build these things using the infrastructure and models that are out there. And that's what it is. It's, like, Open Operator is very simple. It's an agent loop. It says, like, take a high-level goal, break it down into steps, use tool calling to accomplish those steps. It takes screenshots and feeds those screenshots into an LLM with the step to generate the right action. It uses stagehand under the hood to actually execute this action. It doesn't use a computer use model. And it, like, has a nice interface using the live view that we talked about, the iframe, to embed that into an application. So I felt like people on launch day wanted to figure out how to build their own version of this. And we turned that around really quickly to show them. And I hope we do that with other things like deep research. We don't have a deep research launch yet. I think David from AOMNI actually has an amazing open deep research that he launched. It has, like, 10K GitHub stars now. So he's crushing that. But I think if people want to build these features natively into their application, they need good reference projects. And I think Open Operator is a good example of that.swyx [00:42:52]: I don't know. Actually, I'm actually pretty bullish on API-driven operator. Because that's the only way that you can sort of, like, once it's reliable enough, obviously. And now we're nowhere near. But, like, give it five years. It'll happen, you know. And then you can sort of spin this up and browsers are working in the background and you don't necessarily have to know. And it just is booking restaurants for you, whatever. I can definitely see that future happening. I had this on the landing page here. This might be a slightly out of order. But, you know, you have, like, sort of three use cases for browser base. Open Operator. Or this is the operator sort of use case. It's kind of like the workflow automation use case. And it completes with UiPath in the sort of RPA category. Would you agree with that? Yeah, I would agree with that. And then there's Agents we talked about already. And web scraping, which I imagine would be the bulk of your workload right now, right?Paul [00:43:40]: No, not at all. I'd say actually, like, the majority is browser automation. We're kind of expensive for web scraping. Like, I think that if you're building a web scraping product, if you need to do occasional web scraping or you have to do web scraping that works every single time, you want to use browser automation. Yeah. You want to use browser-based. But if you're building web scraping workflows, what you should do is have a waterfall. You should have the first request is a curl to the website. See if you can get it without even using a browser. And then the second request may be, like, a scraping-specific API. There's, like, a thousand scraping APIs out there that you can use to try and get data. Scraping B. Scraping B is a great example, right? Yeah. And then, like, if those two don't work, bring out the heavy hitter. Like, browser-based will 100% work, right? It will load the page in a real browser, hydrate it. I see.swyx [00:44:21]: Because a lot of people don't render to JS.swyx [00:44:25]: Yeah, exactly.Paul [00:44:26]: So, I mean, the three big use cases, right? Like, you know, automation, web data collection, and then, you know, if you're building anything agentic that needs, like, a browser tool, you want to use browser-based.Alessio [00:44:35]: Is there any use case that, like, you were super surprised by that people might not even think about? Oh, yeah. Or is it, yeah, anything that you can share? The long tail is crazy. Yeah.Surprising use cases of BrowserbasePaul [00:44:44]: One of the case studies on our website that I think is the most interesting is this company called Benny. So, the way that it works is if you're on food stamps in the United States, you can actually get rebates if you buy certain things. Yeah. You buy some vegetables. You submit your receipt to the government. They'll give you a little rebate back. Say, hey, thanks for buying vegetables. It's good for you. That process of submitting that receipt is very painful. And the way Benny works is you use their app to take a photo of your receipt, and then Benny will go submit that receipt for you and then deposit the money into your account. That's actually using no AI at all. It's all, like, hard-coded scripts. They maintain the scripts. They've been doing a great job. And they build this amazing consumer app. But it's an example of, like, all these, like, tedious workflows that people have to do to kind of go about their business. And they're doing it for the sake of their day-to-day lives. And I had never known about, like, food stamp rebates or the complex forms you have to do to fill them. But the world is powered by millions and millions of tedious forms, visas. You know, Emirate Lighthouse is a customer, right? You know, they do the O1 visa. Millions and millions of forms are taking away humans' time. And I hope that Browserbase can help power software that automates away the web forms that we don't need anymore. Yeah.swyx [00:45:49]: I mean, I'm very supportive of that. I mean, forms. I do think, like, government itself is a big part of it. I think the government itself should embrace AI more to do more sort of human-friendly form filling. Mm-hmm. But I'm not optimistic. I'm not holding my breath. Yeah. We'll see. Okay. I think I'm about to zoom out. I have a little brief thing on computer use, and then we can talk about founder stuff, which is, I tend to think of developer tooling markets in impossible triangles, where everyone starts in a niche, and then they start to branch out. So I already hinted at a little bit of this, right? We mentioned more. We mentioned E2B. We mentioned Firecrawl. And then there's Browserbase. So there's, like, all this stuff of, like, have serverless virtual computer that you give to an agent and let them do stuff with it. And there's various ways of connecting it to the internet. You can just connect to a search API, like SERP API, whatever other, like, EXA is another one. That's what you're searching. You can also have a JSON markdown extractor, which is Firecrawl. Or you can have a virtual browser like Browserbase, or you can have a virtual machine like Morph. And then there's also maybe, like, a virtual sort of code environment, like Code Interpreter. So, like, there's just, like, a bunch of different ways to tackle the problem of give a computer to an agent. And I'm just kind of wondering if you see, like, everyone's just, like, happily coexisting in their respective niches. And as a developer, I just go and pick, like, a shopping basket of one of each. Or do you think that you eventually, people will collide?Future of browser automation and market competitionPaul [00:47:18]: I think that currently it's not a zero-sum market. Like, I think we're talking about... I think we're talking about all of knowledge work that people do that can be automated online. All of these, like, trillions of hours that happen online where people are working. And I think that there's so much software to be built that, like, I tend not to think about how these companies will collide. I just try to solve the problem as best as I can and make this specific piece of infrastructure, which I think is an important primitive, the best I possibly can. And yeah. I think there's players that are actually going to like it. I think there's players that are going to launch, like, over-the-top, you know, platforms, like agent platforms that have all these tools built in, right? Like, who's building the rippling for agent tools that has the search tool, the browser tool, the operating system tool, right? There are some. There are some. There are some, right? And I think in the end, what I have seen as my time as a developer, and I look at all the favorite tools that I have, is that, like, for tools and primitives with sufficient levels of complexity, you need to have a solution that's really bespoke to that primitive, you know? And I am sufficiently convinced that the browser is complex enough to deserve a primitive. Obviously, I have to. I'm the founder of BrowserBase, right? I'm talking my book. But, like, I think maybe I can give you one spicy take against, like, maybe just whole OS running. I think that when I look at computer use when it first came out, I saw that the majority of use cases for computer use were controlling a browser. And do we really need to run an entire operating system just to control a browser? I don't think so. I don't think that's necessary. You know, BrowserBase can run browsers for way cheaper than you can if you're running a full-fledged OS with a GUI, you know, operating system. And I think that's just an advantage of the browser. It is, like, browsers are little OSs, and you can run them very efficiently if you orchestrate it well. And I think that allows us to offer 90% of the, you know, functionality in the platform needed at 10% of the cost of running a full OS. Yeah.Open Operator: Browserbase's Open-Source Alternativeswyx [00:49:16]: I definitely see the logic in that. There's a Mark Andreessen quote. I don't know if you know this one. Where he basically observed that the browser is turning the operating system into a poorly debugged set of device drivers, because most of the apps are moved from the OS to the browser. So you can just run browsers.Paul [00:49:31]: There's a place for OSs, too. Like, I think that there are some applications that only run on Windows operating systems. And Eric from pig.dev in this upcoming YC batch, or last YC batch, like, he's building all run tons of Windows operating systems for you to control with your agent. And like, there's some legacy EHR systems that only run on Internet-controlled systems. Yeah.Paul [00:49:54]: I think that's it. I think, like, there are use cases for specific operating systems for specific legacy software. And like, I'm excited to see what he does with that. I just wanted to give a shout out to the pig.dev website.swyx [00:50:06]: The pigs jump when you click on them. Yeah. That's great.Paul [00:50:08]: Eric, he's the former co-founder of banana.dev, too.swyx [00:50:11]: Oh, that Eric. Yeah. That Eric. Okay. Well, he abandoned bananas for pigs. I hope he doesn't start going around with pigs now.Alessio [00:50:18]: Like he was going around with bananas. A little toy pig. Yeah. Yeah. I love that. What else are we missing? I think we covered a lot of, like, the browser-based product history, but. What do you wish people asked you? Yeah.Paul [00:50:29]: I wish people asked me more about, like, what will the future of software look like? Because I think that's really where I've spent a lot of time about why do browser-based. Like, for me, starting a company is like a means of last resort. Like, you shouldn't start a company unless you absolutely have to. And I remain convinced that the future of software is software that you're going to click a button and it's going to do stuff on your behalf. Right now, software. You click a button and it maybe, like, calls it back an API and, like, computes some numbers. It, like, modifies some text, whatever. But the future of software is software using software. So, I may log into my accounting website for my business, click a button, and it's going to go load up my Gmail, search my emails, find the thing, upload the receipt, and then comment it for me. Right? And it may use it using APIs, maybe a browser. I don't know. I think it's a little bit of both. But that's completely different from how we've built software so far. And that's. I think that future of software has different infrastructure requirements. It's going to require different UIs. It's going to require different pieces of infrastructure. I think the browser infrastructure is one piece that fits into that, along with all the other categories you mentioned. So, I think that it's going to require developers to think differently about how they've built software for, you know
What if you could scale your SaaS platforms effortlessly across diverse hosting services? Join us as we welcome Adam McCrea, the brilliant mind behind JudoScale, who takes us through his fascinating evolution from being a Rails developer to creating a cutting-edge autoscaling solution. Adam opens up about the technical challenges he faced while adapting JudoScale for platforms like Render, Fly, and Railway, and how Heroku's unique architecture initially shaped his approach. His journey is one of innovation driven by necessity, as JudoScale originated from a need to optimize costs more efficiently than existing solutions.Our conversation doesn't shy away from complexity; in fact, it embraces it. Adam shares his experiences of grappling with AWS integration, navigating the intricate maze of ECS, EC2, Fargate, and IAM, all driven by customer demand. We explore the strategic shift from metered billing to flat-tiered pricing and the hurdles faced while setting up a staging environment on Render, ultimately reaffirming Heroku's smoother experience. This episode promises valuable insights into the strategic decisions and architectural reimaginations that keep JudoScale ahead of the game.Adding a creative flair, we delve into the entertaining world of infomercial production, as Adam recounts his experience crafting a humorous Billy Mays-inspired ad for JudoScale. With the aid of AI tools like ChatGPT and Descript, Adam turned a fun concept into an engaging reality. As we wrap up, Adam shares his excitement for RailsConf in Philadelphia and the significance of fostering connections through digital networking. Whether you're a tech enthusiast or a developer seeking innovative scaling solutions, this episode is brimming with insightful takeaways and creative inspiration.Send us some love.HoneybadgerHoneybadger is an application health monitoring tool built by developers for developers.HoneybadgerHoneybadger is an application health monitoring tool built by developers for developers.Disclaimer: This post contains affiliate links. If you make a purchase, I may receive a commission at no extra cost to you.Support the showReady to start your own podcast?This show is hosted on Buzzsprout and it's awesome, not to mention a Ruby on Rails application. Let Buzzsprout know we sent you and you'll get a $20 Amazon gift card if you sign up for a paid plan, and it helps support our show.
In this episode, we discuss AWS Lambda provisioned concurrency. We start with a recap of Lambda cold starts and the different concurrency control options. We then explain how provisioned concurrency works to initialize execution environments in advance to avoid cold starts. We cover how to enable it, pricing details, common issues like over/under-provisioning, and alternatives like self-warming functions or using other services like ECS and Fargate.
What are you doing differently today that you're stopping tomorrow's legacy? In this episode Ashish spoke to Adrian Asher, CISO and Cloud Architect at Checkout.com, to explore the journey from monolithic architecture to cloud-native solutions in a regulated fintech environment. Adrian shared his perspective on why there "aren't enough lambdas" and how embracing cloud-native technologies like AWS Lambda and Fargate can enhance security, scalability, and efficiency. Guest Socials: Adrian's Linkedin Podcast Twitter - @CloudSecPod If you want to watch videos of this LIVE STREAMED episode and past episodes - Check out our other Cloud Security Social Channels: - Cloud Security Podcast- Youtube - Cloud Security Newsletter - Cloud Security BootCamp Questions asked: (00:00) Introduction (01:59) A bit about Adrian (02:47) Cloud Naive vs Cloud Native (03:54) Checkout's Cloud Native Journey (05:44) What is AWS Fargate? (06:52) There are not enough Lambdas (09:52) The evolution of the Security Function (12:15) Culture change for being more cloud native (15:23) Getting security teams ready for Gen AI (18:16) Where to start with Cloud Native? (19:14) Where you can connect with Adrian? (19:39) The Fun Section
This episode discusses solutions for securely accessing private VPC resources for debugging and troubleshooting. We cover traditional approaches like bastion hosts and VPNs and newer solutions using containers and AWS services like Fargate, ECS, and SSM. We explain how to set up a Fargate task with a container image with the necessary tools, enable ECS integration with SSM, and use SSM to start remote shells and port forwarding tunnels into the container. This provides on-demand access without exposing resources on the public internet. We share a Python script to simplify the process. We suggest ideas for improvements like auto-scaling the container down when idle. Overall, this lightweight containerized approach can provide easy access for debugging compared to managing EC2 instances.
Andreas and Michael Wittig were pretty jazzed about writing unit tests using mocks for the AWS SDK v3 in JavaScript. They broke down Amazon's new GuardDuty malware protection for S3 and how it compares to their own product bucketAV. The duo also covered testing Terraform modules and using aws-nuke to clean up leftover resources from failed tests. They gave their two cents on some recent AWS service announcements too - CloudWatch, Fargate, CloudFormation and more!
Intrigué par les microVMs et le serverless ? Découvrez Firecracker, une solution ultra-rapide et simple pour vos applications serverless et bien plus encore ! Au cœur des fonctions Lambda et des conteneurs Fargate, Firecracker est une technologie incontournable pour les développeurs et les passionnés d'infrastructure. Mais ce n'est pas tout ! Que vous soyez bricoleur du dimanche ou féru d'informatique, Firecracker vous ouvre les portes d'un monde de possibilités. Rejoignez Quentin dans son labo et apprenez à installer une microVM de A à Z, à lui monter un disque dur, à la connecter au réseau et bien plus encore ! Abonnez-vous au podcast pour ne rien manquer et rejoignez la communauté des passionnés de technologie et de cloud ! #Firecracker #microVM #serverless #DIY #RaspberryPI #kubernetes
Intrigué par les microVMs et le serverless ? Découvrez Firecracker, une solution ultra-rapide et simple pour vos applications serverless et bien plus encore ! Au cœur des fonctions Lambda et des conteneurs Fargate, Firecracker est une technologie incontournable pour les développeurs et les passionnés d'infrastructure. Mais ce n'est pas tout ! Que vous soyez bricoleur du dimanche ou féru d'informatique, Firecracker vous ouvre les portes d'un monde de possibilités. Rejoignez Quentin dans son labo et apprenez à installer une microVM de A à Z, à lui monter un disque dur, à la connecter au réseau et bien plus encore ! Abonnez-vous au podcast pour ne rien manquer et rejoignez la communauté des passionnés de technologie et de cloud ! #Firecracker #microVM #serverless #DIY #RaspberryPI #kubernetes
В этом выпуске с приглашенными гостями из компании FivexL обсудим ECS (Elastic Container Service) AWS и попытаемся разобраться, чем он лучше (или нет) других способов запуска контейнеров в AWS. ССЫЛКИ
In this episode, we discuss best practices for working with AWS Lambda. We cover how Lambda functions work under the hood, including cold starts and warm starts. We then explore different invocation types - synchronous, asynchronous, and event-based. For each, we share tips on performance, cost optimization, and monitoring. Other topics include function structure, logging, instrumentation, and security. Throughout the episode, we aim to provide a solid mental model for serverless development and share our experiences to help you build efficient and robust Lambda applications.
Welcome to part four in the AWS Certification Exam Prep Mini-Series! Whether you're an aspiring cloud enthusiast or a seasoned developer looking to deepen your architectural acumen, you've landed in the perfect spot. In this six-part saga, we're demystifying the pivotal role of a Solutions Architect in the AWS cloud computing cosmos. In this fourth episode, Caroline and Dave chat again with Anya Derbakova, a Senior Startup Solutions Architect at AWS, known for weaving social media magic, and Ted Trentler, a Senior AWS Technical Instructor with a knack for simplifying the complex. Together, we will step into the realm of performance, where we untangle the complexities of designing high-performing architectures in the cloud. We dissect the essentials of high-performing storage solutions, dive deep into elastic compute services for scaling and cost efficiency, and unravel the intricacies of optimizing database solutions for unparalleled performance. Expect to uncover: • The spectrum of AWS storage services and their optimal use cases, from Amazon S3's versatility to the shared capabilities of Amazon EFS. • How to leverage Amazon EC2, Auto Scaling, and Load Balancing to create elastic compute solutions that adapt to your needs. • Insights into serverless computing paradigms with AWS Lambda and Fargate, highlighting the shift towards de-coupled architectures. • Strategies for selecting high-performing database solutions, including the transition from on-premise databases to AWS-managed services like RDS and the benefits of caching with Amazon ElastiCache. • A real-world scenario where we'll navigate the challenge of processing hundreds of thousands of online votes in minutes, testing your understanding and application of high-performing AWS architectures. Whether you're dealing with vast amounts of data, requiring robust compute power, or ensuring your architecture can handle peak loads without a hitch, we've got you covered! Anya on LinkedIn: https://www.linkedin.com/in/annadderbakova/ Ted on Twitter: https://twitter.com/ttrentler Ted on LinkedIn: https://linkedin/in/tedtrentler Caroline on Twitter: https://twitter.com/carolinegluck Caroline on LinkedIn: https://www.linkedin.com/in/cgluck/ Dave on Twitter: https://twitter.com/thedavedev Dave on LinkedIn: https://www.linkedin.com/in/davidisbitski AWS SAA Exam Guide - https://d1.awsstatic.com/training-and-certification/docs-sa-assoc/AWS-Certified-Solutions-Architect-Associate_Exam-Guide.pdf Party Rock for Exam Study - https://partyrock.aws/u/tedtrent/KQtYIhbJb/Solutions-Architect-Study-Buddy All Things AWS Training - Links to Self-paced and Instructor Led https://aws.amazon.com/training/ AWS Skill Builder – Free CPE Course - https://explore.skillbuilder.aws/learn/course/134/aws-cloud-practitioner-essentials AWS Skill Builder – Learning Badges - https://explore.skillbuilder.aws/learn/public/learning_plan/view/1044/solutions-architect-knowledge-badge-readiness-path AWS Usergroup Communities: https://aws.amazon.com/developer/community/usergroups Subscribe: Spotify: https://open.spotify.com/show/7rQjgnBvuyr18K03tnEHBI Apple Podcasts: https://podcasts.apple.com/us/podcast/aws-developers-podcast/id1574162669 Stitcher: https://www.stitcher.com/show/1065378 Pandora: https://www.pandora.com/podcast/aws-developers-podcast/PC:1001065378 TuneIn: https://tunein.com/podcasts/Technology-Podcasts/AWS-Developers-Podcast-p1461814/ Amazon Music: https://music.amazon.com/podcasts/f8bf7630-2521-4b40-be90-c46a9222c159/aws-developers-podcast Google Podcasts: https://podcasts.google.com/feed/aHR0cHM6Ly9mZWVkcy5zb3VuZGNsb3VkLmNvbS91c2Vycy9zb3VuZGNsb3VkOnVzZXJzOjk5NDM2MzU0OS9zb3VuZHMucnNz RSS Feed: https://feeds.soundcloud.com/users/soundcloud:users:994363549/sounds.rss
On this episode of The Six Five, hosts Patrick Moorhead and Daniel Newman are joined by AWS's Holly Mesrobian, VP Serverless Compute for a conversation on the integration and future of AWS Serverless Compute, including Lambda, ECS, Fargate, and Event-Driven Architectures. Our discussion covers: Navigating Kubernetes complexities with AWS Serverless Celebrating 10 years of AWS Lambda and its impact on customers How AWS Serverless changes application architecture towards event-driven models Integration of applications within the AWS portfolio akin to iPaaS Meeting the demands of Generative AI workloads with serverless solutions Learn more at AWS.
En este nuevo episodio sobre Contenedores en AWS conversamos con David Ugarte sobre los lanzamientos más destacados del año 2023 para los servicios de contenedores: Amazon EKS, Amazon ECS, Amazon ECR, Fargate y mucho más ¡No se lo pierdan! Material Adicional: https://aws.amazon.com/about-aws/whats-new/2023/11/amazon-guardduty-ecs-runtime-monitoring-fargate/
Phil Estes, a principal engineer at AWS and a key contributor to the containerd project, discusses the evolution and roadmap of containerd, its relationship to Docker, and its journey to being a fully open source project under the Cloud Native Computing Foundation (CNCF). 00:00 Introduction and Guest Background 00:41 The Evolution of Container Runtimes 02:48 The Impact of Docker and Kubernetes 06:07 The Birth and Growth of containerd 09:13 The Role of the CNCF in Open Source Projects 11:09 Challenges and Opportunities in Open Source Contribution 18:49 The Importance of Mentorship in Open Source 23:47 Conclusion and Final Thoughts Guest: Phil Estes is a Principal Engineer for Amazon Web Services (AWS), focused on core container technologies that power AWS container offerings like Fargate, EKS, and ECS. Phil is an active contributor and maintainer for the CNCF containerd runtime project, and participates in the Open Container Initiative (OCI) as the member of the Technical Oversight Board (TOB). He is also a current member of the 2023 CNCF Ambassador class and enjoys speaking on container technology topics and events worldwide.
AWS Morning Brief for the week of January 16, 2024 with Corey Quinn. Links:AWS Accounts discontinues the use of security challenge questionsAWS CodeBuild now supports a X-Large Linux compute typeIntroducing Open Job Description, an open specification for portable render jobs Reducing Inference Times by 87% for Darwinbox's Talent Search Engine Using AWS InferentiaAmazon ECS supports a native integration with Amazon EBS volumes for data-intensive workloadsThis is sad: AWS to Shut down Aurora Serverless v1.
Evelyn Osman, Principal Platform Engineer at AutoScout24, joins Corey on Screaming in the Cloud to discuss the dire need for developers to agree on a standardized tool set in order to scale their projects and innovate quickly. Corey and Evelyn pick apart the new products being launched in cloud computing and discover a large disconnect between what the industry needs and what is actually being created. Evelyn shares her thoughts on why viewing platforms as products themselves forces developers to get into the minds of their users and produces a better end result.About EvelynEvelyn is a recovering improviser currently role playing as a Lead Platform Engineer at Autoscout24 in Munich, Germany. While she says she specializes in AWS architecture and integration after spending 11 years with it, in truth she spends her days convincing engineers that a product mindset will make them hate their product managers less.Links Referenced:LinkedIn: https://www.linkedin.com/in/evelyn-osman/TranscriptAnnouncer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. My guest today is Evelyn Osman, engineering manager at AutoScout24. Evelyn, thank you for joining me.Evelyn: Thank you very much, Corey. It's actually really fun to be on here.Corey: I have to say one of the big reasons that I was enthused to talk to you is that you have been using AWS—to be direct—longer than I have, and that puts you in a somewhat rarefied position where AWS's customer base has absolutely exploded over the past 15 years that it's been around, but at the beginning, it was a very different type of thing. Nowadays, it seems like we've lost some of that magic from the beginning. Where do you land on that whole topic?Evelyn: That's actually a really good point because I always like to say, you know, when I come into a room, you know, I really started doing introductions like, “Oh, you know, hey,” I'm like, you know, “I'm this director, I've done this XYZ,” and I always say, like, “I'm Evelyn, engineering manager, or architect, or however,” and then I say, you know, “I've been working with AWS, you know, 11, 12 years,” or now I can't quite remember.Corey: Time becomes a flat circle. The pandemic didn't help.Evelyn: [laugh] Yeah, I just, like, a look at that the year, and I'm like, “Jesus. It's been that long.” Yeah. And usually, like you know, you get some odd looks like, “Oh, my God, you must be a sage.” And for me, I'm… you see how different services kind of, like, have just been reinventions of another one, or they just take a managed service and make another managed service around it. So, I feel that there's a lot of where it's just, you know, wrapping up a pretty bow, and calling it something different, it feels like.Corey: That's what I've been low-key asking people for a while now over the past year, namely, “What is the most foundational, interesting thing that AWS has done lately, that winds up solving for this problem of whatever it is you do as a company? What is it that has foundationally made things better that AWS has put out in the last service? What was it?” And the answers I get are all depressingly far in the past, I have to say. What's yours?Evelyn: Honestly, I think the biggest game-changer I remember experiencing was at an analyst summit in Stockholm when they announced Lambda.Corey: That was announced before I even got into this space, as an example of how far back things were. And you're right. That was transformative. That was awesome.Evelyn: Yeah, precisely. Because before, you know, we were always, like, trying to figure, okay, how do we, like, launch an instance, run some short code, and then clean it up. AWS is going to charge for an hour, so we need to figure out, you know, how to pack everything into one instance, run for one hour. And then they announced Lambda, and suddenly, like, holy shit, this is actually a game changer. We can actually write small functions that do specific things.And, you know, you go from, like, microservices, like, to like, tiny, serverless functions. So, that was huge. And then DynamoDB along with that, really kind of like, transformed the entire space for us in many ways. So, back when I was at TIBCO, there was a few innovations around that, even, like, one startup inside TIBCO that quite literally, their entire product was just Lambda functions. And one of their problems was, they wanted to sell in the Marketplace, and they couldn't figure out how to sell Lambda on the marketplace.Corey: It's kind of wild when we see just how far it's come, but also how much they've announced that doesn't change that much, to be direct. For me, one of the big changes that I remember that really made things better for customers—thought it took a couple of years—was EFS. And even that's a little bit embarrassing because all that is, “All right, we finally found a way to stuff a NetApp into us-east-1,” so now NFS, just like you used to use it in the 90s and the naughts, can be done responsibly in the cloud. And that, on some level, wasn't a feature launch so much as it was a concession to the ways that companies had built things and weren't likely to change.Evelyn: Honestly, I found the EFS launch to be a bit embarrassing because, like, you know, when you look closer at it, you realize, like, the performance isn't actually that great.Corey: Oh, it was horrible when it launched. It would just slam to a halt because you got the IOPS scaled with how much data you stored on it. The documentation explicitly said to use dd to start loading a bunch of data onto it to increase the performance. It's like, “Look, just sandbag the thing so it does what you'd want.” And all that stuff got fixed, but at the time it looked like it was clown shoes.Evelyn: Yeah, and that reminds me of, like, EBS's, like, gp2 when we're, like you know, we're talking, like, okay, provision IOPS with gp2. We just kept saying, like, just give yourself really big volume for performance. And it feel like they just kind of kept that with EFS. And it took years for them to really iterate off of that. Yeah, so, like, EFS was a huge thing, and I see us, we're still using it now today, and like, we're trying to integrate, especially for, like, data center migrations, but yeah, you always see that a lot of these were first more for, like, you know, data centers to the cloud, you know. So, first I had, like, EC2 classic. That's where I started. And I always like to tell a story that in my team, we're talking about using AWS, I was the only person fiercely against it because we did basically large data processing—sorry, I forget the right words—data analytics. There we go [laugh].Corey: I remember that, too. When it first came out, it was, “This sounds dangerous and scary, and it's going to be a flash in the pan because who would ever trust their core compute infrastructure to some random third-party company, especially a bookstore?” And yeah, I think I got that one very wrong.Evelyn: Yeah, exactly. I was just like, no way. You know, I see all these articles talking about, like, terrible disk performance, and here I am, where it's like, it's my bread and butter. I'm specialized in it, you know? I write code in my sleep and such.[Yeah, the interesting thing is, I was like, first, it was like, I can 00:06:03] launch services, you know, to kind of replicate when you get in a data center to make it feature comparable, and then it was taking all this complex services and wrapping it up in a pretty bow for—as a managed service. Like, EKS, I think, was the biggest one, if we're looking at managed services. Technically Elasticsearch, but I feel like that was the redheaded stepchild for quite some time.Corey: Yeah, there was—Elasticsearch was a weird one, and still is. It's not a pleasant service to run in any meaningful sense. Like, what people actually want as the next enhancement that would excite everyone is, I want a serverless version of this thing where I can just point it at a bunch of data, I hit an API that I don't have to manage, and get Elasticsearch results back from. They finally launched a serverless offering that's anything but. You have to still provision compute units for it, so apparently, the word serverless just means managed service over at AWS-land now. And it just, it ties into the increasing sense of disappointment I've had with almost all of their recent launches versus what I felt they could have been.Evelyn: Yeah, the interesting thing about Elasticsearch is, a couple of years ago, they came out with OpenSearch, a competing Elasticsearch after [unintelligible 00:07:08] kind of gave us the finger and change the licensing. I mean, OpenSearch actually become a really great offering if you run it yourself, but if you use their managed service, it can kind—you lose all the benefits, in a way.Corey: I'm curious, as well, to get your take on what I've been seeing that I think could only be described as an internal shift, where it's almost as if there's been a decree passed down that every service has to run its own P&L or whatnot, and as a result, everything that gets put out seems to be monetized in weird ways, even when I'd argue it shouldn't be. The classic example I like to use for this is AWS Config, where it charges you per evaluation, and that happens whenever a cloud resource changes. What that means is that by using the cloud dynamically—the way that they supposedly want us to do—we wind up paying a fee for that as a result. And it's not like anyone is using that service in isolation; it is definitionally being used as people are using other cloud resources, so why does it cost money? And the answer is because literally everything they put out costs money.Evelyn: Yep, pretty simple. Oftentimes, there's, like, R&D that goes into it, but the charges seem a bit… odd. Like from an S3 lens, was, I mean, that's, like, you know, if you're talking about services, that was actually a really nice one, very nice holistic overview, you know, like, I could drill into a data lake and, like, look into things. But if you actually want to get anything useful, you have to pay for it.Corey: Yeah. Everything seems to, for one reason or another, be stuck in this place where, “Well, if you want to use it, it's going to cost.” And what that means is that it gets harder and harder to do anything that even remotely resembles being able to wind up figuring out where's the spend going, or what's it going to cost me as time goes on? Because it's not just what are the resources I'm spinning up going to cost, what are the second, third, and fourth-order effects of that? And the honest answer is, well, nobody knows. You're going to have to basically run an experiment and find out.Evelyn: Yeah. No, true. So, what I… at AutoScout, we actually ended up doing is—because we're trying to figure out how to tackle these costs—is they—we built an in-house cost allocation solution so we could track all of that. Now, AWS has actually improved Cost Explorer quite a bit, and even, I think, Billing Conductor was one that came out [unintelligible 00:09:21], kind of like, do a custom tiered and account pricing model where you can kind of do the same thing. But even that also, there is a cost with it.I think that was trying to compete with other, you know, vendors doing similar solutions. But it still isn't something where we see that either there's, like, arbitrarily low pricing there, or the costs itself doesn't really quite make sense. Like, AWS [unintelligible 00:09:45], as you mentioned, it's a terrific service. You know, we try to use it for compliance enforcement and other things, catching bad behavior, but then as soon as people see the price tag, we just run away from it. So, a lot of the security services themselves, actually, the costs, kind of like, goes—skyrockets tremendously when you start trying to use it across a large organization. And oftentimes, the organization isn't actually that large.Corey: Yeah, it gets to this point where, especially in small environments, you have to spend more energy and money chasing down what the cost is than you're actually spending on the thing. There were blog posts early on that, “Oh, here's how you analyze your bill with Redshift,” and that was a minimum 750 bucks a month. It's, well, I'm guessing that that's not really for my $50 a month account.Evelyn: Yeah. No, precisely. I remember seeing that, like, entire ETL process is just, you know, analyze your invoice. Cost [unintelligible 00:10:33], you know, is fantastic, but at the end of the day, like, what you're actually looking at [laugh], is infinitesimally small compared to all the data in that report. Like, I think oftentimes, it's simply, you know, like, I just want to look at my resources and allocate them in a multidimensional way. Which actually isn't really that multidimensional, when you think about it [laugh].Corey: Increasingly, Cost Explorer has gotten better. It's not a new service, but every iteration seems to improve it to a point now where I'm talking to folks, and they're having a hard time justifying most of the tools in the cost optimization space, just because, okay, they want a percentage of my spend on AWS to basically be a slightly better version of a thing that's already improving and works for free. That doesn't necessarily make sense. And I feel like that's what you get trapped into when you start going down the VC path in the cost optimization space. You've got to wind up having a revenue model and an offering that scales through software… and I thought, originally, I was going to be doing something like that. At this point, I'm unconvinced that anything like that is really tenable.Evelyn: Yeah. When you're a small organization you're trying to optimize, you might not have the expertise and the knowledge to do so, so when one of these small consultancies comes along, saying, “Hey, we're going to charge you a really small percentage of your invoice,” like, okay, great. That's, like, you know, like, a few $100 a month to make sure I'm fully optimized, and I'm saving, you know, far more than that. But as soon as your invoice turns into, you know, it's like $100,000, or $300,000 or more, that percentage becomes rather significant. And I've had vendors come to me and, like, talk to me and is like, “Hey, we can, you know, for a small percentage, you know, we're going to do this machine learning, you know, AI optimization for you. You know, you don't have to do anything. We guaranteed buybacks your RIs.” And as soon as you look at the price tag with it, we just have to walk away. Or oftentimes we look at it, and there are truly very simple ways to do it on your own, if you just kind of put some thought into it.Corey: While we want to talking a bit before this show, you taught me something new about GameLift, which I think is a different problem that AWS has been dealing with lately. I've never paid much attention to it because it is the—as I assume from what it says on the tin, oh, it's a service for just running a whole bunch of games at scale, and I'm not generally doing that. My favorite computer game remains to be Twitter at this point, but that's okay. What is GameLift, though, because you want to shining a different light on it, which makes me annoyed that Amazon Marketing has not pointed this out.Evelyn: Yeah, so I'll preface this by saying, like, I'm not an expert on GameLift. I haven't even spun it up myself because there's quite a bit of price. I learned this fall while chatting with an SA who works in the gaming space, and it kind of like, I went, like, “Back up a second.” If you think about, like, I'm, you know, like, World of Warcraft, all you have are thousands of game clients all over the world, playing the same game, you know, on the same server, in the same instance, and you need to make sure, you know, that when I'm running, and you're running, that we know that we're going to reach the same point the same time, or if there's one object in that room, that only one of us can get it. So, all these servers are doing is tracking state across thousands of clients.And GameLift, when you think about your dedicated game service, it really is just multi-region distributed state management. Like, at the basic, that's really what it is. Now, there's, you know, quite a bit more happening within GameLift, but that's what I was going to explain is, like, it's just state management. And there are far more use cases for it than just for video games.Corey: That's maddening to me because having a global session state store, for lack of a better term, is something that so many customers have built themselves repeatedly. They can build it on top of primitives like DynamoDB global tables, or alternately, you have a dedicated region where that thing has to live and everything far away takes forever to round-trip. If they've solved some of those things, why on earth would they bury it under a gaming-branded service? Like, offer that primitive to the rest of us because that's useful.Evelyn: No, absolutely. And honestly, I wouldn't be surprised if you peeled back the curtain with GameLift, you'll find a lot of—like, several other you know, AWS services that it's just built on top of. I kind of mentioned earlier is, like, what I see now with innovation, it's like we just see other services packaged together and releases a new product.Corey: Yeah, IoT had the same problem going on for years where there was a lot of really good stuff buried in there, like IOT events. People were talking about using that for things like browser extensions and whatnot, but you need to be explicitly told that that's a thing that exists and is handy, but otherwise you'd never know it was there because, “Well, I'm not building anything that's IoT-related. Why would I bother?” It feels like that was one direction that they tended to go in.And now they take existing services that are, mmm, kind of milquetoast, if I'm being honest, and then saying, “Oh, like, we have Comprehend that does, effectively detection of themes, keywords, and whatnot, from text. We're going to wind up re-releasing that as Comprehend Medical.” Same type of thing, but now focused on a particular vertical. Seems to me that instead of being a specific service for that vertical, just improve the baseline the service and offer HIPAA compliance if it didn't exist already, and you're mostly there. But what do I know? I'm not a product manager trying to get promoted.Evelyn: Yeah, that's true. Well, I was going to mention that maybe it's the HIPAA compliance, but actually, a lot of their services already have HIPAA compliance. And I've stared far too long at that compliance section on AWS's site to know this, but you know, a lot of them actually are HIPAA-compliant, they're PCI-compliant, and ISO-compliant, and you know, and everything. So, I'm actually pretty intrigued to know why they [wouldn't 00:16:04] take that advantage.Corey: I just checked. Amazon Comprehend is itself HIPAA-compliant and is qualified and certified to hold Personal Health Information—PHI—Private Health Information, whatever the acronym stands for. Now, what's the difference, then, between that and Medical? In fact, the HIPAA section says for Comprehend Medical, “For guidance, see the previous section on Amazon Comprehend.” So, there's no difference from a regulatory point of view.Evelyn: That's fascinating. I am intrigued because I do know that, like, within AWS, you know, they have different segments, you know? There's, like, Digital Native Business, there's Enterprise, there's Startup. So, I am curious how things look over the engineering side. I'm going to talk to somebody about this now [laugh].Corey: Yeah, it's the—like, I almost wonder, on some level, it feels like, “Well, we wound to building this thing in the hopes that someone would use it for something. And well, if we just use different words, it checks a box in some analyst's chart somewhere.” I don't know. I mean, I hate to sound that negative about it, but it's… increasingly when I talk to customers who are active in these spaces around the industry vertical targeted stuff aimed at their industry, they're like, “Yeah, we took a look at it. It was adorable, but we're not using it that way. We're going to use either the baseline version or we're going to work with someone who actively gets our industry.” And I've heard that repeated about three or four different releases that they've put out across the board of what they've been doing. It feels like it is a misunderstanding between what the world needs and what they're able to or willing to build for us.Evelyn: Not sure. I wouldn't be surprised, if we go far enough, it could probably be that it's just a product manager saying, like, “We have to advertise directly to the industry.” And if you look at it, you know, in the backend, you know, it's an engineer, you know, kicking off a build and just changing the name from Comprehend to Comprehend Medical.Corey: And, on some level, too, they're moving a lot more slowly than they used to. There was a time where they were, in many cases, if not the first mover, the first one to do it well. Take Code Whisperer, their AI powered coding assistant. That would have been a transformative thing if GitHub Copilot hadn't beaten them every punch, come out with new features, and frankly, in head-to-head experiments that I've run, came out way better as a product than what Code Whisperer is. And while I'd like to say that this is great, but it's too little too late. And when I talk to engineers, they're very excited about what Copilot can do, and the only people I see who are even talking about Code Whisperer work at AWS.Evelyn: No, that's true. And so, I think what's happening—and this is my opinion—is that first you had AWS, like, launching a really innovative new services, you know, that kind of like, it's like, “Ah, it's a whole new way of running your workloads in the cloud.” Instead of you know, basically, hiring a whole team, I just click a button, you have your instance, you use it, sell software, blah, blah, blah, blah. And then they went towards serverless, and then IoT, and then it started targeting large data lakes, and then eventually that kind of run backwards towards security, after the umpteenth S3 data leak.Corey: Oh, yeah. And especially now, like, so they had a hit in some corners with SageMaker, so now there are 40 services all starting with the word SageMaker. That's always pleasant.Evelyn: Yeah, precisely. And what I kind of notice is… now they're actually having to run it even further back because they caught all the corporations that could pivot to the cloud, they caught all the startups who started in the cloud, and now they're going for the larger behemoths who have massive data centers, and they don't want to innovate. They just want to reduce this massive sysadmin team. And I always like to use the example of a Bare Metal. When that came out in 2019, everybody—we've all kind of scratched your head. I'm like, really [laugh]?Corey: Yeah, I could see where it makes some sense just for very specific workloads that involve things like specific capabilities of processors that don't work under emulation in some weird way, but it's also such a weird niche that I'm sure it's there for someone. My default assumption, just given the breadth of AWS's customer base, is that whenever I see something that they just announced, well, okay, it's clearly not for me; that doesn't mean it's not meeting the needs of someone who looks nothing like me. But increasingly as I start exploring the industry in these services have time to percolate in the popular imagination and I still don't see anything interesting coming out with it, it really makes you start to wonder.Evelyn: Yeah. But then, like, I think, like, roughly a year or something, right after Bare Metal came out, they announced Outposts. So, then it was like, another way to just stay within your data center and be in the cloud.Corey: Yeah. There's a bunch of different ways they have that, okay, here's ways you can run AWS services on-prem, but still pay us by the hour for the privilege of running things that you have living in your facility. And that doesn't seem like it's quite fair.Evelyn: That's exactly it. So, I feel like now it's sort of in diminishing returns and sort of doing more cloud-native work compared to, you know, these huge opportunities, which is everybody who still has a data center for various reasons, or they're cloud-native, and they grow so big, that they actually start running their own data centers.Corey: I want to call out as well before we wind up being accused of being oblivious, that we're recording this before re:Invent. So, it's entirely possible—I hope this happens—that they announce something or several some things that make this look ridiculous, and we're embarrassed to have had this conversation. And yeah, they're totally getting it now, and they have completely surprised us with stuff that's going to be transformative for almost every customer. I've been expecting and hoping for that for the last three or four re:Invents now, and I haven't gotten it.Evelyn: Yeah, that's right. And I think there's even a new service launches that actually are missing fairly obvious things in a way. Like, mine is the Managed Workflow for Amazon—it's Managed Airflow, sorry. So, we were using Data Pipeline for, you know, big ETL processing, so it was an in-house tool we kind of built at Autoscout, we do platform engineering.And it was deprecated, so we looked at a new—what to replace it with. And so, we looked at Airflow, and we decided this is the way to go, we want to use managed because we don't want to maintain our own infrastructure. And the problem we ran into is that it doesn't have support for shared VPCs. And we actually talked to our account team, and they were confused. Because they said, like, “Well, every new service should support it natively.” But it just didn't have it. And that's, kind of, what, I kind of found is, like, there's—it feels—sometimes it's—there's a—it's getting rushed out the door, and it'll actually have a new managed service or new service launched out, but they're also sort of cutting some corners just to actually make sure it's packaged up and ready to go.Corey: When I'm looking at this, and seeing how this stuff gets packaged, and how it's built out, I start to understand a pattern that I've been relatively down on across the board. I'm curious to get your take because you work at a fairly sizable company as an engineering manager, running teams of people who do this sort of thing. Where do you land on the idea of companies building internal platforms to wrap around the offerings that the cloud service providers that they use make available to them?Evelyn: So, my opinion is that you need to build out some form of standardized tool set in order to actually be able to innovate quickly. Now, this sounds counterintuitive because everyone is like, “Oh, you know, if I want to innovate, I should be able to do this experiment, and try out everything, and use what works, and just release it.” And that greatness [unintelligible 00:23:14] mentality, you know, it's like five talented engineers working to build something. But when you have, instead of five engineers, you have five teams of five engineers each, and every single team does something totally different. You know, one uses Scala, and other on TypeScript, another one, you know .NET, and then there could have been a [last 00:23:30] one, you know, comes in, you know, saying they're still using Ruby.And then next thing you know, you know, you have, like, incredibly diverse platforms for services. And if you want to do any sort of like hiring or cross-training, it becomes incredibly difficult. And actually, as the organization grows, you want to hire talent, and so you're going to have to hire, you know, a developer for this team, you going to have to hire, you know, Ruby developer for this one, a Scala guy here, a Node.js guy over there.And so, this is where we say, “Okay, let's agree. We're going to be a Scala shop. Great. All right, are we running serverless? Are we running containerized?” And you agree on those things. So, that's already, like, the formation of it. And oftentimes, you start with DevOps. You'll say, like, “I'm a DevOps team,” you know, or doing a DevOps culture, if you do it properly, but you always hit this scaling issue where you start growing, and then how do you maintain that common tool set? And that's where we start looking at, you know, having a platform… approach, but I'm going to say it's Platform-as-a-Product. That's the key.Corey: Yeah, that's a good way of framing it because originally, the entire world needed that. That's what RightScale was when EC2 first came out. It was a reimagining of the EC2 console that was actually usable. And in time, AWS improved that to the point where RightScale didn't really have a place anymore in a way that it had previously, and that became a business challenge for them. But you have, what is it now, 2, 300 services that AWS has put out, and out, and okay, great. Most companies are really only actively working with a handful of those. How do you make those available in a reasonable way to your teams, in ways that aren't distracting, dangerous, et cetera? I don't know the answer on that one.Evelyn: Yeah. No, that's true. So, full disclosure. At AutoScout, we do platform engineering. So, I'm part of, like, the platform engineering group, and we built a platform for our product teams. It's kind of like, you need to decide to [follow 00:25:24] those answers, you know? Like, are we going to be fully containerized? Okay, then, great, we're going to use Fargate. All right, how do we do it so that developers don't actually—don't need to think that they're running Fargate workloads?And that's, like, you know, where it's really important to have those standardized abstractions that developers actually enjoy using. And I'd even say that, before you start saying, “Ah, we're going to do platform,” you say, “We should probably think about developer experience.” Because you can do a developer experience without a platform. You can do that, you know, in a DevOps approach, you know? It's basically build tools that makes it easy for developers to write code. That's the first step for anything. It's just, like, you have people writing the code; make sure that they can do the things easily, and then look at how to operate it.Corey: That sure would be nice. There's a lack of focus on usability, especially when it comes to a number of developer tools that we see out there in the wild, in that, they're clearly built by people who understand the problem space super well, but they're designing these things to be used by people who just want to make the website work. They don't have the insight, the knowledge, the approach, any of it, nor should they necessarily be expected to.Evelyn: No, that's true. And what I see is, a lot of the times, it's a couple really talented engineers who are just getting shit done, and they get shit done however they can. So, it's basically like, if they're just trying to run the website, they're just going to write the code to get things out there and call it a day. And then somebody else comes along, has a heart attack when see what's been done, and they're kind of stuck with it because there is no guardrails or paved path or however you want to call it.Corey: I really hope—truly—that this is going to be something that we look back and laugh when this episode airs, that, “Oh, yeah, we just got it so wrong. Look at all the amazing stuff that came out of re:Invent.” Are you going to be there this year?Evelyn: I am going to be there this year.Corey: My condolences. I keep hoping people get to escape.Evelyn: This is actually my first one in, I think, five years. So, I mean, the last time I was there was when everybody's going crazy over pins. And I still have a bag of them [laugh].Corey: Yeah, that did seem like a hot-second collectable moment, didn't it?Evelyn: Yeah. And then at the—I think, what, the very last day, as everybody's heading to re:Play, you could just go into the registration area, and they just had, like, bags of them lying around to take. So, all the competing, you know, to get the requirements for a pin was kind of moot [laugh].Corey: Don't you hate it at some point where it's like, you feel like I'm going to finally get this crowning achievement, it's like or just show up at the buffet at the end and grab one of everything, and wow, that would have saved me a lot of pain and trouble.Evelyn: Yeah.Corey: Ugh, scavenger hunts are hard, as I'm about to learn to my own detriment.Evelyn: Yeah. No, true. Yeah. But I am really hoping that re:Invent proves me wrong. Embarrassingly wrong, and then all my colleagues can proceed to mock me for this ridiculous podcast that I made with you. But I am a fierce skeptic. Optimistic nihilist, but still a nihilist, so we'll see how re:Invent turns out.Corey: So, I am curious, given your experience at more large companies than I tend to be embedded with for any period of time, how have you found that these large organizations tend to pick up new technologies? What does the adoption process look like? And honestly, if you feel like throwing some shade, how do they tend to get it wrong?Evelyn: In most cases, I've seen it go… terrible. Like, it just blows up in their face. And I say that is because a lot of the time, an organization will say, “Hey, we're going to adopt this new way of organizing teams or developing products,” and they look at all the practices. They say, “Okay, great. Product management is going to bring it in, they're going to structure things, how we do the planning, here's some great charts and diagrams,” but they don't really look at the culture aspect.And that's always where I've seen things fall apart. I've been in a room where, you know, our VP was really excited about team topologies and say, “Hey, we're going to adopt it.” And then an engineering manager proceeded to say, “Okay, you're responsible for this team, you're responsible for that team, you're responsible for this team talking to, like, a team of, like, five engineers,” which doesn't really work at all. Or, like, I think the best example is DevOps, you know, where you say, “Ah, we're going to adopt DevOps, we're going to have a DevOps team, or have a DevOps engineer.”Corey: Step one: we're going to rebadge everyone with existing job titles to have the new fancy job titles that reflect it. It turns out that's not necessarily sufficient in and of itself.Evelyn: Not really. The Spotify model. People say, like, “Oh, we're going to do the Spotify model. We're going to do skills, tribes, you know, and everything. It's going to be awesome, it's going to be great, you know, and nice, cross-functional.”The reason I say it bails on us every single time is because somebody wants to be in control of the process, and if the process is meant to encourage collaboration and innovation, that person actually becomes a chokehold for it. And it could be somebody that says, like, “Ah, I need to be involved in every single team, and listen to know what's happening, just so I'm aware of it.” What ends up happening is that everybody differs to them. So, there is no collaboration, there is no innovation. DevOps, you say, like, “Hey, we're going to have a team to do everything, so your developers don't need to worry about it.” What ends up happening is you're still an ops team, you still have your silos.And that's always a challenge is you actually have to say, “Okay, what are the cultural values around this process?” You know, what is SRE? What is DevOps, you know? Is it seen as processes, is it a series of principles, platform, maybe, you know? We have to say, like—that's why I say, Platform-as-a-Product because you need to have that product mindset, that culture of product thinking, to really build a platform that works because it's all about the user journey.It's not about building a common set of tools. It's the user journey of how a person interacts with their code to get it into a production environment. And so, you need to understand how that person sits down at their desk, starts the laptop up, logs in, opens the IDE, what they're actually trying to get done. And once you understand that, then you know your requirements, and you build something to fill those things so that they are happy to use it, as opposed to saying, “This is our platform, and you're going to use it.” And they're probably going to say, “No.” And the next thing, you know, they're just doing their own thing on the side.Corey: Yeah, the rise of Shadow IT has never gone away. It's just, on some level, it's the natural expression, I think it's an immune reaction that companies tend to have when process gets in the way. Great, we have an outcome that we need to drive towards; we don't have a choice. Cloud empowered a lot of that and also has given tools to help rein it in, and as with everything, the arms race continues.Evelyn: Yeah. And so, what I'm going to continue now, kind of like, toot the platform horn. So, Gregor Hohpe, he's a [solutions architect 00:31:56]—I always f- up his name. I'm so sorry, Gregor. He has a great book, and even a talk, called The Magic of Platforms, that if somebody is actually curious about understanding of why platforms are nice, they should really watch that talk.If you see him at re:Invent, or a summit or somewhere giving a talk, go listen to that, and just pick his brain. Because that's—for me, I really kind of strongly agree with his approach because that's really how, like, you know, as he says, like, boost innovation is, you know, where you're actually building a platform that really works.Corey: Yeah, it's a hard problem, but it's also one of those things where you're trying to focus on—at least ideally—an outcome or a better situation than you currently find yourselves in. It's hard to turn down things that might very well get you there sooner, faster, but it's like trying to effectively cargo-cult the leadership principles from your last employer into your new one. It just doesn't work. I mean, you see more startups from Amazonians who try that, and it just goes horribly because without the cultural understanding and the supporting structures, it doesn't work.Evelyn: Exactly. So, I've worked with, like, organizations, like, 4000-plus people, I've worked for, like, small startups, consulted, and this is why I say, almost every single transformation, it fails the first time because somebody needs to be in control and track things and basically be really, really certain that people are doing it right. And as soon as it blows up in their face, that's when they realize they should actually take a step back. And so, even for building out a platform, you know, doing Platform-as-a-Product, I always reiterate that you have to really be willing to just invest upfront, and not get very much back. Because you have to figure out the whole user journey, and what you're actually building, before you actually build it.Corey: I really want to thank you for taking the time to speak with me today. If people want to learn more, where's the best place for them to find you?Evelyn: So, I used to be on Twitter, but I've actually got off there after it kind of turned a bit toxic and crazy.Corey: Feels like that was years ago, but that's beside the point.Evelyn: Yeah, precisely. So, I would even just say because this feels like a corporate show, but find me on LinkedIn of all places because I will be sharing whatever I find on there, you know? So, just look me up on my name, Evelyn Osman, and give me a follow, and I'll probably be screaming into the cloud like you are.Corey: And we will, of course, put links to that in the show notes. Thank you so much for taking the time to speak with me. I appreciate it.Evelyn: Thank you, Corey.Corey: Evelyn Osman, engineering manager at AutoScout24. I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice, and I will read it once I finish building an internal platform to normalize all of those platforms together into one.Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business, and we get to the point. Visit duckbillgroup.com to get started.
Amir Szekely, Owner at CloudSnorkel, joins Corey on Screaming in the Cloud to discuss how he got his start in the early days of cloud and his solo project, CloudSnorkel. Throughout this conversation, Corey and Amir discuss the importance of being pragmatic when moving to the cloud, and the different approaches they see in developers from the early days of cloud to now. Amir shares what motivates him to develop open-source projects, and why he finds fulfillment in fixing bugs and operating CloudSnorkel as a one-man show. About AmirAmir Szekely is a cloud consultant specializing in deployment automation, AWS CDK, CloudFormation, and CI/CD. His background includes security, virtualization, and Windows development. Amir enjoys creating open-source projects like cdk-github-runners, cdk-turbo-layers, and NSIS.Links Referenced: CloudSnorkel: https://cloudsnorkel.com/ lasttootinaws.com: https://lasttootinaws.com camelcamelcamel.com: https://camelcamelcamel.com github.com/cloudsnorkel: https://github.com/cloudsnorkel Personal website: https://kichik.com TranscriptAnnouncer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn, and this is an episode that I have been angling for for longer than you might imagine. My guest today is Amir Szekely, who's the owner at CloudSnorkel. Amir, thank you for joining me.Amir: Thanks for having me, Corey. I love being here.Corey: So, I've been using one of your open-source projects for an embarrassingly long amount of time, and for the longest time, I make the critical mistake of referring to the project itself as CloudSnorkel because that's the word that shows up in the GitHub project that I can actually see that jumps out at me. The actual name of the project within your org is cdk-github-runners if I'm not mistaken.Amir: That's real original, right?Corey: Exactly. It's like, “Oh, good, I'll just mention that, and suddenly everyone will know what I'm talking about.” But ignoring the problems of naming things well, which is a pain that everyone at AWS or who uses it knows far too well, the product is basically magic. Before I wind up basically embarrassing myself by doing a poor job of explaining what it is, how do you think about it?Amir: Well, I mean, it's a pretty simple project, which I think what makes it great as well. It creates GitHub runners with CDK. That's about it. It's in the name, and it just does that. And I really tried to make it as simple as possible and kind of learn from other projects that I've seen that are similar, and basically learn from my pain points in them.I think the reason I started is because I actually deployed CDK runners—sorry, GitHub runners—for one company, and I ended up using the Kubernetes one, right? So, GitHub in themselves, they have two projects they recommend—and not to nudge GitHub, please recommend my project one day as well—they have the Kubernetes controller and they have the Terraform deployer. And the specific client that I worked for, they wanted to use Kubernetes. And I tried to deploy it, and, Corey, I swear, I worked three days; three days to deploy the thing, which was crazy to me. And every single step of the way, I had to go and read some documentation, figure out what I did wrong, and apparently the order the documentation was was incorrect.And I had to—I even opened tickets, and they—you know, they were rightfully like, “It's open-source project. Please contribute and fix the documentation for us.” At that point, I said, “Nah.” [laugh]. Let me create something better with CDK and I decided just to have the simplest setup possible.So usually, right, what you end up doing in these projects, you have to set up either secrets or SSM parameters, and you have to prepare the ground and you have to get your GitHub token and all those things. And that's just annoying. So, I decided to create a—Corey: So much busy work.Amir: Yes, yeah, so much busy work and so much boilerplate and so much figuring out the right way and the right order, and just annoying. So, I decided to create a setup page. I thought, “What if you can actually install it just like you install any app on GitHub,” which is the way it's supposed to be right? So, when you install cdk-github-runners—CloudSnorkel—you get an HTML page and you just click a few buttons and you tell it where to install it and it just installs it for you. And it sets the secrets and everything. And if you want to change the secret, you don't have to redeploy. You can just change the secret, right? You have to roll the token over or whatever. So, it's much, much easier to install.Corey: And I feel like I discovered this project through one of the more surreal approaches—and I had cause to revisit it a few weeks ago when I was redoing my talk for the CDK Community Day, which has since happened and people liked the talk—and I mentioned what CloudSnorkel had been doing and how I was using the runners accordingly. So, that was what I accidentally caused me to pop back up with, “Hey, I've got some issues here.” But we'll get to that. Because once upon a time, I built a Twitter client for creating threads because shitposting is my love language, I would sit and create Twitter threads in the middle of live keynote talks. Threading in the native client was always terrible, and I wanted to build something that would help me do that. So, I did.And it was up for a while. It's not anymore because I'm not paying $42,000 a month in API costs to some jackass, but it still exists in the form of lasttootinaws.com if you want to create threads on Mastodon. But after I put this out, some people complained that it was slow.To which my response was, “What do you mean? It's super fast for me in San Francisco talking to it hosted in Oregon.” But on every round trip from halfway around the world, it became a problem. So, I got it into my head that since this thing was fully stateless, other than a Lambda function being fronted via an API Gateway, that I should deploy it to every region. It didn't quite fit into a Cloudflare Worker or into one of the Edge Lambda functions that AWS has given up on, but okay, how do I deploy something to every region?And the answer is, with great difficulty because it's clear that no one was ever imagining with all those regions that anyone would use all of them. It's imagined that most customers use two or three, but customers are different, so which two or three is going to be widely varied. So, anything halfway sensible about doing deployments like this didn't work out. Again, because this thing was also a Lambda function and an API Gateway, it was dirt cheap, so I didn't really want to start spending stupid amounts of money doing deployment infrastructure and the rest.So okay, how do I do this? Well, GitHub Actions is awesome. It is basically what all of AWS's code offerings wish that they were. CodeBuild is sad and this was kind of great. The problem is, once you're out of the free tier, and if you're a bad developer where you do a deploy on every iteration, suddenly it starts costing for what I was doing in every region, something like a quarter of per deploy, which adds up when you're really, really bad at programming.Amir: [laugh].Corey: So, their matrix jobs are awesome, but I wanted to do some self-hosted runners. How do I do that? And I want to keep it cheap, so how do I do a self-hosted runner inside of a Lambda function? Which led me directly to you. And it was nothing short of astonishing. This was a few years ago. I seem to recall that it used to be a bit less well-architected in terms of its elegance. Did it always use step functions, for example, to wind up orchestrating these things?Amir: Yeah, so I do remember that day. We met pretty much… basically as a joke because the Lambda Runner was a joke that I did, and I posted on Twitter, and I was half-proud of my joke that starts in ten seconds, right? But yeah, no, the—I think it always used functions. I've been kind of in love with the functions for the past two years. They just—they're nice.Corey: Oh, they're magic, and AWS is so bad at telling their story. Both of those things are true.Amir: Yeah. And the API is not amazing. But like, when you get it working—and you know, you have to spend some time to get it working—it's really nice because then you have nothing to manage, ever. And they can call APIs directly now, so you don't have to even create Lambdas. It's pretty cool.Corey: And what I loved is you wind up deploying this thing to whatever account you want it to live within. What is it, the OIDC? I always get those letters in the wrong direction. OIDC, I think, is correct.Amir: I think it's OIDC, yeah.Corey: Yeah, and it winds up doing this through a secure method as opposed to just okay, now anyone with access to the project can deploy into your account, which is not ideal. And it just works. It spins up a whole bunch of these Lambda functions that are using a Docker image as the deployment environment. And yeah, all right, if effectively my CDK deploy—which is what it's doing inside of this thing—doesn't complete within 15 minutes, then it's not going to and the thing is going to break out. We've solved the halting problem. After 15 minutes, the loop will terminate. The end.But that's never been a problem, even with getting ACM certificates spun up. It completes well within that time limit. And its cost to me is effectively nothing. With one key exception: that you made the choice to use Secrets Manager to wind up storing a lot of the things it cares about instead of Parameter Store, so I think you wind up costing me—I think there's two of those different secrets, so that's 80 cents a month. Which I will be demanding in blood one of these days if I ever catch you at re:Invent.Amir: I'll buy you beer [laugh].Corey: There we go. That'll count. That'll buy, like, several months of that. That works—at re:Invent, no. The beers there are, like, $18, so that'll cover me for years. We're set.Amir: We'll split it [laugh].Corey: Exactly. Problem solved. But I like the elegance of it, I like how clever it is, and I want to be very clear, though, it's not just for shitposting. Because it's very configurable where, yes, you can use Lambda functions, you can use Spot Instances, you can use CodeBuild containers, you can use Fargate containers, you can use EC2 instances, and it just automatically orchestrates and adds these self-hosted runners to your account, and every build gets a pristine environment as a result. That is no small thing.Amir: Oh, and I love making things configurable. People really appreciate it I feel, you know, and gives people kind of a sense of power. But as long as you make that configuration simple enough, right, or at least the defaults good defaults, right, then, even with that power, people still don't shoot themselves in the foot and it still works really well. By the way, we just added ECS recently, which people really were asking for because it gives you the, kind of, easy option to have the runner—well, not the runner but at least the runner infrastructure staying up, right? So, you can have auto-scaling group backing ECS and then the runner can start up a lot faster. It was actually very important to other people because Lambda, as fast that it is, it's limited, and Fargate, for whatever reason, still to this day, takes a minute to start up.Corey: Yeah. What's wild to me about this is, start to finish, I hit a deploy to the main branch and it sparks the thing up, runs the deploy. Deploy itself takes a little over two minutes. And every time I do this, within three minutes of me pushing to commit, the deploy is done globally. It is lightning fast.And I know it's easy to lose yourself in the idea of this being a giant shitpost, where, oh, who's going to do deployment jobs in Lambda functions? Well, kind of a lot of us for a variety of reasons, some of which might be better than others. In my case, it was just because I was cheap, but the massive parallelization ability to do 20 simultaneous deploys in a matrix configuration that doesn't wind up smacking into rate limits everywhere, that was kind of great.Amir: Yeah, we have seen people use Lambda a lot. It's mostly for, yeah, like you said, small jobs. And the environment that they give you, it's kind of limited, so you can't actually install packages, right? There is no sudo, and you can't actually install anything unless it's in your temp directory. But still, like, just being able to run a lot of little jobs, it's really great. Yeah.Corey: And you can also make sure that there's a Docker image ready to go with the stuff that you need, just by configuring how the build works in the CDK. I will admit, I did have a couple of bug reports for you. One was kind of useful, where it was not at all clear how to do this on top of a Graviton-based Lambda function—because yeah, that was back when not everything really supported ARM architectures super well—and a couple of other times when the documentation was fairly ambiguous from my perspective, where it wasn't at all clear, what was I doing? I spent four hours trying to beat my way through it, I give up, filed an issue, went to get a cup of coffee, came back, and the answer was sitting there waiting for me because I'm not convinced you sleep.Amir: Well, I am a vampire. My last name is from the Transylvania area [laugh]. So—Corey: Excellent. Excellent.Amir: By the way, not the first time people tell me that. But anyway [laugh].Corey: There's something to be said for getting immediate responsiveness because one of the reasons I'm always so loath to go and do a support ticket anywhere is this is going to take weeks. And then someone's going to come back with a, “I don't get it.” And try and, like, read the support portfolio to you. No, you went right into yeah, it's this. Fix it and your problem goes away. And sure enough, it did.Amir: The escalation process that some companies put you through is very frustrating. I mean, lucky for you, CloudSnorkel is a one-man show and this man loves solving bugs. So [laugh].Corey: Yeah. Do you know of anyone using it for anything that isn't ridiculous and trivial like what I'm using it for?Amir: Yeah, I have to think whether or not I can… I mean, so—okay. We have a bunch of dedicated users, right, the GitHub repo, that keep posting bugs and keep posting even patches, right, so you can tell that they're using it. I even have one sponsor, one recurring sponsor on GitHub that uses it.Corey: It's always nice when people thank you via money.Amir: Yeah. Yeah, it is very validating. I think [BLEEP] is using it, but I also don't think I can actually say it because I got it from the GitHub.Corey: It's always fun. That's the beautiful part about open-source. You don't know who's using this. You see what other things people are working on, and you never know, is one of their—is this someone's side project, is it a skunkworks thing, or God forbid, is this inside of every car going forward and no one bothered to tell me about that. That is the magic and mystery of open-source. And you've been doing open-source for longer than I have and I thought I was old. You were originally named in some of the WinAMP credits, for God's sake, that media player that really whipped the llama's ass.Amir: Oh, yeah, I started real early. I started about when I was 15, I think. I started off with Pascal or something or even Perl, and then I decided I have to learn C and I have to learn Windows API. I don't know what possessed me to do that. Win32 API is… unique [laugh].But once I created those applications for myself, right, I think there was—oh my God, do you know the—what is it called, Sherlock in macOS, right? And these days, for PowerToys, there is the equivalent of it called, I don't know, whatever that—PowerBar? That's exactly—that was that. That's a project I created as a kid. I wanted something where I can go to the Run menu of Windows when you hit Winkey R, and you can just type something and it will start it up, right?I didn't want to go to the Start menu and browse and click things. I wanted to do everything with the keyboard. So, I created something called Blazerun [laugh], which [laugh] helped you really easily create shortcuts that went into your path, right, the Windows path, so you can really easily start them from Winkey R. I don't think that anyone besides me used it, but anyway, that thing needed an installer, right? Because Windows, you got to install things. So, I ended up—Corey: Yeah, these days on Mac OS, I use Alfred for that which is kind of long in the tooth, but there's a launch bar and a bunch of other stuff for it. What I love is that if I—I can double-tap the command key and that just pops up whatever I need it to and tell the computer what to do. It feels like there's an AI play in there somewhere if people can figure out how to spend ten minutes on building AI that does something other than lets them fire their customer service staff.Amir: Oh, my God. Please don't fire customer service staff. AI is so bad.Corey: Yeah, when I reach out to talk to a human, I really needed a human.Amir: Yes. Like, I'm not calling you because I want to talk to a robot. I know there's a website. Leave me alone, just give me a person.Corey: Yeah. Like, you already failed to solve my problem on your website. It's person time.Amir: Exactly. Oh, my God. Anyway [laugh]. So, I had to create an installer, right, and I found it was called NSIS. So, it was a Nullsoft “SuperPiMP” installation system. Or in the future, when Justin, the guy who created Winamp and NSIS, tried to tone down a little bit, Nullsoft Scriptable Installation System. And SuperPiMP is—this is such useless history for you, right, but SuperPiMP is the next generation of PiMP which is Plug-in Mini Packager [laugh].Corey: I remember so many of the—like, these days, no one would ever name any project like that, just because it's so off-putting to people with sensibilities, but back then that was half the stuff that came out. “Oh, you don't like how this thing I built for free in the wee hours when I wasn't working at my fast food job wound up—you know, like, how I chose to name it, well, that's okay. Don't use it. Go build your own. Oh, what you're using it anyway. That's what I thought.”Amir: Yeah. The source code was filled with profanity, too. And like, I didn't care, I really did not care, but some people would complain and open bug reports and patches. And my policy was kind of like, okay if you're complaining, I'm just going to ignore you. If you're opening a patch, fine, I'm going to accept that you're—you guys want to create something that's sensible for everybody, sure.I mean, it's just source code, you know? Whatever. So yeah, I started working on that NSIS. I used it for myself and I joined the forums—and this kind of answers to your question of why I respond to things so fast, just because of the fun—I did the same when I was 15, right? I started going on the forums, you remember forums? You remember that [laugh]?Corey: Oh, yeah, back before they all became terrible and monetized.Amir: Oh, yeah. So, you know, people were using NSIS, too, and they had requests, right? They wanted. Back in the day—what was it—there was only support for 16-bit colors for the icon, so they want 32-bit colors and big colors—32—big icon, sorry, 32 pixels by 32 pixels. Remember, 32 pixels?Corey: Oh, yes. Not well, and not happily, but I remember it.Amir: Yeah. So, I started just, you know, giving people—working on that open-source and creating up a fork. It wasn't even called ‘fork' back then, but yeah, I created, like, a little fork of myself and I started adding all these features. And people were really happy, and kind of created, like, this happy cycle for myself: when people were happy, I was happy coding. And then people were happy by what I was coding. And then they were asking for more and they were getting happier, the more I responded.So, it was kind of like a serotonin cycle that made me happy and made everybody happy. So, it's like a win, win, win, win, win. And that's how I started with open-source. And eventually… NSIS—again, that installation system—got so big, like, my fork got so big, and Justin, the guy who works on WinAMP and NSIS, he had other things to deal with. You know, there's a whole history there with AOL. I'm sure you've heard all the funny stories.Corey: Oh, yes. In fact, one thing that—you want to talk about weird collisions of things crossing, one of the things I picked up from your bio when you finally got tired of telling me no and agreed to be on the show was that you're also one of the team who works on camelcamelcamel.com. And I keep forgetting that's one of those things that most people have no idea exists. But it's very simple: all it does is it tracks Amazon products that you tell it to and alerts you when there's a price drop on the thing that you're looking at.It's something that is useful. I try and use it for things of substance or hobbies because I feel really pathetic when I'm like, get excited emails about a price drop in toilet paper. But you know, it's very handy just to keep an idea for price history, where okay, am I actually being ripped off? Oh, they claim it's their big Amazon Deals day and this is 40% off. Let's see what camelcamelcamel has to say.Oh, surprise. They just jacked the price right beforehand and now knocked 40% off. Genius. I love that. It always felt like something that was going to be blown off the radar by Amazon being displeased, but I discovered you folks in 2010 and here you are now, 13 years later, still here. I will say the website looks a lot better now.Amir: [laugh]. That's a recent change. I actually joined camel, maybe two or three years ago. I wasn't there from the beginning. But I knew the guy who created it—again, as you were saying—from the Winamp days, right? So, we were both working in the free—well, it wasn't freenode. It was not freenode. It was a separate IRC server that, again, Justin created for himself. It was called landoleet.Corey: Mmm. I never encountered that one.Amir: Yeah, no, it was pretty private. The only people that cared about WinAMP and NSIS ended up joining there. But it was a lot of fun. I met a lot of friends there. And yeah, I met Daniel Green there as well, and he's the guy that created, along with some other people in there that I think want to remain anonymous so I'm not going to mention, but they also were on the camel project.And yeah, I was kind of doing my poor version of shitposting on Twitter about AWS, kind of starting to get some traction and maybe some clients and talk about AWS so people can approach me, and Daniel approached me out of the blue and he was like, “Do you just post about AWS on Twitter or do you also do some AWS work?” I was like, “I do some AWS work.”Corey: Yes, as do all of us. It's one of those, well crap, we're getting called out now. “Do you actually know how any of this stuff works?” Like, “Much to my everlasting shame, yes. Why are you asking?”Amir: Oh, my God, no, I cannot fix your printer. Leave me alone.Corey: Mm-hm.Amir: I don't want to fix your Lambdas. No, but I do actually want to fix your Lambdas. And so, [laugh] he approached me and he asked if I can help them move camelcamelcamel from their data center to AWS. So, that was a nice big project. So, we moved, actually, all of camelcamelcamel into AWS. And this is how I found myself not only in the Winamp credits, but also in the camelcamelcamel credits page, which has a great picture of me riding a camel.Corey: Excellent. But one of the things I've always found has been that when you take an application that has been pre-existing for a while in a data center and then move it into the cloud, you suddenly have to care about things that no one sensible pays any attention to in the land of the data center. Because it's like, “What do I care about how much data passes between my application server and the database? Wait, what do you mean that in this configuration, that's a chargeable data transfer? Oh, dear Lord.” And things that you've never had to think about optimizing are suddenly things are very much optimizing.Because let's face it, when it comes to putting things in racks and then running servers, you aren't auto-scaling those things, so everything tends to be running over-provisioned, for very good reasons. It's an interesting education. Anything you picked out from that process that you think it'd be useful for folks to bear in mind if they're staring down the barrel of the same thing?Amir: Yeah, for sure. I think… in general, right, not just here. But in general, you always want to be pragmatic, right? You don't want to take steps are huge, right? So, the thing we did was not necessarily rewrite everything and change everything to AWS and move everything to Lambda and move everything to Docker.Basically, we did a mini lift-and-shift, but not exactly lift-and-shift, right? We didn't take it as is. We moved to RDS, we moved to ElastiCache, right, we obviously made use of security groups and session connect and we dropped SSH Sage and we improved the security a lot and we locked everything down, all the permissions and all that kind of stuff, right? But like you said, there's stuff that you start having to pay attention to. In our case, it was less the data transfer because we have a pretty good CDN. There was more of IOPS. So—and IOPS, specifically for a database.We had a huge database with about one terabyte of data and a lot of it is that price history that you see, right? So, all those nice little graphs that we create in—what do you call them, charts—that we create in camelcamelcamel off the price history. There's a lot of data behind that. And what we always want to do is actually remove that from MySQL, which has been kind of struggling with it even before the move to AWS, but after the move to AWS, where everything was no longer over-provisioned and we couldn't just buy a few more NVMes on Amazon for 100 bucks when they were on sale—back when we had to pay Amazon—Corey: And you know, when they're on sale. That's the best part.Amir: And we know [laugh]. We get good prices on NVMe. But yeah, on Amazon—on AWS, sorry—you have to pay for io1 or something, and that adds up real quick, as you were saying. So, part of that move was also to move to something that was a little better for that data structure. And we actually removed just that data, the price history, the price points from MySQL to DynamoDB, which was a pretty nice little project.Actually, I wrote about it in my blog. There is, kind of, lessons learned from moving one terabyte from MySQL to DynamoDB, and I think the biggest lesson was about hidden price of storage in DynamoDB. But before that, I want to talk about what you asked, which was the way that other people should make that move, right? So again, be pragmatic, right? If you Google, “How do I move stuff from DynamoDB to MySQL,” everybody's always talking about their cool project using Lambda and how you throttle Lambda and how you get throttled from DynamoDB and how you set it up with an SQS, and this and that. You don't need all that.Just fire up an EC2 instance, write some quick code to do it. I used, I think it was Go with some limiter code from Uber, and that was it. And you don't need all those Lambdas and SQS and the complication. That thing was a one-time thing anyway, so it doesn't need to be super… super-duper serverless, you know?Corey: That is almost always the way that it tends to play out. You encounter these weird little things along the way. And you see so many things that are tied to this is how architecture absolutely must be done. And oh you're not a real serverless person if you don't have everything running in Lambda and the rest. There are times where yeah, spin up an EC2 box, write some relatively inefficient code in ten minutes and just do the thing, and then turn it off when you're done. Problem solved. But there's such an aversion to that. It's nice to encounter people who are pragmatists more than they are zealots.Amir: I mostly learned that lesson. And both Daniel Green and me learned that lesson from the Winamp days. Because we both have written plugins for Winamp and we've been around that area and you can… if you took one of those non-pragmatist people, right, and you had them review the Winamp code right now—or even before—they would have a million things to say. That code was—and NSIS, too, by the way—and it was so optimized. It was so not necessarily readable, right? But it worked and it worked amazing. And Justin would—if you think I respond quickly, right, Justin Frankel, the guy who wrote Winamp, he would release versions of NSIS and of Winamp, like, four versions a day, right? That was before [laugh] you had CI/CD systems and GitHub and stuff. That was just CVS. You remember CVS [laugh]?Corey: Oh, I've done multiple CVS migrations. One to Git and a couple to Subversion.Amir: Oh yeah, Subversion. Yep. Done ‘em all. CVS to Subversion to Git. Yep. Yep. That was fun.Corey: And these days, everyone's using Git because it—we're beginning to have a monoculture.Amir: Yeah, yeah. I mean, but Git is nicer than Subversion, for me, at least. I've had more fun with it.Corey: Talk about damning with faint praise.Amir: Faint?Corey: Yeah, anything's better than Subversion, let's be honest here.Amir: Oh [laugh].Corey: I mean, realistically, copying a bunch of files and directories to a.bak folder is better than Subversion.Amir: Well—Corey: At least these days. But back then it was great.Amir: Yeah, I mean, the only thing you had, right [laugh]?Corey: [laugh].Amir: Anyway, achieving great things with not necessarily the right tools, but just sheer power of will, that's what I took from the Winamp days. Just the entire world used Winamp. And by the way, the NSIS project that I was working on, right, I always used to joke that every computer in the world ran my code, every Windows computer in the world when my code, just because—Corey: Yes.Amir: So, many different companies use NSIS. And none of them cared that the code was not very readable, to put it mildly.Corey: So, many companies founder on those shores where they lose sight of the fact that I can point to basically no companies that died because their code was terrible, yeah, had an awful lot that died with great-looking code, but they didn't nail the business problem.Amir: Yeah. I would be lying if I said that I nailed exactly the business problem at NSIS because the most of the time I would spend there and actually shrinking the stub, right, there was appended to your installer data, right? So, there's a little stub that came—the executable, basically, that came before your data that was extracted. I spent, I want to say, years of my life [laugh] just shrinking it down by bytes—by literal bytes—just so it stays under 34, 35 kilobytes. It was kind of a—it was a challenge and something that people appreciated, but not necessarily the thing that people appreciate the most. I think the features—Corey: Well, no I have to do the same thing to make sure something fits into a Lambda deployment package. The scale changes, the problem changes, but somehow everything sort of rhymes with history.Amir: Oh, yeah. I hope you don't have to disassemble code to do that, though because that's uh… I mean, it was fun. It was just a lot.Corey: I have to ask, how much work went into building your cdk-github-runners as far as getting it to a point of just working out the door? Because I look at that and it feels like there's—like, the early versions, yeah, there wasn't a whole bunch of code tied to it, but geez, the iterative, “How exactly does this ridiculous step functions API work or whatnot,” feels like I'm looking at weeks of frustration. At least it would have been for me.Amir: Yeah, yeah. I mean, it wasn't, like, a day or two. It was definitely not—but it was not years, either. I've been working on it I think about a year now. Don't quote me on that. But I've put a lot of time into it. So, you know, like you said, the skeleton code is pretty simple: it's a step function, which as we said, takes a long time to get right. The functions, they are really nice, but their definition language is not very straightforward. But beyond that, right, once that part worked, it worked. Then came all the bug reports and all the little corner cases, right? We—Corey: Hell is other people's use cases. Always is. But that's honestly better than a lot of folks wind up experiencing where they'll put an open-source project up and no one ever knows. So, getting users is often one of the biggest barriers to a lot of this stuff. I've found countless hidden gems lurking around on GitHub with a very particular search for something that no one had ever looked at before, as best I can tell.Amir: Yeah.Corey: Open-source is a tricky thing. There needs to be marketing brought into it, there needs to be storytelling around it, and has to actually—dare I say—solve a problem someone has.Amir: I mean, I have many open-source projects like that, that I find super useful, I created for myself, but no one knows. I think cdk-github-runners, I'm pretty sure people know about it only because you talked about it on Screaming in the Cloud or your newsletter. And by the way, thank you for telling me that you talked about it last week in the conference because now we know why there was a spike [laugh] all of a sudden. People Googled it.Corey: Yeah. I put links to it as well, but it's the, yeah, I use this a lot and it's great. I gave a crappy explanation on how it works, but that's the trick I've found between conference talks and, dare I say, podcast episodes, you gives people a glimpse and a hook and tell them where to go to learn more. Otherwise, you're trying to explain every nuance and every intricacy in 45 minutes. And you can't do that effectively in almost every case. All you're going to do is drive people away. Make it sound exciting, get them to see the value in it, and then let them go.Amir: You have to explain the market for it, right? That's it.Corey: Precisely.Amir: And I got to say, I somewhat disagree with your—or I have a different view when you say that, you know, open-source projects needs marketing and all those things. It depends on what open-source is for you, right? I don't create open-source projects so they are successful, right? It's obviously always nicer when they're successful, but—and I do get that cycle of happiness that, like I was saying, people create bugs and I have to fix them and stuff, right? But not every open-source project needs to be a success. Sometimes it's just fun.Corey: No. When I talk about marketing, I'm talking about exactly what we're doing here. I'm not talking take out an AdWords campaign or something horrifying like that. It's you build something that solved the problem for someone. The big problem that worries me about these things is how do you not lose sleep at night about the fact that solve someone's problem and they don't know that it exists?Because that drives me nuts. I've lost count of the number of times I've been beating my head against a wall and asked someone like, “How would you handle this?” Like, “Oh, well, what's wrong with this project?” “What do you mean?” “Well, this project seems to do exactly what you want it to do.” And no one has it all stuffed in their head. But yeah, then it seems like open-source becomes a little more corporatized and it becomes a lead gen tool for people to wind up selling their SaaS services or managed offerings or the rest.Amir: Yeah.Corey: And that feels like the increasing corporatization of open-source that I'm not a huge fan of.Amir: Yeah. I mean, I'm not going to lie, right? Like, part of why I created this—or I don't know if it was part of it, but like, I had a dream that, you know, I'm going to get, oh, tons of GitHub sponsors, and everybody's going to use it and I can retire on an island and just make money out of this, right? Like, that's always a dream, right? But it's a dream, you know?And I think bottom line open-source is… just a tool, and some people use it for, like you were saying, driving sales into their SaaS, some people, like, may use it just for fun, and some people use it for other things. Or some people use it for politics, even, right? There's a lot of politics around open-source.I got to tell you a story. Back in the NSIS days, right—talking about politics—so this is not even about politics of open-source. People made NSIS a battleground for their politics. We would have translations, right? People could upload their translations. And I, you know, or other people that worked on NSIS, right, we don't speak every language of the world, so there's only so much we can do about figuring out if it's a real translation, if it's good or not.Back in the day, Google Translate didn't exist. Like, these days, we check Google Translate, we kind of ask a few questions to make sure they make sense. But back in the day, we did the best that we could. At some point, we got a patch for Catalan language, I'm probably mispronouncing it—but the separatist people in Spain, I think, and I didn't know anything about that. I was a young kid and… I just didn't know.And I just included it, you know? Someone submitted a patch, they worked hard, they wanted to be part of the open-source project. Why not? Sure I included it. And then a few weeks later, someone from Spain wanted to change Catalan into Spanish to make sure that doesn't exist for whatever reason.And then they just started fighting with each other and started making demands of me. Like, you have to do this, you have to do that, you have to delete that, you have to change the name. And I was just so baffled by why would someone fight so much over a translation of an open-source project. Like, these days, I kind of get what they were getting at, right?Corey: But they were so bad at telling that story that it was just like, so basically, screw, “You for helping,” is how it comes across.Amir: Yeah, screw you for helping. You're a pawn now. Just—you're a pawn unwittingly. Just do what I say and help me in my political cause. I ended up just telling both of them if you guys can agree on anything, I'm just going to remove both translations. And that's what I ended up doing. I just removed both translations. And then a few months later—because we had a release every month basically, I just added both of them back and I've never heard from them again. So sort of problem solved. Peace the Middle East? I don't know.Corey: It's kind of wild just to see how often that sort of thing tends to happen. It's a, I don't necessarily understand why folks are so opposed to other people trying to help. I think they feel like there's this loss of control as things are slipping through their fingers, but it's a really unwelcoming approach. One of the things that got me deep into the open-source ecosystem surprisingly late in my development was when I started pitching in on the SaltStack project right after it was founded, where suddenly everything I threw their way was merged, and then Tom Hatch, the guy who founded the project, would immediately fix all the bugs and stuff I put in and then push something else immediately thereafter. But it was such a welcoming thing.Instead of nitpicking me to death in the pull request, it just got merged in and then silently fixed. And I thought that was a classy way to do it. Of course, it doesn't scale and of course, it causes other problems, but I envy the simplicity of those days and just the ethos behind that.Amir: That's something I've learned the last few years, I would say. Back in the NSIS day, I was not like that. I nitpicked. I nitpicked a lot. And I can guess why, but it just—you create a patch—in my mind, right, like you create a patch, you fix it, right?But these days I get, I've been on the other side as well, right? Like I created patches for open-source projects and I've seen them just wither away and die, and then five years later, someone's like, “Oh, can you fix this line to have one instead of two, and then I'll merge it.” I'm like, “I don't care anymore. It was five years ago. I don't work there anymore. I don't need it. If you want it, do it.”So, I get it these days. And these days, if someone creates a patch—just yesterday, someone created a patch to format cdk-github-runners in VS Code. And they did it just, like, a little bit wrong. So, I just fixed it for them and I approved it and pushed it. You know, it's much better. You don't need to bug people for most of it.Corey: You didn't yell at them for having the temerity to contribute?Amir: My voice is so raw because I've been yelling for five days at them, yeah.Corey: Exactly, exactly. I really want to thank you for taking the time to chat with me about how all this stuff came to be and your own path. If people want to learn more, where's the best place for them to find you?Amir: So, I really appreciate you having me and driving all this traffic to my projects. If people want to learn more, they can always go to cloudsnorkel.com; it has all the projects. github.com/cloudsnorkel has a few more. And then my private blog is kichik.com. So, K-I-C-H-I-K dot com. I don't post there as much as I should, but it has some interesting AWS projects from the past few years that I've done.Corey: And we will, of course, put links to all of that in the show notes. Thank you so much for taking the time. I really appreciate it.Amir: Thank you, Corey. It was really nice meeting you.Corey: Amir Szekely, owner of CloudSnorkel. I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice, along with an insulting comment. Heck, put it on all of the podcast platforms with a step function state machine that you somehow can't quite figure out how the API works.Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.
Not Escaping Containers but escaping Clusters - Managed Kubernetes distributions such as Amazon EKS, Google Kubernetes Engine (GKE) and Azure Kubernetes Service (AKS) attack vectors can allow you to reach the underlying AWS Account etc. In conversation with Christophe Tafani-Dereeper & Nick Frichette, from Datadog on how this is possible in Amazon EKS and achieving potentially the same in GKE & AKS too. Thank you to our episode sponsor Sagetap Guest Socials: Nick's and Christophe's Linkedin (Nick Frichette + Christophe Tafani-Dereeper) Podcast Twitter - @CloudSecPod If you want to watch videos of this LIVE STREAMED episode and past episodes - Check out our other Cloud Security Social Channels: - Cloud Security Newsletter - Cloud Security BootCamp Questions asked: (00:00) Introduction (04:11) A bit about Christophe (04:37) A bit about Nick (05:03) What is managed Kubernetes? (06:26) Security of managed Kubernetes (09:02) Comparison between different managed Kubernetes (10:41) Service accounts and managed Kubernetes (14:22) What is container escape? (18:20) IMDSv2 for EKS (19:51) IMDSv2 in EKS vs AKES and GKE (22:01) Benchmark compliance for Kubernetes architecture (24:49) Low hanging fruits for container escape (27:17) Shared responsibility for managed Kubernetes (29:34) Fargate for Managed Kubernetes (32:00) Different ways to run containers (33:37) Escaping Managed Kubernetes cluster (38:39) Find more about this attack path (42:38) Escalation priviledge in EKS cluster (44:19) Reducing the Kubernetes attack service (44:58) MKAT for Kubernetes Security (48:23) Preventing AWS AuthConfig (50:11) Propagation Security (54:55) The fun section (57:47) Resources for latest Kubernetes updates Resources spoken about during the episode Nick Frichette's Blog - Hacking the Cloud Christophe Tafani-Dereeper' Blog Corey Quinn's - 17 ways to run containers on AWS MKAT cloudseclist newsletter
In this episode we talk about Containers in the cloud with AWS Solutions Architects Jamal Arif and Nikita Patel. We start with the basics - What Are Containers? When does it make sense to use Containers? We then dive into Docker, Docker Swarm, Kubernetes and how container orchestration can be implemented through specific AWS services such as EKS and Fargate. If you're new to Containers, this episode will help you decide the right container strategy in the cloud.AWS Hosts: Nolan Chen & Malini Chatterjee
Até que ponto uma ferramenta que simplifica, e muito, a implementação do Kubernetes pode mesmo ajudar o DevOps? Serve apenas para aplicações simples? Vai ter que abraçar o lock-in? E a fatura, sai cara no final do mês? Essas e outras perguntas fazem parte da sabatina que o Kubicast fez com o ilustríssimo colega angolano, Daniel Batubenga.Além das respostas para essas dúvidas, o episódio traz dicas para quem quer começar a usar o Fargate. A principal delas é: não caia de paraquedas no Fargate!Recomendações do programa:Não deixe de estudar! Comece por CI/CD, Containers, DevOps e ouça o Kubicast!Visite Angola e não deixe de experimentar a fumbuaO Kubicast é uma produção da Getup, empresa especialista em Kubernetes e projetos open source para Kubernetes. Os episódios do podcast estão em getup.io/kubicast, nas principais plataformas de áudio digital e no YouTube.com/@getupcloud.
Join me in an engaging live stream as I sit down with Alexander Altenhofen, Director of Product and Technology at Deutsche Fussball Liga (DFL). We'll uncover the secrets behind Bundesliga's rise as the world's most digitally-driven football league.Dive deep into:• "Glass to Grass" Strategy: Explore the fascinating journey of content from stadium cameras to the screens of millions.• The AWS Partnership: Discover how more than 80 AWS services have been employed to revolutionize the Bundesliga experience for fans across the globe.• The Personalized Fan Experience: Delve into the league's use of AWS services like Lambda, Fargate, and Sagemaker to capture real-time "match facts" that offer deep insights into game analytics and player performances.• Cost Optimization: Hear firsthand from Altenhofen about DFL's mission to boost efficiency and reallocate budget for strategic innovation without compromising the user experience.• The Future: Get a sneak peek into DFL's future initiatives, from generative AI content creation to personalized user experiences tailored for each Bundesliga fan.Get ready for a behind-the-scenes look at the future of football (or soccer) and how technology is transforming the way we engage with sports.
Welcome to another episode of the DevOps Toolchain Podcast! In this episode, our host, Joe Colantonio, sits down with the incredible Hassy Veldstra to discuss the exciting world of performance testing using Artillery. Hassy dives deep into the power of the playwright API and how it can simplify the creation of scripts for load testing. They explore the benefits of running load tests on serverless technologies like AWS Lambda and Fargate. Join us as we uncover the importance of load testing for performance and reliability and discover how Artillery is revolutionizing the landscape by making load testing easy and accessible for developers and QA engineers alike. Get ready for an enlightening conversation on the intricacies of performance testing and the future of load-testing tools. Let's dive in!
Welcome episode 220 of The Cloud Pod podcast - where the forecast is always cloudy! This week your hosts, Justin, Jonathan, Ryan, and Matthew discuss all things cloud, including virtual machines, an AI partnership between Microsoft and Meta for Llama 2, Lambda functions, Fargate, and lots of security updates including the Outlook breach and WORM protections. This and much more in our newest episode. Titles we almost went with this week: Too Many Bees died for Honeycode Microsoft announces that AI will only cost you 3 arms and a leg. The Cloud Pod also detects Recursive Loops in cloud news The cloud pod disables health checks bc who needs them A big thanks to this week's sponsor: Foghorn Consulting, provides top-notch cloud and DevOps engineers to the world's most innovative companies. Initiatives stalled because you have trouble hiring? Foghorn can be burning down your DevOps and Cloud backlogs as soon as next week.
Michael Clark from Sysdig joins with Dave to discuss their research on SCARLETEEL 2.0: Fargate, Kubernetes, and Crypto. New research from Sysdig threat researchers found that the group continues to thrive with improved tactics. Most recently, they gained access to AWS Fargate, a more sophisticated environment to breach, thanks to their upgraded attack tools. The research states "In their most recent activities, we saw a similar strategy to what was reported in the previous blog: compromise AWS accounts through exploiting vulnerable compute services, gain persistence, and attempt to make money using cryptominers." Had Sysdig not thwarted SCARLETEEL's attack, they estimated that they would have mined $4,000 per day until they were stopped. The research can be found here: SCARLETEEL 2.0: Fargate,Kubernetes, and Crypto
Michael Clark from Sysdig joins with Dave to discuss their research on SCARLETEEL 2.0: Fargate, Kubernetes, and Crypto. New research from Sysdig threat researchers found that the group continues to thrive with improved tactics. Most recently, they gained access to AWS Fargate, a more sophisticated environment to breach, thanks to their upgraded attack tools. The research states "In their most recent activities, we saw a similar strategy to what was reported in the previous blog: compromise AWS accounts through exploiting vulnerable compute services, gain persistence, and attempt to make money using cryptominers." Had Sysdig not thwarted SCARLETEEL's attack, they estimated that they would have mined $4,000 per day until they were stopped. The research can be found here: SCARLETEEL 2.0: Fargate,Kubernetes, and Crypto Learn more about your ad choices. Visit megaphone.fm/adchoices
Andreas Wittig, Co-Author of Amazon Web Services in Action and Co-Founder of marbot, joins Corey on Screaming in the Cloud to discuss ways to keep a book up to date in an ever-changing world, the advantages of working with a publisher, and how he began the journey of writing a book in the first place. Andreas also recalls how much he learned working on the third edition of Amazon Web Services in Action and how teaching can be an excellent tool for learning. Since writing the first edition, Adreas's business has shifted from a consulting business to a B2B product business, so he and Corey also discuss how that change came about and the pros and cons of each business model. About AndreasAndreas is the Co-Author of Amazon Web Services in Action and Co-Founder of marbot - AWS Monitoring made simple! He is also known on the internet as cloudonaut through the popular blog, podcast, and youtube channel he created with his brother Michael. Links Referenced: Amazon Web Services in Action: https://www.amazon.com/Amazon-Services-Action-Andreas-Wittig/dp/1617295116 Rapid Docker on AWS: https://cloudonaut.io/rapid-docker-on-aws/ bucket/av: https://bucketav.com/ marbot: https://marbot.io/ cloudonaut.io: https://cloudonaut.io TranscriptAnnouncer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. It's been a few years since I caught up with Andreas Wittig, who is also known in the internet as cloudonaut, and much has happened since then. Andreas, how are you?Andreas: Hey, absolutely. Thank you very much. I'm happy to be here in the show. I'm doing fine.Corey: So, one thing that I have always held you in some high regard for is that you have done what I have never had the attention span to do: you wrote a book. And you published it a while back through Manning, it was called Amazon Web Services in Action. That is ‘in action' two words, not Amazon Web Services Inaction of doing absolutely nothing about it, which is what a lot of companies in the space seem to do instead.Andreas: [laugh]. Yeah, absolutely. So. And it was not only me. I've written the book together with my brother because back in 2015, Manning, for some reason, wrote in and asked us if we would be interested in writing the book.And we had just founded our own consulting company back then and we had—we didn't have too many clients at the very beginning, so we had a little extra time free. And then we decided, okay, let's do the book. And let's write a book about Amazon Web Services, basically, a deep introduction into all things AWS. So, this was 2015, and it was indeed a lot of work, much more [laugh] than we expected. So, first of all, the hard part is, what do you want to have in the book? So, what's the TOC? What is important and must be in?And then you start writing and have examples and everything. So, it was really an interesting journey. And doing it together with a publisher like Manning was also really interesting because we learned a lot about writing. You have kind of a coach, an editor that helps you through that process. So, this was really a hard and fun experience.Corey: There's a lot of people that have said very good things about writing the book through a traditional publisher. And they also say that one of the challenges is it's a blessing and a curse, where you basically have someone standing over your shoulder saying, “Is it done yet? Is it done yet? Is it done yet?” The consensus that seems to have emerged from people who have written books is, “That was great, please don't ever ask me to do it again.”And my operating theory is that no one wants to write a book. They want to have written a book. Which feels like two very different things most of the time. But the reason you're back on now is that you have gone the way of the terrible college professor, where you're going to update the book, and therefore you get to do a whole new run of textbooks and make everyone buy it and kill the used market, et cetera. And you've done that twice now because you have just recently released the third edition. So, I have to ask, how different is version one from version two and from version three? Although my apologies; we call them ‘editions' in the publishing world.Andreas: [laugh]. Yeah, yeah. So, of course, as you can imagine, things change a lot in AWS world. So, of course, you have to constantly update things. So, I remember from first to second edition, we switched from CloudFormation in JSON to YAML. And now to the third edition, we added two new chapters. This was also important to us, so to keep also the scope of the book in shape.So, we have in the third edition, two new chapters. One is about automating deployments, recovering code deploy, [unintelligible 00:03:59], CloudFormation rolling updates in there. And then there was one important topic missing at all in the book, which was containers. And we finally decided to add that in, and we have now container chapter, starting with App Runner, which I find quite an interesting service to observe right now, and then our bread and butter service: ECS and Fargate. So, that's basically the two new chapters. And of course, then reworking all the other chapters is also a lot of work. And so, many things change over time. Cannot imagine [laugh].Corey: When was the first edition released? Because I believe the second one was released in 2018, which means you've been at this for a while.Andreas: Yeah. So, the first was 2015, the second 2018, three years later, and then we had five years, so now this third edition was released at the beginning of this year, 2023.Corey: Eh, I think you're right on schedule. Just March of 2020 lasted three years. That's fine.Andreas: Yeah [laugh].Corey: So, I have to ask, one thing that I've always appreciated about AWS is, it feels like with remarkably few exceptions, I can take a blog post written on how to do something with AWS from 2008 and now in 2023, I can go through every step along with that blog post. And yeah, I might have trouble getting some of the versions and services and APIs up and running, but the same steps will absolutely work. There are very few times where a previously working API gets deprecated and stops working. Is this the best way to proceed? Absolutely not.But you can still spin up the m1.medium instance sizes, or whatever it was, or [unintelligible 00:05:39] on small or whatever the original only size that you could get was. It's just there are orders of magnitude and efficiency gains you can do by—you can go through by using more modern approaches. So, I have to ask, was there anything in the book as you revised it—two times now—that needed to come out because it was now no longer working?Andreas: So, related to the APIs that's—they are really very stable, you're right about that. So, the problem is, our first few chapters where we have screenshots of how you go through the management—Corey: Oh no.Andreas: —console [laugh]. And you can probably, you can redo them every three months, probably, because the button moves or a step is included or something like that. So, the later chapters in the book, where we focus a lot on the CLI or CloudFormation and stuff like—or SDKs, they are pretty stable. But the first few [ones 00:06:29] are a nightmare to update all those screenshots. And then sometimes, so I was going through the book, and then I noticed, oh, there's a part of this chapter that I can completely remove nowadays.So, I can give you an example. So, I was going through the chapter about Simple Storage Service S3, and I—there was a whole section in the chapter about read-after-write consistency. Because back then, it was important that you knew that after updating an object or reading an object before it was created the first time, you could get outdated versions for a little while, so this was eventually consistent. But nowadays, AWS has changed that and basically now, S3 has this strong read-after-write consistency. So, I basically could remove that whole part in the chapter which was quite complicated to explain to the reader, right, so I [laugh] put a lot of effort into that.Corey: You think that was confusing? I look at the sea of systems I had to oversee at one company, specifically to get around that problem. It's like, well, we can now take this entire application and yeet it into the ocean because it was effectively a borderline service to that just want to ens—making consistency guarantees. It's not a common use case, but it is one that occurs often enough to be a problem. And of course, when you need it, you really need it. That was a nice under-the-hood change that was just one day, surprise, it works that way. But I'm sure it was years of people are working behind the scenes, solving for impossible problems to get there, and cetera, et cetera.Andreas: Yeah, yeah. But that's really cool is to remove parts of the book that are now less complicated. This is really cool. So, a few other examples. So, things change a lot. So, for example, EFS, so we have EFS, Elastic File System, in the book as well. So, now we have new throughput modes, different limits. So, there's really a lot going on and you have to carefully go through all the—Corey: Oh, when EFS launched, it was terrible. Now, it's great just because it's gotten so much more effective and efficient as a service. It's… AWS releases things before they're kind of ready, it feels like sometimes, and then they improve with time. I know there have been feature deprecations. For example, for some reason, they are no longer allowing us to share out a bucket via BitTorrent, which, you know, in 2006 when it came out, seemed like a decent idea to save on bandwidth. But here in 2023, no one cares about it.But I'm also keeping a running list of full-on AWS services that have been deprecated or have the deprecations announced. Are any of those in the book in any of its editions? And if and when there's a fourth edition, will some of those services have to come out?Andreas: [laugh]. Let's see. So, right after the book was published—because the problem with books is they get printed, right; that's the issue—but the target of the book, AWS, changes. So, a few weeks after the printed book was out, we found out that we have an issue in our one of our examples because now S3 buckets, when you create them, they have locked public access enabled by default. And this was not the case before. And one of our example relies on that it can create object access control lists, and this is not working now anymore. [laugh].So yeah, there are things changing. And we have, the cool thing about Manning is they have that what they call a live book, so you can read it online and you can have notes from other readers and us as the authors along the text, and there we can basically point you in the right direction and explain what happened here. So, this is how we try to keep the book updated. Of course, the printed one stays the same, but the ebook can change over time a little bit.Corey: Yes, ebooks are… at least keeping them updated is a lot easier, I would imagine. It feels like that—speaking of continuous builds and automatic CI/CD approaches—yeah, well, we could build a book just by updating some text in a Git repo or its equivalent, and pressing go, but it turns out that doing a whole new print run takes a little bit more work.Andreas: Yeah. Because you mentioned the experience of writing a book with a publisher and doing it on your own with self-publishing, so we did both in the past. We have Amazon Web Services in Action with Manning and we did another book, Rapid Docker on AWS in self-publishing. And what we found out is, there's really a lot of effort that goes into typesetting and layouting a book, making sure it looks consistent.And of course, you can just transform some markdown into a epub and PDF versions, but if a publisher is doing that, the results are definitely different. So, that was, besides the other help that we got from the publisher, very helpful. So, we enjoyed that as well.Corey: What is the current state of the art—since I don't know the answer to this one—around updating ebook versions? If I wind up buying an ebook on Kindle, for example, will they automatically push errata down automatically through their system, or do they reserve that for just, you know, unpublishing books that they realized shouldn't be on the Marketplace after people have purchased them?Andreas: [laugh]. So—Corey: To be fair, that only happened once, but I'm still giving them grief for it a decade and change later. But it was 1984. Of all the books to do that, too. I digress.Andreas: So, I'm not a hundred percent sure how it works with the Kindle. I know that Manning pushes out new versions by just emailing all the customers who bought the book and sending them a new version. Yeah.Corey: Yeah. It does feel, on some level, like there needs to be at least a certain degree of substantive change before they're going to start doing that. It's like well, good news. There was a typo on page 47 that we're going to go ahead and fix now. Two letters were transposed in a word. Now, that might theoretically be incredibly important if it's part of a code example, which yes, send that out, but generally, A, their editing is on point, so I didn't imagine that would sneak through, and 2, no one cares about a typo release and wants to get spammed over it?Andreas: Definitely, yeah. Every time there's a reprint of the book, you have the chance to make small modifications, to add something or remove something. That's also a way to keep it in shape a little bit.Corey: I have to ask, since most people talk about AWS services to a certain point of view, what is your take on databases? Are you sticking to the actual database services or are you engaged in my personal hobby of misusing everything as a database by holding it wrong?Andreas: [laugh]. So, my favorite database for starting out is DynamoDB. So, I really like working with DynamoDB and I like the limitations and the thing that you have to put some thoughts into how to structure your data set in before. But we also use a lot of Aurora, which really find an interesting technology. Unfortunately, Aurora Serverless, it's not becoming a product that I want to use. So, version one is now outdated, version two is much too expensive and restricted. So—Corey: I don't even know that it's outdated because I'm seeing version one still get feature updates to it. It feels like a divergent service. That is not what I would expect a version one versus version two to be. I'm with you on Dynamo, by the way. I started off using that and it is cheap is free for most workloads I throw at it. It's just a great service start to finish. The only downside is that if I need to move it somewhere else, then I have a problem.Andreas: That's true. Yeah, absolutely.Corey: I am curious, as far as you look across the sea of change—because you've been doing this for a while and when you write a book, there's nothing that I can imagine that would be better at teaching you the intricacies of something like AWS than writing a book on it. I got a small taste of this years ago when I shot my mouth off and committed to give a talk about Git. Well, time to learn Git. And teaching it to other people really solidifies a lot of the concepts yourself. Do you think that going through the process of writing this book has shaped how you perform as an engineer?Andreas: Absolutely. So, it's really interesting. So,I added the third edition and I worked on it mostly last year. And I didn't expect to learn a lot during that process actually, because I just—okay, I have to update all the examples, make sure everything work, go through the text, make sure everything is up to date. But I learned things, not only new things, but I relearned a lot of things that I wasn't aware of anymore. Or maybe I've never been; I don't know exactly [laugh].But it's always, if you go into the details and try to explain something to others, you learn a lot about that. So, teaching is a very good way to, first of all gather structure and a deep understanding of a topic and also dive into the details. Because when you write a book, every time you write a sentence, ask the question, is that really correct? Do I really know that or do I just assume that? So, I check the documentation, try to find out, is that really the case or is that something that came up myself?So, you'll learn a lot by doing that. And always come to the limits of the AWS documentation because sometimes stuff is just not documented and you need to figure out, what is really happening here? What's the real deal? And then this is basically the research part. So, I always find that interesting. And I learned a lot in during the third edition, while was only adding two new chapters and rewriting a lot of them. So, I didn't expect that.Corey: Do you find that there has been an interesting downstream effect from having written the book, that for better or worse, I've always no—I always notice myself responding to people who have written a book with more deference, more acknowledgment for the time and effort that it takes. And some books, let's be clear, are terrible, but I still find myself having that instinctive reaction because they saw something through to be published. Have you noticed it changing other aspects of your career over the past, oh, dear Lord, it would have been almost ten years now.Andreas: So, I think it helped us a lot with our consulting business, definitely. Because at the very beginning, so back in 2015, at least here in Europe and Germany, AWS was really new in the game. And being the one that has written a book about AWS was really helping… stuff. So, it really helped us a lot for our consulting work. I think now we are into that game of having to update the book [laugh] every few years, to make sure it stays up to date, but I think it really helped us for starting our consulting business.Corey: And you've had a consulting business for a while. And now you have effectively progressed to the next stage of consulting business lifecycle development, which is, it feels like you're becoming much more of a product company than you were in years past. Is that an accurate perception from the outside or am I misunderstanding something fundamental?Andreas: You know, absolutely, that's the case. So, from the very beginning, basically, when we founded our company, so eight years ago now, so we always had to go to do consulting work, but also do product work. And we had a rule of thumb that 20% of our time goes into product development. And we tried a lot of different things. So, we had just a few examples that failed completely.So, we had a Time [Series 00:17:49] as a Service offering at the very beginning of our journey, which failed completely. And now we have Amazon Timestream, which makes that totally—so now the market is maybe there for that. We tried a lot of things, tried content products, but also as we are coming from the software development world, we always try to build products. And over the years, we took what we learned from consulting, so we learned a lot about, of course, AWS, but also about the market, about the ecosystem. And we always try to bring that into the market and build products out of that.So nowadays, we really transitioned completely from consulting to a product company, as you said. So, we do not do any consulting anymore with one few exception with one of our [laugh] best or most important clients. But we are now a product company. And we only a two-person company. So, the idea was always how to scale a company without growing the team or hiring a lot of people, and a consulting business is definitely not a good way to do that, so yeah, this was why always invested into products.And now we have two products in the AWS Marketplace which works very well for us because it allows us to sell worldwide and really easily get a relationship up and running with our customers, and that pay through their AWS bill. So, that's really helping us a lot. Yeah.Corey: A few questions on that. At first it always seems to me that writing software or building a product is a lot like real estate in that you're doing a real estate development—to my understanding since I live in San Francisco and this is a [two exit 00:19:28] town; I still rent here—I found though, that you have to spend a lot of money and effort upfront and you don't get to start seeing revenue on that for years, which is why the VC model is so popular where you'll take $20 million, but then in return they want to see massive, outsized returns on that, which—it feels—push an awful lot of perfectly sustainable products into things that are just monstrous.Andreas: Hmm, yeah. Definitely.Corey: And to my understanding, you bootstrapped. You didn't take a bunch of outside money to do this, right?Andreas: No, no, we have completely bootstrapping and basically paying the bills with our consulting work. So yeah, I can give you one example. So, bucketAV is our solution to scan S3 buckets for malware, and basically, this started as an open-source project. So, this was just a side project we are working on. And we saw that there is some demand for that.So, people need ways to scan their objects—for example, user uploads—for malware, and we just tried to publish that in the AWS Marketplace to sell it through the Marketplace. And we don't really expect that this is a huge deal, and so we just did, I don't know, Michael spent a few days to make sure it's possible to publish that and get in shape. And over time, this really grew into an important, really substantial part of our business. And this doesn't happen overnight. So, this adds up, month by month. And you get feedback from customers, you improve the product based on that. And now this is one of the two main products that we sell in the Marketplace.Corey: I wanted to ask you about the Marketplace as well. Are you finding that that has been useful for you—obviously, as a procurement vehicle, it means no matter what country a customer is in, they can purchase it, it shows up on the AWS bill, and life goes on—but are you finding that it has been an effective way to find new customers?Andreas: Yes. So, I definitely would think so. It's always funny. So, we have completely inbound sales funnel. So, all customers find us through was searching the Marketplace or Google, probably. And so, what I didn't expect that it's possible to sell a B2B product that way. So, we don't know most of our customers. So, we know their name, we know the company name, but we don't know anyone there. We don't know the person who buys the product.This is, on the one side, a very interesting thing as a two-person company. You cannot build a huge sales process and I cannot invest too much time into the sales process or procurement process, so this really helps us a lot. The downside of it is a little bit that we don't have a close relationship with our customers and sometimes it's a little tricky for us to find important person to talk to, to get feedback and stuff. But on the other hand, yeah, it really helps us to sell to businesses all over the world. And we sell to very small business of course, but also to large enterprise customers. And they are fine with that process as well. And I think, even the large enterprises, they enjoy that it's so easy [laugh] to get a solution up and running and don't have to talk to any salespersons. So, enjoy it and I think our customers do as well.Corey: This is honestly the first time I've ever heard a verifiable account a vendor saying, “Yeah, we put this thing on the Marketplace, and people we've never talked to find us on the Marketplace and go ahead and buy.” That is not the common experience, let's put it that way. Now true, an awful lot of folks are selling enterprise software on this and someone—I forget who—many years ago had a great blog post on why no enterprise software costs $5,000. It either is going to cost $500 or it's going to cost 100 grand and up because the difference is, is at some point, you'd have a full-court press enterprise sales motion to go and sell the thing. And below a certain point, great, people are just going to be able to put it on their credit card and that's fine. But that's why you have this giant valley of there is very little stuff priced in that sweet spot.Andreas: Yeah. So, I think maybe it's important to mention that our products are relatively simple. So, they are just for a very small niche, a solution for a small problem. So, I think that helps a lot. So, we're not selling a full-blown cloud security solution; we only focus on that very small part: scanning S3 objects for malware.For example, on marbot,f the other product that we sell, which is monitoring of AWS accounts. Again, we focus on a very simple way to monitor AWS workloads. And so, I think that is probably why this is a successful way for us to find new customers because it's not a very complicated product where you have to explain a lot. So, that's probably the differentiator here.Corey: Having spent a fair bit of time doing battle with compliance goblins—which is, to be clear, I'm not describing people; I'm describing processes—in many cases, we had to do bucket scanning for antivirus, just to check a compliance box. From our position, there was remarkably little risk of a user-generated picture of a receipt that is input sanitized to make sure it is in fact a picture, landing in an S3 bucket and then somehow infecting one of the Linux servers through which it passed. So, we needed something that just checked the compliance box or we would not be getting the gold seal on our website today. And it was, more or less, a box-check as opposed to something that solved a legitimate problem. This was also a decade and change ago. Has that changed to a point now where there are legitimate threats and concerns around this, or is it still primarily just around make the auditor stop yelling at me, please?Andreas: Mmm. I think it's definitely to tick the checkbox, to be compliant with this, some regulation. On the other side, I think there are definitely use cases where it makes a lot of sense, especially when it comes to user-generated content of all kinds, especially if you're not only consuming it internally, but maybe also others can immediately start downloading that. So, that is where we see many of our customers are coming with that scenario that they want to make sure that the files that people upload and others can download are not infected. So, that is probably the most important use case.Corey: There's also, on some level, an increasing threat of ransomware. And for a long time, I was very down on the ideas of all these products that hit the market to defend S3 buckets against ransomware. Until one day, there was an AWS security blog post talking about how they found it. And yeah, we've we have seen this in the wild; it is causing problems for companies; here's what to do about it. Because it's one of those areas where I can't trust a vendor who's trying to sell me something to tell me that this problem exists.I mean, not to cast aspersions, but they're very interested, they're very incentivized to tell that story, whereas AWS is not necessarily incentivized to tell a story like that. So, that really brought it home for me that no, this is a real thing. So, I just want to be clear that my opinion on these things does in fact, evolve. It's not, “Well, I thought it was dumb back in 2012, so clearly it's still dumb now.” That is not my position, I want to be very clear on that.I do want to revisit for a moment, the idea of going from a consultancy that is a services business over to a product business because we've toyed with aspects of that here at The Duckbill Group a fair bit. We've not really found long-term retainer services engagements that add value that we are comfortable selling. And that means as a result that when you sell fixed duration engagements, it's always a sell, sell, sell, where's the next project coming from? Whereas with product businesses, it's oh, the grass is always greener on the other side. It's recurring revenue. Someone clicks, the revenue sticks around and never really goes away. That's the dream from where I sit on the services side of the fence, wistfully looking across and wondering what if. Now that you've made that transition, what sucks about product businesses that you might not have seen going into it?Andreas: [laugh]. Yeah, that a good question. So, on the one side, it was really also our dream to have a product business because it really changes the way we work. We can block large parts of our calendar to do deep-focus work, focus on things, find new solutions, and really try to make a solution that really fits to problem and uses all the AWS capabilities to do so. And on the other side, a product business involves, of course, selling the product, which is hard.And we are two software engineers, [laugh] and really making sure that we optimize our sales and there's search engine optimization, all that stuff, this is really hard for us because we don't know anything about that and we always have to find an expert, or we need to build a knowledge ourself, try things out, and so on. So, that whole part of selling the product, this is really a challenge for us. And then of course, product business evolves a lot of support work. So, we get support emails multiple times per hour, and we have to answer them and be as fast as possible with that. So, that is, of course, something that you do not have to do with consulting work.And not always that, the questions are many times really simple questions that pointed people in the right direction, find part of the documentation that answers the question. So, that is a constant stream of questions coming in that you have to answer. So, the inbox is always full [laugh]. So, that is maybe a small downside of a product business. But other than that, yeah, compared to a consulting business, it really gives us many flexibilities with planning our work day around the rest of our lives. That's really what we enjoy about a product company.Corey: I was very careful to pick an expensive problem that was only a business-hours problem. So, I don't wind up with a surprise, middle-of-the-night panic phone call. It's yeah, it turns out that AWS billing operate during business hours in the US Pacific Time. The end. And there are no emergencies here; there are simply curiosities that will, in the fullness of time take weeks to get resolved.Andreas: Mmm. Yeah.Corey: I spent too many years on call, in that sense. Everyone who's built a product company the first time always says the second time, the engineering? Meh, there are ways to solve that. Solving the distribution problem. That's the thing I want to focus on next.And I feel like I sort of went into this backwards in that I don't really have a product to sell people but I somehow built an audience. And to be honest, it's partly why. It's because I didn't know what I was going to be doing after 18 months and I knew that whatever it was going to be, I needed an audience to tell about it, so may as well start the work of building the audience now. So, I have to imagine if nothing else, your book has been a tremendous source of building a community. When I mentioned the word cloudonaut to people who have been learning AWS, more often than not, they know who you are.Andreas: Yeah.Corey: Although I admit they sometimes get you confused with your brother.Andreas: [laugh]. Yes, that's not too hard. Yeah, yeah, cloudonaut is definitely—this was always our, also a side project of we was just writing about things that we learned about AWS. Whenever we, I don't know, for example, looked into a new series, we wrote a blog post about that. Later, we did start a podcast and YouTube videos during the pandemic, of course, as everyone did. And so, I think this was always fun stuff to do. And we like sharing what we learn and getting into discussion with the community, so this is what we still do and enjoy as well, of course. Yeah.Corey: I really want to thank you for taking the time to catch up and see what you've been up to these last few years with a labor of love and the pivot to a product company. If people want to learn more, where's the best place for them to find you?Andreas: So definitely, the best place to find me is cloudonaut.io. So, this basically points you to all [laugh] what I do. Yeah, that's basically the one domain and URL that you need to know.Corey: Excellent. And we will put that in the show notes, of course. Thank you so much for taking the time to speak with me today. I really appreciate it.Andreas: Yeah, it was a pleasure to be back here. I'm big fan of podcasts and also of Screaming in the Cloud, of course, so it was a pleasure to be here again.Corey: [laugh]. You are always welcome. Andreas Wittig, co-author of Amazon Web Services in Action, now up to its third edition. And of course, the voice behind cloudonaut. I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice along with an angry, insulting comment that I will at one point be able to get to just as soon as I find something to deal with your sarcasm on the AWS Marketplace.Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.
Aaron Torres and Ala Shiban are from Klotho, which powers Infrastructure Copilot, the most advanced infrastructure design tool that understands how to define, connect, and scale your infrastructure-as-code. Victoria talks to Aaron and Ala about the Klotho engine, Klotho the CLI tool, and InfraCopilot and how they work together to help enable developer teams to iterate on applications and features quickly. Klotho (https://klo.dev/) Infrastructure Copilot (https://infracopilot.io/) Follow Klotho on Github (https://github.com/klothoplatform/klotho), Discord (https://discord.com/invite/4wwBRqqysY), Twitter (https://twitter.com/GetKlotho), or LinkedIn (https://www.linkedin.com/company/klothoplatform/). Follow Aaron Torres on LinkedIn (https://www.linkedin.com/in/torresaaron/), or Twitter (https://twitter.com/aarontorres). Follow Ala Shiban on LinkedIn (https://www.linkedin.com/in/alashiban/) or Twitter (https://twitter.com/AlaShiban). Follow thoughtbot on Twitter (https://twitter.com/thoughtbot) or LinkedIn (https://www.linkedin.com/company/150727/). Become a Sponsor (https://thoughtbot.com/sponsorship) of Giant Robots! Transcript: VICTORIA: This is the Giant Robots Smashing Into Other Giant Robots Podcast, where we explore the design, development, and business of great products. I'm your host, Victoria Guido. And with me today is Aaron Torres and Ala Shiban from Klotho, which powers Infrastructure Copilot, the most advanced infrastructure design tool that understands how to define, connect, and scale your infrastructure-as-code. Aaron and Ala, thank you for joining me. ALA: Thank you for having us. AARON: Yeah, thank you very much. VICTORIA: Well, great. I wanted to just start with a little bit of a icebreaker; maybe tell me a little bit more about what the weather is like where you're currently at. AARON: So I'm in St. Louis, Missouri. Right now, it is definitely...it feels like summer finally. So we're getting some nice, warm days and clear skies. ALA: And I'm in LA. And it's gloomier than I would like compared to what it's been in the last few years. But I'll take it if this means we're getting closer to summer. VICTORIA: Right. And I'm not too far from you, Ala, in San Diego, and it's a little chillier than I would prefer as well. But that's what we get for living close to the beach. So there's always trade-offs. Well, wonderful. I'm so excited to talk to you about your product here today. Let me start with a question about, let's say, I'm a non-technical founder, and I've just heard about your product. What's your pitch to someone in that position on the value of your tool? ALA: For somebody who isn't technical, I would say you can enable your team, your developer team, to quickly iterate on their applications or features and let InfraCopilot and Klotho take care of taking that application or features and deploy them and getting them running on the cloud. VICTORIA: Okay. So maybe I've been thinking about having to hire an AWS engineer or someone who's an infrastructure engineer. I could consider getting a tool like Klotho and Infrastructure Copilot to allow my developers to take on more of that responsibility themselves. ALA: Absolutely. VICTORIA: Gotcha. Okay, well, great. So let me ask about how did it all get started? What was the impetus that set you on this journey ALA: Both Aaron and I used to work at Riot Games, and I used to lead the cloud services org at Riot. I had about 50 people, 40 engineers, as part of a larger 120-person org, infrastructure platform org, which was tasked with building the platform that runs League of Legends, VALORANT, for 200 million people all around the world, in China. Full DevOps mode for Riot developers and full ops mode for running in China. It took us three years, a lot of effort. And by the time we were done, it was already legacy, and that seemed broken to me. We were already getting started to do another round of upgrades and iterations. At that point, I decided to leave. But I couldn't let go of this feeling that we shouldn't have had to spend so many years solving a problem only for it not to be solved. And based on research and conversations, it was clear that this was an industry-wide phenomena. And so I went about trying to figure out why that happens and then how we can solve it, and that's how Klotho came about. VICTORIA: That's so interesting. And I've certainly been a part of similar situations where you spend so much time solving a big problem and infrastructure only to get to the end of it and realize now you have a whole nother set of problems. [laughs] And you get upgrade. And they've also invented new ways of doing things in the cloud that you want to be able to take advantage of. So you had that time with Riot Games and League of Legends and building this globally responsive infrastructure. What lessons learned did you take from that into building Klotho and building your product, Infrastructure Copilot? AARON: We learned a bunch of things. One of the more difficult problems to solve isn't technical at all; it's organizational and understanding how the organization flows and how the different teams interact with each other. So we really endeavor to solve that problem. I mean, our product is a technical product, but it is meant to help bridge that gulf and make that problem a little bit easier as well. Otherwise, yeah, exactly to your point, part of the problem with these migrations is that new technology comes along. And there's definitely a feeling of when you hire new developers, they are excited about the new thing, and there's other reasons as well. But you get this kind of natural, eternal migration going to the newer technology. VICTORIA: That makes sense. And you bring up a great point on some of the issues, not being technical but organizational. And when I look at a lot of infrastructure-as-code tools, when we get to security, I wonder how it fits in with the organizational requirements for security, right? Like, you have to have defined groups who have defined access to different levels and have the tools in place to be able to manage identities in your organization. So I'm curious how that fits into what you built with Klotho and the Infrastructure Copilot. ALA: The way we think about infrastructure is as a set of intents or things that developers, and operators, cloud engineers, infrastructure engineers are trying to satisfy or to do. So you have tasks. You're trying to build a solution. You're trying to build an architecture or add something to it. And organizations have constraints, whether it's their own Terraform, or their own ruleset, or security expectations, or compliance expectations. And the way we look at this dynamic is those rules are encoded in a way that Klotho, which is a cloud compiler, it has the ability to reason about both the application and the infrastructure-as-code and enforce or at least warn about mismatches between the constraints that the organization sets, and what the developer or operator are trying to do, or the intent that is being described high level or low level within the tools. And then that is reflected both visually and in code and in the infrastructure-as-code, one or more. And so it's very much rooted in how the entire set of technologies and product and tools are designed. VICTORIA: Got it. So do you see the tool will be more fit for the market of larger development shops who maybe have existing infrastructure but want to experiment with a different way of managing it for their developers? ALA: It depends. So because we went about solving the problem rather than just building a specific vertical or a specific stack piece, we try to only play in this space of intelligent editing and intelligent understanding of the alignment between infrastructure and code. And so you could, as a developer, effectively with Klotho, write a plain application and have it be running in the cloud without knowing anything in the underlying cloud systems. It will set up storage, and persistence, and security, and secrets. All those elements are easily accessible within the code itself. It can also work in the context of a company where the infrastructure or platform team have set those rules and guidance within the tools. And then, developers can continue working the way they expect to work, either in code or in the infrastructure-as-code layer. And it would still allow them to do the same intents that they want only within that sandbox. Or if they can't be satisfied because they're trying to do something that isn't allowed, they have a mechanism of, one, knowing that but also asking, in our case, InfraCopilot to help it reshape what it's doing, what they're doing into the sandbox and the trade-offs that that brings in. VICTORIA: Got it. So you can both start from scratch and start a brand new application using it, or you can integrate it with your existing rules and systems and everything that already exists. ALA: Exactly. VICTORIA: Gotcha. Yeah, I think one interesting thing we've found with very new founders who are building their application for the first time is that there are some essential things, like, they don't even have an identity store like a Google [laughs] or Microsoft Azure Directory. So starting to work in the cloud, there are some basic elements you have to set up first that's a little bit of a barrier. So it sounds like what you're saying with Klotho is that you wouldn't necessarily have those same issues. Or how would you get that initial, like, cloud accounts set up? AARON: Yeah. So, for the situation where you're bootstrapping everything from scratch, you've done nothing; we haven't invested much in setting up the initial accounts. But assuming you get to the point where you have AWS credentials, and you're able to hit the AWS API using the CLI, that's sort of where we can take over. So, yeah, like, I would say right now, as a business, it's definitely where the value is coming is going to be these mid-sized companies. But for that scenario specifically, bootstrapping and starting something from scratch, if you have that initial setup in place, it's one of the fastest ways to go from a concept to something running in the cloud. ALA: And if you think about the two tools that we're building, there's Klotho, which InfraCopilot...or the Klotho engine, which Klotho the CLI tool uses and InfraCopilot uses. The Klotho engine is responsible for the intelligence. It knows how to translate things like I want a web API that talks to DynamoDB. And it will literally create everything or modify everything that is needed to give you that and plug in your code. You can also say things in a much higher level degree, which things like I want a lambda which handles 10,000 users. And I want it to be lowest latency talking to an RDS Instance or to a Postgres database. And what that would do is, in our side, in the Klotho engine, we understand that there needs to be a VPC and subnets, and spin up RDS, and connect an RDS proxy. Because for connection pooling with lambda specifically, you need one to scale to that degree of scale. And so that is the intelligence that is built into the Klotho engine if you want to start from the infrastructure. If you want to start from code, all you have to do is bring in the Redis instance, the Redis SDK, and, let's say, your favorite web framework, and just add the annotations or the metadata that says, I want this web framework to be exposed to the internet, and I want this Redis to be persisted in the cloud. And you run Klotho. And what comes out the other end is the cloud version that does that for you. And it's one command away from getting it to run. VICTORIA: So that's interesting how the two tools work together and how a developer might be able to get things spun up quickly on the cloud without having to know the details of each particular AWS service. And reading through your docs, it sounds like once you have something working in the cloud, then you'll also get automated recommendations on how to improve it for cost and reliability. Is that right? ALA: That's where we're headed. VICTORIA: Gotcha. I'm curious; for Aaron, it sounds like there is more in that organizational challenges that you alluded to earlier. So you want to be able to deliver this capability to developers. But what barriers have you found organizationally to getting this done? AARON: So I'm going to speak specifically on infrastructure here because I think this is one of the biggest ones we've seen. But typically, when you get to a larger-sized company, we'll call it a mid-sized company with, you know, a couple hundred engineers or more, you get to the point where it doesn't make sense for every team to own their entire vertical. And so you want to really put the cloud knowledge into a central team. And so you tend to build either a platform team, or an infrastructure team, or a cloud team who sort of owns how cloud resources are provisioned, which ones they support, et cetera. And so, really, some of the friction I'm talking about is the friction between that team and developer teams who really just want to write their application and get going quickly. But you don't have to fall within the boundaries set by that central team. To give, like, a real concrete example of that, if you wanted to prototype a new technology, like, let's say that some new database technology came out and you wanted to use it, it's a very coordinated effort between both teams in terms of the roadmap. Like, the infrastructure team needs to get on that roadmap, that they need to make a sandbox and how that's going to work. The code team needs to make an application to test it. And the whole thing requires a lot more communication than just tech. VICTORIA: Yeah, no, I've been part of kind of one of those classic DevOps problems. It's where now you've built the ops team and the dev team, [laughs] and now you're back to those coordination issues that you had before. So, if I were a dev using Klotho or the infrastructure-as-code copilot, I would theoretically have access to any AWS sandbox account. And I could just spin up whatever I wanted [laughs] within the limits that could be defined by your security team or by your, you know, I'm sure there's someone who's setting a limit on the size of databases you could spin up for fun. Does that sound right? AARON: Yeah, that's totally right. And in addition to just limits, it's also policies. So a good example is maybe in production for databases, you have a data retention policy. And you have something like we need to keep three months of backups for this amount of time. We want to make sure that if someone spins up a production database from any of those app teams, that they will follow their company policy there and not accidentally, like, lose data where it has to be maintained for some reason. ALA: That's an important distinction where we have our own set of, you know, best practice or rules that are followed roughly in the industry. But also, the key here is that the infrastructure central teams in every company can describe the different rulesets and guidelines, guardrails within the company on what developers can do, not only in low-level descriptions like instance sizes or how much something is, whether it's Spot Instances always or not in production versus dev. But also be able to teach the system when a developer says, "I want a database," spin up a Postgres database with this configuration that is wired to the larger application that they have. Or, if I want to run a service, then it spins up the correct elements and configures them to work, let's say, Kubernetes pods, or lambdas, or a combination based on what the company has described as the right way for that company to do things. And so it gives flexibility to not know the specific details but still get the company's specific way of doing them. And the key here is that we're trying to codify the communication patterns that do happen, and they need to happen if there's no tools to facilitate it between the infrastructure platform team and the feature teams. Only in this case, we try and capture that in a way that the central teams can define it. And the developers on feature teams can consume it without having as much friction. VICTORIA: So that will be different than, like, an infrastructure team that's putting out everything in Terraform and doing pull requests based off GitHub repository to that. It makes it a little more easier to read, and understand, and share the updates and changes. AARON: Right. And also, I mean, so, like, the thing you're describing of, like, the central team, having Terraform tends to be, like, these golden templates. Like they say, "If you want to make a database, here's your database template." And then you get a lot of interesting issues like drift, where maybe some teams are using the old versions of the templates, and they're not picking up the new changes. And how do you kind of reconcile all that? So it is meant to help with all of those things. VICTORIA: That makes a lot of sense. And I'm curious, what questions came up in the customer discovery process for this product that surprised you? ALA: I think there's one...I don't know that it was a question, but I think there was...So, when we started with Klotho, Klotho has the ability to enable a code-first approach, which means that you give the tool to developers as the infrastructure or platform team, or if you're a smaller shop, then you can just use Klotho directly. You set the rules on what's allowed or what's not allowed, and then developers can work very freely. They can describe very succinctly how to turn a plain object, SDK, et cetera, how to build larger architectures very quickly with a few annotations that we describe and that give cloud powers. We had always thought that some teams will feel that this encroaches on their jobs. We've heard from people on infra, you know, platform teams, "This is amazing. But this is my job." And so, one of our hypotheses was that we are encroaching into what they see as their responsibility. And we built more and more mechanisms that would clean up that interface and give them the ability to control more so they can free themselves up, just like most automations that happen in the world, to do more things. What happened later surprised us. And by having a few or several more discoveries, we found out that the feeling isn't a fear of the tool replacing their job. The fear or worry is that the tool will make their jobs boring, what is left of the job be boring, and nobody wants to go to work and not have cool and fun things to do. And because I think we all, on a certain degree, believe that, you know, if we take away some of the work that we're doing, we'll find something higher level and harder to solve, but until that exists in people's minds, there's nothing there. And therefore, they're left with whatever they don't want to do or didn't want to do. And so that's where we tried to take a step back from all the intelligence the Klotho engine provides through that code-first Klotho. And we built out focusing on one of the pillars in the tech to create InfraCopilot, which helps with keeping or making the things that we already do much simpler but also in a way that maintains and does it in a fun way. VICTORIA: That makes sense because my understanding of where to use AI and where to use machine learning for best purposes is to automate those, like, repetitive, boring tasks and allow people to focus on the creative and more interesting work, right? ALA: Yes and no. The interesting bit about our approach to ML is that we don't actually use machine learning or ChatGPT for any of the intelligence layers, meaning we don't ask ChatGPT to generate Terraform or any kind of GPT model to analyze a certain aspect of the infrastructure. That is all deterministic and happens in the Klotho engine. That is the uniqueness of why this always works rather than if GPT happened to get it right. What we use ML for is the ability to parse the intent. So we actually use it as a language model to parse the intent from what the user is trying to convey, meaning I want a lambda with an API gateway. What we get back from our use of ML is the user has asked for a lambda, an AWS lambda, and API gateway and that they be connected. That is the only thing we get back. And that is fed into the Klotho engine. And then, we do the intelligence to translate that to an actual architecture. VICTORIA: That's a really cool way to use natural language processing to build cloud infrastructure. MID-ROLL AD: Are you an entrepreneur or start-up founder looking to gain confidence in the way forward for your idea? At thoughtbot, we know you're tight on time and investment, which is why we've created targeted 1-hour remote workshops to help you develop a concrete plan for your product's next steps. Over four interactive sessions, we work with you on research, product design sprint, critical path, and presentation prep so that you and your team are better equipped with the skills and knowledge for success. Find out how we can help you move the needle at: tbot.io/entrepreneurs. VICTORIA: I'm curious; you said you're already working on some issues about being able to suggest improvements for cost reduction and efficiencies. What else is on your roadmap for what's coming up next? AARON: So there's a bunch of things in the long-term roadmap. And I'll say that, like, in the short term, it's much more about just expanding the breadth of what we support. If you think about just generating all the different permutations and types of infrastructure, it's, like, a huge matrix problem. Like, there's many, many dimensions that you could go in. And if you add an extra cloud or you add an extra capability, it expands everything. So you can imagine, like, testing it to make sure things work, and everything becomes very complicated. So, really, a lot of what we're doing is still foundational and trying to just increase the breadth, make the intent processing more intelligent, make the other bits work. And then one of the areas right now is for our initial release of the product; we chose to use Discord as our interface for the chatbot. And the reason for that is because it gives us a lot of benefits of having sort of the community built in and the engagement built in where we can actually talk with users and try and understand what they're doing. However, we really have a lot of UI changes and expansions that we'd like to do. And even from some of our early demo material, we have things like being able to right-click and being able to configure your lambda directly from the UI. So there's a lot of areas there that we can expand into an intent, too, once we get sort of the foundational stuff done, as an example. The intelligence bit is a much bigger process, like, there's a lot of things to unpack there. So I won't talk about it too much. But if we were to just talk about the most simple things, it'd be setting up alerts somehow and then feeding into our system that, like, we're hitting those alerts, and we have to make modifications. A good example of that would be, like, configuring auto scaling on an instance for [inaudible 22:17]. So we can get some of those benefits now. The bigger vision of what we want to do with optimization requires a lot more exploration and also the ability to look at what's happening to your application while it's running in the cloud. ALA: Let me maybe shed a bit more light on the problems we're trying to solve and where we're headed. When it comes to optimization, to truly optimize a cloud application, you have to reason about it on the application level rather than on the one service level. To do that, we have to be able to look at the application as an application. And today, there's a multi-repo approach to building cloud applications. So one of the future work that we're going to do is be able to reason about existing infrastructure-as-code from different portions of the teams or organization or even multiple services that the same team works and link them together. So, when we look at reasoning about an architecture, it is within the entire context of the application rather than just the smaller bits and pieces. That's one layer. Another layer is being able to ingest the real runtime application metrics and infrastructure metrics, let's say, from AWS or Azure into the optimizer system to be able to not only say, oh well, I want low latency. Then this is hard-coded to use a Fargate instance instead of a lambda. But more realistically, being able to see what that means in lambda world and maybe increase the concurrency count. Because we know that within the confines of cost limitations or constraints that the company wants to have, it is more feasible and cost-effective to raise the minimum concurrency rate of that lambda instead of using Fargate. You can only do that by having real-time data, or aggregated data come from the performance characteristics of the applications. And so that's another layer that we're going to be focusing on. The third one is, just like Aaron said, being able to approach that editing experience and operational experience, not just through one system like InfraCopilot but also through a web UI, or an app, or even as an extension to other systems that want to integrate with Klotho's engine. The last thing that I think is key is that we're still holding on to the vision that infrastructure should be invisible to most developers. Infrastructure definition is similar to how we approach assembly code. It's the bits and pieces. It's the underlying components, the CPUs, the storage. And as long as we're building microservices in that level of fidelity, of like, thinking about the wiring and how things interconnect, then we're not going to get the gains of 10x productivity building cloud applications. We have to enable developers and operators to work on a higher abstraction. And so our end game, where we're headed, is still what we want to build with Klotho, which is the ability to write code and have it be translated into what's allowed in the infrastructure within the constraints of the underlying platforms that infrastructure or platform teams set for the rest of the organization. It can be one set or multiple sets, but it's still that type of developers develop, and the infrastructure teams set them up to be able to develop, and there's separation. VICTORIA: Those are all really interesting problems to be solving. I also saw on your roadmap that you have published on Klotho that you're thinking of open-sourcing Klotho on GitHub. AARON: So, at this point, we already have the core engine of Klotho open-sourced, so the same engine that's powering InfraCopilot and Klotho, the tool itself is open source today. So, if anyone wants to take a look, it is on github.com/klothoplatform/klotho. VICTORIA: Super interesting. And it sounds like you mentioned you have a Discord. So that's where you're also getting feedback from developers on how to do this. And I think that challenge you mentioned about creating abstractions so that developers don't have to worry as much about the infrastructure and platform teams can just enable them to get their work done; I'm curious what you think is the biggest challenge with that. It seems like a problem that a lot of companies are trying to solve. So, what's the biggest challenge? And I think what do you think is unique about Klotho and solving that challenge? AARON: I guess what I would say the biggest challenge today is that every company is different enough that they all saw this in a slightly different way. So it's like, right now, the tools that are available are the building blocks to make the solution but not the solution itself. So, like, every cloud team approaches it on, let's build our own platform. We're building our own platform that every one of our developers is going to use. In some cases, we're building, like, frameworks and SDKs that everyone's going to use. But then the problem is that you're effectively saying my company is entering the platform management business. And there's no way the economies of scale will make sense forever in that world. So I think that's the biggest issue. And I think the reason it hasn't been solved is it's just a very hard problem. There's many approaches, but there's not a clear solution that kind of brings it all together. And I think our product is positioned better than most to solve some of the higher-level abstractions. It still doesn't solve the whole problem. There's still some things that are going to be tricky. But the idea is, if you can get to the point where you're using some of our abstractions, then you've guaranteed yourself portability into the future, like, your architecture will be able to evolve, even in technologies that don't exist yet once they become available. ALA: To tack on to what Aaron said, a key difference, and to our knowledge, this doesn't exist in any other tool or technology, is a fundamentally new architecture we call adaptive architecture. It is not microservices. It is not monoliths. It's a superset that combines all the benefits from monoliths, microservices, and serverless if you consider it a different platform or paradigm. What that means is that you get the benefits without the drawbacks. And the reason we can do it is because of the compiler approach that we're taking, where everything in the architecture that we produce is interchangeable. The team has decided to use Kubernetes, a specific version of Kubernetes with Istio. That works great. And, a year later, it turns out that that choice no longer scales well for the use. And we need to use Linkerd. The problem in today's world and what companies have to do is retrofit everything and not only the technology itself, but it's the ripple effects of changing it into everything else that all the other choices that were made that depended on it. In the Klotho world, because of the compilation step or the compilation approach and its extensibility, you could say, I want to take out Istio and replace it with Linkerd. And it would percolate all the changes that need to happen everywhere for that to maintain its semantic behavior. To our knowledge, that doesn't exist anywhere today. VICTORIA: So it would do, maybe not, like, would do migrations for you as well? ALA: I think migrations are a special case. When it comes to stateless things, yes. When it comes to data, we are much more conservative. Again, bringing what we've learned in different companies in, a lot of solutions try to solve all the things versus we're trying to play in a very specific niche, which is the adaptive architecture of it all. But if you want to move data, there's fantastic tools for it, and we will guide you through getting the access to the actual underlying services and, say, great, write a migration system, or we can generate for you. But you will run it to move the data from, let's say, Postgres to MySQL or from being able to drain a unit on Kubernetes to a lambda. Some of those things are much more automatic. And the transition happened through the underlying technologies like Terraform or Pulumi. Others will require you to take a step, not because we can't do it for you but we want to be conservative with the choices. AARON: I would also add that another aspect of this is that we don't position ourselves as being the center of the universe for these teams. Like a lot of products, you kind of have to adopt the platform, and everything has to plug into it, and if you don't adhere, it doesn't work. We're trying very, very hard with our design to make it so that existing apps will continue to function like they've always functioned. If app developers want to continue using direct SDKs and managing config themselves, they can absolutely do that. And then they'll interact well with Klotho apps that are also in that same company. So we're trying to make it so that you can adopt incrementally without having to go all in. VICTORIA: So that makes a lot of sense. So it's really helpful if you're trying to swap out those stateless parts of your infrastructure and you want to make some changes there. And then, if you were going to do a data migration, it would help you and guide you to where additional tools might be needed to do that. And at your market segment, you're really focusing on having it be an additional tool, as opposed to, like, an all-encompassing platform. Did I get it all right? [crosstalk 31:07] ALA: Exactly. VICTORIA: [laughs] Cool. All right. Well, that's exciting. That's a lot of cool things that you all are working on. I'm curious how overall the workload is for you two. How big of a team do you have so far? How are you balancing out this work of creating something new and exciting that has such a broad potential scope? AARON: Yeah. So, right now, the team is currently six people. So it's Ala and I, plus four additional engineers is the current team. And in terms of, like, where we're focusing, the real answer is that it's somewhat reactive, and it's very fast. So, like, it could be, like...in fact, Copilot went from ideation to us acting on it extremely quickly. And it wasn't even in the pipeline before that. So I'd actually say the biggest challenge has been where do we sort of focus our energy to get the best results? And a lot of where we spend our time is sort of meta-process of, like, making sure we're investing in the right things. ALA: And I think that comes from both Aaron and I have been in the industry for over 15 years. We don't, you know, drop everything and now switch to something new. We're very both tactical and strategic with the pace and when we pivot. But the idea is when we decide to change and focus on something that we think will be higher value, and it's almost always rooted in the signals and hypotheses that we set out to kind of learn from, from every iteration that we go after. We are not the type that would say, "Oh, we saw this. Let's drop everything, and let's go do it." I think we've seen enough in the industry that there's a measure of knowing when to switch, and when to refocus, and what to do when these higher tidbits come, and then being able to execute aggressively when that choice or decision happens. VICTORIA: Are there any trends that you're watching right now that the outcome would influence a change in direction for you? ALA: Not technically. I think what we're seeing in the industry is there's no real approaches to solving the problem. I would say most of the solutions and trends that we're seeing are...I call them streamlined complexity. We choose a set of technologies, and we make that easy. We make the SaaS version, and it can do these workloads, and it makes that easy. But the minute you step out of the comfort zone of those tools, you're back into the nightmare that building distributed systems brings with it, and then you're back to, you know, square one. What we're trying to do is fundamentally solve the problem. And we haven't seen many at least make a lot of headway there. We are seeing a few of the startups that are starting to think in the same vein, which is the zeitgeist. And that's fantastic. We actually work with them closely to try and broaden the category. VICTORIA: Right. Do you feel that other companies who are working in a similar problem space that there is...is it competitive between each other? Or do you think it's actually more collaborative? ALA: It depends on the companies and what they're trying to achieve. Every set of companies have different incentives. So Google, Amazon, and Microsoft have, you know, are incentivized to keep you on their clouds. They may care less about what they have in there as long as you are happy to stay. So you'll see more open source being adopted. You will see Amazon trying to copy or operationalize a lot of open-source tools. Microsoft will give their...because they are working with larger companies to have more vertical solutions. Google is trying to catch up. If you look at startups, you will see some focus more on developers. You'll see others focus on infra team. So it really depends on the intersection of the companies, and then they either collaborate or they compete, depending on how it affects their strategy. In our case, we recognize that our competition is the incumbents and the current way of doing things. And so we are happy to collaborate with all the startups that are doing something in the vicinity of what we're doing, startups like Ampt, and Encore, and Winglang. And there's several others. We have our own Slack channel where we talk about, like, where we're headed or at least what we can do to support one another. VICTORIA: Great. And I wonder if that's part of your business decision to open source your product as well or if there are other factors involved. ALA: I think the biggest factor that we've seen, realistically, is the expectation in the developer community to have a core that is open source, not even the source available model but to have an open-source core that they can rely on always existing and referencing when, you know if the company disappears or Oracle buys them. And so I would say that that was the biggest determining factor in the end to open-sourcing the Klotho engine. It's a very pragmatic view. VICTORIA: That makes sense. Well, I wanted to make sure we had time to ask one of my favorite questions that I ask on the podcast, and you can both answer. But if you could go back in time to when you first started this project, what advice would you give yourself? ALA: I guess the advice that I would give is keep selling and start selling as early as you can, even before the vision is realized. Or let's say you're making kind of headway towards what you'll wind up sharing and giving companies, the lead time to creating the opportunities and the belief and the faith that you can solve problems for companies, and the entire machinery of doing that is a lot more complex than most founders, I think, or at least first-time founders or, honestly, myself have found it to be. AARON: Yeah. If I try and answer that same question, it's very challenging. I guess my perspective now is there's nothing I could tell myself that would make me go any faster because a lot of it really is the journey. Like, the amount of stuff that we've learned in the last year of working on this and exploring and talking with people and everything else has been so vast that there's nothing I can communicate to past me that would prepare me any better. So [laughs] I think I would try just my best to be encouraging to just stick with it. VICTORIA: Well, that's good. And who knows what you're going to learn in the next year that [laughs] probably might not help you in the past either? That's wonderful. Do you have any final takeaways for our listeners today or anything you'd like to promote? ALA: So, from my lens, I've always wanted to do a startup but felt that the life setting wasn't quite ready. And a lot of the startup culture is talking about younger, earlier founders. I think having had the industry experience and understanding both the organizational and technical challenges, knowing more people, and engineers, and founders, potential founders, has been vastly more helpful than what I would have been able to pull off ten years ago. So, if you are thinking maybe it's too late, it is not. It's probably easier in some regards now. And yeah, check out InfraCopilot. It's on infracopilot.io. We would love to have you try it out and go on this journey with us. AARON: Yeah, I would definitely echo that. I mean, sort of the same thing on the journey. Like, it's never too late to start. And yeah, like, I would say being in the industry and actually seeing these problems first-hand makes it so much more fulfilling to actually try and solve them. VICTORIA: That's [inaudible 38:15]. I'm excited to see what you all accomplish. And I appreciate you coming on the show. You can subscribe to the show and find notes along with a complete transcript for this episode at giantrobots.fm. If you have questions or comments, you can email us at hosts@giantrobots.fm. And you could find me on Twitter @victori_ousg or on Mastodon @vguido@thoughtbot.social. This podcast is brought to you by thoughtbot and produced and edited by Mandy Moore. Thanks for listening. See you next time. ANNOUNCER: This podcast is brought to you by thoughtbot, your expert strategy, design, development, and product management partner. We bring digital products from idea to success and teach you how because we care. Learn more at thoughtbot.com. Special Guests: Aaron Torres and Ala Shiban.
In this episode, I spoke with Eduard Bargues, who is a Principal Engineer at Ohpen, a cloud-native open banking platform.We talked about Ohpen's migration from EC2 to a mix of Fargate and Lambda, and along the way, we touched on many topics:migration patternsthe "serverless-first" mindsethow to choose when to use Lambda vs Fargatewhy monolith functions are painful and should be avoidedrecommendations and pitfallswhy you should build a DX teamWe also talked about how being cloud-agnostic makes you use the cloud in a way that is inefficient and creates many layers of unnecessary abstraction layers in your architecture, which Eduard calls "legacy lock-in"!We talked about their event-driven architecture and how they are using an in-house Event Broker to add FIFO support (which EventBridge doesn't support) for some subscribers. It's similar to what Luc van Donkersgoed talked about in episode 68. We talked about their EventBridge topology and how they manage over 200 AWS accounts, and finally, what are the biggest shortcomings with EventBridge right now.Links from the episode:Job openings at OhpenEpisode 68 with PostNLEpisode 73 with NNLuc van Donkersgoed's Twitter and LinkedInThe decoupled invocation patternFor more stories about real-world use of serverless technologies, please follow me on Twitter as @theburningmonk and subscribe to this podcast.Want to step up your AWS game and learn how to build production-ready serverless applications? Check out my upcoming workshops and I will teach you everything I know.Opening theme song:Cheery Monday by Kevin MacLeodLink: https://incompetech.filmmusic.io/song/3495-cheery-mondayLicense: http://creativecommons.org/licenses/by/4.0
Everett Berry, Growth and Open Source at Vantage, joins Corey at Screaming in the Cloud to discuss the complex world of cloud costs. Everett describes how Vantage takes a broad approach to understanding and cutting cloud costs across a number of different providers, and reveals which providers he feels generate large costs quickly. Everett also explains some of his best practices for cutting costs on cloud providers, and explores what he feels the impact of AI will be on cloud providers. Corey and Everett also discuss the pros and cons of AWS savings plans, why AWS can't be counted out when it comes to AI, and why there seems to be such a delay in upgrading instances despite the cost savings. About EverettEverett is the maintainer of ec2instances.info at Vantage. He also writes about cloud infrastructure and analyzes cloud spend. Prior to Vantage Everett was a developer advocate at Arctype, a collaborative SQL client acquired by ClickHouse. Before that, Everett was cofounder and CTO of Perceive, a computer vision company. In his spare time he enjoys playing golf, reading sci-fi, and scrolling Twitter.Links Referenced: Vantage: https://www.vantage.sh/ Vantage Cloud Cost Report: https://www.vantage.sh/cloud-cost-report Everett Berry Twitter: https://twitter.com/retttx Vantage Twitter: https://twitter.com/JoinVantage TranscriptAnnouncer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.Corey: LANs of the late 90's and early 2000's were a magical place to learn about computers, hang out with your friends, and do cool stuff like share files, run websites & game servers, and occasionally bring the whole thing down with some ill-conceived software or network configuration. That's not how things are done anymore, but what if we could have a 90's style LAN experience along with the best parts of the 21st century internet? (Most of which are very hard to find these days.) Tailscale thinks we can, and I'm inclined to agree. With Tailscale I can use trusted identity providers like Google, or Okta, or GitHub to authenticate users, and automatically generate & rotate keys to authenticate devices I've added to my network. I can also share access to those devices with friends and teammates, or tag devices to give my team broader access. And that's the magic of it, your data is protected by the simple yet powerful social dynamics of small groups that you trust.Try now - it's free forever for personal use. I've been using it for almost two years personally, and am moderately annoyed that they haven't attempted to charge me for what's become an essential-to-my-workflow service.Corey: Have you listened to the new season of Traceroute yet? Traceroute is a tech podcast that peels back the layers of the stack to tell the real, human stories about how the inner workings of our digital world affect our lives in ways you may have never thought of before. Listen and follow Traceroute on your favorite platform, or learn more about Traceroute at origins.dev. My thanks to them for sponsoring this ridiculous podcast. Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. This seems like an opportune moment to take a step back and look at the overall trend in cloud—specifically AWS—spending. And who better to do that than this week, my guest is Everett Berry who is growth in open-source over at Vantage. And they've just released the Vantage Cloud Cost Report for Q1 of 2023. Everett, thank you for joining me.Everett: Thanks for having me, Corey.Corey: I enjoy playing slap and tickle with AWS bills because I am broken in exactly that kind of way where this is the thing I'm going to do with my time and energy and career. It's rare to find people who are, I guess, similarly afflicted. So, it's great to wind up talking to you, first off.Everett: Yeah, great to be with you as well. Last Week in AWS and in particular, your Twitter account, are things that we follow religiously at Vantage.Corey: Uh-oh [laugh]. So, I want to be clear because I'm sure someone's thinking it out there, that, wait, Vantage does cloud cost optimization as a service? Isn't that what I do? Aren't we competitors? And the answer that I have to that is not by any definition that I've ever seen that was even halfway sensible.If SaaS could do the kind of bespoke consulting engagements that I do, we would not sell bespoke consulting engagements because it's easier to click button: receive software. And I also will point out that we tend to work once customers are at a certain point at scale that in many cases is a bit prohibitive for folks who are just now trying to understand what the heck's going on the first time finance has some very pointed questions about the AWS bill. That's how I see it from my perspective, anyway. Agree? Disagree?Everett: Yeah, I agree with that. I think the product solution, the system of record that companies need when they're dealing with Cloud costs ends up being a different service than the one that you guys provide. And I think actually the to work in concert very well, where you establish a cloud cost optimization practice, and then you keep it in place via software and via sort of the various reporting tools that the Vantage provide. So, I completely agree with you. In fact, in the hundreds of customers and deals that Vantage has worked on, I don't think we have ever come up against Duckbill Group. So, that tells you everything you need to know in that regard.Corey: Yeah. And what's interesting about this is that you have a different scale of visibility into the environment. We wind up dealing with a certain profile, or a couple of profiles, in our customer base. We work with dozens of companies a year; you work with hundreds. And that's bigger numbers, of course, but also in many cases at different segments of the industry.I also am somewhat fond of saying that Vantage is more focused on going broad in ways where we tend to focus on going exclusively deep. We do AWS; the end. You folks do a number of different cloud providers, you do Datadog cost visibility. I've lost track of all the different services that you wind up tracking costs for.Everett: Yeah, that's right. We just launched our 11th provider, which was OpenAI and for the first time in this report, we're actually breaking out data among the different clouds and we're comparing services across AWS, Google, and Azure. And I think it's a bit of a milestone for us because we started on AWS, where I think the cost problem is the most acute, if you will, and we've hit a point now across Azure and Google where we actually have enough data to say some interesting things about how those clouds work. But in general, we have this term, single pane of glass, which is the idea that you use 5, 6, 7 services, and you want to bundle all those costs into one report.Corey: Yeah. And that is something that we see in many cases where customers are taking a more holistic look at things. But, on some level, when people ask me, “Oh, do you focus on Google bills, too,” or Azure bills in the early days, it was, “Well, not yet. Let's take a look.” And what I was seeing was, they're spending, you know, millions or hundreds of millions, in some cases, on AWS, and oh, yeah, here's, like, a $300,000 thing we're running over on GCP is a proof-of-concept or some bizdev thing. And it's… yeah, why don't we focus on the big numbers first? The true secret of cloud economics is, you know, big numbers first rather than alphabetical, but don't tell anyone I told you that.Everett: It's pretty interesting you say that because, you know, in this graph where we break down costs across providers, you can really see that effect on Google and Azure. So, for example, the number three spending category on Google is BigQuery and I think many people would say BigQuery is kind of the jewel of the Google Cloud empire. Similarly for Azure, we actually found Databricks showing up as a top-ten service. Compare that to AWS where you just see a very routine, you know, compute, database, storage, monitoring, bandwidth, down the line. AWS still is the king of costs, if you will, in terms of, like, just running classic compute workloads. And the other services are a little bit more bespoke, which has been something interesting to see play out in our data.Corey: One thing that I've heard that's fascinating to me is that I've now heard from multiple Fortune 500 companies where the Datadog bill is now a board-level concern, given the size and scale of it. And for fun, once I modeled out all the instance-based pricing models that they have for the suite of services they offer, and at the time was three or $400 a month, per instance to run everything that they've got, which, you know, when you look at the instances that I have, costing, you know, 15, 20 bucks a month, in some cases, hmm, seems a little out of whack. And I can absolutely see that turning into an unbounded growth problem in kind of the same way. I just… I don't need to conquer the world. I'm not VC-backed. I am perfectly content at the scale that I'm at—Everett: [laugh].Corey: —with the focus on the problems that I'm focused on.Everett: Yeah, Datadog has been fascinating. It's been one of our fastest-growing providers of sort of the ‘others' category that we've launched. And I think the thing with Datadog that is interesting is you have this phrase cloud costs are all about cloud architecture and I think that's more true on Datadog than a lot of other services because if you have a model where you have, you know, thousands of hosts, and then you add-on one of Datadogs 20 services, which charges per host, suddenly your cloud bill has grown exponentially compared to probably the thing that you were after. And a similar thing happens—actually, my favorite Datadog cost recommendation is, when you have multiple endpoints, and you have sort of multiple query parameters for those endpoints, you end up in this cardinality situation where suddenly Datadog is tracking, again, like, exponentially increasing number of data points, which it's then charging to you on a usage-based model. And so, Datadog is great partners with AWS and I think it's no surprise because the two of them actually sort of go hand-in-hand in terms of the way that they… I don't want to say take ad—Corey: Extract revenue?Everett: Yeah, extract revenue. That's a good term. And, you know, you might say a similar thing about Snowflake, possibly, and the way that they do things. Like oh, the, you know, warehouse has to be on for one minute, minimum, no matter how long the query runs, and various architectural decisions that these folks make that if you were building a cost-optimized version of the service, you would probably go in the other direction.Corey: One thing that I'm also seeing, too, is that I can look at the AWS bill—and just billing data alone—and then say, “Okay, you're using Datadog, aren't you?” Like, “How did you know that?” Like, well, first, most people are secondly, CloudWatch is your number two largest service spend right now. And it's the downstream effect of hammering all the endpoints with all of the systems. And is that data you're actually using? Probably not, in some cases. It's, everyone turns on all the Datadog integrations the first time and then goes back and resets and never does it again.Everett: Yeah, I think we have this set of advice that we give Datadog folks and a lot of it is just, like, turn down the ingestion volume on your logs. Most likely, logs from 30 days ago that are correlated with some new services that you spun up—like you just talked about—are potentially not relevant anymore, for the kind of day-to-day cadence that you want to get into with your cloud spending. So yeah, I mean, I imagine when you're talking to customers, they're bringing up sort of like this interesting distinction where you may end up in a meeting room with the actual engineering team looking at the actual YAML configuration of the Datadog script, just to get a sense of like, well, what are the buttons I can press here? And so, that's… yeah, I mean, that's one reason cloud costs are a pretty interesting world is, on the surface level, you may end up buying some RIs or savings plans, but then when you really get into saving money, you end up actually changing the knobs on the services that you're talking about.Corey: That's always a fun thing when we talk to people in our sales process. It's been sord—“Are you just going to come in and tell us to buy savings plans or reserved instances?” Because the answer to that used to be, “No, that's ridiculous. That's not what we do.” But then we get into environments and find they haven't bought any of those things in 18 months.Everett: [laugh].Corey: —and it's well… okay, that's step two. Step one is what are you using you shouldn't be? Like, basically measure first then cut as opposed to going the other direction and then having to back your way into stuff. Doesn't go well.Everett: Yeah. One of the things that you were discussing last year that I thought was pretty interesting was the gp3 volumes that are now available for RDS and how those volumes, while they offer a nice discount and a nice bump in price-to-performance on EC2, actually don't offer any of that on RDS except for specific workloads. And so, I think that's the kind of thing where, as you're working with folks, as Vantage is working with people, the discussion ends up in these sort of nuanced niche areas, and that's why I think, like, these reports, hopefully, are helping people get a sense of, like, well, what's normal in my architecture or where am I sort of out of bounds? Oh, the fact that I'm spending most of my bill on NAT gateways and bandwidth egress? Well, that's not normal. That would be something that would be not typical of what your normal AWS user is doing.Corey: Right. There's always a question of, “Am I normal?” is one of the first things people love to ask. And it comes in different forms. But it's benchmarking. It's, okay, how much should it cost us to service a thousand monthly active users? It's like, there's no good way to say that across the board for everyone.Everett: Yeah. I like the model of getting into the actual unit costs. I have this sort of vision in my head of, you know, if I'm Uber and I'm reporting metrics to the public stock market, I'm actually reporting a cost to serve a rider, a cost to deliver an Uber Eats meal, in terms of my cloud spend. And that sort of data is just ridiculously hard to get to today. I think it's what we're working towards with Vantage and I think it's something that with these Cloud Cost Reports, we're hoping to get into over time, where we're actually helping companies think about well, okay, within my cloud spend, it's not just what I'm spending on these different services, there's also an idea of how much of my cost to deliver my service should be realized by my cloud spending.Corey: And then people have the uncomfortable realization that wait, my bill is less a function of number of customers I have but more the number of engineers I've hired. What's going on with that?Everett: [laugh]. Yeah, it is interesting to me just how many people end up being involved in this problem at the company. But to your earlier point, the cloud spending discussion has really ramped up over the past year. And I think, hopefully, we are going to be able to converge on a place where we are realizing the promise of the cloud, if you will, which is that it's actually cheaper. And I think what these reports show so far is, like, we've still got a long ways to go for that.Corey: One thing that I think is opportune about the timing of this recording is that as of last week, Amazon wound up announcing their earnings. And Andy Jassy has started getting on the earnings calls, which is how you know it's bad because the CEO of Amazon never deigned to show up on those things before. And he said that a lot of AWS employees are focused and spending their time on helping customers lower their AWS bills. And I'm listening to this going, “Oh, they must be talking to different customers than the ones that I'm talking to.” Are you seeing a lot of Amazonian involvement in reducing AWS bills? Because I'm not and I'm wondering where these people are hiding.Everett: So, we do see one thing, which is reps pushing savings plans on customers, which in general, is great. It's kind of good for everybody, it locks people into longer-term spend on Amazon, it gets them a lower rate, savings plans have some interesting functionality where they can be automatically applied to the area where they offer the most discount. And so, those things are all positive. I will say with Vantage, we're a cloud cost optimization company, of course, and so when folks talk to us, they often already have talked to their AWS rep. And the classic scenario is, that the rep passes over a large spreadsheet of options and ways to reduce costs, but for the company, that spreadsheet may end up being quite a ways away from the point where they actually realize cost savings.And ultimately, the people that are working on cloud cost optimization for Amazon are account reps who are comped by how much cloud spending their accounts are using on Amazon. And so, at the end of the day, some of the, I would say, most hard-hitting optimizations that you work on that we work on, end up hitting areas where they do actually reduce the bill which ends up being not in the account manager's favor. And so, it's a real chicken-and-egg game, except for savings plans is one area where I think everybody can kind of work together.Corey: I have found that… in fairness, there is some defense for Amazon in this but their cost-cutting approach has been rightsizing instances, buy some savings plans, and we are completely out of ideas. Wait, can you switch to Graviton and/or move to serverless? And I used to make fun of them for this but honestly that is some of the only advice that works across the board, irrespective in most cases, of what a customer is doing. Everything else is nuanced and it depends.That's why in some cases, I find that I'm advising customers to spend more money on certain things. Like, the reason that I don't charge percentage of savings in part is because otherwise I'm incentivized to say things like, “Backups? What are you, some kind of coward? Get rid of them.” And that doesn't seem like it's going to be in the customer's interest every time. And as soon as you start down that path, it starts getting a little weird.But people have asked me, what if my customers reach out to their account teams instead of talking to us? And it's, we do bespoke consulting engagements; I do not believe that we have ever had a client who did not first reach out to their account team. If the account teams were capable of doing this at the level that worked for customers, I would have to be doing something else with my business. It is not something that we are seeing hit customers in a way that is effective, and certainly not at scale. You said—as you were right on this—that there's an element here of account managers doing this stuff, there's an [unintelligible 00:15:54] incentive issue in part, but it's also, quality is extraordinarily uneven when it comes to these things because it is its own niche and a lot of people focus in different areas in different ways.Everett: Yeah. And to the areas that you brought up in terms of general advice that's given, we actually have some data on this in this report. In particular Graviton, this is something we've been tracking the whole time we've been doing these reports, which is the past three quarters and we actually are seeing Graviton adoption start to increase more rapidly than it was before. And so, for this last quarter Q1, we're seeing 5% of our costs that we're measuring on EC2 coming from Graviton, which is up from, I want to say 2% the previous quarter, and, like, less than 1% the quarter before. The previous quarter, we also reported that Lambda costs are now majority on ARM among the Vantage customer base.And that one makes some sense to me just because in most cases with Lambda, it's a flip of a switch. And then to your archival point on backups, this is something that we report in this one is that intelligent tiering, which we saw, like, really make an impact for folks towards the end of last year, the numbers for that were flat quarter over quarter. And so, what I mean by that is, we reported that I think, like, two-thirds of our S3 costs are still in the standard storage tier, which is the most expensive tier. And folks have enabled S3 intelligent tiering, which moves your data to progressively cheaper tiers, but we haven't seen that increase this quarter. So, it's the same number as it was last quarter.And I think speaks to what you're talking about with a ceiling on some cost optimization techniques, where it's like, you're not just going to get rid of all your backups; you're not just going to get rid of your, you know, Amazon WorkSpaces archived desktop snapshots that you need for some HIPAA compliance reason. Those things have an upper limit and so that's where, when the AWS rep comes in, it's like, as they go through the list of top spending categories, the recommendations they can give start to provide diminishing returns.Corey: I also think this is sort of a law of large numbers issue. When you start seeing a drop off in the growth rate of large cloud providers, like, there's a problem, in that there are only so many exabyte scale workloads that can be moved inside of a given quarter into the cloud. You're not going to see the same unbounded infinite growth that you would expect mathematically. And people lose their minds when they start to see those things pointed out, but the blame that oh, that's caused by cost optimization efforts, with respect, bullshit it is. I have seen customers devote significant efforts to reducing their AWS bills and it takes massive amounts of work and even then they don't always succeed in getting there.It gets better, but they still wind up a year later, having spent more on a month-by-month basis than they did when they started. Sure they understand it better and it's organic growth that's driving it and they've solved the low hanging fruit problem, but there is a challenge in acting as a boundary for what is, in effect, an unbounded growth problem.Everett: Yeah. And speaking to growth, I thought Microsoft had the most interesting take on where things could happen next quarter, and that, of course, is AI. And so, they attributed, I think it was, 1% of their guidance regarding 26 or 27% growth for Q2 Cloud revenue and it attributed 1% of that to AI. And I think Amazon is really trying to be in the room for those discussions when a large enterprise is talking about AI workloads because it's one of the few remaining cloud workloads that if it's not in the cloud already, is generating potentially massive amounts of growth for these guys.And so, I'm not really sure if I believe the 1% number. I think Microsoft may be having some fun with the fact that, of course, OpenAI is paying them for acting as a cloud provider for ChatGPT and further API, but I do think that AWS, although they were maybe a little slow to the game, they did, to their credit, launch a number of AI services that I'm excited to see if that contributes to the cost that we're measuring next quarter. We did measure, for the first time, a sudden increase on those new [Inf1 00:20:17] EC2 instances, which are optimized for machine learning. And I think if AWS can have success moving customers to those the way they have with Graviton, then that's going to be a very healthy area of growth for them.Corey: I'll also say that it's pretty clear to me that Amazon does not know what it's doing in its world of machine-learning-powered services. I use Azure for the [unintelligible 00:20:44] clients I built originally for Twitter, then for Mastodon—I'm sure Bluesky is coming—but the problem that I'm seeing there is across the board, start to finish, that there is no cohesive story from the AWS side of here's a picture tell me what's in it and if it's words, describe it to me. That's a single API call when we go to Azure. And the more that Amazon talks about something, I find, the less effective they're being in that space. And they will not stop talking about machine learning. Yes, they have instances that are powered by GPUs; that's awesome. But they're an infrastructure provider and moving up the stack is not in their DNA. But that's where all the interest and excitement and discussion is going to be increasingly in the AI space. Good luck.Everett: I think it might be something similar to what you've talked about before with all the options to run containers on AWS. I think they today have a bit of a grab bag of services and they may actually be looking forward to the fact that they're these truly foundational models which let you do a number of tasks, and so they may not need to rely so much on you know, Amazon Polly and Amazon Rekognition and sort of these task-specific services, which to date, I'm not really sure of the takeoff rates on those. We have this cloud costs leaderboard and I don't think you would find them in the top 50 of AWS services. But we'll see what happens with that.AWS I think, ends up being surprisingly good at sticking with it. I think our view is that they probably have the most customer spend on Kubernetes of any major cloud, even though you might say Google at first had the lead on Kubernetes and maybe should have done more with GKE. But to date, I would kind of agree with your take on AI services and I think Azure is… it's Azure's to lose for the moment.Corey: I would agree. I think the future of the cloud is largely Azure's to lose and it has been for a while, just because they get user experience, they get how to talk to enterprises. I just… I wish they would get security a little bit more effectively, and if failing that, communicating with their customers about security more effectively. But it's hard for a leopard to change its spots. Microsoft though has demonstrated an ability to change their nature multiple times, in ways that I would have bet were impossible. So, I just want to see them do it again. It's about time.Everett: Yeah, it's been interesting building on Azure for the past year or so. I wrote a post recently about, kind of, accessing billing data across the different providers and it's interesting in that every cloud provider is unique in the way that it simply provides an external endpoint for downloading your billing data, but Azure is probably one of the easiest integrations; it's just a REST API. However, behind that REST API are, like, years and years of different ways to pay Microsoft: are you on a pay-as-you-go plan, are you on an Azure enterprise plan? So, there's all this sort of organizational complexity hidden behind Azure and I think sometimes it rears its ugly head in a way that stringing together services on Amazon may not, even if that's still a bear in and of itself, if you will.Corey: Any other surprises that you found in the Cloud Cost Report? I mean, looking through it, it seems directionally aligned with what I see in my environments with customers. Like for example, you're not going to see Kubernetes showing up as a line item on any of these things just because—Everett: Yeah.Corey: That is indistinguishable from a billing perspective when we're looking at EC2 spend versus control plane spend. I don't tend to [find 00:24:04] too much that's shocking me. My numbers are of course, different percentage-wise, but surprise, surprise, different companies doing different things doing different percentages, I'm sure only AWS knows for sure.Everett: Yeah, I think the biggest surprise was just the—and, this could very well just be kind of measurement method, but I really expected to see AI services driving more costs, whether it was GPU instances, or AI-specific services—which we actually didn't report on at all, just because they weren't material—or just any indication that AI was a real driver of cloud spending. But I think what you see instead is sort of the same old folks at the top, and if you look at the breakdown of services across providers, that's, you know, compute, database, storage, bandwidth, monitoring. And if you look at our percentage of AI costs as a percentage of EC2 costs, it's relatively flat, quarter over quarter. So, I would have thought that would have shown up in some way in our data and we really didn't see it.Corey: It feels like there's a law of large numbers things. Everyone's talking about it. It's very hype right now—Everett: Yeah.Corey: But it's also—you talk to these companies, like, “Okay, we have four exabytes of data that we're storing and we have a couple 100,000 instances at any given point in time, so yeah, we're going to start spending $100,000 a month on our AI adventures and experiments.” It's like, that's just noise and froth in the bill, comparatively.Everett: Exactly, yeah. And so, that's why I think Microsoft's thought about AI driving a lot of growth in the coming quarters is, we'll see how that plays out, basically. The one other thing I would point to is—and this is probably not surprising, maybe, for you having been in the infrastructure world and seeing a lot of this, but for me, just seeing the length of time it takes companies to upgrade their instance cycles. We're clocking in at almost three years since the C6 series instances have been released and for just now seeing C6 and R6 start to edge above 10% of our compute usage. I actually wonder if that's just the stranglehold that Intel has on cloud computing workloads because it was only last year around re:Invent that the C6in and the Intel version of the C6 series instances had been released. So, I do think in general, there's supposed to be a price-to-performance benefit of upgrading your instances, and so sometimes it surprises me to see how long it takes companies to get around to doing that.Corey: Generation 6 to 7 is also 6% more expensive in my sampling.Everett: Right. That's right. I think Amazon has some work to do to actually make that price-to-performance argument, sort of the way that we were discussing with gp2 versus gp3 volumes. But yeah, I mean, other than that, I think, in general, my view is that we're past the worst of it, if you will, for cloud spending. Q4 was sort of a real letdown, I think, in terms of the data we had and the earnings that these cloud providers had and I think Q1 is actually everyone looking forward to perhaps what we call out at the beginning of the report, which is a return to normal spend patterns across the cloud.Corey: I think that it's going to be an interesting case. One thing that I'm seeing that might very well explain some of the reluctance to upgrade EC2 instances has been that a lot of those EC2 instances are databases. And once those things are up and running and working, people are hesitant to do too much with them. One of the [unintelligible 00:27:29] roads that I've seen of their savings plan approach is that you can migrate EC2 spend to Fargate to Lambda—and that's great—but not RDS. You're effectively leaving a giant pile of money on the table if you've made a three-year purchase commitment on these things. So, all right, we're not going to be in any rush to migrate to those things, which I think is AWS getting in its own way.Everett: That's exactly right. When we encounter customers that have a large amount of database spend, the most cost-effective option is almost always basically bare-metal EC2 even with the overhead of managing the backup-restore scalability of those things. So, in some ways, that's a good thing because it means that you can then take advantage of the, kind of, heavy committed use options on EC2, but of course, in other ways, it's a bit of a letdown because, in the ideal case, RDS would scale with the level of workloads and the economics would make more sense, but it seems that is really not the case.Corey: I really want to thank you for taking the time to come on the show and talk to me. I'll include a link in the [show notes 00:28:37] to the Cost Report. One thing I appreciate is the fact that it doesn't have one of those gates in front of it of, your email address, and what country you're in, and how can our salespeople best bother you. It's just, here's a link to the PDF. The end. So, thanks for that; it's appreciated. Where else can people go to find you?Everett: So, I'm on Twitter talking about cloud infrastructure and AI. I'm at@retttx, that's R-E-T-T-T-X. And then of course, Vantage also did quick hot-takes on this report with a series of graphs and explainers in a Twitter thread and that's @JoinVantage.Corey: And we will, of course, put links to that in the [show notes 00:29:15]. Thank you so much for your time. I appreciate it.Everett: Thanks, Corey. Great to chat.Corey: Everett Berry, growth in open-source at Vantage. I'm Cloud Economist Corey Quinn and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice along with an angry, insulting comment that will increase its vitriol generation over generation, by approximately 6%.Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.
Eswar Bala, Director of Amazon EKS at AWS, joins Corey on Screaming in the Cloud to discuss how and why AWS built a Kubernetes solution, and what customers are looking for out of Amazon EKS. Eswar reveals the concerns he sees from customers about the cost of Kubernetes, as well as the reasons customers adopt EKS over ECS. Eswar gives his reasoning on why he feels Kubernetes is here to stay and not just hype, as well as how AWS is working to reduce the complexity of Kubernetes. Corey and Eswar also explore the competitive landscape of Amazon EKS, and the new product offering from Amazon called Karpenter.About EswarEswar Bala is a Director of Engineering at Amazon and is responsible for Engineering, Operations, and Product strategy for Amazon Elastic Kubernetes Service (EKS). Eswar leads the Amazon EKS and EKS Anywhere teams that build, operate, and contribute to the services customers and partners use to deploy and operate Kubernetes and Kubernetes applications securely and at scale. With a 20+ year career in software , spanning multimedia, networking and container domains, he has built greenfield teams and launched new products multiple times.Links Referenced: Amazon EKS: https://aws.amazon.com/eks/ kubernetesthemuchharderway.com: https://kubernetesthemuchharderway.com kubernetestheeasyway.com: https://kubernetestheeasyway.com EKS documentation: https://docs.aws.amazon.com/eks/ EKS newsletter: https://eks.news/ EKS GitHub: https://github.com/aws/eks-distro TranscriptAnnouncer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.Corey: It's easy to **BEEP** up on AWS. Especially when you're managing your cloud environment on your own!Mission Cloud un **BEEP**s your apps and servers. Whatever you need in AWS, we can do it. Head to missioncloud.com for the AWS expertise you need. Corey: Welcome to Screaming in the Cloud, I'm Corey Quinn. Today's promoted guest episode is brought to us by our friends at Amazon. Now, Amazon is many things: they sell underpants, they sell books, they sell books about underpants, and underpants featuring pictures of books, but they also have a minor cloud computing problem. In fact, some people would call them a cloud computing company with a gift shop that's attached. Now, the problem with wanting to work at a cloud company is that their interviews are super challenging to pass.If you want to work there, but can't pass the technical interview for a long time, the way to solve that has been, “Ah, we're going to run Kubernetes so we get to LARP as if we worked at a cloud company but don't.” Eswar Bala is the Director of Engineering for Amazon EKS and is going to basically suffer my slings and arrows about one of the most complicated, and I would say overwrought, best practices that we're seeing industry-wide. Eswar, thank you for agreeing to subject yourself to this nonsense.Eswar: Hey, Corey, thanks for having me here.Corey: [laugh]. So, I'm a little bit unfair to Kubernetes because I wanted to make fun of it and ignore it. But then I started seeing it in every company that I deal with in one form or another. So yes, I can still sit here and shake my fist at the tide, but it's turned into, “Old Man Yells at Cloud,” which I'm thrilled to embrace, but everyone's using it. So, EKS has recently crossed, I believe, the five-year mark since it was initially launched. What is EKS other than Amazon's own flavor of Kubernetes?Eswar: You know, the best way I can define EKS is, EKS is just Kubernetes. Not Amazon's version of Kubernetes. It's just Kubernetes that we get from the community and offer it to customers to make it easier for them to consume. So, EKS. I've been with EKS from the very beginning when we thought about offering a managed Kubernetes service in 2017.And at that point, the goal was to bring Kubernetes to enterprise customers. So, we have many customers telling us that they want us to make their life easier by offering a managed version of Kubernetes that they've actually beginning to [erupt 00:02:42] at that time period, right? So, my goal was to figure it out, what does that service look like and which customer base should be targeting service towards.Corey: Kelsey Hightower has a fantastic learning tool out there in a GitHub repo called, “Kubernetes the Hard Way,” where he talks you through building the entire thing, start to finish. I wound up forking it and doing that on top of AWS, and you can find that at kubernetesthemuchharderway.com. And that was fun.And I went through the process and my response at the end was, “Why on earth would anyone ever do this more than once?” And we got that sorted out, but now it's—customers aren't really running these things from scratch. It's like the Linux from Scratch project. Great learning tool; probably don't run this in production in the same way that you might otherwise because there are better ways to solve for the problems that you will have to solve yourself when you're building these things from scratch. So, as I look across the ecosystem, it feels like EKS stands in the place of the heavy, undifferentiated lifting of running the Kubernetes control plane so customers functionally don't have to. Is that an effective summation of this?Eswar: That is precisely right. And I'm glad you mentioned, “Kubernetes the Hard Way,” I'm a big fan of that when it came out. And if anyone who did that tutorial, and also your tutorial, “Kubernetes the Harder Way,” would walk away thinking, “Why would I pick this technology when it's super complicated to setup?” But then you see that customers love Kubernetes and you see that reflected in the adoption, even in 2016, 2017 timeframes.And the reason is, it made life easier for application developers in terms of offering web services that they wanted to offer to their customer base. And because of all the features that Kubernetes brought on, application lifecycle management, service discoveries, and then it evolved to support various application architectures, right, in terms of stateless services, stateful applications, and even daemon sets, right, like for running your logging and metrics agents. And these are powerful features, at the end of the day, and that's what drove Kubernetes. And because it's super hard to get going to begin with and then to operate, the day-two operator experience is super complicated.Corey: And the day one experience is super hard and the day two experience of, “Okay, now I'm running it and something isn't working the way it used to. Where do I start,” has been just tremendously overwrought. And frankly, more than a little intimidating.Eswar: Exactly. Right? And that exactly was our opportunity when we started in 2017. And when we started, there was question on, okay, should we really build a service when you have an existing service like ECS in place? And by the way, like, I did work in ECS before I started working in EKS from the beginning.So, the answer then was, it was about giving what customers want. And their space for many container orchestration systems, right, ECS was the AWS service at that point in time. And our thinking was, how do we give customers what they wanted? They wanted a Kubernetes solution. Let's go build that. But we built it in a way that we remove the undifferentiated heavy lifting of managing Kubernetes.Corey: One of the weird things that I find is that everyone's using Kubernetes, but I don't see it in the way that I contextualize the AWS universe, which of course, is on the bill. That's right. If you don't charge for something in AWS Lambda, and preferably a fair bit, I don't tend to know it exists. Like, “What's an IAM and what might that possibly do?” Always have reassuring thing to hear from someone who's often called an expert in this space. But you know, if it doesn't cost money, why do I pay attention to it?The control plane is what EKS charges for, unless you're running a bunch of Fargate-managed pods and containers to wind up handling those things. So, it mostly just shows up as an addenda to the actual big, meaty portions of the belt. It just looks like a bunch of EC2 instances with some really weird behavior patterns, particularly with regard to auto-scaling and crosstalk between all of those various nodes. So, it's a little bit of a murder mystery, figuring out, “So, what's going on in this environment? Do you folks use containers at all?” And the entire Kubernetes shop is looking at me like, “Are you simple?”No, it's just I tend to disregard the lies that customers say, mostly to themselves because everyone has this idea of what's going on in their environment, but the bill speaks. It's always been a little bit of an investigation to get to the bottom of anything that involves Kubernetes at significant points of scale.Eswar: Yeah, you're right. Like if you look at EKS, right, like, we started with managing the control plane to begin with. And managing the control plane is a drop in the bucket when you actually look at the costs in terms of operating a Kubernetes cluster or running a Kubernetes cluster. When you look at how our customers use and where they spend most of their cost, it's about where their applications run; it's actually the Kubernetes data plane and the amount of compute and memory that the applications end of using end up driving 90% of the cost. And beyond that is the storage, beyond that as a networking costs, right, and then after that is the actual control plane costs. So, the problem right now is figuring out, how do we optimize our costs for the application to run on?Corey: On some level, it requires a little bit of understanding of what's going on under the hood. There have been a number of cost optimization efforts that have been made in the Kubernetes space, but they tend to focus around stuff that I find relatively, well, I call it banal because it basically is. You're looking at the idea of, okay, what size instances should you be running, and how well can you fill them and make sure that all the resources per node wind up being taken advantage of? But that's also something that, I guess from my perspective, isn't really the interesting architectural point of view. Whether or not you're running a bunch of small instances or a few big ones or some combination of the two, that doesn't really move the needle on any architectural shift, whereas ingesting a petabyte a month of data and passing 50 petabytes back and forth between availability zones, that's where it starts to get really interesting as far as tracking that stuff down.But what I don't see is a whole lot of energy or effort being put into that. And I mean, industry-wide, to be clear. I'm not attempting to call out Amazon specifically on this. That's [laugh] not the direction I'm taking this in. For once. I know, I'm still me. But it seems to be just an industry-wide issue, where zone affinity for Kubernetes has been a very low priority item, even on project roadmaps on the Kubernetes project.Eswar: Yeah, the Kubernetes does provide ability for customers to restrict their workloads within as particular [unintelligible 00:09:20], right? Like, there is constraints that you can place on your pod specs that end up driving applications towards a particular AZ if they want, right? You're right, it's still left to the customers to configure. Just because there's a configuration available doesn't mean the customers use it. If it's not defaulted, most of the time, it's not picked up.That's where it's important for service providers—like EKS—to offer ability to not only provide the visibility by means of reporting that it's available using tools like [Cue Cards 00:09:50] and Amazon Billing Explorer but also provide insights and recommendations on what customers can do. I agree that there's a gap today. For example in EKS, in terms of that. Like, we're slowly closing that gap and it's something that we're actively exploring. How do we provide insights across all the resources customers end up using from within a cluster? That includes not just compute and memory, but also storage and networking, right? And that's where we are actually moving towards at this point.Corey: That's part of the weird problem I've found is that, on some level, you get to play almost data center archaeologists when you start exploring what's going on in these environments. I found one of the only reliable ways to get answers to some of this stuff has been oral tradition of, “Okay, this Kubernetes cluster just starts hurling massive data quantities at 3 a.m. every day. What's causing that?” And it leads to, “Oh, no no, have you talked to the data science team,” like, “Oh, you have a data science team. A common AWS billing mistake.” And exploring down that particular path sometimes pays dividends. But there's no holistic way to solve that globally. Today. I'm optimistic about tomorrow, though.Eswar: Correct. And that's where we are spending our efforts right now. For example, we recently launched our partnership with Cue Cards, and Cue Cards is now available as an add-on from the Marketplace that you can easily install and provision on Kubernetes EKS clusters, for example. And that is a start. And Cue Cards is amazing in terms of features, in terms of insight it offers, right, it looking into computer, the memory, and the optimizations and insights it provides you.And we are also working with the AWS Cost and Usage Reporting team to provide a native AWS solution for the cost reporting and the insights aspect as well in EKS. And it's something that we are going to be working really closely to solve the networking gaps in the near future.Corey: What are you seeing as far as customer concerns go, with regard to cost and Kubernetes? I see some things, but let's be very clear here, I have a certain subset of the market that I spend an inordinate amount of time speaking to and I always worry that what I'm seeing is not holistically what's going on in the broader market. What are you seeing customers concerned about?Eswar: Well, let's start from the fundamentals here, right? Customers really want to get to market faster, whatever services and applications that they want to offer. And they want to have it cheaper to operate. And if they're adopting EKS, they want it cheaper to operate in Kubernetes in the cloud. They also want a high performance, they also want scalability, and they want security and isolation.There's so many parameters that they have to deal with before they put their service on the market and continue to operate. And there's a fundamental tension here, right? Like they want cost efficiency, but they also want to be available in the market quicker and they want performance and availability. Developers have uptime, SLOs, and SLAs is to consider and they want the maximum possible resources that they want. And on the other side, you've got financial leaders and the business leaders who want to look at the spending and worry about, like, okay, are we allocating our capital wisely? And are we allocating where it makes sense? And are we doing it in a manner that there's very little wastage and aligned with our customer use, for example? And this is where the actual problems arise from [unintelligible 00:13:00].Corey: I want to be very clear that for a long time, one of the most expensive parts about running Kubernetes has not been the infrastructure itself. It's been the people to run this responsibly, where it's the day two, day three experience where for an awful lot of companies like, oh, we're moving to Kubernetes because I don't know we read it in an in-flight magazine or something and all the cool kids are doing it, which honestly during the pandemic is why suddenly everyone started making better IT choices because they're execs were not being exposed to airport ads. I digress. The point, though, is that as customers are figuring this stuff out and playing around with it, it's not sustainable that every company that wants to run Kubernetes can afford a crack SRE team that is individually incredibly expensive and collectively staggeringly so. That it seems to be the real cost is the complexity tied to it.And EKS has been great in that it abstracts an awful lot of the control plane complexity away. But I still can't shake the feeling that running Kubernetes is mind-bogglingly complicated. Please argue with me and tell me I'm wrong.Eswar: No, you're right. It's still complicated. And it's a journey towards reducing the complexity. When we launched EKS, we launched only with managing the control plane to begin with. And that's where we started, but customers had the complexity of managing the worker nodes.And then we evolved to manage the Kubernetes worker nodes in terms two products: we've got Managed Node Groups and Fargate. And then customers moved on to installing more agents in their clusters before they actually installed their business applications, things like Cluster Autoscaler, things like Metric Server, critical components that they have come to rely on, but doesn't drive their business logic directly. They are supporting aspects of driving core business logic.And that's how we evolved into managing the add-ons to make life easier for our customers. And it's a journey where we continue to reduce the complexity of making it easier for customers to adopt Kubernetes. And once you cross that chasm—and we are still trying to cross it—once you cross it, you have the problem of, okay so, adopting Kubernetes is easy. Now, we have to operate it, right, which means that we need to provide better reporting tools, not just for costs, but also for operations. Like, how easy it is for customers to get to the application level metrics and how easy it is for customers to troubleshoot issues, how easy for customers to actually upgrade to newer versions of Kubernetes. All of these challenges come out beyond day one, right? And those are initiatives that we have in flight to make it easier for customers [unintelligible 00:15:39].Corey: So, one of the things I see when I start going deep into the Kubernetes ecosystem is, well, Kubernetes will go ahead and run the containers for me, but now I need to know what's going on in various areas around it. One of the big booms in the observability space, in many cases, has come from the fact that you now need to diagnose something in a container you can't log into and incidentally stopped existing 20 minutes for you got the alert about the issue, so you'd better hope your telemetry is up to snuff. Now, yes, that does act as a bit of a complexity burden, but on the other side of it, we don't have to worry about things like failed hard drives taking systems down anymore. That has successfully been abstracted away by Kubernetes, or you know, your cloud provider, but that's neither here nor there these days. What are you seeing as far as, effectively, the sidecar pattern, for example of, “Oh, you have too many containers and need to manage them? Have you considered running more containers?” Sounds like something a container salesman might say.Eswar: So, running containers demands that you have really solid observability tooling, things that you're able to troubleshoot—successfully—debug without the need to log into the containers itself. In fact, that's an anti-pattern, right? You really don't want a container to have the ability to SSH into a particular container, for example. And to be successful at it demands that you publish your metrics and you publish your logs. All of these are things that a developer needs to worry about today in order to adopt containers, for example.And it's on the service providers to actually make it easier for the developers not to worry about these. And all of these are available automatically when you adopt a Kubernetes service. For example, in EKS, we are working with our managed Prometheus service teams inside Amazon, right—and also CloudWatch teams—to easily enable metrics and logging for customers without having to do a lot of heavy lifting.Corey: Let's talk a little bit about the competitive landscape here. One of my biggest competitors in optimizing AWS bills is Microsoft Excel, specifically, people are going to go ahead and run it themselves because, “Eh, hiring someone who's really good at this, that sounds expensive. We can screw it up for half the cost.” Which is great. It seems to me that one of your biggest competitors is people running their own control plane, on some level.I don't tend to accept the narrative that, “Oh, EKS is expensive that winds up being what 35 bucks or 70 bucks or whatever it is per control plane per cluster on a monthly basis.” Okay, yes, that's expensive if you're trying to stay completely within a free tier perhaps, but if you're running anything that's even slightly revenue-generating or a for-profit company, you will spend far more than that just on people's time. I have no problems—for once—with the EKS pricing model, start to finish. Good work on that. You've successfully nailed it. But are you seeing significant pushback from the industry of, “Nope, we're going to run our own Kubernetes management system instead because we enjoy pain, corporately speaking.”Eswar: Actually, we are in a good spot there, right? Like, at this point, customers who choose to run Kubernetes on AWS by themselves and not adopt EKS just fall into one main category, so—or two main categories: number one, they have existing technical stack built on running Kubernetes on themselves and they'd rather maintain that and not moving to EKS. Or they demand certain custom configurations of the Kubernetes control plane that EKS doesn't support. And those are the only two reasons why we see customers not moving into EKS and prefer to run their own Kubernetes on AWS clusters.[midroll 00:19:46]Corey: It really does seem, on some level, like there's going to be a… I don't want to say reckoning because that makes it sound vaguely ominous and that's not the direction that I intend for things to go in, but there has to be some form of collapsing of the complexity that is inherent to all of this because the entire industry has always done that. An analogy that I fall back on because I've seen this enough times to have the scars to show for it is that in the '90s, running a web server took about a week of spare time and an in-depth knowledge of GCC compiler flags. And then it evolved to ah, I could just unzip a tarball of precompiled stuff, and then RPM or Deb became a thing. And then Yum, or something else, or I guess apt over in the Debian land to wind up wrapping around that. And then you had things like Puppet where it was it was ensure installed. And now it's Docker Run.And today, it's a checkbox in the S3 console that proceeds to yell at you because you're making a website public. But that's neither here nor there. Things don't get harder with time. But I've been surprised by how I haven't yet seen that sort of geometric complexity collapsing of around Kubernetes to make it easier to work with. Is that coming or are we going to have to wait for the next cycle of things?Eswar: Let me think. I actually don't have a good answer to that, Corey.Corey: That's good, at least because if you did, I'd worried that I was just missing something obvious. That's kind of the entire reason I ask. Like, “Oh, good. I get to talk to smart people and see what they're picking up on that I'm absolutely missing.” I was hoping you had an answer, but I guess it's cold comfort that you don't have one off the top of your head. But man, is it confusing.Eswar: Yeah. So, there are some discussions in the community out there, right? Like, it's Kubernetes the right layer to do interact? And there are some tooling that's built on top of Kubernetes, for example, Knative that tries to provide a serverless layer on top of Kubernetes, for example. There are also attempts at abstracting Kubernetes completely and providing tooling that just completely removes any sort of Kubernetes API out of the picture and maybe a specific CI/CD-based solution that takes it from the source and deploys the service without even showing you that there's Kubernetes underneath, right?All of these are evolutions that are being tested out there in the community. Time will tell whether these end up sticking. But what's clear here is the gravity around Kubernetes. All sorts of tooling that gets built on top of Kubernetes, all the operators, all sorts of open-source initiatives that are built to run on Kubernetes. For example, Spark, for example, Cassandra, so many of these big, large-scale, open-source solutions are now built to run really well on Kubernetes. And that is the gravity that's pushing Kubernetes at this point.Corey: I'm curious to get your take on one other, I would consider interestingly competitive spaces. Now, because I have a domain problem, if you go to kubernetestheeasyway.com, you'll wind up on the ECS marketing page. That's right, the worst competition in the world: the people who work down the hall from you.If someone's considering using ECS, Elastic Container Service versus EKS, Elastic Kubernetes Service, what is the deciding factor when a customer's making that determination? And to be clear, I'm not convinced there's a right or wrong answer. But I am curious to get your take, given that you have a vested interest, but also presumably don't want to talk complete smack about your colleagues. But feel free to surprise me.Eswar: Hey, I love ECS, by the way. Like I said, I started my life in the AWS in ECS. So look, ECS is a hugely successful container orchestration service. I know we talk a lot about Kubernetes, I know there's a lot of discussions around Kubernetes, but I wouldn't make it a point that, like, ECS is a hugely successful service. Now, what determines how customers go to?If customers are… if the customers tech stack is entirely on AWS, right, they use a lot of AWS services and they want an easy way to get started in the container world that has really tight integration with other AWS services without them having to configure a lot, ECS is the way, right? And customers have actually seen terrific success adopting ECS for that particular use case. Whereas EKS customers, they start with, “Okay, I want an open-source solution. I really love Kubernetes. I lo—or, I have a tooling that I really like in the open-source land that really works well with Kubernetes. I'm going to go that way.” And those kind of customers end up picking EKS.Corey: I feel like, on some level, Kubernetes has become the most the default API across a wide variety of environments. AWS obviously, but on-prem other providers. It seems like even the traditional VPS companies out there that offer just rent-a-server in the cloud somewhere are all also offering, “Oh, and we have a Kubernetes service as well.” I wound up backing a Kickstarter project that runs a Kubernetes cluster with a shared backplane across a variety of Raspberries Pi, for example. And it seems to be almost everywhere you look.Do you think that there's some validity to that approach of effectively whatever it is that we're going to wind up running in the future, it's going to be done on top of Kubernetes or do you think that that's mostly hype-driven these days?Eswar: It's definitely not hype. Like we see the proof in the kind of adoption we see. It's becoming the de facto container orchestration API. And with all the tooling, open-source tooling that's continuing to build on top of Kubernetes, CNCF tooling ecosystem that's actually spawned to actually support Kubernetes at option, all of this is solid proof that Kubernetes is here to stay and is a really strong, powerful API for customers to adopt.Corey: So, four years ago, I had a prediction on Twitter, and I said, “In five years, nobody will care about Kubernetes.” And it was in February, I believe, and every year, I wind up updating an incrementing a link to it, like, “Four years to go,” “Three years to go,” and I believe it expires next year. And I have to say, I didn't really expect when I made that prediction for it to outlive Twitter, but yet, here we are, which is neither here nor there. But I'm curious to get your take on this. But before I wind up just letting you savage the naive interpretation of that, my impression has been that it will not be that Kubernetes has gone away. That is ridiculous. It is clearly in enough places that even if they decided to rip it out now, it would take them ten years, but rather than it's going to slip below the surface level of awareness.Once upon a time, there was a whole bunch of energy and drama and debate around the Linux virtual memory management subsystem. And today, there's, like, a dozen people on the planet who really have to care about that, but for the rest of us, it doesn't matter anymore. We are so far past having to care about that having any meaningful impact in our day-to-day work that it's just, it's the part of the iceberg that's below the waterline. I think that's where Kubernetes is heading. Do you agree or disagree? And what do you think about the timeline?Eswar: I agree with you; that's a perfect analogy. It's going to go the way of Linux, right? It's here to stay; it just going to get abstracted out if any of the abstraction efforts are going to stick around. And that's where we're testing the waters there. There are many, many open-source initiatives there trying to abstract Kubernetes. All of these are yet to gain ground, but there's some reasonable efforts being made.And if they are successful, they just end up being a layer on top of Kubernetes. Many of the customers, many of the developers, don't have to worry about Kubernetes at that point, but a certain subset of us in the tech world will need to do a deal with Kubernetes, and most likely teams like mine that end up managing and operating their Kubernetes clusters.Corey: So, one last question I have for you is that if there's one thing that AWS loves, it's misspelling things. And you have an open-source offering called Karpenter spelled with a K that is an extending of that tradition. What does Karpenter do and why would someone use it?Eswar: Thank you for that. Karpenter is one of my favorite launches in the last one year.Corey: Presumably because you're terrible at the spelling bee back when you were a kid. But please tell me more.Eswar: [laugh]. So Karpenter, is an open-source flexible and high performance cluster auto-scaling solution. So basically, when your cluster needs more capacity to support your workloads, Karpenter automatically scales the capacity as needed. For people that know the Kubernetes space well, there's an existing component called Cluster Autoscaler that fills this space today. And it's our take on okay, so what if we could reimagine the capacity management solution available in Kubernetes? And can we do something better? Especially for cases where we expect terrific performance at scale to enable cost efficiency and optimization use cases for our customers, and most importantly, provide a way for customers not to pre-plan a lot of capacity to begin with.Corey: This is something we see a lot, in the sense of very bursty workloads where, okay, you're going to steady state load. Cool. Buy a bunch of savings plans, get things set up the way you want them, and call it a day. But when it's bursty, there are challenges with it. Folks love using Spot, but in the event of a sudden capacity shortfall, the question is, is can we spin up capacity to backfill it within those two minutes that we have a warning on that on? And if the answer is no, then it becomes a bit of a non-starter.Customers have had to build an awful lot of those things around EC2 instances that handle a lot of that logic for them in ways that are tuned specifically for their use cases. I'm encouraged to see there's a Kubernetes story around this that starts to remove some of that challenge from the customer side.Eswar: Yeah. So, the burstiness is where complexity comes [here 00:29:42], right? Like many customers for steady state, they know what their capacity requirements are, they set up the capacity, they can also reason out what is the effective capacity needed for good utilization for economical reasons and they can actually pre plan that and set it up. But once burstiness comes in, which inevitably does it at [unintelligible 00:30:05] applications, customers worry about, “Okay, am I going to get the capacity that I need in time that I need to be able to service my customers? And am I confident at it?”If I'm not confident, I'm going to actually allocate capacity beforehand, assuming that I'm going to actually get the burst that I needed. Which means, you're paying for resources that you're not using at the moment. And the burstiness might happen and then you're on the hook to actually reduce the capacity for it once the peak subsides at the end of the [day 00:30:36]. And this is a challenging situation. And this is one of the use cases that we targeted Karpenter towards.Corey: I find that the idea that you're open-sourcing this is fascinating because of two reasons. One, it does show a willingness to engage with the community that… again, it's difficult. When you're a big company, people love to wind up taking issue with almost anything that you do. But for another, it also puts it out in the open, on some level, where, especially when you're talking about cost optimization and decisions that affect cost, it's all out in public. So, people can look at this and think, “Wait a minute, it's not—what is this line of code that means if it's toward the end of the month, crank it up because we might need to hit our numbers.” Like, there's nothing like that in there. At least I'm assuming. I'm trusting that other people have read this code because honestly, that seems like a job for people who are better at that than I am. But that does tend to breed a certain element of trust.Eswar: Right. It's one of the first things that we thought about when we said okay, so we have some ideas here to actually improve the capacity management solution for Kubernetes. Okay, should we do it out in the open? And the answer was a resounding yes, right? I think there's a good story here that actually enables not just AWS to offer these ideas out there, right, and we want to bring it to all sorts of Kubernetes customers.And one of the first things we did is to architecturally figure out all the core business logic of Karpenter, which is, okay, how to schedule better, how quickly to scale, what is the best instance types to pick for this workload. All of that business logic was abstracted out from the actual cloud provider implementation. And the cloud provider implementation is super simple. It's just creating instances, deleting instances, and describing instances. And it's something that we bake from the get-go so it's easier for other cloud providers to come in and to add their support to it. And we as a community actually can take these ideas forward in a much faster way than just AWS doing it.Corey: I really want to thank you for taking the time to speak with me today about all these things. If people want to learn more, where's the best place for them to find you?Eswar: The best place to learn about EKS, right, as EKS evolves, is using our documentation, we have an EKS newsletter that you can go subscribe, and you can also find us on GitHub where we share our product roadmap. So, it's a great places to learn about how EKS is evolving and also sharing your feedback.Corey: Which is always great to hear, as opposed to, you know, in the AWS Console, where we live, waiting for you to stumble upon us, which, yeah. No it's good does have a lot of different places for people to engage with you. And we'll put links to that, of course, in the [show notes 00:33:17]. Thank you so much for being so generous with your time. I appreciate it.Eswar: Corey, really appreciate you having me.Corey: Eswar Bala, Director of Engineering for Amazon EKS. I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice telling me why, when it comes to tracking Kubernetes costs, Microsoft Excel is in fact the superior experience.Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.
John Mille, Principal Cloud Engineer at Sainsbury's UK joins Corey on Screaming in the Cloud to discuss how retail companies are using cloud services. John describes the lessons he's learned since joining the Sainsbury's UK team, including why it's important to share knowledge across your team if you don't want to be on call 24/7, as well as why he doesn't subscribe to the idea that every developer needs access to production. Corey and John also discuss an open-source project John created called ECS Compose-X.About JohnJohn is an AWS Community Builder (devtools), Open Source enthusiast, SysAdmin born in the cloud, and has worked with AWS since his very first job. He enjoys writing code and creating projects. John likes to focus on automation & architecture that delivers business value, and has been dabbling with data & the wonderful world of Kafka for the past 3 years.Links Referenced: AWS Open-Source Roundup newsletter blog post about ECS Compose-X: https://aws.amazon.com/blogs/opensource/automating-your-ecs-container-architecture-deployments-with-ecs-composex/ ECS Compose-X: https://docs.compose-x.io/ LinkedIn: https://www.linkedin.com/in/john-mille/ Twitter: https://twitter.com/JohnPre32286850 TranscriptAnnouncer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.Corey: It's easy to **BEEP** up on AWS. Especially when you're managing your cloud environment on your own!Mission Cloud un **BEEP**s your apps and servers. Whatever you need in AWS, we can do it. Head to missioncloud.com for the AWS expertise you need. Corey: Do you wish your developers had less permanent access to AWS? Has the complexity of Amazon's reference architecture for temporary elevated access caused you to sob uncontrollably? With Sym, you can protect your cloud infrastructure with customizable, just-in-time access workflows that can be setup in minutes. By automating the access request lifecycle, Sym helps you reduce the scope of default access while keeping your developers moving quickly. Say goodbye to your cloud access woes with Sym. Go to symops.com/corey to learn more. That's S-Y-M-O-P-S.com/coreyCorey: Welcome to Screaming in the Cloud. I'm Corey Quinn. Today my guest is a long-time listener, first-time caller. John Mille is a Principal Cloud Engineer at Sainsbury's, which is UK-speak for ‘grocery store.' John, thank you for joining me.John: Hi, Corey. Thanks for having me.Corey: So, I have to begin with, I guess, the big question that I used to run into people in San Francisco with all the time. They would work at Walmart Labs and they would mention in conversation that they work at Walmart, and people who weren't aware that there was a labs out here figured they were a greeter at the grocery store. Do you ever wind up with people making that sort of fundamental assumption around the fact, oh, you work at Sainsbury's as a checker or whatnot?John: No. But it actually is one of the—if you look at one of the job descriptions from Sainsbury's, the first thing is, why would you join a retail company to do tech? And as it turns out, tech—I mean, I think retail companies, as any other companies in the world, rely on Cloud more and more and more. And I think that one of the things that is interesting today is, if you look at the landscape of retailers, I've heard many times people saying, “We don't want to go for AWS because we're giving money to the competition.” And actually, I think AWS does a fantastic job overall giving you all the tools to actually beat them as your competition. And as it turns out, we've had really, really great success running a lot of our workloads on AWS for many, many years now.Corey: On some level, if you can't come to terms with the idea of Amazon as competition, you shouldn't be using AWS, regardless of what industry you're in, because their entire company strategy is yes. It's very hard to start to even come up with industries that they don't have some form of presence within. On some level, that's a problem. In fact a lot of levels, that's something of a problem.Everyone tends to wind up viewing the world in a bunch of different ways. I like to divide companies into two groups. More or less it's, is the AWS bill one of the top three line items at the company? And if the answer's no, on some level, you know, that usually is an indicator that there's a sustainable business there that, you know, both our grandparents and our grandchildren will be able to recognize, in the fullness of time. You absolutely have a business that winds up falling into that category, whereas, “Oh yeah, I fix the AWS bill,” yeah, my parents would have no idea what I do and my kids don't have much of a better one. It feels like it's very point-in-time type of problem. At least I hope.Technology is not the core of what grocery stores tend to do, but I also don't get the sense that what you're doing is sitting there doing the back office corporate IT style of work, either. How do you use technology in the overall context of the business?John: Well, so we use it in a very wide variety of sense. So, you obviously have everything that has to do with online shopping, orders and all of those sort of things, which obviously, especially with the drive of Covid and being everybody from home, has been a huge driver to improve our ability to deliver to customers. But certainly, I think that Sainsbury's sees AWS as a key partner to be able to go and say we want to deliver more value. And so, there's been a number of transformation over the years to—and one of the reasons I was hired is actually to be part of one of those transformation, where we're going to take existing infrastructure servers that literally—I usually say to people, “Oh, are we doing an upgrade this month? Has somebody gotten their little brush to go and brush onto the hard drives to make sure that nothing is going to die?” And actually do that transformation and move over to the cloud in order to never have to really worry about whether or not they have to manage hardware and infrastructure.Corey: It's strange in that I never got very deep into containers until I was no longer hands-on hardware, managing things. I was more or less doing advisory work and then messing around with them. And you'd think given my proclivities historically, of being very unlucky when it comes to data, you would think that this would be great because, oh yeah, you blow away an ephemeral container? Well, that's kind of the point. We'll all laugh and it'll re-instantiate itself and life goes on.But no. Making fun of them was more or less how I tended to do approach them for the longest time until I started to see them a little bit… well I guess less as a culture, less as a religion, and more as an incredibly versatile packaging format, which is probably going to annoy the people I know who are the packaging [unintelligible 00:04:58] for Linux distributions. How do you tend to view them? And how did you start using them?John: Right. So, that's a great question. So historically, I was a student at, I think the school were one of the original creators of Docker were. And one of the things that you learn when you do development at the school is that, you know, containers [unintelligible 00:05:18] new invention. Docker, I think, came on the platform as the way to, you know, give everybody a great framework, a great API, to drive the deployment of containers in the world and bundle them and ship them around the world, on your laptop and somebody else's, and help a little bit with, you know, solving the problem of it works on my laptop, but not just on the laptop properly. Maybe.It's obviously gone viral over the years and I really enjoy containers; I quite like containers. What I find interesting is what people are going to do with. And I think that over the last few years, we've seen a number of technologies such as Kubernetes and others come into the scene and say—and trying to solve people's problem, but everybody seems to be doing, sort of, things on their own way. And historically, I started off using ECS, when it was terrible and you didn't have security groups per containers and all of this. But over the years, you know, you learn, and AWS has improved the service quite significantly with more and more features.And I think we are today in the place where there's this landscape, I think, where a lot of workloads are going to be extremely ephemeral and you can go [unintelligible 00:06:28], you know, wherever you want and you have a bit—if you have a platform or workflow that you need to have working in different places, maybe Kubernetes could be an easy way to have a different sort of sets of features that allows you to move around in maybe an easier way. But that also comes with a set of drawbacks. Again, I look at using EKS, for example, and I see okay, I have to manage IAM in our back now, whereas if I used something like ECS, for the whatever the [unintelligible 00:06:56] cloud vendor of choice, I don't have to deal with any of this. So, I think it's finding the fine balance between how you do orchestration of containers now and what works for you and is any sustainable over the time, more than about are you going to use containers? Because the chances are, somebody is using containers.Corey: My experiences and workflows and constraints are radically different than that of other folks because for a lot of the things I'm building, these are accounts that are I'm the only person that has access to them. It is me. So, the idea of fine-grained permissions for users from an ARBAC perspective doesn't really factor into it. Yes, yes, in theory, I should have a lot of the systems themselves with incidents roles being managed in safe and secure ways, but in many cases, the AWS account boundary is sufficient for that, depending on what it is we're talking about. But that changes when you start having a small team of people working with you and having to collaborate on these things.And we do a little bit of that with some of our consulting stuff that isn't just the shitpost stuff I build for fun. But there's multiple levels beyond that. You are clearly in a full-blown enterprise at this point where there are a bunch of different teams working on different things, all ideally going in the same direction. And it's easy to get stuck in the weeds of having to either go through central IT for these things, which gives rise to shadow IT every time you find a corporate credit card in the wild, or it winds up being everyone can do what they want, but then there's no consensus, there's no control, there's no architectural similarity. And I'm not sure which path is worse in some respects. How do you land on it?John: Right. So, what I've seen done in companies that works very well—and again, to the credit of my current company—is one of the things they've done really well is build a hub of people who are going to manage solely everything that has to do with accounts access, right? So, the control, IAM, Security Hub, all of those sorts of things, for you. There's things that are mandatory that you can't deal without, you have permissions boundary, that's it, you have to use those things, end of story. But beyond that point, once you have access to your accounts, you've been given all of the access that is necessary for you to deliver application and deploy them all the way up to production without asking permission for anybody else apart from your delivery managers, potentially.And I think from there, because there is the room to do all of this, one of the things that we've done within my business unit is that we've put in place a framework that enables developers—and when I say that it really is a question of allowing them to do everything they have to do, focus on the code, and I know it's a little catchy [unintelligible 00:09:33] a phrase that you hear these days, but the developers really are the customers that we have. And all that we do is to try to make sure that they have a framework in place that allows them to do what they need and deploy the applications in a secure fashion. And the only way to do that for us was to build the tools for them that allows them to do all of that. And I honestly haven't checked a single service IAM policies in a very are longtime because I know that by providing the tools to developers, they don't have this [will 00:10:05] to go and mess with the permissions because their application suddenly doesn't have the permissions. They just know that with the automation we've providing them, the application gets the access it needs and no more.Corey: On some level, it feels like there's a story around graduated development approach where in a dev environment you can do basically whatever you want with a big asterisk next to it. That's the same asterisk, by the way, next to the AWS free tier. But as you start elevating things into higher environments, you start to see gating around things like who has access to what, security reviews, et cetera, et cetera, and ideally, by the time you wind up getting into production, almost no one should have access and that access that people do have winds up being heavily gated. That is, of course, the vision that folks have. In practice, reality is what happens instead of what we plan on. The idea of it works in theory, but not in production is of course, why I call my staging environment ‘theory.' Does that tend to resonate as far as what you've seen in the wild?John: Yeah. Very much so. And when I joined the company, and we put together our [standard 00:11:11] pipelines for developers to be able to do everything, the rule that I would give to my team—so I manage a small team of cloud engineers—the one rule I would say is, “We have access to prod because we need to provision resources, but when we're going to build the pipelines for the developers, you have to build everything in such a way that the developers will only have read-only access to the production environment, and that is only to go and see their logs.” And at least try to foster this notion that developers do not need access to production, as much as possible because that avoids people going and do something they shouldn't be doing in those production environments.Now, as the pipeline progresses and applications get deployed to production, there are some operational capabilities that people need to have, and so in that case, what we do is we try to fine-tune what do people need to do and grant those people access to the accounts so that they can perform the jobs and I don't have to be woken up at two in the morning. The developers are.Corey: One thing that I think is going to be a cause of some consternation for folks—because I didn't really think about this in any meaningful sense until I started acting as a consultant, which means you're getting three years of experience for every year that you're in the wild, just by virtue of the variety of environments you encounter—on some level, there's a reasonable expectation you can have when you're at a small, scrappy startup, that everyone involved knows where all the moving parts live. That tends to break down with scale. So, the idea of a Cloud Center of Excellence has been bandied around a lot. And personally, I hate the term because it implies the ‘Data Center of Mediocrity,' which is a little on the nose for some people at times. So, the idea of having a sort of as a centralized tiger team that has the expertise and has the ability to go on deep dives and sort of loan themselves out to different teams seems to be a compromise between nobody knows what they're doing and, every person involved should have an in-depth knowledge of the following list of disciplines.For example, most folks do not need an in-depth primer on AWS billing constructs. They need about as much information fits on an index card. Do you find that having the centralized concentration of cloud knowledge on a particular team works out or do you find that effectively doing a rotating embedding story is the better answer?John: It varies a lot, I think, because it depends on the level of curiosity of the developers quite a lot. So, I have a huge developer background. People in my team are probably more coming from ex-IT environments or this sort of operation and then it just naturally went into the cloud. And in my opinion, is fairly rare to find somebody that is actually good at doing both AWS and coding. I am by no means really, really great at coding. I code pretty much every day but I wouldn't call myself a professional developer.However, it does bring to my knowledge the fact that there are some good patterns and good practices that you can bring into building your applications in the cloud and some really bad ones. However, I think it's really down to making sure that the knowledge is here within the team. If there's a specialized team, those really need to be specialists. And I think the important thing then is to make sure that the developers and the people around you that are curious and want to ask questions know that you're available to them to share that knowledge. Because at the end of the day, if I'm the only one with the knowledge, I'm going to be the one who is always going to be on call for this or doing that and this is no responsibility that I want. I am happy with a number of responsibilities, but not to be the only person to ever do this. I want to go on holidays from time to time.So, at the end of the day, I suppose it really is up to what people want or expect out of their careers. I do a job that it was a passion for me since I was about 14 years old. And I've always been extremely curious to understand how things work, but I do draw the line that I don't write anything else than Python these days. And if you ask me to write Java, I'll probably change job in the flip of a second. But that's the end of it. But I enjoy understanding how Java things work so that I can help my developers make better choices with what services in AWS to use.Corey: On some level, it feels like there's a, I guess, lack of the same kind of socialization that startups have sort of been somewhat guided by as far as core ethos goes, where, oh whatever I'm working on, I want to reach out to other people, and, “Hey, I'm trying to solve this problem. What is it that you have been working on that's germane to this and how can we collaborate together?” It has nothing to do, incidentally, with the idea that, oh, big company people aren't friendly or are dedicated or aren't good or aren't well-connected; none of that. But there are so many people internally that you're spending your time focusing on and there's so much more internal context that doesn't necessarily map to anything outside of the company that the idea of someone off the street who just solved a particular problem in a weird way could apply to what a larger company with, you know, regulatory burdens, starts to have in mind, it becomes a little bit further afield. Do you think that that's accurate? Do you think that there's still a strong sense of enterprise community that I'm just potentially not seeing in various ways because I don't work at big companies?John: It's a very fine line to walk. So, when I joined the company, I was made aware that there's a lot of Terraform and Kubernetes, which I went [unintelligible 00:16:28] all the way with CloudFormation is yes. So, that was one of the changes I knew I would have. But I can move an open mind and when I looked around at, okay, what are the Terraform modules—because I used Terraform with anger for an entire year of suffering—and I thought, “Okay, well, maybe people have actually got to a point where they've built great modules that I can just pick up off the shelf and reuse or customize only a tiny little bit, add maybe a couple of features and that's, it move on; it's good enough for me.” But as it turns out, there is I think, a lot of the time a case where the need for standardization goes against the need for business to move on.So, I think this is where you start to see silos start to being built within the company and people do their own thing and the other ones do their own. And I think it's always a really big challenge for a large company with extremely opinionated individuals to say, “All right, we're going to standardize on this way.” And it definitely was one of the biggest challenge that I had when I joined the company because again, big communities and Terraform place, we're going to need to do something else. So, then it was the case of saying, “Hey, I don't think we need Kubernetes and I definitely don't think we need Terraform for any the things—for any of those reasons, so how about we do something a little different?”Corey: Speaking of doing things a little bit different, you were recently featured in an AWS Open-Source Roundup newsletter that was just where you, I think, came across my desk one of the first times, has specifically around an open-source project that you built: ECS Compose-X.So, I assume it's like, oh, it's like Docker Compose for ECS and also the ‘X' implies that it is extreme, just, like, you know, snack foods at the convenience store. What does it do and where'd it come from?John: Right. So, you said most of it, right? It literally is a question where you take a Docker Compose file and you want to deploy your services that you worked on and all of that together, and you want to deploy it to AWS. So, ECS Compose-X is a CLI tool very much like the Copilot. I think it was released about four months just before Copilots came out—so, sorry, I beat you to the ball there—but with the Docker Compose specification supported.And again, it was really out of I needed to find a neat way to take my services and deploy them in AWS. So, Compose-X is just a CLI tool that is going to parse your Docker Compose file and create CloudFormation templates out of it. Now, the X is not very extreme or anything like that, but it's actually coming from the [finite 00:18:59] extension fields, which is something supported in Docker Compose. And so, you can do things like x-RDS, or x-DynamoDB, which Docker Compose on your laptop will totally ignore, but ECS Compose-X however will take that into account.And what it will do is if you need a database or a DynamoDB table, for example, in your Docker Compose file, you do [x-RDS, my database, some properties, 00:19:22]—exactly the same properties as CloudFormation, actually—and then you say, “I want this service to have access to it in read-only fashion.” And what ECS Compose-X is going to do is just understand what it has to do when—meaning creating IAM policies, opening security groups, all of that stuff, and make all of that available to the containers in one way or another.Corey: It feels like it's a bit of a miss for Copilot not to do this. It feels like they wanted to go off in their own direction with the way that they viewed the world—which I get; I'm not saying there's anything inherently wrong with that. There's a reason that I point kubernetestheeasyway.com to the ECS marketing site—but there's so much stuff out there that is shipped or made available in other ways with a Docker Compose file, and the question of okay, how do I take this and run it in Fargate or something because I don't want to run it locally for whatever reason, and the answer is, “That's the neat part. You don't.”And it just becomes such a clear miss. There have been questions about this Since Copilot launched. There's a GitHub issue tracking getting support for this that was last updated in September—we are currently recording this at the end of March—it just doesn't seem to be something that's a priority. I mean, I will say the couple of times that I've used Copilot myself, it was always for greenfield experiments, never for adopting something else that already existed. And that was… it just felt like a bit of a heavy lift to me of oh, you need to know from the beginning that this is the tool you're going to use for the thing. Docker Compose is what the ecosystem has settled on a long time ago and I really am disheartened by the fact that there's no direct ECS support for it today.John: Yeah, and it was definitely a motivation for me because I knew that ECS CLI version 1 was going into the sunset, and there wasn't going to be anything supporting it. And so, I just wanted to have Docker Compose because it's familiar to developers and again, if you want to have adoption and have people use your thing, it has to be easy. And when I looked at Copilot the first time around, I was extremely excited because I thought, “Yes, thank you, Amazon for making my life easy. I don't have to maintain this project anymore and I'm going to be able to just lift and shift, move over, and be happy about it.” But when the specification for Copilot was out and I could go for the documentation, I was equally disheartened because I was like, “Okay, not for me.”And something very similar happened when they announced Proton. I was extremely excited by Proton. I opened a GitHub issue on the roadmap immediately to say, “Hey, are you going to support to have some of those things together or not?” And the fact that the Proton templates—I mean, again, it was, what, two, three years ago now—and I haven't looked at Proton since, so it was a very long time now.Corey: The beta splasher was announced in 2020 and I really haven't seen much from it since.John: Well, and I haven't done anything [unintelligible 00:22:07] with it. And literally, one of the first thing did when the project came out. Because obviously, this is an open-source project that we use in Sainsbury's, right because we deploy everything in [ECS 00:22:17] so why would I reinvent the wheel the third time? It's been done, I might as well leverage it. But every time something on it came out, I was seeing it as the way out of nobody's going to need me anymore—which is great—and that doesn't create a huge potential dependency on the company for me, oh, well, we need this to, you know, keep working.Now, it's open-source, it's on the license you can fork it and do whatever you want with it, so from that point of view, nobody's going to ask me anything in the future, but from the point of view where I need to, as much as possible, use AWS native tools, or AWS-built tools, I differently wanted every time to move over to something different. But every time I tried and tiptoed with those alternative offerings, I just went back and said, “No, this [laugh] either is too new and not mature enough yet, or my tool is just better.” Right? And one of the things I've been doing for the past three years is look at the Docker ECS plugin, all of the issues, and I see all of the feature requests that people are asking for and just do that in my project. And some with Copilots. The only thing that Copilot does that I don't do is tell people how to do CI/CD pipelines.Corey: One thing you said a second ago just sort of, I guess, sent me spiraling for a second because I distinctly remember this particular painful part. You're right, there was an ECS CLI for a long time that has since been deprecated. But we had internal tooling built around that. When there was an issue with a particular task that failed, getting logs out of it was non-trivial, so great. Here's the magic incantation that does it.I still haven't found a great way to do that with the AWS v2 CLI and that feels like it's a gap where yes, I understand, old tools go away and new ones show up, but, “Hey, I [unintelligible 00:24:05] task. Can you tell me what the logs are?” “No. Well, Copilot's the new answer.” “Okay. Can I use this to get logs from something that isn't Copilot?” “Oh, absolutely not.” And the future is inherently terrible as a direct result.John: Yeah. Well, I mean, again, the [unintelligible 00:24:20]—the only thing that ECS Compose-X does is create all the templates for you so you can, you know, then just query it and know where everything has been created. And one of the things it definitely does create is all of the log groups. Because again, least-privileged permissions being something that is very dear to me, I create the log groups and just allow the services to only write in those log groups and that's it.Now, typically this is not a thing that I've thought Compose-X was going to do because that's not its purpose. It's not going to be an operational tool to troubleshoot all the things and this is where I think that other projects are much better suited and I would rather use them as an extension or library of the project as opposed to reinvent them. So, if you're trying to find a tool for yourself to look at logs, I highly recommend something called ‘AWS logs,' which is fantastic. You just say, “Hey, can you list the groups?” “Okay.” “Can you get me the groups and can I tell them on a terminal?”And that's it. Job done. So, as much as I enjoy building new features into the project, for example, I think that there's a clear definition between what the project is for and what it's not. And what it's for is giving people CloudFormation templates they can reuse in any region and deploy their services and not necessarily deal with their operations; that's up to them. At the end of the day, it's really up to the user to know what they want to do with it. I'm not trying to force anybody into doing something specific.Corey: I would agree. I think that there's value to there's more than one way to do it. The problem is, at some point, there's a tipping point where you have this proliferation of different options to the point where you end up in this analysis paralysis model where you're too busy trying to figure out what is the next clear step. And yes, that flexibility is incredibly valuable, especially when you get into, you know, large, sophisticated enterprises—ahem, ahem—but when you're just trying to kick the tires on something new, I feel like there's a certain lack of golden path where in the event of not having an opinion on any of these things, this is what you should do just to keep things moving forward, as opposed to here are two equal options that you can check with radio boxes and it's not at all clear what you which does what or what the longer-term implications are. We've all gotten caught with the one-way doors we didn't realize we were passing through at the time and then had to do significant technical debt repayment efforts to wind up making it right again.I just wish that those questions would be called out, but everything else just, it doesn't matter. If you don't like the name of the service that you're creating, you can change it later. Or if you can't, maybe you should know now, so you don't have—in my case—a DynamoDB table that is named ‘test' running in production forever.John: Yeah. You're absolutely right. And again, I think it goes back to one of the biggest challenges that I had when I joined the company, which was when I said, “I think we should be using CloudFormation, I think we should be using ECS and Terraforming Kubernetes for those reasons.” And one of the reasons was, the people. Meaning we were a very small team, only five cloud engineers at the time.And as I joined the company, they were already was three different teams using four different CI/CD tools. And they all wanted to use Kubernetes, for example, and they were all using different CI/CD—like I said, just now—different CI/CD tools. And so, the real big challenge for me was how do I pitch that simplicity is what's going to allow us to deliver value for the business? Because at the end of the day, like you said many, many times before, the AWS bill is a question of architecture, right? And there's a link and intricacy between the two things.So, the only thing that really mattered for me and the team was to find a way, find the service that was going to allow to do a number of things, A, delivering value quickly, being supported over time. Because one of the things that I think people forget these days—well, one of the things I'm allergic to and one of the things that makes me spiral is what I call CV-driven tech choices where people say, “Hey, I love this great thing I read about and I think that we should use that in production. How great idea.” But really, I don't know anything about it and is then up to somebody else to maintain it long-term.And that goes to the other point, which is, turnover-proof is what I call it. So, making tech choices that are going to be something that people will be able to use for many, many years, there is going to be a company behind the scenes that he's going to be able to support you as well as you go and use the service for the many, many years to go.Corey: I really want to thank you for taking the time to speak with me today. If people want to learn more, where's the best place for them to find you?John: So, people can find me on LinkedIn. I'm also around on Twitter these days, although I probably about have nine followers. Well, probably shouldn't say that [laugh] and that doesn't matter.Corey: It's fine. We'll put a link into it—we'll put a link to that in the [show notes 00:29:02] and maybe we'll come up with number ten. You never know. Thanks again for your time. I really appreciate it.John: Thanks so much, Corey, for having me.Corey: John Mille, Principal Cloud Engineer at Sainsbury's. I'm Cloud Economist Corey Quinn and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice along with an angry comment that you go to great pains to type out but then fails to post because the version of the tool you use to submit it has been deprecated without a viable replacement.Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.
Michael Isbitski, Director of Cybersecurity Strategy at Sysdig, joins Corey on Screaming in the Cloud to discuss the nuances of an effective cybersecurity strategy. Michael explains that many companies are caught between creating a strategy that's truly secure and one that's merely compliant and within the bounds of cost-effectiveness, and what can be done to help balance the two aims more effectively. Corey and Michael also explore what it means to hire for transferrable skills in the realm of cybersecurity and tech, and Michael reveals that while there's no such thing as a silver-bullet solution for cybersecurity, Sysdig can help bridge many gaps in a company's strategy. About MichaelMike has researched and advised on cybersecurity for over 5 years. He's versed in cloud security, container security, Kubernetes security, API security, security testing, mobile security, application protection, and secure continuous delivery. He's guided countless organizations globally in their security initiatives and supporting their business.Prior to his research and advisory experience, Mike learned many hard lessons on the front lines of IT with over twenty years of practitioner and leadership experience focused on application security, vulnerability management, enterprise architecture, and systems engineering.Links Referenced: Sysdig: https://sysdig.com/ LinkedIn: https://www.linkedin.com/in/michael-isbitski/ TranscriptAnnouncer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.Corey: Tailscale SSH is a new, and arguably better way to SSH. Once you've enabled Tailscale SSH on your server and user devices, Tailscale takes care of the rest. So you don't need to manage, rotate, or distribute new SSH keys every time someone on your team leaves. Pretty cool, right? Tailscale gives each device in your network a node key to connect to your VPN, and uses that same key for SSH authorization and encryption. So basically you're SSHing the same way that you're already managing your network.So what's the benefit? Well, built-in key rotation, the ability to manage permissions as code, connectivity between any two devices, and reduced latency. You can even ask users to re-authenticate SSH connections for that extra bit of security to keep the compliance folks happy. Try Tailscale now - it's free forever for personal use.Corey: Do you wish your developers had less permanent access to AWS? Has the complexity of Amazon's reference architecture for temporary elevated access caused you to sob uncontrollably? With Sym, you can protect your cloud infrastructure with customizable, just-in-time access workflows that can be setup in minutes. By automating the access request lifecycle, Sym helps you reduce the scope of default access while keeping your developers moving quickly. Say goodbye to your cloud access woes with Sym. Go to symops.com/corey to learn more. That's S-Y-M-O-P-S.com/coreyCorey: Welcome to Screaming in the Cloud, I'm Corey Quinn. I periodically find myself in something of a weird spot when it comes to talking about security. I spent a lot of my time in previous lives having to care about it, but the word security was never in my job title. That's who my weekly podcast on the AWS Morning Brief and the accompanying newsletter goes out to: it's people who have to care about security but don't have it as part of their job title. They just want to know what's going on without all of the buzzwords.This promoted guest episode is brought to us by our friends at Sysdig and my guest is Mike Isbitski, Director of Cybersecurity Strategy at Sysdig. Mike, thanks for joining me.Michael: Thanks, Corey. Yeah, it's great to be here.Corey: So, you've been at Sysdig for a little bit, but your history is fascinating to me. You were at Gartner, which on the one hand would lead someone to think, “Oh okay, you talk about this stuff a lot, but might not have been particularly hands-on,” but that's not true. Either. You have a strong background as a practitioner, but not directly security-focused. Is that right?Michael: Yeah. Yeah, that is correct. I can certainly give the short version of the history lesson [laugh]. It is true, yes. As a Gartner analyst, you don't always get as hands-on, certainly talking to practitioners and leaders from all walks of life, different industries, different company sizes, and organization sizes.But yeah, as a Gartner analyst, I was in a different division that was much more technical. So, for me personally, I did actually try to tinker a lot: set up Docker, deploy Kubernetes clusters, all that fun stuff. But yeah, prior to my life, as an analyst, I was a practitioner, a security leader for close to 20 years at Verizon so, saw quite a bit. And actually started as enterprise architect building, kind of, systems and infrastructure to support all of those business needs, then I kind of transitioned over to application security towards the tail end of that career at Verizon.Corey: And one of the things that I find that I enjoy doing is talking with folks in positions like yours, the folks who did not come to the cybersecurity side of the world from a pure strategy advisory sense, but have been hands-on with these things at varying points in our careers, just because otherwise I feel like I'm sort of coming at this from a very different world. When I walk around the RSA show floor, I am consistently confronted by people trying to sell me the same dozen products over and over again with different words and different branding, but it seems like it's all buzzwords aimed from security people who are deep in the weeds to other security people who are deep in the weeds and it's just presumed that everyone knows what they're talking about already. And obviously worse. I'm not here to tell them that they're going about their business wrong, but for smaller companies, SMBs, folks who have to care about security but don't know the vernacular in the same way and don't have sophisticated security apparatus at their companies, it feels like a dense thicket of impenetrable buzzwords.Michael: Yes. Very, very fair assessment, [laugh] I would say. So, I'd say my life as an analyst was a lot of lengthy conversations. I guess a little bit of the secret behind analyst inquiry, I mean, a lot of times, they are hour-long conversations, sometimes multiple sets of them. But yeah, it's very true, right?There's a lot of nuance to how you work with technology and how you build things, but then also how you secure it, it's very hard to, kind of, condense that, you know, hours of conversation and many pages of documentation down into some bite-size nuggets that marketers might run with. So, I try to kind of live in that in-between world where I can kind of explain deep technology problems and business realities, and kind of explain that in more common language to people. Sometimes it's easier said than done when you're speaking it as opposed to writing it. But yeah, that's kind of where I tried to bring my skills and experience.Corey: It's a little counterintuitive to folks coming out from the other side, I suspect. For me, at least the hardest part of getting into the business of cloud cost optimization the way that I do with the Duckbill Group was learning to talk. Where I come from a background of heavy on the engineering and operations side, but being able to talk to business stakeholders who do not particularly care what a Kubernetes might be, is critical. You have to effectively be able to speak to different constituencies, sometimes in the same conversation, without alienating the rest of them. That was the hard part for me.Michael: Yeah, that's absolutely true and I certainly ran into that quite a bit as an enterprise architect at Verizon. There's kind of really need to work to identify, like, what is the business need. And typically, that is talking to the stakeholders, you know, what are they trying to achieve? They might not even know that, right, [laugh] because not everybody is very structured in how they think about the problem you're trying to solve. And then what is their daily workflow?And then you kind of arrive at the technology. I'd say, a common pitfall for anybody, right, Whether you're an engineer or a security practitioner is to kind of start with the technology or the solution and then try to force that on people, right? “Here's your solution to the problem that maybe you didn't know you had.” [laugh]. It kind of should work in reverse, right? What's the actual business need? What's your workflow? And what's the appropriate technology for that, right?Whether it's right-sizing the infrastructure or a particular type of functionality or protection, all those things, right? So, very similar kind of way of approaching the problem. It's just what you're trying to solve but [laugh] I've definitely seen that, kind of, Kubernetes is all the rage, right, or service mesh. Like, everybody needs to start deploying Istio, and you really should be asking the question—Corey: Oh, it's all resume-driven development.Michael: Yep, exactly. Yeah. It's kind of the new kid on the block, right? Let's push out this cool new technology and then problems be damned, right?Corey: I'm only half-kidding on that. I've talked to folks who are not running those types of things and they said that it is a bit of a drag on their being able to attract talent.Michael: Yeah, it's—you know, I mean, it's newer technologies, right, so it can be hard to find them, right, kind of unicorn status. I used to talk quite a bit in advisory calls to find DevOps practitioners that were kind of full-stack. That's tricky.Corey: I always wonder if it's possible to find them, on some level.Michael: Yeah. And it's like, well, can you find them and then when you do find them, can you afford them?Corey: Oh, yeah. What I'm seeing in these other direction, though, is people who are making, you know, sensible technology choices where you actually understand what lives were without turning it into a murder mystery where you need to hire a private investigator to track it down. Those are the companies that are having trouble hiring because it seems that an awful lot of the talent, or at least a significant subset of it, want to have the latest and greatest technologies on their resume on their next stop. Which, I'm not saying they're wrong for doing that, but it is a strange outcome that I wasn't quite predicting.Michael: Yeah. No, it is very true, I definitely see that quite a bit in tech sector. I've run into it myself, even with the amount of experience I have and skills. Yeah, companies sometimes get in a mode where they're looking for very specific skills, potentially even products or technologies, right? And that's not always the best way to go about it.If you understand concepts, right, with technology and systems engineering, that should translate, right? So, it's kind of learning the new syntax, or semantics, working with a framework or a platform or a piece of technology.Corey: One of the reasons that I started the security side of what I do on the newsletter piece, and it caught some people by surprise, but the reason I did it was because I have always found that, more or less, security and cost are closely aligned spiritually, if nothing else. They're reactive problems and they don't, in the general sense, get companies one iota closer to the business outcome they're chasing, but it's something you have to do, like buying fire insurance for the building. You can spend infinite money on those things, but it doesn't advance. It's all on the defensive, reactive side. And you tend to care about these things a lot right after you failed to care about them sufficiently. Does that track at all from your experience?Michael: Yeah. Yeah, absolutely. I'm just kind of flashing back to some war stories at Verizon, right? It was… I'd say very common that, once you've kind of addressed, well, these are the business problems we want to solve for and we're off to the races, right, we're going to build this cool thing. And then you deploy it, right [laugh], and then you forgot to account for backup, right? What's your disaster recovery plan? Do you have logging in place? Are you monitoring the thing effectively? Are your access controls accounted for?All those, kind of, tangential processes, but super-critical, right, when you think about, kind of, production systems, like, they have to be in place. So, it's absolutely true, right, and it's kind of definitely for just general availability, you need to be thinking about these things. And yeah, they almost always translate to that security piece of it as well, right, particularly with all the regulations that organizations are impacted with today. You really need to be thinking about, kind of, all these pieces of the puzzle, not just hey, let's build this thing and get it on running infrastructure and we're done with our work.Corey: A question that I've got for you—because I'm seeing a very definite pattern emerging tied to the overall macro environment, now, where after a ten-year bull run, suddenly a bunch of companies are discovering, holy crap, money means something again, where instead of being able to go out and gets infinite money, more or less, to throw at an AWS bill, suddenly, oh, that's a big number, and we have no idea what's in it. We should care about that. So, almost overnight, we've seen people suddenly caring about their bill. How are you seeing security over the past year or so? Has there been a similar awareness around that or has that not really been tied to the overall macro-cycle?Michael: Very good question, yeah. So unfortunately, security's often an afterthought, right, just like, kind of those things that support availability—probably going to get a little bit better ranking because it's going to support your customers and employees, so you're going to get budget and headcount to support that. Security, usually in the pecking order, is below that, right, which is unfortunate because [laugh] there can be severe repercussions with that, such as privacy impacts, or data breach, right, lost revenue, all kinds of things. But yeah, typically, security has been undercut, right? You're always seeking headcount, you need more budget.So, security teams tend to look to delegate security process out, right? So, you kind of see a lot of DevOps programs, like, can we empower engineers to run some of these processes and tooling, and then security, kind of, becomes the overseer. So, we see a lot of that where can we kind of have people satisfy some of these pieces. But then with respect to, like, security budgets, it is often security tools consolidation because a lot organizations tend to have a lot of things, right? So, security leaders are looking to scale that back, right, so they can work more effectively, but then also cut costs, which is definitely true these days in the current macroeconomic environment.Corey: I'm curious as well, to see what your take is on the interplay between cost and security. And what I mean by that is, I did the numbers once, and if you were to go into an AWS native environment, ignore third-party vendors for a second, just configure all of the AWS security services in your account, so the way that best practices dictate that you should, you're pretty quickly going to end up in a scenario where the cost of that outweighs that of the data breach that you're ostensibly trying to prevent. So—Michael: Yes.Corey: It's an infinite money pit that you can just throw everything into. So, people care about security, but they also care about cost. Plus, let's be very direct here, you can spend all the money on security and still lose. How do companies think about that now?Michael: A lot of leaders will struggle with, are we trying to be compliant or are we trying to be secure? Because those can be very different conversations and solutions to the problem. I mean, ideally, everybody would pursue that perfect model of security, right, enable all the things, but that's not necessarily cost-effective to do that. And so, most organizations and most security teams are going to prioritize their risks, right? So, they'll start to carve out, maybe these are all our internet-facing applications, these are the business-critical ones, so we're going to allocate more security focus to them and security spend, so [maybe we will be turn up 00:13:20] more security services to protect those things and monitor them.Then [laugh], unfortunately, you can end up with a glut of maybe internal applications or non-critical things that just don't get that TLC from security, unfortunately, for security teams, but fortunate for attackers, those things become attack targets, right? So, they don't necessarily care how you've prioritized your controls or your risk. They're going to go for the low-hanging fruit. So, security teams have always struggled with that, but it's very true. Like, in a cloud environment like AWS, yeah, if you start turning everything up, be prepared for a very, very costly cloud expense bill.Corey: Yeah, in my spare time, I'm working on a project that I was originally going to open-source, but I realized if I did it, it would cause nothing but pain and drama for everyone, of enabling a whole bunch of AWS misconfiguration options, given a set of arbitrary credentials, that just effectively try to get the high score on the bill. And it turned out that my early tests were way more successful than anticipated, and instead, I'm just basically treating it as a security vulnerability reporting exercise, just because people don't think about this in quite the same way. And again, it's not that these tools are necessarily overpriced; it's not that they aren't delivering value. It's that in many cases, it is unexpectedly expensive when they bill across dimensions that people are not aware of. And it's one of those everyone's aware of that trap the second time type of situations.It's a hard problem. And I don't know that there's a great way to answer it. I don't think that AWS is doing anything untoward here; I don't think that they're being intentionally malicious around these things, but it's very vast, very complex, and nobody sees all of it.Michael: Very good point, yes. Kind of, cloud complexity and ephemeral nature of cloud resources, but also the cost, right? Like, AWS isn't in the business of providing free service, right? Really, no cloud provider is. They are a business, right, so they want to make money on Cloud consumption.And it's interesting, I remember, like, the first time I started exploring Kubernetes, I did deploy clusters in cloud providers, so you can kind of tinker and see how these things work, right, and they give you some free credits, [a month of credit 00:15:30], to kind of work with this stuff. And, you know, if you spin up a [laugh] Kubernetes cluster with very bare bones, you're going to chew through that probably within a day, right? There's a lot of services in it. And that's even with defaults, which includes things like minimal, if anything, with respect to logging. Which is a problem, right, because then you're going to miss general troubleshooting events, but also actual security events.So, it's not necessarily something that AWS could solve for by turning everything up, right, because they are going to start giving away services. Although I'm starting to see some tide shifts with respect to cybersecurity. The Biden administration just released their cybersecurity strategy that talks about some of this, right? Like, should cloud providers start assuming more of the responsibility and accountability, potentially just turning up logging services? Like, why should those be additional cost to customers, right, because that's really critical to even support basic monitoring and security monitoring so you can report incidents and breaches.Corey: When you look across what customers are doing, you have a different problem than I do. I go in and I say, “Oh, I fixed the horrifying AWS bill.” And then I stop talking and I wait. Because if people [unintelligible 00:16:44] to that, “Ooh, that's a problem for us,” great. We're having a conversation.If they don't, then there's no opportunity for my consulting over in that part of the world. I don't have to sit down and explain to people why their bill is too high or why they wouldn't want it to be they intrinsically know and understand it or they're honestly not fit to be in business if they can't make a strategic evaluation of whether or not their bill is too high for what they're doing. Security is very different, especially given how vast it is and how unbounded the problem space is, relatively speaking. You have to first educate customers in some ways before attempting to sell them something. How do you do that without, I guess, drifting into the world of FUD where, “Here are all the terrible things that could happen. The solution is to pay me.” Which in many cases is honest, but people have an aversion to it.Michael: Yeah. So, that's how I feel [laugh] a lot of my days here at Sysdig. So, I do try to explain, kind of, these problems in general terms as opposed to just how Sysdig can help you solve for it. But you know, in reality, it is larger strategic challenges, right, there's not necessarily going to be one tool that's going to solve all your problems, the silver bullet, right, it's always true. Yes, Sysdig has a platform that can address a lot of cloud security-type issues, like over-permissioning or telling you what are the actual exploitable workloads in your environment, but that's not necessarily going to help you with, you know, if you have a regulator breathing down your neck and wants to know about an incident, how do you actually relay that information to them, right?It's really just going to help surface event data, stitch things together, that now you have to carry that over to that person or figure out within your organization who's handling that. So, there is kind of this larger piece of, you know, governance risk and compliance, and security tooling helps inform a lot of that, but yeah, every organization is, kind of, have to answer to [laugh] those authorities, often within their own organization, but it could also be government authorities.Corey: Part of the challenge as well is that there's—part of it is tooling, absolutely, but an awful lot of it is a people problem where you have these companies in the security space talking about a variety of advanced threats, of deeply sophisticated attackers that are doing incredibly arcane stuff, and then you have the CEO yelling about what they're doing on a phone call in the airport lounge and their password—which is ‘kitty' by the way—is on a Post-It note on their laptop for everyone to see. It feels like it's one of those, get the basic stuff taken care of first, before going down the path to increasingly arcane attacks. There's an awful lot of vectors to wind up attacking an infrastructure, but so much of what we see from data breaches is simply people not securing S3 buckets, as a common example. It's one of those crawl, walk, run types of stories. For what you do, is there a certain level of sophistication that companies need to get to before what you offer starts to bear fruit?Michael: Very good question, right, and I'd start with… right, there's certainly an element of truth that we're lagging behind on some of the security basics, right, or good security hygiene. But it's not as simple as, like, well, you picked a bad password or you left the port exposed, you know? I think certainly security practitioners know this, I'd even put forth that a lot of engineers know it, particularly if they're been trained more recently. There's been a lot of work to promote security awareness, so we know that we should provide IDs and passwords of sufficient strength, don't expose things you shouldn't be doing. But what tends to happen is, like, as you build monitoring systems, they're just extremely complex and distributed.Not to go down the weeds with app designs, with microservices architecture patterns, and containerized architectures, but that is what happens, right, because the days of building some heavyweight system in the confines of a data center in your organization, those things still do happen, but that's not typically how new systems are being architected. So, a lot of the old problems still linger, there's just many more instances of it and it's highly distributed. So it, kind of, the—the problem becomes very amplified very quickly.Corey: That's, I think, on some level, part of the challenge. It's worse in some ways that even the monitoring and observability space where, “All right, we have 15 tools that we're using right now. Why should we talk to yours?” And the answer is often, “Because we want to be number 16.” It's one of those stories where it winds up just adding incremental cost. And by cost, I don't just mean money; I mean complexity on top of these things. So, you folks are, of course, sponsoring this episode, so the least I can do is ask you, where do you folks start and stop? Sysdig: you do a lot of stuff. What's the sweet spot?Michael: Yeah, I mean, there's a few, right, because it is a larger platform. So, I often talk in terms of full lifecycle security, right? And a lot of organizations will split their approaches. We'll talk about shift left, which is really, let's focus very heavily on secure design, let's test all the code and all the artifacts prior to delivering that thing, try to knock out all quality issues, right, for kind of that general IT, but also security problems, which really should be tracked as quality issues, but including those things like vulnerabilities and misconfigs. So, Sysdig absolutely provides that capability that to satisfy that shift left approach.And Sysdig also focuses very heavily on runtime security or the shield right side of the equation. And that's, you know, give me those capabilities that allow me to monitor all types of workloads, whether they're virtual machines, or containers, serverless abstractions like Fargate because I need to know what's going on everywhere. In the event that there is a potential security incident or breach, I need all that information so I can actually know what happened or report that to a regulatory authority.And that's easier said than done, right? Because when you think about containerized environments, they are very ephemeral. A container might spin up a tear down within minutes, right, and if you're not thinking about your forensics and incident response processes, that data is going to be lost [unintelligible 00:23:10] [laugh]. You're kind of shooting yourself in the foot that way. So yeah, Sysdig kind of provides that platform to give you that full range of capabilities throughout the lifecycle.Corey: I think that that is something that is not fully understood in a lot of cases. I remember a very early Sysdig, I don't know if it was a demo or what exactly it was, I remember was the old Heavybit space in San Francisco, where they came out, it was, I believe, based on an open-source project and it was still taking the perspective, isn't this neat? It gives you really in-depth insight into almost a system-call level of what it is the system is doing. “Cool. So, what's the value proposition for this?”It's like, “Well, step one, be an incredibly gifted engineer when it comes to systems internals.” It's like, “Okay, I'll be back in five years. What's step two?” It's like, “We'll figure it out then.” Now, the story has gone up the stack. It originally felt a little bit like it was a solution in search of a problem. Now, I think you have found that problem, you have clearly hit product-market fit. I see you folks in the wild in many of my customer engagements. You are doing something very right. But it was neat watching, like, it's almost for me, I turned around, took my eye off the ball for a few seconds and it went from, “We have no idea of what we're doing” to, “We know exactly what we're doing.” Nice work.Michael: Yeah. Yeah. Thanks, Corey. Yeah, and there's quite a history with Sysdig in the open-source community. So, one of our co-founders, Loris Degioanni, was one of the creators of Wireshark, which some of your listeners may be familiar with.So, Wireshark was a great network traffic inspection and observability tool. It certainly could be used by, you know, just engineers, but also security practitioners. So, I actually used it quite a bit in my days when I would do pen tests. So, a lot of that design philosophy carried over to the Sysdig open source. So, you're absolutely correct.Sysdig open source is all about gathering that sys-call data on what is happening at that low level. But it's just one piece of the puzzle, exactly as you described. The other big piece of open-source that Sysdig does provide is Falco, which is kind of a threat detection and response engine that can act on all of those signals to tell you, well, what is actually happening is this potentially a malicious event? Is somebody trying to compromise the container runtime? Are they trying to launch a suspicious process? So that those pieces are there under the hood, right, and then Sysdig Secure is, kind of, the larger platform of capabilities that provide a lot of the workflow, nice visualizations, all those things you kind of need to operate at scale when you're supporting your systems and security.Corey: One thing that I do find somewhat interesting is there's always an evolution as companies wind up stumbling through the product lifecycle, where originally it starts off as we have an idea around one specific thing. And that's great. And for you folks, it feels like it was security. Then it started changing a little bit, where okay, now we're going to start doing different things. And I am very happy with the fact right now that when I look at your site, you have two offerings and not two dozen, like a number of other companies tend to. You do Sysdig Secure, which is around the security side of the world, and Sysdig Monitor, which is around the observability side of the world. How did that come to be?Michael: Yeah, it's a really good point, right, and it's kind of in the vendor space [laugh], there's also, like, chasing the acronyms. And [audio break 00:26:41] full disclosure, we are guilty of that at times, right, because sometimes practitioners and buyers seek those things. So, you have to kind of say, yeah, we checked that box for CSPM or CWPP. But yeah, it's kind of talking more generally to organizations and how they operate their businesses, like, that's more well-known constructs, right? I need to monitor this thing or I need to get some security. So, lumping into those buckets helps that way, right, and then you turn on those capabilities you need to support your environment, right?Because you might not be going full-bore into a containerized environment, and maybe you're focusing specifically on the runtime pieces and you're going to, kind of, circle back on security testing in your build pipeline. So, you're only going to use some of those features at the moment. So, it is kind of that platform approach to addressing that problem.Corey: Oh, I would agree. I think that one of the challenges I still have around the observability space—which let's remind people, is hipster monitoring; I don't care what other people say. That's what it is—is that it is depressingly tied to a bunch of other things. To this day, the only place to get a holistic view of everything in your AWS account in every region is the bill. That somehow has become an observability tool. And that's ridiculous.On the other side of it, I have had several engagements that inadvertently went from, “We're going to help optimize your cost,” to, “Yay. We found security incidents.” I don't love a lot of these crossover episodes we wind up seeing, but it is the nature of reality where security, observability, and yes, costs all seem to tie together to some sort of unholy triumvirate. So, I guess the big question is when does Sysdig launch a cost product?Michael: Well, we do have one [laugh], specifically for—Corey: [laugh]. Oh, events once again outpace me.Michael: [laugh]. But yeah, I mean, you touched on this a few times in our discussion today, right? There's heavy intersections, right, and the telemetry you need to gather, right, or the log data you need to gather to inform monitoring use cases or security use cases, a lot of the times that telemetry is the same set of data, it's just you're using it for different purposes. So, we actually see this quite commonly where Sysdig customers might pursue, Monitor or Secure, and then they actually find that there's a lot of value-add to look at the other pieces.And it goes both ways, right? They might start with the security use cases and then they find, well, we've over-allocated on our container environments and we're over-provisioning in Kubernetes resources, so all right, that's cool. We can actually reduce costs that could help create more funding to secure more hosts or more workloads in an environment, right? So it's, kind of, show me the things I'm doing wrong on this side of the equation, whether that's general IT security problems and then benefit the other. And yeah, typically we find that because things are so complex, yeah, you're over-permissioning you're over-allocating, it's just very common, rights? Kubernetes, as amazing as it can be or is, it's really difficult to operate that in practice, right? Things can go off the rails very, very quickly.Corey: I really want to thank you for taking time to speak about how you see the industry and the world. If people want to learn more, where's the best place for them to find you?Michael: Yes, thanks, Corey. It's really been great to be here and talk with you about these topics. So, for me personally, you know, I try to visit LinkedIn pretty regularly. Probably not daily but, you know, at least once a week, so please, by all means, if you ever have questions, do contact me. I love talking about this stuff.But then also on Sysdig, sysdig.com, I do author content on there. I speak regularly in all types of event formats. So yeah, you'll find me out there. I have a pretty unique last name. And yeah, that's kind of it. That's the, I'd say the main sources for me at the moment. Don't fall for the other Isbitski; that's actually my brother, who does work for AWS.Corey: [laugh]. That's okay. There's no accounting for family, sometimes.Michael: [laugh].Corey: I kid, I kid. Okay, great company. Great work. Thank you so much for your time. I appreciate it.Michael: Thank you, Corey.Corey: Mike Isbitski, Director of Cybersecurity Strategy at Sysdig. I'm Cloud Economist Corey Quinn and this has been a promoted guest episode brought to us by our friends at Sysdig. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice along with an angry, insulting comment from your place, which is no doubt expensive, opaque, and insecure, hitting all three points of that triumvirate.Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.
Jean Yang, CEO of Akita Software, joins Corey on Screaming in the Cloud to discuss how she went from academia to tech founder, and what her company is doing to improve monitoring and observability. Jean explains why Akita is different from other observability & monitoring solutions, and how it bridges the gap from what people know they should be doing and what they actually do in practice. Corey and Jean explore why the monitoring and observability space has been so broken, and why it's important for people to see monitoring as a chore and not a hobby. Jean also reveals how she took a leap from being an academic professor to founding a tech start-up. About JeanJean Yang is the founder and CEO of Akita Software, providing the fastest time-to-value for API monitoring. Jean was previously a tenure-track professor in Computer Science at Carnegie Mellon University.Links Referenced: Akita Software: https://www.akitasoftware.com/ Aki the dog chatbot: https://www.akitasoftware.com/blog-posts/we-built-an-exceedingly-polite-ai-dog-that-answers-questions-about-your-apis Twitter: https://twitter.com/jeanqasaur TranscriptAnnouncer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. My guest today is someone whose company has… well, let's just say that it has piqued my interest. Jean Yang is the CEO of Akita Software and not only is it named after a breed of dog, which frankly, Amazon service namers could take a lot of lessons from, but it also tends to approach observability slash monitoring from a perspective of solving the problem rather than preaching a new orthodoxy. Jean, thank you for joining me.Jean: Thank you for having me. Very excited.Corey: In the world that we tend to operate in, there are so many different observability tools, and as best I can determine observability is hipster monitoring. Well, if we call it monitoring, we can't charge you quite as much money for it. And whenever you go into any environment of significant scale, we pretty quickly discover that, “What monitoring tool are you using?” The answer is, “Here are the 15 that we use.” Then you talk to other monitoring and observability companies and ask them which ones of those they've replace, and the answer becomes, “We're number 16.” Which is less compelling of a pitch than you might expect. What does Akita do? Where do you folks start and stop?Jean: We want to be—at Akita—your first stop for monitoring and we want to be all of the monitoring, you need up to a certain level. And here's the motivation. So, we've talked with hundreds, if not thousands, of software teams over the last few years and what we found is there is such a gap between best practice, what people think everybody else is doing, what people are talking about at conferences, and what's actually happening in software teams. And so, what software teams have told me over and over again, is, hey, we either don't actually use very many tools at all, or we use 15 tools in name, but it's you know, one [laugh] one person on the team set this one up, it's monitoring one of our endpoints, we don't even know which one sometimes. Who knows what the thresholds are really supposed to be. We got too many alerts one day, we turned it off.But there's very much a gap between what people are saying they're supposed to do, what people in their heads say they're going to do next quarter or the quarter after that and what's really happening in practice. And what we saw was teams are falling more and more into monitoring debt. And so effectively, their customers are becoming their monitoring and it's getting harder to catch up. And so, what Akita does is we're the fastest, easiest way for teams to quickly see what endpoints you have in your system—so that's API endpoints—what's slow and what's throwing errors. And you might wonder, okay, wait, wait, wait, Jean. Monitoring is usually about, like, logs, metrics, and traces. I'm not used to hearing about API—like, what do APIs have to do with any of it?And my view is, look, we want the most simple form of what might be wrong with your system, we want a developer to be able to get started without having to change any code, make any annotations, drop in any libraries. APIs are something you can watch from the outside of a system. And when it comes to which alerts actually matter, where do you want errors to be alerts, where do you want thresholds to really matter, my view is, look, the places where your system interfaces with another system are probably where you want to start if you've really gotten nothing. And so, Akita view is, we're going to start from the outside in on this monitoring. We're turning a lot of the views on monitoring and observability on its head and we just want to be the tool that you reach for if you've got nothing, it's middle of the night, you have alerts on some endpoint, and you don't want to spend a few hours or weeks setting up some other tool. And we also want to be able to grow with you up until you need that power tool that many of the existing solutions out there are today.Corey: It feels like monitoring is very often one of those reactive things. I come from the infrastructure world, so you start off with, “What do you use for monitoring?” “Oh, we wait till the help desk calls us and users are reporting a problem.” Okay, that gets you somewhere. And then it becomes oh, well, what was wrong that time? The drive filled up. Okay, so we're going to build checks in that tell us when the drives are filling up.And you wind up trying to enumerate all of the different badness. And as a result, if you leave that to its logical conclusion, one of the stories that I heard out of MySpace once upon a time—which dates me somewhat—is that you would have a shift, so there were three shifts working around the clock, and each one would open about 5000 tickets, give or take, for the monitoring alerts that wound up firing off throughout their infrastructure. At that point, it's almost, why bother? Because no one is going to be around to triage these things; no one is going to see any of the signal buried and all of that noise. When you talk about doing this for an API perspective, are you running synthetics against those APIs? Are you shimming them in order to see what's passing through them? What's the implementation side look like?Jean: Yeah, that's a great question. So, we're using a technology called BPF, Berkeley Packet Filter. The more trendy, buzzy term is EBPF—Corey: The EBPF. Oh yes.Jean: Yeah, Extended Berkeley Packet Filter. But here's the secret, we only use the BPF part. It's actually a little easier for users to install. The E part is, you know, fancy and often finicky. But um—Corey: SEBPF then: Shortened Extended BPF. Why not?Jean: [laugh]. Yeah. And what BPF allows us to do is passively watch traffic from the outside of a system. So, think of it as you're sending API calls across the network. We're just watching that network. We're not in the path of that traffic. So, we're not intercepting the traffic in any way, we're not creating any additional overhead for the traffic, we're not slowing it down in any way. We're just sitting on the side, we're watching all of it, and then we're taking that and shipping an obfuscated version off to our cloud, and then we're giving you analytics on that.Corey: One of the things that strikes me as being… I guess, a common trope is there are a bunch of observability solutions out there that offer this sort of insight into what's going on within an environment, but it's, “Step one: instrument with some SDK or some agent across everything. Do an entire deploy across your fleet.” Which yeah, people are not generally going to be in a hurry to sign up for. And further, you also said a minute ago that the idea being that someone could start using this in the middle of the night in the middle of an outage, which tells me that it's not, “Step one: get the infrastructure sparkling. Step two: do a global deploy to everything.” How do you go about doing that? What is the level of embeddedness into the environment?Jean: Yeah, that's a great question. So, the reason we chose BPF is I wanted a completely black-box solution. So, no SDKs, no code annotations. I wanted people to be able to change a config file and have our solution apply to anything that's on the system. So, you could add routes, you could do all kinds of things. I wanted there to be no additional work on the part of the developer when that happened.And so, we're not the only solution that uses BPF or EBPF. There's many other solutions that say, “Hey, just drop us in. We'll let you do anything you want.” The big difference is what happens with the traffic once it gets processed. So, what EBPF or BPF gives you is it watches everything about your system. And so, you can imagine that's a lot of different events. That's a lot of things.If you're trying to fix an incident in the middle of the night and someone just dumps on you 1000 pages of logs, like, what are you going to do with that? And so, our view is, the more interesting and important and valuable thing to do here is not make it so that you just have the ability to watch everything about your system but to make it so that developers don't have to sift through thousands of events just to figure out what went wrong. So, we've spent years building algorithms to automatically analyze these API events to figure out, first of all, what are your endpoints? Because it's one thing to turn on something like Wireshark and just say, okay, here are the thousand API calls, I saw—ten thousand—but it's another thing to say, “Hey, 500 of those were actually the same endpoint and 300 of those had errors.” That's quite a hard problem.And before us, it turns out that there was no other solution that even did that to the level of being able to compile together, “Here are all the slow calls to an endpoint,” or, “Here are all of the erroneous calls to an endpoint.” That was blood, sweat, and tears of developers in the night before. And so, that's the first major thing we do. And then metrics on top of that. So, today we have what's slow, what's throwing errors. People have asked us for other things like show me what happened after I deployed. Show me what's going on this week versus last week. But now that we have this data set, you can imagine there's all kinds of questions we can now start answering much more quickly on top of it.Corey: One thing that strikes me about your site is that when I go to akitasoftware.com, you've got a shout-out section at the top. And because I've been doing this long enough where I find that, yeah, you work at a company; you're going to say all kinds of wonderful, amazing aspirational things about it, and basically because I have deep-seated personality disorders, I will make fun of those things as my default reflexive reaction. But something that AWS, for example, does very well is when they announce something ridiculous on stage at re:Invent, I make fun of it, as is normal, but then they have a customer come up and say, “And here's the expensive, painful problem that they solved for us.”And that's where I shut up and start listening. Because it's a very different story to get someone else, who is presumably not being paid, to get on stage and say, “Yeah, this solved a sophisticated, painful problem.” Your shout-outs page has not just a laundry list of people saying great things about it, but there are former folks who have been on the show here, people I know and trust: Scott Johnson over at Docker, Gergely Orosz over at The Pragmatic Engineer, and other folks who have been luminaries in the space for a while. These are not the sort of people that are going to say, “Oh, sure. Why not? Oh, you're going to send me a $50 gift card in a Twitter DM? Sure I'll say nice things,” like it's one of those respond to a viral tweet spamming something nonsense. These are people who have gravitas. It's clear that there's something you're building that is resonating.Jean: Yeah. And for that, they found us. Everyone that I've tried to bribe to say good things about us actually [laugh] refused.Corey: Oh, yeah. As it turns out that it's one of those things where people are more expensive than you might think. It's like, “What, you want me to sell my credibility down the road?” Doesn't work super well. But there's something like the unsolicited testimonials that come out of, this is amazing, once people start kicking the tires on it.You're currently in open beta. So, I guess my big question for you is, whenever you see a product that says, “Oh, yeah, we solve everything cloud, on-prem, on physical instances, on virtual machines, on Docker, on serverless, everything across the board. It's awesome.” I have some skepticism on that. What is your ideal application architecture that Akita works best on? And what sort of things are you a complete nonstarter for?Jean: Yeah, I'll start with a couple of things we work well on. So, container platforms. We work relatively well. So, that's your Fargate, that's your Azure Web Apps. But that, you know, things running, we call them container platforms. Kubernetes is also something that a lot of our users have picked us up and had success with us on. I will say our Kubernetes deploy is not as smooth as we would like. We say, you know, you can install us—Corey: Well, that is Kubernetes, yes.Jean: [laugh]. Yeah.Corey: Nothing in Kubernetes is as smooth as we would like.Jean: Yeah, so we're actually rolling out Kubernetes injection support in the next couple of weeks. So, those are the two that people have had the most success on. If you're running on bare metal or on a VM, we work, but I will say that you have to know your way around a little bit to get that to work. What we don't work on is any Platform as a Service. So, like, a Heroku, a Lambda, a Render at the moment. So those, we haven't found a way to passively listen to the network traffic in a good way right now.And we also work best for unencrypted HTTP REST traffic. So, if you have encrypted traffic, it's not a non-starter, but you need to fall into a couple of categories. You either need to be using Kubernetes, you can run Akita as a sidecar, or you're using Nginx. And so, that's something we're still expanding support on. And we do not support GraphQL or GRPC at the moment.Corey: That's okay. Neither do I. It does seem these days that unencrypted HTTP API calls are increasingly becoming something of a relic, where folks are treating those as anti-patterns to be stamped out ruthlessly. Are you still seeing significant deployments of unencrypted APIs?Jean: Yeah. [laugh]. So, Corey—Corey: That is the reality, yes.Jean: That's a really good question, Corey, because in the beginning, we weren't sure what we wanted to focus on. And I'm not saying the whole deployment is unencrypted HTTP, but there is a place to install Akita to watch where it's unencrypted HTTP. And so, this is what I mean by if you have encrypted traffic, but you can install Akita as a Kubernetes sidecar, we can still watch that. But there was a big question when we started: should this be GraphQL, GRPC, or should it be REST? And I read the “State of the API Report” from Postman for you know, five years, and I still keep up with it.And every year, it seemed that not only was REST, remaining dominant, it was actually growing. So, [laugh] this was shocking to me as well because people said, well, “We have this more structured stuff, now. There's GRPC, there's GraphQL.” But it seems that for the added complexity, people weren't necessarily seeing the value and so, REST continues to dominate. And I've actually even seen a decline in GraphQL since we first started doing this. So, I'm fully on board the REST wagon. And in terms of encrypted versus unencrypted, I would also like to see more encryption as well. That's why we're working on burning down the long tail of support for that.Corey: Yeah, it's one of those challenges. Whenever you're deploying something relatively new, there's this idea that it should be forward-looking and you, on some level, want to modernize your architecture and infrastructure to keep up with it. An AWS integration story I see that's like that these days is, “Oh, yeah, generate an IAM credential set and just upload those into our system.” Yeah, the modern way of doing that is role assumption: to find a role and here's how to configure it so that it can do what we need to do. So, whenever you start seeing things that are, “Oh, yeah, just turn the security clock back in time a little bit,” that's always a little bit of an eyebrow raise.I can also definitely empathize with the joys of dealing with anything that even touches networking in a Lambda context. Building the Lambda extension for Tailscale was one of the last big dives I made into that area and I still have nightmares as a result. It does a lot of interesting things right up until you step off the golden path. And then suddenly, everything becomes yaks all the way down, in desperate need of shaving.Jean: Yeah, Lambda does something we want to handle on our roadmap, but I… believe we need a bigger team before [laugh] we are ready to tackle that.Corey: Yeah, we're going to need a bigger boat is very often [laugh] the story people have when they start looking at entire new architectural paradigms. So, you end up talking about working in containerized environments. Do you find that most of your deployments are living in cloud environments, in private data centers, some people call them private cloud. Where does the bulk of your user applications tend to live these days?Jean: The bulk of our user applications are in the cloud. So, we're targeting small to medium businesses to start. The reason being, we want to give our users a magical deployment experience. So, right now, a lot of our users are deploying in under 30 minutes. That's in no small part due to automations that we've built.And so, we initially made the strategic decision to focus on places where we get the most visibility. And so—where one, we get the most visibility, and two, we are ready for that level of scale. So, we found that, you know, for a large business, we've run inside some of their production environments and there are API calls that we don't yet handle well or it's just such a large number of calls, we're not doing the inference as well and our algorithms don't work as well. And so, we've made the decision to start small, build our way up, and start in places where we can just aggressively iterate because we can see everything that's going on. And so, we've stayed away, for instance, from any on-prem deployments for that reason because then we can't see everything that's going on. And so, smaller companies that are okay with us watching pretty much everything they're doing has been where we started. And now we're moving up into the medium-sized businesses.Corey: The challenge that I guess I'm still trying to wrap my head around is, I think that it takes someone with a particularly rosy set of glasses on to look at the current state of monitoring and observability and say that it's not profoundly broken in a whole bunch of ways. Now, where it all falls apart, Tower of Babelesque, is that there doesn't seem to be consensus on where exactly it's broken. Where do you see, I guess, this coming apart at the seams?Jean: I agree, it's broken. And so, if I tap into my background, which is I was a programming languages person in my very recently, previous life, programming languages people like to say the problem and the solution is all lies in abstraction. And so, computing is all about building abstractions on top of what you have now so that you don't have to deal with so many details and you got to think at a higher level; you're free of the shackles of so many low-level details. What I see is that today, monitoring and observability is a sort of abstraction nightmare. People have just taken it as gospel that you need to live at the lowest level of abstraction possible the same way that people truly believe that assembly code was the way everybody was going to program forevermore back, you know, 50 years ago.So today, what's happening is that when people think monitoring, they think logs, not what's wrong with my system, what do I need to pay attention to? They think, “I have to log everything, I have to consume all those logs, we're just operating at the level of logs.” And that's not wrong because there haven't been any tools that have given people any help above the level of logs. Although that's not entirely correct, you know? There's also events and there's also traces, but I wouldn't say that's actually lifting the level of [laugh] abstraction very much either.And so, people today are thinking about monitoring and observability as this full control, like, I'm driving my, like, race car, completely manual transmission, I want to feel everything. And not everyone wants to or needs to do that to get to where they need to go. And so, my question is, how far are can we lift the level of abstraction for monitoring and observability? I don't believe that other people are really asking this question because most of the other players in the space, they're asking what else can we monitor? Where else can we monitor it? How much faster can we do it? Or how much more detail can we give the people who really want the power tools?But the people entering the buyer's market with needs, they're not people—you don't have, like, you know, hordes of people who need more powerful tools. You have people who don't know about the systems are dealing with and they want easier. They want to figure out if there's anything wrong with our system so they can get off work and do other things with their lives.Corey: That, I think, is probably the thing that gets overlooked the most. It's people don't tend to log into their monitoring systems very often. They don't want to. When they do, it's always out of hours, middle of the night, and they're confronted with a whole bunch of upsell dialogs of, “Hey, it's been a while. You want to go on a tour of the new interface?”Meanwhile, anything with half a brain can see there's a giant spike on the graph or telemetry stop coming in.Jean: Yeah.Corey: It's way outside of normal business hours where this person is and maybe they're not going to be in the best mood to engage with your brand.Jean: Yeah. Right now, I think a lot of the problem is, you're either working with monitoring because you're desperate, you're in the middle of an active incident, or you're a monitoring fanatic. And there isn't a lot in between. So, there's a tweet that someone in my network tweeted me that I really liked which is, “Monitoring should be a chore, not a hobby.” And right now, it's either a hobby or an urgent necessity [laugh].And when it gets to the point—so you know, if we think about doing dishes this way, it would be as if, like, only, like, the dish fanatics did dishes, or, like, you will just have piles of dishes, like, all over the place and raccoons and no dishes left, and then you're, like, “Ah, time to do a thing.” But there should be something in between where there's a defined set of things that people can do on a regular basis to keep up with what they're doing. It should be accessible to everyone on the team, not just a couple of people who are true fanatics. No offense to the people out there, I love you guys, you're the ones who are really helping us build our tool the most, but you know, there's got to be a world in which more people are able to do the things you do.Corey: That's part of the challenge is bringing a lot of the fire down from Mount Olympus to the rest of humanity, where at some level, Prometheus was a great name from that—Jean: Yep [laugh].Corey: Just from that perspective because you basically need to be at that level of insight. I think Kubernetes suffers from the same overall problem where it is not reasonably responsible to run a Kubernetes production cluster without some people who really know what's going on. That's rapidly changing, which is for the better, because most companies are not going to be able to afford a multimillion-dollar team of operators who know the ins and outs of these incredibly complex systems. It has to become more accessible and simpler. And we have an entire near century at this point of watching abstractions get more and more and more complex and then collapsing down in this particular field. And I think that we're overdue for that correction in a lot of the modern infrastructure, tooling, and approaches that we take.Jean: I agree. It hasn't happened yet in monitoring and observability. It's happened in coding, it's happened in infrastructure, it's happened in APIs, but all of that has made it so that it's easier to get into monitoring debt. And it just hasn't happened yet for anything that's more reactive and more about understanding what the system is that you have.Corey: You mentioned specifically that your background was in programming languages. That's understating it slightly. You were a tenure-track professor of computer science at Carnegie Mellon before entering industry. How tied to what your area of academic speciality was, is what you're now at Akita?Jean: That's a great question and there are two answers to that. The first is very not tied. If it were tied, I would have stayed in my very cushy, highly [laugh] competitive job that I worked for years to get, to do stuff there. And so like, what we're doing now is comes out of thousands of conversations with developers and desire to build on the ground tools that I'm—there's some technically interesting parts to it, for sure. I think that our technical innovation is our moat, but is it at the level of publishable papers? Publishable papers are a very narrow thing; I wouldn't be able to say yes to that question.On the other hand, everything that I was trained to do was about identifying a problem and coming up with an out-of-the-box solution for it. And especially in programming languages research, it's really about abstractions. It's really about, you know, taking a set of patterns that you see of problems people have, coming up with the right abstractions to solve that problem, evaluating your solution, and then, you know, prototyping that out and building on top of it. And so, in that case, you know, we identified, hey, people have a huge gap when it comes to monitoring and observability. I framed it as an abstraction problem, how can we lift it up?We saw APIs as this is a great level to build a new level of solution. And our solution, it's innovative, but it also solves the problem. And to me, that's the most important thing. Our solution didn't need to be innovative. If you're operating in an academic setting, it's really about… producing a new idea. It doesn't actually [laugh]—I like to believe that all endeavors really have one main goal, and in academia, the main goal is producing something new. And to me, building a product is about solving a problem and our main endeavor was really to solve a real problem here.Corey: I think that it is, in many cases, useful when we start seeing a lot of, I guess, overflow back and forth between academia and industry, in both directions. I think that it is doing academia a disservice when you start looking at it purely as pure theory, and oh yeah, they don't deal with any of the vocational stuff. Conversely, I think the idea that industry doesn't have anything to learn from academia is dramatically misunderstanding the way the world works. The idea of watching some of that ebb and flow and crossover between them is neat to see.Jean: Yeah, I agree. I think there's a lot of academics I super respect and admire who have done great things that are useful in industry. And it's really about, I think, what you want your main goal to be at the time. Is it, do you want to be optimizing for new ideas or contributing, like, a full solution to a problem at the time? But it's there's a lot of overlap in the skills you need.Corey: One last topic I'd like to dive into before we call it an episode is that there's an awful lot of hype around a variety of different things. And right now in this moment, AI seems to be one of those areas that is getting an awful lot of attention. It's clear too there's something of value there—unlike blockchain, which has struggled to identify anything that was not fraud as a value proposition for the last decade-and-a-half—but it's clear that AI is offering value already. You have recently, as of this recording, released an AI chatbot, which, okay, great. But what piques my interest is one, it's a dog, which… germane to my interest, by all means, and two, it is marketed as, and I quote, “Exceedingly polite.”Jean: [laugh].Corey: Manners are important. Tell me about this pupper.Jean: Yeah, this dog came really out of four or five days of one of our engineers experimenting with ChatGPT. So, for a little bit of background, I'll just say that I have been excited about the this latest wave of AI since the beginning. So, I think at the very beginning, a lot of dev tools people were skeptical of GitHub Copilot; there was a lot of controversy around GitHub Copilot. I was very early. And I think all the Copilot people retweeted me because I was just their earlies—like, one of their earliest fans. I was like, “This is the coolest thing I've seen.”I've actually spent the decade before making fun of AI-based [laugh] programming. But there were two things about GitHub Copilot that made my jaw drop. And that's related to your question. So, for a little bit of background, I did my PhD in a group focused on program synthesis. So, it was really about, how can we automatically generate programs from a variety of means? From constraints—Corey: Like copying and pasting off a Stack Overflow, or—Jean: Well, the—I mean, that actually one of the projects that my group was literally applying machine-learning to terabytes of other example programs to generate new programs. So, it was very similar to GitHub Copilot before GitHub Copilot. It was synthesizing API calls from analyzing terabytes of other API calls. And the thing that I had always been uncomfortable with these machine-learning approaches in my group was, they were in the compiler loop. So, it was, you know, you wrote some code, the compiler did some AI, and then it spit back out some code that, you know, like you just ran.And so, that never sat well with me. I always said, “Well, I don't really see how this is going to be practical,” because people can't just run random code that you basically got off the internet. And so, what really excited me about GitHub Copilot was the fact that it was in the editor loop. I was like, “Oh, my God.”Corey: It had the context. It was right there. You didn't have to go tabbing to something else.Jean: Exactly.Corey: Oh, yeah. I'm in the same boat. I think it is basically—I've seen the future unfolding before my eyes.Jean: Yeah. Was the autocomplete thing. And to me, that was the missing piece. Because in your editor, you always read your code before you go off and—you know, like, you read your code, whoever code reviews your code reads your code. There's always at least, you know, two pairs of eyes, at least theoretically, reading your code.So, that was one thing that was jaw-dropping to me. That was the revelation of Copilot. And then the other thing was that it was marketed not as, “We write your code for you,” but the whole Copilot marketing was that, you know, it kind of helps you with boilerplate. And to me, I had been obsessed with this idea of how can you help developers write less boilerplate for years. And so, this AI-supported boilerplate copiloting was very exciting to me.And I saw that is very much the beginning of a new era, where, yes, there's tons of data on how we should be programming. I mean, all of Akita is based on the fact that we should be mining all the data we have about how your system and your code is operating to help you do stuff better. And so, to me, you know, Copilot is very much in that same philosophy. But our AI chatbot is, you know, just a next step along this progression. Because for us, you know, we collect all this data about your API behavior; we have been using non-AI methods to analyze this data and show it to you.And what ChatGPT allowed us to do in less than a week was analyze this data using very powerful large-language models and I have this conversational interface that both gives you the opportunity to check over and follow up on the question so that what you're spitting out—so what we're spitting out as Aki the dog doesn't have to be a hundred percent correct. But to me, the fact that Aki is exceedingly polite and kind of goofy—he, you know, randomly woofs and says a lot of things about how he's a dog—it's the right level of seriousness so that it's not messaging, hey, this is the end all, be all, the way, you know, the compiler loop never sat well with me because I just felt deeply uncomfortable that an AI was having that level of authority in a system, but a friendly dog that shows up and tells you some things that you can ask some additional questions to, no one's going to take him that seriously. But if he says something useful, you're going to listen. And so, I was really excited about the way this was set up. Because I mean, I believe that AI should be a collaborator and it should be a collaborator that you never take with full authority. And so, the chat and the politeness covered those two parts for me both.Corey: Yeah, on some level, I can't shake the feeling that it's still very early days there for Chat-Gipity—yes, that's how I pronounce it—and it's brethren as far as redefining, on some level, what's possible. I think that it's in many cases being overhyped, but it's solving an awful lot of the… the boilerplate, the stuff that is challenging. A question I have, though, is that, as a former professor, a concern that I have is when students are using this, it's less to do with the fact that they're not—they're taking shortcuts that weren't available to me and wanting to make them suffer, but rather, it's, on some level, if you use it to write your English papers, for example. Okay, great, it gets the boring essay you don't want to write out of the way, but the reason you write those things is it teaches you to form a story, to tell a narrative, to structure an argument, and I think that letting the computer do those things, on some level, has the potential to weaken us across the board. Where do you stand on it, given that you see both sides of that particular snake?Jean: So, here's a devil's advocate sort of response to it, is that maybe the writing [laugh] was never the important part. And it's, as you say, telling the story was the important part. And so, what better way to distill that out than the prompt engineering piece of it? Because if you knew that you could always get someone to flesh out your story for you, then it really comes down to, you know, I want to tell a story with these five main points. And in some way, you could see this as a playing field leveler.You know, I think that as a—English is actually not my first language. I spent a lot of time editing my parents writing for their work when I was a kid. And something I always felt really strongly about was not discriminating against people because they can't form sentences or they don't have the right idioms. And I actually spent a lot of time proofreading my friends' emails when I was in grad school for the non-native English speakers. And so, one way you could see this as, look, people who are not insiders now are on the same playing field. They just have to be clear thinkers.Corey: That is a fascinating take. I think I'm going to have to—I'm going to have to ruminate on that one. I really want to thank you for taking the time to speak with me today about what you're up to. If people want to learn more, where's the best place for them to find you?Jean: Well, I'm always on Twitter, still [laugh]. I'm @jeanqasaur—J-E-A-N-Q-A-S-A-U-R. And there's a chat dialog on akitasoftware.com. I [laugh] personally oversee a lot of that chat, so if you ever want to find me, that is a place, you know, where all messages will get back to me somehow.Corey: And we will, of course, put a link to that into the [show notes 00:35:01]. Thank you so much for your time. I appreciate it.Jean: Thank you, Corey.Corey: Jean Yang, CEO at Akita Software. I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice, along with an angry insulting comment that you will then, of course, proceed to copy to the other 17 podcast tools that you use, just like you do your observability monitoring suite.Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.
An airhacks.fm conversation with Logan Kulinski (@lbkulinski) about: 2009 Dell Inspiron with Pentium Inside, enjoying wrestling games, programming arduino, 15 degrees Fahrenheit in Chicago, learning HTML with DreamWeaver, enjoying C++ and cin and cout, starting with Java 8, hooked with Brian Goetz Java video, enjoying Java Lambda expressions, method does not exist Ruby, starting with Java on smart charging software, Lamba, ECS, Fargate, Aurora MySQL and Java, enjoying, Chicago Java User Group, type checks at build time with Micronaut, saving money with Java, Helidon and Micronaut, Helidon Nima and Project Loom, interesting Text Blocks and records, pattern matching with records, deconstructing records, pattern matching in switch, useful switch expression in Java 17, Visual Studio Code vs. JetBrains Fleet, structured concurrency and Loom, reactive programming with reactive use cases, Loom will scale your servers, reactive programming scales your services Logan Kulinski on twitter: @lbkulinski
In this episode, we're going to be talking about AWS Application Composer - a FREE service that promises to help you build serverless applications with ease. With its simple drag-and-drop interface, it's supposed to make Infrastructure as Code a breeze. But the real question is - does it live up to the hype? We know a lot of you are probably struggling with building applications using CloudFormation. It's a real pain, right? So, we decided to take Application Composer for a spin and see if it's worth adding to your toolkit or giving it a hard pass. After covering a generic overview of the service, how it works, and the main concepts, we discuss our experience in creating a new simple serverless application from scratch only using API Gateway, Lambda, and S3. Then we cover what it looks like to import an existing project (a slightly more complicated one) into Application Composer and find out what works and what doesn't. We conclude by discussing some other things that didn't work as expected and by providing our general recommendation on whether you should be using this service today.
AWS ECS is a powerful service that allows you to run containerized applications at scale. It's suitable for a variety of use cases, including web applications, microservices, and background processing. In this episode, we'll provide an introduction to the main concepts of ECS and then dive into cost-optimization strategies. We'll explore the different options for running containers on ECS, including EC2, Fargate, and ECS Anywhere. We'll discuss various opportunities for saving money, such as using Arm (Graviton) instances, Spot instances, Compute Savings Plans, and RIs or EC2 Saving Plans. Finally, we'll cover how to set up ECS to use Spot instances, including how to create capacity providers and specify a capacity provider strategy. We'll also discuss whether it's always best to use EC2 instead of Fargate for cost optimization and recommend some tools that can help you find other opportunities to save on container costs.
Diya Wynn, Senior Practice Manager in Responsible AI for AWS Machine Learning Solutions Lab, joins Corey on Screaming in the Cloud to discuss her team's efforts to study and implement responsible practices when developing AI technology. Corey and Diya explore the ethical challenges of AI, and why it's so important to be looking ahead for potential issues before they arise. Diya explains why socially responsible AI is still a journey, and describes how her and her team at AWS are seeking to forge that path to help their customers implement the technology in a safe and ethical way. Diya also describes her approach to reducing human-caused bias in AI models. About DiyaDiya Wynn is the Senior Practice Manager in Responsible AI for AWS Machine Learning Solutions Lab. She leads the team that engages with customers globally to go from theory to practice - operationalizing standards for responsible Artificial Intelligence/Machine Learning and data. Diya leads discussions on taking intentional action to uncover potential unintended impacts, and mitigate risks related to the development, deployment and use of AI/ML systems. She leverages her more than 25 years of experience as a technologist scaling products for acquisition; driving inclusion, diversity & equity initiatives; leading operational transformation across industries and understanding of historical and systemic contexts to guide customers in establishing an AI/ML operating model that enables inclusive and responsible products. Additionally, she serves on non-profit boards including the AWS Health Equity Initiative Review Committee; mentors at Tulane University, Spelman College and GMI; was a mayoral appointee in Environment Affairs for 6 consecutive years and guest lectures regularly on responsible and inclusive technology. Diya studied Computer Science at Spelman College, the Management of Technology at New York University, and AI & Ethics at Harvard University Professional School and MIT Sloan School of Management.Links Referenced:Machine Learning is a Marvelously Executed Scam: https://www.lastweekinaws.com/blog/machine-learning-is-a-marvelously-executed-scam/ TranscriptAnnouncer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.Corey: Tailscale SSH is a new, and arguably better way to SSH. Once you've enabled Tailscale SSH on your server and user devices, Tailscale takes care of the rest. So you don't need to manage, rotate, or distribute new SSH keys every time someone on your team leaves. Pretty cool, right? Tailscale gives each device in your network a node key to connect to your VPN, and uses that same key for SSH authorization and encryption. So basically you're SSHing the same way that you're already managing your network.So what's the benefit? Well, built-in key rotation, the ability to manage permissions as code, connectivity between any two devices, and reduced latency. You can even ask users to re-authenticate SSH connections for that extra bit of security to keep the compliance folks happy. Try Tailscale now - it's free forever for personal use.Corey: Kentik provides Cloud and NetOps teams with complete visibility into hybrid and multi-cloud networks. Ensure an amazing customer experience, reduce cloud and network costs, and optimize performance at scale — from internet to data center to container to cloud. Learn how you can get control of complex cloud networks at www.kentik.com, and see why companies like Zoom, Twitch, New Relic, Box, Ebay, Viasat, GoDaddy, booking.com, and many, many more choose Kentik as their network observability platform. Corey: Welcome to Screaming in the Cloud, I'm Corey Quinn. In a refreshing change of pace, I have decided to emerge from my home office cave studio thing and go to re:Invent and interview people in person. This is something of a challenge for me because it is way easier in person to punch me in the face, so we'll see how it winds up playing out. My guest today is Diya Wynn, Senior Practice Manager at AWS. Diya, what is a practice manager at AWS? What do you do?Diya: So, a practice manager, I guess you can think of it just like a manager of a team. I have a practice that's specifically focused on Responsible AI. And I mean, practices are just like you could have won in financial services or anything. It's a department, essentially. But more important than the practice in the title is actually what I get a chance to do, and that's working with our customers directly that are using and leveraging our AI/ML services to build products.And we have an opportunity to help them think about how are they using that technology in ways to have improvements or benefit individuals in society, but minimize the risk and the unintended impact or harm. And that's something that we get to do with customers over any industry as well as globally. And my team and I have been enjoying the opportunity to be able to help them along their Responsible AI journey.Corey: So, the idea of Responsible AI is… I'm going to sound old and date myself when I say this, but it feels like it's such a strange concept for me, someone who came up doing systems administration work in physical data centers. The responsible use of a server back when I was hands-on hardware was, “Well, you don't want to hit your coworker with a server no matter how obnoxious they are.” And it was fairly straightforward. It was clear: yes or no. And now it seems that whenever we talk about AI in society, in popular culture, from a technologist's point of view, the answer is always a deeply nuanced shade of gray. Help.Diya: Nuanced shade of gray. That's interesting. It is a little bit more challenging. I think that it is, you know, in one sense because of the notion of all of the data that we get to leverage, and our machine-learning models are reliant on data that has variations coming from, you know, historical sort of elements, things that are here baked with bias, all of that has to be considered. And I think when we think about some of the challenges and even the ways in which AI is being used, it means that we have to be much more mindful of its context, right?And these systems are being used in ways that we probably didn't think about servers being used in the past, but also are in the midst of some high-stakes decisions, right? Whether or not I might be identified or misidentified and inappropriately arrested or if I get the appropriate service that I was thinking about or whether or not there are associations related to my gender or my sexual preference. All of that matters, and so it does become much more of a nuanced conversation. Also because depending on the jurisdiction you're in, the region, what makes sense and what matters might differ slightly. So, it's a multidisciplinary problem or challenge that we need to think about what is the legality of this?And we have to think about social science sometimes and there's an element of ethics. And all of that plays into what becomes responsible, what is the right way in which we use the technology, what are the implications of technology? And so yes, it is a little bit more gray, but there are things that I think we have at our disposal to help us be able to respond to and put in place so that we really are doing the right things with technology.Corey: I've known Amazon across the board to be customer-obsessed, and they tell us that constantly—and I do believe it; I talk to an awful lot of Amazonians—and so much of what the company does comes directly from customer requests. I have to ask, what were customers asking that led to the creation of your group? Because it seems odd to me that you would have someone coming to you and saying, “Well, we built a ‘Hot Dog/Not A Hot Dog' image recognition app,” and, “Oopsie. It turns out our app is incredibly biased against Canadians. How do we fix this?” Like, that does not seem like a realistic conversation. What were the customer concerns? How are they articulated?Diya: No, that's really good. And you're right. They weren't asking the question in that way, but over the last five years or so, I would say, there has been an increase in interest and as well as concern about how AI is being used and the potential risks or the areas of unintended impact. And with this sort of heightened sensitivity or concern, both with our executives as well as members of common society, right—they're starting to talk about that more—they started to ask questions. They're using surfaces we want to be responsible in building.Now, some customers were saying that. And so, they would ask, “What are other customers doing? What should we be aware of? How do we or are there tools that we can use to make sure that we're minimizing bias in our systems? Are there things that we can think about in the way of privacy?”And oftentimes privacy and security are one of those areas that might come up first. And those were the kinds of questions. We actually did a survey asking a number of our customer-facing resources to find out what were customers asking so that we could begin to respond with a product or service that would actually meet that need. And I think we've done a great job in being able to respond to that in providing them assistance. And I think the other thing that we paid attention to was not just the customer requests but also what we're seeing in the marketplace. Part of our job is not only to respond to the customer need but also sometimes to see the need that they're going to have ahead of them because of the way in which the industry is moving. And I think we did a pretty good job of being able to see that and then start to provide service and respond to assist them.Corey: Yeah, it's almost like a rule that I believe it was Scott Hanselman that I stole it from where the third time that you're asked the same question, write a blog post, then that way you can do a full deep—Diya: Did he really say write a post? [laugh].Corey: Treatment of it. Yes, he did. And the idea is, write a blog post—because his blog is phenomenal—and that way, you have a really in-depth authoritative answer to that question and you don't have to ad-lib it off the cuff every time someone asks you in the future. And it feels like that's sort of an expression of what you did. You started off as a customer-facing team where they were asking you the same questions again and again and at some point it's, okay, we can either spend the rest of our lives scaling this team ad infinitum and winding up just answering the phone all day, or we can build a service that directly addresses and answers the question.Diya: Absolutely, absolutely. I think that's the way in which we scale, right, and then we have some consistency and structure in order to be able to respond and meet a need. What we were able to do was—and I think this is sort of the beauty of being at AWS and Amazon; we have this opportunity to create narratives and to see a need, and be able to identify and respond to that. And that's something that everybody can do, not just resigned to a VP or someone that's an executive, we all can do that. And that was an opportunity that I had: seeing the need, getting information and data, and being able to respond and say, “We need to come up with something.”And so, one of our first pieces of work was to actually define a framework. How would we engage? What would be that repeatable process or structure for us, framework that we can leverage with our customers every time to help them think through, look around corners, understand where there's risk, be better informed, and make better-informed decisions about how they were using the technology or what ways they could minimize bias? And so, that framework for us was important. And then we have now tools and services as well that were underway, you know, on our product side, if you will, that are complementing—or that, you know, complement the work.So, not only here's a process, here's a framework and structure, but also here are tools that in technology you can bring to bear to help you automate, to help you understand performance, or even you know, help you minimize the bias and risk.Corey: What's interesting to me, in a very different part of the world than AI, I live in AWS costing because I decided, I don't know, I should just go and try and be miserable for the rest of my life and look at bills all day. But whenever I talk to clients, they asked the same question: what are other customers doing, as you alluded to a few minutes ago? And that feels like it's a universal question. I feel like every customer, no matter in what discipline or what area they're in, is firmly convinced that somewhere out there is this utopian, platonic ideal of the perfect company that has figured all of this stuff out and we're all constantly searching for them. Like, there's got to be someone who has solved this problem the right way.And in several cases, I've had to tell clients that you are actually one of the best in the world and furthest advanced at this particular thing. That customer, the closest we've got to them is you, so we should be asking you these questions. And for whatever it's worth, no one ever likes hearing that because, “Like, oh, we're doing something wild.” It's like—Diya: [crosstalk 00:10:15] pioneers.Corey: —“Well, we got to solve this ourselves? That's terrible.”Diya: Well, it's interesting you say that because it is a common question. I think customers have an expectation that because we are AWS, we've seen a lot. And I think that's true. There are tens of thousands of customers that are using our services, we have conversations with companies all across the world, so we do have some perspective of what other customers are doing and that's certainly something that we can bring to the table. But the other part of this is that this is really a new area. This is a sort of new space, that we're focused on trustworthy and Responsible AI, and there aren't a ton of customers that are doing this—or companies at all—that have it entirely answered, that have—you know, we're all on a journey.So, these are, I would say, early stages. And we do have the benefit of being large, having a lot of customers, having some experience in building services as well as helping our customers build products, having a team that's focused on looking at standards and working with standards bodies globally, having teams that are working on our understanding what we're doing in regulation and public policy. And so, all of that we bring to bear when we start talking about, you know, this with our customers. But we don't have all the answers; we're on a journey like them. And I think that's something that we have to be comfortable with, to some degree, that this is an evolving area and we're learning. And we're investing even in research to help us continue to move forward. But there's a lot that we know, that there's a lot that we can bring to the table, and we can help our customers in that regard.Corey: Now, this might very well be old news and well understood and my understanding is laughably naive when this gets released, but as of this recording, a few hours beforehand, you released something called Service Cards. And I have to say, my initial glance at this was honestly one of disappointment when I saw what it was because what I was hoping for, with—when you ever see ‘service' and ‘cards' together, is these are going to be printable cardboard, little cards that I can slip into the Monopoly board game I have at home and game night at home is going to be so traumatic for my kids afterwards. Like, “What's a Fargate?” Says the five-year-old, and there we go. “It means that daddy is not going to passing go, going directly to jail with you. Have fun,” it's great. But I don't think that's what it is.Diya: No, not at all. Not at all. So, it is very similar to the context that people might be familiar with around model cards, being able to give definition and understanding of a model that's being used. For us, we sort of took that concept at one step beyond that in that, you know, just providing a model card isn't sufficient necessarily, especially when there are multiple services or multiple models being used for any one of our services. But what our Service Cards allow us to do is to provide a better understanding of the intended use of the service, you know, and the model that's underpinning that, give context for the performance of that service, give guidelines for our customers to be able to understand how was it best used and how does it best perform.And that's a degree of transparency that we're providing under the hood, for our customers to really help them as well be much more responsible and how they're building on top of those. And it gives them clarity because there is a growing interest in the marketplace for our customers to hold their vendors—or companies to hold their vendors responsible, right, making sure that they're doing the right things and covering off, are we building well? Do we have, like, the customer or enough of demographic covered? What the performance looks like. And this is a really big opportunity for us to be transparent with our customers about how our services are being built and give them a little bit more of that guardrail that we were talking about—guidelines—how to best use it as they look to build upon those.Corey: Not in any way, shape, or form to besmirch the importance of a lot of the areas that you're covering on this, but on some level, I'm just envious in that it would be so nice to have that for every AWS service, of this is how it is—Diya: Uh-oh [laugh].Corey: —actually intended to be used. Because to me, I look at it and all I see is database, database, really expensive database, probably a database, and, like, none of those are designed to be databases. Like, “You lack imagination,” is my approach. And no, it just turns out I'm terrible at computers, but I'm also enthusiastic and those are terrible combinations. But I would love to see breakdowns around things like that as far as intended use, potential pitfalls, and increasingly as we start seeing more and more services get machine learning mixed in, for lack of a better term, increasingly we're going to start to see areas where the ethical implications absolutely are going to be creeping in. Which is a wild thing to say about, I don't know, a service that recommends how to right-size instances having ethical concerns. But it's not that unreasonable.Diya: Well, I can't make any promises about us having those kinds of instructions or guidelines for some of our other services, but we are certainly committed to being able to provide this transparency across our AI/ML services. And again, that's something I will say that's a journey. We've released a few today; there are others that are going to come. We're going to continue to iterate and evolve so that we can get through our services. And there's a lot of work behind that, right?It's not just that we wrote up this document, but it is providing transparency. But it also means that our teams are doing a great bit in terms of the diligence to be able to provide that feedback, to be able to test their models, understand their datasets, you know, provide information about the datasets in public—you know, for the public datasets that are being tested against, and also have the structure for them to train their models appropriately. So, there's a lot going into the development of those that may not be immediately transparent, but really is core to our commitment to how we're building our services now.Corey: It's a new area in many respects because, like, to be very direct. If I wind up misusing or being surprised by a bad implementation of something in most cases in AWS context, the disaster area looks a lot closer to I get a big bill. Which—and this [unintelligible 00:16:35] is going to sound bizarre, but here we are, it's only money. Money can be fixed. I can cry and sob to support and get that fixed.With things like machine learning and AI, the stakes are significantly higher because given some of the use cases and given some of the rapid emerging technology areas in which these things are being tested and deployed, it hurts people if it gets wrong. And an AWS bill is painful, but not in a damaging to populations level. Yet. I'm sure at some point, it becomes so large it becomes its own micro-economy, I guess the way those credits are now, but it's a very different way.Diya: Right. Absolutely. So, I think that's why our work from a responsibility perspective is important. But I think it's also valuable for customers to understand, we're taking a step forward and being able to help them. Very much like what we do with well-architected, right? We have a framework, we have best practices and guidance that is being provided so that our customers who are using our cloud services really know what's the best.This is very much like those Service Cards, right? Here's the best conditions in order to be able to use and get the greatest value out of your cloud investment. The same thing is what we're doing with this approach in helping our customers in the Responsible AI way. Here's the best, sort of, best practices, guidance, guardrails, tools that are going to help you make the most out of your investment in AI and minimize where there's this unintended or potential areas of potential harm that you were describing. And you're right, there are high stakes use cases, right, that we want to make sure or want to be able to help and equip our customers to think more about intentionally and be prepared to be able to hopefully have a governance structure, people aligned, processes, technology to really be able to minimize that, right? We want to reduce the blast radius.[midroll 00:18:37]Corey: One thing I want to call out as well is that as much as we love in tech to pretend that we have invented all of these things ourselves—like, we see it all the time; like, “No one really knows how to hire, there's no real scientific study on this.” “Yes, there are. There are multi-decade longitudinal studies at places like GM and whatnot.” And, “No, no, no tech is different. There's no way to know this. La la la.”And that's great. We have to invent these things ourselves. But bias has been a thing in business decisions, even ones that are not directly caused by humans, for a long time. An easy example is in many cases, credit ratings and decisions whether to grant credit or not. Like, they were not using machine learning in the 90s to do this, but strangely, depending upon a wide variety of factors that are not actually things that are under your control as a person, you are deemed to be a good credit risk versus a bad credit risk.And as a result, I think one of the best terms I heard in the early days when machine learning started getting big, was just referring to it as bias laundering. Well, we've had versions of that for a long time. Now, at least it seems like this shines a light on it if nothing else, and gives us an opportunity to address it.Diya: Absolutely. Oh, I'd love that, right? The opportunity to address it. So, one of the things that I often share with folks is we all have bias, right? And so, like you said we've had bias in a number of cases. Now, you know, in some cases, bias is understandable. We all have it. It is the thing that often—we talk about the sort of like mental shortcuts, things that we do that help us to respond rapidly in the world in the vast array of information that we're taking in all the time. So—Corey: You're an Amazonian. You yourself bias for action.Diya: Exactly. Right? So, we have bias. Now, the intent is that we want to be able to disrupt that so that we don't make decisions, oftentimes, that could be harmful, right? So, we have proclivities, desires, interest, right, that kind of folds into our bias, but there are other things, our background, where we went to school, you know, experiences that we had, information that we've been taking that also helped to drive towards some of those biases.So, that's one element, right, understanding that. A human bias gets infiltrated into our systems. And there was a study in AI now—I think it was 2019—that talked about that, right, that our systems are often biased by—or the bias is introduced, you know, sometimes by individuals. And part of the necessity for us to be able to eliminate that is understanding that we have bias, do things to interrupt it, and then also bringing in diversity, right? Because some of our biases are just that we don't have enough of the right perspectives in the room; we don't have enough of the right people involved, right?And so, being able to sort of widen the net, making sure that we're involving the outliers, I think are important to us being able to eliminate bias as well. And then there are tools that we can use. But then you also bring up something interesting here in terms of the data, right? And there's a part that education plays a good role in helping us understand the things like what you described our institutional biases baked into our data that also can come out in decisions that are now being made. And the more that we use AI in these ways, the more there is risk for that, right?So, that's why this effort in Responsible AI, understanding how we mitigate bias, understanding how we invite the right people in, the inclusion of the right perspectives, thinking about the outliers, thinking about whether or not this is the right problem for us to solve with AI is important, right, so that we can minimize those areas where bias is just another thing that we continue to propagate.Corey: So, a year or two ago, I wrote a blog post titled Machine Learning is a Marvelously Executed Scam. And it was talking about selling digital pickaxes into a data gold rush.Diya: I [crosstalk 00:22:30] remember this one [laugh].Corey: And it was a lot of fun. In fact, the Head of Analyst Relations at AWS for Machine Learning responded by sending me a Minecraft pickaxe made out of foam, which is now in my home office hung behind my office and I get a comment at least three times a week on that. It was absolutely genius as far as rebuttal go. And I've got to find some way to wind up responding to her in kind one of these days.But it felt like it was a solution in search of a problem. And I no longer hold so closely to that particular opinion, in no small part due to the fact that, as you're discussing, this area is fraught, it's under an awful lot of scrutiny, large companies who use these things and then those tools get it wrong are going to basically wind up being castigated for it. And yet, they are clearly realizing enough value from machine learning that it is worth the risk. And these are companies whose entire business, start to finish, is managing and mitigating risk. There is something there or suddenly everyone has taken leave of their senses. I don't quite buy that second option, so I'm guessing it's the first.Diya: So, the question is, is it worth the risk? And I would say, I think some people might or some companies might have started to step into that area thinking that it is, but it's not. And that's what we're saying and that's what we're hearing in the industry [unintelligible 00:23:51], that it's not worth the risk. And you're hearing from customers, outcries from others, government officials, right, all of them are saying, like, “It's not worth the risk and we have to pay attention to that.”But I think that there's certainly value and we're seeing that, right? We're solving previously unattainable problems with AI. We want to be able to continue to do that, but give people the means to be able to sort of minimize where there is risk and recognize that this is not a risk that's worth us taking. So, the potential for reputational harm and the damage that will do is real, right? When a company is called out for the fact that they've discriminated and they're unfairly evaluating homes, for instance, for people of color in certain communities, right, that's not something that's going to be tolerated or accepted.And so, you have people really calling those things out so that we start to—organizations do the right things and not think that risk is worth the [unintelligible 00:24:52]. It is very well worth the risk to use AI, but we've got to do it responsibly. There's so much value in what we are able to accomplish. So, we're seeing, you know, even with Covid, being able to advance, like, the technology around vaccinations and how that was done and accelerated with machine learning, or being able to respond to some of the needs that small businesses and others had, you know, during Covid, being able to continuate their service because we didn't have people in businesses or in offices, a lot of that was advanced during that time as a result of AI. We want to be able to see advances like that and companies be able to continue to innovate, and so we want to be able to do that without the risk, without the sort of impact that we're talking about, the negative impact. And I think that's why the work is so important.Corey: Do you believe that societally we're closer to striking the right balance?Diya: We're on our way. I think this is certainly a journey. There is a lot of attention on this in the right ways. And my hope—and certainly, that's why I'm in a role like this—that we can actually invite the right voices into the room. One of the things—and one of my colleagues said this earlier today, and I think it was a really, really great point, right—as we are seeing—first of all, we never thought that we would have, like, ethicists roles and sort of Responsible AI folks, and chief ethics officers. That was not something that existed in the context of, sort of, machine learning, and that's something that it's evolved in the last, you know, few years.But the other thing that we're seeing is that the folks that are sitting in those roles are increasingly diverse and are helping to drive the focus on the inclusion that we need and the value of making sure that those voices are there so that we can build in inclusive and responsible ways. And that's one of the things that I think is helping us get there, right? We're not entirely there, but I think that we're on a path. And the more that we can have conversations like this, the more that companies are starting to pay attention and take intentional action, right, to build ethically and to have the trust in the technology and the products that they build, and to do that in responsible ways, we'll get there.Corey: I really want to thank you for taking so much time to talk through what you're up to with me.Diya: I am super excited and glad that you were able to have me on. I love talking about this, so it's great. And I think it's one of the ways that we get more people aware, and hopefully, it sparks the interest in companies to take their own Responsible AI journey.Corey: Thank you so much for your time.Diya: Thanks for having me.Corey: I appreciate it. Diya Wynn, Senior Practice Manager at AWS. I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice along with an angry insulting comment, presumably because you're Canadian.Diya: [laugh].Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.
About Chris Chris Farris has been in the IT field since 1994 primarily focused on Linux, networking, and security. For the last 8 years, he has focused on public-cloud and public-cloud security. He has built and evolved multiple cloud security programs for major media companies, focusing on enabling the broader security team's objectives of secure design, incident response and vulnerability management. He has developed cloud security standards and baselines to provide risk-based guidance to development and operations teams. As a practitioner, he's architected and implemented multiple serverless and traditional cloud applications focused on deployment, security, operations, and financial modeling.Chris now does cloud security research for Turbot and evangelizes for the open source tool Steampipe. He is one if the organizers of the fwd:cloudsec conference (https://fwdcloudsec.org) and has given multiple presentations at AWS conferences and BSides events.When not building things with AWS's building blocks, he enjoys building Legos with his kid and figuring out what interesting part of the globe to travel to next. He opines on security and technology on Twitter and his website https://www.chrisfarris.comLinks Referenced: Turbot: https://turbot.com/ fwd:cloudsec: https://fwdcloudsec.org/ Steampipe: https://steampipe.io/ Steampipe block: https://steampipe.io/blog TranscriptAnnouncer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.Corey: Tailscale SSH is a new, and arguably better way to SSH. Once you've enabled Tailscale SSH on your server and user devices, Tailscale takes care of the rest. So you don't need to manage, rotate, or distribute new SSH keys every time someone on your team leaves. Pretty cool, right? Tailscale gives each device in your network a node key to connect to your VPN, and uses that same key for SSH authorization and encryption. So basically you're SSHing the same way that you're already managing your network.So what's the benefit? Well, built-in key rotation, the ability to manage permissions as code, connectivity between any two devices, and reduced latency. You can even ask users to re-authenticate SSH connections for that extra bit of security to keep the compliance folks happy. Try Tailscale now - it's free forever for personal use.Corey: This episode is sponsored by our friends at Logicworks. Getting to the cloud is challenging enough for many places, especially maintaining security, resiliency, cost control, agility, etc, etc, etc. Things break, configurations drift, technology advances, and organizations, frankly, need to evolve. How can you get to the cloud faster and ensure you have the right team in place to maintain success over time? Day 2 matters. Work with a partner who gets it - Logicworks combines the cloud expertise and platform automation to customize solutions to meet your unique requirements. Get started by chatting with a cloud specialist today at snark.cloud/logicworks. That's snark.cloud/logicworksCorey: Welcome to Screaming in the Cloud. I'm Corey Quinn. My guest today is someone that I have been meaning to invite slash drag onto this show for a number of years. We first met at re:Inforce the first year that they had such a thing, Amazon's security conference for cloud, as is Amazon's tradition, named after an email subject line. Chris Farris is a cloud security nerd at Turbot. He's also one of the organizers for fwd:cloudsec, another security conference named after an email subject line with a lot more self-awareness than any of Amazon's stuff. Chris, thank you for joining me.Chris: Oh, thank you for dragging me on. You can let go of my hair now.Corey: Wonderful, wonderful. That's why we're all having the thinning hair going on. People just use it to drag us to and fro, it seems. So, you've been doing something that I'm only going to describe as weird lately because your background—not that dissimilar from mine—is as a practitioner. You've been heavily involved in the security space for a while and lately, I keep seeing an awful lot of things with your name on them getting sucked up by the giant app surveillance apparatus deployed to the internet, looking for basically any mention of AWS that I wind up using to write my newsletter and feed the content grist mill every year. What are you doing and how'd you get there?Chris: So, what am I doing right now is, I'm in marketing. It's kind of a, you know, “Oops, I'm sorry I did that.”Corey: Oh, the running gag is, you work in DevRel; that means, “Oh, you're in marketing, but they're scared to tell you that.” You're self-aware.Chris: Yeah.Corey: Good for you.Chris: I'm willing to address that I'm in marketing now. And I've been a cloud practitioner since probably 2014, cloud security since about 2017. And then just decided, the problem that we have in the cloud security community is a lot of us are just kind of sitting in a corner in our companies and solving problems for our companies, but we're not solving the problems at scale. So, I wanted a job that would allow me to reach a broader audience and help a broader audience. Where I see cloud security having—you know, or cloud in general falling down is Amazon makes it really hard for you to do your side of shared responsibility, and so we need to be out there helping customers understand what they need to be doing. So, I am now at a company called Turbot and we're really trying to promote cloud security.Corey: One of the first promoted guest episodes of this show was David Boeke, your CTO, and one of the things that I regret is that I've sort of lost track of Turbot over the past few years because, yeah, one or two things might have been going on during that timeline as I look back at having kids in the middle of a pandemic and the deadly plague o'er land. And suddenly, every conversation takes place over Zoom, which is like, “Oh, good, it's like a happy hour only instead, now it's just like a conference call for work.” It's like, ‘Conference Calls: The Drinking Game' is never the great direction to go in. But it seems the world is recovering. We're going to be able to spend some time together at re:Invent by all accounts that I'm actively looking forward to.As of this recording, you're relatively new to Turbot, and I figured out that you were going there because, once again, content hits my filters. You wrote a fascinating blog post that hits on an interest of mine that I don't usually talk about much because it's off-putting to some folk, and these days, I don't want to get yelled at and more than I have to about the experience of traveling, I believe it was to an all-hands on the other side of the world.Chris: Yep. So, my first day on the job at Turbot, I was landing in Kuala Lumpur, Malaysia, having left the United States 24 hours—or was it 48? It's hard to tell when you go to the other side of the planet and the time zones have also shifted—and then having left my prior company day before that. But yeah, so Turbot about traditionally has an annual event where we all get together in person. We're a completely remote company, but once a year, we all get together in person in our integrate event.And so, that was my first day on the job. And then you know, it was basically two weeks of reasonably intense hackathons, building out a lot of stuff that hopefully will show up open-source shortly. And then yeah, meeting all of my coworkers. And that was nice.Corey: You've always had a focus through all the time that I've known you and all the public content that you've put out there that has come across my desk that seems to center around security. It's sort of an area that I give a nod to more often than I would like, on some level, but that tends to be your bread and butter. Your focus seems to be almost overwhelmingly on I would call it AWS security. Is that fair to say or is that a mischaracterization of how you view it slash what you actually do? Because, again, we have these parasocial relationships with voices on the internet. And it's like, “Oh, yeah, I know all about that person.” Yeah, you've met them once and all you know other than that is what they put on Twitter.Chris: You follow me on Twitter. Yeah, I would argue that yes, a lot of what I do is AWS-related security because in the past, a lot of what I've been responsible for is cloud security in AWS. But I've always worked for companies that were multi-cloud; it's just that 90% of everything was Amazon and so therefore 90% of my time, 90% of my problems, 90% of my risk was all in AWS. I've been trying to break out of that. I've been trying to understand the other clouds.One of the nice aspects of this role and working on Steampipe is I am now experimenting with other clouds. The whole goal here is to be able to scale our ability as an industry and as security practitioners to support multiple clouds. Because whether we want to or not, we've got it. And so, even though 90% of my spend, 90% of my resources, 90% of my applications may be in AWS, that 10% that I'm ignoring is probably more than 10% of my risk, and we really do need to understand and support major clouds equally.Corey: One post you had recently that I find myself in wholehearted agreement with is on the adoption of Tailscale in the enterprise. I use it for all of my personal nonsense and it is transformative. I like the idea of what that portends for a multi-cloud, or poly-cloud, or whatever the hell we're calling it this week, sort of architectures were historically one of the biggest problems in getting to clouds two speak to one another and manage them in an intelligent way is the security models are different, the user identity stuff is different as well, and the network stuff has always been nightmarish. Well, with Tailscale, you don't have to worry about that in the same way at all. You can, more or less, ignore it, turn on host-based firewalls for everything and just allow Tailscale. And suddenly, okay, I don't really have to think about this in the same way.Chris: Yeah. And you get the micro-segmentation out of it, too, which is really nice. I will agree that I had not looked at Tailscale until I was asked to look at Tailscale, and then it was just like, “Oh, I am completely redoing my home network on that.” But looking at it, it's going to scare some old-school network engineers, it's going to impact their livelihoods and that is going to make them very defensive. And so, what I wanted to do in that post was kind of address, as a practitioner, if I was looking at this with an enterprise lens, what are the concerns you would have on deploying Tailscale in your environment?A lot of those were, you know, around user management. I think the big one that is—it's a new thing in enterprise security, but kind of this host profiling, which is hey, before I let your laptop on the network, I'm going to go make sure that you have antivirus and some kind of EDR, XDR, blah-DR agents so that you know we have a reasonable thing that you're not going to just go and drop [unintelligible 00:09:01] on the network and next thing you know, we're Maersk. Tailscale, that's going to be their biggest thing that they are going to have to figure out is how do they work with some of these enterprise concerns and things along those lines. But I think it's an excellent technology, it was super easy to set up. And the ability to fine-tune and microsegment is great.Corey: Wildly so. They occasionally sponsor my nonsense. I have no earthly idea whether this episode is one of them because we have an editorial firewall—they're not paying me to set any of this stuff, like, “And this is brought to you by whatever.” Yeah, that's the sponsored ad part. This is just, I'm in love with the product.One of the most annoying things about it to me is that I haven't found a reason to give them money yet because the free tier for my personal stuff is very comfortably sized and I don't have a traditional enterprise network or anything like that people would benefit from over here. For one area in cloud security that I think I have potentially been misunderstood around, so I want to take at least this opportunity to clear the air on it a little bit has been that, by all accounts, I've spent the last, mmm, few months or so just absolutely beating the crap out of Azure. Before I wind up adding a little nuance and context to that, I'd love to get your take on what, by all accounts, has been a pretty disastrous year-and-a-half for Azure security.Chris: I think it's been a disastrous year-and-a-half for Azure security. Um—[laugh].Corey: [laugh]. That was something of a leading question, wasn't it?Chris: Yeah, no, I mean, it is. And if you think, though, back, Microsoft's repeatedly had these the ebb and flow of security disasters. You know, Code Red back in whatever the 2000s, NT 4.0 patching back in the '90s. So, I think we're just hitting one of those peaks again, or hopefully, we're hitting the peak and not [laugh] just starting the uptick. A lot of what Azure has built is stuff that they already had, commercial off-the-shelf software, they wrapped multi-tenancy around it, gave it a new SKU under the Azure name, and called is cloud. So, am I super-surprised that somebody figured out how to leverage a Jupyter notebook to find the back-end credentials to drop the firewall tables to go find the next guy over's Cosmos DB? No, I'm not.Corey: I find their failures to be less egregious on a technical basis because let's face it, let's be very clear here, this stuff is hard. I am not pretending for even a slight second that I'm a better security engineer than the very capable, very competent people who work there. This stuff is incredibly hard. And I'm not—Chris: And very well-funded people.Corey: Oh, absolutely, yeah. They make more than I do, presumably. But it's one of those areas where I'm not sitting here trying to dunk on them, their work, their efforts, et cetera, and I don't do a good enough job of clarifying that. My problem is the complete radio silence coming out of Microsoft on this. If AWS had a series of issues like this, I'm hard-pressed to imagine a scenario where they would not have much more transparent communications, they might very well trot out a number of their execs to go on a tour to wind up talking about these things and what they're doing systemically to change it.Because six of these in, it's like, okay, this is now a cultural problem. It's not one rando engineer wandering around the company screwing things up on a rotational basis. It's, what are you going to do? It's unlikely that firing Steven is going to be your fix for these things. So, that is part of it.And then most recently, they wound up having a blog post on the MSRC, the Microsoft Security Resource Center is I believe that acronym? The [mrsth], whatever; and it sounds like a virus you pick up in a hospital—but the problem that I have with it is that they spent most of that being overly defensive and dunking on SOCRadar, the vulnerability researcher who found this and reported it to them. And they had all kinds of quibbles with how it was done, what they did with it, et cetera, et cetera. It's, “Excuse me, you're the ones that left customer data sitting out there in the Azure equivalent of an S3 bucket and you're calling other people out for basically doing your job for you? Excuse me?”Chris: But it wasn't sensitive customer data. It was only the contract information, so therefore it was okay.Corey: Yeah, if I put my contract information out there and try and claim it's not sensitive information, my clients will laugh and laugh as they sue me into the Stone Age.Chris: Yeah well, clearly, you don't have the same level of clickthrough terms that Microsoft is able to negotiate because, you know, [laugh].Corey: It's awful as well, it doesn't even work because, “Oh, it's okay, I lost some of your data, but that's okay because it wasn't particularly sensitive.” Isn't that kind of up to you?Chris: Yes. And if A, I'm actually, you know, a big AWS shop and then I'm looking at Azure and I've got my negotiations in there and Amazon gets wind that I'm negotiating with Azure, that's not going to do well for me and my business. So no, this kind of material is incredibly sensitive. And that was an incredibly tone-deaf response on their part. But you know, to some extent, it was more of a response than we've seen from some of the other Azure multi-tenancy breakdowns.Corey: Yeah, at least they actually said something. I mean, there is that. It's just—it's wild to me. And again, I say this as an Azure customer myself. Their computer vision API is basically just this side of magic, as best I can tell, and none of the other providers have anything like it.That's what I want. But, you know, it almost feels like that service is under NDA because no one talks about it when they're using this service. I did a whole blog post singing its praises and no one from that team reached out to me to say, “Hey, glad you liked it.” Not that they owe me anything, but at the same time it's incredible. Why am I getting shut out? It's like, does this company just have an entire policy of not saying anything ever to anyone at any time? It seems it.Chris: So, a long time ago, I came to this realization that even if you just look at the terminology of the three providers, Amazon has accounts. Why does Amazon have Amazon—or AWS accounts? Because they're a retail company and that's what you signed up with to buy your underwear. Google has projects because they were, I guess, a developer-first thing and that was how they thought about it is, “Oh, you're going to go build something. Here's your project.”What does Microsoft have? Microsoft Azure Subscriptions. Because they are still about the corporate enterprise IT model of it's really about how much we're charging you, not really about what you're getting. So, given that you're not a big enterprise IT customer, you don't—I presume—do lots and lots of golfing at expensive golf resorts, you're probably not fitting their demographic.Corey: You're absolutely not. And that's wild to me. And yet, here we are.Chris: Now, what's scary is they are doing so many interesting things with artificial intelligence… that if… their multi-tenancy boundaries are as bad as we're starting to see, then what else is out there? And more and more, we is carbon-based life forms are relying on Microsoft and other cloud providers to build AI, that's kind of a scary thing. Go watch Satya's keynote at Microsoft Ignite and he's showing you all sorts of ways that AI is going to start replacing the gig economy. You know, it's not just Tesla and self-driving cars at this point. Dali is going to replace the independent graphics designer.They've got things coming out in their office suite that are going to replace the mom-and-pop marketing shops that are generating menus and doing marketing plans for your local restaurants or whatever. There's a whole slew of things where they're really trying to replace people.Corey: That is a wild thing to me. And part of the problem I have in covering AWS is that I have to differentiate in a bunch of different ways between AWS and its Amazon corporate parent. And they have that problem, too, internally. Part of the challenge they have, in many cases, is that perks you give to employees have to scale to one-and-a-half million people, many of them in fulfillment center warehouse things. And that is a different type of problem that a company, like for example, Google, where most of their employees tend to be in office job-style environments.That's a weird thing and I don't know how to even start conceptualizing things operating at that scale. Everything that they do is definitionally a very hard problem when you have to make it scale to that point. What all of the hyperscale cloud providers do is, from where I sit, complete freaking magic. The fact that it works as well as it does is nothing short of a modern-day miracle.Chris: Yeah, and it is more than just throwing hardware at the problem, which was my on-prem solution to most of the things. “Oh, hey. We need higher availability? Okay, we're going to buy two of everything.” We called it the Noah's Ark model, and we have an A side and a B side.And, “Oh, you know what? Just in case we're going to buy some extra capacity and put it in a different city so that, you know, we can just fail from our primary city to our secondary city.” That doesn't work at the cloud provider scale. And really, we haven't seen a major cloud outage—I mean, like, a bad one—in quite a while.Corey: This episode is sponsored in part by Honeycomb. When production is running slow, it's hard to know where problems originate. Is it your application code, users, or the underlying systems? I've got five bucks on DNS, personally. Why scroll through endless dashboards while dealing with alert floods, going from tool to tool to tool that you employ, guessing at which puzzle pieces matter? Context switching and tool sprawl are slowly killing both your team and your business. You should care more about one of those than the other; which one is up to you. Drop the separate pillars and enter a world of getting one unified understanding of the one thing driving your business: production. With Honeycomb, you guess less and know more. Try it for free at honeycomb.io/screaminginthecloud. Observability: it's more than just hipster monitoring.Corey: The outages are always fascinating, just from the way that they are reported in the mainstream media. And again, this is hard, I get it. I am not here to crap on journalists. They, for some ungodly, unknowable reason, have decided not to spend their entire career focusing on the nuances of one very specific, very deep industry. I don't know why.But as [laugh] a result, they wind up getting a lot of their baseline facts wrong about these things. And that's fair. I'm not here to necessarily act as an Amazon spokesperson when these things happen. They have an awful lot of very well-paid people who can do that. But it is interesting just watching the blowback and the reaction of whatever there's an outage, the conversation is never “Does Amazon or Azure or Google suck?” It's, “Does cloud suck as a whole?”That's part of the reason I care so much about Azure getting their act together. If it were just torpedoing Microsoft's reputation, then well, that's sad, but okay. But it extends far beyond that to a point where it's almost where the enterprise groundhog sees the shadow of a data breach and then we get six more years of data center build-outs instead of moving things to a cloud. I spent too many years working in data centers and I have the scars from the cage nuts and crimping patch cables frantically in the middle of the night to prove it. I am thrilled at the fact that I don't believe I will ever again have to frantically drive across town in the middle of the night to replace a hard drive before the rest of the array degrades. Cloud has solved those problems beautifully. I don't want to go back to the Dark Ages.Chris: Yeah, and I think that there's a general potential that we could start seeing this big push towards going back on-prem for effectively sovereign data reasons, whether it's this country has said, “You cannot store your data about our citizens outside of our borders,” and either they're doing that because they do not trust the US Silicon Valley privacy or whatever, or because if it's outside of our borders, then our secret police agents can come knocking on the door at two in the morning to go find out what some dissidents' viewings habits might have been, I see sovereign cloud as this thing that may be a back step from this ubiquitous thing that we have right now in Amazon, Azure, and Google. And so, as we start getting to the point in the history books where we start seeing maps with lots of flags, I think we're going to start seeing a bifurcation of cloud as just a whole thing. We see it already right now. The AWS China partition is not owned by Amazon, it is not run by Amazon, it is not controlled by Amazon. It is controlled by the communist government of China. And nobody is doing business in Russia right now, but if they had not done what they had done earlier this year, we might very well see somebody spinning up a cloud provider that is completely controlled by and in the Russian government.Corey: Well, yes or no, but I want to challenge that assessment for a second because I've had conversations with a number of folks about this where people say, “Okay, great. Like, is the alt-right, for example, going to have better options now that there might be a cloud provider spinning up there?” Or, “Well, okay, what about a new cloud provider to challenge the dominance of the big three?” And there are all these edge cases, either geopolitically or politically based upo—or folks wanting to wind up approaching it from a particular angle, but if we were hired to build out an MVP of a hyperscale cloud provider, like, the budget for that MVP would look like one 100 billion at this point to get started and just get up to a point of critical mass before you could actually see if this thing has legs. And we'd probably burn through almost all of that before doing a single dime in revenue.Chris: Right. And then you're doing that in small markets. Outside of the China partition, these are not massively large markets. I think Oracle is going down an interesting path with its idea of Dedicated Cloud and Oracle Alloy [unintelligible 00:22:52].Corey: I like a lot of what Oracle's doing, and if younger me heard me say that, I don't know how hard I'd hit myself, but here we are. Their free tier for Oracle Cloud is amazing, their data transfer prices are great, and their entire approach of, “We'll build an entire feature complete region in your facility and charge you what, from what I can tell, is a very reasonable amount of money,” works. And it is feature complete, not, “Well, here are the three services that we're going to put in here and everything else is well… it's just sort of a toehold there so you can start migrating it into our big cloud.” No. They're doing it right from that perspective.The biggest problem they've got is the word Oracle at the front end and their, I would say borderline addiction to big-E enterprise markets. I think the future of cloud looks a lot more like cloud-native companies being founded because those big enterprises are starting to describe themselves in similar terminology. And as we've seen in the developer ecosystem, as go startups, so do big companies a few years later. Walk around any big company that's undergoing a digital transformation, you'll see a lot more Macs on desktops, for example. You'll see CI/CD processes in place as opposed to, “Well, oh, you want something new, it's going to be eight weeks to get a server rack downstairs and accounting is going to have 18 pages of forms for you to fill out.” No, it's “click the button,” or—Chris: Don't forget the six months of just getting the financial CapEx approvals.Corey: Exactly.Chris: You have to go through the finance thing before you even get to start talking to techies about when you get your server. I think Oracle is in an interesting place though because it is embracing the fact that it is number four, and so therefore, it's like we are going to work with AWS, we are going to work with Azure, our database can run in AWS or it can run in our cloud, we can interconnect directly, natively, seamlessly with Azure. If I were building a consumer-based thing and I was moving into one of these markets where one of these governments was demanding something like a sovereign cloud, Oracle is a great place to go and throw—okay, all of our front-end consumer whatever is all going to sit in AWS because that's what we do for all other countries. For this one country, we're just going to go and build this thing in Oracle and we're going to leverage Oracle Alloy or whatever, and now suddenly, okay, their data is in their country and it's subject to their laws but I don't have to re-architect to go into one of these, you know, little countries with tin horn dictators.Corey: It's the way to do multi-cloud right, from my perspective. I'll use a component service in a different cloud, I'm under no illusions, though, in doing that I'm increasing my resiliency. I'm not removing single points of failure; I'm adding them. And I make that trade-off on a case-by-case basis, knowingly. But there is a case for some workloads—probably not yours if you're listening to this; assume not, but when you have more context, maybe so—where, okay, we need to be across multiple providers for a variety of strategic or contextual reasons for this workload.That does not mean everything you build needs to be able to do that. It means you're going to make trade-offs for that workload, and understanding the boundaries of where that starts and where that stops is going to be important. That is not the worst idea in the world for a given appropriate workload, that you can optimize stuff into a container and then can run, more or less, anywhere that can take a container. But that is also not the majority of most people's workloads.Chris: Yeah. And I think what that comes back to from the security practitioner standpoint is you have to support not just your primary cloud, your favorite cloud, the one you know, you have to support any cloud. And whether that's, you know, hey, congratulations. Your developers want to use Tailscale because it bypasses a ton of complexity in getting these remote island VPCs from this recent acquisition integrated into your network or because you're going into a new market and you have to support Oracle Cloud in Saudi Arabia, then you as a practitioner have to kind of support any cloud.And so, one of the reasons that I've joined and I'm working on, and so excited about Steampipe is it kind of does give you that. It is a uniform interface to not just AWS, Azure, and Google, but all sorts of clouds, whether it's GitHub or Oracle, or Tailscale. So, that's kind of the message I have for security practitioners at this point is, I tried, I fought, I screamed and yelled and ranted on Twitter, against, you know, doing multi-cloud, but at the end of the day, we were still multi-cloud.Corey: When I see these things evolving, is that, yeah, as a practitioner, we're increasingly having to work across multiple providers, but not to a stupendous depth that's the intimidating thing that scares the hell out of people. I still remember my first time with the AWS console, being so overwhelmed with a number of services, and there were 12. Now, there are hundreds, and I still feel that same sense of being overwhelmed, but I also have the context now to realize that over half of all customer spend globally is on EC2. That's one service. Yes, you need, like, five more to get it to work, but okay.And once you go through learning that to get started, and there's a lot of moving parts around it, like, “Oh, God, I have to do this for every service?” No, take Route 53—my favorite database, but most people use it as a DNS service—you can go start to finish on basically everything that service does that a human being is going to use in less than four hours, and then you're more or less ready to go. Everything is not the hairy beast that is EC2. And most of those services are not for you, whoever you are, whatever you do, most AWS services are not for you. Full stop.Chris: Yes and no. I mean, as a security practitioner, you need to know what your developers are doing, and I've worked in large organizations with lots of things and I would joke that, oh, yeah, I'm sure we're using every service but the IoT, and then I go and I look at our bill, and I was like, “Oh, why are we dropping that much on IoT?” Oh, because they wanted to use the Managed MQTT service.Corey: Ah, I start with the bill because the bill is the source of truth.Chris: Yes, they wanted to use the Managed MQTT service. Okay, great. So, we're now in IoT. But how many of those things have resource policies, how many of those things can be made public, and how many of those things are your CSPM actually checking for and telling you that, hey, a developer has gone out somewhere and made this SageMaker notebook public, or this MQTT topic public. And so, that's where you know, you need to have that level of depth and then you've got to have that level of depth in each cloud. To some extent, if the cloud is just the core basic VMs, object storage, maybe some networking, and a managed relational database, super simple to understand what all you need to do to build a baseline to secure that. As soon as you start adding in on all of the fancy services that AWS has. I re—Corey: Yeah, migrating your Step Functions workflow to other cloud is going to be a living goddamn nightmare. Migrating something that you stuffed into a container and run on EC2 or Fargate is probably going to be a lot simpler. But there are always nuances.Chris: Yep. But the security profile of a Step Function is significantly different. So, you know, there's not much you can do there wrong, yet.Corey: You say that now, but wait for their next security breach, and then we start calling them Stumble Functions instead.Chris: Yeah. I say that. And the next thing, you know, we're going to have something like Lambda [unintelligible 00:30:31] show up and I'm just going to be able to put my Step Function on the internet unauthenticated. Because, you know, that's what Amazon does: they innovate, but they don't necessarily warn security practitioners ahead of their innovation that, hey, you're we're about to release this thing. You might want to prepare for it and adjust your baselines, or talk to your developers, or here's a service control policy that you can drop in place to, you know, like, suppress it for a little bit. No, it's like, “Hey, these things are there,” and by the time you see the tweets or read the documentation, you've got some developer who's put it in production somewhere. And then it becomes a lot more difficult for you as a security practitioner to put the brakes on it.Corey: I really want to thank you for spending so much time talking to me. If people want to learn more and follow your exploits—as they should—where can they find you?Chris: They can find me at steampipe.io/blog. That is where all of my latest rants, raves, research, and how-tos show up.Corey: And we will, of course, put a link to that in the [show notes 00:31:37]. Thank you so much for being so generous with your time. I appreciate it.Chris: Perfect, thank you. You have a good one.Corey: Chris Farris, cloud security nerd at Turbot. I'm Cloud Economist Corey Quinn and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice along with an angry insulting comment, and be sure to mention exactly which Azure communications team you work on.Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.Announcer: This has been a HumblePod production. Stay humble.
About KelseyKelsey Hightower is the Principal Developer Advocate at Google, the co-chair of KubeCon, the world's premier Kubernetes conference, and an open source enthusiast. He's also the co-author of Kubernetes Up & Running: Dive into the Future of Infrastructure.Links: Twitter: @kelseyhightower Company site: Google.com Book: Kubernetes Up & Running: Dive into the Future of Infrastructure TranscriptAnnouncer: Hello and welcome to Screaming in the Cloud, with your host Cloud economist Corey Quinn. This weekly show features conversations with people doing interesting work in the world of Cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.Corey: This episode is brought to us by our friends at Pinecone. They believe that all anyone really wants is to be understood, and that includes your users. AI models combined with the Pinecone vector database let your applications understand and act on what your users want… without making them spell it out. Make your search application find results by meaning instead of just keywords, your personalization system make picks based on relevance instead of just tags, and your security applications match threats by resemblance instead of just regular expressions. Pinecone provides the cloud infrastructure that makes this easy, fast, and scalable. Thanks to my friends at Pinecone for sponsoring this episode. Visit Pinecone.io to understand more.Corey: Welcome to Screaming in the Cloud, I'm Corey Quinn. I'm joined this week by Kelsey Hightower, who claims to be a principal developer advocate at Google, but based upon various keynotes I've seen him in, he basically gets on stage and plays video games like Tetris in front of large audiences. So I assume he is somehow involved with e-sports. Kelsey, welcome to the show.Kelsey: You've outed me. Most people didn't know that I am a full-time e-sports Tetris champion at home. And the technology thing is just a side gig.Corey: Exactly. It's one of those things you do just to keep the lights on, like you're waiting to get discovered, but in the meantime, you're waiting table. Same type of thing. Some people wait tables you more or less a sling Kubernetes, for lack of a better term.Kelsey: Yes.Corey: So let's dive right into this. You've been a strong proponent for a long time of Kubernetes and all of its intricacies and all the power that it unlocks and I've been pretty much the exact opposite of that, as far as saying it tends to be over complicated, that it's hype-driven and a whole bunch of other, shall we say criticisms that are sometimes bounded in reality and sometimes just because I think it'll be funny when I put them on Twitter. Where do you stand on the state of Kubernetes in 2020?Kelsey: So, I want to make sure it's clear what I do. Because when I started talking about Kubernetes, I was not working at Google. I was actually working at CoreOS where we had a competitor Kubernetes called Fleet. And Kubernetes coming out kind of put this like fork in our roadmap, like where do we go from here? What people saw me doing with Kubernetes was basically learning in public. Like I was really excited about the technology because it's attempting to solve a very complex thing. I think most people will agree building a distributed system is what cloud providers typically do, right? With VMs and hypervisors. Those are very big, complex distributed systems. And before Kubernetes came out, the closest I'd gotten to a distributed system before working at CoreOS was just reading the various white papers on the subject and hearing stories about how Google has systems like Borg tools, like Mesa was being used by some of the largest hyperscalers in the world, but I was never going to have the chance to ever touch one of those unless I would go work at one of those companies.So when Kubernetes came out and the fact that it was open source and I could read the code to understand how it was implemented, to understand how schedulers actually work and then bonus points for being able to contribute to it. Those early years, what you saw me doing was just being so excited about systems that I attended to build on my own, becoming this new thing just like Linux came up. So I kind of agree with you that a lot of people look at it as a more of a hype thing. They're looking at it regardless of their own needs, regardless of understanding how it works and what problems is trying to solve that. My stance on it, it's a really, really cool tool for the level that it operates in, and in order for it to be successful, people can't know that it's there.Corey: And I think that might be where part of my disconnect from Kubernetes comes into play. I have a background in ops, more or less, the grumpy Unix sysadmin because it's not like there's a second kind of Unix sysadmin you're ever going to encounter. Where everything in development works in theory, but in practice things pan out a little differently. I always joke that ops is the difference between theory and practice. In theory, devs can do everything and there's no ops needed. In practice, well it's been a burgeoning career for a while. The challenge with this is Kubernetes at times exposes certain levels of abstraction that, sorry certain levels of detail that generally people would not want to have to think about or deal with, while papering over other things with other layers of abstraction on top of it. That obscure, valuable troubleshooting information from a running something in an operational context. It absolutely is a fascinating piece of technology, but it feels today like it is overly complicated for the use a lot of people are attempting to put it to. Is that a fair criticism from where you sit?Kelsey: So I think the reason why it's a fair criticism is because there are people attempting to run their own Kubernetes cluster, right? So when we think about the cloud, unless you're in OpenStack land, but for the people who look at the cloud and you say, "Wow, this is much easier." There's an API for creating virtual machines and I don't see the distributed state store that's keeping all of that together. I don't see the farm of hypervisors. So we don't necessarily think about the inherent complexity into a system like that, because we just get to use it. So on one end, if you're just a user of a Kubernetes cluster, maybe using something fully managed or you have an ops team that's taking care of everything, your interface of the system becomes this Kubernetes configuration language where you say, "Give me a load balancer, give me three copies of this container running." And if we do it well, then you'd think it's a fairly easy system to deal with because you say, "kubectl, apply," and things seem to start running.Just like in the cloud where you say, "AWS create this VM, or G cloud compute instance, create." You just submit API calls and things happen. I think the fact that Kubernetes is very transparent to most people is, now you can see the complexity, right? Imagine everyone driving with the hood off the car. You'd be looking at a lot of moving things, but we have hoods on cars to hide the complexity and all we expose is the steering wheel and the pedals. That car is super complex but we don't see it. So therefore we don't attribute as complexity to the driving experience.Corey: This to some extent feels it's on the same axis as serverless, with just a different level of abstraction piled onto it. And while I am a large proponent of serverless, I think it's fantastic for a lot of Greenfield projects. The constraints inherent to the model mean that it is almost completely non-tenable for a tremendous number of existing workloads. Some developers like to call it legacy, but when I hear the term legacy I hear, "it makes actual money." So just treating it as, "Oh, it's a science experiment we can throw into a new environment, spend a bunch of time rewriting it for minimal gains," is just not going to happen as companies undergo digital transformations, if you'll pardon the term.Kelsey: Yeah, so I think you're right. So let's take Amazon's Lambda for example, it's a very opinionated high-level platform that assumes you're going to build apps a certain way. And if that's you, look, go for it. Now, one or two levels below that there is this distributed system. Kubernetes decided to play in that space because everyone that's building other platforms needs a place to start. The analogy I like to think of is like in the mobile space, iOS and Android deal with the complexities of managing multiple applications on a mobile device, security aspects, app stores, that kind of thing. And then you as a developer, you build your thing on top of those platforms and APIs and frameworks. Now, it's debatable, someone would say, "Why do we even need an open-source implementation of such a complex system? Why not just everyone moved to the cloud?" And then everyone that's not in a cloud on-premise gets left behind.But typically that's not how open source typically works, right? The reason why we have Linux, the precursor to the cloud is because someone looked at the big proprietary Unix systems and decided to re-implement them in a way that anyone could run those systems. So when you look at Kubernetes, you have to look at it from that lens. It's the ability to democratize these platform layers in a way that other people can innovate on top. That doesn't necessarily mean that everyone needs to start with Kubernetes, just like not everyone needs to start with the Linux server, but it's there for you to build the next thing on top of, if that's the route you want to go.Corey: It's been almost a year now since I made an original tweet about this, that in five years, no one will care about Kubernetes. So now I guess I have four years running on that clock and that attracted a bit of, shall we say controversy. There were people who thought that I meant that it was going to be a flash in the pan and it would dry up and blow away. But my impression of it is that in, well four years now, it will have become more or less system D for the data center, in that there's a bunch of complexity under the hood. It does a bunch of things. No-one sensible wants to spend all their time mucking around with it in most companies. But it's not something that people have to think about in an ongoing basis the way it feels like we do today.Kelsey: Yeah, I mean to me, I kind of see this as the natural evolution, right? It's new, it gets a lot of attention and kind of the assumption you make in that statement is there's something better that should be able to arise, giving that checkpoint. If this is what people think is hot, within five years surely we should see something else that can be deserving of that attention, right? Docker comes out and almost four or five years later you have Kubernetes. So it's obvious that there should be a progression here that steals some of the attention away from Kubernetes, but I think where it's so new, right? It's only five years in, Linux is like over 20 years old now at this point, and it's still top of mind for a lot of people, right? Microsoft is still porting a lot of Windows only things into Linux, so we still discuss the differences between Windows and Linux.The idea that the cloud, for the most part, is driven by Linux virtual machines, that I think the majority of workloads run on virtual machines still to this day, so it's still front and center, especially if you're a system administrator managing BDMs, right? You're dealing with tools that target Linux, you know the Cisco interface and you're thinking about how to secure it and lock it down. Kubernetes is just at the very first part of that life cycle where it's new. We're all interested in even what it is and how it works, and now we're starting to move into that next phase, which is the distro phase. Like in Linux, you had Red Hat, Slackware, Ubuntu, special purpose distros.Some will consider Android a special purpose distribution of Linux for mobile devices. And now that we're in this distro phase, that's going to go on for another 5 to 10 years where people start to align themselves around, maybe it's OpenShift, maybe it's GKE, maybe it's Fargate for EKS. These are now distributions built on top of Kubernetes that start to add a little bit more opinionation about how Kubernetes should be pushed together. And then we'll enter another phase where you'll build a platform on top of Kubernetes, but it won't be worth mentioning that Kubernetes is underneath because people will be more interested on the thing above.Corey: I think we're already seeing that now, in terms of people no longer really care that much what operating system they're running, let alone with distribution of that operating system. The things that you have to care about slip below the surface of awareness and we've seen this for a long time now. Originally to install a web server, it wound up taking a few days and an intimate knowledge of GCC compiler flags, then RPM or D package and then yum on top of that, then ensure installed, once we had configuration management that was halfway decent.Then Docker run, whatever it is. And today feels like it's with serverless technologies being what they are, it's effectively a push a file to S3 or it's equivalent somewhere else and you're done. The things that people have to be aware of and the barrier to entry continually lowers. The downside to that of course, is that things that people specialize in today and effectively make very lucrative careers out of are going to be not front and center in 5 to 10 years the way that they are today. And that's always been the way of technology. It's a treadmill to some extent.Kelsey: And on the flip side of that, look at all of the new jobs that are centered around these cloud-native technologies, right? So you know, we're just going to make up some numbers here, imagine if there were only 10,000 jobs around just Linux system administration. Now when you look at this whole Kubernetes landscape where people are saying we can actually do a better job with metrics and monitoring. Observability is now a thing culturally that people assume you should have, because you're dealing with these distributed systems. The ability to start thinking about multi-regional deployments when I think that would've been infeasible with the previous tools or you'd have to build all those tools yourself. So I think now we're starting to see a lot more opportunities, where instead of 10,000 people, maybe you need 20,000 people because now you have the tools necessary to tackle bigger projects where you didn't see that before.Corey: That's what's going to be really neat to see. But the challenge is always to people who are steeped in existing technologies. What does this mean for them? I mean I spent a lot of time early in my career fighting against cloud because I thought that it was taking away a cornerstone of my identity. I was a large scale Unix administrator, specifically focusing on email. Well, it turns out that there aren't nearly as many companies that need to have that particular skill set in house as it did 10 years ago. And what we're seeing now is this sort of forced evolution of people's skillsets or they hunker down on a particular area of technology or particular application to try and make a bet that they can ride that out until retirement. It's challenging, but at some point it seems that some folks like to stop learning, and I don't fully pretend to understand that. I'm sure I will someday where, "No, at this point technology come far enough. We're just going to stop here, and anything after this is garbage." I hope not, but I can see a world in which that happens.Kelsey: Yeah, and I also think one thing that we don't talk a lot about in the Kubernetes community, is that Kubernetes makes hyper-specialization worth doing because now you start to have a clear separation from concerns. Now the OS can be hyperfocused on security system calls and not necessarily packaging every programming language under the sun into a single distribution. So we can kind of move part of that layer out of the core OS and start to just think about the OS being a security boundary where we try to lock things down. And for some people that play at that layer, they have a lot of work ahead of them in locking down these system calls, improving the idea of containerization, whether that's something like Firecracker or some of the work that you see VMware doing, that's going to be a whole class of hyper-specialization. And the reason why they're going to be able to focus now is because we're starting to move into a world, whether that's serverless or the Kubernetes API.We're saying we should deploy applications that don't target machines. I mean just that step alone is going to allow for so much specialization at the various layers because even on the networking front, which arguably has been a specialization up until this point, can truly specialize because now the IP assignments, how networking fits together, has also abstracted a way one more step where you're not asking for interfaces or binding to a specific port or playing with port mappings. You can now let the platform do that. So I think for some of the people who may be not as interested as moving up the stack, they need to be aware that the number of people we need being hyper-specialized at Linux administration will definitely shrink. And a lot of that work will move up the stack, whether that's Kubernetes or managing a serverless deployment and all the configuration that goes with that. But if you are a Linux, like that is your bread and butter, I think there's going to be an opportunity to go super deep, but you may have to expand into things like security and not just things like configuration management.Corey: Let's call it the unfulfilled promise of Kubernetes. On paper, I love what it hints at being possible. Namely, if I build something that runs well on top of Kubernetes than we truly have a write once, run anywhere type of environment. Stop me if you've heard that one before, 50,000 times in our industry... or history. But in practice, as has happened before, it seems like it tends to fall down for one reason or another. Now, Amazon is famous because for many reasons, but the one that I like to pick on them for is, you can't say the word multi-cloud at their events. Right. That'll change people's perspective, good job. The people tend to see multi-cloud are a couple of different lenses.I've been rather anti multi-cloud from the perspective of the idea that you're setting out day one to build an application with the idea that it can be run on top of any cloud provider, or even on-premises if that's what you want to do, is generally not the way to proceed. You wind up having to make certain trade-offs along the way, you have to rebuild anything that isn't consistent between those providers, and it slows you down. Kubernetes on the other hand hints at if it works and fulfills this promise, you can suddenly abstract an awful lot beyond that and just write generic applications that can run anywhere. Where do you stand on the whole multi-cloud topic?Kelsey: So I think we have to make sure we talk about the different layers that are kind of ready for this thing. So for example, like multi-cloud networking, we just call that networking, right? What's the IP address over there? I can just hit it. So we don't make a big deal about multi-cloud networking. Now there's an area where people say, how do I configure the various cloud providers? And I think the healthy way to think about this is, in your own data centers, right, so we know a lot of people have investments on-premises. Now, if you were to take the mindset that you only need one provider, then you would try to buy everything from HP, right? You would buy HP store's devices, you buy HP racks, power. Maybe HP doesn't sell air conditioners. So you're going to have to buy an air conditioner from a vendor who specializes in making air conditioners, hopefully for a data center and not your house.So now you've entered this world where one vendor does it make every single piece that you need. Now in the data center, we don't say, "Oh, I am multi-vendor in my data center." Typically, you just buy the switches that you need, you buy the power racks that you need, you buy the ethernet cables that you need, and they have common interfaces that allow them to connect together and they typically have different configuration languages and methods for configuring those components. The cloud on the other hand also represents the same kind of opportunity. There are some people who really love DynamoDB and S3, but then they may prefer something like BigQuery to analyze the data that they're uploading into S3. Now, if this was a data center, you would just buy all three of those things and put them in the same rack and call it good.But the cloud presents this other challenge. How do you authenticate to those systems? And then there's usually this additional networking costs, egress or ingress charges that make it prohibitive to say, "I want to use two different products from two different vendors." And I think that's-Corey: ...winds up causing serious problems.Kelsey: Yes, so that data gravity, the associated cost becomes a little bit more in your face. Whereas, in a data center you kind of feel that the cost has already been paid. I already have a network switch with enough bandwidth, I have an extra port on my switch to plug this thing in and they're all standard interfaces. Why not? So I think the multi-cloud gets lost in the chew problem, which is the barrier to entry of leveraging things across two different providers because of networking and configuration practices.Corey: That's often the challenge, I think, that people get bogged down in. On an earlier episode of this show we had Mitchell Hashimoto on, and his entire theory around using Terraform to wind up configuring various bits of infrastructure, was not the idea of workload portability because that feels like the windmill we all keep tilting at and failing to hit. But instead the idea of workflow portability, where different things can wind up being interacted with in the same way. So if this one division is on one cloud provider, the others are on something else, then you at least can have some points of consistency in how you interact with those things. And in the event that you do need to move, you don't have to effectively redo all of your CICD process, all of your tooling, et cetera. And I thought that there was something compelling about that argument.Kelsey: And that's actually what Kubernetes does for a lot of people. For Kubernetes, if you think about it, when we start to talk about workflow consistency, if you want to deploy an application, queue CTL, apply, some config, you want the application to have a load balancer in front of it. Regardless of the cloud provider, because Kubernetes has an extension point we call the cloud provider. And that's where Amazon, Azure, Google Cloud, we do all the heavy lifting of mapping the high-level ingress object that specifies, "I want a load balancer, maybe a few options," to the actual implementation detail. So maybe you don't have to use four or five different tools and that's where that kind of workload portability comes from. Like if you think about Linux, right? It has a set of system calls, for the most part, even if you're using a different distro at this point, Red Hat or Amazon Linux or Google's container optimized Linux.If I build a Go binary on my laptop, I can SCP it to any of those Linux machines and it's going to probably run. So you could call that multi-cloud, but that doesn't make a lot of sense because it's just because of the way Linux works. Kubernetes does something very similar because it sits right on top of Linux, so you get the portability just from the previous example and then you get the other portability and workload, like you just stated, where I'm calling kubectl apply, and I'm using the same workflow to get resources spun up on the various cloud providers. Even if that configuration isn't one-to-one identical.Corey: This episode is sponsored in part by our friends at Uptycs, because they believe that many of you are looking to bolster your security posture with CNAPP and XDR solutions. They offer both cloud and endpoint security in a single UI and data model. Listeners can get Uptycs for up to 1,000 assets through the end of 2023 (that is next year) for $1. But this offer is only available for a limited time on UptycsSecretMenu.com. That's U-P-T-Y-C-S Secret Menu dot com.Corey: One thing I'm curious about is you wind up walking through the world and seeing companies adopting Kubernetes in different ways. How are you finding the adoption of Kubernetes is looking like inside of big E enterprise style companies? I don't have as much insight into those environments as I probably should. That's sort of a focus area for the next year for me. But in startups, it seems that it's either someone goes in and rolls it out and suddenly it's fantastic, or they avoid it entirely and do something serverless. In large enterprises, I see a lot of Kubernetes and a lot of Kubernetes stories coming out of it, but what isn't usually told is, what's the tipping point where they say, "Yeah, let's try this." Or, "Here's the problem we're trying to solve for. Let's chase it."Kelsey: What I see is enterprises buy everything. If you're big enough and you have a big enough IT budget, most enterprises have a POC of everything that's for sale, period. There's some team in some pocket, maybe they came through via acquisition. Maybe they live in a different state. Maybe it's just a new project that came out. And what you tend to see, at least from my experiences, if I walk into a typical enterprise, they may tell me something like, "Hey, we have a POC, a Pivotal Cloud Foundry, OpenShift, and we want some of that new thing that we just saw from you guys. How do we get a POC going?" So there's always this appetite to evaluate what's for sale, right? So, that's one case. There's another case where, when you start to think about an enterprise there's a big range of skillsets. Sometimes I'll go to some companies like, "Oh, my insurance is through that company, and there's ex-Googlers that work there." They used to work on things like Borg, or something else, and they kind of know how these systems work.And they have a slightly better edge at evaluating whether Kubernetes is any good for the problem at hand. And you'll see them bring it in. Now that same company, I could drive over to the other campus, maybe it's five miles away and that team doesn't even know what Kubernetes is. And for them, they're going to be chugging along with what they're currently doing. So then the challenge becomes if Kubernetes is a great fit, how wide of a fit it isn't? How many teams at that company should be using it? So what I'm currently seeing as there are some enterprises that have found a way to make Kubernetes the place where they do a lot of new work, because that makes sense. A lot of enterprises to my surprise though, are actually stepping back and saying, "You know what? We've been stitching together our own platform for the last five years. We had the Netflix stack, we got some Spring Boot, we got Console, we got Vault, we got Docker. And now this whole thing is getting a little more fragile because we're doing all of this glue code."Kubernetes, We've been trying to build our own Kubernetes and now that we know what it is and we know what it isn't, we know that we can probably get rid of this kind of bespoke stack ourselves and just because of the ecosystem, right? If I go to HashiCorp's website, I would probably find the word Kubernetes as much as I find the word Nomad on their site because they've made things like Console and Vault become first-class offerings inside of the world of Kubernetes. So I think it's that momentum that you see across even People Oracle, Juniper, Palo Alto Networks, they're all have seem to have a Kubernetes story. And this is why you start to see the enterprise able to adopt it because it's so much in their face and it's where the ecosystem is going.Corey: It feels like a lot of the excitement and the promise and even the same problems that Kubernetes is aimed at today, could have just as easily been talked about half a decade ago in the context of OpenStack. And for better or worse, OpenStack is nowhere near where it once was. It would felt like it had such promise and such potential and when it didn't pan out, that left a lot of people feeling relatively sad, burnt out, depressed, et cetera. And I'm seeing a lot of parallels today, at least between what was said about OpenStack and what was said about Kubernetes. How do you see those two diverging?Kelsey: I will tell you the big difference that I saw, personally. Just for my personal journey outside of Google, just having that option. And I remember I was working at a company and we were like, "We're going to roll our own OpenStack. We're going to buy a free BSD box and make it a file server. We're going all open sources," like do whatever you want to do. And that was just having so many issues in terms of first-class integrations, education, people with the skills to even do that. And I was like, "You know what, let's just cut the check for VMware." We want virtualization. VMware, for the cost and when it does, it's good enough. Or we can just actually use a cloud provider. That space in many ways was a purely solved problem. Now, let's fast forward to Kubernetes, and also when you get OpenStack finished, you're just back where you started.You got a bunch of VMs and now you've got to go figure out how to build the real platform that people want to use because no one just wants a VM. If you think Kubernetes is low level, just having OpenStack, even OpenStack was perfect. You're still at square one for the most part. Maybe you can just say, "Now I'm paying a little less money for my stack in terms of software licensing costs," but from an extraction and automation and API standpoint, I don't think OpenStack moved the needle in that regard. Now in the Kubernetes world, it's solving a huge gap.Lots of people have virtual machine sprawl than they had Docker sprawl, and when you bring in this thing by Kubernetes, it says, "You know what? Let's reign all of that in. Let's build some first-class abstractions, assuming that the layer below us is a solved problem." You got to remember when Kubernetes came out, it wasn't trying to replace the hypervisor, it assumed it was there. It also assumed that the hypervisor had APIs for creating virtual machines and attaching disc and creating load balancers, so Kubernetes came out as a complementary technology, not one looking to replace. And I think that's why it was able to stick because it solved a problem at another layer where there was not a lot of competition.Corey: I think a more cynical take, at least one of the ones that I've heard articulated and I tend to agree with, was that OpenStack originally seemed super awesome because there were a lot of interesting people behind it, fascinating organizations, but then you wound up looking through the backers of the foundation behind it and the rest. And there were something like 500 companies behind it, an awful lot of them were these giant organizations that ... they were big e-corporate IT enterprise software vendors, and you take a look at that, I'm not going to name anyone because at that point, oh will we get letters.But at that point, you start seeing so many of the patterns being worked into it that it almost feels like it has to collapse under its own weight. I don't, for better or worse, get the sense that Kubernetes is succumbing to the same thing, despite the CNCF having an awful lot of those same backers behind it and as far as I can tell, significantly more money, they seem to have all the money to throw at these sorts of things. So I'm wondering how Kubernetes has managed to effectively sidestep I guess the open-source miasma that OpenStack didn't quite manage to avoid.Kelsey: Kubernetes gained its own identity before the foundation existed. Its purpose, if you think back from the Borg paper almost eight years prior, maybe even 10 years prior. It defined this problem really, really well. I think Mesos came out and also had a slightly different take on this problem. And you could just see at that time there was a real need, you had choices between Docker Swarm, Nomad. It seems like everybody was trying to fill in this gap because, across most verticals or industries, this was a true problem worth solving. What Kubernetes did was played in the exact same sandbox, but it kind of got put out with experience. It's not like, "Oh, let's just copy this thing that already exists, but let's just make it open."And in that case, you don't really have your own identity. It's you versus Amazon, in the case of OpenStack, it's you versus VMware. And that's just really a hard place to be in because you don't have an identity that stands alone. Kubernetes itself had an identity that stood alone. It comes from this experience of running a system like this. It comes from research and white papers. It comes after previous attempts at solving this problem. So we agree that this problem needs to be solved. We know what layer it needs to be solved at. We just didn't get it right yet, so Kubernetes didn't necessarily try to get it right.It tried to start with only the primitives necessary to focus on the problem at hand. Now to your point, the extension interface of Kubernetes is what keeps it small. Years ago I remember plenty of meetings where we all got in rooms and said, "This thing is done." It doesn't need to be a PaaS. It doesn't need to compete with serverless platforms. The core of Kubernetes, like Linux, is largely done. Here's the core objects, and we're going to make a very great extension interface. We're going to make one for the container run time level so that way people can swap that out if they really want to, and we're going to do one that makes other APIs as first-class as ones we have, and we don't need to try to boil the ocean in every Kubernetes release. Everyone else has the ability to deploy extensions just like Linux, and I think that's why we're avoiding some of this tension in the vendor world because you don't have to change the core to get something that feels like a native part of Kubernetes.Corey: What do you think is currently being the most misinterpreted or misunderstood aspect of Kubernetes in the ecosystem?Kelsey: I think the biggest thing that's misunderstood is what Kubernetes actually is. And the thing that made it click for me, especially when I was writing the tutorial Kubernetes The Hard Way. I had to sit down and ask myself, "Where do you start trying to learn what Kubernetes is?" So I start with the database, right? The configuration store isn't Postgres, it isn't MySQL, it's Etcd. Why? Because we're not trying to be this generic data stores platform. We just need to store configuration data. Great. Now, do we let all the components talk to Etcd? No. We have this API server and between the API server and the chosen data store, that's essentially what Kubernetes is. You can stop there. At that point, you have a valid Kubernetes cluster and it can understand a few things. Like I can say, using the Kubernetes command-line tool, create this configuration map that stores configuration data and I can read it back.Great. Now I can't do a lot of things that are interesting with that. Maybe I just use it as a configuration store, but then if I want to build a container platform, I can install the Kubernetes kubelet agent on a bunch of machines and have it talk to the API server looking for other objects you add in the scheduler, all the other components. So what that means is that Kubernetes most important component is its API because that's how the whole system is built. It's actually a very simple system when you think about just those two components in isolation. If you want a container management tool that you need a scheduler, controller, manager, cloud provider integrations, and now you have a container tool. But let's say you want a service mesh platform. Well in a service mesh you have a data plane that can be Nginx or Envoy and that's going to handle routing traffic. And you need a control plane. That's going to be something that takes in configuration and it uses that to configure all the things in a data plane.Well, guess what? Kubernetes is 90% there in terms of a control plane, with just those two components, the API server, and the data store. So now when you want to build control planes, if you start with the Kubernetes API, we call it the API machinery, you're going to be 95% there. And then what do you get? You get a distributed system that can handle kind of failures on the back end, thanks to Etcd. You're going to get our backs or you can have permission on top of your schemas, and there's a built-in framework, we call it custom resource definitions that allows you to articulate a schema and then your own control loops provide meaning to that schema. And once you do those two things, you can build any platform you want. And I think that's one thing that it takes a while for people to understand that part of Kubernetes, that the thing we talk about today, for the most part, is just the first system that we built on top of this.Corey: I think that's a very far-reaching story with implications that I'm not entirely sure I am able to wrap my head around. I hope to see it, I really do. I mean you mentioned about writing Learn Kubernetes the Hard Way and your tutorial, which I'll link to in the show notes. I mean my, of course, sarcastic response to that recently was to register the domain Kubernetes the Easy Way and just re-pointed to Amazon's ECS, which is in no way shape or form Kubernetes and basically has the effect of irritating absolutely everyone as is my typical pattern of behavior on Twitter. But I have been meaning to dive into Kubernetes on a deeper level and the stuff that you've written, not just the online tutorial, both the books have always been my first port of call when it comes to that. The hard part, of course, is there's just never enough hours in the day.Kelsey: And one thing that I think about too is like the web. We have the internet, there's webpages, there's web browsers. Web Browsers talk to web servers over HTTP. There's verbs, there's bodies, there's headers. And if you look at it, that's like a very big complex system. If I were to extract out the protocol pieces, this concept of HTTP verbs, get, put, post and delete, this idea that I can put stuff in a body and I can give it headers to give it other meaning and semantics. If I just take those pieces, I can bill restful API's.Hell, I can even bill graph QL and those are just different systems built on the same API machinery that we call the internet or the web today. But you have to really dig into the details and pull that part out and you can build all kind of other platforms and I think that's what Kubernetes is. It's going to probably take people a little while longer to see that piece, but it's hidden in there and that's that piece that's going to be, like you said, it's going to probably be the foundation for building more control planes. And when people build control planes, I think if you think about it, maybe Fargate for EKS represents another control plane for making a serverless platform that takes to Kubernetes API, even though the implementation isn't what you find on GitHub.Corey: That's the truth. Whenever you see something as broadly adopted as Kubernetes, there's always the question of, "Okay, there's an awful lot of blog posts." Getting started to it, learn it in 10 minutes, I mean at some point, I'm sure there are some people still convince Kubernetes is, in fact, a breakfast cereal based upon what some of the stuff the CNCF has gotten up to. I wouldn't necessarily bet against it socks today, breakfast cereal tomorrow. But it's hard to find a decent level of quality, finding the certain quality bar of a trusted source to get started with is important. Some people believe in the hero's journey, story of a narrative building.I always prefer to go with the morons journey because I'm the moron. I touch technologies, I have no idea what they do and figure it out and go careening into edge and corner cases constantly. And by the end of it I have something that vaguely sort of works and my understanding's improved. But I've gone down so many terrible paths just by picking a bad point to get started. So everyone I've talked to who's actually good at things has pointed to your work in this space as being something that is authoritative and largely correct and given some of these people, that's high praise.Kelsey: Awesome. I'm going to put that on my next performance review as evidence of my success and impact.Corey: Absolutely. Grouchy people say, "It's all right," you know, for the right people that counts. If people want to learn more about what you're up to and see what you have to say, where can they find you?Kelsey: I aggregate most of outward interactions on Twitter, so I'm @KelseyHightower and my DMs are open, so I'm happy to field any questions and I attempt to answer as many as I can.Corey: Excellent. Thank you so much for taking the time to speak with me today. I appreciate it.Kelsey: Awesome. I was happy to be here.Corey: Kelsey Hightower, Principal Developer Advocate at Google. I'm Corey Quinn. This is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on Apple podcasts. If you've hated this podcast, please leave a five-star review on Apple podcasts and then leave a funny comment. Thanks.Announcer: This has been this week's episode of Screaming in the Cloud. You can also find more Core at screaminginthecloud.com or wherever fine snark is sold.Announcer: This has been a HumblePod production. Stay humble.
In this episode of the Virtual Coffee with Ashish edition, we spoke with Justin Garrison (Personal Website) from AWS to talk about what scenarios make sense to choose AWS EKS vs AWS ECS vs AWS Fargate vs bare metal Kubernetes & everything you need to understand for implementing AWS EKS in your environment. --Announcing Cloud Security Villains Project-- We are always looking to find creative ways to educate folks in Cloud Security and the Cloud Security Villains is part of this education pieces. Cloud Security Villains are coming, you can learn how to defeat them in this YouTube Playlist link Episode ShowNotes, Links and Transcript on Cloud Security Podcast: www.cloudsecuritypodcast.tv Host Twitter: Ashish Rajan (@hashishrajan) Guest Twitter: Justin Garrison (Personal Website) Podcast Twitter - @CloudSecPod @CloudSecureNews If you want to watch videos of this LIVE STREAMED episode and past episodes - Check out our other Cloud Security Social Channels: - Cloud Security News - Cloud Security Academy Spotify TimeStamp for Interview Questions (00:00 introduction (02:31 snyk.io/csp (03:10 Justin's path into Tech (08:14) What is AWS EKS? (10:32) EKS vs ECS vs Fargate (14:52) Why pick EKS vs ECS vs Fargate? (23:05) Security Kubernetes API vs on-prem deployment? (34:26) What's involved in deploying EKS? (38:50) EKS clusters when scaling Kubernetes (42:52) How clusters are structured? (47:02) Cluster availability when upgrading (49:00) Why people struggle with EKS? (51:31) How can people learn more about EKS? (52:57) The Fun Section
About RickRick is the Product Leader of the AWS Optimization team. He previously led the cloud optimization product organization at Turbonomic, and previously was the Microsoft Azure Resource Optimization program owner.Links Referenced: AWS: https://console.aws.amazon.com LinkedIn: https://www.linkedin.com/in/rick-ochs-06469833/ Twitter: https://twitter.com/rickyo1138 TranscriptAnnouncer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.Corey: This episode is sponsored in part by our friends at Chronosphere. Tired of observability costs going up every year without getting additional value? Or being locked in to a vendor due to proprietary data collection, querying and visualization? Modern day, containerized environments require a new kind of observability technology that accounts for the massive increase in scale and attendant cost of data. With Chronosphere, choose where and how your data is routed and stored, query it easily, and get better context and control. 100% open source compatibility means that no matter what your setup is, they can help. Learn how Chronosphere provides complete and real-time insight into ECS, EKS, and your microservices, whereever they may be at snark.cloud/chronosphere That's snark.cloud/chronosphere Corey: This episode is bought to you in part by our friends at Veeam. Do you care about backups? Of course you don't. Nobody cares about backups. Stop lying to yourselves! You care about restores, usually right after you didn't care enough about backups. If you're tired of the vulnerabilities, costs and slow recoveries when using snapshots to restore your data, assuming you even have them at all living in AWS-land, there is an alternative for you. Check out Veeam, thats V-E-E-A-M for secure, zero-fuss AWS backup that won't leave you high and dry when it's time to restore. Stop taking chances with your data. Talk to Veeam. My thanks to them for sponsoring this ridiculous podcast.Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. For those of you who've been listening to this show for a while, the theme has probably emerged, and that is that one of the key values of this show is to give the guest a chance to tell their story. It doesn't beat the guests up about how they approach things, it doesn't call them out for being completely wrong on things because honestly, I'm pretty good at choosing guests, and I don't bring people on that are, you know, walking trash fires. And that is certainly not a concern for this episode.But this might devolve into a screaming loud argument, despite my best effort. Today, I'm joined by Rick Ochs, Principal Product Manager at AWS. Rick, thank you for coming back on the show. The last time we spoke, you were not here you were at, I believe it was Turbonomic.Rick: Yeah, that's right. Thanks for having me on the show, Corey. I'm really excited to talk to you about optimization and my current role and what we're doing.Corey: Well, let's start at the beginning. Principal product manager. It sounds like one of those corporate titles that can mean a different thing in every company or every team that you're talking to. What is your area of responsibility? Where do you start and where do you stop?Rick: Awesome. So, I am the product manager lead for all of AWS Optimizations Team. So, I lead the product team. That includes several other product managers that focus in on Compute Optimizer, Cost Explorer, right-sizing recommendations, as well as Reservation and Savings Plan purchase recommendations.Corey: In other words, you are the person who effectively oversees all of the AWS cost optimization tooling and approaches to same?Rick: Yeah.Corey: Give or take. I mean, you could argue that oh, every team winds up focusing on helping customers save money. I could fight that argument just as effectively. But you effectively start and stop with respect to helping customers save money or understand where the money is going on their AWS bill.Rick: I think that's a fair statement. And I also agree with your comment that I think a lot of service teams do think through those use cases and provide capabilities, you know? There's, like, S3 storage lines. You know, there's all sorts of other products that do offer optimization capabilities as well, but as far as, like, the unified purpose of my team, it is, unilaterally focused on how do we help customers safely reduce their spend and not hurt their business at the same time.Corey: Safely being the key word. For those who are unaware of my day job, I am a partial owner of The Duckbill Group, a consultancy where we fix exactly one problem: the horrifying AWS bill. This is all that I've been doing for the last six years, so I have some opinions on AWS bill reduction as well. So, this is going to be a fun episode for the two of us to wind up, mmm, more or less smacking each other around, but politely because we are both professionals. So, let's start with a very high level. How does AWS think about AWS bills from a customer perspective? You talk about optimizing it, but what does that mean to you?Rick: Yeah. So, I mean, there's a lot of ways to think about it, especially depending on who I'm talking to, you know, where they sit in our organization. I would say I think about optimization in four major themes. The first is how do you scale correctly, whether that's right-sizing or architecting things to scale in and out? The second thing I would say is, how do you do pricing and discounting, whether that's Reservation management, Savings Plan Management, coverage, how do you handle the expenditures of prepayments and things like that?Then I would say suspension. What that means is turn the lights off when you leave the room. We have a lot of customers that do this and I think there's a lot of opportunity for more. Turning EC2 instances off when they're not needed if they're non-production workloads or other, sort of, stateful services that charge by the hour, I think there's a lot of opportunity there.And then the last of the four methods is clean up. And I think it's maybe one of the lowest-hanging fruit, but essentially, are you done using this thing? Delete it. And there's a whole opportunity of cleaning up, you know, IP addresses unattached EBS volumes, sort of, these resources that hang around in AWS accounts that sort of getting lost and forgotten as well. So, those are the four kind of major thematic strategies for how to optimize a cloud environment that we think about and spend a lot of time working on.Corey: I feel like there's—or at least the way that I approach these things—that there are a number of different levels you can look at AWS billing constructs on. The way that I tend to structure most of my engagements when I'm working with clients is we come in and, step one: cool. Why do you care about the AWS bill? It's a weird question to ask because most of the engineering folks look at me like I've just grown a second head. Like, “So, why do you care about your AWS bill?” Like, “What? Why do you? You run a company doing this?”It's no, no, no, it's not that I'm being rhetorical and I don't—I'm trying to be clever somehow and pretend that I don't understand all the nuances around this, but why does your business care about lowering the AWS bill? Because very often, the answer is they kind of don't. What they care about from a business perspective is being able to accurately attribute costs for the service or good that they provide, being able to predict what that spend is going to be, and also yes, a sense of being good stewards of the money that has been entrusted to them by via investors, public markets, or the budget allocation process of their companies and make sure that they're not doing foolish things with it. And that makes an awful lot of sense. It is rare at the corporate level that the stated number one concern is make the bills lower.Because at that point, well, easy enough. Let's just turn off everything you're running in production. You'll save a lot of money in your AWS bill. You won't be in business anymore, but you'll be saving a lot of money on the AWS bill. The answer is always deceptively nuanced and complicated.At least, that's how I see it. Let's also be clear that I talk with a relatively narrow subset of the AWS customer totality. The things that I do are very much intentionally things that do not scale. Definitionally, everything that you do has to scale. How do you wind up approaching this in ways that will work for customers spending billions versus independent learners who are paying for this out of their own personal pocket?Rick: It's not easy [laugh], let me just preface that. The team we have is incredible and we spent so much time thinking about scale and the different personas that engage with our products and how they're—what their experience is when they interact with a bill or AWS platform at large. There's also a couple of different personas here, right? We have a persona that focuses in on that cloud cost, the cloud bill, the finance, whether that's—if an organization is created a FinOps organization, if they have a Cloud Center of Excellence, versus an engineering team that maybe has started to go towards decentralized IT and has some accountability for the spend that they attribute to their AWS bill. And so, these different personas interact with us in really different ways, where Cost Explorer downloading the CUR and taking a look at the bill.And one thing that I always kind of imagine is somebody putting a headlamp on and going into the caves in the depths of their AWS bill and kind of like spelunking through their bill sometimes, right? And so, you have these FinOps folks and billing people that are deeply interested in making sure that the spend they do have meets their business goals, meaning this is providing high value to our company, it's providing high value to our customers, and we're spending on the right things, we're spending the right amount on the right things. Versus the engineering organization that's like, “Hey, how do we configure these resources? What types of instances should we be focused on using? What services should we be building on top of that maybe are more flexible for our business needs?”And so, there's really, like, two major personas that I spend a lot of time—our organization spends a lot of time wrapping our heads around. Because they're really different, very different approaches to how we think about cost. Because you're right, if you just wanted to lower your AWS bill, it's really easy. Just size everything to a t2.nano and you're done and move on [laugh], right? But you're [crosstalk 00:08:53]—Corey: Aw, t3 or t4.nano, depending upon whether regional availability is going to save you less. I'm still better at this. Let's not kid ourselves I kid. Mostly.Rick: For sure. So t4.nano, absolutely.Corey: T4g. Remember, now the way forward is everything has an explicit letter designator to define which processor company made the CPU that underpins the instance itself because that's a level of abstraction we certainly wouldn't want the cloud provider to take away from us any.Rick: Absolutely. And actually, the performance differences of those different processor models can be pretty incredible [laugh]. So, there's huge decisions behind all of that as well.Corey: Oh, yeah. There's so many factors that factor in all these things. It's gotten to a point of you see this usually with lawyers and very senior engineers, but the answer to almost everything is, “It depends.” There are always going to be edge cases. Easy example of, if you check a box and enable an S3 Gateway endpoint inside of a private subnet, suddenly, you're not passing traffic through a 4.5 cent per gigabyte managed NAT Gateway; it's being sent over that endpoint for no additional cost whatsoever.Check the box, save a bunch of money. But there are scenarios where you don't want to do it, so always double-checking and talking to customers about this is critically important. Just because, the first time you make a recommendation that does not work for their constraints, you lose trust. And make a few of those and it looks like you're more or less just making naive recommendations that don't add any value, and they learn to ignore you. So, down the road, when you make a really high-value, great recommendation for them, they stop paying attention.Rick: Absolutely. And we have that really high bar for recommendation accuracy, especially with right sizing, that's such a key one. Although I guess Savings Plan purchase recommendations can be critical as well. If a customer over commits on the amount of Savings Plan purchase they need to make, right, that's a really big problem for them.So, recommendation accuracy must be above reproach. Essentially, if a customer takes a recommendation and it breaks an application, they're probably never going to take another right-sizing recommendation again [laugh]. And so, this bar of trust must be exceptionally high. That's also why out of the box, the compute optimizer recommendations can be a little bit mild, they're a little time because the first order of business is do no harm; focus on the performance requirements of the application first because we have to make sure that the reason you build these workloads in AWS is served.Now ideally, we do that without overspending and without overprovisioning the capacity of these workloads, right? And so, for example, like if we make these right-sizing recommendations from Compute Optimizer, we're taking a look at the utilization of CPU, memory, disk, network, throughput, iops, and we're vending these recommendations to customers. And when you take that recommendation, you must still have great application performance for your business to be served, right? It's such a crucial part of how we optimize and run long-term. Because optimization is not a one-time Band-Aid; it's an ongoing behavior, so it's really critical that for that accuracy to be exceptionally high so we can build business process on top of it as well.Corey: Let me ask you this. How do you contextualize what the right approach to optimization is? What is your entire—there are certain tools that you have… by ‘you,' I mean, of course, as an organization—have repeatedly gone back to and different approaches that don't seem to deviate all that much from year to year, and customer to customer. How do you think about the general things that apply universally?Rick: So, we know that EC2 is a very popular service for us. We know that sizing EC2 is difficult. We think about that optimization pillar of scaling. It's an obvious area for us to help customers. We run into this sort of industry-wide experience where whenever somebody picks the size of a resource, they're going to pick one generally larger than they need.It's almost like asking a new employee at your company, “Hey, pick your laptop. We have a 16 gig model or a 32 gig model. Which one do you want?” That person [laugh] making the decision on capacity, hardware capacity, they're always going to pick the 32 gig model laptop, right? And so, we have this sort of human nature in IT of, we don't want to get called at two in the morning for performance issues, we don't want our apps to fall over, we want them to run really well, so we're going to size things very conservatively and we're going to oversize things.So, we can help customers by providing those recommendations to say, you can size things up in a different way using math and analytics based on the utilization patterns, and we can provide and pick different instance types. There's hundreds and hundreds of instance types in all of these regions across the globe. How do you know which is the right one for every single resource you have? It's a very, very hard problem to solve and it's not something that is lucrative to solve one by one if you have 100 EC2 instances. Trying to pick the correct size for each and every one can take hours and hours of IT engineering resources to look at utilization graphs, look at all of these types available, look at what is the performance difference between processor models and providers of those processors, is there application compatibility constraints that I have to consider? The complexity is astronomical.And then not only that, as soon as you make that sizing decision, one week later, it's out of date and you need a different size. So, [laugh] you didn't really solve the problem. So, we have to programmatically use data science and math to say, “Based on these utilization values, these are the sizes that would make sense for your business, that would have the lowest cost and the highest performance together at the same time.” And it's super important that we provide this capability from a technology standpoint because it would cost so much money to try to solve that problem that the savings you would achieve might not be meaningful. Then at the same time… you know, that's really from an engineering perspective, but when we talk to the FinOps and the finance folks, the conversations are more about Reservations and Savings Plans.How do we correctly apply Savings Plans and Reservations across a high percentage of our portfolio to reduce the costs on those workloads, but not so much that dynamic capacity levels in our organization mean we all of a sudden have a bunch of unused Reservations or Savings Plans? And so, a lot of organizations that engage with us and we have conversations with, we start with the Reservation and Savings Plan conversation because it's much easier to click a few buttons and buy a Savings Plan than to go institute an entire right-sizing campaign across multiple engineering teams. That can be very difficult, a much higher bar. So, some companies are ready to dive into the engineering task of sizing; some are not there yet. And they're a little maybe a little earlier in their FinOps journey, or the building optimization technology stacks, or achieving higher value out of their cloud environments, so starting with kind of the low hanging fruit, it can vary depending on the company, size of company, technical aptitudes, skill sets, all sorts of things like that.And so, those finance-focused teams are definitely spending more time looking at and studying what are the best practices for purchasing Savings Plans, covering my environment, getting the most out of my dollar that way. Then they don't have to engage the engineering teams; they can kind of take a nice chunk off the top of their bill and sort of have something to show for that amount of effort. So, there's a lot of different approaches to start in optimization.Corey: My philosophy runs somewhat counter to this because everything you're saying does work globally, it's safe, it's non-threatening, and then also really, on some level, feels like it is an approach that can be driven forward by finance or business. Whereas my worldview is that cost and architecture in cloud are one and the same. And there are architectural consequences of cost decisions and vice versa that can be adjusted and addressed. Like, one of my favorite party tricks—although I admit, it's a weird party—is I can look at the exploded PDF view of a customer's AWS bill and describe their architecture to them. And people have questioned that a few times, and now I have a testimonial on my client website that mentions, “It was weird how he was able to do this.”Yeah, it's real, I can do it. And it's not a skill, I would recommend cultivating for most people. But it does also mean that I think I'm onto something here, where there's always context that needs to be applied. It feels like there's an entire ecosystem of product companies out there trying to build what amount to a better Cost Explorer that also is not free the way that Cost Explorer is. So, the challenge I see there's they all tend to look more or less the same; there is very little differentiation in that space. And in the fullness of time, Cost Explorer does—ideally—get better. How do you think about it?Rick: Absolutely. If you're looking at ways to understand your bill, there's obviously Cost Explorer, the CUR, that's a very common approach is to take the CUR and put a BI front-end on top of it. That's a common experience. A lot of companies that have chops in that space will do that themselves instead of purchasing a third-party product that does do bill breakdown and dissemination. There's also the cross-charge show-back organizational breakdown and boundaries because you have these super large organizations that have fiefdoms.You know, if HR IT and sales IT, and [laugh] you know, product IT, you have all these different IT departments that are fiefdoms within your AWS bill and construct, whether they have different ABS accounts or say different AWS organizations sometimes, right, it can get extremely complicated. And some organizations require the ability to break down their bill based on those organizational boundaries. Maybe tagging works, maybe it doesn't. Maybe they do that by using a third-party product that lets them set custom scopes on their resources based on organizational boundaries. That's a common approach as well.We do also have our first-party solutions, they can do that, like the CUDOS dashboard as well. That's something that's really popular and highly used across our customer base. It allows you to have kind of a dashboard and customizable view of your AWS costs and, kind of, split it up based on tag organizational value, account name, and things like that as well. So, you mentioned that you feel like the architectural and cost problem is the same problem. I really don't disagree with that at all.I think what it comes down to is some organizations are prepared to tackle the architectural elements of cost and some are not. And it really comes down to how does the customer view their bill? Is it somebody in the finance organization looking at the bill? Is it somebody in the engineering organization looking at the bill? Ideally, it would be both.Ideally, you would have some of those skill sets that overlap, or you would have an organization that does focus in on FinOps or cloud operations as it relates to cost. But then at the same time, there are organizations that are like, “Hey, we need to go to cloud. Our CIO told us go to cloud. We don't want to pay the lease renewal on this building.” There's a lot of reasons why customers move to cloud, a lot of great reasons, right? Three major reasons you move to cloud: agility, [crosstalk 00:20:11]—Corey: And several terrible ones.Rick: Yeah, [laugh] and some not-so-great ones, too. So, there's so many different dynamics that get exposed when customers engage with us that they might or might not be ready to engage on the architectural element of how to build hyperscale systems. So, many of these customers are bringing legacy workloads and applications to the cloud, and something like a re-architecture to use stateless resources or something like Spot, that's just not possible for them. So, how can they take 20% off the top of their bill? Savings Plans or Reservations are kind of that easy, low-hanging fruit answer to just say, “We know these are fairly static environments that don't change a whole lot, that are going to exist for some amount of time.”They're legacy, you know, we can't turn them off. It doesn't make sense to rewrite these applications because they just don't change, they don't have high business value, or something like that. And so, the architecture part of that conversation doesn't always come into play. Should it? Yes.The long-term maturity and approach for cloud optimization does absolutely account for architecture, thinking strategically about how you do scaling, what services you're using, are you going down the Kubernetes path, which I know you're going to laugh about, but you know, how do you take these applications and componentize them? What services are you using to do that? How do you get that long-term scale and manageability out of those environments? Like you said at the beginning, the complexity is staggering and there's no one unified answer. That's why there's so many different entrance paths into, “How do I optimize my AWS bill?”There's no one answer, and every customer I talk to has a different comfort level and appetite. And some of them have tried suspension, some of them have gone heavy down Savings Plans, some of them want to dabble in right-sizing. So, every customer is different and we want to provide those capabilities for all of those different customers that have different appetites or comfort levels with each of these approaches.Corey: This episode is sponsored in part by our friends at Redis, the company behind the incredibly popular open source database. If you're tired of managing open source Redis on your own, or if you are looking to go beyond just caching and unlocking your data's full potential, these folks have you covered. Redis Enterprise is the go-to managed Redis service that allows you to reimagine how your geo-distributed applications process, deliver, and store data. To learn more from the experts in Redis how to be real-time, right now, from anywhere, visit redis.com/duckbill. That's R - E - D - I - S dot com slash duckbill.Corey: And I think that's very fair. I think that it is not necessarily a bad thing that you wind up presenting a lot of these options to customers. But there are some rough edges. An example of this is something I encountered myself somewhat recently and put on Twitter—because I have those kinds of problems—where originally, I remember this, that you were able to buy hourly Savings Plans, which again, Savings Plans are great; no knock there. I would wish that they applied to more services rather than, “Oh, SageMaker is going to do its own Savings Pla”—no, stop keeping me from going from something where I have to manage myself on EC2 to something you manage for me and making that cost money. You nailed it with Fargate. You nailed it with Lambda. Please just have one unified Savings Plan thing. But I digress.But you had a limit, once upon a time, of $1,000 per hour. Now, it's $5,000 per hour, which I believe in a three-year all-up-front means you will cheerfully add $130 million purchase to your shopping cart. And I kept adding a bunch of them and then had a little over a billion dollars a single button click away from being charged to my account. Let me begin with what's up with that?Rick: [laugh]. Thank you for the tweet, by the way, Corey.Corey: Always thrilled to ruin your month, Rick. You know that.Rick: Yeah. Fantastic. We took that tweet—you know, it was tongue in cheek, but also it was a serious opportunity for us to ask a question of what does happen? And it's something we did ask internally and have some fun conversations about. I can tell you that if you clicked purchase, it would have been declined [laugh]. So, you would have not been—Corey: Yeah, American Express would have had a problem with that. But the question is, would you have attempted to charge American Express, or would something internally have gone, “This has a few too many commas for us to wind up presenting it to the card issuer with a straight face?”Rick: [laugh]. Right. So, it wouldn't have gone through and I can tell you that, you know, if your account was on a PO-based configuration, you know, it would have gone to the account team. And it would have gone through our standard process for having a conversation with our customer there. That being said, we are—it's an awesome opportunity for us to examine what is that shopping cart experience.We did increase the limit, you're right. And we increased the limit for a lot of reasons that we sat down and worked through, but at the same time, there's always an opportunity for improvement of our product and experience, we want to make sure that it's really easy and lightweight to use our products, especially purchasing Savings Plans. Savings Plans are already kind of wrought with mental concern and risk of purchasing something so expensive and large that has a big impact on your AWS bill, so we don't really want to add any more friction necessarily the process but we do want to build an awareness and make sure customers understand, “Hey, you're purchasing this. This has a pretty big impact.” And so, we're also looking at other ways we can kind of improve the ability for the Savings Plan shopping cart experience to ensure customers don't put themselves in a position where you have to unwind or make phone calls and say, “Oops.” Right? We [laugh] want to avoid those sorts of situations for our customers. So, we are looking at quite a few additional improvements to that experience as well that I'm really excited about that I really can't share here, but stay tuned.Corey: I am looking forward to it. I will say the counterpoint to that is having worked with customers who do make large eight-figure purchases at once, there's a psychology element that plays into it. Everyone is very scared to click the button on the ‘Buy It Now' thing or the ‘Approve It.' So, what I've often found is at that scale, one, you can reduce what you're buying by half of it, and then see how that treats you and then continue to iterate forward rather than doing it all at once, or reach out to your account team and have them orchestrate the buy. In previous engagements, I had a customer do this religiously and at one point, the concierge team bought the wrong thing in the wrong region, and from my perspective, I would much rather have AWS apologize for that and fix it on their end, than from us having to go with a customer side of, “Oh crap, oh, crap. Please be nice to us.”Not that I doubt you would do it, but that's not the nervous conversation I want to have in quite the same way. It just seems odd to me that someone would want to make that scale of purchase without ever talking to a human. I mean, I get it. I'm as antisocial as they come some days, but for that kind of money, I kind of just want another human being to validate that I'm not making a giant mistake.Rick: We love that. That's such a tremendous opportunity for us to engage and discuss with an organization that's going to make a large commitment, that here's the impact, here's how we can help. How does it align to our strategy? We also do recommend, from a strategic perspective, those more incremental purchases. I think it creates a better experience long-term when you don't have a single Savings Plan that's going to expire on a specific day that all of a sudden increases your entire bill by a significant percentage.So, making staggered monthly purchases makes a lot of sense. And it also works better for incremental growth, right? If your organization is growing 5% month-over-month or year-over-year or something like that, you can purchase those incremental Savings Plans that sort of stack up on top of each other and then you don't have that risk of a cliff one day where one super-large SP expires and boom, you have to scramble and repurchase within minutes because every minute that goes by is an additional expense, right? That's not a great experience. And so that's, really, a large part of why those staggered purchase experiences make a lot of sense.That being said, a lot of companies do their math and their finance in different ways. And single large purchases makes sense to go through their process and their rigor as well. So, we try to support both types of purchasing patterns.Corey: I think that is an underappreciated aspect of cloud cost savings and cloud cost optimization, where it is much more about humans than it is about math. I see this most notably when I'm helping customers negotiate their AWS contracts with AWS, where they are often perspectives such as, “Well, we feel like we really got screwed over last time, so we want to stick it to them and make them give us a bigger percentage discount on something.” And it's like, look, you can do that, but I would much rather, if it were me, go for something that moves the needle on your actual business and empowers you to move faster, more effectively, and lead to an outcome that is a positive for everyone versus the well, we're just going to be difficult in this one point because they were difficult on something last time. But ego is a thing. Human psychology is never going to have an API for it. And again, customers get to decide their own destiny in some cases.Rick: I completely agree. I've actually experienced that. So, this is the third company I've been working at on Cloud optimization. I spent several years at Microsoft running an optimization program. I went to Turbonomic for several years, building out the right-sizing and savings plan reservation purchase capabilities there, and now here at AWS.And through all of these journeys and experiences working with companies to help optimize their cloud spend, I can tell you that the psychological needle—moving the needle is significantly harder than the technology stack of sizing something correctly or deleting something that's unused. We can solve the technology part. We can build great products that identify opportunities to save money. There's still this psychological component of IT, for the last several decades has gone through this maturity curve of if it's not broken, don't touch it. Five-nines, six sigma, all of these methods of IT sort of rationalizing do no harm, don't touch anything, everything must be up.And it even kind of goes back several decades. Back when if you rebooted a physical server, the motherboard capacitors would pop, right? So, there's even this anti—or this stigma against even rebooting servers sometimes. In the cloud really does away with a lot of that stuff because we have live migration and we have all of these, sort of, stateless designs and capabilities, but we still carry along with us this mentality of don't touch it; it might fall over. And we have to really get past that.And that means that the trust, we went back to the trust conversation where we talk about the recommendations must be incredibly accurate. You're risking your job, in some cases; if you are a DevOps engineer, and your commitments on your yearly goals are uptime, latency, response time, load time, these sorts of things, these operational metrics, KPIs that you use, you don't want to take a downsized recommendation. It has a severe risk of harming your job and your bonus.Corey: “These instances are idle. Turn them off.” It's like, yeah, these instances are the backup site, or the DR environment, or—Rick: Exactly.Corey: —something that takes very bursty but occasional traffic. And yeah, I know it costs us some money, but here's the revenue figures for having that thing available. Like, “Oh, yeah. Maybe we should shut up and not make dumb recommendations around things,” is the human response, but computers don't have that context.Rick: Absolutely. And so, the accuracy and trust component has to be the highest bar we meet for any optimization activity or behavior. We have to circumvent or supersede the human aversion, the risk aversion, that IT is built on, right?Corey: Oh, absolutely. And let's be clear, we see this all the time where I'm talking to customers and they have been burned before because we tried to save money and then we took a production outage as a side effect of a change that we made, and now we're not allowed to try to save money anymore. And there's a hidden truth in there, which is auto-scaling is something that a lot of customers talk about, but very few have instrumented true auto-scaling because they interpret is we can scale up to meet demand. Because yeah, if you don't do that you're dropping customers on the floor.Well, what about scaling back down again? And the answer there is like, yeah, that's not really a priority because it's just money. We're not disappointing customers, causing brand reputation, and we're still able to take people's money when that happens. It's only money; we can fix it later. Covid shined a real light on a lot of the stuff just because there are customers that we've spoken to who's—their user traffic dropped off a cliff, infrastructure spend remained constant day over day.And yeah, they believe, genuinely, they were auto-scaling. The most interesting lies are the ones that customers tell themselves, but the bill speaks. So, getting a lot of modernization traction from things like that was really neat to watch. But customers I don't think necessarily intuitively understand most aspects of their bill because it is a multidisciplinary problem. It's engineering, its finance, its accounting—which is not the same thing as finance—and you need all three of those constituencies to be able to communicate effectively using a shared and common language. It feels like we're marriage counseling between engineering and finance, most weeks.Rick: Absolutely, we are. And it's important we get it right, that the data is accurate, that the recommendations we provide are trustworthy. If the finance team gets their hands on the savings potential they see out of right-sizing, takes it to engineering, and then engineering comes back and says, “No, no, no, we can't actually do that. We can't actually size those,” right, we have problems. And they're cultural, they're transformational. Organizations' appetite for these things varies greatly and so it's important that we address that problem from all of those angles. And it's not easy to do.Corey: How big do you find the optimization problem is when you talk to customers? How focused are they on it? I have my answers, but that's the scale of anec-data. I want to hear your actual answer.Rick: Yeah. So, we talk with a lot of customers that are very interested in optimization. And we're very interested in helping them on the journey towards having an optimal estate. There are so many nuances and barriers, most of them psychological like we already talked about.I think there's this opportunity for us to go do better exposing the potential of what an optimal AWS estate would look like from a dollar and savings perspective. And so, I think it's kind of not well understood. I think it's one of the biggest areas or barriers of companies really attacking the optimization problem with more vigor is if they knew that the potential savings they could achieve out of their AWS environment would really align their spend much more closely with the business value they get, I think everybody would go bonkers. And so, I'm really excited about us making progress on exposing that capability or the total savings potential and amount. It's something we're looking into doing in a much more obvious way.And we're really excited about customers doing that on AWS where they know they can trust AWS to get the best value for their cloud spend, that it's a long-term good bet because their resources that they're using on AWS are all focused on giving business value. And that's the whole key. How can we align the dollars to the business value, right? And I think optimization is that connection between those two concepts.Corey: Companies are generally not going to greenlight a project whose sole job is to save money unless there's something very urgent going on. What will happen is as they iterate forward on the next generation of services or a migration of a service from one thing to another, they will make design decisions that benefit those optimizations. There's low-hanging fruit we can find, usually of the form, “Turn that thing off,” or, “Configure this thing slightly differently,” that doesn't take a lot of engineering effort in place. But, on some level, it is not worth the engineering effort it takes to do an optimization project. We've all met those engineers—speaking is one of them myself—who, left to our own devices, will spend two months just knocking a few hundred bucks a month off of our AWS developer environment.We steal more than office supplies. I'm not entirely sure what the business value of doing that is, in most cases. For me, yes, okay, things that work in small environments work very well in large environments, generally speaking, so I learned how to save 80 cents here and that's a few million bucks a month somewhere else. Most folks don't have that benefit happening, so it's a question of meeting them where they are.Rick: Absolutely. And I think the scale component is huge, which you just touched on. When you're talking about a hundred EC2 instances versus a thousand, optimization becomes kind of a different component of how you manage that AWS environment. And while single-decision recommendations to scale an individual server, the dollar amount might be different, the percentages are just about the same when you look at what is it to be sized correctly, what is it to be configured correctly? And so, it really does come down to priority.And so, it's really important to really support all of those companies of all different sizes and industries because they will have different experiences on AWS. And some will have more sensitivity to cost than others, but all of them want to get great business value out of their AWS spend. And so, as long as we're meeting that need and we're supporting our customers to make sure they understand the commitment we have to ensuring that their AWS spend is valuable, it is meaningful, right, they're not spending money on things that are not adding value, that's really important to us.Corey: I do want to have as the last topic of discussion here, how AWS views optimization, where there have been a number of repeated statements where helping customers optimize their cloud spend is extremely important to us. And I'm trying to figure out where that falls on the spectrum from, “It's the thing we say because they make us say it, but no, we're here to milk them like cows,” all the way on over to, “No, no, we passionately believe in this at every level, top to bottom, in every company. We are just bad at it.” So, I'm trying to understand how that winds up being expressed from your lived experience having solved this problem first outside, and then inside.Rick: Yeah. So, it's kind of like part of my personal story. It's the main reason I joined AWS. And, you know, when you go through the interview loops and you talk to the leaders of an organization you're thinking about joining, they always stop at the end of the interview and ask, “Do you have any questions for us?” And I asked that question to pretty much every single person I interviewed with. Like, “What is AWS's appetite for helping customers save money?”Because, like, from a business perspective, it kind of is a little bit wonky, right? But the answers were varied, and all of them were customer-obsessed and passionate. And I got this sense that my personal passion for helping companies have better efficiency of their IT resources was an absolute primary goal of AWS and a big element of Amazon's leadership principle, be customer obsessed. Now, I'm not a spokesperson, so [laugh] we'll see, but we are deeply interested in making sure our customers have a great long-term experience and a high-trust relationship. And so, when I asked these questions in these interviews, the answers were all about, “We have to do the right thing for the customer. It's imperative. It's also in our DNA. It's one of the most important leadership principles we have to be customer-obsessed.”And it is the primary reason why I joined: because of that answer to that question. Because it's so important that we achieve a better efficiency for our IT resources, not just for, like, AWS, but for our planet. If we can reduce consumption patterns and usage across the planet for how we use data centers and all the power that goes into them, we can talk about meaningful reductions of greenhouse gas emissions, the cost and energy needed to run IT business applications, and not only that, but most all new technology that's developed in the world seems to come out of a data center these days, we have a real opportunity to make a material impact to how much resource we use to build and use these things. And I think we owe it to the planet, to humanity, and I think Amazon takes that really seriously. And I'm really excited to be here because of that.Corey: As I recall—and feel free to make sure that this comment never sees the light of day—you asked me before interviewing for the role and then deciding to accept it, what I thought about you working there and whether I would recommend it, whether I wouldn't. And I think my answer was fairly nuanced. And you're working there now and we still are on speaking terms, so people can probably guess what my comments took the shape of, generally speaking. So, I'm going to have to ask now; it's been, what, a year since you joined?Rick: Almost. I think it's been about eight months.Corey: Time during a pandemic is always strange. But I have to ask, did I steer you wrong?Rick: No. Definitely not. I'm very happy to be here. The opportunity to help such a broad range of companies get more value out of technology—and it's not just cost, right, like we talked about. It's actually not about the dollar number going down on a bill. It's about getting more value and moving the needle on how do we efficiently use technology to solve business needs.And that's been my career goal for a really long time, I've been working on optimization for, like, seven or eight, I don't know, maybe even nine years now. And it's like this strange passion for me, this combination of my dad taught me how to be a really good steward of money and a great budget manager, and then my passion for technology. So, it's this really cool combination of, like, childhood life skills that really came together for me to create a career that I'm really passionate about. And this move to AWS has been such a tremendous way to supercharge my ability to scale my personal mission, and really align it to AWS's broader mission of helping companies achieve more with cloud platforms, right?And so, it's been a really nice eight months. It's been wild. Learning AWS culture has been wild. It's a sharp diverging culture from where I've been in the past, but it's also really cool to experience the leadership principles in action. They're not just things we put on a website; they're actually things people talk about every day [laugh]. And so, that journey has been humbling and a great learning opportunity as well.Corey: If people want to learn more, where's the best place to find you?Rick: Oh, yeah. Contact me on LinkedIn or Twitter. My Twitter account is @rickyo1138. Let me know if you get the 1138 reference. That's a fun one.Corey: THX 1138. Who doesn't?Rick: Yeah, there you go. And it's hidden in almost every single George Lucas movie as well. You can contact me on any of those social media platforms and I'd be happy to engage with anybody that's interested in optimization, cloud technology, bill, anything like that. Or even not [laugh]. Even anything else, either.Corey: Thank you so much for being so generous with your time. I really appreciate it.Rick: My pleasure, Corey. It was wonderful talking to you.Corey: Rick Ochs, Principal Product Manager at AWS. I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice, along with an angry comment, rightly pointing out that while AWS is great and all, Azure is far more cost-effective for your workloads because, given their lack security, it is trivially easy to just run your workloads in someone else's account.Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.Announcer: This has been a HumblePod production. Stay humble.
AWS Lambda is one of the most famous AWS services these days. If you are just starting with your cloud journey you might be confused about what Lambda actually is, what are the limitations, and when you should be using it or not. In this episode, we provide a beginner-friendly introduction to Lambda and summarise everything there's to know about it: when to use it and when not, differences with containers, the pricing model, limitations, and integrations. By the end of this episode, we will also chime in with some of our opinions and share whether we believe that Lambda is the future of cloud computing or not!
About ChrisChris Short has been a proponent of open source solutions throughout his over two decades in various IT disciplines, including systems, security, networks, DevOps management, and cloud native advocacy across the public and private sectors. He currently works on the Kubernetes team at Amazon Web Services and is an active Kubernetes contributor and Co-chair of OpenGitOps. Chris is a disabled US Air Force veteran living with his wife and son in Greater Metro Detroit. Chris writes about Cloud Native, DevOps, and other topics at ChrisShort.net. He also runs the Cloud Native, DevOps, GitOps, Open Source, industry news, and culture focused newsletter DevOps'ish.Links Referenced: DevOps'ish: https://devopsish.com/ EKS News: https://eks.news/ Containers from the Couch: https://containersfromthecouch.com opengitops.dev: https://opengitops.dev ChrisShort.net: https://chrisshort.net Twitter: https://twitter.com/ChrisShort TranscriptAnnouncer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. Coming back to us since episode two—it's always nice to go back and see the where are they now type of approach—I am joined by Senior Developer Advocate at AWS Chris Short. Chris, been a few years. How has it been?Chris: Ha. Corey, we have talked outside of the podcast. But it's been good. For those that have been listening, I think when we recorded I wasn't even—like, when was season two, what year was that? [laugh].Corey: Episode two was first pre-pandemic and the rest. I believe—Chris: Oh. So, yeah. I was at Red Hat, maybe, when I—yeah.Corey: Yeah. You were doing Red Hat stuff, back when you got to work on open-source stuff, as opposed to now, where you're not within 1000 miles of that stuff, right?Chris: Actually well, no. So, to be clear, I'm on the EKS team, the Kubernetes team here at AWS. So, when I joined AWS in October, they were like, “Hey, you do open-source stuff. We like that. Do more.” And I was like, “Oh, wait, do more?” And they were like, “Yes, do more.” “Okay.”So, since joining AWS, I've probably done more open-source work than the three years at Red Hat that I did. So, that's kind of—you know, like, it's an interesting point when I talk to people about it because the first couple months are, like—you know, my friends are like, “So, are you liking it? Are you enjoying it? What's going on?” And—Corey: Do they beat you with reeds? Like, all the questions people have about companies? Because—Chris: Right. Like, I get a lot of random questions about Amazon and AWS that I don't know the answer to.Corey: Oh, when I started telling people, I fixed Amazon bills, I had to quickly pivot that to AWS bills because people started asking me, “Well, can you save me money on underpants?” It's I—Chris: Yeah.Corey: How do you—fine. Get the prime credit card. It docks 5% off the bill, so there you go. But other than that, no, I can't.Chris: No.Corey: It's—Chris: Like, I had to call my bank this morning about a transaction that I didn't recognize, and it was from Amazon. And I was like, that's weird. Why would that—Corey: Money just flows one direction, and that's the wrong direction from my employer.Chris: Yeah. Like, what is going on here? It shouldn't have been on that card kind of thing. And I had to explain to the person on the phone that I do work at Amazon but under the Web Services team. And he was like, “Oh, so you're in IT?”And I'm like, “No.” [laugh]. “It's actually this big company. That—it's a cloud company.” And they're like, “Oh, okay, okay. Yeah. The cloud. Got it.” [laugh]. So, it's interesting talking to people about, “I work at Amazon.” “Oh, my son works at Amazon distribution center,” blah, blah, blah. It's like, cool. “I know about that, but very little. I do this.”Corey: Your son works in Amazon distribution center. Is he a robot? Is normally my next question on that? Yeah. That's neither here nor there.So, you and I started talking a while back. We both write newsletters that go to a somewhat similar audience. You write DevOps'ish. I write Last Week in AWS. And recently, you also have started EKS News because, yeah, the one thing I look at when I'm doing these newsletters every week is, you know what I want to do? That's right. Write more newsletters.Chris: [laugh].Corey: So, you are just a glutton for punishment? And, yeah, welcome to the addiction, I suppose. How's it been going for you?Chris: It's actually been pretty interesting, right? Like, we haven't pushed it very hard. We're now starting to include it in things. Like we did Container Day; we made sure that EKS news was on the landing page for Container Day at KubeCon EU. And you know, it's kind of just grown organically since then.But it was one of those things where it's like, internally—this happened at Red Hat, right—when I started live streaming at Red Hat, the ultimate goal was to do our product management—like, here's what's new in the next version thing—do those live so anybody can see that at any point in time anywhere on Earth, the second it's available. Similar situation to here. This newsletter actually is generated as part of a report my boss puts together to brief our other DAs—or developer advocates—you know, our solutions architects, the whole nine yards about new EKS features. So, I was like, why can't we just flip that into a weekly newsletter, you know? Like, I can pull from the same sources you can.And what's interesting is, he only does the meeting bi-weekly. So, there's some weeks where it's just all me doing it and he ends up just kind of copying and pasting the newsletter into his document, [laugh] and then adds on for the week. But that report meeting for that team is now getting disseminated to essentially anyone that subscribes to eks.news. Just go to the site, there's a subscribe thing right there. And we've gotten 20 issues in and it's gotten rave reviews, right?Corey: I have been a subscriber for a while. I will say that it has less Chris Short personality—Chris: Mm-hm.Corey: —to it than DevOps'ish does, which I have to assume is by design. A lot of The Duckbill Group's marketing these days is no longer in my voice, rather intentionally, because it turns out that being a sarcastic jackass and doing half-billion dollar AWS contracts can not to be the most congruent thing in the world. So okay, we're slowly ameliorating that. It's professional voice versus snarky voice.Chris: Well, and here's the thing, right? Like, I realized this year with DevOps'ish that, like, if I want to take a week off, I have to do, like, what you did when your child was born. You hired folks to like, do the newsletter for you, or I actually don't do the newsletter, right? It's binary: hire someone else to do it, or don't do it. So, the way I structured this newsletter was that any developer advocate on my team could jump in and take over the newsletter so that, you know, if I'm off that week, or whatever may be happening, I, Chris Short, am not the voice. It is now the entire developer advocate team.Corey: I will challenge you on that a bit. Because it's not Chris Short voice, that's for sure, but it's also not official AWS brand voice either.Chris: No.Corey: It is clearly written by a human being who is used to communicating with the audience for whom it is written. And that is no small thing. Normally, when oh, there's a corporate newsletter; that's just a lot of words to say it's bad. This one is good. I want to be very clear on that.Chris: Yeah, I mean, we have just, like, DevOps'ish, we have sections, just like your newsletter, there's certain sections, so any new, what's new announcements, those go in automatically. So, like, that can get delivered to your inbox every Friday. Same thing with new blog posts about anything containers related to EKS, those will be in there, then Containers from the Couch, our streaming platform, essentially, for all things Kubernetes. Those videos go in.And then there's some ecosystem news as well that I collect and put in the newsletter to give people a broader sense of what's going on out there in Kubernetes-land because let's face it, there's upstream and then there's downstream, and sometimes those aren't in sync, and that's normal. That's how Kubernetes kind of works sometimes. If you're running upstream Kubernetes, you are awesome. I appreciate you, but I feel like that would cause more problems and it's worse sometimes.Corey: Thank you for being the trailblazers. The rest of us can learn from your misfortune.Chris: [laugh]. Yeah, exactly. Right? Like, please file your bugs accordingly. [laugh].Corey: EKS is interesting to me because I don't see a lot of it, which is, probably, going to get a whole lot of, “Wait, what?” Moments because wait, don't you deal with very large AWS bills? And I do. But what I mean by that is that EKS, until you're using its Fargate expression, charges for the control plane, which rounds to no money, and the rest is running on EC2 instances running in a company's account. From the billing perspective, there is no difference between, “We're running massive fleets of EKS nodes.” And, “We're managing a whole bunch of EC2 instances by hand.”And that feels like an interesting allegory for how Kubernetes winds up expressing itself to cloud providers. Because from a billing perspective, it just looks like one big single-tenant application that has some really strange behaviors internally. It gets very chatty across AZs when there's no reason to, and whatnot. And it becomes a very interesting study in how to expose aspects of what's going on inside of those containers and inside of the Kubernetes environment to the cloud provider in a way that becomes actionable. There are no good answers for this yet, but it's something I've been seeing a lot of. Like, “Oh, I thought you'd be running Kubernetes. Oh, wait, you are and I just keep forgetting what I'm looking at sometimes.”Chris: So, that's an interesting point. The billing is kind of like, yeah, it's just compute, right? So—Corey: And my insight into AWS and the way I start thinking about it is always from a billing perspective. That's great. It's because that means the more expensive the services, the more I know about it. It's like, “IAM. What is that?” Like, “Oh, I have no idea. It's free. How important could it be?” Professional advice: do not take that philosophy, ever.Chris: [laugh]. No. Ever. No.Corey: Security: it matters. Oh, my God. It's like you're all stars. Your IAM policy should not be. I digress.Chris: Right. Yeah. Anyways, so two points I want to make real quick on that is, one, we've recently released an open-source project called Carpenter, which is really cool in my purview because it looks at your Kubernetes file and says, “Oh, you want this to run on ARM instance.” And you can even go so far as to say, right, here's my limits, and it'll find an instance that fits those limits and add that to your cluster automatically. Run your pod on that compute as long as it needs to run and then if it's done, it'll downsize—eventually, kind of thing—your cluster.So, you can basically just throw a bunch of workloads at it, and it'll auto-detect what kind of compute you will need and then provision it for you, run it, and then be done. So, that is one-way folks are probably starting to save money running EKS is to adopt Carpenter as your autoscaler as opposed to the inbuilt Kubernetes autoscaler. Because this is instance-aware, essentially, so it can say, like, “Oh, your massive ARM application can run here,” because you know, thank you, Graviton. We have those processors in-house. And you know, you can run your ARM64 instances, you can run all the Intel workloads you want, and it'll right size the compute for your workloads.And I'll look at one container or all your containers, however you want to configure it. Secondly, the good folks over at Kubecost have opencost, which is the open-source version of Kubecost, basically. So, they have a service that you can run in your clusters that will help you say, “Hey, maybe this one notes too heavy; maybe this one notes too light,” and you know, give you some insights into Kubernetes spend that are a little bit more granular as far as usage and things like that go. So, those two projects right there, I feel like, will give folks an optimal savings experience when it comes to Kubernetes. But to your point, it's just compute, right? And that's really how we treat it, kind of, here internally is that it's a way to run… compute, Kubernetes, or ECS, or any of those tools.Corey: A fairly expensive one because ignoring entirely for a second the actual raw cost of compute, you also have the other side of it, which is in every environment, unless you are doing something very strange or pre-funding as a one-person startup in your spare time, your payroll costs will it—should—exceed your AWS bill by a fairly healthy amount. And engineering time is always more expensive than services time. So, for example, looking at EKS, I would absolutely recommend people use that rather than rolling their own because—Chris: Rolling their own? Yeah.Corey: —get out of that engineering space where your time is free. I assure you from a business context, it is not. So, there's always that question of what you can do to make things easier for people and do more of the heavy lifting.Chris: Yeah, and to your rather cheeky point that there's 17 ways to run a container on AWS, it is answering that question, right? Like those 17 ways, like, how much of this do you want to run yourself, you could run EKS distro on EC2 instances if you want full control over your environment.Corey: And then run IoT Greengrass core on top within that cluster—Chris: Right.Corey: So, I can run my own Lambda function runtime, so I'm not locked in. Also, DynamoDB local so I'm not locked into AWS. At which point I have gone so far around the bend, no one can help me.Chris: Well—Corey: Pro tip, don't do that. Just don't do that.Chris: But to your point, we have all these options for compute, and specifically containers because there's a lot of people that want to granularly say, “This is where my engineering team gets involved. Everything else you handle.” If I want EKS on Spot Instances only, you can do that. If you want EKS to use Carpenter and say only run ARM workloads, you can do that. If you want to say Fargate and not have anything to manage other than the container file, you can do that.It's how much does your team want to manage? That's the customer obsession part of AWS coming through when it comes to containers is because there's so many different ways to run those workloads, but there's so many different ways to make sure that your team is right-sized, based off the services you're using.Corey: I do want to change gears a bit here because you are mostly known for a couple of things: the DevOps'ish newsletter because that is the oldest and longest thing you've been doing the time that I've known you; EKS, obviously. But when prepping for this show, I discovered you are now co-chair of the OpenGitOps project.Chris: Yes.Corey: So, I have heard of GitOps in the context of, “Oh, it's just basically your CI/CD stuff is triggered by Git events and whatnot.” And I'm sitting here going, “Okay, so from where you're sitting, the two best user interfaces in the world that you have discovered are YAML and Git.” And I just have to start with the question, “Who hurt you?”Chris: [laugh]. Yeah, I share your sentiment when it comes to Git. Not so much with YAML, but I think it's because I'm so used to it. Maybe it's Stockholm Syndrome, maybe the whole YAML thing. I don't know.Corey: Well, it's no XML. We'll put it that way.Chris: Thankfully, yes because if it was, I would have way more, like, just template files laying around to build things. But the—Corey: And rage. Don't forget rage.Chris: And rage, yeah. So, GitOps is a little bit more than just Git in IaC—infrastructure as Code. It's more like Justin Garrison, who's also on my team, he calls it infrastructure software because there's four main principles to GitOps, and if you go to opengitops.dev, you can see them. It's version one.So, we put them on the website, right there on the page. You have to have a declared state and that state has to live somewhere. Now, it's called GitOps because Git is probably the most full-featured thing to put your state in, but you could use an S3 bucket and just version it, for example. And make it private so no one else can get to it.Corey: Or you could use local files: copy-of-copy-of-this-thing-restored-parentheses-use-this-one-dot-final-dot-doc-dot-zip. You know, my preferred naming convention.Chris: Ah, yeah. Wow. Okay. [laugh]. Yeah.Corey: Everything I touch is terrifying.Chris: Yes. Geez, I'm sorry. So first, it's declarative. You declare your state. You store it somewhere. It's versioned and immutable, like I said. And then pulled automatically—don't focus so much on pull—but basically, software agents are applying the desired state from source. So, what does that mean? When it's—you know, the fourth principle is implemented, continuously reconciled. That means those software agents that are checking your desired state are actually putting it back into the desired state if it's out of whack, right? So—Corey: You're talking about agents running it persistently on instances, validating—Chris: Yes.Corey: —a checkpoint on a cron. How is this meaningfully different than a Puppet agent running in years past? Having spent I learned to speak publicly by being a traveling trainer for Puppet; same type of model, and in fact, when I was at Pinterest, we wound up having a fair bit—like, that was their entire model, where they would have—the Puppet's code would live in an S3 bucket that was then copied down, I believe, via Git, and then applied to the instance on a schedule. Like, that sounds like this was sort of a early days GitOps.Chris: Yeah, exactly. Right? Like so it's, I like to think of that as a component of GitOps, right? DevOps, when you talk about DevOps in general, there's a lot of stuff out there. There's a lot of things labeled DevOps that maybe are, or maybe aren't sticking to some of those DevOps core things that make you great.Like the stuff that Nicole Forsgren writes about in books, you know? Accelerate is on my desk for a reason because there's things that good, well-managed DevOps practices do. I see GitOps as an actual implementation of DevOps in an open-source manner because all the tooling for GitOps these days is open-source and it all started as open-source. Now, you can get, like, Flux or Argo—Argo, specifically—there's managed services out there for it, you can have Flux and not maintain it, through an add-on, on EKS for example, and it will reconcile that state for you automatically. And the other thing I like to say about GitOps, specifically, is that it moves at the speed of the Kubernetes Audit Log.If you've ever looked at a Kubernetes audit log, you know it's rather noisy with all these groups and versions and kinds getting thrown out there. So, GitOps will say, “Oh, there's an event for said thing that I'm supposed to be watching. Do I need to change anything? Yes or no? Yes? Okay, go.”And the change gets applied, or, “Hey, there's a new Git thing. Pull it in. A change has happened inGit I need to update it.” You can set it to reconcile on events on time. It's like a cron or it's like an event-driven architecture, but it's combined.Corey: How does it survive the stake through the heart of configuration management? Because before I was doing all this, I wasn't even a T-shaped engineer: you're broad across a bunch of things, but deep in one or two areas, and one of mine was configuration management. I wrote part of SaltStack, once upon a time—Chris: Oh.Corey: —due to a bunch of very strange coincidences all hitting it once, like, I taught people how to use Puppet. But containers ultimately arose and the idea of immutable infrastructure became a thing. And these days when we were doing full-on serverless, well, great, I just wind up deploying a new code bundle to the Lambdas function that I wind up caring about, and that is a immutable version replacement. There is no drift because there is no way to log in and change those things other than through a clear deployment of this as the new version that goes out there. Where does GitOps fit into that imagined pattern?Chris: So, configuration management becomes part of your approval process, right? So, you now are generating an audit log, essentially, of all changes to your system through the approval process that you set up as part of your, how you get things into source and then promote that out to production. That's kind of the beauty of it, right? Like, that's why we suggest using Git because it has functions, like, requests and issues and things like that you can say, “Hey, yes, I approve this,” or, “Hey, no, I don't approve that. We need changes.” So, that's kind of natively happening with Git and, you know, GitLab, GitHub, whatever implementation of Git. There's always, kind of—Corey: Uh, JIF-ub is, I believe, the pronunciation.Chris: JIF-ub? Oh.Corey: Yeah. That's what I'm—Chris: Today, I learned. Okay.Corey: Exactly. And that's one of the things that I do for my lasttweetinaws.com Twitter client that I build—because I needed it, and if other people want to use it, that's great—that is now deployed to 20 different AWS commercial regions, simultaneously. And that is done via—because it turns out that that's a very long to execute for loop if you start down that path—Chris: Well, yeah.Corey: I wound up building out a GitHub Actions matrix—sorry a JIF-ub—actions matrix job that winds up instantiating 20 parallel builds of the CDK deploy that goes out to each region as expected. And because that gets really expensive with native GitHub Actions runners for, like, 36 cents per deploy, and I don't know how to test my own code, so every time I have a typo, that's another quarter in the jar. Cool, but that was annoying for me so I built my own custom runner system that uses Lambda functions as runners running containers pulled from ECR that, oh, it just runs in parallel, less than three minutes. Every time I commit something between I press the push button and it is out and running in the wild across all regions. Which is awesome and also terrifying because, as previously mentioned, I don't know how to test my code.Chris: Yeah. So, you don't know what you're deploying to 20 regions sometime, right?Corey: But it also means I have a pristine, re-composable build environment because I can—Chris: Right.Corey: Just automatically have that go out and the fact that I am making a—either merging a pull request or doing a direct push because I consider main to be my feature branch as whenever something hits that, all the automation kicks off. That was something that I found to be transformative as far as a way of thinking about this because I was very tired of having to tweak my local laptop environment to, “Oh, you didn't assume the proper role and everything failed again and you broke it. Good job.” It wound up being something where I could start developing on more and more disparate platforms. And it finally is what got me away from my old development model of everything I build is on an EC2 instance, and that means that my editor of choice was Vim. I use the VS Code now for these things, and I'm pretty happy with it.Chris: Yeah. So, you know, I'm glad you brought up CDK. CDK gives you a lot of the capabilities to implement GitOps in a way that you could say, like, “Hey, use CDK to declare I need four Amazon EKS clusters with this size, shape, and configuration. Go.” Or even further, connect to these EKS clusters to RDS instances and load balancers and everything else.But you put that state into Git and then you have something that deploys that automatically upon changes. That is infrastructure as code. Now, when you say, “Okay, main is your feature branch,” you know, things happen on main, if this were running in Kubernetes across a fleet of clusters or the globe-wide in 20 regions, something like Flux or Argo would kick in and say, “There's been a change to source, main, and we need to roll this out.” And it'll start applying those changes. Now, what do you get with GitOps that you don't get with your configuration?I mean, can you rollback if you ever have, like, a bad commit that's just awful? I mean, that's really part of the process with GitOps is to make sure that you can, A, roll back to the previous good state, B, roll forward to a known good state, or C, promote that state up through various environments. And then having that all done declaratively, automatically, and immutably, and versioned with an audit log, that I think is the real power of GitOps in the sense that, like, oh, so-and-so approve this change to security policy XYZ on this date at this time. And that to an auditor, you just hand them a log file on, like, “Here's everything we've ever done to our system. Done.” Right?Like, you could get to that state, if you want to, which I think is kind of the idea of DevOps, which says, “Take all these disparate tools and processes and procedures and culture changes”—culture being the hardest part to adopt in DevOps; GitOps kind of forces a culture change where, like, you can't do a CAB with GitOps. Like, those two things don't fly. You don't have a configuration management database unless you absolutely—Corey: Oh, you CAB now but they're all the comments of the pull request.Chris: Right. Exactly. Like, don't push this change out until Thursday after this other thing has happened, kind of thing. Yeah, like, that all happens in GitHub. But it's very democratizing in the sense that people don't have to waste time in an hour-long meeting to get their five minutes in, right?Corey: DoorDash had a problem. As their cloud-native environment scaled and developers delivered new features, their monitoring system kept breaking down. In an organization where data is used to make better decisions about technology and about the business, losing observability means the entire company loses their competitive edge. With Chronosphere, DoorDash is no longer losing visibility into their applications suite. The key? Chronosphere is an open-source compatible, scalable, and reliable observability solution that gives the observability lead at DoorDash business, confidence, and peace of mind. Read the full success story at snark.cloud/chronosphere. That's snark.cloud slash C-H-R-O-N-O-S-P-H-E-R-E.Corey: So, would it be overwhelmingly cynical to suggest that GitOps is the means to implement what we've all been pretending to have implemented for the last decade when giving talks at conferences?Chris: Ehh, I wouldn't go that far. I would say that GitOps is an excellent way to implement the things you've been talking about at all these conferences for all these years. But keep in mind, the technology has changed a lot in the, what 11, 12 years of the existence of DevOps, now. I mean, we've gone from, let's try to manage whole servers immutably to, “Oh, now we just need to maintain an orchestration platform and run containers.” That whole compute interface, you go from SSH to a Docker file, that's a big leap, right?Like, you don't have bespoke sysadmins; you have, like, a platform team. You don't have DevOps engineers; they're part of that platform team, or DevOps teams, right? Like, which was kind of antithetical to the whole idea of DevOps to have a DevOps team. You know, everybody's kind of in the same boat now, where we see skill sets kind of changing. And GitOps and Kubernetes-land is, like, a platform team that manages the cluster, and its state, and health and, you know, production essentially.And then you have your developers deploying what they want to deploy in when whatever namespace they've been given access to and whatever rights they have. So, now you have the potential for one set of people—the platform team—to use one set of GitOps tooling, and your applications teams might not like that, and that's fine. They can have their own namespaces with their own tooling in it. Like, Argo, for example, is preferred by a lot of developers because it has a nice UI with green and red dots and they can show people and it looks nice, Flux, it's command line based. And there are some projects out there that kind of take the UI of Argo and try to run Flux underneath that, and those are cool kind of projects, I think, in my mind, but in general, right, I think GitOps gives you the choice that we missed somewhat in DevOps implementations of the past because it was, “Oh, we need to go get cloud.” “Well, you can only use this cloud.” “Oh, we need to go get this thing.” “Well, you can only use this thing in-house.”And you know, there's a lot of restrictions sometimes placed on what you can use in your environment. Well, if your environment is Kubernetes, how do you restrict what you can run, right? Like you can't have an easily configured say, no open-source policy if you're running Kubernetes. [laugh] so it becomes, you know—Corey: Well, that doesn't stop some companies from trying.Chris: Yeah, that's true. But the idea of, like, enabling your developers to deploy at will and then promote their changes as they see fit is really the dream of DevOps, right? Like, same with production and platform teams, right? I want to push my changes out to a larger system that is across the globe. How do I do that? How do I manage that? How do I make sure everything's consistent?GitOps gives you those ways, with Kubernetes native things like customizations, to make consistent environments that are robust and actually going to be reconciled automatically if someone breaks the glass and says, “Oh, I need to run this container immediately.” Well, that's going to create problems because it's deviated from state and it's just that one region, so we'll put it back into state.Corey: It'll be dueling banjos, at some point. You'll try and doing something manually, it gets reverted automatically. I love that pattern. You'll get bored before the computer does, always.Chris: Yeah. And GitOps is very new, right? When you think about the lifetime of GitOps, I think it was coined in, like, 2018. So, it's only four years old, right? When—Corey: I prefer it to ChatOps, at least, as far as—Chris: Well, I mean—Corey: —implementation and expression of the thing.Chris: —ChatOps was a way to do DevOps. I think GitOps—Corey: Well, ChatOps is also a way to wind up giving whoever gets access to your Slack workspace root in production.Chris: Mmm.Corey: But that's neither here nor there.Chris: Mm-hm.Corey: It's yeah, we all like to pretend that's not a giant security issue in our industry, but that's a topic for another time.Chris: Yeah. And that's why, like, GitOps also depends upon you having good security, you know, and good authorization and approval processes. It enforces that upon—Corey: Yeah, who doesn't have one of those?Chris: Yeah. If it's a sole operation kind of deal, like in your setup, your case, I think you kind of got it doing right, right? Like, as far as GitOps goes—Corey: Oh, to be clear, we are 11 people and we do have dueling pull requests and all the rest.Chris: Right, right, right.Corey: But most of the stuff I talk about publicly is not our production stuff, so it really is just me. Just as a point of clarity there. I've n—the 11 people here do not all—the rest of you don't just sit there and clap as I do all the work.Chris: Right.Corey: Most days.Chris: No, I'm sure they don't. I'm almost certain they don't clap… for you. I mean, they would—Corey: No. No, they try and talk me out of it in almost every case.Chris: Yeah, exactly. So, the setup that you, Corey Quinn, have implemented to deploy these 20 regions is kind of very GitOps-y, in the sense that when main changes, it gets updated. Where it's not GitOps-y is what if the endpoint changes? Does it get reconciled? That's the piece you're probably missing is that continuous reconciliation component, where it's constantly checking and saying, “This thing out there is deployed in the way I want it. You know, the way I declared it to be in my source of truth.”Corey: Yeah, when you start having other people getting involved, there can—yeah, that's where regressions enter. And it's like, “Well, I know where things are so why would I change the endpoint?” Yeah, it turns out, not everyone has the state of the entire application in their head. Ideally it should live in—Chris: Yeah. Right. And, you know—Corey: —you know, Git or S3.Chris: —when I—yeah, exactly. When I think about interactions of the past coming out as a new DevOps engineer to work with developers, it's always been, will developers have access to prod or they don't? And if you're in that environment with—you're trying to run a multi-billion dollar operation, and your devs have direct—or one Dev has direct access to prod because prod is in his brain, that's where it's like, well, now wait a minute. Prod doesn't have to be only in your brain. You can put that in the codebase and now we know what is in your brain, right?Like, you can almost do—if you document your code, well, you can have your full lifecycle right there in one place, including documentation, which I think is the best part, too. So, you know, it encourages approval processes and automation over this one person has an entire state of the system in their head; they have to go in and fix it. And what if they're not on call, or in Jamaica, or on a cruise ship somewhere kind of thing? Things get difficult. Like, for example, I just got back from vacation. We were so far off the grid, we had satellite internet. And let me tell you, it was hard to write an email newsletter where I usually open 50 to 100 tabs.Corey: There's a little bit of internet out Californ-ie way.Chris: [laugh].Corey: Yeah it's… it's always weird going from, like, especially after pandemic; I have gigabit symmetric here and going even to re:Invent where I'm trying to upload a bunch of video and whatnot.Chris: Yeah. Oh wow.Corey: And the conference WiFi was doing its thing, and well, Verizon 5G was there but spotty. And well, yeah. Usual stuff.Chris: Yeah. It's amazing to me how connectivity has become so ubiquitous.Corey: To the point where when it's not there anymore, it's what do I do with myself? Same story about people pushing back against remote development of, “Oh, I'm just going to do it all on my laptop because what happens if I'm on a plane?” It's, yeah, the year before the pandemic, I flew 140,000 miles domestically and I was almost never hamstrung by my ability to do work. And my only local computer is an iPad for those things. So, it turns out that is less of a real world concern for most folks.Chris: Yeah I actually ordered the components to upgrade an old Nook that I have here and turn it into my, like, this is my remote code server, that's going to be all attached to GitHub and everything else. That's where I want to be: have Tailscale and just VPN into this box.Corey: Tailscale is transformative.Chris: Yes. Tailscale will change your life. That's just my personal opinion.Corey: Yep.Chris: That's not an AWS opinion or anything. But yeah, when you start thinking about your network as it could be anywhere, that's where Tailscale, like, really shines. So—Corey: Tailscale makes the internet work like we all wanted to believe that it worked.Chris: Yeah. And Wireguard is an excellent open-source project. And Tailscale consumes that and puts an amazingly easy-to-use UI, and troubleshooting tools, and routing, and all kinds of forwarding capabilities, and makes it kind of easy, which is really, really, really kind of awesome. And Tailscale and Kubernetes—Corey: Yeah, ‘network' and ‘easy' don't belong in the same sentence, but in this case, they do.Chris: Yeah. And trust me, the Kubernetes story in Tailscale, there is a lot of there. I understand you might want to not open ports in your VPC, maybe, but if you use Tailscale, that node is just another thing on your network. You can connect to that and see what's going on. Your management cluster is just another thing on the network where you can watch the state.But it's all—you're connected to it continuously through Tailscale. Or, you know, it's a much lighter weight, kind of meshy VPN, I would say, if I had to sum it up in one sentence. That was not on our agenda to talk about at all. Anyways. [laugh]Corey: No, no. I love how many different topics we talk about on these things. We'll have to have you back soon to talk again. I really want to thank you for being so generous with your time. If people want to learn more about what you're up to and how you view these things, where can they find you?Chris: Go to ChrisShort.net. So, Chris Short—I'm six-four so remember, it's Short—dot net, and you will find all the places that I write, you can go to devopsish.com to subscribe to my newsletter, which goes out every week. This year. Next year, there'll be breaks. And then finally, if you want to follow me on Twitter, Chris Short: at @ChrisShort on Twitter. All one word so you see two s's. Like, it's okay, there's two s's there.Corey: Links to all of that will of course be in the show notes. It's easier for people to do the clicky-clicky thing as a general rule.Chris: Clicky things are easier than the wordy things, yes.Corey: Says the Kubernetes guy.Chris: Yeah. Says the Kubernetes guy. Yeah, you like that, huh? Like I said, Argo gives you a UI. [laugh].Corey: Thank you [laugh] so much for your time. I really do appreciate it.Chris: Thank you. This has been fun. If folks have questions, feel free to reach out. Like, I am not one of those people that hides behind a screen all day and doesn't respond. I will respond to you eventually.Corey: I'm right here, Chris. Come on, come on. You're calling me out in front of myself. My God.Chris: Egh. It might take a day or two, but I will respond. I promise.Corey: Thanks again for your time. This has been Chris Short, senior developer advocate at AWS. I'm Cloud Economist Corey Quinn and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice and if it's YouTube, click the thumbs-up button. Whereas if you've hated this podcast, same thing, smash the buttons five-star review and leave an insulting comment that is written in syntactically correct YAML because it's just so easy to do.Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.Announcer: This has been a HumblePod production. Stay humble.
On The Cloud Pod this week, the team rediscovers who Ryan is after an eternity (a secret agent). Plus AWS Fargate now delivers faster scaling of applications; new features for Oracle Support Rewards; and Google Cloud Optimization AI: Cloud Fleet Routing API from GCP. A big thanks to this week's sponsor, Foghorn Consulting, which provides full-stack cloud solutions with a focus on strategy, planning and execution for enterprises seeking to take advantage of the transformative capabilities of AWS, Google Cloud and Azure. This week's highlights
About CaseyCasey spends his days leveraging AWS to help organizations improve the speed at which they deliver software. With a background in software development, he has spent the past 20 years architecting, building, and supporting software systems for organizations ranging from startups to Fortune 500 enterprises.Links Referenced: “17 Ways to Run Containers in AWS”: https://www.lastweekinaws.com/blog/the-17-ways-to-run-containers-on-aws/ “17 More Ways to Run Containers on AWS”: https://www.lastweekinaws.com/blog/17-more-ways-to-run-containers-on-aws/ kubernetestheeasyway.com: https://kubernetestheeasyway.com snark.cloud/quinntainers: https://snark.cloud/quinntainers ECS Chargeback: https://github.com/gaggle-net/ecs-chargeback twitter.com/nektos: https://twitter.com/nektos TranscriptAnnouncer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.Corey: This episode is sponsored by our friends at Revelo. Revelo is the Spanish word of the day, and its spelled R-E-V-E-L-O. It means “I reveal.” Now, have you tried to hire an engineer lately? I assure you it is significantly harder than it sounds. One of the things that Revelo has recognized is something I've been talking about for a while, specifically that while talent is evenly distributed, opportunity is absolutely not. They're exposing a new talent pool to, basically, those of us without a presence in Latin America via their platform. It's the largest tech talent marketplace in Latin America with over a million engineers in their network, which includes—but isn't limited to—talent in Mexico, Costa Rica, Brazil, and Argentina. Now, not only do they wind up spreading all of their talent on English ability, as well as you know, their engineering skills, but they go significantly beyond that. Some of the folks on their platform are hands down the most talented engineers that I've ever spoken to. Let's also not forget that Latin America has high time zone overlap with what we have here in the United States, so you can hire full-time remote engineers who share most of the workday as your team. It's an end-to-end talent service, so you can find and hire engineers in Central and South America without having to worry about, frankly, the colossal pain of cross-border payroll and benefits and compliance because Revelo handles all of it. If you're hiring engineers, check out revelo.io/screaming to get 20% off your first three months. That's R-E-V-E-L-O dot I-O slash screaming.Corey: Couchbase Capella Database-as-a-Service is flexible, full-featured and fully managed with built in access via key-value, SQL, and full-text search. Flexible JSON documents aligned to your applications and workloads. Build faster with blazing fast in-memory performance and automated replication and scaling while reducing cost. Capella has the best price performance of any fully managed document database. Visit couchbase.com/screaminginthecloud to try Capella today for free and be up and running in three minutes with no credit card required. Couchbase Capella: make your data sing.Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. My guest today is someone that I had the pleasure of meeting at re:Invent last year, but we'll get to that story in a minute. Casey Lee is the CTO with a company called Gaggle, which is—as they frame it—saving lives. Now, that seems to be a relatively common position that an awful lot of different tech companies take. “We're saving lives here.” It's, “You show banner ads and some of them are attack platforms for JavaScript malware. Let's be serious here.” Casey, thank you for joining me, and what makes the statement that Gaggle saves lives not patently ridiculous?Casey: Sure. Thanks, Corey. Thanks for having me on the show. So Gaggle, we're ed-tech company. We sell software to school districts, and school districts use our software to help protect their students while the students use the school-issued Google or Microsoft accounts.So, we're looking for signs of bullying, harassment, self-harm, and potentially suicide from K-12 students while they're using these platforms. They will take the thoughts, concerns, emotions they're struggling with and write them in their school-issued accounts. We detect that and then we notify the school districts, and they get the students the help they need before they can do any permanent damage to themselves. We protect about 6 million students throughout the US. We ingest a lot of content.Last school year, over 6 billion files, about the equal number of emails ingested. We're looking for concerning content and then we have humans review the stuff that our machine learning algorithms detect and flag. About 40 million items had to go in front of humans last year, resulted in about 20,000 what we call PSSes. These are Possible Student Situations where students are talking about harming themselves or harming others. And that resulted in what we like to track as lives saved. 1400 incidents last school year where a student was dealing with suicide ideation, they were planning to take their own lives. We detect that and get them help within minutes before they can act on that. That's what Gaggle has been doing. We're using tech, solving tech problems, and also saving lives as we do it.Corey: It's easy to lob a criticism at some of the things you're alluding to, the idea of oh, you're using machine learning on student data for young kids, yadda, yadda, yadda. Look at the outcome, look at the privacy controls you have in place, and look at the outcomes you're driving to. Now, I don't necessarily trust the number of school administrations not to become heavy-handed and overbearing with it, but let's be clear, that's not the intent. That is not what the success stories you have alluded to. I've got to say I'm a fan, so thanks for doing what you're doing. I don't say that very often to people who work in tech companies.Casey: Cool. Thanks, Corey.Corey: But let's rewind a bit because you and I had passed like ships in the night on Twitter for a while, but last year at re:Invent something odd happened. First, my business partner procrastinated at getting his ticket—that's not the odd part; he does that a lot—but then suddenly ticket sales slammed shut and none were to be had anywhere. You reached out with a, “Hey, I have a spare ticket because someone can't go. Let me get it to you.” And I said, “Terrific. Let me pay you for the ticket and take you to dinner.”You said, “Yes on the dinner, but I'd rather you just look at my AWS bill and don't worry about the cost of the ticket.” “All right,” said I. I know a deal when I see one. We grabbed dinner at the Venetian. I said, “Bust out your laptop.” And you said, “Oh, I was kidding.” And I said, “Great. I wasn't. Bust it out.”And you went from laughing to taking notes in about the usual time that happens when I start looking at these things. But how was your recollection of that? I always tend to romanticize some of these things. Like, “And then everyone's restaurant just turned, stopped, and clapped the entire time.” Maybe that part didn't happen.Casey: Everything was right up until the clapping part. That was a really cool experience. I appreciate you walking through that with me. Yeah, we've got lots of opportunity to save on our AWS bill here at Gaggle, and in that little bit of time that we had together, I think I walked away with no more than a dozen ideas for where to shave some costs. The most obvious one, the first thing that you keyed in on, is we had RIs coming due that weren't really well-optimized and you steered me towards savings plans. We put that in place and we're able to apply those savings plans not just to our EC2 instances but also to our serverless spend as well.So, that was a very worthwhile and cost-effective dinner for us. The thing that was most surprising though, Corey, was your approach. Your approach to how to review our bill was not what I thought at all.Corey: Well, what did you expect my approach was going to be? Because this always is of interest to me. Like, do you expect me to, like, whip a portable machine learning rig out of my backpack full of GPUs or something?Casey: I didn't know if you had, like, some secret tool you were going to hit, or if nothing else, I thought you were going to go for the Cost Explorer. I spend a lot of time in Cost Explorer, that's my go-to tool, and you wanted nothing to do with Cost Exp—I think I was actually pulling up Cost Explorer for you and you said, “I'm not interested. Take me to the bills.” So, we went right to the billing dashboard, you started opening up the invoices, and I thought to myself, “I don't remember the last time I looked at an AWS invoice.” I just, it's noise; it's not something that I pay attention to.And I learned something, that you get a real quick view of both the cost and the usage. And that's what you were keyed in on, right? And you were looking at things relative to each other. “Okay, I have no idea about Gaggle or what they do, but normally, for a company that's spending x amount of dollars in EC2, why is your data transfer cost the way it is? Is that high or low?” So, you're looking for kind of relative numbers, but it was really cool watching you slice and dice that bill through the dashboard there.Corey: There are a few things I tie together there. Part of it is that this is sort of a surprising thing that people don't think about but start with big numbers first, rather than going alphabetically because I don't really care about your $6 Alexa for Business spend. I care a bit more about the $6 million, or whatever it happens to be at EC2—I'm pulling numbers completely out of the ether, let's be clear; I don't recall what the exact magnitude of your bill is and it's not relevant to the conversation.And then you see that and it's like, “Huh. Okay, you're spending $6 million on EC2. Why are you spending 400 bucks on S3? Seems to me that those two should be a little closer aligned. What's the deal here? Oh, God, you're using eight petabytes of EBS volumes. Oh, dear.”And just, it tends to lead to interesting stuff. Break it down by region, service, and use case—or usage type, rather—is what shows up on those exploded bills, and that's where I tend to start. It also is one of the easiest things to wind up having someone throw into a PDF and email my way if I'm not doing it in a restaurant with, you know, people clapping standing around.Casey: [laugh]. Right.Corey: I also want to highlight that you've been using AWS for a long time. You're a Container Hero; you are not bad at understanding the nuances and depths of AWS, so I take praise from you around this stuff as valuing it very highly. This stuff is not intuitive, it is deeply nuanced, and you have a business outcome you are working towards that invariably is not oriented day in day out around, “How do I get these services for less money than I'm currently paying?” But that is how I see the world and I tend to live in a very different space just based on the nature of what I do. It's sort of a case study and the advantage of specialization. But I know remarkably little about containers, which is how we wound up reconnecting about a week or so before we did this recording.Casey: Yeah. I saw your tweet; you were trying to run some workload—container workload—and I could hear the frustration on the other end of Twitter when you were shaking your fist at—Corey: I should not tweet angrily, and I did in this case. And, eh, every time I do I regret it. But it played well with the people, so that does help. I believe my exact comment was, “‘me: I've got this container. Run it, please.' ‘Google Cloud: Run. You got it, boss.' AWS has 17 ways to run containers and they all suck.”And that's painting with an overly broad brush, let's be clear, but that was at the tail end of two or three days of work trying to solve a very specific, very common, business problem, that I was just beating my head off of a wall again and again and again. And it took less than half an hour from start to finish with Google Cloud Run and I didn't have to think about it anymore. And it's one of those moments where you look at this and realize that the future is here, we just don't see it in certain ways. And you took exception to this. So please, let's dive in because 280 characters of text after half a bottle of wine is not the best context to have a nuanced discussion that leaves friendships intact the following morning.Casey: Nice. Well, I just want to make sure I understand the use case first because I was trying to read between the lines on what you needed, but let me take a guess. My guess is you got your source code in GitHub, you have a Docker file, and you want to be able to take that repo from GitHub and just have it continuously deployed somewhere in Run. And you don't want to have headaches with it; you just want to push more changes up to GitHub, Docker Build runs and updates some service somewhere. Am I right so far?Corey: Ish, but think a little further up the stack. It was in service of this show. So, this show, as people who are listening to this are probably aware by this point, periodically has sponsors, which we love: We thank them for participating in the ongoing support of this show, which empowers conversations like this. Sometimes a sponsor will come to us with, “Oh, and here's the URL we want to give people.” And it's, “First, you misspelled your company name from the common English word; there are three sublevels within the domain, and then you have a complex UTM tagging tracking co—yeah, you realize people are driving to work when they're listening to this?”So, I've built a while back a link shortener, snark.cloud because is it the shortest thing in the world? Not really, but it's easily understandable when I say that, and people hear it for what it is. And that's been running for a long time as an S3 bucket with full of redirects, behind CloudFront. So, I wind up adding a zero-byte object with a redirect parameter on it, and it just works.Now, the challenge that I have here as a business is that I am increasingly prolific these days. So, anything that I am not directly required to be doing, I probably shouldn't necessarily be the one to do it. And care and feeding of those redirect links is a prime example of this. So, I went hunting, and the things that I was looking for were, obviously, do the redirect. Now, if you pull up GitHub, there are hundreds of solutions here.There are AWS blog posts. One that I really liked and almost got working was Eric Johnson's three-part blog post on how to do it serverlessly, with API Gateway, and DynamoDB, no Lambdas required. I really liked aspects of what that was, but it was complex, I kept smacking into weird challenges as I went, and front end is just baffling to me. Because I needed a front end app for people to be able to use here; I need to be able to secure that because it turns out that if you just have a, anyone who stumbles across the URL can redirect things to other places, well, you've just empowered a whole bunch of spam email, and you're going to find that service abused, and everyone starts blocking it, and then you have trouble. Nothing lasts the first encounter with jerks.And I was getting more and more frustrated, and then I found something by a Twitter engineer on GitHub, with a few creative search terms, who used to work at Google Cloud. And what it uses as a client is it doesn't build any kind of custom web app. Instead, as a database, it uses not S3 objects, not Route 53—the ideal database—but a Google sheet, which sounds ridiculous, but every business user here knows how to use that.Casey: Sure.Corey: And it looks for the two columns. The first one is the slug after the snark.cloud, and the second is the long URL. And it has a TTL of five seconds on cache, so make a change to that spreadsheet, five seconds later, it's live. Everyone gets it, I don't have to build anything new, I just put it somewhere around the relevant people can access it, I gave him a tutorial and a giant warning on it, and everyone gets that. And it just works well. It was, “Click here to deploy. Follow the steps.”And the documentation was a little, eh, okay, I had to undo it once and redo it again. Getting the domain registered was getting—ported over took a bit of time, and there were some weird SSL errors as the certificates were set up, but once all of that was done, it just worked. And I tested the heck out of it, and cold starts are relatively low, and the entire thing fits within the free tier. And it is reminiscent of the magic that I first saw when I started working with some of the cloud providers services, years ago. It's been a long time since I had that level of delight with something, especially after three days of frustration. It's one of the, “This is a great service. Why are people not shouting about this from the rooftops?” That was my perspective. And I put it out on Twitter and oh, Lord, did I get comments. What was your take on it?Casey: Well, so my take was, when you're evaluating a platform to use for running your applications, how fast it can get you to Hello World is not necessarily the best way to go. I just assumed you're wrong. I assumed of the 17 ways AWS has to run containers, Corey just doesn't understand. And so I went after it. And I said, “Okay, let me see if I can find a way that solves his use case, as I understand it, through a quick tweet.”And so I tried to App Runner; I saw that App Runner does not meet your needs because you have to somehow get your Docker image pushed up to a repo. App Runner can take an image that's already been pushed up and deployed for you or it can build from source but neither of those were the way I understood your use case.Corey: Having used App Runner before via the Copilot CLI, it is the closest as best I can tell to achieving what I want. But also let's be clear that I don't believe there's a free tier; there needs to be a load balancer in front of it, so you're starting with 15 bucks a month for this thing. Which is not the end of the world. Had I known at the beginning that all of this was going to be there, I would have just signed up for a bit.ly account and called it good. But here we are.Casey: Yeah. I tried Copilot. Copilot is a great developer experience, but it also is just pulling together tons of—I mean just trying to do a Copilot service deploy, VPCs are being created and tons IAM roles are being created, code pipelines, there's just so much going on. I was like 20 minutes into it, and I said, “Yeah, this is not fitting the bill for what Corey was looking for.” Plus, it doesn't solve my the way I understood your use case, which is you don't want to worry about builds, you just want to push code and have new Docker images get built for you.Corey: Well, honestly, let's be clear here, once it's up and running, I don't want to ever have to touch the silly thing again.Casey: Right.Corey: And that's so far has been the case, after I forked the repo and made a couple of changes to it that I wanted to see. One of them was to render the entire thing case insensitive because I get that one wrong a lot, and the other is I wanted to change the permanent 301 redirect to a temporary 302 redirect because occasionally, sponsors will want to change where it goes in the fullness of time. And that is just fine, but I want to be able to support that and not have to deal with old cached data. So, getting that up and running was a bit of a challenge. But the way that it worked, was following the instructions in the GitHub repo.The developer environment had spun up in the Google's Cloud Shell was just spectacular. It prompted me for a few things and it told me step by step what to do. This is the sort of thing I could have given a basically non-technical user, and they would have had success with it.Casey: So, I tried it as well. I said, “Well, okay, if I'm going to respond to Corey here and challenge him on this, I need to try Cloud Run.” I had no experience with Cloud Run. I had a small example repo that loosely mapped what I understood you were trying to do. Within five minutes, I had Cloud Run working.And I was surprised anytime I pushed a new change, within 45 seconds the change was built and deployed. So, here's my conclusion, Corey. Google Cloud Run is great for your use case, and AWS doesn't have the perfect answer. But here's my challenge to you. I think that you just proved why there's 17 different ways to run containers on AWS, is because there's that many different types of users that have different needs and you just happen to be number 18 that hasn't gotten the right attention yet from AWS.Corey: Well, let's be clear, like, my gag about 17 ways to run containers on AWS was largely a joke, and it went around the internet three times. So, I wrote a list of them on the blog post of “17 Ways to Run Containers in AWS” and people liked it. And then a few months later, I wrote “17 More Ways to Run Containers on AWS” listing 17 additional services that all run containers.And my favorite email that I think I've ever received in feedback was from a salty AWS employee, saying that one of them didn't really count because of some esoteric reason. And it turns out that when I'm trying to make a point of you have a sarcastic number of ways to run containers, pointing out that well, one of them isn't quite valid, doesn't really shatter the argument, let's be very clear here. So, I appreciate the feedback, I always do. And it's partially snark, but there is an element of truth to it in that customers don't want to run containers, by and large. That is what they do in service of a business goal.And they want their application to run which is in turn to serve as the business goal that continues to abstract out into, “Remain a going concern via the current position the company stakes out.” In your case, it is saving lives; in my case, it is fixing horrifying AWS bills and making fun of Amazon at the same time, and in most other places, there are somewhat more prosaic answers to that. But containers are simply an implementation detail, to some extent—to my way of thinking—of getting to that point. An important one [unintelligible 00:18:20], let's be clear, I was very anti-container for a long time. I wrote a talk, “Heresy in the Church of Docker” that then was accepted at ContainerCon. It's like, “Oh, boy, I'm not going to leave here alive.”And the honest answer is many years later, that Kubernetes solves almost all the criticisms that I had with the downside of well, first, you have to learn Kubernetes, and that continues to be mind-bogglingly complex from where I sit. There's a reason that I've registered kubernetestheeasyway.com and repointed it to ECS, Amazon's container service that is not requiring you to cosplay as a cloud provider yourself. But even ECS has a number of challenges to it, I want to be very clear here. There are no silver bullets in this.And you're completely correct in that I have a large, complex environment, and the application is nuanced, and I'm willing to invest a few weeks in setting up the baseline underlying infrastructure on AWS with some of these services, ideally not all of them at once because that's something a lunatic would do, but getting them up and running. The other side of it, though, is that if I am trying to evaluate a cloud provider's handling of containers and how this stuff works, the reason that everyone starts with a Hello World-style example is that it delivers ideally, the meantime to dopamine. There's a reason that Hello World doesn't have 18 different dependencies across a bunch of different databases and message queues and all the other complicated parts of running a modern application. Because you just want to see how it works out of the gate. And if getting that baseline empty container that just returns the string ‘Hello World' is that complicated and requires that much work, my takeaway is not that this user experience is going to get better once I'd make the application itself more complicated.So, I find that off-putting. My approach has always been find something that I can get the easy, minimum viable thing up and running on, and then as I expand know that you'll be there to catch me as my needs intensify and become ever more complex. But if I can't get the baseline thing up and running, I'm unlikely to be super enthused about continuing to beat my head against the wall like, “Well, I'll just make it more complex. That'll solve the problem.” Because it often does not. That's my position.Casey: Yeah, I agree that dopamine hit is valuable in getting attached to want to invest into whatever tech stack you're using. The challenge is your second part of that. Your second part is will it grow with me and scale with me and support the complex edge cases that I have? And the problem I've seen is a lot of organizations will start with something that's very easy to get started with and then quickly outgrow it, and then come up with all sorts of weird Rube Goldberg-type solutions. Because they jumped all in before seeing—I've got kind of an example of that.I'm happy to announce that there's now 18 ways to run containers on AWS. Because in your use case, in the spirit of AWS customer obsession, I hear your use case, I've created an open-source project that I want to share called Quinntainers—Corey: Oh, no.Casey: —and it solves—yes. Quinntainers is live and is ready for the world. So, now we've got 18 ways to run containers. And if you have Corey's use case of, “Hey, here's my container. Run it for me,” now we've got a one command that you can run to get things going for you. I can share a link for you and you could check it out. This is a [unintelligible 00:21:38]—Corey: Oh, we're putting that in the [show notes 00:21:37], for sure. In fact, if you go to snark.cloud/quinntainers, you'll find it.Casey: You'll find it. There you go. The idea here was this: There is a real use case that you had, and I looked at AWS does not have an out-of-the-box simple solution for you. I agree with that. And Google Cloud Run does.Well, the answer would have been from AWS, “Well, then here, we need to make that solution.” And so that's what this was, was a way to demonstrate that it is a solvable problem. AWS has all the right primitives, just that use case hadn't been covered. So, how does Quinntainers work? Real straightforward: It's a command-line—it's an NPM tool.You just run a [MPX 00:22:17] Quinntainer, it sets up a GitHub action role in your AWS account, it then creates a GitHub action workflow in your repo, and then uses the Quinntainer GitHub action—reusable action—that creates the image for you; every time you push to the branch, pushes it up to ECR, and then automatically pushes up that new version of the image to App Runner for you. So, now it's using App Runner under the covers, but it's providing that nice developer experience that you are getting out of Cloud Run. Look, is container really the right way to go with running containers? No, I'm not making that point at all. But the point is it is a—Corey: It might very well be.Casey: Well, if you want to show a good Hello World experience, Quinntainer's the best because within 30 seconds, your app is now set up to continuously deliver containers into AWS for your very specific use case. The problem is, it's not going to grow for you. I mean that it was something I did over the weekend just for fun; it's not something that would ever be worthy of hitching up a real production workload to. So, the point there is, you can build frameworks and tools that are very good at getting that initial dopamine hit, but then are not going to be there for you unnecessarily as you mature and get more complex.Corey: And yet, I've tilted a couple of times at the windmill of integrating GitHub actions in anything remotely resembling a programmatic way with AWS services, as far as instance roles go. Are you using permanent credentials for this as stored secrets or are you doing the [OICD 00:23:50][00:23:50] handoff?Casey: OIDC. So, what happens is the tool creates the IAM role for you with the trust policy on GitHub's OIDC provider, sets all that up for you in your account, locks it down so that just your repo and your main branch is able to push or is able to assume the role, the role is set up just to allow deployments to App Runner and ECR repository. And then that's it. At that point, it's out of your way. And you're just git push, and couple minutes later, your updates are now running an App Runner for you.Corey: This episode is sponsored in part by our friends at Vultr. Optimized cloud compute plans have landed at Vultr to deliver lightning fast processing power, courtesy of third gen AMD EPYC processors without the IO, or hardware limitations, of a traditional multi-tenant cloud server. Starting at just 28 bucks a month, users can deploy general purpose, CPU, memory, or storage optimized cloud instances in more than 20 locations across five continents. Without looking, I know that once again, Antarctica has gotten the short end of the stick. Launch your Vultr optimized compute instance in 60 seconds or less on your choice of included operating systems, or bring your own. It's time to ditch convoluted and unpredictable giant tech company billing practices, and say goodbye to noisy neighbors and egregious egress forever.Vultr delivers the power of the cloud with none of the bloat. "Screaming in the Cloud" listeners can try Vultr for free today with a $150 in credit when they visit getvultr.com/screaming. That's G E T V U L T R.com/screaming. My thanks to them for sponsoring this ridiculous podcast.Corey: Don't undersell what you've just built. This is something that—is this what I would use for a large-scale production deployment, obviously not, but it has streamlined and made incredibly accessible things that previously have been very complex for folks to get up and running. One of the most disturbing themes behind some of the feedback I got was, at one point I said, “Well, have you tried running a Docker container on Lambda?” Because now it supports containers as a packaging format. And I said no because I spent a few weeks getting Lambda up and running back when it first came out and I've basically been copying and pasting what I got working ever since the way most of us do.And response is, “Oh, that explains a lot.” With the implication being that I'm just a fool. Maybe, but let's be clear, I am never the only person in the room who doesn't know how to do something; I'm just loud about what I don't know. And the failure mode of a bad user experience is that a customer feels dumb. And that's not okay because this stuff is complicated, and when a user has a bad time, it's a bug.I learned that in 2012. From Jordan Sissel the creator of LogStash. He has been an inspiration to me for the last ten years. And that's something I try to live by that if a user has a bad time, something needs to get fixed. Maybe it's the tool itself, maybe it's the documentation, maybe it's the way that GitHub repo's readme is structured in a way that just makes it accessible.Because I am not a trailblazer in most things, nor do I intend to be. I'm not the world's best engineer by a landslide. Just look at my code and you'd argue the fact that I'm an engineer at all. But if it's bad and it works, how bad is it? Is sort of the other side of it.So, my problem is that there needs to be a couple of things. Ignore for a second the aspect of making it the right answer to get something out of the door. The fact that I want to take this container and just run it, and you and I both reach for App Runner as the default AWS service that does this because I've been swimming in the AWS waters a while and you're a frickin AWS Container Hero, where it is expected that you know what most of these things do. For someone who shows up on the containers webpage—which by the way lists, I believe 15 ways to run containers on mobile and 19 ways to run containers on non-mobile, which is just fascinating in its own right—and it's overwhelming, it's confusing, and it's not something that makes it is abundantly clear what the golden path is. First, get it up and working, get it running, then you can add nuance and flavor and the rest, and I think that's something that's gotten overlooked in our mad rush to pretend that we're all Google engineers, circa 2012.Casey: Mmm. I think people get stressed out when they tried to run containers in AWS because they think, “What is that golden path?” You said golden path. And my advice to people is there is no golden path. And the great thing about AWS is they do continue to invest in the solutions they come up with. I'm still bitter about Google Reader.Corey: As am I.Casey: Yeah. I built so much time getting my perfect set of RSS feeds and then I had to find somewhere else to—with AWS, the different offerings that are available for running containers, those are there intentionally, it's not by accident. They're there to solve specific problems, so the trick is finding what works best for you and don't feel like one is better than the other is going to get more attention than others. And they each have different use cases.And I approach it this way. I've seen a couple of different people do some great flowcharts—I think Forrest did one, Vlad did one—on ways to make the decision on how to run your containers. And I break it down to three questions. I ask people first of all, where are you going to run these workloads? If someone says, “It has to be in the data center,” okay, cool, then ECS Anywhere or EKS Anywhere and we'll figure out if Kubernetes is needed.If they need specific requirements, so if they say, “No, we can run in the cloud, but we need privileged mode for containers,” or, “We need EBS volumes,” or, “We want really small container sizes,” like, less than a quarter-VCP or less than half a gig of RAM—or if you have custom log requirements, Fargate is not going to work for you, so you're going to run on EC2. Otherwise, run it on Fargate. But that's the first question. Figure out where are you going to run your containers. That leads to the second question: What's your control plane?But those are different, sort of related but different questions. And I only see six options there. That's App Runner for your control plane, LightSail for your control plane, Rosa if you're invested in OpenShift already, EKS either if you have Momentum and Kubernetes or you have a bunch of engineers that have a bunch of experience with Kubernetes—if you don't have either, don't choose it—or ECS. The last option Elastic Beanstalk, but let's leave that as a—if you're not currently invested in Elastic Beanstalk don't start today. But I look at those as okay, so I—first question, where am I going to run my containers? Second question, what do I want to use for my control plane? And there's different pros and cons of each of those.And then the third question, how do I want to manage them? What tools do I want to use for managing deployment? All those other tools like Copilot or App2Container or Proton, those aren't my control plane; those aren't where I run my containers; that's how I manage, deploy, and orchestrate all the different containers. So, I look at it as those three questions. But I don't know, what do you think of that, Corey?Corey: I think you're onto something. I think that is a terrific way of exploring that question. I would argue that setting up a framework like that—one or very similar—is what the AWS containers page should be, just coming from the perspective of what is the neophyte customer experience. On some level, you almost need a slide of have choose your level of experience ranging from, “What's a container?” To, “I named my kid Kubernetes because I make terrible life decisions,” and anywhere in between.Casey: Sure. Yeah, well, and I think that really dictates the control plane level. So, for example, LightSail, where does LightSail fit? To me, the value of LightSail is the simplicity. I'm looking at a monthly pricing: Seven bucks a month for a container.I don't know how [unintelligible 00:30:23] works, but I can think in terms of monthly pricing. And it's tailored towards a console user, someone just wants to click in, point to an image. That's a very specific user, there's thousands of customers that are very happy with that experience, and they use it. App Runner presents that scale to zero. That's one of the big selling points I see with App Runner. Likewise, with Google Cloud Run. I've got that scale to zero. I can't do that with ECS, or EKS, or any of the other platforms. So, if you've got something that has a ton of idle time, I'd really be looking at those. I would argue that I think I did the math, Google Cloud Run is about 30% more expensive than App Runner.Corey: Yeah, if you disregard the free tier, I think that's have it—running persistently at all times throughout the month, the drop-out cold starts would cost something like 40 some odd bucks a month or something like that. Don't quote me on it. Again and to be clear, I wound up doing this very congratulatory and complimentary tweet about them on I think it was Thursday, and then they immediately apparently took one look at this and said, “Holy shit. Corey's saying nice things about us. What do we do? What do we do?” Panic.And the next morning, they raised prices on a bunch of cloud offerings. Whew, that'll fix it. Like—Casey: [laugh].Corey: Di-, did you miss the direction you're going on here? No, that's the exact opposite of what you should be doing. But here we are. Interestingly enough, to tie our two conversation threads together, when I look at an AWS bill, unless you're using Fargate, I can't tell whether you're using Kubernetes or not because EKS is a small charge. And almost every case for the control plane, or Fargate under it.Everything else just manifests as EC2 spend. From the perspective of the cloud provider. If you're running a Kubernetes cluster, it is a single-tenant application that can have some very funky behaviors like cross-AZ chatter back and fourth because there's no internal mechanism to say talk to the free thing, rather than the two cents a gigabyte thing. It winds up spinning up and down in a bunch of different ways, and the behavior patterns, because of how placement works are not necessarily deterministic, depending upon workload. And that becomes something that people find odd when, “Okay, we look at our bill for a week, what can you say?”“Well, first question. Are you running Kubernetes at all?” And they're like, “Who invited these clowns?” Understand, we're not prying into your workloads for a variety of excellent legal and contractual reasons, here. We are looking at how they behave, and for specific workloads, once we have a conversation engineering team, yeah, we're going to dive in, but it is not at all intuitive from the outside to make any determination whether you're running containers, or whether you're running VMs that you just haven't done anything with in 20 years, or what exactly is going on. And that's just an artifact of the billing system.Casey: We ran into this challenge in Gaggle. We don't use EKS, we use ECS, but we have some shared clusters, lots of EC2 spend, hard to figure out which team is creating the services that's running that up. We actually ended up creating a tool—we open-sourced it—ECS Chargeback, and what it does is it looks at the CPU memory reservations for each task definition, and then prorates the overall charge of the ECS cluster, and then creates metrics in Datadog to give us a breakdown of cost per ECS service. And it also measures what we like to refer to as waste, right? Because if you're reserving four gigs of memory, but your utilization never goes over two gigs, we're paying for that reservation, but you're underutilizing.So, we're able to also show which services have the highest degree of waste, not just utilization, so it helps us go after it. But this is a hard problem. I'd be curious, how do you approach these shared ECS resources and slicing and dicing those bills?Corey: Everyone has a different approach, too. This there is no unifiable, correct answer. A previous show guest, Peter Hamilton, over at Remind had done something very similar, open-sourced a bunch of these things. Understanding what your spend is important on this, and it comes down to getting at the actual business concern because in some cases, effectively dead reckoning is enough. You take a look at the cluster that is really hard to attribute because it's a shared service. Great. It is 5% of your bill.First pass, why don't we just agree that it is a third for Service A, two-thirds for Service B, and we'll call it mostly good at that point? That can be enough in a lot of cases. With scale [laugh] you're just sort of hand-waving over many millions of dollars a year there. How about we get into some more depth? And then you start instrumenting and reporting to something, be it CloudWatch, be a Datadog, be it something else, and understanding what the use case is.In some cases, customers have broken apart shared clusters for that specific reason. I don't think that's necessarily the best approach from an engineering perspective, but again, this is not purely an engineering decision. It comes down to serving the business need. And if you're taking up partial credits on that cluster, for a tax credit for R&D for example, you want that position to be extraordinarily defensible, and spending a few extra dollars to ensure that it is the right business decision. I mean, again, we're pure advisory; we advise customers on what we would do in their position, but people often mistake that to be we're going to go for the lowest possible price—bad idea, or that we're going to wind up doing this from a purely engineering-centric point of view.It's, be aware of that in almost every case, with some very notable weird exceptions, the AWS Bill costs significantly less than the payroll expense that you have of people working on the AWS environment in various ways. People are more expensive, so the idea of, well, you can save a whole bunch of engineering effort by spending a bit more on your cloud, yeah, let's go ahead and do that.Casey: Yeah, good point.Corey: The real mark of someone who's senior enough is their answer to almost any question is, “It depends.” And I feel I've fallen into that trap as well. Much as I'd love to sit here and say, “Oh, it's really simple. You do X, Y, and Z.” Yeah… honestly, my answer, the simple answer, is I think that we orchestrate a cyber-bullying campaign against AWS through the AWS wishlist hashtag, we get people to harass their account managers with repeated requests for, “Hey, could you go ahead and [dip 00:36:19] that thing in—they give that a plus-one for me, whatever internal system you're using?”Just because this is a problem we're seeing more and more. Given that it's an unbounded growth problem, we're going to see it more and more for the foreseeable future. So, I wish I had a better answer for you, but yeah, that's stuff's super hard is honest, but it's also not the most useful answer for most of us.Casey: I'd love feedback from anyone from you or your team on that tool that we created. I can share link after the fact. ECS Chargeback is what we call it.Corey: Excellent. I will follow up with you separately on that. That is always worth diving into. I'm curious to see new and exciting approaches to this. Just be aware that we have an obnoxious talent sometimes for seeing these things and, “Well, what about”—and asking about some weird corner edge case that either invalidates the entire thing, or you're like, “Who on earth would ever have a problem like that?” And the answer is always, “The next customer.”Casey: Yeah.Corey: For a bounded problem space of the AWS bill. Every time I think I've seen it all, I just have to talk to one more customer.Casey: Mmm. Cool.Corey: In fact, the way that we approached your teardown in the restaurant is how we launched our first pass approach. Because there's value in something like that is different than the value of a six to eight-week-long, deep-dive engagement to every nook and cranny. And—Casey: Yeah, for sure. It was valuable to us.Corey: Yeah, having someone come in to just spend a day with your team, diving into it up one side and down the other, it seems like a weird thing, like, “How much good could you possibly do in a day?” And the answer in some cases is—we had a Honeycomb saying that in a couple of days of something like this, we wound up blowing 10% off their entire operating budget for the company, it led to an increased valuation, Liz Fong-Jones says that—on multiple occasions—that the company would not be what it was without our efforts on their bill, which is just incredibly gratifying to hear. It's easy to get lost in the idea of well, it's the AWS bill. It's just making big companies spend a little bit less to another big company. And that's not exactly, you know, saving the lives of K through 12 students here.Casey: It's opening up opportunities.Corey: Yeah. It's about optimizing for the win for everyone. Because now AWS gets a lot more money from Honeycomb than they would if Honeycomb had not continued on their trajectory. It's, you can charge customers a lot right now, or you can charge them a little bit over time and grow with them in a partnership context. I've always opted for the second model rather than the first.Casey: Right on.Corey: But here we are. I want to thank you for taking so much time out of well, several days now to argue with me on Twitter, which is always appreciated, particularly when it's, you know, constructive—thanks for that—Casey: Yeah.Corey: For helping me get my business partner to re:Invent, although then he got me that horrible puzzle of 1000 pieces for the Cloud-Native Computing Foundation landscape and now I don't ever want to see him again—so you know, that happens—and of course, spending the time to write Quinntainers, which is going to be at snark.cloud/quinntainers as soon as we're done with this recording. Then I'm going to kick the tires and send some pull requests.Casey: Right on. Yeah, thanks for having me. I appreciate you starting the conversation. I would just conclude with I think that yes, there are a lot of ways to run containers in AWS; don't let it stress you out. They're there for intention, they're there by design. Understand them.I would also encourage people to go a little deeper, especially if you got a significantly large workload. You got to get your hands dirty. As a matter of fact, there's a hands-on lab that a company called Liatrio does. They call it their Night Lab; it's a one-day free, hands-on, you run legacy monolithic job applications on Kubernetes, gives you first-hand experience on how to—gets all the way up into observability and doing things like Canary deployments. It's a great, great lab.But you got to do something like that to really get your hands dirty and understand how these things work. So, don't sweat it; there's not one right way. There's a way that will probably work best for each user, and just take the time and understand the ways to make sure you're applying the one that's going to give you the most runway for your workload.Corey: I will definitely dig into that myself. But I think you're right, I think you have nailed a point that is, again, a nuanced one and challenging to put in a rage tweet. But the services don't exist in a vacuum. They're not there because, despite the joke, someone wants to get promoted. It's because there are customer needs that are going on that, and this is another way of meeting those needs.I think there could be better guidance, but I also understand that there are a lot of nuanced perspectives here and that… hell is someone else's workflow—Casey: [laugh].Corey: —and there's always value in broadening your perspective a bit on those things. If people want to learn more about you and how you see the world, where's the best place to find you?Casey: Probably on Twitter: twitter.com/nektos, N-E-K-T-O-S.Corey: That might be the first time Twitter has been described as a best place for anything. But—Casey: [laugh].Corey: Thank you once again, for your time. It is always appreciated.Casey: Thanks, Corey.Corey: Casey Lee, CTO at Gaggle and AWS Container Hero. And apparently writing code in anger to invalidate my points, which is always appreciated. Please do more of that, folks. I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, or the YouTube comments, which is always a great place to go reading, whereas if you've hated this podcast, please leave a five-star review in the usual places and an angry comment telling me that I'm completely wrong, and then launching your own open-source tool to point out exactly what I've gotten wrong this time.Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.Announcer: This has been a HumblePod production. Stay humble.
If you had access to the stargate.. sorry, FARgate, how would you use it? Would you steal from the world's banks? Use it to fast travel to different continents for leisure? Or would you use it to steal cable? If you chose the latter, you'd probably fit in well with the Plutonians. Emory and Oglethorpe get high (420 blaze it), blast their eyes out, make counterfeit shirts, and unleash the universal remonster (accidentally) on the Aqua Teens. This episode also spawns a rare Diablo III item! Did Rick and Morty ever do that?! Doubtful! Other topics discussed include spicy brownies, cocky bowling, shitty day jobs, supporting the things you like, and WWE Survivor Series 2003's theme song. References: • Carey Means Cameo: https://www.cameo.com/meanscareyfrylock • Err on the /r/Place Atlas: https://place-atlas.stefanocoding.me/?id=txbtvc • Andy Merrill Twitter: https://twitter.com/amerrill2 • ATHF Production Music - On the Up: https://www.youtube.com/watch?v=_KYNZmFppaY • Funtime Bowl History: https://reporternewspapers.net/2019/11/01/old-school-bowling-rolls-on-at-funtime-bowl/ • [STRIKE!] Knocked Down by Pandemic, Historic Funtime Bowl Closed Permanently in Brookhaven: https://www.tonetoatl.com/2021/09/Funtime-Bowl-Northeast-Plaza-Brookhaven-Closed.html • 2003 Forum Thread about Universal Remonster: https://animesuperhero.com/forums/threads/c-c-aqua-teen-hunger-force-universal-re-monster-9-28.3453871/ Contacts: Support the Show: patreon.com/dancingisforbidden Leave a voice message: speakpipe.com/dancingisforbidden Discord: https://discord.gg/NpjSXPECw6 Instagram: @AquaTeenPod Twitter: @AquaTeenPod Email: DancingIsForbiddenPod@gmail.com YouTube: https://www.youtube.com/channel/UCe5gFb5eAYH3nyF3DZ5jwhQ Website: dancingisforbidden.com Twitch: twitch.tv/ronnieneeley
Firecracker is an open-source project, launched by AWS, for serverless computing. On today's Full Stack Journey podcast, Michael Hausenblas of AWS joins host Scott Lowe to talk about Firecracker, how it works, its adoption, and more. They also touch on EKS, Fargate, and Lambda.