About Chris
Chris is a robotics engineer turned cloud security practitioner. From building origami robots for NASA, to neuroscience wearables, to enterprise software consulting, he is a passionate builder at heart. Chris is a cofounder of Common Fate, a company with a mission to make cloud access simple and secure.
Links:
Common Fate: https://commonfate.io/
Granted: https://granted.dev
Twitter: https://twitter.com/chr_norm
Transcript
Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.
Corey: Let's face it, on-call firefighting at 2am is stressful! So there's good news and there's bad news. The bad news is that you probably can't prevent incidents from happening, but the good news is that incident.io makes incidents less stressful and a lot more valuable. incident.io is a Slack-native incident management platform that allows you to automate incident processes, focus on fixing the issues and learn from incident insights to improve site reliability and fix your vulnerabilities. Try incident.io, recover faster and sleep more.
Corey: This episode is sponsored in part by Honeycomb. When production is running slow, it's hard to know where problems originate. Is it your application code, users, or the underlying systems? I've got five bucks on DNS, personally. Why scroll through endless dashboards while dealing with alert floods, going from tool to tool to tool that you employ, guessing at which puzzle pieces matter? Context switching and tool sprawl are slowly killing both your team and your business. You should care more about one of those than the other; which one is up to you.
Drop the separate pillars and enter a world of getting one unified understanding of the one thing driving your business: production. With Honeycomb, you guess less and know more. Try it for free at honeycomb.io/screaminginthecloud. Observability: it's more than just hipster monitoring.
Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. It doesn't matter where you are on your journey in cloud—you could never have heard of Amazon the bookstore—and you encounter AWS and you spin up an account. And within 20 minutes, you will come to the realization that everyone in this space does: “Wow, logging in to AWS absolutely blows goats.”
Today, my guest obviously had that reaction, but unlike most people I talk to, decided to get up and do something about it. Chris Norman is the co-founder of Common Fate and, most notably to how I know him, one of the original authors of the tool Granted. Chris, thank you so much for joining me.
Chris: Hey, Corey, thank you for having me.
Corey: I have done podcasts before; I have done a blog post on it; I evangelize it on Twitter constantly, and even now, it is challenging in a few ways to explain holistically what Granted is. Rather than trying to tell your story for you, when someone says, “Oh, Granted, that seems interesting and impossible to Google for in isolation, so therefore, we know it's going to be good because all the open-source projects with hard-to-find names are,” what is Granted and what does it do?
Chris: Granted is a command-line tool which makes it really easy for you to get access and assume roles when you're working with AWS. For me, when I'm using Granted day-to-day, I wake up, go to my computer—I'm working from home right now—crack open the MacBook, log in, and do some development work. I'm going to go and start working in the cloud.
Corey: Oh, when I start first thing in the morning doing development work and logging into the cloud, I know.
All right, I'm going to log in to AWS and now I know that my day is going downhill from here.
Chris: [laugh]. Exactly, exactly. I think maybe the best days are when you don't need to log in at all. But when you do, I go and I open my terminal and I run this command. Using Granted, I run this assume command and it authenticates me with single-sign-on into AWS, and then it opens up a console window in a particular account.
Now, you might ask, “Well, that's a fairly standard thing.” And in fact, that's probably the way that the console and all of the tools work by default with AWS. Why do you need a third-party tool for this?
Corey: Right. I've used a bunch of things that do varying forms of this and unlike Granted, you don't see me gushing about them. I want to be very clear, we have no business relationship. You're not sponsoring anything that I do. I'm not entirely clear on what your day job entails, but I have absolutely fallen in love with the Granted tool, which is why I'm dragging you on to this show, kicking and screaming, mostly to give me an excuse to rave about it some more.
Chris: [laugh]. Exactly. And thank you for the kind words. And I'd say really what makes it special, or why I've been so excited to be working on it, is that it makes this access, particularly when you're working with multiple accounts, really, really easy. So, when I run assume and I open up that console window, you know, that's all fine and that's very similar to how a lot of the other tools and projects that are out there work, but when I want to open that second account and that second console window, maybe because I'm looking at like a development and a staging account at the same time, then Granted allows me to view both of those simultaneously in my browser.
And we do that using some platform sort of tricks and building into the way that the browser works.
Corey: Honestly, one of the biggest differences in how you describe what Granted is and how I view it is when you describe it as a CLI application because yes, it is that, but one of the distinguishing characteristics is you also have a Firefox extension that winds up leveraging the multi-container functionality that Firefox has. So, whenever I wind up running a single command (assume with a -c flag, then the name of my AWS profile), it opens the web console so I can ClickOps to my heart's content inside of a tab that is locked to a container, which means I can have one or two or twenty different AWS accounts and/or regions up and running simultaneously side-by-side, which is basically impossible any other way that I've ever looked at it.
Chris: Absolutely, yeah. And that's, like, the big differentiating factor right now between Granted and this sort of default, native experience, if you're just using the AWS command line by itself. With Granted, you can—with these Firefox containers, all of your cookies, your profile, everything is all localized into that one container. It's actually a privacy feature that's built into Firefox, which keeps everything really separate between your different profiles. And what we're doing with Granted is that we make it really easy to open specific containers that correspond with the different AWS profiles that you're using.
So, you'd have one which could be your development account, one which could be production or staging. And you can jump between these and navigate between them just as separate tabs in your browser, which is a massive improvement over, you know, what I've previously had to use in the past.
Corey: The thing that really just strikes me about this is first, of course, the functionality and the rest, so I saw this—I forget how I even came across it—and immediately I started using it.
On my Mac, it was great. I started using it when I was on the road, and it was less great because you built this thing in Go. It can compile and install on almost anything, but there were some assumptions that you had built into this in its early days that did not necessarily encompass all of the use cases that I have. For example, it hadn't really occurred to you that some lunatic would try and only use an iPad when they're on the road, so they have to be able to run this to get federated login links via SSHing into an EC2 instance running somewhere and not have it open locally.
You seemed almost taken aback when I brought it up. Like, “What lunatic would do that?” Like, “Hi, I'm such a lunatic. Let's talk about this.” And it does that now, and it's awesome. It does seem to me though, and please correct me if I'm wrong on this assumption slash assessment, that this is first and foremost aimed at desktop users, specifically people running Mac on the desktop. Is that the genesis of it?
Chris: It is indeed. And I think part of the cause behind that is that we originally built a tool for ourselves. And as we were building things and as we were working using the cloud, we were running things—you know, we like to think that we're following best practices when we're using AWS, and so we'd set up multiple accounts, we'd have a special account for development, a separate one for staging, a separate one for production, even internal tools that we would build, we would go and spin up an individual account for those. And then, you know, we had lots of accounts, and to go and access those really easily was quite difficult.
So, we definitely built it for ourselves first, and I think that was actually a little bit of the cause for some of the initial problems when we released it.
And some of the feedback that we had was that it's great to build tools for yourself, but when you're working in open-source, there's a lot of diversity in how people are using things.
Corey: We take different approaches. You want to try to align with existing best practices, whereas I am a loudmouth white guy who works in tech. So, what I do definitionally becomes a best practice in the ecosystem. It's easier to just comport with the ones that are already existing that smart people put together rather than just trying to competence your way through it, so you took a better path than I did.
But there's been a lot of evolution to Granted as I've been using it for a while. I did a whole write-up on it and that got a whole bunch of eyes onto the project, which I can now admit was a nefarious plan on my part because popping into your community Slack and yelling at you for features I want was all well and good, but let's try and get some people with eyes on this who are smarter than me—which is not that high of a bar when it comes to SSO, and IAM, and federated login, and the rest—and they can start finding other enhancements that I'll probably benefit from. And sure enough, that's exactly what happened. My sneaky plan has come to fruition. Thanks for being a sucker, I guess. I mean—[laugh] it worked. I'm super thrilled by the product.
Chris: [laugh]. I guess it's a great thing, the feedback, I think, and particularly something that's always been really exciting is just seeing new issues come through on GitHub because it really shows the kinds of interesting use cases and the kinds of interesting teams and companies that are using Granted to make their lives a little bit easier.
Corey: When I go to the website—which again is impossible to Google—the website for those wondering is granted.dev. It's short, it's concise, I can say it on a podcast and people automatically know how to spell it.
But at the top of the website—which is very well done, by the way—it mentions that oh, you can “Govern access to breakglass roles with Common Fate Cloud,” and it also says in the drop shadow nonsense thing in the upper corner, “Brought to you by Common Fate,” which is apparently the name of your company.
So, the question I'll get to in a second is what does your company do, but first and foremost, is this going to be one of those rug-pull open-source projects where one day it's, “Oh, you want to log into your AWS accounts? Insert quarter to continue.” I'm mostly being a little over the top with that description, but we've all seen things that we love turn into molten garbage. What is the plan around this? Are you about to ruin this for the rest of us once you wind up raising a round or something? What's the deal?
Chris: Yeah, it's a great question, Corey. And I think that to a degree, releasing anything like this that sits in the access workflow and helps you assume roles and helps you day-to-day, you know, we have a responsibility to uphold stability and reliability here and to not change things. And I think part of, like, not changing things includes not [laugh] rug-pulling, as you've alluded to. And I think that for some companies, it ends up that open-source becomes, like, a kind of a lead-generation tool, or you end up with, you know, now finally, let's go and add another login so that you have to log into Common Fate to use Granted. And I think that, to be honest, with a tool like this where it's all about improving the speed of access, the incentives for us, like, it doesn't even make sense to try and add another login to try to get people to, like, to say, log in to Common Fate, because that would make your sign-in process for AWS take even longer than it already does.
Corey: Yeah, you decided that you know, what's the biggest problem?
Oh, you can sleep at night, so let's go ahead and make it even worse: now I want you to be this custodian of all my credentials to log into all of my accounts. And now you're going to be critical path, so if you're down, I'm not able to log into anything. And oh, by the way, I have to trust you with full access to my bank stuff. I just can't imagine that is a direction that you would be super excited about diving head-first into.
Chris: No, no. Yeah, certainly not. And I think that the, you know, building anything in this space, and with what we're doing with Common Fate, you know, we're building a cloud platform to try to make IAM a little bit easier to work with, but it's really sensitive around granting any kind of permission and I think that you really do need that trust. So, trying to build trust, I guess, with our open-source projects is really important for us with Granted and with this project, that it's going to continue to be reliable and continue to work as it currently does.
Corey: The way I see it, one of the dangers of doing anything that is particularly open-source—or that leans in the direction of building in Amazon's ecosystem—is that it leads to the natural question of, well, isn't this just going to be, some people say, stolen—and I don't think those people understand how open-source works—by AWS themselves? Or aren't they going to build something themselves at AWS that's going to wind up stomping this thing that you've built? And my honest and remarkably cynical answer is that, “You have built a tool that is a joy to use, that makes logging into AWS accounts streamlined and efficient in a variety of different patterns. Does that really sound like something AWS would do?” And followed by, “I wish they would because everyone would benefit from that rising tide.”
I have to be very direct and very clear. Your product should not exist. This should be something the provider themselves handles. But nope. Instead, it has to exist.
And while I'm glad it does, I also can't shake the feeling that I am incredibly annoyed by the fact that it has to.
Chris: Yeah. Certainly, certainly. And it's something that I think about a little bit. I like to wonder whether there's maybe like a single feature flag or some single sort of configuration setting in AWS where they're not allowing different tabs to access different accounts, they're not allowing this kind of concurrent access. And maybe if we make enough noise about Granted, maybe one of the engineers will go and flick that switch and they'll just enable it by default.
And then Granted itself will be a lot less relevant, but for everybody who's using AWS, that'll be a massive win because the big draw of using Granted is mainly just around being able to access different accounts at the same time. If AWS let you do that out of the box, hey, that would be great and, you know, I'd have a lot less stuff to maintain.
Corey: Originally, I had you here to talk about Granted, but I took a glance at what you're actually building over at Common Fate and I'm about to basically hijack slash derail what probably is going to amount to the rest of this conversation, because you have a quick example on your site under “By developers, for developers.” You show a quick Python script that tries to access an S3 bucket object and it's denied. You copy the error message, you paste it into what you're building over at Common Fate, and in return, it's like, “Oh. Yeah, this is the policy that fixes it. Do you want us to apply it for you?”
And I just about fell out of my chair because I have been asking for this explicit thing for a very long time. And AWS doesn't do it. Their IAM Access Analyzer claims to. Like, “Oh, just go look at CloudTrail and see what permissions it uses and we'll build a policy to scope it down.” “Okay. So, it's S3 access. Fair enough. To what object or what bucket?” “Guess,” is what it tells you there.
And this is crap.
Who thinks this is a good user experience? You have built the thing that I wish AWS had built in natively. Because let's be honest here, I do what an awful lot of people do and overscope permissions massively just because messing around with the bare minimum set of permissions in many cases takes more time than building the damn thing in the first place.
Chris: Oh, absolutely. Absolutely. And in fact, this—it was a few years ago when I was consulting—I had a really similar sort of story where one of the clients that we were working with, the CTO of this company, he was needing to grant us access to AWS and we were needing to build a particular service. And he said, “Okay, can you just let me know the permissions that you will need and I'll go and deploy the role for this.” And I came back and I said, “Wait. I don't even know the permissions that I'm going to need because the damn thing isn't even built yet.”
So, we went sort of back and forth around this. And the compromise ended up just being, you know, way too much access. And that was sort of part of the inspiration for, you know, really this whole project and what we're building with Common Fate, just trying to make that feedback loop around getting to the right level of permissions a lot faster.
Corey: Yeah, I am just so overwhelmingly impressed by the fact that you have built—and please don't take this as a criticism—a set of very simple tools. Not simple in the terms of, “Oh, that's, like, three lines of bash, and a fool could write that on a weekend.” No. Simple in the sense of it solves a problem elegantly and well and it's straightforward—well, straightforward as anything in the world of access control goes—to wrap your head around exactly what it does. You don't tend to build these things by sitting around a table brainstorming with someone you met in a co-founder dating pool or something and wind up figuring out, “Oh, we should go and solve that.
That sounds like a billion-dollar problem.”
This feels very much like the outcome of when you're sitting around talking to someone and let's start by drinking six beers so we become extraordinarily honest, followed immediately by let's talk about what sucks. What pisses you off the most? It feels like this is sort of the low-hanging fruit of things that upset people when it comes to AWS. I mean, if things had gone slightly differently, instead of focusing on AWS bills, IAM was next on my list of things to tackle just because I was tired of smacking my head into it.
This is very clearly a problem space that you folks have analyzed deeply, worked within, and have put a lot of thought into. I want to be clear, I've thrown a lot of feature suggestions at you for Granted from start to finish. But all of them have been around interface stuff and usability and expanding use cases. None of them have been, “Well, that seems screamingly insecure.” Because it hasn't been.
Chris: [laugh].
Corey: It has been effective, start to finish. I think that from a security posture, you make terrific choices, in many cases better than ones I would have made starting from scratch myself. Everything that I'm looking at in what you have built is from a position of this is absolutely amazing and it is transformative to my own workflows. Now, how can we improve it?
Chris: Mmm. Thank you, Corey. And I'll say as well, maybe around the security angle, that one of the goals with Granted was to try and do things a little bit better than the default way that AWS does them when it comes to security. And it's actually been a bit of a source of challenges with some of the users that we've been working with with Granted, because one of the things we wanted to do was encrypt the SSO token. And this is the token that, when you sign in to AWS, kind of like, allows you to then get access to all of the rest of the accounts.
So, it's like a pretty—it's a short-lived token, but it's a really sensitive one.
And you know, by default, it's just stored in plain text on your disk. So, it gets dumped to a file and, you know, anything that can go and read that can go and get it. It's also a little bit hard to revoke and to lock people out. There aren't really great workflows around that on AWS's side.
So, we thought, “Okay, great. One of the goals for Granted can be that we will go and store this in your keychain in your system and we'll work natively with that.” And that's actually been a cause for a little bit of a hassle for some users, though, because by doing that and by storing all of this information in the keychain, it's actually broken some of the integrations with the rest of the tooling, which kind of expects tokens and things to be in certain places. So, as part of dealing with that with Granted, we've had to give users the ability to opt out of that.
Corey: DoorDash had a problem. As their cloud-native environment scaled and developers delivered new features, their monitoring system kept breaking down. In an organization where data is used to make better decisions about technology and about the business, losing observability means the entire company loses their competitive edge. With Chronosphere, DoorDash is no longer losing visibility into their applications suite. The key? Chronosphere is an open-source compatible, scalable, and reliable observability solution that gives the observability lead at DoorDash business confidence and peace of mind. Read the full success story at snark.cloud/chronosphere. That's snark.cloud slash C-H-R-O-N-O-S-P-H-E-R-E.
Corey: That's why I find this so, I think, just across the board, fantastic. You are very clearly engaged with your community. There's a community Slack that you have set up for this. And I know, I know, too many Slacks; everyone has this problem.
This is one of those that is worth hanging in, at least from my perspective, just because one of the problems that you have, I suspect, is that on my Mac it's great because I wind up automatically updating it to whatever the most recent one is every time I do a brew upgrade.
But on the Linux side of the world, you've discovered what many of us have discovered, and that is that packaging things for Linux is a freaking disaster. The current installation is, “Great. Here's basically a curl bash.” Or, “Here, grab this tarball and install it.” And that's fine, but there's no real way of keeping that updated and synced.
So, I was checking the other day, oh wow, I'm something like eight versions behind on this box. But it still just works. I upgraded. Oh, wow. There's new functionality here. This is stuff that's actually really handy. I like this quite a bit. Let's see what else we can do.
I'm just so impressed, start to finish, by just how receptive you've been to community feedback. And as well—I want to be very clear on this point, too—I've had folks who actually know what they're doing in an InfoSec sense look at what you're up to, and none of them had any issues of note. I'm sure that they have a pile of things like, with that curl bash, they should really be doing a GPG check. Yes, yes, fine. Whatever. If that's your target threat model, okay, great. Here in reality-land for what I do, this is awesome.
And they don't seem to have any problems with, “Oh, yeah. By the way, sending analytics back up”—which, okay, fine, whatever. “And it's not disclosing them.” Okay, that's bad. “And it's including the contents of your AWS credentials.”
Ahhhh. I did encounter something that was doing that on the back-end once. [cough]—Serverless Framework—sorry, something caught in my throat for a second.
Chris: [laugh].
Corey: No faster way I can think of to erode trust in that. But everything you're doing just makes sense.
Chris: Oh, I do remember that.
And that was a little bit of a fiasco, really, around all of that, right? And it's great to hear that InfoSec folks and security people are, you know, not unhappy, I guess, with a tool like this. It's been interesting for me personally. We've really come from a practitioner's background.
You know, I wouldn't call myself a security engineer at all. I would call myself sometimes a software developer, I guess. I have been hacking my way around Go and definitely learning a lot about how the cloud has worked over the past seven, eight years or so, but I wouldn't call myself a security engineer, so we've been very cautious around how all of these things work. And we've really tried to defer to things like the system keychain and defer to things that we know are pretty safe and work.
Corey: The thing that I also want to call out as well is that your project is licensed under the MIT license. This is not one of those, “Oh, you're required to wind up doing a bunch of branding stuff around it.” And, like some people say, “Oh, you have to own the trademark for all of these things.” I mean, I'm not an expert in international trademark law, let's be very clear, but I also feel that trademarking a term that is already used heavily in the space, such as the word ‘Granted,’ feels like kind of an uphill battle. And let's further be clear that it doesn't matter what you call this thing.
In fact, I will call attention to an oddity that I've encountered a fair bit. After installing it, the first thing you do is you run the command ‘granted.’ That sets it up, it lets you configure your browser, what browser you want to use, and it now supports standard out for that headless, EC2 use case. Great. Awesome. Love it. But then the other binary that ships with it is Assume. And that's what I use day-to-day.
It actually takes me a minute sometimes, when it's been long enough, to remember that the tool is called Granted and not Assume. What's up with that?
Chris: So, part of the challenge that we ran into when we were building the Granted project is that we needed to export some environment variables. And these are really important when you're logging into AWS because you have your access key, your secret key, your session token. All of those, when you run the assume command, need to go into the terminal session that you called it from. This doesn't matter so much when you're using the console mode, which is what we mentioned earlier, where you can open 100 different accounts if you want to view all of those at the same time in your browser. But if you want to use it in your terminal, we wanted to make it really smooth and seamless here.
And we were really inspired by this approach from—and I have to shout them out and kind of give credit to them—a tool called AWSume—it's spelled A-W-S-U-M-E—a Python-based tool that doesn't do as much with single-sign-on, but we thought they had a really nice, like, general approach to the way that they did the scripting and aliasing. And we were inspired by that, and part of that means that we needed to have a shell script that calls this executable, which then exports things back out into the shell script. And we're doing all this wizardry under the hood to make the user experience really smooth and seamless. Part of that meant that we separated the commands into granted and assume, and the other part of the naming for everything is that I felt Granted had a far better ring to it than calling the whole project Assume.
Corey: True. And when you say assume, is it AWS or not? I've used the AWSume project before; I've used AWS Vault out of 99 Designs for a while. I've used—for three minutes—the native AWS SSO config, and that is just trash.
Again, they're so good at the plumbing, so bad at the porcelain, I think is the criticism that I would levy toward a lot of this stuff.
Chris: Mmm.
Corey: And it's odd to think there's an entire company built around just smoothing over these sharp, obnoxious edges, but I'm saying this as someone who runs a consultancy that has spent five years just fixing the bill for this one company. So, there's definitely a series of cottage industries that spring up around these things. I would be thrilled, on some level, if you wound up being completely subsumed by their product advancements, but it's been 15 years for a lot of this stuff and we're still waiting. My big failure mode that I'm worried about is that you never are.
Chris: Yeah, exactly, exactly. And it's really interesting when you think about all of these user experience gaps in AWS being opportunities for, I guess, companies like us, trying to simplify a lot of the complexity for things. I'm interested in sort of waiting for a startup to try and, like, rebuild the actual AWS console itself to make it a little bit faster and easier to use.
Corey: It's been done and attempted a bunch of different times. The problem is that the console is a lot of different things to a lot of different people, and as you step through that, you can solve for your use case super easily. “Yeah, what do I care? I use RDS, I use some VPC nonsense, and I use EC2. The end.” “Great. What about IAM?”
Because I promise you're using that whether you know it or not. And okay, well, I'm talking to someone else who's all-in on DynamoDB, and someone else is full-on serverless, and someone else has more money than sense, so they mostly use SageMaker, and so on and so forth. And it turns out that you're effectively trying to rebuild everything. I don't know if that necessarily works.
Chris: Yeah, and I think that's a good point around maybe why we haven't seen anything around that sort of space so far.
You go to the console, and you click down, you see that list of 200 different services and all of those have had teams go and actually, like, build the UI and work with those individual APIs. Yeah.
Corey: Any ideas as far as what's next for features on Granted?
Chris: I think that, for us, it's continuing to work with everybody who's using it, with a focus on stability and performance. We actually had somebody in the community raise an issue because they have an AWS config file that's over 7000 lines long. And I kind of pity that person, potentially, for their day-to-day. They must deal with so much complexity. Granted is currently quite slow when the config files get very big. And for us, I think, you know, we built it for ourselves; we don't have that many accounts just yet, so working to try to, like, make it really performant and really reliable is something that's really important.
Corey: If you don't mind a feature request while we're at it—and I understand that this is more challenging than it looks—I'm willing to fund this as a feature bounty if that makes sense. And this also feels like it might be a good first project for a very particular type of person. I would love to get tab completion working in Zsh. You have it—
Chris: Oh.
Corey: For Fish because there's a great library that automatically populates that out, but for the Zsh side of it, it's, “Oh, I should just wind up getting Zsh completion working,” and I fell down a rabbit hole, let me tell you. And I come away from this with the perception of yeah, I'm not going to do it. I am not smart enough to check those boxes. But a lot of people are, so that is the next thing I would love to see. Because I will change my browser to log into the AWS console for you, but I'll be damned if I'm changing my shell.
Chris: [laugh]. I think autocomplete probably should be higher on our roadmap for the tool, to be honest, because it's really, like, a key metric, and what we're focusing on is how easy it is to log in.
And you know, if you're not too sure what commands to use, or if we can save you a few keystrokes, I think that would be, kind of like, reaching our goals.

Corey: From where I'm sitting, you definitely have. I really want to thank you for taking the time to not only build this in the first place, but also speak with me about it. If people want to learn more, where's the best place to find you?

Chris: So, you can find me on Twitter, I'm @chr_norm, or you can go and visit granted.dev and you'll find a link to join the Slack community. And I'm very active on the Slack.

Corey: You certainly are, although I will admit that I fall into the challenge of being in just the perfectly opposed time zone from you and your co-founder, who are in different time zones, to my understanding; one of you is in Australia and one of you is in London; you're the London guy, as best I'm aware. And as a result, invariably, I wind up putting in feature requests right when no one's around. And, for better or worse, the middle of the night is not when I'm usually awake trying to log into AWS. That is Azure time.

Chris: [laugh]. Yeah, no, we don't have the US time zone properly covered yet for our community support and help. But we do have a fair bit of the world's time zones covered. The rest of the team for Common Fate is all based in Australia and I'm out here in London.

Corey: Yeah. I just want to thank you again for being so accessible and, like, honestly receptive to feedback. I want to be clear, there's a way to give feedback and I do strive to do it constructively. I didn't come crashing into your Slack one day with a, “You know what your problem is?” I prefer to take the, “This is awesome. Here's what I think would be even better. Does that make sense?” approach, as opposed to the imperious demands in GitHub issues and whatnot. It's, “I'd love it if it did this thing. It doesn't do this thing. Can you please make it do this thing?” Turns out that's the better way to drive change.
Who knew?

Chris: Yeah. [laugh]. Yeah, definitely. And I think that one of the best things about our journey with Granted so far has been listening to feedback and hearing from people how they would like to use the tool. And a big thank you to you, Corey, for actually suggesting changes that make it better not only for you, but for everybody else who's using Granted.

Corey: Well, at least as long as they're using my particular byzantine workload patterns in some way, shape, or form, I'll take that. But no, it's been an absolute pleasure and I really want to thank you for your time as well.

Chris: Yeah, thank you for having me.

Corey: Chris Norman, co-founder of Common Fate, as well as one of the two primary developers originally behind the Granted project that logs you into AWS without you having to lose your mind. I'm Cloud Economist Corey Quinn and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice along with an angry, incensed, raging comment that talks about just how terrible all of this is, once you spend four hours logging into your AWS account by hand first.

Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.

Announcer: This has been a HumblePod production. Stay humble.
Links:
- Azure has another security issue around its Synapse offering; this one was discovered by Tenable.
- Sysdig has a dive into the real threats to SSH on EC2.
- Tailscale has announced the ability to support Tailscale SSH.
- Chris Farris has a treatise on The Philosophy of Prevention when it comes to cloud security.
- Google Cloud CISO Phil Venables asks whether security analogies are counterproductive.
- A security issue of sorts was discovered around sts:GetSessionToken role chaining in AWS.
- The person responsible for the giant Capital One hack that took advantage of a series of small AWS misconfigurations has been convicted.
- Rogue GitHub apps could have hijacked countless repos for a week or two earlier this year.
- Wickr for Government achieves FedRAMP Ready designation.
- It takes an open-source project like trackiam to collate IAM actions, AWS APIs, and managed policies from all over the place.
- Passwordle lets you guess commonly used passwords.
In this segment, we bring you information about day-to-day work in the cloud from our perspective. Speakers in this episode: Ariel Munafo, Avi Keinan, and Eran Ledor. In the previous episode, we talked about Amazon Lightsail: what the service is, who it's for, what it's used for, and how it differs from EC2. Avi also gave a live demo of creating a new server. In this episode, we talk with Eran Ledor, head of FinOps at AppsFlyer, about optimization in the world of FinOps. We touch on the types of optimizations you can make and discuss organizational culture, KPIs, and various other topics in the day-to-day life of FinOps teams. Want to stay up to date on more content about cloud and advanced technologies? Sign up for our newsletter now and always stay in the loop. To sign up: https://www.israelclouds.com/newslettersignup
We recently talked about migrating a monolithic application to AWS using EC2, load balancers, S3, and RDS. In this episode we want to talk about a slightly different setup, where we go for containers instead of EC2 and deploy them in Fargate. We are going to cover all the components you will need in your architecture, the reasons to choose Fargate over the alternatives, and some CDK tricks to get started quickly (and the pitfalls that might come with them).

In this episode, we mentioned the following resources:
- CDK ECS Patterns: https://docs.aws.amazon.com/cdk/api/v2/docs/aws-cdk-lib.aws_ecs_patterns-readme.html
- How to fine-tune the health checks to speed up the deployment process: https://www.qovery.com/blog/how-to-speed-up-amazon-ecs-container-deployments
- Previous episode “37. How do you migrate a monolith to AWS without the drama?”: https://awsbites.com/37-how-do-you-migrate-a-monolith-to-aws-without-the-drama/

This episode is also available on YouTube: https://www.youtube.com/AWSBites

You can listen to AWS Bites wherever you get your podcasts:
- Apple Podcasts: https://podcasts.apple.com/us/podcast/aws-bites/id1585489017
- Spotify: https://open.spotify.com/show/3Lh7PzqBFV6yt5WsTAmO5q
- Google: https://podcasts.google.com/feed/aHR0cHM6Ly9hbmNob3IuZm0vcy82YTMzMTJhMC9wb2RjYXN0L3Jzcw==
- Breaker: https://www.breaker.audio/aws-bites
- RSS: https://anchor.fm/s/6a3312a0/podcast/rss

Do you have any AWS questions you would like us to address? Leave a comment here or connect with us on Twitter:
- https://twitter.com/eoins
- https://twitter.com/loige

#aws #docker #fargate
Full Description / Show Notes
- Gafnit explains how she found a vulnerability in RDS, an Amazon database service (1:40)
- Gafnit and Corey discuss the concept of not being able to win in cloud security (7:20)
- Gafnit talks about transparency around security breaches (11:02)
- Corey and Gafnit discuss effectively communicating with customers about security (13:00)
- Gafnit answers the question “Did you come at the RDS vulnerability exploration from a perspective of being deeper on the Postgres side or deeper on the AWS side?” (18:10)
- Corey and Gafnit talk about the risk of taking a pre-existing open-source solution and offering it as a managed service (19:07)
- Security measures in cloud-native approaches versus cloud-hosted (22:41)
- Gafnit and Corey discuss the security community (25:04)

About Gafnit
Gafnit Amiga is the Director of Security Research at Lightspin. Gafnit has 7 years of experience in application security and cloud security research. Gafnit leads the Security Research Group at Lightspin, focused on developing new methods to conduct research on new cloud-native services and Kubernetes. Previously, Gafnit was a lead product security engineer at Salesforce focused on their core platform, and a security researcher at GE Digital. Gafnit holds a B.Sc. in Computer Science from IDC Herzliya and is studying for an M.Sc. in Data Science.

Links Referenced:
- Lightspin: https://www.lightspin.io/
- Twitter: https://twitter.com/gafnitav
- LinkedIn: https://www.linkedin.com/in/gafnit-amiga-b1357b125/

Transcript

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.

Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn.
We've taken a bit of a security bent in the conversations that we've been having on this show over the past year or so and, well, today's episode is no different. In fact, we're going a little bit deeper than we normally tend to. My guest today is Gafnit Amiga, who's the Director of Security Research at Lightspin. Gafnit, thank you for joining me.

Gafnit: Hey, Corey. Thank you for inviting me to the show.

Corey: You sort of burst onto the scene—and by ‘scene,' I of course mean the cloud space, at least to the level of community awareness—back, I want to say, in April of 2022 when you posted a very in-depth blog post about exploiting RDS and some misconfigurations on AWS's side to effectively display internal service credentials for the RDS service itself. Now, that sounds like one of those incredibly deep, incredibly murky things because it is, let's be clear. At a high level, can you explain to me exactly what it is that you found and how you did it?

Gafnit: Yes, so, RDS is Amazon's database service. It's a managed service where you can choose the engine that you prefer. One of them is Postgres. That's where I found the vulnerability. The vulnerability was in the log_fdw extension (FDW stands for Foreign Data Wrapper), which reads the engine's logs directly so that you can query them using SQL queries, which should be simpler and easier to use.

And this extension enables you to provide a path. There was a path traversal, but the traversal happened only when you dropped the validation of the wrapper. And this is how I managed to read local files from the database's underlying EC2 machine, which shouldn't happen, because this is a managed service and you shouldn't have any access to the underlying host.

Corey: It's always odd when the abstraction starts leaking, from an AWS perspective.
I know that a friend of mine was on Aurora during the beta and was doing some high-performance work and suddenly started seeing SQL errors about /var/temp filling up, which, for those who are not well versed in SQL, and even for those who are, is not the sort of thing you tend to expect to show up there. It feels like the underlying system tends to leak in, particularly in the RDS sense, to what is otherwise, at least in our imagination, a fully managed service.

Gafnit: Yes, because sometimes they want to give you an informative error so you will be able to realize what happened and what caused the error, and sometimes they prefer not to give you too much information because they don't want you to get to the underlying machine. This is why, for example, you don't get a regular superuser; you have an RDS superuser in the database.

Corey: It seems to me that this is sort of a problem of layering different security models on top of each other. If you take a cloud-native database that they designed, start to finish, themselves, like DynamoDB, the entire security model for Dynamo, as best I can determine, is wrapped up within IAM. So, if you know IAM—spoiler, nobody knows IAM completely, it seems—but if you have that on lock, you've got it; there's nothing else you need to think about. Whereas with RDS, you have to layer on IAM to get access to the database and what you're allowed to do with it.

But then there's an entirely separate user management system, in many respects, of local users for Postgres or MySQL or whatever other system you're using, to the point where even when they started supporting IAM for authentication to RDS at the database user level, it was flagged in the documentation with a bunch of warnings of, “Don't do this for high-volume stuff; only do this in development-style environments.” So, it's clear that it has been a difficult marriage, for lack of a better term.
And then you have to layer on all the other stuff if, God forbid, you're in a multi-cloud style environment or working with Kubernetes on top of all of this, and it seems like you're having to pick and choose between four or five different levels of security modeling, as well as understand how all of those things interplay. How come we don't see things like this happening four times a day as a result?

Gafnit: Well, I guess that more issues are being found, but they're not always published. But I think that this is what makes it more complex for both sides. Creating managed services out of resources and third-party software that everybody knows, and making them easy to use, requires a deep understanding of the existing permission model of the service you want to integrate with your own permission model, and of how the combination works. So, you actually need to understand how every change is going to affect the restrictions that you want to have. For example, if you don't want the database users to be able to read and write files or perform network activity, you really need to understand the permission model of Postgres itself. So, it makes development more complicated, but it's also good for researchers, because they already know Postgres and they have a good starting point.

Corey: My philosophy has always been that when you're trying to secure something, you need to have at least a topical level of understanding of the entire system, start to finish. One of the problems I've had with the idea of microservices as it's frequently envisioned is that there's separation, but not real separation, so you have to hand-wave over a whole bunch of the security model. If you don't understand something, I believe it's very difficult to secure it. And let's be honest, even if you do understand [laugh] something, it can be very difficult to secure it.
And the cloud vendors, with IAM and similar systems, don't seem to be doing themselves any favors, given the sheer complexity and the capabilities that they're demanding of themselves, even for having one AWS service talk to another one in the right way.

And it's finicky, and it's nuanced, and debugging it becomes a colossal pain. And finally, those of us who are bad at these things say, “The hell with it,” and just grant full access from Service A to Service B, in the confines of a test environment. I'm not quite that nuts myself, most days. And then the biggest lie we always tell ourselves, once we have something overscoped like that, usually for CI/CD, is, “Oh, todo: I'll go back and fix that later.” Yeah, I'm looking back at one from five years ago and that's still on my todo list.

For some reason, it's never been the number one priority. And in all likelihood, it won't be until right after it really should have been my number one priority. It feels like in cloud security particularly, you can't win, you can only not lose. I always found that to be something of a depressing perspective and I didn't accept it for the longest time. But increasingly, these days, it's started to feel like that is the state of the world. Am I wrong on that? Am I just being too dour?

Gafnit: What do you mean by you cannot lose?

Corey: There's no winning in security from my perspective, because no one is going to say, “All right. We won the security. Problem solved. The end.” Companies don't view security as a value-add. It is only a downside risk mitigation play.

It's, “Yay, another day of not getting breached.” And the failure mode from there is, “Okay, well, we got breached, but we found out about it ourselves immediately, internally, rather than reading about it in The New York Times in two weeks.” The winning is just the steady state, the status quo.
It's just all different flavors of losing beyond that.

Gafnit: So, I don't think it's quite the case, because I can tell that they always do active work on securing the services and their structure. I went over other extensions before reaching the log foreign data wrapper, and they had actually excluded high-risk functionalities that could have helped me achieve privileged access to the underlying host. And they do it with other services as well, because they always do a security review before integrating something externally. But you know, it's endless. You can always have something. Security vulnerabilities are always [arising 00:09:06]. So whenever anyone can help by searching and contributing their value, it's appreciated.

Corey: I feel like I need to clarify a bit of nuance. When your blog post first came out talking about this, I was, well, let's say a little irritated toward AWS on Twitter and other places. And Twitter is not a place for nuance; it is easy to look at that and think I was upset at AWS for having a vulnerability. I am not, I want to be very clear on that. Now, it's certainly not good, but these are computers; that is the nature of how they work.

If you want to completely secure a computer, cut the power to it, sink it in concrete, and then drop it in the ocean. And even then, there are exceptions to all of that. So, it's never a question of blocking all risk; it's about trade-offs and what risk is acceptable. And to AWS's credit, they do say that they practice defense-in-depth. Being able to access the credentials for the running RDS service on the instance it was running on, while certainly not good, isn't as if you suddenly had keys to everything inside of AWS and their entire security model crumbled away before you.

They do the right thing, and the people working on these things are incredibly good. And they work very hard at these things.
My concern and my complaint is that, as much as I enjoy the work that you do and reading these blog posts about how you did it, it bothers me that I have to learn about a vulnerability in a service for which I pay not-small amounts of money—RDS is the single largest charge on my AWS bill every month—from a third party rather than from the vendor itself. In this case, it was a full day after your blog post went up that they finally had a small security disclosure on AWS's site talking about it. And that pattern feels to me like it leads nowhere good.

Gafnit: So, transparency is a key word here. When I wrote the post, I asked if they wanted to add anything from their side, and they told me that they had already reached out to the vulnerable customers and helped them migrate to the fixed version. So, from their side, it didn't feel necessary to add it there. But I did mention the fact that I did the investigation and no customer data was harmed. Still, I think that if there were a more organized process for submitting any vulnerability, where all the steps are aligned, it would help everyone, and anyone could be informed about everything that happens.

Corey: I have always been extraordinarily impressed by people who work at AWS and handle a lot of the triaging of vulnerability reports. Zack Glick, before he left, was doing an awful lot of that. Dan [Erson 00:12:05] continues to be one of the bright lights of AWS, from my perspective, just as far as customer communication and understanding exactly what the customer perspective is. And as individuals, I see nothing but stars over at AWS.
To be clear, ‘Nothing but Stars' is also the name of most of my IAM policies, but that's neither here nor there.

It seems like, on some level, there's a communications and policy misalignment, because I look at this, and in every conversation I ever have with AWS's security folks, they are eminently reasonable, they're incredibly intelligent, and they care. There's no mistaking that they legitimately care. But somewhere, at the scale of the company they're at, incentives get crossed, everyone has a different position they're looking at these things from, and it feels like that disjointedness leads to a misalignment in how to effectively communicate things like this to customers.

Gafnit: Yes, it looks like this is the case, but if more things are discovered and published, I think that they will eventually have an organized process for that. Because I guess researchers do find things over there, but they're not always published, for several reasons. But yes, they should work on that. [laugh].

Corey: And that is part of the challenge as well, where AWS does not have a public vulnerability disclosure program. [unintelligible 00:13:30] HackerOne; they don't have a public bug bounty program. They have a vulnerability disclosure email address, and the people working behind that are some of the hardest-working folks in tech, but there is no unified way of building a community of researchers around the idea of exploring this. And that is a challenge, because you have reported vulnerabilities, I have reported significantly fewer vulnerabilities, but it always feels like a hurry-up-and-wait scenario where the communication is not always immediate and clear.
And at best, it feels like we often get a begrudging, “Thank you.”

Versus, all right, if we just throw ethics completely out the window and decide instead that we're going to focus on just selling it to the highest bidder, the value of, for example, a hypervisor escape on EC2 is incalculable. There is no amount of money that a bug bounty program could offer for something like that compared to what it is worth to the right bad actor at the right time. So, with the vulnerabilities that we hear about, we're already starting from a basis of people who have a functioning sense of ethics, people who are not deeply compromised trying to do something truly nefarious. What worries me is the story of—what are the stories that we aren't seeing? What are the things that are being found where, instead of fighting against the bureaucracy around disclosure and the rest, people just use them for their own ends? And I'm gratified by the level of response I see from AWS on the things that they do find out about, but I always have to wonder, what aren't we seeing?

Gafnit: That's a good question. And it really depends on their side, whether they choose to expose it or not.

Corey: Part of the challenge, too, is the messaging and the communication around it and who gets credit and the rest. And it's weird: whenever they release some additional feature to one of their big headline services, there are blog posts, there are keynote speeches, there are customer references, they go on speaking tours, and the emails, oh, God, the emails never stop talking about how amazing all of these things are. But whenever there's a security vulnerability or a disclosure like this—and to be fair, AWS's response to this speaks very well of them—it's like you have to go sneak down into the dark sub-basement, with the filing cabinet behind the leopard sign and the rest, to even find out that these things exist.
And I feel like they're not doing themselves any favors by developing that reputation for a lack of transparency around these things: “Well, there was no customer impact, so why would we talk about it?”

Because otherwise, you're setting up a myth that there never is a vulnerability on the side of—whatever it is that you're building as a cloud provider. And when there is a problem down the road—because there always is going to be; nothing is perfect—people are going to say, “Hey, wait a minute. You didn't talk about this. What else haven't you talked about?”

And it rebounds on them with sometimes really unfortunate side effects. With Azure as a counterexample here, we see a number of Azure exploits where, “Yeah, it turned out that we had access to other customers' data and Azure had no idea until we told them.” And Azure makes statements like, “Oh, we have no evidence of any of this stuff being used improperly.” Okay, that can mean that you've either checked your logs and things are great, or you don't have logging. I don't know that that's necessarily something I trust.

Conversely, AWS has said in the past, “We have looked at the audit logs for this service dating back to its launch years ago, and have validated that none of this has ever been used like this.” One of those responses breeds an awful lot of customer trust. The other one doesn't. And I just wish AWS understood a little bit better how good crisis communication around vulnerabilities can improve customer trust rather than erode it.

Gafnit: Yes, and I think that, as you said, there will always be vulnerabilities. And I think that we are expecting to find more, so communicating as clearly as you can and exposing details about the fixes and how the investigation was done, even at a high level, for all vulnerabilities, can earn more trust from customers.

Corey: DoorDash had a problem.
As their cloud-native environment scaled and developers delivered new features, their monitoring system kept breaking down. In an organization where data is used to make better decisions about technology and about the business, losing observability means the entire company loses its competitive edge. With Chronosphere, DoorDash is no longer losing visibility into their applications suite. The key? Chronosphere is an open-source-compatible, scalable, and reliable observability solution that gives the observability lead at DoorDash business confidence and peace of mind. Read the full success story at snark.cloud/chronosphere. That's snark.cloud slash C-H-R-O-N-O-S-P-H-E-R-E.

Corey: You have experience in your background specifically around application security and cloud security research. You've been doing this for seven years at this point. When you started looking into this, did you come at the RDS vulnerability exploration from a perspective of being deeper on the Postgres side or deeper on the AWS side of things?

Gafnit: So, it was both. I actually came to the RDS lead from another service where there was something [about 00:18:21] at the application level. But then I reached RDS and thought, well, it would be really nice to find something here and to reach the underlying machine. And when I entered the RDS zone, I started to look at it through application security eyes, but you have to know the cloud as well, because there are integrations with S3 and you need to understand the IAM model. So, you need a mix of both to exploit this specific kind of issue. But you could also come at it as a database expert, because the payload is pure SQL.

Corey: It always seems to me that there's an inherent risk in trying to take something that is a pre-existing open-source solution—Postgres is one example, but there are many more—and offer it as a managed service.
Because I think one of the big misunderstandings is that when—well, AWS is just going to take something like Redis and offer that as a managed service. Okay, I accept that they will offer a thing that respects the endpoints and acts as if it were Redis, but under the hood, there is so much in all of these open-source projects that is built for optionality: wherever you want to run this thing, it will run there; whatever type of workload you want to throw at it, it can work. Whereas when you have a cloud provider converting these things into a managed service, they are going to strip out an awful lot of those things. An easy example might be, okay, there's this thing that has to account for the way the hard drives on a computer work from a storage perspective.

Well, all the big cloud providers already have interesting ways that they have solved storage. Every team does not reimplement that particular wheel; they use in-house services. Chubby's file locking over on the Google side is a classic example of this that they've talked about an awful lot, so every team building something doesn't have to rediscover all of that. So, the idea that, oh, we're just going to take this open-source thing, clone it off GitHub, fork it, and then just throw it into production as a managed service seems more than a little naive. As you get more [laugh] into the weeds of these things than most customers are allowed to get, what's your take on this?

Do you find that this looks an awful lot like the open-source version that we all use? Or is it something that looks like it has been heavily customized to take advantage of what AWS is offering internally as underlying bedrock services?

Gafnit: So, from what I've seen until now, they do want to preserve the functionality, so you will have the same experience as working with the same service outside of AWS, because you are used to that.
So, they are not making dramatic changes, but they do want to reduce the risk in the security space. So, there will be some functionality that they won't let you use. This is because of the managed part: in areas where the full workload is deployed in your account and you can access it anyway, they won't have the same security restrictions, because you can access the workload anyway. But when it's managed, they need to prevent you from accessing the underlying host, for example. And they do make changes, but they're really targeted at the specific actions that can lead you to that.

Corey: It also feels like RDS is something of a, I don't want to call it a legacy service, because it is clearly still very much actively developed, but what we'll call a ‘classic service.' When I look at a new AWS launch, I tend to mentally bucket it into one of two things. There's the cloud-native approach, and we've already talked about DynamoDB; that would be one example of this. And there's the cloud-hosted model, where you have to worry about things like instances and security groups and the networking stuff, and so on and so forth, where it basically feels like they're running their thing on top of a pile of EC2 instances, and that abstraction starts leaking.

Part of me wonders if, looking at some of these older services like RDS, they made decisions in the design and buildout of these things that they might not make if they were to go ahead and build them out today. I mean, Aurora is an example of what that might look like. Have you found, as you start looking around the various security foibles of different cloud services, that the security posture of some of the more cloud-native approaches is better or worse or the same as the cloud-hosted world?

Gafnit: Well, so for example, in the several issues that were found, and also here in RDS, where you can see credentials in a file, this is not a best practice in the security space.
And so, definitely there are things to improve, even if it's developed on the provider side. But it's really hard to answer this question, because in a managed area where you don't have any access, it's hard to tell how it's configured and whether it's configured properly. So, you need to have some certification from their side.

Corey: This is, on some level, part of the great security challenge, especially for something that is not itself open-source, where they obviously have terrific security teams, don't get me wrong. At no point do I ever want to come across as saying, “Oh, those AWS people don't know how security works.” That is provably untrue. But there is something to be said for the value of having a strong community in the security space focusing on this from the outside looking in, and even helping other people contextualize these things. And I'm a little disheartened that none of the major cloud providers seem to have really embraced the idea of a cloud security community, to the point where the one that I'm most familiar with, the cloud security forum Slack team, seems to be my default place to go for context on things.

Because I dabble. I keep my hand in when it comes to security, but I'm certainly no expert. That's what people like you are for. I make fun of clouds and I work on the billing parts of it, and that's about as far as it goes for me. But being able to get context around things matters: is this a big deal? Is the description that a company is giving accurate?

For example, when your post came out, I had not heard of Lightspin in this context. So, I reached out to a few people I trusted: is this legitimate? The answer was, “Yes. It's legitimate and it's brilliant. That's a company to keep your eye on.” Great. That's useful context and there's no way to buy that. It has to come from having those conversations with people in the [broader 00:24:57] sense of the community.
What's your experience been looking at the community side of the world of security?
Gafnit: Well, I think that cloud security has a great community, and this is one of the things that we at Lightspin really want to grow and push forward. We see ourselves as a security-driven company. We always do our best to publish posts, even detailed posts, not only about vulnerabilities but about how things work in the cloud and how things are evaluated, and to release open-source tools that you can use to check your environment even if you're not a customer. And I think that the community is always willing to explain and to investigate together. It's a welcome effort, but I think that the messaging should also be for all layers, you know, also for the DevOps and the developers, because it can really help if it starts from their side as well.
Corey: It needs to be baked in, from start to finish.
Gafnit: Yeah, exactly.
Corey: I really want to thank you for taking the time out of your day to speak with me today. If people want to learn more about what you're up to, where's the best place for them to find you?
Gafnit: You can find me on Twitter and on LinkedIn, and feel free to reach out.
Corey: We will, of course, put links to that in the show notes. Thank you so much for being so generous with your time today. I appreciate it.
Gafnit: Thank you, Corey.
Corey: Gafnit Amiga, Director of Security Research at Lightspin. I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, and if it's on the YouTubes, smash the like and subscribe buttons, which I'm told are there. Whereas if you've hated this podcast, same story, smash the like and subscribe buttons and leave a five-star review on your podcast platform of choice, but also leave an insulting, angry comment about how my observation that our IAM policies are all full of stars is inaccurate.
And then I will go ahead and delete that comment later because you didn't set a strong password.
Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.
Announcer: This has been a HumblePod production. Stay humble.
This week, we talk about mainframes and modernizing their applications in the cloud. We also briefly cover a new EC2 instance, a SageMaker feature that makes life easier for data scientists, and post-migration automation after migrating VMs.
In this segment, we bring you information about day-to-day work in a cloud environment from our point of view. Speakers in this episode: Ariel Munafo and Avi Keinan. In the previous episode, we talked about what AWS Organizations is, what the change involves, how it helps us manage multiple accounts, what features it includes, what its advantages are, why the service matters to integrators, and what to watch out for. In this episode, we talk about Amazon Lightsail: what the service is, who it is for, what its uses are, and how it differs from EC2. Avi will also do a live demo of creating a new server. Want to stay up to date and watch the highest-quality, most professional content? Register now for Israel's largest cloud computing conference! To register > https://www.israelcloudsummit.com/
Debbie Barnum and Rachael Gonzalez, Co-Founders of EC2, discuss the organization's mission, its 25 community partnerships, the monthly Cheers and Construction Careers event, and ConstructForce, an online searchable resource database in the AEC world.
In this new episode on containers on AWS, we talk with Omar Onofre about Amazon ECS, the orchestration solution developed by AWS. We review the service's features and capabilities (the Task and Service concepts, hosting options on EC2 and Fargate, scaling, Capacity Providers), and we also discuss its integrations with other services on the platform. Don't miss it! Additional material: https://aws.amazon.com/es/ecs/getting-started/
On The Cloud Pod this week, the team struggles with scheduling to get everyone in the same room for just one week. Plus, Microsoft increases pay for talent retention while changing licensing for European cloud providers, Google Cloud introduces AlloyDB for PostgreSQL, and AWS announces EC2 support for NitroTPM. A big thanks to this week's sponsor, Foghorn Consulting, which provides full-stack cloud solutions with a focus on strategy, planning and execution for enterprises seeking to take advantage of the transformative capabilities of AWS, Google Cloud and Azure.
This week's highlights
When it comes to choosing compute services on AWS, there are a lot of options, including EC2, ECS, Lambda, EKS… New ones keep emerging all the time! Selecting the right one for each application is no longer an easy choice. In this episode we discuss why you need compute services and what kinds of problems should be offloaded to something else entirely. We suggest how you can develop a methodology to make the selection process easier and less biased within your company. We discuss, at a high level, some of the different compute options available in AWS, and finally we walk through a few example use cases and describe how we picked the compute service for each.
In this episode, we mentioned the following resources:
- InfoQ article “A Recipe to Migrate and Scale Monoliths in the Cloud”: https://www.infoq.com/articles/cloud-migrate-scale/
- Our previous episode about migrating monoliths to the cloud: https://www.youtube.com/watch?v=GYa2RkYDfBQ
- Article on choosing the right compute service: https://www.fourtheorem.com/blog/aws-compute
This episode is also available on YouTube: https://www.youtube.com/AWSBites
You can listen to AWS Bites wherever you get your podcasts:
- Apple Podcasts: https://podcasts.apple.com/us/podcast/aws-bites/id1585489017
- Spotify: https://open.spotify.com/show/3Lh7PzqBFV6yt5WsTAmO5q
- Google: https://podcasts.google.com/feed/aHR0cHM6Ly9hbmNob3IuZm0vcy82YTMzMTJhMC9wb2RjYXN0L3Jzcw==
- Breaker: https://www.breaker.audio/aws-bites
- RSS: https://anchor.fm/s/6a3312a0/podcast/rss
Do you have any AWS questions you would like us to address? Leave a comment here or connect with us on Twitter:
- https://twitter.com/eoins
- https://twitter.com/loige
#aws #compute #lambda
About James
James has been part of AWS for over 15 years. During that time he's led software engineering for Amazon EC2 and now leads the AWS Commerce Platform group that runs some of the largest systems in the world, handling volumes of data and request rates that would make your eyes water. And AWS customers trust us to be right all the time so there's no room for error.
Links Referenced:
Email: firstname.lastname@example.org
Transcript
Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.
Corey: This episode is sponsored in part by our friends at Vultr. Optimized cloud compute plans have landed at Vultr to deliver lightning-fast processing power, courtesy of third-gen AMD EPYC processors without the IO or hardware limitations of a traditional multi-tenant cloud server. Starting at just 28 bucks a month, users can deploy general-purpose, CPU, memory, or storage optimized cloud instances in more than 20 locations across five continents. Without looking, I know that once again, Antarctica has gotten the short end of the stick. Launch your Vultr optimized compute instance in 60 seconds or less on your choice of included operating systems, or bring your own. It's time to ditch convoluted and unpredictable giant tech company billing practices and say goodbye to noisy neighbors and egregious egress forever. Vultr delivers the power of the cloud with none of the bloat. “Screaming in the Cloud” listeners can try Vultr for free today with $150 in credit when they visit getvultr.com/screaming. That's G-E-T-V-U-L-T-R dot com slash screaming.
My thanks to them for sponsoring this ridiculous podcast.
Corey: Finding skilled DevOps engineers is a pain in the neck! And if you need to deploy a secure and compliant application to AWS, forgettaboutit! But that's where DuploCloud can help. Their comprehensive no-code/low-code software platform guarantees a secure and compliant infrastructure in as little as two weeks, while automating the full DevSecOps lifecycle. Get started with DevOps-as-a-Service from DuploCloud so that your cloud configurations are done right the first time. Tell them I sent you and your first two months are free. To learn more visit: snark.cloud/duplo. That's snark.cloud/D-U-P-L-O-C-L-O-U-D.
Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. And I've been angling to get someone from a particular department at AWS on this show for nearly its entire run. If you were to find yourself in an Amazon building and wander through the various dungeons and boiler rooms and subterranean basements—I presume; I haven't seen the inside of nearly as many of those buildings as people might think—you'd pass interesting departments labeled things like ‘Spline Reticulation,’ or whatnot. And then you come to a very particular group called Commerce Platform.
Now, I'm not generally one to tell other people's stories for them. My guest today is James Greenfield, the VP of Commerce Platform at AWS. James, thank you for joining me and suffering the slings and arrows I will no doubt be hurling at you.
James: Thanks for having me. I'm looking forward to it.
Corey: So, let's start at the very beginning—because I guarantee you, you're going to do a better job of giving the chapter and verse answer than I would from a background mired deeply in snark—what is Commerce Platform? It sounds almost like it's the retail website that sells socks, books, and underpants.
James: So, Commerce Platform actually spans a bunch of different things.
And so, I'm going to try not to bore you with a laundry list of all of the things that we do—it's a much longer list than most people assume even internal to AWS—but at its core, Commerce Platform owns all of the infrastructure and processes and software that takes the fact that you've been running an EC2 instance, or you're storing an object in S3 for some period of time, and turns it into a number at the end of the month that is what you owe us for that service, and then proceeds to try to give you as many ways to pay us as easily as possible. There are a few other bits in there that are maybe less obvious. One is we're also responsible for protecting the platform and our customers from fraudulent activity. And then we're also responsible for helping collect all of the data that we need for internal reporting to support some of the back-end services that a business needs to do things like revenue recognition and general financial reporting.
Corey: One of the interesting aspects about the billing system is just how deeply it permeates everything that happens within AWS. I frequently say that when it comes to cloud, cost and architecture are foundationally and fundamentally the same exact thing. If your entire service goes down, a few interesting things happen. One, I don't believe a single customer is going to complain other than maybe a few accountants here and there because the books aren't reconciling, but also you've removed a whole bunch of constraints around why things are the way that they are. Like, what is the most efficient way to run this workload?
Well, if all the computers suddenly become free, I don't really care about efficiency so much as, “Oh, hey. There's a fly, what do I have as a flyswatter? That's right, I'm going to drop a building on it.” And those constraints breed almost everything.
I've said, for example, that S3 has infinite storage because it does.
They can add drives faster than we're able to fill them—at least historically; they added some more replication services—but they're going to be able to buy hard drives faster than the rest of us are going to be able to stretch our budgets. If that constraint of the budget falls away, all bets are really off, and more or less, we're talking about the destruction of the cloud as a viable business entity. No pressure or anything.
James: [laugh].
Corey: You're also a recent transplant into AWS billing as a whole, Commerce Platform in general. You spent 15 years at the company, the vast majority of that over in EC2. So, either you've been exiled to a basically digital Siberia or it was one of those, “Okay, keeping all the EC2 servers up, this is easy. I don't see what people stress about.” And they say, “Oh, ho ho, try this instead.” How did you find yourself migrating over to the Commerce Platform?
James: That's actually a question I've had a lot from folks that I've worked with. You're right, I spent the first 15 or so years of my career at AWS in EC2, responsible for various things over there. And when the leadership role in Commerce Platform opened up, the timing was fortuitous; part of it was that I was in the process of relocating my family. We moved to Vancouver in the middle of last year. And we had an opening in the role and started talking about, potentially, me stepping into that role.
The reason that I took it—there are a few reasons, but the primary reason is that if I look back over my career, I've kind of naturally gravitated towards owning things where people only really remember that they exist when they're not working. And for some reason, you know, I enjoy the opportunity to try to keep those kinds of services ticking over to the point where people don't notice them. And so, Commerce Platform lands squarely in that space.
I've always been attracted to opportunities to have an impact, and it's hard to imagine having much more of an impact than in the Commerce Platform space. It underpins everything, as you said earlier.
Every single one of our customers depends on the service, whether they think about it or realize it. Every single service that we offer to customers depends on us. And so, that really is the sort of nexus within AWS. And I'm a platform guy, I've always been a platform guy. I like the force multiplier nature of platforms, and so Commerce Platform, you know, as I kind of thought through all of those elements, really was a great opportunity to step in.
And I think there's something to be said for the fact that I've been a customer of Commerce Platform internally for a long time. And so, a chance to cross over and be on the other side of that was something that I didn't want to pass up. And so, you know, I'm digging in, learning quickly, ramping up. By no means an expert, very dependent on a very smart, talented, committed group of people within the team. That's kind of the long and short of how and why.
Corey: Let's say that I am taking on the role of an AWS product team, for the sake of argument. I know, keep the cringe down for a second; the wince is just inevitable when the idea of me working there ever comes up to anyone. But I have an idea for a service—obviously, it runs containers, and maybe it does some other things as well—going from idea to six-pager to MVP to barely-better-than-MVP day-one launch, and at some point, various things happen to that service. It gets staffed with a team, objectives and a roadmap get built, a P&L and budget, and a pricing model and the rest. One of the last things that happens, apparently, is someone picks the worst name off of a list of candidates, slaps it on the product, and ships it.
At what point does the billing system and figuring out the pricing dimensions for a given service tend to factor in?
Is that a last-minute story? Is that almost from the beginning? Where along that journey does “Oh, by the way, we're building this thing. Maybe we should figure out, I don't know, how to make money from it” factor into the conversation?
James: There are two parts to that answer. Pretty early on, as we're trying to define what that service is going to look like, we're already typically thinking about what dimensions we might charge along. The actual pricing discussions typically happen fairly late, but identifying those dimensions and, sort of, the right way to present it to customers happens pretty early on. The thing that doesn't happen early enough is actually pulling the Commerce Platform team in, but it is something that we're going to work on this year to try to get a little bit more in front of.
Corey: Have you found historically that you have a pretty good idea of how a service is going to be priced, everything is mostly thought through, a service goes to either private preview or you're discussing a launch, and then more or less, I don't know, someone like me crops up with a, “Hey, yeah, let's disregard 90% of what the service does because I see a way to misuse the remaining 10% of it as a database.” And you run some mental math and realize, “Huh. We're suddenly giving, like, eight petabytes of storage per customer away for free. Maybe we should guard against that because otherwise, it's rife with misuse.” It used to be that I could find interesting ways to sneak through the cracks of various services—usually in pursuit of a laugh—but those are getting relatively hard to come by and invariably a lot more trouble than they're worth. Is that just better comprehensive diligence internally, is that learning from customers, or am I just bad at this?
James: No, I mean, what you're describing is almost a variant of the Defender's Dilemma. There are way more ways to abuse something than you can imagine, and so defending against that is pretty challenging.
And it's important because, you know, if you turn the economics of something upside down, then it just becomes harder for us to offer it to customers who want to use it legitimately. I would say 90% of that improvement is us learning. We make plenty of mistakes, but I think, you know, one of the things that I've always been impressed by over my time here is how intentional we are about trying to learn from those mistakes.
And so, I think that's what you're seeing there. And then we try very hard to listen to customers, talk to folks like you, because one of the best ways to tackle anything that smells of the Defender's Dilemma is to harness the collective creativity of a large number of smart people, because you really are trying to cover as much ground as possible.
Corey: There was a fun joke going around a while back of what is the most expensive environment you can get running on a free tier account before someone from AWS steps in, and I think I got it to something like half a billion dollars in the first month. Now, I haven't actually tested this, for reasons that mostly have to do with being relatively poor compared to, you know, being able to buy Guam. And I understand as well that the fraud protections built into something like AWS are largely built around defending against getting service usage for free that in some way, shape or form benefits the attacker. The easy example of that would be mining cryptocurrency, which is super-economical as long as you use someone else's AWS account to do it. Whereas a lot of my vectors are, “Yeah, ignore all of that. How do I just make the bill artificially high? What can I do to misuse data transfer? And passing a single gigabyte through, how high can I make that per-gigabyte cost be?” And, “Oh, circular replication and the Lambda-invokes-itself pattern,” and basically every bad architectural decision you can possibly make, only this time it's intentional.
And that shines some really interesting light on it.
And I have to give credit where due: a lot of that didn't come from just me sitting here being sick and twisted nearly so much as it did from having seen examples of that type of misconfiguration—by mistake—in a variety of customer accounts, most confidently my own, because it turns out that the way I learn things is by screwing them up first.
James: Yeah, you've touched on a couple of different things in there. So, you know, maybe the first one is, I typically try to draw a line between fraud and abuse. Fraud is essentially trying to spend somebody else's money to get something for free. And we spend a lot of time trying to shut that down, and we're getting really good at catching it. And then abuse is either intentional or unintentional. There's intentional abuse: you find a chink in our armor and you try to take advantage of it.
But much more common is unintentional abuse. It's not really abuse, you know. Abuse has very negative connotations, but it's unintentionally setting something up so that you run up a much larger bill than you intended. And we have a number of different internal efforts, and we're working on a bunch more this year, to try to catch those early on, because one of my personal goals is to minimize the frequency with which we surprise customers. And the least favorite kind of surprise for customers is a [laugh] large bill. And so, what you're talking about there is, in a sufficiently complex system, there's always going to be weaknesses and ways to get yourself tied up in knots.
We're trying, both at the service team level and within my teams, to find ways to make it as hard as possible to accidentally do that to yourself, and then to catch when you do so that we can stop it. And even more on the intentional abuse side of things, if somebody's found a way to do something that's problematic for our services, then you know, that's pretty much on us.
But we will often reach out and engage with whoever's doing it and try to understand what they're trying to do and why. Because often, somebody's trying to do something legitimate, they've got a problem to solve, they've found a creative way to solve it, and it may put strain on the service because it's just not something we designed for, and so we'll try to work with them to use that to feed into new services, or find a better place for that workload, or just bolster what they're using. And maybe that's something that eventually becomes a fully-fledged feature that we offer to customers. We're always open to learning from our customers. They have found far more creative ways to get really cool things done with our services than we've ever imagined. And that's true today.
Corey: I mean, most of my service criticisms come down to the fact that you have more-or-less built a very late-model, high-performing iPad, and I'm out there complaining about, “What a shitty hammer this thing is, it barely works at all, and then it breaks in my hand. What gives?” I would also challenge something you said a minute ago, that the worst day for some customers is to get a giant surprise bill, but [unintelligible] to that is, yeah, but, on some level, that's kind of only money; you do have levers on your side to fix those issues. A worse scenario is you have a customer that exhibits fraud-like behavior, they're suddenly using far more resources than they ever did before, so let's go ahead and turn them off or throttle them significantly, and you call them up to tell them you saved them some money, and, “Our Super Bowl ad ran. What exactly do you think you're doing?” Because they don't get a second bite at that kind of apple.
So, there's a parallel on both sides of this. And those are just two examples. The world is full of nuances, and at the scale that you folks operate at, the one-in-a-million events happen multiple times a second, the corner cases become common cases, and I'm surprised—to be direct—by how little I see you folks dropping the ball.
James: Credit to all of the teams. I think our secret sauce, if anything, really does come down to our people. Like, a huge amount of what you see as, hopefully, relatively consistent, good execution comes down to people behind the scenes making sure. You know, some of it is software that we built and made sure is robust and tested to scale, but there's always an element of people behind the scenes, when you hit those edge cases or something doesn't quite go the way that you planned, making sure that things run smoothly. And that, if anything, is something that I'm immensely proud of and is kind of amazing to watch from the inside.
Corey: And, on some level, the small errors are a bigger concern than the big ones. A couple years ago, when they announced gp3 volumes at re:Invent, well, great, we'll spin up a test volume and kick the tires on it for an hour. And I think it was 80 or 100 gigs or whatnot, and the next day in the bill, it showed up as about $5,000. And it was, “Okay, that's not great. Not great at all.” And it turned out that it was a mispricing error by, I think, a factor of a million.
And okay, at least it stood out. But there are scenarios where we were prepared to pay it because, oops, you got one over on us. Good job. That's never been the mindset I've gotten about AWS's philosophy for pricing. The better example that I love, because no one took it seriously, was a few years before that when there was a Lightsail bug in the billing system, and it made the papers because people suddenly found that for their Lightsail instance, they were getting predicted bills of $4 billion.
And the way I see it, you really only had to make that work once and then you've made your numbers for the year, so why not? Someone's going to pay for it, probably.
But those were such out-of-this-world numbers that no one saw them and ever thought it was anything other than a bug. It's the small pernicious things that creep in. Because the billing system is vast; I had no idea when I started working with AWS bills just how complicated it really was.
James: Yeah, I remember both of those, and there's something in there that you touched on that I think is really important. That's something that I realized pretty early on at Amazon, and it's why customer obsession is our flagship leadership principle. It's not because it's love and butterflies and unicorns; customer obsession is key to us because that's how you build a long-term sustainable business: your customers depend on you. And it drives how we think about everything that we do. And in the billing space, small errors, even if they are small errors in the customer's favor, slowly erode that trust.
So, we take any kind of error really seriously and we try to figure out how we can make sure that it doesn't happen again. We don't always get that right. As you said, we've built an enormous, super-complex business that's growing really quickly, and really quick growth like that always acts as kind of a multiplier on top of complexity. And on the pricing side, we're managing millions of pricing points at the moment.
And with the tools that we use internally, there's always room for improvement. It's a huge area of focus for us. We're at the beginning of looking at applying things like formal methods to make sure that we can make very hard guarantees about the correctness of some of those. But at the end of the day, people are plugging numbers in and you need as many belts and braces as possible to make sure that you don't make mistakes there.
Corey: One of the things that took me by surprise when I first started getting deep into this space was the fact that the finalized bill was—what does it mean to have this be ‘finalized’?
It can hit the Cost and Usage Report in an S3 bucket, and it can still periodically change retroactively after the month has closed. And that's when I started to have an inkling of a few things: not just the sheer scale and complexity inherent to something like the billing system that touches everything, but the sheer data retention stories, where you clearly have to be able to go back and reconstruct a bill from the raw data years ago. And I know what the output of all of those things is, in the form of Cost and Usage Reports and the billing data from our client accounts—which is the single largest expense in all of our AWS accounts; we spend thousands and thousands and thousands of dollars a year just on storing all of that data, let alone the processing piece of it—the sheer scale is staggering. I used to wonder why it takes you a day to go from me using something to it showing up in the bill. And the more I learned, the more it became a question of how you can do that in only a day.
James: Yes, the scale is actually mind-boggling. I'm pretty sure that the core of our billing system is—I'm reasonably confident it's the largest or one of the largest data processing systems on the planet. I remember pretty early on, when I joined Commerce Platform and was still starting to wrap my head around some of these things, Googling the definition of quadrillion, because we measure the number of metering events, which is how we record usage in services, on a daily basis in the quadrillions, which is a million billion. So, it's just an absolutely staggering number. And so, the scale here is just out of this world.
That's saying something because it's not like other services across AWS are small in their own right. But I'm still reasonably sure that, being one of a handful of services that is kind of at the nexus of AWS and deals with the aggregate of AWS's scale, this is probably one of the biggest systems on the planet. And that shows up in all sorts of places.
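To make the metering pipeline a little more concrete, here is a minimal Python sketch of how raw usage events might be rolled up into line items and then into a per-account total. Everything in it is an illustrative assumption. The `MeteringEvent` fields, the usage-type names, and the unit prices are hypothetical and are not AWS's actual schema or rates. A system handling quadrillions of events a day would also shard this aggregation across many machines rather than run a single loop.

```python
from collections import defaultdict
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class MeteringEvent:
    """One usage record. Hypothetical schema, for illustration only."""
    account_id: str
    service: str
    usage_type: str
    quantity: Decimal  # units consumed, e.g. GB-hours or instance-hours


# Hypothetical unit prices keyed by (service, usage_type); not real AWS rates.
UNIT_PRICES = {
    ("S3", "storage-gb-hours"): Decimal("0.000032"),
    ("EC2", "instance-hours"): Decimal("0.0416"),
}


def build_bill(events):
    """Aggregate raw events into line items, then roll line items up per account."""
    usage = defaultdict(Decimal)  # (account, service, usage_type) -> total quantity
    for event in events:
        usage[(event.account_id, event.service, event.usage_type)] += event.quantity

    line_items = {}
    totals = defaultdict(Decimal)  # account -> unrounded total cost
    for key, quantity in usage.items():
        account_id, service, usage_type = key
        cost = quantity * UNIT_PRICES[(service, usage_type)]
        line_items[key] = cost
        totals[account_id] += cost
    return line_items, dict(totals)


events = [
    MeteringEvent("111111111111", "EC2", "instance-hours", Decimal("10")),
    MeteringEvent("111111111111", "EC2", "instance-hours", Decimal("10")),
    MeteringEvent("111111111111", "S3", "storage-gb-hours", Decimal("1000")),
]
line_items, totals = build_bill(events)
print(totals["111111111111"])  # 0.864000
```

The sketch keeps every amount as a `Decimal` rather than a binary `float`, so tiny per-event charges do not pick up floating-point error as they are summed.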
You start with that input, just the sheer volume of metering events, but that has to produce as an output pretty fine-grained, detailed line-item information, which ultimately rolls up into the total that a customer will see in their bill. But we have a number of different systems further down the pipeline that try to do things like analyze your usage, make sensible recommendations, look for opportunities to improve your efficiency, and give you the ability to slice and dice your data and allocate it out to different parts of your business in whatever way makes sense for your business. And so, those systems have to deal with anywhere from millions to billions to, recently, we were talking about trillions of data points themselves. And so, I was tangentially aware of some of the scale of this, but being in the thick of it, having joined the team, really just does underscore just how vast the systems are.
Corey: I think it's, on some level, more than a little unfortunate that that story isn't being told more widely and more frequently. Because when Commerce Platform has job postings available on the website, you read them and they're very vague. They don't tend to give hard numbers about a lot of these things, and people who don't play in these waters can easily be forgiven for thinking the way that you folks do your job is you fire up one of those 24-terabyte-of-RAM instances—you know, those monstrous things that you folks offer—and what do you do next? Well, Microsoft Excel. We have a special high-memory version that we've done some horse-trading with our friends over at Microsoft for.
It's, yeah, you're several steps beyond that at this point. It's a challenging problem that every one of your customers has to deal with, on some level, as well. But we're only dealing with the output of a lot of the processing that you folks are doing first.
James: You're exactly right. And a big focus for some of my teams is figuring out how to help customers deal with that output.
Because even if you're talking about a couple of orders of magnitude of reduction, you're still talking about very large numbers there. So, to help customers make sense of that, we have a range of tools that exist and that we're investing in.
There's another dimension of complexity in the space that I think is also very easy to miss. I think of it as arbitrary complexity. And it's arbitrary because some of the rules that we have to work within here are driven by legislative changes. As we operate in more and more countries around the world, we want to make sure that we're tax compliant and that we help our customers be tax compliant. Those rules evolve pretty rapidly, and Country A may sit next to Country B, but that doesn't mean that they're talking to one another. They've all got their own ideas. They're trying to accomplish—
Corey: A company is picking up and relocating from India to Germany. How do we—
James: Exactly.
Corey: —change that on the AWS side and the rest? And it's, “Hoo boy, have you considered burning it all down and filing an insurance claim to start over?” And, like, there's a lot of complexity buried underneath that that just doesn't rise to the notice of 99% of your customers.
James: And the fact that it doesn't rise to their notice is something that we strive for. Like, these shouldn't be things that customers have to worry about. Because it really is about clearing away the things that, as far as possible, you don't want to have to spend time thinking about, so that you can focus on the thing that your business does that differentiates you. It's getting rid of that undifferentiated heavy lifting. And there's a ton of that in this space, and if you're blissfully unaware of it, then hopefully that means that we're doing our job.
Corey: What I'm, I think, the most surprised about, and I have been for a long time.
And please don't take this as an insult to various other folks—engineers, the rest, not just in other parts of AWS but throughout the industry—but talking to the people who work within Commerce Platform has always been just a fantastic experience. The caliber of people that you have managed to attract and largely retain—we don't own people, they do matriculate out eventually—but the caliber of people that you've retained on your teams has just been out of this world. And at first, I wondered, why are these awesome people working on something as boring and prosaic as billing? And then I started learning a little bit more as I went, and, “Oh, wow. How did they learn all the stuff that they have to hold in their head in tension at once to be able to build things like this?” It's incredibly inspiring just watching the caliber of the people that you've been able to bring in.James: I've been really, really excited joining this team, as I've gotten to know folks on the team, because there are some super-smart people here. But what's really jumped out to me is how committed the team is. This is, for the most part, a team that has been in the space for many years. Many of them have—we talk about boomerangs, folks who leave AWS, go spend some time somewhere else and come back, and there's a surprisingly high proportion of folks in Commerce Platform who have spent time somewhere else and then come back because they enjoy the space, they find it challenging, folks are attracted to the ability to have an impact because it is so foundational. But yeah, there's a super-committed core to this team. And I really enjoy working with teams where you've got that because then you really can take the long view and build something great. And I think we have tons of opportunities to do that here.Corey: It sounds ridiculous, but I've reached out to team members before to ask them to explain two-cent variances in my bill, and never once have I been confronted with a, “It's two cents.
What do you care?” They understand the requirement that these things be accurate, not just, “Eh, take our word for it.” And also, frankly, they understand that two cents on a $20 bill looks a little different from two cents on a $20 million bill. So yeah, let us figure out if this is systemic or something I have managed to break.It turns out the Cost and Usage Report processing systems don't love it when there's a cost allocation tag whose name contains an emoji. Who knew? It's the little things in life that just have this fun way of breaking when you least expect it.James: They're also a surprisingly interesting problem. So like, it turns out something as simple as rounding numbers consistently across a distributed system at this scale is a non-trivial problem. And if you don't, then you do get small seventh or eighth decimal place differences that add up to something that then shows up as a two-cent difference somewhere. And so, there are some really, really interesting problems in the space. And I think the team often takes these kinds of things as a personal challenge. It should be correct, and it's not, so we should go make sure it is correct. The interesting problems abound here, but at the end of the day, it's the kind of thing that any engineering team wants to go and make sure it's correct because they know that it can be.Corey: This episode is sponsored in part by our friend EnterpriseDB. EnterpriseDB has been powering enterprise applications with PostgreSQL for 15 years. And now EnterpriseDB has you covered wherever you deploy PostgreSQL: on-premises, private cloud, and they just announced a fully managed service on AWS and Azure called BigAnimal, all one word. Don't leave managing your database to your cloud vendor because they're too busy launching another half dozen managed databases to focus on any one of them that they didn't build themselves. Instead, work with the experts over at EnterpriseDB.
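James's point about rounding can be made concrete with a small, hypothetical example. This is a minimal Python sketch, not AWS's actual billing logic: it assumes two independent systems that each round every line item to the cent, one using round-half-up and the other round-half-even (banker's rounding). The line-item values and the function names `round_each_then_sum` and `sum_then_round` are invented for illustration, with values chosen to sit exactly on half-cent boundaries.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

CENT = Decimal("0.01")

# Hypothetical sub-cent line items, each sitting exactly on a half-cent boundary.
line_items = [Decimal("0.005"), Decimal("0.015"), Decimal("0.025"), Decimal("0.035")]


def round_each_then_sum(items, mode):
    """Round every line item to whole cents first, then add the rounded values."""
    return sum((item.quantize(CENT, rounding=mode) for item in items), Decimal("0"))


def sum_then_round(items, mode):
    """Add the items at full precision and round the total to cents once."""
    return sum(items, Decimal("0")).quantize(CENT, rounding=mode)


# Two systems that round per item, each with a different rounding mode:
print(round_each_then_sum(line_items, ROUND_HALF_UP))    # 0.10
print(round_each_then_sum(line_items, ROUND_HALF_EVEN))  # 0.08

# Rounding once at the end gives the same total under either mode:
print(sum_then_round(line_items, ROUND_HALF_UP))    # 0.08
print(sum_then_round(line_items, ROUND_HALF_EVEN))  # 0.08
```

When each system rounds per item, the total depends on its rounding mode, and here the two disagree by exactly two cents. Carrying full precision and rounding once, with one agreed mode, removes that source of drift in this toy case.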
They can save you time and money, they can even help you migrate legacy applications, including Oracle, to the cloud.To learn more, try BigAnimal for free. Go to biganimal.com/snark, and tell them Corey sent you.Corey: On the one hand, I love people who just round and estimate—we all do that, let's be clear; I sit there and I back-of-the-envelope everything first. But then I look at some of your pricing pages and I count the digits after the zeros. Like, you're talking about trillionths of a dollar on some of your pricing points. And you add it up in the course of a given hour and it's like, oh, it's $250 a month, most months. And you work backwards to way more decimal places of precision than is required, sometimes.I'm also a personal fan of the bill that counts, for example, the number of Route 53 zones. Great. And it counts them to four decimal places of precision. Like, I don't even know what half of a Route 53 zone is at this point, let alone why, like, a thousandth of a zone is going to cause this. It's all an artifact of what the underlying systems are.Can you by any chance shed a little light on what the evolution of those systems has been over a period of time? I have to imagine that anything you built in the early days, 16 years ago or so from the time of this recording when S3 launched to general availability, you probably didn't have to worry about this scope and scale of what you do, now. In fact, I suspect if you tried to funnel this volume through S3 back then, the whole thing would have collapsed under its own weight. What's evolved over the time that you had the billing system there? Because changes come slowly to your environment. And frankly, I appreciate that as a customer. I don't like surprising people in finance.James: Yeah, you're totally right. So, I joined the EC2 team as an engineer myself, some 16 years ago, and the very first thing that I did was our billing integration.
And so, my relationship with the Commerce Platform organization—what was the billing team way back when—it goes back over my entire career at AWS. And at the time, the billing team was similar, you know, [unintelligible 00:28:34] eight people. And that was everything. There was none of the scale and complexity; it was all one system.And much like many of our biggest, oldest services—EC2 is very similar, S3 is as well—there's been significant growth over the last decade-and-a-half. A lot of that growth has been rapid, and rapid growth presents its own challenges. And you live with decisions that you make early on that you didn't realize were significant decisions that have pretty deep implications 15 years later. We're still working through some of those; they present their own challenges. Evolving an existing system to keep up with the growth of the business and a customer base that's as varied and complex as ours is always challenging.And also harder, but I also think more fun, than a clean-sheet redo at this point. Like, that's a great thought exercise for, well, if we got to do this again today, what would we do now that we've learned so much over the last 15 years? But there's this—I find it a personally fascinating challenge, evolving a live system where it's like, “No, no, like, things exist, so how do we go from there to where we want to be next?”Corey: Turn the billing system off for 18 months, rebuild—James: Yeah. [laugh].Corey: The whole thing from first principles. Light it up. I'm sure you'd have a much better billing system, and also not a company left anymore.James: [laugh]. Exactly, exactly. I've always enjoyed that challenge.
You know, even prior to AWS, my previous roles have involved similar kinds of constraints where you've got a live system, or you've got an existing—in the one case, it was an existing SDK that was deployed to tens of thousands of customers around the world, and so backwards compatibility was something that I spent the first five years of my career thinking about in way more detail than I think most people do. And it's a very similar mindset. And I enjoy that challenge. I enjoy that: How do I evolve from here to there without breaking customers along the way?And that's something that we take pretty seriously across AWS. I think SimpleDB is the poster child for "we never turn things off." But that applies equally to the services that are maybe less visible to customers, and billing is definitely one of them. Like, we don't get to switch stuff off. We don't get to throw things away and start again. It's this constant state of evolution.Corey: So, let's say that I were to find a way to route data through a series of two Managed NAT Gateways and then egress to the internet, and the sheer density of the expense of that traffic tears a hole in the fabric of space-time, it goes back 15 years, and you can make a single change to how the billing system was built. What would it be? What pisses you off the most about the current constraints that you have to work within or around?
And that has been one of the areas that's seen the most evolution and probably still has a pretty long way to go.And what's interesting about that is, that's probably something we could have seen coming because we watched the retail business go through, kind of, the same evolution because they started with, well, a customer is a customer is a customer and had to evolve to support the concept of sellers and partners. And then users are different than customers, and you want to log in and that's a different thing. So, we saw that kind of bifurcation of a single entity into a wide range of different related but separate entities, and I think if we'd looked at that, you know, thought out 15 years, then yeah, we could probably have learned something from that. But at the same time, when AWS first kicked off, we had wild ambitions for it, but there was no guarantee that it was going to be the monster that it is today. So, I'm always a little bit reluctant to—like, it's a great thought exercise, but it's easy to end up second-guessing a pretty successful 15 years, so I'm always a little bit careful to walk that line. But I think account is one of the things that we would probably go back and think about a little bit more.Corey: I want to be very clear with this next question that it is intentionally setting up a question I suspect you get a lot. It does not mirror my own thinking on the matter even slightly, but I get a version of it myself all the time. “AWS bills, that sounds boring as hell. Why would you choose to work on such a thing?” Now, I have a laundry list of answers to that aren't nearly as interesting as I suspect yours are going to be. What makes working on this problem space interesting to you?James: There's a bunch of different things. So, first and foremost, the scale that we're talking about here is absolutely mind-blowing. 
And for any engineer who wants to get stuck into problems that deal with mind-blowingly large volumes of data, incredibly rich dimensions, problems where, honestly, applying techniques like statistical reasoning or machine learning is really the only way to chip away at it, that exists in spades in the space. It's not always immediately obvious, and I think from the outside, it's easy to assume this is actually pretty simple. So, the scale is a huge part of that.Corey: “Oh, petabytes. How quaint.”James: [laugh]. Exactly. Exactly. I mean, it's mind-blowing every time I see some of the numbers in various parts of the Commerce Platform space. I talked about quadrillions earlier. Trillions is a pretty common unit of measure.The complexity that I talked about earlier, that's a result of external environments, is another one. It's imposed by external entities, whether it's a government or a tax authority somewhere, or a business requirement from customers, or ourselves. I enjoy those as well. Those are a different kind of challenge. They really keep you on your toes.I enjoy thinking of them as an engineering problem, like, how do I get in front of them? And that's something we spend a lot of time doing in Commerce Platform. And when we get it right, customers are just unaware of it. And then the third one is, I personally am always attracted to the opportunity to have an impact. And this is a space where we get to hopefully positively impact every single customer every day. And that, to me, is pretty fulfilling.Those are kind of the three standout reasons why I think this is actually a super-exciting space. And I think it's often an underestimated space. I think once folks join the team and sort of start to dig in, I've never heard anybody after they've joined, telling me that what they're doing is boring. Challenging, yes. Frustrating, sometimes.
Hard, absolutely, but boring never comes up.Corey: There's almost no service, other than IAM, that I can think of that impacts every customer simultaneously. And it's easy for me to sit in the cheap seats and say, “Oh, you should change this,” or, “You should change that.” But every change you have is so massive in scale that it's going to break a whole bunch of companies' automations around the bill processing in different ways. You have an entire category of user persona who is used to clicking a certain button in this certain place in the console to generate the report every month, and if that button moves or changes color, or has a different font, suddenly that renders their documentation invalid, and they're scrambling because it's not their core competency—nor should it be—and every change you make is so constricted, just based upon all the different concerns that you've got to be juggling with. How do you get anything done at all? I find that to be one of the most impressive aspects about your organization, bar none.James: Yeah, I'm not going to lie and say that it isn't a challenge, but a lot of it comes down to the talent that we have on the team. We have a super-motivated, super-smart, super-engaged team, and we spend a lot of time figuring out how to make sure that we can keep moving, keep up with the business, keep up with a world that's getting more complicated [laugh] with every passing day. So, you've kind of hit on one of the core challenges there, which is, how do we keep up with all of those different dimensions that are demanding an increasing amount of engineering and new support and new investment from us, while we keep those customers happy?And I think you touched on something else a little bit indirectly there, which is, a lot of our customers are actually pretty technical across AWS. 
The customers that Commerce Platform supports are often the least technical of our customers, and so often need the most help understanding why things are the way they are, where the constraints are.Corey: “A big bill from Amazon. How many books did you people buy last month?”—James: [laugh]. Exactly.Corey: —is still very much the level of understanding in some cases. And it's not because they're dumb; far from it. It's just, imagine that some people view there as being more to life than understanding the nuances and intricacies of cloud computing. How dare they?James: Exactly. Who would have thought?Corey: So, as you look now over all of your domain, such as it is, what sucks the most? What are you looking to fix as far as impactful changes that the rest of the world might experience? Because I'm not going to accept one of those answers like, “Oh, yeah, on the back-end, we have this storage subsystem for a tertiary thing that just annoys me because it wakes us up once in a whi”—no, no, I want something customer-facing. What's the painful thing you're looking at fixing next?James: I don't like surprising customers. And free tier is, sort of, one of those buckets of surprises, but there are others. Another one that's pretty squarely in my sights is, whether we like it or not, customer accounts get compromised. Usually, it's a password that got reused somewhere or was accidentally committed into a GitHub repository somewhere.And we have pretty established, pretty effective mechanisms for finding all of those; we'll scan for passwords and credentials, and alert customers to those, and help them correct that pretty quickly. We're also actually pretty good at detecting when an account does start to do something that suggests that it's been compromised. Usually, the first thing that a compromised account starts to do is cryptocurrency mining.
We're pretty quick to catch those; we catch those within a matter of hours, much faster most days.What we haven't really cracked and where I'm focused at the moment is getting back to the customer in a way that's effective. And by that I mean specifically, we detect an account compromise super-quickly, and we reach out automatically. And so, you know, a customer has got some kind of contact from us usually within a couple of hours. It's not having the effect that we need it to. Customers are still being surprised a month later by a large bill. And so, we're digging into how much of that is because they never saw the contact or they didn't know what to do with the contact.Corey: It got buried with all the other, “Hey, we saw you spun up an S3 bucket. Have you heard of what S3 is?” Again, that's all valuable, but you have 300-some-odd services. If you start doing that for every service, you're going to hit mail sending limits for Gmail.James: Exactly. It's not enough that we detect those and notify customers; we have to reduce the size of the surprise. It's one thing to spend 100 bucks a month on average, and then suddenly find that your spend has jumped to $250 because you reused the password somewhere and somebody got ahold of it and it's cryptocurrency-mining your account. It's a whole different ballgame to spend 100 bucks a month and then at the end of the month discover that your bill is suddenly $2,000 or $20,000. And so, that's something that I really want to make some progress on this year. Corey: I've really enjoyed our conversation. If people want to learn more about how you view these things, how you're approaching some of these problems, or potentially are just the right kind of warped to consider joining up, where's the best place for them to go?James: They should drop me an email at email@example.com. That is the most direct way to get hold of me, and I promise I will get back to you. I try to stay on top of my email as much as possible.
But that will come straight to me, and I'm always happy to talk to folks about the space, talk to folks about opportunities in this team, opportunities across AWS, or just hear what's not working, make sure that it's something that we're aware of and looking at.Corey: Throughout Amazon, but particularly within Commerce Platform, I've always appreciated the response of, whenever I report something, no matter how ridiculous it is—and I assure you there's an awful lot of ridiculousness in my bug reports—the response has always been the same: “Tell me more. Help me understand what it is you're trying to achieve—even if it is ridiculous—so we can look at this and see what is actually going on.” Every Amazonian team has been great about that, or you're not at Amazon very long, but you folks have taken that to an otherworldly level. I just want to thank you for doing that.James: I appreciate you calling that out. We try, you know, we really do. We take listening to our customers very seriously because, at the end of the day, that's what makes us better, and that's how we make sure we're in it for the long haul.Corey: Thanks once again for being so generous with your time. I really appreciate it.James: Yeah, thanks for having me on. I've enjoyed it.Corey: James Greenfield, VP of Commerce Platform at AWS. I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice along with an angry comment—possibly on YouTube as well—about how you aren't actually giving this five stars at all; you have taken three trillionths of a star off of the rating.Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying.
The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.Announcer: This has been a HumblePod production. Stay humble.
On The Cloud Pod this week, Peter's been suspended without pay for two weeks for not filing his vacation requests in triplicate. Plus it's earnings season once again, there's a major Google and SWIFT collaboration afoot, and MSK Serverless is now generally available, making Kafka management fairly hassle-free. A big thanks to this week's sponsor, Foghorn Consulting, which provides full-stack cloud solutions with a focus on strategy, planning and execution for enterprises seeking to take advantage of the transformative capabilities of AWS, Google Cloud and Azure. This week's highlights
An airhacks.fm conversation with Goran Opacic (@goranopacic) about: sales force automation at ehsteh, Palm Pilot syncing, starting a SaaS company, Hetzner, Azure, then AWS, running EC2 machines, going serverless, Kubernetes and the clouds, running MicroProfile applications on Quarkus and AWS Lambda, one code base - multiple lambdas, Lambda runs on Firecracker VM, OkHttp on Lambdas, tree shaking with GraalVM, AWS CodeArtifact to cache Maven repositories, Amazon ECR, AWS CodeCommit, databases are hard to split, AWS CodeDeploy with scheduler, code hot swap, managed services are serverless, running AWS Fargate on Spot instances, using Eclipse BIRT on AWS Lambda, Goran is an AWS Data Hero, Goran Opacic on Twitter: @goranopacic, Goran's blog: madabout.cloud
Building Your Own FreeBSD-based NAS, Writing a device driver for Unix V6, EC2: What Colin Percival's been up to, Beckhoff releases TwinCAT/BSD Hypervisor, Writing a NetBSD kernel module, and more. NOTES This episode of BSDNow is brought to you by Tarsnap (https://www.tarsnap.com/bsdnow) and the BSDNow Patreon (https://www.patreon.com/bsdnow) Headlines Building Your Own FreeBSD-based NAS (https://klarasystems.com/articles/building-your-own-freebsd-based-nas-with-zfs/) Writing a device driver for Unix V6 (https://mveg.es/posts/writing-a-device-driver-for-unix-v6/) News Roundup FreeBSD/EC2: What I've been up to (https://www.daemonology.net/blog/2022-03-29-FreeBSD-EC2-report.html) Beckhoff has released its TwinCAT/BSD Hypervisor (https://www.automationworld.com/control/article/22144694/beckhoff-hypervisor-enables-virtual-machines-for-control-applications) Writing a NetBSD kernel module (https://saurvs.github.io/post/writing-netbsd-kern-mod/) Benedict's Git Finds Projects Run anything (like full-blown GTK apps) under Capsicum (https://github.com/unrelentingtech/capsicumizer) Twitter client for UEFI (https://github.com/arata-nvm/mitnal) n³ The unorthodox terminal file manager (https://github.com/jarun/nnn) OpenVi: Portable OpenBSD vi for UNIX systems (https://github.com/johnsonjh/OpenVi) Gists and Articles Step-by-step instructions on installing the latest NVIDIA drivers on FreeBSD 13.0 and above (https://gist.github.com/Mostly-BSD/4d3cacc0ee2f045ed8505005fd664c6e) FreeBSD SSH Hardening (https://gist.github.com/koobs/e01cf8869484a095605404cd0051eb11) GTFOBins is a curated list of Unix binaries that can be used to bypass local security restrictions in misconfigured systems (https://gtfobins.github.io) Tarsnap This week's episode of BSDNow was sponsored by our friends at Tarsnap, the only secure online backup you can trust your data to. Even paranoids need backups.
Feedback/Questions Ben - Backing Up (https://github.com/BSDNow/bsdnow.tv/blob/master/episodes/453/feedback/Ben%20-%20Backing%20Up.md) Ethan - Thanks (https://github.com/BSDNow/bsdnow.tv/blob/master/episodes/453/feedback/Ethan%20-%20Thanks.md) Maxi - question about note taking (https://github.com/BSDNow/bsdnow.tv/blob/master/episodes/453/feedback/Maxi%20%20-%20question%20about%20note%20taking.md) Send questions, comments, show ideas/topics, or stories you want mentioned on the show to firstname.lastname@example.org (mailto:email@example.com) ***
About Avery: wvdial, bup, sshuttle, netselect, popularity-contest, redo, gfblip, GFiber, and now @Tailscale doing WireGuard mesh. Top search result for "epic treatise."Links Referenced: Webpage: https://tailscale.com Tailscale Twitter: https://twitter.com/tailscale Personal Twitter: https://twitter.com/apenwarr TranscriptAnnouncer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.Corey: This episode is sponsored in part by LaunchDarkly. Take a look at what it takes to get your code into production. I'm going to just guess that it's awful because it's always awful. No one loves their deployment process. What if launching new features didn't require you to do a full-on code and possibly infrastructure deploy? What if you could test on a small subset of users and then roll it back immediately if results aren't what you expect? LaunchDarkly does exactly this. To learn more, visit launchdarkly.com and tell them Corey sent you, and watch for the wince.Corey: This episode is sponsored by our friends at Revelo. Revelo is the Spanish word of the day, and it's spelled R-E-V-E-L-O. It means “I reveal.” Now, have you tried to hire an engineer lately? I assure you it is significantly harder than it sounds. One of the things that Revelo has recognized is something I've been talking about for a while, specifically that while talent is evenly distributed, opportunity is absolutely not. They're exposing a new talent pool to, basically, those of us without a presence in Latin America via their platform.
It's the largest tech talent marketplace in Latin America with over a million engineers in their network, which includes—but isn't limited to—talent in Mexico, Costa Rica, Brazil, and Argentina. Now, not only do they wind up vetting all of their talent on English ability, as well as, you know, their engineering skills, but they go significantly beyond that. Some of the folks on their platform are hands down the most talented engineers that I've ever spoken to. Let's also not forget that Latin America has high time zone overlap with what we have here in the United States, so you can hire full-time remote engineers who share most of the workday with your team. It's an end-to-end talent service, so you can find and hire engineers in Central and South America without having to worry about, frankly, the colossal pain of cross-border payroll and benefits and compliance because Revelo handles all of it. If you're hiring engineers, check out revelo.io/screaming to get 20% off your first three months. That's R-E-V-E-L-O dot I-O slash screaming.Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. Generally, at the start of these shows, I mention something about money. When I have a promoted guest, which means that they are sponsoring this episode, I talk about that. This is not that moment. There's no money changing hands here.And in fact, I'm about to talk about a product that I am a huge fan of, but I'm also, as of this recording, not paying for. So, one might think I'm the product, but no. Let's actually start by talking about money. My guest today is Avery Pennarun, the CEO of Tailscale, and as of today, being the day that this goes out, you folks have just raised $100 million in a Series B. First, thank you for joining me, followed immediately by congratulations.
But yeah, we are very happy, both to be here and to be making the announcement.Corey: Yeah. CRV and Insight Partners are the lead investors on the round. And it's great to see because I've been using Tailscale for a while now. And it is a transformative experience for the way that I think about these things. A while back, I wrote a Lambda layer that lets Lambda functions take advantage of it, but in fairness, I did write it, so anyone looking at that should—“Haha, that's why you're not a developer full-time. You're bad at it.” Yes, I am.But I can't stop raving about how useful Tailscale is, with the counterpoint that it's also very difficult to explain to people who are not—at least in my experience—broken in a very particular way, as I am. What is Tailscale? And what does it do?Avery: Right. Well, I mean, first of all, one of the things I really like about Tailscale and what we built is that, you know, even if you're not a super great developer—like you just described yourself—you can get excited about it, you can use it for things, you can build on top of it, and contribute back without having to understand every single little detail of what it does, right? Tailscale is something that a lot of people get excited about without having to know how it works; they just know what it gives them, right? The answer to what Tailscale is, is sort of… it can be hard to explain to people who don't know about the kinds of problems that it solves, but the super short answer is it connects all of your devices and virtual machines and containers to each other, wherever they are, without going through an intermediary, right? So, it minimizes latency and it maximizes throughput, and it minimizes pain. And it sounds like that should be hard, but you can get it all done in, like, five minutes.Corey: I have been using it for a while now. Originally, I was using it and federating through it, I believe, via Google.
I tore down and rebuilt the entire network in about five minutes, and instead started federating through GitHub. Nowadays, you apparently changed your position on identity and you use third-party SSO sources, as well as retaining user information and login stuff yourselves, which is just, it's almost spoiled for choice, on some level. But I am such a fan of the product that, if you'll forgive me, I'll talk for about a minute or so on how I use it and my experience of it.Avery: Go for it.Corey: So, I wind up firing up Tailscale, and I have a network where, from any of my devices, I can talk to any other. I have a couple of EC2 machines hanging out in AWS, I have a Raspberry Pi that I use as a DNS server sitting in the other room, I have my iPad, I have my iPhone, I have my laptop, I have my desktop, I have a VM sitting over in Google Cloud, I have a different VM sitting over in Oracle Cloud. And all of these things can talk to each other directly over a secured network. I can override DNS and talk to these things just by the machine name, I can talk to them via the address that winds up being passed out to them through this. It is transformative. It works on IPv4 and IPv6; if I'm on a network without IPv6 access, using Tailscale, suddenly I have it.I can emerge from almost any other node on this network. And adding a new device to this is effectively opening a link in a browser on either that device or a different one, clicking approve once I log in, and it's done. That is my experience of it, so far. Is that directionally correct as far as how you think about the product? Because again, I use DNS TXT records as a database for God's sake. I am probably not the world's foremost technical authority on the proper use of things.
It's hard to get across in words just how simple it is, right?
That one-minute description used a bunch of technical-sounding terminology that the listeners to your podcast will probably understand. But, like, the average tech person doesn't need to know any of those things in order to use Tailscale, right? You install Tailscale from the App Store on your phone and on your laptop. You log into your Google account or your GitHub account, and that's it. Those two devices are tied together in time and space; they can see each other. You can access a web server that you're running on your laptop from your phone without doing anything else, right?
And then you can start a VM in AWS and load Tailscale in there, and now that's part of your network. And so, there's—you don't need to know what IPv4 and IPv6 even are. You don't need to know what DNS even is. It just, you know, the magic sort of comes together. We do a ton of stuff behind the scenes to make that magic work. But it's this—one thing that one customer said to us one time is, like, “It makes the internet work the way you thought the internet worked until you learned how the internet worked.” If that makes sense.
Corey: Right. It basically works on duct tape and toothpicks all stuck together, and it's amazing that it works at all. I mean, this is going to sound relatively banal, but the way that I've used Tailscale the most is on my phone or on my iPad or on my Mac. I will connect to the Tailscale network by default, and when that is done, it hands out my Pi-hole's IP address as the custom DNS server for the entire network. So, I don't see a whole bunch of ads, not just in the browser, but in apps and the rest.
And every once in a while when something is broken because an ad server is apparently critical to something, great, I turn off the VPN on that device and use the natural stuff.
My experience of the internet gets worse as a result and the thing starts working again, then I turn it back on. It is more or less the thing that I use as a very strange-looking ad blocker, in some respects, that I can toggle on and off with the click of a button. But it's magic; it is effectively magic. From the device side, it's: open up an app and toggle a switch, or, on a Mac, go to the application that runs in the menu bar and just click the connect button or the disconnect button.
There is no MFA every time you connect. There is no typing in a username and password. There is no lengthy handshake. I hit connect and it is connected by the time I have moved the mouse back from the menu bar to the application I was working in. Whenever I show this to someone who uses a corporate VPN, they don't believe me.
Avery: Right. Yeah, exactly. It's hard to believe. It's like, “Hey, did anything actually happen here?” Because, you know, for example, by default it doesn't capture all your traffic; it only captures the traffic to your private network, so it's safe to leave it on all the time because it's not interfering with what you're doing.
What you're describing is using Pi-hole, which is a Raspberry Pi-based DNS server that acts as an ad blocker. Most people using Pi-hole have one at home, so when they're at home they get ads blocked, but when they leave home they don't. If you add Tailscale to that, you can use your Pi-hole even when you're not at home, and it makes it that much more useful. I think an important difference from, say, other services that you might use as an adblocker or a privacy VPN is that we never see your traffic, right? Tailscale creates a private network between you and all your personal devices, and that private network is private even from us, right? We help you connect the devices to each other, but when your traffic goes to Pi-hole, it's your Pi-hole. It's not our adblocker.
It's your adblocker, right, so we never see what traffic you're going to, and we never see what DNS names you're looking up, because it was just never made available to us, right?
Corey: Right. The level of visibility you have into my network is fascinating in a variety of different ways, but one of the most fascinating is how limited it is. You know what devices I have, the last time they connected, the version of Tailscale they're running, an IP address on each, and you also wind up seeing what services are advertised and available on those networks if I decide to enable that. Which is great for things like development: I'm going to be doing development in a local dev sense on an EC2 instance somewhere. And I don't want to set up an SSH tunnel to proxy traffic over there just so I can hit some high port that I bound to, and I certainly don't want to expose that to the general internet; that is a worst practice for all these things.
And Tailscale magically makes this go away. I haven't done this in much depth yet with a variety of my team members, but when you start working on this with teams who are doing development work, someone can have something running on their laptop and just seamlessly share it with their colleagues. It's transformative, especially in an era where very often that colleague is not sitting in the same room getting greasy fingerprints on your laptop screen.
Avery: Yep. Yeah, exactly. So, you mentioned the services list, which you have to specifically opt into, and the reason we did that is that, you know, the list of devices and hostnames and IP addresses, we have to collect because that's how the service works, right? You send us the information about your devices, and then we send the public keys for those devices to the other devices.
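The split Avery describes, where the coordination service must know each device's hostname, address, and public key but keeps the services list only on opt-in, can be sketched as a toy model in Python. This is an illustration under stated assumptions, not Tailscale's actual coordination server: `Node`, `CoordinationServer`, the hostnames, the 100.64.x.x addresses, and the placeholder key strings are all hypothetical, and no real cryptography is performed. What it shows is that the private key never appears in anything a device sends.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    """What a device reports to a (hypothetical) coordination server.

    Only the public key is sent; the private key stays on the device.
    """
    hostname: str
    ip: str
    public_key: str
    services: List[str] = field(default_factory=list)


class CoordinationServer:
    """Toy model: stores device metadata and hands out peers' public keys."""

    def __init__(self) -> None:
        self._nodes: Dict[str, Node] = {}

    def register(self, node: Node, share_services: bool = False) -> None:
        # The services list is retained only if the owner opted in.
        self._nodes[node.hostname] = Node(
            node.hostname,
            node.ip,
            node.public_key,
            list(node.services) if share_services else [],
        )

    def peers_for(self, hostname: str) -> Dict[str, Tuple[str, str]]:
        # Every other device's address and public key: enough for each
        # pair of devices to set up a direct encrypted tunnel.
        return {
            n.hostname: (n.ip, n.public_key)
            for n in self._nodes.values()
            if n.hostname != hostname
        }

    def services_for(self, hostname: str) -> List[str]:
        return list(self._nodes[hostname].services)


server = CoordinationServer()
server.register(Node("laptop", "100.64.0.1", "laptop-pubkey", ["http:8080"]))
server.register(Node("pihole", "100.64.0.2", "pihole-pubkey", ["dns:53"]),
                share_services=True)
print(server.peers_for("laptop"))     # {'pihole': ('100.64.0.2', 'pihole-pubkey')}
print(server.services_for("laptop"))  # [] because the laptop did not opt in
```

The laptop can still share its web service by sending someone the URL directly; opting out only keeps the list off the server, which matches the behavior described in the conversation.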
We can't get out of collecting that, whereas the services list is purely an interesting add-on feature, and we decided that we didn't want to collect that by default because it would make people nervous about their privacy.
So, if you want that feature, you turn it on; if you don't want it, don't turn it on. You can still share services with people inside your network; they just need to know that those services exist. You send them the URL or whatever and it'll work, but in that case it doesn't show up as a list of things that we can see. But yeah, sharing stuff between coworkers is definitely a major use case for Tailscale, for dev and infrastructure teams in particular. Designers, for example, can run a test version of the website on their laptop, and then they say, “Hey, visit this URL on my laptop.” And you don't have to be in the same office; you can both be sitting in different cafes in different cities. Tailscale will make it so that the connection between those two computers still works, even if they're both behind firewalls, even if they're both behind different NATs, and so on.
Corey: One of the things that astounded me the most: I am reluctant to completely trust things that are new that touch the network. Early on in my career, I made network engineering mistake 101, which is making a change to the firewall in your data center without having another way in. Then comes the drive across town, or calling remote hands to get them to let you back in after you've locked yourself out. And you folks are shipping these things at a pretty consistent clip; there are a lot of updates and releases across all of the platforms. Invariably, I find myself a version or so behind on some devices, just because of the pace of innovation. “Oh, great. We're updating the VPN client. Cool. So, I'm going to expect this thing to drop, and I'm going to have to go in and jigger it to get it working again.”
That has never happened.
I have finally put it to, I guess, the acid test, and I have closed SSH from the internet to most of these nodes. In fact, for some of them, like the Pi-hole sitting at home, if you're not on my home network, there is no way in from outside without breaking in. It is absolutely one of those things that disappears into the background in a way that I was extraordinarily surprised to find.
Avery: Right. Well, that is something—I mean, I'm old and grumpy, I guess, is sort of the beginning part of all this, right? I've seen all this annoying stuff that happens with software. And, you know, many of us at Tailscale, in fact, are old and grumpy, and we just didn't want to repeat those same things. So, first of all, with network software, to an even greater degree than virtually any other kind of product, if your network stops working, everything stops working, right? So the number one priority is that Tailscale must not mess up your network.
Because if it does, you instantly lose faith. Tailscale gives you this magical feeling when you first install it, but that feeling of magic goes away very quickly the first time it screws something up and you can't connect when you really need to. So, we put a huge amount of work into making sure that you can connect when you really need to. We have a lot of automated tests. One of our policies that I think is almost unheard of is that we intend to never deprecate support for older versions of the Tailscale client.
And to this day, and we're about three years into Tailscale, we've never deprecated an old client that anybody is using. It's hard to believe, but eventually people do stop using some old versions, so those ones don't necessarily work anymore. But any version of Tailscale that is in use today is going to keep working as long as anybody is using it. We have a very, very, very strong backwards compatibility policy.
Because the worst thing that I can imagine is having some Raspberry Pi sitting out in the void somewhere that I haven't looked at for two years, and whoops, Tailscale broke it, and now I can't connect to it, and now I have to go drive down there and fix it, right? It would be just insultingly terrible for that to happen.
And we just make sure that doesn't happen. Another thing that people get excited about is, like, on a Debian system or whatever, if you've got the Debian package installed, you can do an apt-get upgrade, Tailscale upgrades, and even your SSH session doesn't drop. Every now and then people [comment and was like 00:14:13]—
Corey: That was the weirdest part. I was expecting it to go away or hang for a long period of time. And sure, I guess it might drop a packet or so; I've never bothered to look because it is so seamless.
Avery: Right. Yeah, exactly. It's just, like, “Wait. Did anything even happen?” It's like, “Yes”—
Corey: Right—
Avery: —“Something happened. We upgraded it out from underneath you.”
Corey: —my next thing is [crosstalk 00:14:28]—yeah, I grep for Tailscale in the process table. Like, okay, is this just a stale thing that's existing [unintelligible 00:14:34] to bounce it? No, it has just been started. It was so seamless under the hood that it was amazing. You got a lot of things very deeply right on this.
Something else that I think is worth pointing out is that if any company had the brainpower to roll their own crypto, it would be you folks, but you don't. You're riding on top of WireGuard, an open-source project that does full-mesh VPNs with terrible user interfaces.
Avery: Yep. So, you know, I guess, in the interest of disclosure: back in 1997, when I started my first startup, I was not smart enough to not roll my own crypto. And therefore the VPN I wrote at the time definitely had giant security holes. It was also not that popular, so nobody found them.
But I, you know, eventually I found [crosstalk 00:15:21]—
Corey: “Except a bank, which I really shouldn't disclose.” Kidding, I'm kidding. But yeah.
Avery: [laugh]. No, no, no. The bank never used that software. [laugh]. But yeah. Nowadays, I've been through a lot, and I… I would not describe myself as a security expert, although people often describe me as a security expert. I don't know what that means. But I am enough of an expert to know that I should not be rolling my own crypto. And the people who invented WireGuard, it's one of the—I feel like I'm overstating things, but I'm not—it's one of the biggest leaps forward in cryptography in probably the history of computing. Now, it builds on a series of things that are part of the same leap forward, right? It's built on the protocol that Signal uses called the Noise Protocol, right? Signal and Noise are built on the Ed25519 curve, made by, or popularized by, Dan Bernstein, who's a major cryptographer in this area. Sometimes popular, sometimes—
Corey: Oh, djb.
Avery: —not popular. Yeah, exactly.
Corey: He also, near and dear to my heart, wrote djbdns, which was a well-known, widely deployed DNS server, by which I of course mean database. Please, continue.
Avery: Yep. [laugh]. I've been a huge fan of basically everything djb has ever made in the history of—
Corey: Oh, you're a qmail person. I am on the Postfix side of [unintelligible 00:16:37].
Avery: Yep. Well, my first startup back in 1997, we made Linux-based server appliances for small businesses. And we used qmail, we used djbdns, we used a couple of other djb products. And you know, for the history of that product, leaving aside my VPN that was a security hole, the djb stuff never had a single problem. That company was eventually acquired by IBM.
One of the first things IBM did is, like, “Whoa, djb has a super-weird software license. We can't be doing this.
Let's replace it with software that has a decent license.” So, they ripped out djbdns and started using BIND. Within a week, there was a security hole in BIND that affected all of these appliances that they now controlled, right?
So, djb is a very big-brained super genius in security, whatever you might think of his personality. And his work was sort of the basis for this revolution in cryptography that WireGuard has brought to the networking world. And it's hard to overstate. Just, like, the number of lines of code: there's something like 100 times less code to implement WireGuard than to implement IPsec. Like, that is very hard to believe, but it is actually the case.
And that made it something really powerful to build on top of. Like, it's super hard for somebody like me to screw up the security of a WireGuard deployment, whereas it's very easy to screw up the security of an IPsec deployment.
Corey: This episode is sponsored by our friends at Oracle Cloud. Counting the pennies, but still dreaming of deploying apps instead of “Hello, World” demos? Allow me to introduce you to Oracle's Always Free tier. It provides over 20 free services in infrastructure, networking, databases, observability, management, and security. And—let me be clear here—it's actually free. There's no surprise billing until you intentionally and proactively upgrade your account. This means you can provision a virtual machine instance or spin up an autonomous database that manages itself, all while gaining the networking, load balancing, and storage resources needed to support the application that you want to build, which somehow never quite make it into most free tiers. With Always Free, you can do things like run small-scale applications or do proof-of-concept testing without spending a dime. You know that I always like to put asterisks next to the word free? This is actually free, no asterisk. Start now.
Visit snark.cloud/oci-free. That's snark.cloud/oci-free.
Corey: I just want to call something out as well: when I say that you folks definitely have the intellectual firepower to roll your own crypto should you choose to do so, but you chose not to, if anything, I'm understating it. To be clear, one of the blog posts you had out somewhat recently was about how you are maintaining what is effectively your own fork of the Go programming language. Which is one of those things where, when someone hears it, it's like, “I'm sorry, can you say that again? Because I am almost certain I misunderstood something.” What is the high-level version of that?
Avery: Well, there are, I think, two important points there. One of them is that yes, we did fork the Go programming language; it's supposed to be a temporary fork, because it allows us to do some experiments with the Go back-end. And the primary reason we were able to do that is because we employ a couple of people who used to be on the core Go team. And that was not because we went out looking for people who used to be on the core Go team; that's just how it worked out. But because we do, it's easier for them to fork Go than it would be for the average person, and in many ways, it's easier for them to get their job done by just continuing to work on the codebase they've already worked on.
But the second point is actually, as compilers go, the Go compiler is probably the very easiest one I've ever seen to fork and edit. Like, it's super-clear code; you're just editing Go code, which is already pretty easy. But they really put a ton of work into making it readable and understandable. So, like, average people actually can fork the Go compiler and not be completely bamboozled by how difficult everything is, right? Compared to, like, GCC, where just building the thing is something that takes you weeks to learn how to do, right, Go is just, like, you run this script and build your compiler [unintelligible 00:19:35]—
Corey: Yeah.
Let me clear this quarter on my schedule so I can go ahead and do that. Yeah, no, thank you.
Avery: Yeah. I've built copies of GCC and it's absolutely nightmarish, right? And built people's forks of GCC for special embedded processors and stuff. And this is, like, a f—this is a career that you can specialize in, building GCC, right? There are people that do this, right? And the Go compiler, it's really—
Corey: Well, it's 40 years of load-bearing technical debt.
Avery: Yeah. Yeah. But the Go compiler, it's very nice; it's just a program that's written in Go, that compiles under Go, and then you end up with one binary, right? And as long as you have that binary, everything just works, right? And so, it's actually surprisingly easy to fork Go. I don't want to—you know, I wouldn't put that on the same level of difficulty as, like, not screwing up cryptography, if you're trying to do it yourself. [crosstalk 00:20:16]
Corey: [crosstalk 00:20:16] their own crypto algorithm that they themselves can't defeat. Yeah, it turns out that basically, breaking crypto is a team sport. Who knew?
Avery: Yeah. Exactly. Generally, with security, you have this problem a lot, right? It's a lot harder to build a system that nobody can break into than it is to break into a random system, right? Because, you know, the job of securing something against everybody is much harder than the job of finding something you can break into.
Corey: So, I did have a question about something you said earlier, where one of the design goals is not to have a breaking change to the point where an old device can no longer connect to the private network. But you do have a key expiry for devices, where a device needs to log in again, and it can be anywhere between 3 and 180 days as I look at it.
I don't know if some of the more enterprise-y options have longer options that they can set, but what happen—how do you not have to drive out to the back of beyond to re-authenticate that Raspberry Pi every six months?
Avery: Ah. So, this is something at the policy layer, and I would say we have not finished refining it to perfection right now. What we do have, though: if your key does expire, there's a button in the admin panel to say, like, boost this device for a little bit longer. Sort of unexpire it for another 30 minutes (I don't remember exactly how much time it is); then you can SSH into the device and do a proper key refresh on it without actually having to drive out there. Now, for one version, we did accidentally break the key reactivation feature, so that if the client noticed its key had expired, it actually disconnected from the Tailscale network altogether and then didn't receive the message to, like, “Hey, could you please increase the length of your key?” That was fixable by power cycling it, which you could often get somebody to do without driving all the way out there. But we fixed that, so now that—
Corey: “Have you tried turning it off and back on again?” is still a surprisingly effective way of troubleshooting something.
Avery: Yeah, exactly. So, that wasn't—I mean, it was kind of annoying for some people. But yeah, the reason every key always expires by default is that unlimited-time credentials are one of the worst security holes that people don't really acknowledge. Because technically, it'll never be the, like—you know, it'll never show up as the highest-severity security hole that you have an unlimited-time credential sitting in your home directory, but it is something that—well, I can tell a story.
There is a company that I heard about that had, you know—SSH keys are typically unlimited-time credentials; the easiest way to do it is you run ssh-keygen, it puts something in your home directory, you copy the public key to all the devices you want to be able to log into, and then you never think about it again.
So, at this company, of course, every developer had done this; they had a production network with a bunch of SSH keys in it. One not very ethical employee worked there, had keys in their production systems, and eventually got fired. Now, of course, this company had good processes in place: they went through all the devices and took this person's public key out of all of them. What they didn't know is that during lunch one day, this person had gone around to all their coworkers' workstations that hadn't been locked and downloaded the private keys for those people onto his—
Corey: Oh no.
Avery: —computer before he got fired. And so, shortly after he got fired, their entire production network got wiped out. Now, they didn't have enough forensics at the time to know how it all got wiped out, so they spent some time putting it all back in place, this time with forensics. They rebuilt everything from scratch, all new public keys and everything. You couldn't possibly have any backdoors in this system, right?
And then a month later, it all got wiped out again. This time, the forensics revealed that it was one of the existing employees' keys, used from a different country, that had gotten into their private production network and wiped everything out. How did that happen? It was because this person had, years earlier, downloaded all their private keys when he wandered around the office. You can fix this problem instantly by just expiring your keys and forcing rotation periodically, right?
SSH doesn't make that very easy.
With SSH, you can set up SSH certificate authentication, which is a huge ordeal to get configured, but once it's working, it solves this particular problem, right? Tailscale [crosstalk 00:24:19]—
Corey: On Mac and iOS, there is a slight improvement to this that I'm a big fan of, because I agree with you. I am lousy at rotating my keys, but there's an open-source project called Secretive that I use on the Mac that stores the private key in the Secure Enclave, which will not let it out. And I have to use Touch ID to authenticate every time I want to connect to something. Which can get annoying from time to time, but there is no way for someone to copy that off. Historically, I would—
Avery: That's true.
Corey: Have a passphrase that was also tied to the key, so if someone grabbed it off the disk, it still theoretically would not be usable. But again, that is an absolute vector that needs to be addressed and thought about. Key rotation is huge.
Avery: And you have to go through this effort to sort it all out, right? So at Tailscale, we just have this policy: we don't do unlimited-length credentials; we do key rotation for everything, and we just set different time limits for this rotation depending on how picky you want to be about it. But any key expiry is much, much better than no key expiry. Even if you set it to a six-month key expiry, at least there's only a six-month window in which somebody could theoretically reuse your keys. And we can also rotate keys behind the scenes and so on.
So, in the SSH case, the way people use Tailscale, you stop opening the SSH port to the world. You only SSH in when you're connected over Tailscale. The fact that your Tailscale keys rotate and expire over time is what protects your SSH session. So, you could keep using static SSH keys that never expire—don't try to figure out all this other complicated stuff, right—and you're still protected from the risk of those unlimited-length private SSH keys.
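Avery mentions SSH certificate authentication as the way to get expiring credentials with plain SSH. As a hedged illustration, OpenSSH's `ssh-keygen` can sign a user's public key with a CA key and a validity window, so the resulting certificate simply stops working after, say, a day. The Python helper below only assembles that command and does not run it; the file names `ca_key` and `id_ed25519.pub`, the identity `alice-laptop`, and the principal `alice` are hypothetical placeholders.

```python
import shlex
from typing import List


def sign_user_key_cmd(ca_key: str, user_pubkey: str, identity: str,
                      principals: List[str], validity: str = "+1d") -> List[str]:
    """Build an ssh-keygen invocation that issues a short-lived user certificate.

    -s  CA private key used to sign
    -I  certificate identity (recorded in the server's auth log)
    -n  comma-separated principals (usernames) the certificate is valid for
    -V  validity interval; "+1d" means from now until one day from now
    """
    if not principals:
        raise ValueError("at least one principal is required")
    return ["ssh-keygen", "-s", ca_key, "-I", identity,
            "-n", ",".join(principals), "-V", validity, user_pubkey]


cmd = sign_user_key_cmd("ca_key", "id_ed25519.pub", "alice-laptop", ["alice"])
print(shlex.join(cmd))
# ssh-keygen -s ca_key -I alice-laptop -n alice -V +1d id_ed25519.pub
```

Running that command writes `id_ed25519-cert.pub` next to the public key, and servers then need to trust the CA, for example through the `TrustedUserCAKeys` directive in `sshd_config`. The short `-V` window is the part that matters for the story above: a copied key is only useful until its certificate lapses.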
Now, that said, for servers, Tailscale does have a button where you can say, like, “Please stop expiring the key.” This is a server; nobody's ever going to get physical access to the machine.
The only thing someone could do with the private key for this machine is allow other people to SSH into it, which is not very dangerous, right? It's pretty much like somebody stealing your SSH authorized_keys file; like, it doesn't really matter. And for that case, you turn off the expiry altogether. But expiring keys is intended for use by, like, devices that employees are actually holding in their hands, where if it expires, it's no big deal; you push the login button and it refreshes.
Corey: There's something that is very nice about dealing with something that is just so sensible. I mean, at least in the olden days of running sysadmin stuff, we all had this problem: we would generate (or purchase, back in those days) SSL certificates, and, great, they'd expire after a year or so. At the end of the year, people forget, the certificate expires, and you have to run around fixing it. And the default knee-jerk response was, “That was awful. Let's get the next one for five years so we don't have to think about it for that long.”
And it's always a wildcard, so it gets put all over the place, and you wind up with these problems. One of the things that Let's Encrypt has done super well is forcing a rotation every 90 days, so you know where it is. It's just often enough that you want to automate it. And ACM, AWS Certificate Manager, takes a slightly different approach. It doesn't give you the private key; it embeds it in other places so they can handle the rotation themselves.
And they start screaming at you by email, long before expiry, if they can't verify the domain for renewal. It's different approaches to the problem, but yeah, five years out, how should I know all the places the certificate has wound up in that intervening time? Most of the people who did it aren't there anymore.
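The expiry behavior discussed here (a per-device key lifetime, a short admin-granted extension after expiry, and an opt-out for servers) can be sketched as a small policy check. This is an illustrative model, not Tailscale's implementation: the `NodeKey` class and its fields are hypothetical, the 3 to 180 day bounds come from Corey's reading of the admin panel, and the 30-minute extension is the figure Avery gives while saying he doesn't remember it exactly.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Bounds as Corey reads them off the admin panel; treat them as assumptions.
MIN_EXPIRY_DAYS, MAX_EXPIRY_DAYS = 3, 180


@dataclass
class NodeKey:
    """Hypothetical model of a device key with an expiry policy."""
    issued_at: datetime
    expiry_days: int = 180
    expiry_disabled: bool = False               # e.g. a server nobody can touch
    extended_until: Optional[datetime] = None   # temporary post-expiry grace

    def __post_init__(self) -> None:
        if not MIN_EXPIRY_DAYS <= self.expiry_days <= MAX_EXPIRY_DAYS:
            raise ValueError("expiry_days out of range")

    def expires_at(self) -> datetime:
        return self.issued_at + timedelta(days=self.expiry_days)

    def is_valid(self, now: datetime) -> bool:
        if self.expiry_disabled:
            return True
        if now < self.expires_at():
            return True
        # Expired: usable only inside an admin-granted extension window,
        # long enough to SSH in and re-authenticate properly.
        return self.extended_until is not None and now < self.extended_until

    def extend(self, now: datetime, minutes: int = 30) -> None:
        self.extended_until = now + timedelta(minutes=minutes)


issued = datetime(2022, 1, 1)
key = NodeKey(issued_at=issued, expiry_days=180)
later = issued + timedelta(days=181)
print(key.is_valid(later))  # False: the key has expired
key.extend(later)
print(key.is_valid(later))  # True: inside the 30-minute extension
```

The strict less-than comparisons make a key invalid at the exact instant it expires; either convention works as long as it is applied consistently.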
And one day, surprise, a website breaks, either because its SSL cert isn't working, or because one of the back-end services it depends on suddenly doesn't have a working one. It's become a mess, so having a forced modernity to these things is important.
Avery: Right. It's forced modernity, and it's all basically behind the scenes. Like, you don't even think about the fact that Tailscale gave you a key, because that is not relevant to your day-to-day life, right? You logged in, something happened, all these devices ended up on your network. What actually happened is that public and private keys—you know, a private key was generated, the public keys were distributed properly, things are getting rotated, but you don't have to care about all that stuff.
So, it's fun that Tailscale is what we call secure by default, right? People love to use it because it's easier; it makes their life easier. But security teams like it because it actually changes the default security posture from, like, “Ugh, I'm going to have to tell everybody to please stop doing these five things because they always create security holes,” to, like, “Whoa, the thing that they're going to do most naturally is actually going to be safe.” Right? I really like that about it. You're not thinking about certificates, but the certificates are getting rotated exactly as they should be.
Corey: There's just something so nice about computers doing the heavy lifting for us. One of the weird things about Tailscale is that it falls into a very strange spot where there is effectively zero maintenance burden on me, but I still toggle it on or off often enough to remember that it's there and that I'm using it. It is the perfect sweet spot of being somewhat close to top of mind, but never in the sense of, “Oh, I've got to deal with this freaking thing again.” It never feels that way.
Logging into it, it has long-lived sessions in the browser, so it isn't one of those, ah, you have to go back to GitHub and re-authenticate and do all these other dog-and-pony-show things. It just works. It is damn near consumer-level ease of use, start to finish. The hard part, of course, is how on earth you explain this to someone [laugh] without a background in this space.
Avery: Yeah, exactly. It's something we ask ourselves sometimes: like, well, you know, Tailscale is great for developers right now. It is easy enough to use, even for consumers, but, like, how would you explain it to consumers and find a good use case for them? And it's something that I think we are going to do eventually, but it hasn't been, up until now, a super high priority for us, just because developers are sort of the core audience, and we haven't even finished building a great product that does everything they want yet. There is one little feature in Tailscale that's the beginning of something that's consumer-friendly; it's called Taildrop.
I don't know if you've seen this one. You can turn it on, and basically, it acts like AirDrop in Apple products, except you don't need to care about physical proximity and it works with every kind of device, not just Apple devices, right? It shows up in the share pane on your macOS or Windows or iOS device. You can use it from Linux; you just use it to send files of any type, and it sends them point to point, not through a cloud provider, so we never see a copy of the file. It only goes between your devices over your encrypted network. So, that's something that consumers kind of like.
Corey: It feels like Tailprint for Bonjour could wind up being another aspect of this as well. And I'm still hoping for something almost Ansible-like, where you run the following command, whether it's pre-approved or not, on the following subset of things.
In my case, for example, I would love it if it would just automatically, when I press the button, update Tailscale across all of the nodes that support it, namely the Linux boxes. I don't think you can trigger an App Store update from within a sandboxed app on iOS, but I've been—
Avery: Right.
Corey: Surprised before. Yeah. But it's nice to be able to do some things.
Avery: Yeah. This is one of those—yeah, we get that request a lot for, like, can you push a button to auto-update Tailscale? It makes me really sad that we get this request, because the need for this is a sign that all of the OS vendors have completely botched software updates, right? Like, the OS should be the thing updating your software on a good schedule based on a set of rules, and it shouldn't be the job of every single application to provide its own software update. It's actually a massive, embarrassing security hole that software can even update itself, right?
Because if it can update itself, then, you know, imagine someone breaks into the production services of a company that is offering a particular program. They put malware into a version of the software, they put it into the software update server, and then they trigger everything in the network to push the software update to those devices. Now you've got malware installed on all your devices, right? It's very strange that people ask for this as a feature. [laugh].
Tailscale currently does not have that feature; it doesn't push software updates on its own. But it's such a popular request that I think we're going to have to implement it, because everybody wants it, because Windows, for example, is simply never going to automatically update your software for you. We have to have these weird super-admin rights on your machine so that we can push software updates, because nobody else will. I feel really weird about that.
You know, the security world should be protesting this more.
But instead, they're asking, can you please put this feature in, because I've got a checklist in my compliance thing that says, “Is all your software up-to-date?” I don't have a checklist item that says, “Does any of my software have super-admin rights that it shouldn't have?” Right? It's sort of, I guess, the next level of supply-chain management is the big word. Nobody—there is no supply chain management for software.
Corey: There isn't, for better or worse. I wish there were, but there simply is not. Ugh. Next year, maybe. We hope.
Avery: Yep. So, you have to trust your vendors, fundamentally, which I guess will always be true. That's true for Tailscale as well, right, whether or not we include the software update pushing. If you're installing a VPN product provided by a vendor, you have to trust that we're going to put the right stuff into the software.
And the best—the only thing I can really do is just be honest about these issues and say, “Well, look, we try our best. We definitely try not to implement features that are going to turn into security holes for you.” And I think we do a lot better than most vendors do in that area. But it's very hard to be perfect, because nobody knows how to do software supply chain well.
Corey: Ugh. I hear you. That's the nice thing, too. Honestly, the big reason I know I need to update these things, and the reason I want to, is actually you. Because whenever I log in and look at my devices in the Tailscale thing, there's a little icon next to the ones that have an update available.
And you have fixed a lot of the niceties on this, like, ah, there's an update available for the iOS version. It's, “Really? Because it's not available in the App Store yet,” as I sit there spamming the thing. That stopped happening. There are a lot of very nice quality-of-life improvements that are easy to miss.
Avery: Yep, yeah, that's kind of weird.
We actually went a little overboard on the update-available notifications for a while, because there's always this trade-off, right? Like I said, we have a policy of never breaking old versions, so when people see the update-available notification, they kind of panic. It's, like, “Oh no, I'd better install the update before Tailscale cuts me off.” And, well, we're not actually ever going to cut you off, so you shouldn't have to worry about that stuff.
But on the other hand, you're not going to get the latest features and bug fixes unless you're running the latest version, so when people email us saying, “Hey, I'm using Tailscale from six months ago, and I have this problem,” the first thing our support team does is say, “Well, can you please try the latest one, and does the problem go away?” Because it's kind of inefficient debugging six-month-old software. So, one way we were trying to minimize that cost is, hey, we could just tell people there's a new version available and then maybe they'll update it themselves. But that resulted in people panicking. Like, oh, no, I need to install the software really, really soon because I can't afford to break my network.
Corey: Right.
Avery: And because our system is based on WireGuard, and this is, you know, I'll probably jinx it by saying this, but we've never had an actual security hole that we've had to issue a Tailscale update to resolve, right? People see the update-available thing and think, “Oh, no, I bet there's a whole bunch of vulnerabilities that they fixed.” It's like, “Well, no.” WireGuard has also never had a vulnerability, right? [laugh] It's… yeah, it's, you know, sooner or later there probably will be one, and when there is one, we'll probably have to make the update notification red or something instead of just the little icon on the admin panel.
But yeah, it's—
Corey: [laugh].
Avery: —we try [crosstalk 00:35:23]—
Corey: Nice job on jinxing it, by the way. I appreciate that.
Avery: Yeah, I know. I mean, I try my best. [laugh]. But I've actually been surprised. It's very much like my experience with all the djb stuff we used in the past.
Like, when we were using qmail and djbdns for years, there was never once a security hole, right? It's very interesting that it is possible to design software that never once has a security hole. And nobody does that, right? I mean, I would say I'm not as smart as djb; our software is probably, you know, not going to be as one hundred percent perfect as that, but we try really, really hard to aim for that as a goal.
Corey: Yeah. I really want to thank you for taking the time to speak with me about everything Tailscale is up to. And again, congratulations on your Series B. If people want to learn more, where should they go?
Avery: I guess tailscale.com is the place. We also have @tailscale on Twitter. My own personal Twitter is @apenwarr, which you probably won't be able to spell unless you Google for me or something—
Corey: But it's in the [show notes 00:36:19], which makes this even easier.
Avery: It is? Ah, there you go. So yeah, there's lots of information. But the number one thing I tell people is, like, look, it is a lot easier to get started than you think it is. Even after you've heard it 100 times, nobody ever believes how easy it is to get started. Just go to the App Store, download the app, log into your account, and you're already done, right? Try that, and you don't even have to read anything.
Corey: I would tear you apart for that statement if it weren't—if it were slightly less true than it is, but it is transformative. Give it a try. It's a strong endorsement from me. Thank you so much for your time. I appreciate it.
Avery: Thank you, too. Great talking to you, and talk next time.
Corey: Indeed. Avery Pennarun, CEO of Tailscale.
I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this show, please leave a five-star review on your podcast platform of choice, and smash the like and subscribe buttons, whereas if you've hated it, same thing—five-star review, smash the buttons—and also leave an angry, bitter comment about how you are smart enough to roll your own crypto, so you don't understand why other people wouldn't do it.
Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.
Announcer: This has been a HumblePod production. Stay humble.