Podcasts about EC2

  • 140 PODCASTS
  • 389 EPISODES
  • 39m AVG DURATION
  • 5 WEEKLY NEW EPISODES
  • Jun 30, 2022 LATEST



Best podcasts about EC2

Latest podcast episodes about EC2

Screaming in the Cloud
Granted, Common Fate, and AWS Functionality with Chris Norman

Play Episode Listen Later Jun 30, 2022 33:34


About Chris

Chris is a robotics engineer turned cloud security practitioner. From building origami robots for NASA, to neuroscience wearables, to enterprise software consulting, he is a passionate builder at heart. Chris is a cofounder of Common Fate, a company with a mission to make cloud access simple and secure.

Links:
Common Fate: https://commonfate.io/
Granted: https://granted.dev
Twitter: https://twitter.com/chr_norm

Transcript

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.

Corey: Let's face it, on-call firefighting at 2am is stressful! So there's good news and there's bad news. The bad news is that you probably can't prevent incidents from happening, but the good news is that incident.io makes incidents less stressful and a lot more valuable. incident.io is a Slack-native incident management platform that allows you to automate incident processes, focus on fixing the issues and learn from incident insights to improve site reliability and fix your vulnerabilities. Try incident.io, recover faster and sleep more.

Corey: This episode is sponsored in part by Honeycomb. When production is running slow, it's hard to know where problems originate. Is it your application code, users, or the underlying systems? I've got five bucks on DNS, personally. Why scroll through endless dashboards while dealing with alert floods, going from tool to tool to tool that you employ, guessing at which puzzle pieces matter? Context switching and tool sprawl are slowly killing both your team and your business. You should care more about one of those than the other; which one is up to you.
Drop the separate pillars and enter a world of getting one unified understanding of the one thing driving your business: production. With Honeycomb, you guess less and know more. Try it for free at honeycomb.io/screaminginthecloud. Observability: it's more than just hipster monitoring.

Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. It doesn't matter where you are on your journey in cloud—you could never have heard of Amazon the bookstore—and you encounter AWS and you spin up an account. And within 20 minutes, you will come to the realization that everyone in this space does. “Wow, logging in to AWS absolutely blows goats.”

Today, my guest obviously had that reaction, but unlike most people I talked to, decided to get up and do something about it. Chris Norman is the co-founder of Common Fate and most notably to how I know him is one of the original authors of the tool, Granted. Chris, thank you so much for joining me.

Chris: Hey, Corey, thank you for having me.

Corey: I have done podcasts before; I have done a blog post on it; I evangelize it on Twitter constantly, and even now, it is challenging in a few ways to explain holistically what Granted is. Rather than trying to tell your story for you, when someone says, “Oh, Granted, that seems interesting and impossible to Google for in isolation, so therefore, we know it's going to be good because all the open-source projects with hard to find names are,” what is Granted and what does it do?

Chris: Granted is a command-line tool which makes it really easy for you to get access and assume roles when you're working with AWS. For me, when I'm using Granted day-to-day, I wake up, go to my computer—I'm working from home right now—crack open the MacBook and I log in and do some development work. I'm going to go and start working in the cloud.

Corey: Oh, when I start first thing in the morning doing development work and logging into the cloud, I know.
All right, I'm going to log in to AWS and now I know that my day is going downhill from here.

Chris: [laugh]. Exactly, exactly. I think maybe the best days are when you don't need to log in at all. But when you do, I go and I open my terminal and I run this command. Using Granted, I run this assume command and it authenticates me with single-sign-on into AWS, and then it opens up a console window in a particular account.

Now, you might ask, “Well, that's a fairly standard thing.” And in fact, that's probably the way that the console and all of the tools work by default with AWS. Why do you need a third-party tool for this?

Corey: Right. I've used a bunch of things that do varying forms of this and unlike Granted, you don't see me gushing about them. I want to be very clear, we have no business relationship. You're not sponsoring anything that I do. I'm not entirely clear on what your day job entails, but I have absolutely fallen in love with the Granted tool, which is why I'm dragging you on to this show, kicking and screaming, mostly to give me an excuse to rave about it some more.

Chris: [laugh]. Exactly. And thank you for the kind words. And I'd say really what makes it special or why I've been so excited to be working on it is that it makes this access, particularly when you're working with multiple accounts, really, really easy. So, when I run assume and I open up that console window, you know, that's all fine and that's very similar to how a lot of the other tools and projects that are out there work, but when I want to open that second account and that second console window, maybe because I'm looking at like a development and a staging account at the same time, then Granted allows me to view both of those simultaneously in my browser.
And we do that using some platform sort of tricks and building into the way that the browser works.

Corey: Honestly, one of the biggest differences in how you describe what Granted is and how I view it is when you describe it as a CLI application because yes, it is that, but one of the distinguishing characteristics is you also have a Firefox extension that winds up leveraging the multi-container functionality extension that Firefox has. So, whenever I wind up running a single command (assume with a '-c' flag, then I give it the name of my AWS profile) it opens the web console so I can ClickOps to my heart's content inside of a tab that is locked to a container, which means I can have one or two or twenty different AWS accounts and/or regions up running simultaneously side-by-side, which is basically impossible any other way that I've ever looked at it.

Chris: Absolutely, yeah. And that's, like, the big differentiating factor right now between Granted and between this sort of default, the native experience, if you're just using the AWS command line by itself. With Granted, you can—with these Firefox containers, all of your cookies, your profile, everything is all localized into that one container. It's actually a privacy feature that's built into Firefox, which keeps everything really separate between your different profiles. And what we're doing with Granted is that we make it really easy to open specific profiles that correspond with different AWS profiles that you're using.

So, you'd have one which could be your development account, one which could be production or staging. And you can jump between these and navigate between them just as separate tabs in your browser, which is a massive improvement over, you know, what I've previously had to use in the past.

Corey: The thing that really just strikes me about this is first, of course, the functionality and the rest, so I saw this—I forget how I even came across it—and immediately I started using it.
On my Mac, it was great. I started using it when I was on the road, and it was less great because you built this thing in Go. It can compile and install on almost anything, but there were some assumptions that you had built into this in its early days that did not necessarily encompass all of the use cases that I use. For example, it hadn't really occurred to you that some lunatic would try and only use an iPad when they're on the road, so they have to be able to run this to get federated login links via SSHing into an EC2 instance running somewhere and not have it open locally.

You seemed almost taken aback when I brought it up. Like, “What lunatic would do that?” Like, “Hi, I'm such a lunatic. Let's talk about this.” And it does that now, and it's awesome. It does seem to me though, and please correct me if I'm wrong on this assumption slash assessment that this is first and foremost aimed at desktop users, specifically people running Mac on the desktop, is that the genesis of it?

Chris: It is indeed. And I think part of the cause behind that is that we originally built a tool for ourselves. And as we were building things and as we were working using the cloud, we were running things—you know, we like to think that we're following best practices when we're using AWS, and so we'd set up multiple accounts, we'd have a special account for development, a separate one for staging, a separate one for production, even internal tools that we would build, we would go and spin up an individual account for those. And then you know, we had lots of accounts, and to go and access those really easily was quite difficult.

So, we definitely, we built it for ourselves first and I think that that's part of, when we released it, it was actually a little bit of a cause for some of the initial problems.
And some of the feedback that we had was that it's great to build tools for yourself, but when you're working in open-source, there's a lot of different diversity with how people are using things.

Corey: We take different approaches. You want to try to align with existing best practices, whereas I am a loudmouth white guy who works in tech. So, what I do definitionally becomes a best practice in the ecosystem. It's easier to just comport with the ones that are already existing that smart people put together rather than just trying to competence your way through it, so you took a better path than I did.

But there's been a lot of evolution to Granted as I've been using it for a while. I did a whole write-up on it and that got a whole bunch of eyes onto the project, which I can now admit was a nefarious plan on my part because popping into your community Slack and yelling at you for features I want was all well and good, but let's try and get some people with eyes on this who are smarter than me—which is not that high of a bar when it comes to SSO, and IAM, and federated login, and the rest—and they can start finding other enhancements that I'll probably benefit from. And sure enough, that's exactly what happened. My sneaky plan has come to fruition. Thanks for being a sucker, I guess. I mean—[laugh] it worked. I'm super thrilled by the product.

Chris: [laugh]. I guess it's a great thing I think that the feedback and particularly something that's always been really exciting is just seeing new issues come through on GitHub because it really shows the kinds of interesting use cases and the kinds of interesting teams and companies that are using Granted to make their lives a little bit easier.

Corey: When I go to the website—which again is impossible to Google—the website for those wondering is granted.dev. It's short, it's concise, I can say it on a podcast and people automatically know how to spell it.
But at the top of the website—which is very well done by the way—it mentions that oh, you can, “Govern access to breakglass roles with Common Fate Cloud,” and it also says in the drop shadow nonsense thing in the upper corner, “Brought to you by Common Fate,” which is apparently the name of your company.

So, the question I'll get to in a second is what does your company do, but first and foremost, is this going to be one of those rug-pull open-source projects where one day it's, “Oh, you want to log into your AWS accounts? Insert quarter to continue.” I'm mostly being a little over the top with that description, but we've all seen things that we love turn into molten garbage. What is the plan around this? Are you about to ruin this for the rest of us once you wind up raising a round or something? What's the deal?

Chris: Yeah, it's a great question, Corey. And I think that to a degree, releasing anything like this that sits in the access workflow and helps you assume roles and helps you day-to-day, you know, we have a responsibility to uphold stability and reliability here and to not change things. And I think part of, like, not changing things includes not [laugh] rug-pulling, as you've alluded to. And I think that for some companies, it ends up that open-source becomes, like, a kind of a lead-generation tool, or you end up with, you know, now finally, let's go and add another login so that you have to log into Common Fate to use Granted. And I think that, to be honest, a tool like this where it's all about improving the speed of access, the incentives for us, like, it doesn't even make sense to try and add another login to try to get people to, like, to say, login to Common Fate because that would make your sign-in process for AWS take even longer than it already does.

Corey: Yeah, you decided that you know, what's the biggest problem?
Oh, you can sleep at night, so let's go ahead and make it even worse, by now I want you to be this custodian of all my credentials to log into all of my accounts. And now you're going to be critical path, so if you're down, I'm not able to log into anything. And oh, by the way, I have to trust you with full access to my bank stuff. I just can't imagine that is a direction that you would be super excited about diving head-first into.

Chris: No, no. Yeah, certainly not. And I think that the, you know, building anything in this space, and with what we're doing with Common Fate, you know, we're building a cloud platform to try to make IAM a little bit easier to work with, but it's really sensitive around granting any kind of permission and I think that you really do need that trust. So, trying to build trust, I guess, with our open-source projects is really important for us with Granted and with this project, that it's going to continue to be reliable and continue to work as it currently does.

Corey: The way I see it, one of the dangers of doing anything that is particularly open-source—or that leans in the direction of building in Amazon's ecosystem—it leads to the natural question of, well, isn't this just going to be some people say stolen—and I don't think those people understand how open-source works—by AWS themselves? Or aren't they going to build something themselves at AWS that's going to wind up stomping this thing that you've built? And my honest and remarkably cynical answer is that, “You have built a tool that is a joy to use, that makes logging into AWS accounts streamlined and efficient in a variety of different patterns. Does that really sound like something AWS would do?” And followed by, “I wish they would because everyone would benefit from that rising tide.”

I have to be very direct and very clear. Your product should not exist. This should be something the provider themselves handles. But nope. Instead, it has to exist.
And while I'm glad it does, I also can't shake the feeling that I am incredibly annoyed by the fact that it has to.

Chris: Yeah. Certainly, certainly. And it's something that I think about a little bit. I like to wonder whether there's maybe like a single feature flag or some single sort of configuration setting in AWS where they're not allowing different tabs to access different accounts, they're not allowing this kind of concurrent access. And maybe if we make enough noise about Granted, maybe one of the engineers will go and flick that switch and they'll just enable it by default.

And then Granted itself will be a lot less relevant, but for everybody who's using AWS, that'll be a massive win because the big draw of using Granted is mainly just around being able to access different accounts at the same time. If AWS let you do that out of the box, hey, that would be great and, you know, I'd have a lot less stuff to maintain.

Corey: Originally, I had you here to talk about Granted, but I took a glance at what you're actually building over at Common Fate and I'm about to basically hijack slash derail what probably is going to amount to the rest of this conversation because you have a quick example on your site for by developers, for developers. You show a quick Python script that tries to access an S3 bucket object and it's denied. You copy the error message, you paste it into what you're building over at Common Fate, and in return, it's like, “Oh. Yeah, this is the policy that fixes it. Do you want us to apply it for you?”

And I just about fell out of my chair because I have been asking for this explicit thing for a very long time. And AWS doesn't do it. Their IAM Access Analyzer claims to. Like, “Oh, just go look at CloudTrail and see what permissions it uses and we'll build a policy to scope it down.” “Okay. So, it's S3 access. Fair enough. To what object or what bucket?” “Guess,” is what it tells you there.

And it's, this is crap.
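
[The error-message-to-policy workflow Corey describes can be sketched roughly as follows. This is a minimal illustration, not Common Fate's implementation: the regular expression assumes one common wording of an AWS access-denied message (real messages vary by service), and the function name, example ARN, bucket, and object key are all hypothetical.]

```python
import json
import re

# Assumed wording of an access-denied message. Real AWS messages vary by
# service and may carry extra trailing text, so treat this as illustrative.
DENIED = re.compile(
    r"is not authorized to perform: (?P<action>[\w-]+:[\w*]+) "
    r"on resource: (?P<resource>arn:[^\s]+)"
)


def policy_from_error(message: str) -> dict:
    """Build a one-statement IAM policy allowing exactly the denied call.

    Hypothetical helper: it scopes the policy to the single action and
    resource named in the error instead of granting broad access.
    """
    match = DENIED.search(message)
    if match is None:
        raise ValueError("no action/resource pair found in message")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [match.group("action")],
                "Resource": [match.group("resource")],
            }
        ],
    }


# Made-up error text in the assumed format (account ID, role, and bucket
# are placeholders).
EXAMPLE_ERROR = (
    "An error occurred (AccessDenied) when calling the GetObject operation: "
    "User: arn:aws:sts::123456789012:assumed-role/dev/chris is not authorized "
    "to perform: s3:GetObject on resource: arn:aws:s3:::example-bucket/report.csv"
)

if __name__ == "__main__":
    print(json.dumps(policy_from_error(EXAMPLE_ERROR), indent=2))
```

[The point of the sketch is the feedback loop: the denied action and the exact resource are already in the error, so the policy can name the specific object rather than leaving anyone to guess.]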
Who thinks this is a good user experience? You have built the thing that I wish AWS had built in natively. Because let's be honest here, I do what an awful lot of people do and overscope permissions massively just because messing around with the bare minimum set of permissions in many cases takes more time than building the damn thing in the first place.

Chris: Oh, absolutely. Absolutely. And in fact, this—was a few years ago when I was consulting—I had a really similar sort of story where one of the clients that we were working with, the CTO of this company, he was needing to grant us access to AWS and we were needing to build a particular service. And he said, “Okay, can you just let me know the permissions that you will need and I'll go and deploy the role for this.” And I came back and I said, “Wait. I don't even know the permissions that I'm going to need because the damn thing isn't even built yet.”

So, we went sort of back and forth around this. And the compromise ended up just being you know, way too much access. And that was sort of part of the inspiration for, you know, really this whole project and what we're building with Common Fate, just trying to make that feedback loop around getting to the right level of permissions a lot faster.

Corey: Yeah, I am just so overwhelmingly impressed by the fact that you have built—and please don't take this as a criticism—but a set of very simple tools. Not simple in the terms of, “Oh, that's, like, three lines of bash, and a fool could write that on a weekend.” No. Simple in the sense of it solves a problem elegantly and well and it's straightforward—well, straightforward as anything in the world of access control goes—to wrap your head around exactly what it does. You don't tend to build these things by sitting around a table brainstorming with someone you met at a co-founder dating pool or something and wind up figuring out, “Oh, we should go and solve that.
That sounds like a billion-dollar problem.”

This feels very much like the outcome of when you're sitting around talking to someone and let's start by drinking six beers so we become extraordinarily honest, followed immediately by let's talk about what sucks. What pisses you off the most? It feels like this is sort of the low-hanging fruit of things that upset people when it comes to AWS. I mean, if things had gone slightly differently, instead of focusing on AWS bills, IAM was next on my list of things to tackle just because I was tired of smacking my head into it.

This is very clearly a problem space that you folks have analyzed deeply, worked within, and have put a lot of thought into. I want to be clear, I've thrown a lot of feature suggestions at you for Granted from start to finish. But all of them have been around interface stuff and usability and expanding use cases. None of them have been, “Well, that seems screamingly insecure.” Because it hasn't been.

Chris: [laugh].

Corey: It has been effective, start to finish, I think that from a security posture, you make terrific choices, in many cases better than ones I would have made starting from scratch myself. Everything that I'm looking at in what you have built is from a position of this is absolutely amazing and it is transformative to my own workflows. Now, how can we improve it?

Chris: Mmm. Thank you, Corey. And I'll say as well, maybe around the security angle, that one of the goals with Granted was to try and do things a little bit better than the default way that AWS does them when it comes to security. And it's actually been a bit of a source for challenges with some of the users that we've been working with, with Granted, because one of the things we wanted to do was encrypt the SSO token. And this is the token that when you sign in to AWS, kind of like, it allows you to then get access to all of the rest of the accounts.

So, it's like a pretty—it's a short-lived token, but it's a really sensitive one.
And you know, by default, it's just stored in plain text on your disk. So, we dump to a file and, you know, anything that can go and read that, they can go and get it. It's also a little bit hard to revoke and to lock people out. There's not really great workflows around that on AWS's side.

So, we thought, “Okay, great. One of the goals for Granted can be that we will go and store this in your keychain in your system and we'll work natively with that.” And that's actually been a cause for a little bit of a hassle for some users, though, because by doing that and by storing all of this information in the keychain, it's actually broken some of the integrations with the rest of the tooling, which kind of expects tokens and things to be in certain places. So, we've actually had to, as part of dealing with that with Granted, we've had to give users the ability to opt out of that.

Corey: DoorDash had a problem. As their cloud-native environment scaled and developers delivered new features, their monitoring system kept breaking down. In an organization where data is used to make better decisions about technology and about the business, losing observability means the entire company loses their competitive edge. With Chronosphere, DoorDash is no longer losing visibility into their applications suite. The key? Chronosphere is an open-source compatible, scalable, and reliable observability solution that gives the observability lead at DoorDash business, confidence, and peace of mind. Read the full success story at snark.cloud/chronosphere. That's snark.cloud slash C-H-R-O-N-O-S-P-H-E-R-E.

Corey: That's why I find this so, I think, just across the board, fantastic. It's you are very clearly engaged with your community. There's a community Slack that you have set up for this. And I know, I know, too many Slacks; everyone has this problem.
This is one of those that is worth hanging in, at least from my perspective, just because one of the problems that you have, I suspect, is on my Mac it's great because I wind up automatically updating it to whatever the most recent one is every time I do a brew upgrade.

But on the Linux side of the world, you've discovered what many of us have discovered, and that is that packaging things for Linux is a freaking disaster. The current installation is, “Great. Here's basically a curl bash.” Or, “Here, grab this tarball and install it.” And that's fine, but there's no real way of keeping that updated and synced.

So, I was checking the other day, oh wow, I'm something like eight versions behind on this box. But it still just works. I upgraded. Oh, wow. There's new functionality here. This is stuff that's actually really handy. I like this quite a bit. Let's see what else we can do.

I'm just so impressed, start to finish, by just how receptive you've been to various community feedback. And as well—I want to be very clear on this point, too—I've had folks who actually know what they're doing in an InfoSec sense look at what you're up to, and none of them had any issues of note. I'm sure that they have a pile of things like, with that curl bash, they should really be doing a GPG check. Yes, yes, fine. Whatever. If that's your target threat model, okay, great. Here in reality-land for what I do, this is awesome.

And they don't seem to have any problems with, “Oh, yeah. By the way, sending analytics back up”—which, okay, fine, whatever. “And it's not disclosing them.” Okay, that's bad. “And it's including the contents of your AWS credentials.”

Ahhhh. I did encounter something that was doing that on the back-end once. [cough]—Serverless Framework—sorry, something caught in my throat for a second.

Chris: [laugh].

Corey: No faster way I can think of to erode trust in that. But everything you're doing just makes sense.

Chris: Oh, I do remember that.
And that was a little bit of a fiasco, really, around all of that, right? And it's great to hear actually around that InfoSec folks and security people being, you know, not unhappy, I guess, with a tool like this. It's been interesting for me personally. We've really come from a practitioner's background.

You know, I wouldn't call myself a security engineer at all. I would call myself a sometimes software developer, I guess. I have been hacking my way around Go and definitely learning a lot about how the cloud has worked over the past seven, eight years or so, but I wouldn't call myself a security engineer, so being very cautious around how all of these things work. And we've really tried to defer to things like the system keychain and defer to things that we know are pretty safe and work.

Corey: The thing that I also want to call out as well is that your licensing is under the MIT license. This is not one of those, “Oh, you're required to wind up doing a bunch of branding stuff around it.” And, like some people say, “Oh, you have to own the trademark for all of these things.” I mean, I'm not an expert in international trademark law, let's be very clear, but I also feel that trademarking a term that is already used heavily in the space such as the word ‘Granted,' feels like kind of an uphill battle. And let's further be clear that it doesn't matter what you call this thing.

In fact, I will call attention to an oddity that I've encountered a fair bit. After installing it, the first thing you do is you run the command ‘granted.' That sets it up, it lets you configure your browser, what browser you want to use, and it now supports standard out for that headless, EC2 use case. Great. Awesome. Love it. But then the other binary that ships with it is Assume. And that's what I use day-to-day.
It actually takes me a minute sometimes when it's been long enough to remember that the tool is called Granted and not Assume. What's up with that?

Chris: So, part of the challenge that we ran into when we were building the Granted project is that we needed to export some environment variables. And these are really important when you're logging into AWS because you have your access key, your secret key, your session token. All of those, when you run the assume command, need to go into the terminal session that you called it. This doesn't matter so much when you're using the console mode, which is what we mentioned earlier where you can open 100 different accounts if you want to view all of those at the same time in your browser. But if you want to use it in your terminal, we wanted to make it look as smooth and seamless as possible here.

And we were really inspired by this approach from—and I have to shout them out and kind of give credit to them—a tool called AWSume—they're spelled A-W-S-U-M-E—a Python-based tool that they don't do as much with single-sign-on, but we thought they had a really nice, like, general approach to the way that they did the scripting and aliasing. And we were inspired by that and part of that means that we needed to have a shell script that called this executable, which then will export things back out into the shell script. And we're doing all this wizardry under the hood to make the user experience really smooth and seamless. Part of that meant that we separated the commands into granted and assume and the other part of the naming for everything is that I felt Granted had a far better ring to it than calling the whole project Assume.

Corey: True. And when you say assume, is it AWS or not? I've used the AWSume project before; I've used AWS Vault out of 99 Designs for a while. I've used—for three minutes—the native AWS SSO config, and that is just trash.
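
[The wrapper pattern Chris describes above exists because a child process cannot change the environment of the shell that launched it. One common workaround is for the executable to print shell statements that a thin wrapper then evaluates. The sketch below illustrates that general technique only; it is not Granted's or AWSume's actual script. The function name and the wrapper shown in the comment are hypothetical, and the credential values are placeholders. The three variable names are the standard ones read by the AWS CLI and SDKs.]

```python
import shlex


def render_exports(credentials: dict[str, str]) -> str:
    """Render credentials as `export NAME=value` lines for a shell to eval.

    Hypothetical helper. A shell wrapper (also hypothetical) would consume
    the output along the lines of:

        assume() { eval "$(some-credential-helper "$@")"; }

    shlex.quote keeps values with spaces or shell metacharacters from being
    misread when the wrapper evaluates the lines.
    """
    return "\n".join(
        f"export {name}={shlex.quote(value)}"
        for name, value in sorted(credentials.items())
    )


if __name__ == "__main__":
    # Placeholder values only; never print real credentials to a shared log.
    print(
        render_exports(
            {
                "AWS_ACCESS_KEY_ID": "EXAMPLE-KEY-ID",
                "AWS_SECRET_ACCESS_KEY": "example secret value",
                "AWS_SESSION_TOKEN": "example session token",
            }
        )
    )
```

[Splitting the work this way is why such tools tend to ship two entry points: the compiled binary does the authentication, and the shell-level command is what actually places the variables in the calling terminal session.]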
Again, they're so good at the plumbing, so bad at the porcelain, I think is the criticism that I would levy toward a lot of this stuff.

Chris: Mmm.

Corey: And it's odd to think there's an entire company built around just smoothing over these sharp, obnoxious edges, but I'm saying this as someone who runs a consultancy, and has for five years, that just fixes the bill for this one company. So, there's definitely a series of cottage industries that spring up around these things. I would be thrilled, on some level, if you wound up being completely subsumed by their product advancements, but it's been 15 years for a lot of this stuff and we're still waiting. My big failure mode that I'm worried about is that you never are.

Chris: Yeah, exactly, exactly. And it's really interesting when you think about all of these user experience gaps in AWS being opportunities for, I guess, for companies like us, I think, trying to simplify a lot of the complexity for things. I'm interested in sort of waiting for a startup to try and, like, rebuild the actual AWS console itself to make it a little bit faster and easier to use.

Corey: It's been done and attempted a bunch of different times. The problem is that the console is a lot of different things to a lot of different people, and as you step through that, you can solve for your use case super easily. “Yeah, what do I care? I use RDS, I use some VPC nonsense, and I use EC2. The end.” “Great. What about IAM?”

Because I promise you're using that whether you know it or not. And okay, well, I'm talking to someone else who's DynamoDB, and someone else is full-on serverless, and someone else has more money than sense, so they mostly use SageMaker, and so on and so forth. And it turns out that you're effectively trying to rebuild everything. I don't know if that necessarily works.

Chris: Yeah, and I think that's a good point around maybe why we haven't seen anything around that sort of space so far.
You go to the console, and you click down, you see that list of 200 different services and all of those have had teams go and actually, like, build the UI and work with those individual APIs. Yeah.

Corey: Any ideas as far as what's next for features on Granted?

Chris: I think that, for us, it's continuing to work with everybody who's using it, and with a focus of stability and performance. We actually had somebody in the community raise an issue because they have an AWS config file that's over 7000 lines long. And I kind of pity that person, potentially, for their day-to-day. They must deal with so much complexity. Granted is currently quite slow when the config files get very big. And for us, I think, you know, we built it for ourselves; we don't have that many accounts just yet, so working to try to, like, make it really performant and really reliable is something that's really important.

Corey: If you don't mind a feature request while we're at it—and I understand that this is more challenging than it looks like—I'm willing to fund this as a feature bounty if that makes sense. And this also feels like it might be a good first project for a very particular type of person. I would love to get tab completion working in Zsh. You have it—

Chris: Oh.

Corey: For Fish because there's a great library that automatically populates that out, but for the Zsh side of it, it's, “Oh, I should just wind up getting Zsh completion working,” and I fell down a rabbit hole, let me tell you. And I come away from this with the perception of yeah, I'm not going to do it. I am not smart enough to check those boxes. But a lot of people are so that is the next thing I would love to see. Because I will change my browser to log into the AWS console for you, but be damned if I'm changing my shell.

Chris: [laugh]. I think autocomplete probably should be higher on our roadmap for the tool, to be honest because it's really, like, a key metric and what we're focusing on is how easy is it to log in.
And you know, if you're not too sure what commands to use or if we can save you a few keystrokes, I think that would be the, kind of like, reaching our goals.
Corey: From where I'm sitting, you definitely have. I really want to thank you for taking the time to not only build this in the first place, but also speak with me about it. If people want to learn more, where's the best place to find you?
Chris: So, you can find me on Twitter, I'm @chr_norm, or you can go and visit granted.dev and you'll have a link to join the Slack community. And I'm very active on the Slack.
Corey: You certainly are, although I will admit that I fall into the challenge of being in just the perfectly opposed time zone from you and your co-founder, who are in different time zones to my understanding; one of you is in Australia and one of you is in London; you're the London guy as best I'm aware. And as a result, invariably, I wind up putting in feature requests right when no one's around. And, for better or worse, in the middle of the night is not when I'm usually awake trying to log into AWS. That is Azure time.
Chris: [laugh]. Yeah, no, we don't have the US time zone properly covered yet for our community support and help. But we do have a fair bit of the world's time zones covered. The rest of the team for Common Fate is all based in Australia and I'm out here over in London.
Corey: Yeah. I just want to thank you again, for just being so accessible and, like, honestly receptive to feedback. I want to be clear, there's a way to give feedback and I do strive to do it constructively. I didn't come crashing into your Slack one day with a, “You know what your problem is?” I prefer to take the, “This is awesome. Here's what I think would be even better. Does that make sense?” As opposed to the imperious demands and GitHub issues and whatnot. It's, “I'd love it if it did this thing. Doesn't do this thing. Can you please make it do this thing?” Turns out that's the better way to drive change.
Who knew?
Chris: Yeah. [laugh]. Yeah, definitely. And I think that one of the things that's been the best around our journey with Granted so far has been listening to feedback and hearing from people how they would like to use the tool. And a big thank you to you, Corey, for actually suggesting changes that make it not only better for you, but better for everybody else who's using Granted.
Corey: Well, at least as long as we're using my particular byzantine workload patterns in some way, or shape, or form, I'll hear that. But no, it's been an absolute pleasure and I really want to thank you for your time as well.
Chris: Yeah, thank you for having me.
Corey: Chris Norman, co-founder of Common Fate, as well as one of the two primary developers originally behind the Granted project that logs you into AWS without you having to lose your mind. I'm Cloud Economist Corey Quinn and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice along with an angry, incensed, raging comment that talks about just how terrible all of this is once you spend four hours logging into your AWS account by hand first.
Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.
Announcer: This has been a HumblePod production. Stay humble.

AWS Morning Brief
Enter Your Passwordle

Jun 30, 2022 5:20


Links:
Azure has another security issue around its Synapse offering; this one was discovered by Tenable.
Sysdig has a dive into the real threats to SSH on EC2.
Tailscale has announced the ability to support Tailscale SSH.
Chris Farris has a treatise on The Philosophy of Prevention when it comes to cloud security.
Google Cloud CISO Phil Venables asks whether security analogies are counterproductive.
A security issue of sorts was discovered around sts:GetSessionToken Role Chaining in AWS.
The person responsible for the giant Capital One hack that took advantage of a series of small AWS misconfigurations has been convicted.
Rogue GitHub apps could have hijacked countless repos for a week or two earlier this year.
Wickr for Government achieves FedRAMP Ready designation.
It takes an open source project like trackiam to collate IAM actions, AWS APIs, and managed policies from all over the place.
Passwordle lets you guess commonly used passwords.

Screaming in the Cloud
TikTok and Short Form Content for Developers with Linda Vivah

Jun 28, 2022 34:01


Full Description / Show Notes Corey and Linda talk about Tiktok and the online developer community (1:18) Linda talks about what prompted her to want to work at AWS (5:29) Linda discusses navigating the change from just being part of the developer community to being an employee of AWS (10:37) Linda talks about moving AWS more in the direction of short form content, and Corey and Linda talk about the Tiktok algorithm (15:56) Linda talks about the potential struggle of going from short form to long form content (25:21) About LindaLinda Vivah is a Site Reliability Engineer for a major media organization in NYC, a tech content creator, an AWS community builder member, a part-time wedding singer, and the founder of a STEM jewelry shop called Coding Crystals. At the time of this recording she was about to join AWS in her current position as a Developer Advocate.Linda had an untraditional journey into tech. She was a Philosophy major in college and began her career in journalism. In 2015, she quit her tv job to attend The Flatiron School, a full stack web development immersive program in NYC. She worked as a full-stack developer building web applications for 5 years before shifting into SRE to work on the cloud end internally.Throughout the years, she's created tech content on platforms like TikTok & Instagram and believes that sometimes the best way to learn is to teach.Links Referenced:lindavivah.com: https://lindavivah.com TranscriptAnnouncer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.Corey: This episode is sponsored in part by Honeycomb. When production is running slow, it's hard to know where problems originate. 
Is it your application code, users, or the underlying systems? I've got five bucks on DNS, personally. Why scroll through endless dashboards while dealing with alert floods, going from tool to tool to tool that you employ, guessing at which puzzle pieces matter? Context switching and tool sprawl are slowly killing both your team and your business. You should care more about one of those than the other; which one is up to you. Drop the separate pillars and enter a world of getting one unified understanding of the one thing driving your business: production. With Honeycomb, you guess less and know more. Try it for free at honeycomb.io/screaminginthecloud. Observability: it's more than just hipster monitoring.Corey: Let's face it, on-call firefighting at 2am is stressful! So there's good news and there's bad news. The bad news is that you probably can't prevent incidents from happening, but the good news is that incident.io makes incidents less stressful and a lot more valuable. incident.io is a Slack-native incident management platform that allows you to automate incident processes, focus on fixing the issues and learn from incident insights to improve site reliability and fix your vulnerabilities. Try incident.io, recover faster and sleep more.Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. We talk a lot about how people go about getting into this ridiculous industry of ours, and I've talked a little bit about how I go about finding interesting and varied guests to show up and help me indulge my ongoing love affair on this show with the sound of my own voice. Today, we're going to be able to address both of those because today I'm speaking to Linda Haviv, who, as of this recording, has accepted a job as a Developer Advocate at AWS, but has not started. Linda, welcome to the show.Linda: Thank you so much for having me, Corey. 
Happy to be here.Corey: So, you and I have been talking for a while and there's been a lot of interesting things I learned along the way. You were one of the first people I encountered when I joined the TikToks, as all the kids do these days, and was trying to figure out is there a community of folks who use AWS. Which really boils down to, “So, where are these people that are sad all the time?” Well, it turns out, they're on TikTok, so there we go. We found my people.And that was great. And we started talking, and it turns out that we were both in the AWS community builder program. And we've developed a bit of a rapport. We talk about different things. And then, I guess, weird stuff started happening, in the context of you were—you're doing very well at building an audience for yourself on TikTok.I tried it, and it was—my sense of humor sometimes works, sometimes doesn't. I've had challenges in finding any reasonable way to monetize it because a 30-second video doesn't really give nuance for a full ad read, for example. And you've been looking at it from the perspective of a content creator looking to build the audience slash platform is step one, and then, eh, step two, you'll sort of figure out aspects of monetization later. Which, honestly, is a way easier way to do it in hindsight, but, yeah, the things that we learn. Now, that you're going to AWS, first, you planning to still be on the TikToks and whatnot?Linda: Absolutely. So, I really look at TikTok as a funnel. I don't think it's the main place, you're going to get that deep-dive content but I think it's a great way, especially for things that excite you or get you into understanding it, especially beginner-type audience, I think there's a lot of untapped market of people looking to into tech, or technologists that aren't in the cloud. 
I mean, even when I worked—I worked as a web developer and then kind of learned more about the cloud, and I started out as a front-end developer and shifted into, like, SRE and infrastructure, so even for people within tech, you can have a huge tech community which there is on TikTok, with a younger community—but not all of them really understand the cloud necessarily, depending on their job function. So, I think it's a great way to kind of expose people to that.For me, my exposure came from community. I met somebody at a meetup who was working in cloud, and it wasn't even on the job that I really started getting into cloud because many times in corporations, you might be working on a specific team and you're not really encountering other ends, and it seems kind of like a mystery. Although it shouldn't seem like magic, many times when you're doing certain job functions—especially the DevOps—could end up feeling like magic. So, [laugh] for the good and the bad. So sometimes, if you're not working on that end, you really sometimes take it for granted.And so, for me, I actually—meetups were the way I got exposed to that end. And then I brought it back into my work and shifted internally and did certifications and started, even, lunch-and-learns where I work to get more people in their learning journey together within the company, and you know, help us as we're migrating to the cloud, as we're building on the cloud. Which, of course, we have many more roles down the road. I did it for a few years and saw the shift. But I worked at a media company for many years and now shifting to AWS, and so I've seen that happen on different ends.Not—oh, I wasn't the one doing the migration because I was on the other end of that time, but now for the last two years, I was working on [laugh] the infrastructure end, and so it's really fascinating. And many people actually—until now I feel like—that will work on maybe the web and mobile and don't always know as much about the cloud. 
I think it's a great way to funnel things in a quick manner. I think also society is getting used to short videos, and our attention span is very low, and I think for—Corey: No argument here.Linda: —[crosstalk 00:04:39] spending so mu—yeah, and we're spending so much time on these platforms, we might as well, you know, learn something. And I think it depends what content. Some things work well, some things doesn't. As with anything content creation, you kind of have to do trial and error, but I do find the audience to be a bit different on TikTok versus Twitter versus Instagram versus YouTube. Which is interesting how it's going to play out on YouTube, too, which is a whole ‘nother topic conversation.Corey: Well, it's odd to me watching your path. It's almost the exact opposite of mine where I started off on the back-end, grumpy sysadmin world and, “Oh, why would I ever need to learn JavaScript?” “Well, genius, because as the world progresses, guess what? That's right. The entire world becomes JavaScript. Welcome.”And it took me a long time to come around to that. You started with the front-end world and then basically approached from the exact opposite end. Let's be clear, back in my day, mine was the common path. These days, yours is very much the common path.Linda: Yeah.Corey: I also want to highlight that all of those transitions and careers that you spoke about, you were at the same company for nine years, which in tech is closer to 30. So, I have to ask, what was it that inspired you, after nine years, to decide, “I'm going to go work somewhere else. But not just anywhere; I'm going to AWS.” Because normally people don't almost institutionalized lifers past a certain point.Linda: [laugh].Corey: Like, “Oh, you'll be there till you retire or die.” Whereas seeing significant career change after that long in one place, even if you've moved around internally and experienced a lot of different roles, is not common at all what sparked that?Linda: Yeah. 
Yeah, no, it's such a good question. I always think about that, too, especially as I was reflecting because I'm, you know, in the midst of this transition, and I've gotten a lot of reflecting over the last two weeks [laugh], or more. But I think the main thing for me is, I always, wherever I was—and this kind of something that—I'm very proactive when it comes to trying to transition. I think, even when I was—right, I held many roles in the same company; I used to work in TV production and actually left for three months to go to a coding boot camp and then came back on the other end, but I understood the product in a different way.So, for that time period, it was really interesting to work on the other end. But, you know, as I kind of—every time I wanted to progress further, I always made a move that was actually new and put me in an uncomfortable place, even within the same company. And I'm at the point now that I'm in my career, I felt like this next step really needs to be, you know, at AWS. It's not, like, the natural progression for me. I worked alongside—on the client end—with AWS and have seen so many projects come through and how much our own workloads have changed.And it's just been an incredible journey, also dealing with accounts team. On that end, I've worked alongside them, so for me, it was kind of a natural progression. I was very passionate about cloud computing at AWS and I kind of wanted to take it to that next place, and I felt like—also, dealing with the community as part of my job is a dream part to me because I was always doing that on the side on social media. So, it wasn't part of my day-to-day job. I was working as an SRE and an infrastructure engineer, so I didn't get to do that as part of my day-to-day.I was making videos at 2 a.m. and, you know, kind of trying to, like, do—you know, interact with the community like that. And I think—I come from a performing background, the people background, I was singing since I was four years old. 
I always go to—I was a wedding singer, so I go into a room and I love making people happy or giving value. And I think, like, education has a huge part of that. And in a way, like making that content and—Corey: You got to get people's attention—Linda: Yeah.Corey: —you can't teach them a damn thing.Linda: Right. Exactly. So, it's kind of a mix of everything. It's like that performance, the love of learning. You know, between you and I, like, I wanted to be a lawyer before I thought I was going to—before I went to tech.I thought I was going to be a lawyer purely because I loved the concept of going to law school. I never took time to think about the law part, like, being the lawyer part. I always thought, “Oh, school.” I'm a student at heart. I always call myself a professional student. I really think that's part of what you need to be in this world, in this tech industry, and I think for me, that's what keeps my fire going.I love to experiment, to learn, to build. And there's something very fulfilling about building products. If you take a step back, like, you're kind of—you know, for me that part, every time I look back at that, that always is what kind of keeps me going. When I was doing front-end, it felt a lot more like I was doing smaller things than when I was doing infrastructure, so I felt like that was another reason why I shifted. I love doing the front-end, but I felt like I was spending two days on an Internet Explorer bug and it just drove me—[laugh] it just made it feel unfulfilling versus spending two days on, you know, trying to understand why, you know, something doesn't run the infrastructure or, like, there's—you know, it's failing blindly, you know? Stuff like that. Like, I don't know, for me that felt more fulfilling because the problem was more macro. But I think I needed both. I have a love for both, but I definitely prefer being back-end. So. [laugh]. 
Well, I'm saying that now but—[laugh].Corey: This might be a weakness on my part where I'm basically projecting onto others, and this is—I might be completely wrong on this, but I tend to take a bit of a bifurcated view of community. I mean, community is part of the reason that I know the things I know and how I got to this place that I am, so use that as a cautionary tale if you want. But when I talk to someone like you at this moment, where you're in the community, I'm in the community, and I'm talking to you about a problem I'm having and we're working on ways to potentially solve that or how to think about that. I view us as basically commiserating on these things, whereas as soon as you start on day one—and yes, it's always day one—at AWS and this becomes your day job and you work there, on some level, for me, there's a bit shift that happens and a switch gets flipped in my head where, oh, you actually work at this company. That means you're the problem.And I'm not saying that in a way of being antagonistic. Please, if you're watching or listening to this, do not antagonize the developer advocates. They have a very hard job understanding all this so they can explain that to the rest of us. But how do you wind up planning to navigate, or I guess your views on, I guess, handling the shift between, “One of the customers like the rest of us,” to, as I say, “Part of the problem,” for lack of a better term.Linda: Or, like, work because you kind of get the—you know. I love this question and it's something I've been pondering a lot on because I think the messaging will need to be a little different [coming from me 00:10:44] in the sense of, there needs to be—just in anything, you have to kind of create trust. And to create trust, you have to be vulnerable and authentic. 
And I think I, for example, utilize a lot of things outside of just the AWS cloud topic to do that now, even, when I—you know, kind of building it without saying where I work or anything like that, going into this role and it being my job, it's going to be different kind of challenge as far as the messaging, but I think it still holds true that part, that just developing trust and authenticity, I might have to do more of that, you know? I might have to really share more of that part, share other things to really—because it's more like people come, it doesn't matter how much somet—how many times you explain it, many times, they will see your title and they will judge you for it, and they don't know what happened before. Every TikTok, for example, you have to act like it's a new person watching. There is no series, you know? Like, yes, there's a series but, like, sometimes you can make that but it's not really the way TikTok functions or a short-form video functions. So, you kind of have to think this is my first time—Corey: It works really terribly when you're trying to break it out that way on TikTok.Linda: [laugh]. Yeah.Corey: Right. Here's part 17 of my 80-TikTok-video saga. And it's, “Could you just turn this into a blog post or put this on YouTube or something? I don't have four hours to spend learning how all this stuff works in your world.”Linda: Yeah. And you know, I think repeating certain things, too, is really important. So, they say you have to repeat something eight times for people to see it or [laugh] something like that. I learned that in media [crosstalk 00:12:13]—Corey: In a row, or—yeah. 
[laugh].Linda: I mean, the truth is that when you, kind of like, do a TikTok maybe, like, there's something you could also say or clarify because I think there's going to be—and I'm going to have to—there's going to be a lot of trial and error for me; I don't know if I have answers—but my plan is going into it very much testing that kind of introduction, or, like, clarifying what that role is. Because the truth is, the role is advocating on behalf of the community and really helping that community, so making sure that—you don't have to say it as far as a definition maybe, but, like, making sure that comes across when you create a video. And I think that's going to be really important for me, and more important than the prior even creating content going forward. So, I think that's one thing that I definitely feel like is key.As well as creating more raw interaction. So, it depends on the platform, too. Instagram, for example, is much more community—how do I put this? Instagram is much more easy to navigate as far as reaching the same community because you have something, like, called Instagram Stories, right? So, on Instagram Stories, you're bringing those stories, mostly the same people that follow you. You're able to build that trust through those stories.On TikTok, they just released Stories. I haven't really tried them much and I don't play with it a lot, but I think that's something I will utilize because those are the people that are already follow you, meaning they have seen a piece of content. So, I think addressing it differently and knowing who's watching what and trying to kind of put yourself in their shoes when you're trying to, you know, teach something, it's important for you to have that trust with them. And I think—key to everything—being raw and authentic. I think people see through that. I would hope they do.And I think, uh, [laugh] that's what I'm going to be trying to do. 
I'm just going to be really myself and real, and try to help people and I hope that comes through because that's—I'm passionate about getting more people into the cloud and getting them educated. And I feel like it's something that could also allow you to build anything, just from anywhere on your computer, brings people together, the world is getting smaller, really. And just being able to meet people through that and there's just a way to also change your life. And people really could change their life.I changed my life, I think, going into tech and I'm in the United States and I, you know—I'm in New York, you know, but I feel like so many people in the States and outside of the States, you know, all over the world, you know, have access to this, and it's powerful to be able to build something and contribute and be a part of the future of technology, which AWS is.Corey: I feel like, in three years or whatever it is that you leave AWS in the far future, we're going to basically pull this video up and MST3k came together. It's like, “Remember how naive you were talking about these things?” And I'm mostly kidding, but let's be serious. You are presumably going to be focusing on the idea of short-form content. That is—Linda: Yeah.Corey: What your bread-and-butter of audience-building has been around, and that is something that is new for AWS.Linda: Yeah.Corey: And I'm always curious as to how companies and their cultures continue to evolve. I can only imagine there's a lot of support structure in place for that. I personally remember giving a talk at an AWS event and I had my slides reviewed by their legal team, as they always do, and I had a slide that they were looking at very closely where I was listing out the top five AWS services that are bullshit. 
And they don't really have a framework for that, so instead, they did their typical thing of, “Okay, we need to make sure that each of those services starts with the appropriate AWS or Amazon naming convention and are they capitalized properly?” Because they have a framework for working on those things.I'm really curious as to how the AWS culture and way of bringing messaging to where people are is going to be forced to evolve now that they, like it or not, are going to be having significantly increased presence on TikTok and other short-form platforms.Linda: I mean, it's really going to be interesting to see how this plays out. There's so much content that's put out, but sometimes it's just not reaching the right audience, so making sure that funnel exists to the right people is important and reaching those audiences. So, I think even YouTube Shorts, for example. Many people in tech use YouTube to search a question.They do not care about the intro, sometimes. It depends what kind of following, it depends if [in gaming 00:16:30], but if you're coming and you're building something, it's like a Stack Overflow sometimes. You want to know the answer to your question. Now, YouTube Shorts is a great solution to that because many times people want the shortest possible answer. Now, of course, if it's a tutorial on how to build something, and it warrants ten minutes, that's great.Even ten minutes is considered, now, Shorts because TikTok now has ten-minute videos, but I think TikTok is now searchable in the way YouTube is, and I think let's say YouTube Shorts is short-form, but very different type of short-form than TikTok is. TikTok, hooks matter. YouTube answers to your questions, especially in chat. I wouldn't say everything in YouTube is like that; depends on the niche. But I think even within short-form, there's going to be a different strategy regarding that.So, kind of like having that mix. I guess, depending on platform and audience, that's there. 
Again, trial and error, but we'll see how this plays out and how this will evolve.
Corey: This episode is sponsored in part by our friends at Vultr. Optimized cloud compute plans have landed at Vultr to deliver lightning-fast processing power, courtesy of third-gen AMD EPYC processors without the IO or hardware limitations of a traditional multi-tenant cloud server. Starting at just 28 bucks a month, users can deploy general-purpose, CPU, memory, or storage optimized cloud instances in more than 20 locations across five continents. Without looking, I know that once again, Antarctica has gotten the short end of the stick. Launch your Vultr optimized compute instance in 60 seconds or less on your choice of included operating systems, or bring your own. It's time to ditch convoluted and unpredictable giant tech company billing practices and say goodbye to noisy neighbors and egregious egress forever. Vultr delivers the power of the cloud with none of the bloat. Screaming in the Cloud listeners can try Vultr for free today with $150 in credit when they visit getvultr.com/screaming. That's G-E-T-V-U-L-T-R dot com slash screaming. My thanks to them for sponsoring this ridiculous podcast.
Corey: I feel like there are two possible outcomes here. One is that AWS—
Linda: Yeah.
Corey: Nails this pivot into short-form content, and the other is that all your TikTok videos start becoming ten minutes long, which they now support, welcome to my TED Talk. It's awful, and then you wind up basically being the video equivalent, for all of your content, of recipes when you search them on the internet where first they circle the point to death 18 times with, “Back when I was a small child growing up in the hinterlands, we wound—my grandmother would always make the following stew after she killed the bison with her bare hands. Why did grandma kill a bison? 
We don't know.” And it just leads down this path so they can get, like, long enough content or they can have longer and longer articles to display more ads.And then finally at the end, it's like ingredient one: butter. Ingredient two, there is no ingredient two. Okay. That explains why it's delicious. Awesome. But I don't like having people prolong it. It's just, give me the answer I'm looking for.Linda: Yeah.Corey: Get to the point. Tell me the story. And—Linda: And this is—Corey: —I'm really hoping that is not the direction your content goes in. Which I don't think it would, but that is the horrifying thing and if for some chance I'm right, I will look like Nostradamus when we do that MST3k episode.Linda: No, no. I mean, I really am—I always personally—even when I was creating content these last few years and testing different things, I'm really a fan of the shortest way possible because I don't have the patience to watch long videos. And maybe it's because I'm a New Yorker that can't sit down from the life of me—apart from when I code of course—but, you know, I don't like wasting time, I'm always on the go, I'm with my coffee, I'm like—that's the kind of style I prefer to bring in videos in the sense of, like, people have no time. [laugh]. You know?The amount of content we're consuming is just, uh, bonkers. So, I don't think our mind is really a built for consuming [laugh] this much content every time you open your phone, or every time you look, you know, online. It's definitely something that is challenging in a whole different way. But I think where my content—if it's ten minutes, it better be because I can't shorten it. That's my thing. So, you can hold me accountable to that because—Corey: Yeah, I want ten minutes of—Linda: I'm not a—Corey: Content, not three minutes of content in a ten-minute bag.Linda: Exactly. Exactly. 
So, if it's a ten-minute video, it would have been in one hour that I cut down, like, meaning a tutorial, a very technical type of content. I think things that are that long, especially in tech, would be something like, on that end—unless, of course, you know, I'm not talking about, like, longer videos on YouTube which are panels or that kind of thing. I'm talking more like if I'm doing something on TikTok specifically.
TikTok also cares about your watch time, so if people aren't interested in it, it's not going to do well, it doesn't matter how many followers you have. Which is what I do like about the way TikTok functions as opposed to, let's say, Instagram. Instagram is more like it gives it to your following—and this is the current state, I don't know if it always evolves—but the current state is, Instagram Reels kind of functions in a way where it goes first to the people that follow you, but, like, in a way that's more amplified than TikTok. TikTok tests people that follow you, but if it's not a good video, it won't do well. And honestly, there are many good videos that don't go viral. I'm not talking about that.
Sometimes it's also the topic and the niche and the sound and the title. I mean, there's so many people who take a topic and do it in three different ways and one of them goes viral. I mean, there's so many factors that play into it and it's hard to really, like, always, you know, kind of reverse engineer but I do think that with TikTok, things won't do well, more likely if it's not a good piece of content as opposed to—or, like, too long, right? Not—I shouldn't say not a good piece of content—it's too long.
Corey: The TikTok algorithm is inscrutable to me. TikTok is firmly convinced, based upon what it shows me, that I am apparently a lesbian. Which okay, fine. Awesome. Whatever. I'm also—it keeps showing me ads for ADHD stuff, and it was like, “Wow, like, how did it know that?” Followed by, “Oh, right. I'm on TikTok. 
Nevermind.”

And I will say at one point, it recommended someone to me who, looking at the profile picture, she's my nanny. And it's, I have a strong policy of not, you know, stalking my household employees on social media. We are not Facebook friends, we are not—in a bunch of different areas. Like, how on earth would they have figured this out? I'm filling the corkboard with conspiracy and twine followed by, “Wait a minute. We probably both connect from the same WiFi network, which looks like the same IP address and it probably doesn't require a giant data science team to put two and two together on those things.” So, it was great. I was all set to do the tinfoil hat conspiracy, but no, no, that's just very basic correlation 101.

Linda: And also, this is why I don't enable contacts on TikTok. You know, how it says, “Oh, connect your contacts?”

Corey: Oh, I never do that. Like, “Can we look at your contacts?”

Linda: Never.

Corey: “No.” “Can we look at all of your photos?” “Absolutely not.” “Can we track you across apps?” “Why would anyone say yes to this? You're going to do it anyway, but I'll say no.” Yeah.

Linda: Got to give the least privilege. [laugh]. Definitely not—

Corey: Oh absolutely.

Linda: Yeah. I think they also help [crosstalk 00:22:40]—

Corey: But when I'm looking at—the monetization problem is always a challenge on things like this, too, because when I'm—my guilty TikTok scrolling pleasures hit, it's basically late at night, I just want to see—I want something to wind down and decompress. And I'm not about ready to watch, “Hey, would you like to migrate your enterprise database to this other thing?” It's, I… no. There's a reason that the ads that seem to be everywhere and doing well are aimed at the mass market, they're generally impulse buys, like, “Hey, do you want to set that thing over there on fire, but you're not close enough to get the job done? Buy this flamethrower today.
Done.”

And great, like, that is something everyone can enjoy, but these nuanced database products and anything else that's B2B SaaS style stuff, it feels like it's a very tough sell and no one has quite cracked that nut, yet.

Linda: Yeah, and I think the key there—this is, I'm guessing based on, like, what I want to try out a lot—is the hook and the way you're presenting it has to be very product-focused in the sense that it needs to be very relatable. Even if you don't know anything about tech, you need to be—like, for example, in the architecture page on AWS, there's a video about the Emirates going to Mars mission. Space is a very interesting topic, right? I think, a hook, like, “Do you want to see how, like, how this is bu—” like, it's all, like, freely available to see exactly [laugh] how this was built. Like, it might—in the right wording, of course—it might be interesting to someone who's looking for fun-fact-style content.

Now, is it really addressing the people that are building everyday? Not really always, depends who's on there and the mass market there. But I feel like going on the product and the things that are mass-market, and then working backwards to the tech part of it, even if they learn something and then want to learn more, that's really where I see TikTok. I don't think every platform would be, maybe, like this, but that's where I see getting people: kind of inviting them in to learn more, but making it cool and fun. It's very important, but it feels cool and fun. [laugh]. So.

Because you're right, you're scrolling at 2 a.m. Who wants to start seeing that? Like, it's all about how you teach. The content is there, the content has—you know, that's my thing. It's like, the content is there. You don't need to—it's yes, there's the part where things are always evolving and you need to keep track of that; that's a whole ‘nother type of thing which you do very well, right?

And then there's a part where, like, the content that already exists, which part is evergreen?
Meaning, which part is, like, something that could be re—also is not timely as far as update, for example, the Well-Architected Framework. Yes, it evolves all the time, you always have new pillars, but the guide, the story, that is evergreen in some sense because that guide doesn't, you know, that whole concept isn't going anywhere. So, you know, why should someone care about that?

Corey: Right. How to turn on two-factor authentication for your AWS account.

Linda: Right.

Corey: That's evergreen. That's the sort of thing that—and this is the problem, I think, AWS has had for a long time where they're talking about new features, new enhancements, new releases. But you look at what people are actually doing and so much of it is just the same stuff again and again because yeah, that is how most of the cloud works. It turns out that three-quarters of companies' production infrastructure tends to run on EC2 more frequently than it tends to run on IoT Greengrass. Imagine that.

So, there's this idea of continuing to focus on these things. Now, one of my predictions is that you're going to have a lot of fun with this and on some level, it's going to really work for you. In others, it's going to be hilariously—well, its shortcomings might be predictable. I can just picture now you're at re:Invent; you have a breakout talk and terrific. And you've successfully gotten your talk down to one minute and then you're sitting there with—

Linda: [laugh].

Corey: —the remainder of maybe 59. Like, oh, right. Yeah. Turns out not everything is short-form. Are you predicting any—

Linda: Yep.

Corey: Problems going from short-form to long-form in those instances?

Linda: I think it needs to go hand-in-hand, to be honest. I think when you're creating any short-form content, you have—you know, maybe something short is actually sometimes in some ways, right, harder because you really have to make sure, especially in a technical standpoint, leaving things out is sometimes—leaves, like, a blind spot.
And so, making sure you're kind of—whatever you're educating, you kind of, to be clear, “Here's where you learn more. Here's how I'm going to answer this next question for you: go here.” Now, in a longer-form content, you would cover all that.

So, there's always that longevity. I think even when I write a script, and there's many scripts I'm still [laugh]—I've had many ideas; until now I've been doing this still at 2 a.m., so of course, there's many that didn't, you know, get released, but those are the things that are more time consuming to create because you're taking something that's an hour long, and trying to make sure you're pulling out the things that are most—that are hook-style, that invite people in, that are accurate, okay, that really give you—explain to you clearly where the blind spots that I'm not explaining on this video are. So, “XYZ here is, like, the high level, but by the way, there's, like, this and this.” And in a long-form, you kind of have to know the long-form version of it to make the short-form, in some ways, depending on what you're doing because you're funneling them to somewhere. That's my thing. Because I don't think there should be [crosstalk 00:27:36]—

Corey: This is the curse of Twitter, on some level. It's, “Well, you forgot about this corner case.” “Yeah, I had 280 characters to get into.” Like, the whole point of short-form content—which I do consider Twitter to be—is a glimpse and a hook, and get people interested enough to go somewhere and learn more.

For something like AWS, this makes a lot of sense. When you highlight a capability or something interesting, it's something relevant, whereas on the other side of it, where it's this, “Oh, great.
Now, here's an 8000-word blog post on how I did this thing.” Yeah, I'm going to get relatively fewer amounts of traffic through that giant thing, but the people who are there are going to be frickin' invested because that's going to be a slog.

Linda: Exactly.

Corey: “And now my eight-hour video on how exactly I built this thing with TypeScript.” Badly—

Linda: Exactly.

Corey: —as it turns out because I'm a bad programmer.

Linda: [laugh]. No, you're not. I love your shit-posting. It's great.

Corey: Challenge accepted.

Linda: [laugh]. I love what you just mentioned because I think you're hitting the nail on the head when it comes to the quality content that's niche-focused, like, there needs to be a good healthy mix. I think always doing that, like, mass-market type video, it doesn't give you, also, the credibility you need. So, doing those more niche things that might not be relevant to everybody, but here and there, are part of that is really key for your own knowledge and for, like, the com—you know, as far as, like, helping someone specific. Because it's almost like—right, when you're selling a service and you're using social media, right, not everybody's going to buy your service. It doesn't matter what business you're in, right? The deep-divers are going to be the people that pay up. It's just a numbers game, right? The more people you, kind of, address from there, you'll find—

Corey: It's called a funnel for a reason.

Linda: Right. Exactly.

Corey: Free content, paid content. Almost anyone will follow me on Twitter; fewer will sign up for a newsletter; fewer will listen to a podcast; fewer will watch a video, and almost none of them will buy a consulting engagement. But ‘almost' and ‘actually none of them,' it turns out is a very different world.

Linda: Exactly. [laugh]. So FYI, I think there's—

Corey: And that's fine. That's the way it works.

Linda: That's the way it works.
And I think there needs to be that niche content that might not be, like, the most viral thing, but viral doesn't mean quality, you know? It doesn't. There's many things that play into what viral is, but it's important to have the quality content for the people that need that content, and finding those people, you know, it's easier when you have that kind of mass engagement. Like, who knows? I'm a student. I told you; I'm a professional student. I'm still [laugh] learning every day.

Corey: Working with AWS almost makes it a requirement. I wish you luck—

Linda: Yeah.

Corey: —in the new gig and I also want to thank you for taking time out of your day to speak with me about how you got to this point. And we're all very eager to see where you go from here.

Linda: Thank you so much, Corey, for having me. I'm a huge fan, I love your content, I'm an avid reader of your newsletter and I am looking forward to very much being in touch and on the Twitterverse and beyond. So. [laugh].

Corey: If people want to learn more about what you're up to, and other assorted nonsense, where's the best place they can go to find you?

Linda: So, the best place they could go is lindavivah.com. I have all my different social handles listed on there as well as a little bit about me, and I hope to connect with you. So, definitely go to lindavivah.com.

Corey: And that link will, of course, be in the [show notes 00:30:39]. Thank you so much for taking the time to speak with me. I really appreciate it.

Linda: Thank you, Corey. Have a wonderful rest of the day.

Corey: Linda Haviv, AWS Developer Advocate, very soon now anyway. I'm Cloud Economist Corey Quinn and this is Screaming in the Cloud.
If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice, smash the like and subscribe buttons, and of course, leave an angry comment that you have broken down into 40 serialized TikTok videos.

Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.

Announcer: This has been a HumblePod production. Stay humble.

Yalla To The Cloud
Episode 104: FinOps Optimization Part 1

Jun 28, 2022 12:51


In this segment, we bring you information about day-to-day work in a cloud environment from our point of view. Speakers in this episode: Ariel Munafo (אריאל מונפו), Avi Keinan (אבי קינן), and Eran Lador (ערן לדור). In the previous episode, we talked about Amazon Lightsail: what the service is, who it is for, what it is used for, and how it differs from EC2. Avi also ran a live demo of creating a new server. In this episode, we talk with Eran Lador, head of FinOps at AppsFlyer, about optimization in the FinOps world. We touch on the kinds of optimizations you can make and discuss organizational culture, KPIs, and a variety of other topics from the day-to-day life of FinOps teams. Want to keep up with more content on cloud and advanced technologies? Sign up for our newsletter now and always stay in the loop. To sign up: https://www.israelclouds.com/newslettersignup

AWS Bites
42. How do you containerise and run your API with Fargate?

Jun 23, 2022 18:35


We recently talked about migrating a monolithic application to AWS, using EC2, load balancers, S3 and RDS. In this episode we want to talk about a slightly different setup, where we are going for containers instead of EC2 and we want to deploy them in Fargate. We are going to cover all the components you will need in your architecture, the reasons to choose Fargate over any alternatives and discuss some CDK tricks to get started in a quick way (and the pitfalls that might come with them).

In this episode, we mentioned the following resources:
- CDK ECS Patterns: https://docs.aws.amazon.com/cdk/api/v2/docs/aws-cdk-lib.aws_ecs_patterns-readme.html
- How to fine tune the health checks to speed up the deployment process: https://www.qovery.com/blog/how-to-speed-up-amazon-ecs-container-deployments
- Previous Episode “37. How do you migrate a monolith to AWS without the drama?”: https://awsbites.com/37-how-do-you-migrate-a-monolith-to-aws-without-the-drama/

This episode is also available on YouTube: https://www.youtube.com/AWSBites

You can listen to AWS Bites wherever you get your podcasts:
- Apple Podcasts: https://podcasts.apple.com/us/podcast/aws-bites/id1585489017
- Spotify: https://open.spotify.com/show/3Lh7PzqBFV6yt5WsTAmO5q
- Google: https://podcasts.google.com/feed/aHR0cHM6Ly9hbmNob3IuZm0vcy82YTMzMTJhMC9wb2RjYXN0L3Jzcw==
- Breaker: https://www.breaker.audio/aws-bites
- RSS: https://anchor.fm/s/6a3312a0/podcast/rss

Do you have any AWS questions you would like us to address? Leave a comment here or connect with us on Twitter:
- https://twitter.com/eoins
- https://twitter.com/loige

#aws #docker #fargate

Screaming in the Cloud
Transparency in Cloud Security with Gafnit Amiga

Jun 21, 2022 30:10


Full Description / Show Notes
- Gafnit explains how she found a vulnerability in RDS, an Amazon database service (1:40)
- Gafnit and Corey discuss the concept of not being able to win in cloud security (7:20)
- Gafnit talks about transparency around security breaches (11:02)
- Corey and Gafnit discuss effectively communicating with customers about security (13:00)
- Gafnit answers the question “Did you come at the RDS vulnerability exploration from a perspective of being deeper on the Postgres side or deeper on the AWS side?” (18:10)
- Corey and Gafnit talk about the risk of taking a pre-existing open source solution and offering it as a managed service (19:07)
- Security measures in cloud-native approaches versus cloud-hosted (22:41)
- Gafnit and Corey discuss the security community (25:04)

About Gafnit
Gafnit Amiga is the Director of Security Research at Lightspin. Gafnit has 7 years of experience in Application Security and Cloud Security Research. Gafnit leads the Security Research Group at Lightspin, focused on developing new methods to conduct research for new cloud native services and Kubernetes. Previously, Gafnit was a lead product security engineer at Salesforce focused on their core platform and a security researcher at GE Digital. Gafnit holds a B.Sc. in Computer Science from IDC Herzliya and is a student for an M.Sc. in Data Science.

Links Referenced:
- Lightspin: https://www.lightspin.io/
- Twitter: https://twitter.com/gafnitav
- LinkedIn: https://www.linkedin.com/in/gafnit-amiga-b1357b125/

Transcript
Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.

Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn.
We've taken a bit of a security bent to the conversations that we've been having on this show over the past year or so and, well, today's episode is no different. In fact, we're going a little bit deeper than we normally tend to. My guest today is Gafnit Amiga, who's the Director of Security Research at Lightspin. Gafnit, thank you for joining me.

Gafnit: Hey, Corey. Thank you for inviting me to the show.

Corey: You sort of burst onto the scene—and by ‘scene,' I of course mean the cloud space, at least to the level of community awareness—back, I want to say in April of 2022 when you posted a very in-depth blog post about exploiting RDS and some misconfigurations on AWS's side to effectively display internal service credentials for the RDS service itself. Now, that sounds like it's one of those incredibly deep, incredibly murky things because it is, let's be clear. At a high level, can you explain to me exactly what it is that you found and how you did it?

Gafnit: Yes, so, RDS is the database service of Amazon. It's a managed service where you can choose the engine that you prefer. One of them is Postgres. There, I found the vulnerability. The vulnerability was in the extension, in the log_fdw—so it's for—like, stands for Foreign Data Wrapper—where this extension is there for reading the logs directly from the engine, and then you can query them using SQL queries, which should be simpler and easy to use.

And this extension enables you to provide a path. And there was a path traversal, but the traversal happened only when you dropped a validation of the wrapper. And this is how I managed to read local files from the database EC2 machine, which shouldn't happen because this is a managed service and you shouldn't have any access to the underlying host.

Corey: It's always odd when the abstraction starts leaking, from an AWS perspective.
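As a rough illustration of the class of bug Gafnit describes above (this is not AWS's or log_fdw's actual code), a path traversal check might look like the Python sketch below. The directory `/var/log/engine` and the function name `is_inside_log_dir` are hypothetical, chosen only for this example. The idea is that a user-supplied file name has to be resolved and compared against the allowed directory before it is opened; if that validation step is skipped, `..` segments can reach files elsewhere on the host.

```python
import os

# Hypothetical log directory, used only for this illustration.
LOG_DIR = "/var/log/engine"


def is_inside_log_dir(requested: str, base: str = LOG_DIR) -> bool:
    """Return True only if `requested` still points inside `base`
    after '..' segments and symlinks have been resolved."""
    base_real = os.path.realpath(base)
    # os.path.join discards `base_real` when `requested` is absolute,
    # so absolute paths are also caught by the comparison below.
    candidate = os.path.realpath(os.path.join(base_real, requested))
    # The candidate is acceptable only if the allowed directory is the
    # common ancestor of both paths.
    return os.path.commonpath([base_real, candidate]) == base_real


if __name__ == "__main__":
    for name in ("postgresql.log", "../../etc/passwd", "/etc/passwd"):
        print(name, is_inside_log_dir(name))
```

A check of this kind only helps if it cannot be switched off by the caller, which is the part of the story that matters in the episode: the traversal became possible once the wrapper's validation was dropped.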
I know that a friend of mine was on Aurora during the beta and was doing some high-performance work and suddenly started seeing SQL errors about /var/temp filling up, which is, for those who are not well versed in SQL, and even for those who are, that's not the sort of thing you tend to expect to show up on there. It feels like the underlying system tends to leak in—particularly in RDS sense—into what is otherwise at least imagined to be a fully-managed service.

Gafnit: Yes because sometimes they want to give you an informative error so you will be able to realize what happened and what caused the error, and sometimes they prefer not to give you too much information because they don't want you to get to the underlying machine. This is why, for example, you don't get a regular superuser; you have an RDS superuser in the database.

Corey: It seems to me that this is sort of a problem of layering different security models on top of each other. If you take a cloud-native database that they designed, start to finish, themselves, like DynamoDB, the entire security model for Dynamo, as best I can determine, is wrapped up within IAM. So, if you know IAM—spoiler, nobody knows IAM completely, it seems—but if you have that on lock you've got it; there's nothing else you need to think about. Whereas with RDS, you have to layer on IAM to get access to the database and what you're allowed to do with it.

But then there's an entirely separate user management system, in many respects, of local users for other Postgres or MySQL or any other systems that we're using, to a point where even when they started supporting IAM for authentication to RDS at the database user level, it was flagged in the documentation with a bunch of warnings of, “Don't do this for high-volume stuff; only do this in development style environments.” So, it's clear that it has been a difficult marriage, for lack of a better term.
And then you have to layer on all the other stuff that if God forbid, you're in a multi-cloud style environment or working with Kubernetes on top of all of this, and it seems like you're having to pick and choose between four or five different levels of security modeling, as well as understand how all of those things interplay together. How come we don't see things like this happening four times a day as a result?

Gafnit: Well, I guess that there are more issues being found, but not always published, but I think that this is what makes it more complex for both sides. Creating managed services with resources and third parties that everybody knows, to make it easy for them to use, requires a deep understanding of the existing permission models of the service where you want to integrate it with your permission model and how the combination works. So, you actually need to understand how every change is going to affect the restrictions that you want to have. So, for example, if you don't want the database users to be able to read-write or do a network activity, you really need to understand the permission model of Postgres itself. So, it makes it more complicated for development, but it's also good for researchers because they already know Postgres and they have a good starting point.

Corey: My philosophy has always been when you're trying to secure something, you need to have at least a topical level of understanding of the entire system, start to finish. One of the problems I've had with the idea of microservices as is frequently envisioned is that there's separation, but not real separation, so you have to hand-wave over a whole bunch of the security model. If you don't understand something, I believe it's very difficult to secure it. And let's be honest, even if you do understand [laugh] something, it can be very difficult to secure it.
And the cloud vendors with IAM and similar systems don't seem to be doing themselves any favors, given the sheer complexity and the capabilities that they're demanding of themselves, even for having one AWS service talk to another one, but in the right way.

And it's finicky, and it's nuanced, and debugging it becomes a colossal pain. And finally, at least those of us who are bad at these things, finally say, “The hell with it,” and they just grant full access from Service A to Service B—in the confines of a test environment. I'm not quite that nuts myself, most days. And then it's the biggest lie we always tell ourselves is once we have something overscoped like that, usually for CI/CD, it's, “Oh, todo: I'll go back and fix that later.” Yeah, I'm looking back five years ago and that's still on my todo list.

For some reason, it's never been the number one priority. And in all likelihood, it won't be until right after it really should have been my number one priority. It feels like in cloud security particularly, you can't win, you can only not lose. I always found that to be something of a depressing perspective and I didn't accept it for the longest time. But increasingly, these days, it started to feel like that is the state of the world. Am I wrong on that? Am I just being too dour?

Gafnit: What do you mean by you cannot lose?

Corey: There's no winning in security from my perspective because no one is going to say, “All right. We won the security. Problem solved. The end.” Companies don't view security as a value-add. It is only about a downside risk mitigation play.

It's, “Yay, another day of not getting breached.” And the failure mode from there is, “Okay, well, we got breached, but we found out about it ourselves immediately internally, rather than reading about it in The New York Times in two weeks.” The winning is just the steady-state, the status quo.
It's just all different flavors of losing beyond that.

Gafnit: So, I don't think it's quite the case because I can tell that they do always do active work on securing the services and their structure because I went over other extensions before reaching the log foreign data wrapper, and they actually excluded high-risk functionalities that could help me to achieve privileged access to the underlying host. And they do it with other services as well because they do always do the security review before having it integrated externally. But you know, it's an endless zone. You can always have something. Security vulnerabilities are always [arrays 00:09:06]. So everyone, whenever they can help and to search and to give their value, it's appreciated.

Corey: I feel like I need to clarify a bit of nuance. When your blog post first came out talking about this, I was, well let's say a little irritated toward AWS on Twitter and other places. And Twitter is not a place for nuance, it is easy to look at that and think, “Oh, I was upset at AWS for having a vulnerability.” I am not, I want to be very clear on that. Now, it's certainly not good, but these are computers; that is the nature of how they work.

If you want to completely secure a computer, cut the power to it, sink it in concrete and then drop it in the ocean. And even then, there are exceptions to all of that. So, it's always a question of not blocking all risk; it's about trade-offs and what risk is acceptable. And to AWS's credit, they do say that they practice defense-in-depth. Being able to access the credentials for the running RDS service on top of the instance that it was running on, while that's certainly not good, isn't as if you'd suddenly had keys to everything inside of AWS and all their security model crumbles away before you.

They do the right thing and the people working on these things are incredibly good. And they work very hard at these things.
My concern and my complaint is, as much as I enjoy the work that you do and reading these blog posts talking about how you did it, it bothers me that I have to learn about a vulnerability in a service for which I pay not small amounts of money—RDS is the number one largest charge in my AWS bill every month—and I have to hear about it from a third-party rather than the vendor themselves. In this case, it was a full day later, where after your blog post went up, and they finally had a small security disclosure on AWS's site talking about it. And that pattern feels to me like it leads nowhere good.

Gafnit: So, transparency is a key word here. And when I wrote the post, I asked if they want to add anything from their side, and they told me that they had already reached out to the vulnerable customers and they helped them to migrate to their fixed version. So, from their side, they didn't feel it was necessary to add it over there. But I did mention the fact that I did the investigation and no customer data was hurt. Yeah, but I think that if there will be maybe a more organized process for any submission of any vulnerability where all the steps are aligned, it will help everyone and anyone can be informed with everything that happens.

Corey: I have always been extraordinarily impressed by people who work at AWS and handle a lot of the triaging of vulnerability reports. Zack Glick, before he left, was doing an awful lot of that. Dan [Erson 00:12:05] continues to be one of the bright lights of AWS, from my perspective, just as far as customer communication and understanding exactly what the customer perspective is. And as individuals, I see nothing but stars over at AWS.
To be clear, ‘Nothing but Stars' is also the name of most of my IAM policies, but that's neither here nor there.

It seems like, on some level, there's a communications and policy misalignment, on some level, because I look at this and every conversation I ever have with AWS's security folks, they are eminently reasonable, they're incredibly intelligent, and they care. There's no mistaking that they legitimately care. But somewhere at the scale of company they're at, incentives get crossed, and everyone has a different position they're looking at these things from, and it feels like that disjointedness leads to almost a misalignment as far as how to effectively communicate things like this to customers.

Gafnit: Yes, it looks like this is the case, but if more things will be discovered and published, I think that they will have eventually an organized process for that. Because I guess the researchers do find things over there, but they're not always being published for several reasons. But yes, they should work on that. [laugh].

Corey: And that is part of the challenge as well, where AWS does not have a public vulnerability disclosure program. [unintelligible 00:13:30] HackerOne, they don't have a public bug bounty program. They have a vulnerability disclosure email address, and the people working behind that are some of the hardest working folks in tech, but there is no unified way of building a community of researchers around the idea of exploring this. And that is a challenge because you have reported vulnerabilities, I have reported significantly fewer vulnerabilities, but it always feels like it's a hurry up and wait scenario where the communication is not always immediate and clear.
And at best, it feels like we often get a begrudging, “Thank you.”

Versus all right, if we just throw ethics completely out the window and decide instead that now we're going to wind up focusing on just effectively selling it to the highest bidder, the value of, for example, a hypervisor escape on EC2 is incalculable. There is no amount of money that a bug bounty program could offer for something like that compared to what it is worth to the right bad actor at the right time. So, with the vulnerabilities that we hear about, we're already starting from a basis of people who have a functioning sense of ethics, people who are not deeply compromised trying to do something truly nefarious. What worries me is the story of—what are the stories that we aren't seeing? What are the things that are being found where instead of fighting against the bureaucracy around disclosure and the rest, people just use them for their own ends? And I'm gratified by the level of response I see from AWS on the things that they do find out about, but I always have to wonder, what aren't we seeing?

Gafnit: That's a good question. And it really depends on their side if they choose to expose it or not.

Corey: Part of the challenge too, is the messaging and the communication around it and who gets credit and the rest. And it's weird, whenever they release some additional feature to one of their big headline services, there are blog posts, there are keynote speeches, there are customer references, they go on speaking tours, and the emails, oh, God, they never stopped the emails talking about how amazing all of these things are. But whenever there's a security vulnerability or a disclosure like this—and to be fair, AWS's response to this speaks very well of them—it's like you have to go sneak down into the dark sub-basement, with the filing cabinet behind the leopard sign and the rest, to even find out that these things exist.
And I feel like they're not doing themselves any favors by developing that reputation for lack of transparency around these things. “Well, while there was no customer impact, so why would we talk about it?”

Because otherwise, you're setting up a myth that there never is a vulnerability on the side of—what is it that you're building as a cloud provider. And when there is a problem down the road—because there always is going to be; nothing is perfect—people are going to say, “Hey, wait a minute. You didn't talk about this. What else haven't you talked about?”

And it rebounds on them with sometimes really unfortunate side effects. With Azure as a counterexample here, we see a number of Azure exploits where, “Yeah, turned out that we had access to other customers' data and Azure had no idea until we told them.” And Azure does its statements about, “Oh, we have no evidence of any of this stuff being used improperly.” Okay, that can mean that you've either checked your logs and things are great or you don't have logging. I don't know that that necessarily is something I trust.

Conversely, AWS has said in the past, “We have looked at the audit logs for this service dating back to its launch years ago, and have validated that none of that has ever been used like this.” One of those responses breeds an awful lot of customer trust. The other one doesn't. And I just wish AWS knew a little bit more how good crisis communication around vulnerabilities can improve customer trust rather than erode it.

Gafnit: Yes, and I think that, as you said, there will always be vulnerabilities. And I think that we are expecting to find more, so being able to communicate as clearly as you can and to expose things about maybe the fixes and how the investigation is being done, even in a high level, for all the vulnerabilities can gain more trust from the customer side.

Corey: DoorDash had a problem.
As their cloud-native environment scaled and developers delivered new features, their monitoring system kept breaking down. In an organization where data is used to make better decisions about technology and about the business, losing observability means the entire company loses their competitive edge. With Chronosphere, DoorDash is no longer losing visibility into their applications suite. The key? Chronosphere is an open-source compatible, scalable, and reliable observability solution that gives the observability lead at DoorDash business, confidence, and peace of mind. Read the full success story at snark.cloud/chronosphere. That's snark.cloud slash C-H-R-O-N-O-S-P-H-E-R-E.

Corey: You have experience in your background specifically around application security and cloud security research. You've been doing this for seven years at this point. When you started looking into this, did you come at the RDS vulnerability exploration from a perspective of being deeper on the Postgres side or deeper on the AWS side of things?

Gafnit: So, it was both. I actually came to the RDS lead from another service where there was something [about 00:18:21] in the application level. But then I reached RDS and thought, well, it would be really nice to find something over here and to reach the underlying machine. And when I entered the RDS zone, I started to look at it through application security eyes, but you have to know the cloud as well because there are integrations with S3, and you need to understand the IAM model. So, you need a mix of both to exploit specifically this kind of issue. But you can also be a database expert, because the payload is pure SQL.

Corey: It always seems to me that this is an inherent risk in trying to take something that is pre-existing as an open-source solution—Postgres is one example but there are many more—and offer it as a managed service. 
Because I think one of the big misunderstandings is that when—well, AWS is just going to take something like Redis and offer that as a managed service, it's okay, I accept that they will offer a thing that respects the endpoints and then acts as if it were Redis, but under the hood, there is so much in all of these open-source projects that is built for optionality of wherever you want to run this thing, it will run there; whatever type of workload you want to throw at it, it can work. Whereas when you have a cloud provider converting these things into a managed service, they are going to strip out an awful lot of those things. An easy example might be okay, there's this thing that winds up having to calculate for the way the hard drives on a computer work from a storage perspective.

Well, all the big cloud providers already have interesting ways that they have solved storage. Every team does not reimplement that particular wheel; they use in-house services. Chubby's file locking, for example, over on the Google side is a classic example of this that they've talked about an awful lot so every team building something doesn't have to rediscover all of that. So, the idea that, oh, we're just going to take up this open-source thing, clone it off of GitHub, fork it, and then just throw it into production as a managed service seems more than a little naive. What's your experience around seeing, as you get more [laugh] into the weeds of these things than most customers are allowed to get, what's your take on this?

Do you find that this looks an awful lot like the open-source version that we all use? Or is it something that looks like it has been heavily customized to take advantage of what AWS is offering internally as underlying bedrock services?

Gafnit: So, from what I saw until now, they do want to preserve the functionality so you will have the same experience as when you're working with the same service outside of AWS, because you are used to that. 
So, they are not making dramatic changes, but they do want to reduce the risk in the security space. So, there will be some functionality that they will not let you use. And this is because of the managed part. In areas where the full workload is deployed in your account, they will not have the same security restrictions because you can access the workload anyway. But when it's managed, they need to prevent you from accessing the underlying host, for example. And they do make changes, but they're really targeted at the specific actions that can lead you to that.

Corey: It also feels like RDS is something of a, I don't want to call it a legacy service because it is clearly still very much actively developed, but it's what we'll call a ‘classic service.' When I look at a new AWS launch, I tend to mentally bucket them into two things. There's the cloud-native approach, and we've already talked about DynamoDB. That would be one example of this. And there's the cloud-hosted model where you have to worry about things like instances and security groups and the networking stuff, and so on and so forth, where it basically feels like they're running their thing on top of a pile of EC2 instances, and that abstraction starts leaking.

Part of me wonders if looking at some of these older services like RDS, they made decisions in the design and build out of these things that they might not if they were to go ahead and build it out today. I mean, Aurora is an example of what that might look like. Have you found as you start looking around the various security foibles of different cloud services, that the security posture of some of the more cloud-native approaches is better or worse or the same as the cloud-hosted world?

Gafnit: Well, so for example, in the several issues that were found, and also here in RDS where you can see credentials in a file, this is not a best practice in the security space. 
And so, definitely there are things to improve, even if it's developed on the provider side. But it's really hard to answer this question because in a managed area where you don't have any access, it's hard to tell how it's configured and if it's configured properly. So, you need to have some certification from their side.

Corey: This is, on some level, part of the great security challenge, especially for something that is not itself open-source, where they obviously have terrific security teams, don't get me wrong. At no point do I want to ever come across as saying, “Oh, those AWS people don't know how security works.” That is provably untrue. But there is something to be said for the value of having a strong community in the security space focusing on this from the outside of looking at these things, of even helping other people contextualize these things. And I'm a little disheartened that none of the major cloud providers seem to have really embraced the idea of a cloud security community, to the point where the one that I'm most familiar with, the cloud security forum Slack team, seems to be my default place where I go for context on things.

Because I dabble. I keep my hand in when it comes to security, but I'm certainly no expert. That's what people like you are for. I make fun of clouds and I work on the billing parts of it and that's about as far as it goes for me. But being able to get context around is this a big deal? Is this description that a company is giving, is it accurate?

For example, when your post came out, I had not heard of Lightspin in this context. So, reaching out to a few people I trusted, is this legitimate? The answer was, “Yes. It's legitimate and it's brilliant. That's a company to keep your eye on.” Great. That's useful context and there's no way to buy that. It has to come from having those conversations with people in the [broader 00:24:57] sense of the community. 
What's your experience been looking at the community side of the world of security?

Gafnit: Well, so I think that cloud security has a great community, and this is one of the things that we at Lightspin really want to increase and push forward. And we see ourselves as a security-driven company. We always do our best to publish posts, even detailed posts, not only about vulnerabilities but about how things work in the cloud and how things are being evaluated, and to release open-source tools that you can use to check your environment even if you're not a customer. And I think that the community is always willing to explain and to investigate together. And it's a welcome effort, but I think that the messaging should be also for all layers, you know, also for the DevOps and the developers because it can really help if it will start from this point from their side, as well.

Corey: It needs to be baked in, from start to finish.

Gafnit: Yeah, exactly.

Corey: I really want to thank you for taking the time out of your day to speak with me today. If people want to learn more about what you're up to, where's the best place for them to find you?

Gafnit: So, you can find me on Twitter and on LinkedIn, and feel free to reach out.

Corey: We will, of course, put links to that in the [show notes 00:26:25]. Thank you so much for being so generous with your time today. I appreciate it.

Gafnit: Thank you, Corey.

Corey: Gafnit Amiga, Director of Security Research at Lightspin. I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, and if it's on the YouTubes, smash the like and subscribe buttons, which I'm told are there. Whereas if you've hated this podcast, same story, hit the like and subscribe buttons, leave a five-star review on the various platforms, but also leave an insulting, angry comment about how my observation that our IAM policies are all full of stars is inaccurate. 
And then I will go ahead and delete that comment later because you didn't set a strong password.

Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.

Announcer: This has been a HumblePod production. Stay humble.

Le Podcast AWS en Français
What's new?

Le Podcast AWS en Français

Jun 17, 2022 · 12:35


This week, we talk about mainframes and modernizing those applications in the cloud. We also briefly cover a new EC2 instance, a feature that makes life easier for data scientists in SageMaker, and post-migration automation after migrating VMs.

Yalla To The Cloud
Episode 103: Amazon Lightsail

Yalla To The Cloud

Jun 14, 2022 · 9:56


In this segment, we bring you information about day-to-day work in a cloud environment from our point of view. Speakers in this episode: Ariel Munafo and Avi Keinan. In the previous episode, we talked about what AWS Organizations is, the essence of the change, how it helps us manage multiple accounts, which features it includes, what its advantages are, why the service matters to integrators, and what to watch out for. In this episode, we talk about Amazon Lightsail: what the service is, who it is intended for, what it is used for, and how it differs from EC2. Avi will also give a live demo of creating a new server. Want to stay up to date and watch the highest-quality, most professional content? Register now for Israel's largest cloud computing conference! To register > https://www.israelcloudsummit.com/

Spotlight on the Community
EC2 Connects the AEC Industry With Current and Prospective Employees to Cultivate a Diverse and Skilled Workforce

Spotlight on the Community

Jun 8, 2022 · 24:50


Debbie Barnum and Rachael Gonzalez, Co-Founders of EC2, discuss the organization's mission, its 25 community partnerships, the monthly Cheers and Construction Careers event, and ConstructForce, an online searchable resource database in the AEC world.

Podcast AWS LATAM
EP97: Containers on AWS - Introduction to Amazon ECS

Podcast AWS LATAM

May 31, 2022 · 10:37


In this new episode about Containers on AWS, we talk with Omar Onofre about the orchestration solution developed by AWS, Amazon ECS. We review the service's features and capabilities (the Task and Service concepts, hosting options on EC2 and Fargate, scaling, Capacity Providers) and also discuss its integrations with other services on the platform. Don't miss it! Additional material: https://aws.amazon.com/es/ecs/getting-started/

The Cloud Pod
166: The Cloud Pod Eagerly Awaits the Microsoft Pay Increase

The Cloud Pod

May 27, 2022 · 61:47


On The Cloud Pod this week, the team struggles with scheduling to get everyone in the same room for just one week. Plus, Microsoft increases pay for talent retention while changing licensing for European Cloud Providers, Google Cloud introduces AlloyDB for PostgreSQL, and AWS announces EC2 support for NitroTPM. A big thanks to this week's sponsor, Foghorn Consulting, which provides full-stack cloud solutions with a focus on strategy, planning and execution for enterprises seeking to take advantage of the transformative capabilities of AWS, Google Cloud and Azure.

AWS Bites
38. How do you choose the right compute service on AWS?

AWS Bites

May 26, 2022 · 29:27


When it comes to choosing compute services on AWS, there are a lot of options, including EC2, ECS, Lambda, EKS… New ones keep emerging all the time! Selecting the right one for each application is no longer an easy choice. In this episode we discuss why you need compute services and what kinds of problems should be offloaded to something else entirely. We suggest how you can develop a methodology to make the selection process easier and less biased within your company. We discuss at a high level some of the different compute options available in AWS, and finally we provide a few example use cases and describe how we picked the compute service for each.

In this episode, we mentioned the following resources:
- InfoQ article “A Recipe to Migrate and Scale Monoliths in the Cloud”: https://www.infoq.com/articles/cloud-migrate-scale/
- Our previous episode about migrating monoliths to the cloud: https://www.youtube.com/watch?v=GYa2RkYDfBQ
- Article on choosing the right compute service: https://www.fourtheorem.com/blog/aws-compute

This episode is also available on YouTube: https://www.youtube.com/AWSBites

You can listen to AWS Bites wherever you get your podcasts:
- Apple Podcasts: https://podcasts.apple.com/us/podcast/aws-bites/id1585489017
- Spotify: https://open.spotify.com/show/3Lh7PzqBFV6yt5WsTAmO5q
- Google: https://podcasts.google.com/feed/aHR0cHM6Ly9hbmNob3IuZm0vcy82YTMzMTJhMC9wb2RjYXN0L3Jzcw==
- Breaker: https://www.breaker.audio/aws-bites
- RSS: https://anchor.fm/s/6a3312a0/podcast/rss

Do you have any AWS questions you would like us to address? Leave a comment here or connect with us on Twitter:
- https://twitter.com/eoins
- https://twitter.com/loige

#aws #compute #lambda

Le Podcast AWS en Français
What's new?

Le Podcast AWS en Français

May 20, 2022 · 12:08


News on the infrastructure side, as we talk about Trusted Platform Module (TPM), EC2 Auto Scaling, and Backup, but also in the serverless world with Step Functions and in the mobile world with Amplify.

Coder Radio
466: Luxury Emotional Manipulation

Coder Radio

May 18, 2022 · 51:40


Why Mike feels like Heroku is in a failed state, what drove us crazy about Google I/O this year, how Chris botched something super important, and some serious Python love sprinkled throughout.

Screaming in the Cloud
Stepping Onto the AWS Commerce Platform with James Greenfield

Screaming in the Cloud

May 17, 2022 · 45:23


About James

James has been part of AWS for over 15 years. During that time he's led software engineering for Amazon EC2 and more recently leads the AWS Commerce Platform group that runs some of the largest systems in the world, handling volumes of data and request rates that would make your eyes water. And AWS customers trust us to be right all the time so there's no room for error.

Links Referenced:
Email: jamesg@amazon.com

Transcript

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.

Corey: This episode is sponsored in part by our friends at Vultr. Optimized cloud compute plans have landed at Vultr to deliver lightning-fast processing power, courtesy of third-gen AMD EPYC processors without the IO or hardware limitations of a traditional multi-tenant cloud server. Starting at just 28 bucks a month, users can deploy general-purpose, CPU, memory, or storage optimized cloud instances in more than 20 locations across five continents. Without looking, I know that once again, Antarctica has gotten the short end of the stick. Launch your Vultr optimized compute instance in 60 seconds or less on your choice of included operating systems, or bring your own. It's time to ditch convoluted and unpredictable giant tech company billing practices and say goodbye to noisy neighbors and egregious egress forever. Vultr delivers the power of the cloud with none of the bloat. “Screaming in the Cloud” listeners can try Vultr for free today with $150 in credit when they visit getvultr.com/screaming. That's G-E-T-V-U-L-T-R dot com slash screaming. My thanks to them for sponsoring this ridiculous podcast.

Corey: Finding skilled DevOps engineers is a pain in the neck! 
And if you need to deploy a secure and compliant application to AWS, forgettaboutit! But that's where DuploCloud can help. Their comprehensive no-code/low-code software platform guarantees a secure and compliant infrastructure in as little as two weeks, while automating the full DevSecOps lifestyle. Get started with DevOps-as-a-Service from DuploCloud so that your cloud configurations are done right the first time. Tell them I sent you and your first two months are free. To learn more visit: snark.cloud/duplo. That's snark.cloud/D-U-P-L-O-C-L-O-U-D.

Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. And I've been angling to get someone from a particular department at AWS on this show for nearly its entire run. If you were to find yourself in an Amazon building and wander through the various dungeons and boiler rooms and subterranean basements—I presume; I haven't seen nearly as many of the insides of those buildings as people might think—you pass interesting departments labeled things like ‘Spline Reticulation,' or whatnot. And then you come to a very particular group called Commerce Platform.

Now, I'm not generally one to tell other people's stories for them. My guest today is James Greenfield, the VP of Commerce Platform at AWS. James, thank you for joining me and suffering the slings and arrows I will no doubt be hurling at you.

James: Thanks for having me. I'm looking forward to it.

Corey: So, let's start at the very beginning—because I guarantee you, you're going to do a better job of giving the chapter and verse answer than I would from a background mired deeply in snark—what is Commerce Platform? It sounds almost like it's the retail website that sells socks, books, and underpants.

James: So, Commerce Platform actually spans a bunch of different things. 
And so, I'm going to try not to bore you with a laundry list of all of the things that we do—it's a much longer list than most people assume even internal to AWS—at its core, Commerce Platform owns all of the infrastructure and processes and software that takes the fact that you've been running an EC2 instance, or you're storing an object in S3 for some period of time, and turns it into a number at the end of the month. That is what you asked for that service and then proceeds to try to give you as many ways to pay us as easily as possible. There are a few other bits in there that are maybe less obvious. One is we're also responsible for protecting the platform and our customers from fraudulent activity. And then we're also responsible for helping collect all of the data that we need for internal reporting to support some of the back-end services that a business needs to do things like revenue recognition and general financial reporting.

Corey: One of the interesting aspects about the billing system is just how deeply it permeates everything that happens within AWS. I frequently say that when it comes to cloud, cost and architecture are foundationally and fundamentally the same exact thing. If your entire service goes down, a few interesting things happen. One, I don't believe a single customer is going to complain other than maybe a few accountants here and there because the books aren't reconciling, but also you've removed a whole bunch of constraints around why things are the way that they are. Like, what is the most efficient way to run this workload?

Well, if all the computers suddenly become free, I don't really care about efficiency, so much as, “Oh, hey. There's a fly, what do I have as a flyswatter? That's right, I'm going to drop a building on it.” And those constraints breed almost everything. 
I've said, for example, that S3 has infinite storage because it does.

They can add drives faster than we're able to fill them—at least historically; they added some more replication services—but they're going to be able to buy hard drives faster than the rest of us are going to be able to stretch our budgets. If that constraint of the budget falls away, all bets are really off, and more or less, we're talking about the destruction of the cloud as a viable business entity. No pressure or anything.

James: [laugh].

Corey: You're also a recent transplant into AWS billing as a whole, Commerce Platform in general. You spent 15 years at the company, the vast majority of that over in EC2. So, either it was you've been exiled to a basically digital Siberia or it was one of those, “Okay, keeping all the EC2 servers up, this is easy. I don't see what people stress about.” And they say, “Oh, ho ho, try this instead.” How did you find yourself migrating over to the Commerce Platform?

James: That's actually one I've had a lot from folks that I've worked with. You're right, I spent the first 15 or so years of my career at AWS in EC2, responsible for various things over there. And when the leadership role in Commerce Platform opened up, the timing was fortuitous, and part of it, I was in the process of relocating my family. We moved to Vancouver in the middle of last year. And we had an opening in the role and started talking about, potentially, me stepping into that role.

The reason that I took it—there's a few reasons, but the primary reason is that if I look back over my career, I've kind of naturally gravitated towards owning things where people only really remember that they exist when they're not working. And for some reason, you know, I enjoy the opportunity to try to keep those kinds of services ticking over to the point where people don't notice them. And so, Commerce Platform lands squarely in that space. 
I've always been attracted to opportunities to have an impact, and it's hard to imagine having much more of an impact than in the Commerce Platform space. It underpins everything, as you said earlier.

Every single one of our customers depends on the service, whether they think about it or realize it. Every single service that we offer to customers depends on us. And so, that really is the sort of nexus within AWS. And I'm a platform guy, I've always been a platform guy. I like the force multiplier nature of platforms, and so Commerce Platform, you know, as I kind of thought through all of those elements, really was a great opportunity to step in.

And I think there's something to be said for, I've been a customer of Commerce Platform internally for a long time. And so, a chance to cross over and be on the other side of that was something that I didn't want to pass up. And so, you know, I'm digging in, and learning quickly, ramping up. By no means an expert, very dependent on a very smart, talented, committed group of people within the team. That's kind of the long and short of how and why.

Corey: Let's say that I am taking on the role of an AWS product team, for the sake of argument. I know, keep the cringe down for a second, as far as oh, God, the wince is just inevitable when the idea of me working there ever comes up to anyone. But I have an idea for a service—obviously, it runs containers, and maybe it does some other things as well—going from idea to six-pager to MVP to barely better than MVP day-one launch, and at some point, various things happen to that service. It gets staffed with a team, objectives and a roadmap get built, a P&L and budget, and a pricing model and the rest. One of the last things that happens, apparently, is someone picks the worst name off of a list of candidates, slaps it on the product, and ships it off there.

At what point does the billing system and figuring out the pricing dimensions for a given service tend to factor in? 
Is that a last-minute story? Is that almost from the beginning? Where along that journey does, “Oh, by the way, we're building this thing. Maybe we should figure out, I don't know, how to make money from it,” factor into the conversation?

James: There are two parts to that answer. Pretty early on as we're trying to define what that service is going to look like, we're already typically thinking about what are the dimensions that we might charge along. The actual pricing discussions typically happen fairly late, but identifying those dimensions and, sort of, the right way to present it to customers happens pretty early on. The thing that doesn't happen early enough is actually pulling the Commerce Platform team in, but it is something that we're going to work this year to try to get a little bit more in front of.

Corey: Have you found historically that you have a pretty good idea of how a service is going to be priced, everything is mostly thought through, a service goes to either private preview or you're discussing a launch, and then more or less, I don't know, someone like me crops up with a, “Hey, yeah, let's disregard 90% of what the service does because I see a way to misuse the remaining 10% of it as a database.” And you run some mental math and realize, “Huh. We're suddenly giving, like, eight petabytes of storage per customer away for free. Maybe we should guard against that because otherwise, it's rife with misuse.” It used to be that I could find interesting ways to sneak through the cracks of various services—usually in pursuit of a laugh—those are getting relatively hard to come by and invariably a lot more trouble than they're worth. Is that just better comprehensive diligence internally, is that learning from customers, or am I just bad at this?

James: No, I mean, what you're describing is almost a variant of the Defender's Dilemma. There are way more ways to abuse something than you can imagine, and so defending against that is pretty challenging. 
And it's important because, you know, if you turn the economics of something upside down, then it just becomes harder for us to offer it to customers who want to use it legitimately. I would say 90% of that improvement is us learning. We make plenty of mistakes, but I think, you know, one of the things that I've always been impressed by over my time here is how intentional we are about trying to learn from those mistakes.

And so, I think that's what you're seeing there. And then we try very hard to listen to customers, talk to folks like you, because one of the best ways to tackle anything that smells of the Defender's Dilemma is to harness that collective creativity of a large number of smart people because you really are trying to cover as much ground as possible.

Corey: There was a fun joke going around a while back of what is the most expensive environment you can get running on a free tier account before someone from AWS steps in, and I think I got it to something like half a billion dollars in the first month. Now, I haven't actually tested this for reasons that mostly have to do with being relatively poor compared to, you know, being able to buy Guam. And understanding as well the fraud protections built into something like AWS are largely built around defending against getting service usage for free that in some way, shape or form, benefits the attacker. The easy example of that would be mining cryptocurrency, which is just super-economic as long as you use someone else's AWS account to do it. Whereas a lot of my vectors are, “Yeah, ignore all of that. How do I just make the bill artificially high? What can I do to misuse data transfer? And passing a single gigabyte through, how much can I make that per gigabyte cost be?” And, “Oh, circular replication and the Lambda invokes itself pattern,” and basically every bad architectural decision you can possibly make only this time, it's intentional.

And that shines some really interesting light on it. 
And I have to give credit where due, a lot of that didn't come from just me sitting here being sick and twisted nearly so much as it did having seen examples of that type of misconfiguration—by mistake—in a variety of customer accounts, most confidently my own because it turns out that the way I learn things is by screwing them up first.

James: Yeah, you've touched on a couple of different things in there. So, you know, maybe the first one is, I typically try to draw a line between fraud and abuse. And fraud is essentially trying to spend somebody else's money to get something for free. And we spent a lot of time trying to shut that down, and we're getting really good at catching it. And then abuse is either intentional or unintentional. There's intentional abuse: You find a chink in our armor and you try to take advantage of it.

But much more commonly is unintentional abuse. It's not really abuse, you know. Abuse has very negative connotations, but it's unintentionally setting something up so that you run up a much larger bill than you intended. And we have a number of different internal efforts, and we're working on a bunch more this year, to try to catch those early on because one of my personal goals is to minimize the frequency with which we surprise customers. And the least favorite kind of surprise for customers is a [laugh] large bill. And so, what you're talking about there is, in a sufficiently complex system, there's always going to be weaknesses and ways to get yourself tied up in knots.

We're trying both at the service team level, but also within my teams to try to find ways to make it as hard as possible to accidentally do that to yourself and then catch when you do so that we can stop it. And even more on the intentional abuse side of things, if somebody's found a way to do something that's problematic for our services, then you know, that's pretty much on us. 
But we will often reach out and engage with whoever's doing it and try to understand what they're trying to do and why. Because often, somebody's trying to do something legitimate, they've got a problem to solve, they found a creative way to solve it, and it may put strain on the service because it's just not something we designed for, and so we'll try to work with them to use that to feed into either new services, or find a better place for that workload, or just bolster what they're using. And maybe that's something that eventually becomes a fully-fledged feature that we offer the customers. We're always open to learning from our customers. They have found far more creative ways to get really cool things done with our services than we've ever imagined. And that's true today.

Corey: I mean, most of my service criticisms come down to the fact that you have more-or-less built a very late model, high performing iPad, and I'm out there complaining about, “What a shitty hammer this thing is, it barely works at all, and then it breaks in my hand. What gives?” I would also challenge something you said a minute ago that the worst day for some customers is to get a giant surprise bill, but [unintelligible 00:13:53] to that is, yeah, but, on some level, that's kind of only money; you do have levers on your side to fix those issues. A worse scenario is you have a customer that exhibits fraud-like behavior, they're suddenly using far more resources than they ever did before, so let's go ahead and turn them off or throttle them significantly, and you call them up to tell them you saved them some money, and, “Our Superbowl ad ran. What exactly do you think you're doing?” Because they don't get a second bite at that kind of apple.

So, there's a parallel on both sides of this. And those are just two examples. The world is full of nuances, and at the scale that you folks operate at. 
The one-in-a-million events happen multiple times a second, the corner cases become common cases, and I'm surprised—to be direct—how little I see you folks dropping the ball.James: Credit to all of the teams. I think our secret sauce, if anything, really does come down to our people. Like, a huge amount of what you see as hopefully relatively consistent, good execution comes down to people behind the scenes making sure. You know, like, some of it is software that we built and made sure it's robust and tested to scale, but there's always an element of people behind the scenes, when you hit those edge cases or something doesn't quite go the way that you planned, making sure that things run smoothly. And that, if anything, is something that I'm immensely proud of and is kind of amazing to watch from the inside.Corey: And, on some level, it's the small errors that are the bigger concern than the big ones. Back a couple years ago, when they announced GP3 volumes at re:Invent, well, great, well spin up a test volume and kick the tires on it for an hour. And I think it was 80 or 100 gigs or whatnot, and the next day in the bill, it showed up as about $5,000. And it was, “Okay, that's not great. Not great at all.” And it turned out that it was a mispricing error by I think a factor of a million.And okay, at least it stood out. But there are scenarios where we were prepared to pay it because, oops, you got one over on us. Good job. That's never been the mindset I've gotten about AWS's philosophy for pricing. The better example that I love because no one took it seriously, was a few years before that when there was a LightSail bug in the billing system, and it made the papers because people suddenly found that for their LightSail instance, they were getting predicted bills of $4 billion.And the way I see it, you really only had to make that work once and then you've made your numbers for the year, so why not? Someone's going to pay for it, probably. 
But those were such out-of-this-world numbers that no one saw that and ever thought it was anything other than a bug. It's the small pernicious things that creep in. Because the billing system is vast; I had no idea when I started working with AWS bills just how complicated it really was. James: Yeah, I remember both of those, and there's something in there that you touched on that I think is really important. That's something that I realized pretty early on at Amazon, and it's why customer obsession is our flagship leadership principle. It's not because it's love and butterflies and unicorns; customer obsession is key to us because that's how you build a long-term sustainable business: your customers depend on you. And it drives how we think about everything that we do. And in the billing space, small errors, even if they are small errors in the customer's favor, slowly erode that trust. So, we take any kind of error really seriously and we try to figure out how we can make sure that it doesn't happen again. We don't always get that right. As you said, we've built an enormous, super-complex business that's growing really quickly, and really quick growth like that always acts as kind of a multiplier on top of complexity. And on the pricing points, we're managing millions of pricing points at the moment. And our tools that we use internally, there's always room for improvement. It's a huge area of focus for us. We're in the beginning of looking at applying things like formal methods to make sure that we can make very hard guarantees about the correctness of some of those. But at the end of the day, people are plugging numbers in and you need as many belts and braces as possible to make sure that you don't make mistakes there. Corey: One of the things that struck me by surprise when I first started getting deep into this space was the fact that the finalized bill was—what does it mean to have this be ‘finalized?'
It can hit the Cost and Usage Report in an S3 bucket and it can change retroactively after the month closed periodically. And that's when I started to have an inkling of a few things: Not just the sheer scale and complexity inherent to something like the billing system that touches everything, but the sheer data retention stories where you clearly have to be able to go back and reconstruct a bill from the raw data years ago. And I know what the output of all of those things are in the form of Cost and Usage Reports and the billing data from our client accounts—which is the single largest expense in all of our AWS accounts; we spent thousands and thousands and thousands of dollars a year just on storing all of that data, let alone the processing piece of it—the sheer scale is staggering. I used to wonder why does it take you a day to record me using something to it's showing up in the bill? And the more I learned the more it became a how can you do that in only a day?James: Yes, the scale is actually mind-boggling. I'm pretty sure that the core of our billing system is—I'm reasonably confident it's the largest or one of the largest data processing systems on the planet. I remember pretty early on when I joined Commerce Platform and was still starting to wrap my head around some of these things, Googling the definition of quadrillion because we measured the number of metering events, which is how we record usage in services, on a daily basis in the quadrillions, which is a billion billions. So, it's just an absolutely staggering number. And so, the scale here is just out of this world.That's saying something because it's not like other services across AWS are small in their own right. But I'm still reasonably sure that being one of a handful of services that is kind of at the nexus of AWS and kind of deals with the aggregate of AWS's scale, this is probably one of the biggest systems on the planet. And that shows up in all sorts of places. 
You start with that input, just the sheer volume of metering events, but that has to produce as an output pretty fine-grained line item detailed information, which ultimately rolls up into the total that a customer will see in their bill. But we have a number of different systems further down the pipeline that try to do things like analyze your usage, make sensible recommendations, look for opportunities to improve your efficiency, give you the ability to slice and dice your data and allocate it out to different parts of your business in whatever way it makes sense for your business. And so, those systems have to deal with anywhere from millions to billions to recently, we were talking about trillions of data points themselves. And so, I was tangentially aware of some of the scale of this, but being in the thick of it having joined the team really just does underscore just how vast the systems are.Corey: I think it's, on some level, more than a little unfortunate that that story isn't being more widely told, more frequently. Because when Commerce Platform has job postings that are available on the website, you read it and it's very vague. It doesn't tend to give hard numbers about a lot of these things, and people who don't play in these waters can easily be forgiven for thinking the way that you folks do your job is you fire up one of those 24 terabyte of RAM instances that—you know, those monstrous things that you folks offer—and what do you do next? Well, Microsoft Excel. We have a special high memory version that we've done some horse-trading with our friends over at Microsoft for.It's, yeah, you're several steps beyond that, at this point. It's a challenging problem that every one of your customers has to deal with, on some level, as well. But we're only dealing with the output of a lot of the processing that you folks are doing first.James: You're exactly right. And a big focus for some of my teams is figuring out how to help customers deal with that output. 
Because even if you're talking about a couple of orders of magnitude reduction, you're still talking about very large numbers there. So, to help customers make sense of that, we have a range of tools that exist and that we're investing in. There's another dimension of complexity in the space that I think is one that's also very easy to miss. And I think of it as arbitrary complexity. And it's arbitrary because some of the rules that we have to box within here are driven by legislative changes. As you operate in more and more countries around the world, you want to make sure that we're tax compliant, that we help our customers be tax compliant. Those rules evolve pretty rapidly, and Country A may sit next to Country B, but that doesn't mean that they're talking to one another. They've all got their own ideas. They're trying to accomplish r— [00:22:47] Corey: A company is picking up and relocating from India to Germany. How do we—James: Exactly. Corey: —change that on the AWS side and the rest? And it's, “Hoo boy, have you considered burning it all down and filing an insurance claim to start over?” And, like, there's a lot of complexity buried underneath that that just doesn't rise to the notice of 99% of your customers. James: And the fact that it doesn't rise to the notice is something that we strive for. Like, these shouldn't be things that customers have to worry about. Because it really is about clearing away the things that, as far as possible, you don't want to have to spend time thinking about so that you can focus on the thing that your business does that differentiates you. It's getting rid of that undifferentiated heavy lifting. And there's a ton of that in this space, and if you're blissfully unaware of it, then hopefully that means that we're doing our job. Corey: What I'm, I think, the most surprised about, and I have been for a long time.
And please don't take this as an insult to various other folks—engineers, the rest, not just in other parts of AWS but throughout the industry—but talking to the people who work within Commerce Platform has always been just a fantastic experience. The caliber of people that you have managed to attract and largely retain—we don't own people, they do matriculate out eventually—but the caliber of people that you've retained on your teams has just been out of this world. And at first, I wondered, why are these awesome people working on something as boring and prosaic as billing? And then I started learning a little bit more as I went, and, “Oh, wow. How did they learn all the stuff that they have to hold in their head in tension at once to be able to build things like this?” It's incredibly inspiring just watching the caliber of the people that you've been able to bring in. James: I've been really, really excited joining this team, as I've gotten other folks on the team because there's some super-smart people here. But what's really jumped out to me is how committed the team is. This is, for the most part, a team that has been in the space for many years. Many of them have—we talk about boomerangs, folks who leave AWS, go spend some time somewhere else and come back, and there's a surprisingly high proportion of folks in Commerce Platform who have spent time somewhere else and then come back because they enjoy the space, they find that challenging, folks are attracted to the ability to have an impact because it is so foundational. But yeah, there's a super-committed core to this team. And I really enjoy working with teams where you've got that because then you really can take the long view and build something great. And I think we have tons of opportunities to do that here. Corey: It sounds ridiculous, but I've reached out to team members before to explain two-cent variances in my bill, and never once have I been confronted with a, “It's two cents.
What do you care?” They understand the requirement that these things be accurate, not just, “Eh, take our word for it.” And also, frankly, they understand that two cents on a $20 bill looks a little different on a $20 million bill. So yeah, let us figure out if this is systemic or something I have managed to break. It turns out the Cost and Usage Report processing systems don't love it when there's a cost allocation tag whose name contains an emoji. Who knew? It's the little things in life that just have this fun way of breaking when you least expect it. James: They're also a surprisingly interesting problem. So like, it turns out something as simple as rounding numbers consistently across a distributed system at this scale is a non-trivial problem. And if you don't, then you do get small seventh or eighth decimal place differences that add up to something that then shows up as a two-cent difference somewhere. And so, there's some really, really interesting problems in the space. And I think the team often takes these kinds of things as a personal challenge. It should be correct, and it's not, so we should go make sure it is correct. The interesting problems abound here, but at the end of the day, it's the kind of thing that any engineering team wants to go and make sure it's correct because they know that it can be. Corey: This episode is sponsored in part by our friends at EnterpriseDB. EnterpriseDB has been powering enterprise applications with PostgreSQL for 15 years. And now EnterpriseDB has you covered wherever you deploy PostgreSQL: on premises, private cloud, and they just announced a fully managed service on AWS and Azure called BigAnimal, all one word. Don't leave managing your database to your cloud vendor because they're too busy launching another half dozen managed databases to focus on any one of them that they didn't build themselves. Instead, work with the experts over at EnterpriseDB.
They can save you time and money, they can even help you migrate legacy applications, including Oracle, to the cloud. To learn more, try BigAnimal for free. Go to biganimal.com/snark, and tell them Corey sent you. Corey: On the one hand, I love people who just round and estimate—we all do that, let's be clear; I sit there and I back-of-the-envelope everything first. But then I look at some of your pricing pages and I count the digits after the zeros. Like, you're talking about trillionths of a dollar on some of your pricing points. And you add it up in the course of a given hour and it's like, oh, it's $250 a month, most months. And you work backwards to way more decimal places of precision than is required, sometimes. I'm also a personal fan of the bill that counts, for example, number of Route 53 zones. Great. And it counts them to four decimal places of precision. Like, I don't even know what half of a Route 53 zone is at this point, let alone something to, like, ah the 1,000th of the zone is going to cause this. It's all an artifact of what the underlying systems are. Can you by any chance shed a little light on what the evolution of those systems has been over a period of time? I have to imagine that anything you built in the early days, 16 years ago or so from the time of this recording when S3 launched to general availability, you probably didn't have to worry about this scope and scale of what you do now. In fact, I suspect if you tried to funnel this volume through S3 back then, the whole thing would have collapsed under its own weight. What's evolved over the time that you had the billing system there? Because changes come slowly to your environment. And frankly, I appreciate that as a customer. I don't like surprising people in finance. James: Yeah, you're totally right. So, I joined the EC2 team as an engineer myself, some 16 years ago, and the very first thing that I did was our billing integration.
And so, my relationship with the Commerce Platform organization—what was the billing team way back when—it goes back over my entire career at AWS. And at the time, the billing team was similar, you know, [unintelligible 00:28:34] eight people. And that was everything. There was none of the scale and complexity; it was all one system.And much like many of our biggest, oldest services—EC2 is very similar, S3 is as well—there's been significant growth over the last decade-and-a-half. A lot of that growth has been rapid, and rapid growth presents its own challenges. And you live with decisions that you make early on that you didn't realize were significant decisions that have pretty deep implications 15 years later. We're still working through some of those; they present their own challenges. Evolving an existing system to keep up with the growth of business and a customer base that's as varied and complex as ours is always challenging.And also harder but I also think more fun than a clean sheet redo at this point. Like, that's a great thought exercise for, well, if we got to do this again today, what would we do now that we've learned so much over the last 15 years? But there's this—I find it personally fascinating challenge with evolving a live system where it's like, “No, no, like, things exist, so how do we go from there to where we want to be next?”Corey: Turn the billing system off for 18 months, rebuild—James: Yeah. [laugh].Corey: The whole thing from first principles. Light it up. I'm sure you'd have a much better billing system, and also not a company left anymore.James: [laugh]. Exactly, exactly. I've always enjoyed that challenge. 
You know, even prior to AWS, my previous careers have involved similar kinds of constraints where you've got a live system, or you've got an existing—in the one case, it was an existing SDK that was deployed to tens of thousands of customers around the world, and so backwards compatibility was something that I spent the first five years of my career thinking about in way more detail than I think most people do. And it's a very similar mindset. And I enjoy that challenge. I enjoy that: How do I evolve from here to there without breaking customers along the way? And that's something that we take pretty seriously across AWS. I think SimpleDB is the poster child for we never turn things off. But that applies equally to the services that are maybe less visible to customers, and billing is definitely one of them. Like, we don't get to switch stuff off. We don't get to throw things away and start again. It's this constant state of evolution. Corey: So, let's say that I were to find a way to route data through a series of two Managed NAT Gateways and then egress to internet, and the sheer density of the expense of that traffic tears a hole in the fabric of space-time, it goes back 15 years ago, and you can make a single change to how the billing system was built. What would it be? What pisses you off the most about the current constraints that you have to work within or around? James: I think one of the biggest challenges we've got, actually, is the concept of an account. Because an account means half-a-dozen different things. And way back, when it seemed like a great idea, you just needed an account; an account was your customer, and it was the same thing as the boundary that you put all your resources inside. And of course, it's the same thing that you're going to roll all of your usage up and issue a bill against.
And that has been one of the areas that's seen the most evolution and probably still has a pretty long way to go.And what's interesting about that is, that's probably something we could have seen coming because we watched the retail business go through, kind of, the same evolution because they started with, well, a customer is a customer is a customer and had to evolve to support the concept of sellers and partners. And then users are different than customers, and you want to log in and that's a different thing. So, we saw that kind of bifurcation of a single entity into a wide range of different related but separate entities, and I think if we'd looked at that, you know, thought out 15 years, then yeah, we could probably have learned something from that. But at the same time, when AWS first kicked off, we had wild ambitions for it, but there was no guarantee that it was going to be the monster that it is today. So, I'm always a little bit reluctant to—like, it's a great thought exercise, but it's easy to end up second-guessing a pretty successful 15 years, so I'm always a little bit careful to walk that line. But I think account is one of the things that we would probably go back and think about a little bit more.Corey: I want to be very clear with this next question that it is intentionally setting up a question I suspect you get a lot. It does not mirror my own thinking on the matter even slightly, but I get a version of it myself all the time. “AWS bills, that sounds boring as hell. Why would you choose to work on such a thing?” Now, I have a laundry list of answers to that aren't nearly as interesting as I suspect yours are going to be. What makes working on this problem space interesting to you?James: There's a bunch of different things. So, first and foremost, the scale that we're talking about here is absolutely mind-blowing. 
And for any engineer who wants to get stuck into problems that deal with mind-blowingly large volumes of data, incredibly rich dimensions, problems where, honestly, applying techniques like statistical reasoning or machine learning is really the only way to chip away at it, that exists in spades in the space. It's not always immediately obvious, and I think from the outside, it's easy to assume this is actually pretty simple. So, the scale is a huge part of that. Corey: “Oh, petabytes. How quaint.” James: [laugh]. Exactly. Exactly. I mean, it's mind-blowing every time I see some of the numbers in various parts of the Commerce Platform space. I talked about quadrillions earlier. Trillions is a pretty common unit of measure. The complexity that I talked about earlier, that's a result of external environments is another one. So, imposed by external entities, whether it's a government or a tax authority somewhere, or a business requirement from customers, or ourselves. I enjoy those as well. Those are different kinds of challenge. They really keep you on your toes. I enjoy thinking of them as an engineering problem, like, how do I get in front of them? And that's something we spend a lot of time doing in Commerce Platform. And when we get it right, customers are just unaware of it. And then the third one is, I personally am always attracted to the opportunity to have an impact. And this is a space where we get to hopefully positively impact every single customer every day. And that, to me, is pretty fulfilling. Those are kind of the three standout reasons why I think this is actually a super-exciting space. And I think it's often an underestimated space. I think once folks join the team and sort of start to dig in, I've never heard anybody after they've joined, telling me that what they're doing is boring. Challenging, yes. Frustrating, sometimes.
Hard, absolutely, but boring never comes up.Corey: There's almost no service, other than IAM, that I can think of that impacts every customer simultaneously. And it's easy for me to sit in the cheap seats and say, “Oh, you should change this,” or, “You should change that.” But every change you have is so massive in scale that it's going to break a whole bunch of companies' automations around the bill processing in different ways. You have an entire category of user persona who is used to clicking a certain button in this certain place in the console to generate the report every month, and if that button moves or changes color, or has a different font, suddenly that renders their documentation invalid, and they're scrambling because it's not their core competency—nor should it be—and every change you make is so constricted, just based upon all the different concerns that you've got to be juggling with. How do you get anything done at all? I find that to be one of the most impressive aspects about your organization, bar none.James: Yeah, I'm not going to lie and say that it isn't a challenge, but a lot of it comes down to the talent that we have on the team. We have a super-motivated, super-smart, super-engaged team, and we spend a lot of time figuring out how to make sure that we can keep moving, keep up with the business, keep up with a world that's getting more complicated [laugh] with every passing day. So, you've kind of hit on one of the core challenges there, which is, how do we keep up with all of those different dimensions that are demanding an increasing amount of engineering and new support and new investment from us, while we keep those customers happy?And I think you touched on something else a little bit indirectly there, which is, a lot of our customers are actually pretty technical across AWS. 
The customers that Commerce Platform supports are often the least technical of our customers, and so often need the most help understanding why things are the way they are, where the constraints are. Corey: “A big bill from Amazon. How many books did you people buy last month?”—James: [laugh]. Exactly. Corey: —is still very much the level of understanding in some cases. And it's not because they're dumb; far from it. It's just, imagine that some people view there as being more to life than understanding the nuances and intricacies of cloud computing. How dare they? James: Exactly. Who would have thought? Corey: So, as you look now over all of your domain, such as it is, what sucks the most? What are you looking to fix as far as impactful changes that the rest of the world might experience? Because I'm not going to accept one of those questions like, “Oh, yeah, on the back-end, we have this storage subsystem for a tertiary thing that just annoys me because it wakes us up once in a whi”—no, no, I want something customer-facing. What's the painful thing you're looking at fixing next? James: I don't like surprising customers. And free tier is, sort of, one of those buckets of surprises, but there are others. Another one that's pretty squarely in my sights is, whether we like it or not, customer accounts get compromised. Usually, it's a password that got reused somewhere or was accidentally committed into a GitHub repository somewhere. And we have pretty established, pretty effective mechanisms for finding all of those: we'll scan for passwords and credentials, and alert customers to those, and help them correct that pretty quickly. We're also actually pretty good at detecting when an account does start to do something that suggests that it's been compromised. Usually, the first thing that a compromised account starts to do is cryptocurrency mining.
We're pretty quick to catch those; we catch those within a matter of hours, much faster most days. What we haven't really cracked and where I'm focused at the moment is getting back to the customer in a way that's effective. And by that I mean specifically, we detect an account compromise super-quickly, we reach out automatically. And so, you know, a customer has got some kind of contact from us usually within a couple of hours. It's not having the effect that we need it to. Customers are still being surprised a month later by a large bill. And so, we're digging into how much of that is because they never saw the contact, or they didn't know what to do with the contact. Corey: It got buried with all the other, “Hey, we saw you spun up an S3 bucket. Have you heard of what S3 is?” Again, that's all valuable, but you have 300-some-odd services. If you start doing that for every service, you're going to hit mail sending limits for Gmail. James: Exactly. It's not just enough that we detect those and notify customers; we have to reduce the size of the surprise. It's one thing to spend 100 bucks a month on average, and then suddenly find that your spend has jumped to $250 because you reused the password somewhere and somebody got ahold of it and it's cryptocurrency-mining your account. It's a whole different ballgame to spend 100 bucks a month and then at the end of the month discover that your bill is suddenly $2,000 or $20,000. And so, that's something that I really wanted to make some progress on this year. Corey: I've really enjoyed our conversation. If people want to learn more about how you view these things, how you're approaching some of these problems, or potentially are just the right kind of warped to consider joining up, where's the best place for them to go? James: They should drop me an email at jamesg@amazon.com. That is the most direct way to get hold of me, and I promise I will get back to you. I try to stay on top of my email as much as possible.
But that will come straight to me, and I'm always happy to talk to folks about the space, talk to folks about opportunities in this team, opportunities across AWS, or just hear what's not working, make sure that it's something that we're aware of and looking at. Corey: Throughout Amazon, but particularly within Commerce Platform, I've always appreciated the response of, whenever I report something, no matter how ridiculous it is—and I assure you there's an awful lot of ridiculousness in my bug reports—the response has always been the same: “Tell me more. Help me understand what it is you're trying to achieve—even if it is ridiculous—so we can look at this and see what is actually going on.” Every Amazonian team has been great about that or you're not at Amazon very long, but you folks have taken that to an otherworldly level. I just want to thank you for doing that. James: I appreciate you for calling that out. We try, you know, we really do. We take listening to our customers very seriously because, at the end of the day, that's what makes us better, and that's how we make sure we're in it for the long haul. Corey: Thanks once again for being so generous with your time. I really appreciate it. James: Yeah, thanks for having me on. I've enjoyed it. Corey: James Greenfield, VP of Commerce Platform at AWS. I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice along with an angry comment—possibly on YouTube as well—about how you aren't actually giving this five-stars at all; you have taken three trillionths of a star off of the rating. Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying.
The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started. Announcer: This has been a HumblePod production. Stay humble.

The Cloud Pod
164: The Cloud Pod SWIFT-ly Moves Its Money to Google Cloud

The Cloud Pod

Play Episode Listen Later May 16, 2022 42:45


On The Cloud Pod this week, Peter's been suspended without pay for two weeks for not filing his vacation requests in triplicate. Plus it's earnings season once again, there's a major Google and SWIFT collaboration afoot, and MSK Serverless is now generally available, making Kafka management fairly hassle-free. A big thanks to this week's sponsor, Foghorn Consulting, which provides full-stack cloud solutions with a focus on strategy, planning and execution for enterprises seeking to take advantage of the transformative capabilities of AWS, Google Cloud and Azure. This week's highlights

airhacks.fm podcast with adam bien
Real World Enterprise Serverless Java on AWS Cloud

airhacks.fm podcast with adam bien

Play Episode Listen Later May 15, 2022 66:41


An airhacks.fm conversation with Goran Opacic (@goranopacic) about: sales force automation at ehsteh, Palm Pilot syncing, starting a SaaS company, Hetzner, Azure, then AWS, running EC2 machines, going serverless, Kubernetes and the clouds, running MicroProfile applications on Quarkus and AWS Lambda, one code base - multiple lambdas, Lambda runs on Firecracker VM, OkHttp on Lambdas, tree shaking with GraalVM, AWS CodeArtifact to cache Maven repositories, Amazon ECR, AWS CodeCommit, databases are hard to split, AWS CodeDeploy with scheduler, code hot swap, managed services is serverless, running AWS Fargate on spot instances, using Eclipse BIRT on AWS Lambda, Goran is an AWS Data Hero, Goran Opacic on twitter: @goranopacic, Goran's blog: madabout.cloud

Software Sessions
Ant Wilson on Supabase

May 11, 2022 · 59:08


This episode originally aired on Software Engineering Radio.
A few topics covered
  • Building on top of open source
  • Forking their GoTrue dependency
  • Relying on Postgres features like row level security
  • Adding realtime support based on Postgres's write ahead log
  • Generating an API layer based on the database schema with PostgREST
  • Creating separate EC2 instances for each customer's database
  • How Postgres could scale in the future
  • Monitoring Postgres
  • Common support tickets
  • Permissive open source licenses
Related Links: @antwilson, Supabase, Supabase GitHub, Firebase, Airtable, PostgREST, GoTrue, Elixir, Prometheus, VictoriaMetrics, Logflare, BigQuery, Netlify, Y Combinator, Postgres, PostgreSQL, Write-Ahead Logging, Row Security Policies, pg_stat_statements, pgAdmin, PostGIS, Amazon Aurora
Transcript
[00:00:00] Jeremy: Today I'm talking to Ant Wilson, he's the co-founder and CTO of Supabase. Ant, welcome to Software Engineering Radio.
[00:00:07] Ant: Thanks so much. Great to be here.
[00:00:09] Jeremy: When I hear about Supabase, I always hear about it in relation to two other products. The first is Postgres, which is an open source relational database. And second is Firebase, which is a backend as a service product from Google Cloud that provides a NoSQL data store. It provides authentication and authorization. It has a functions as a service component. It's really meant to be a replacement for you needing to have your own server, create your own backend. You can have that all be done from Firebase. I think a good place for us to start would be walking us through what Supabase is and how it relates to those two products.
[00:00:55] Ant: Yeah. So, so we brand ourselves as the open source Firebase alternative. That came primarily from the fact that we ourselves did use it as the alternative to Firebase. So, so my co-founder Paul in his previous startup was using Firestore. 
And as they started to scale, they hit certain limitations, technical scaling limitations, and he'd always been a huge Postgres fan. So he swapped it out for Postgres and then just started plugging in the bits that were missing, like the real-time streams. Um, he used the tool called PostgREST, with a T, for the CRUD APIs. And so he just built like the open source Firebase alternative on Postgres, and that's kind of where the tagline came from. But the main difference obviously is that it's a relational database and not a NoSQL database, which means that it's not actually a drop-in replacement. But it does mean that it kind of opens the door to a lot more functionality actually. Um, which, which is hopefully an advantage for us.
[00:02:03] Jeremy: It's a, a hosted form of Postgres. So you mentioned that Firebase is, is different. It's, uh, NoSQL. People are putting in their, their JSON objects and things like that. So when people are working with Supabase, is the experience of, is it just, I'm connecting to a Postgres database, I'm writing SQL? And in that regard, it's kind of not really similar to Firebase at all. Is that, is that kind of right?
[00:02:31] Ant: Yeah, I mean, the other thing, the other important thing to note is that you can communicate with Supabase directly from the client, which is what people love about Firebase. You just like put the credentials on the client and you write some security rules, and then you just start sending your data. Obviously with Supabase, you do need to create your schema because it's relational. But apart from that, the experience of client side development is very much the same or very similar. The interface, obviously the API is a little bit different. But, but it's similar in that regard. But I, I think, like I said, we're moving, we are just a database company actually. And the tagline just explained really well kind of the concept of, of what it is, like a backend as a service. It has the real-time streams. 
It has the auth layer. It has the auto-generated APIs. So I don't know how long we'll stick with the tagline. I think we'll probably outgrow it at some point. Um, but it does do a good job of communicating roughly what the service is.
[00:03:39] Jeremy: So when we talk about it being similar to Firebase, the part that's similar to Firebase is that you could be a person building the front end part of the website, and you don't need to necessarily have a backend application because all of that could talk to Supabase, and Supabase can handle the authentication, the real-time notifications, all those sorts of things, similar to Firebase, where basically you only need to write the front end part, and then you have to know how to, to set up Supabase in this case.
[00:04:14] Ant: Yeah, exactly. And some of the other, like we took, we love Firebase, by the way. We're not building an alternative to try and destroy it. It's kind of like, we're just building the SQL alternative and we take a lot of inspiration from it. And the other thing we love is that you can administer your database from the browser. So you go into Firebase and you have the, you can see the object tree, and when you're in development, you can edit some of the documents in real time. And, and so we took that experience and effectively built like a spreadsheet view inside of our dashboard. And also obviously have a SQL editor in there as well. And trying to create this, this like similar developer experience, because that's where Firebase just excels. The DX is incredible. And so we, we take a lot of inspiration from it in, in those respects.
[00:05:08] Jeremy: And to make it clear to our listeners as well. When you talk about this interface, that's kind of like a spreadsheet and things like that. I suppose it's similar to somebody opening up pgAdmin, I suppose, and going in and editing the rows. 
But, but maybe you've got like another layer on top that just makes it a little more user-friendly, a little bit more like something you would get from Firebase, I guess.
[00:05:33] Ant: Yeah. And, you know, we, we take a lot of inspiration from pgAdmin. pgAdmin is also open source. So I think we've contributed a few things, or are trying to upstream a few things into pgAdmin. The other thing that we took a lot of inspiration from for the table editor, what we call it, is Airtable. And because Airtable is effectively a relational database, in that you can just come in and, you know, click to add your columns, click to add a new table. And so we just want to reproduce that experience, again, backed up by a full Postgres dedicated database.
[00:06:13] Jeremy: So when you're working with a Postgres database, normally you need some kind of layer in front of it, right? The person can't open up their website and connect directly to Postgres from their browser. And you mentioned PostgREST before. I wonder if you could explain a little bit about what that is and how it works.
[00:06:34] Ant: Yeah, definitely. So yeah, PostgREST has been around for a while. Um, it's basically a server that you connect to your Postgres database and it introspects your schemas and generates an API for you based on the table names, the column names. And then you can basically then communicate with your Postgres database via this RESTful API. So you can do pretty much most of the filtering operations that you can do in SQL, um, uh, equality filters. You can even do full text search over the API. So it just means that whenever you obviously add a new table or a new schema or a new column, the API just updates instantly. So you, you don't have to worry about writing that, that middle layer, which was always the drag, right? Whenever you started a new project, it's like, okay, I've got my schema, I've got my client. 
Now I have to do all the connecting code in the middle, which is kind of, yeah, no developers should need to write that layer in 2022.
[00:07:46] Jeremy: So this layer you're referring to, when I think of a traditional web application, I think of having to write routes, controllers and, and create this, this sort of structure where I know all the tables in my database, but the controllers I create may not map one to one with those tables. And so you mentioned a little bit about how PostgREST looks at the schema and starts to build an API automatically. And I wonder if you could explain a little bit about how it does those mappings or if you're writing those yourself.
[00:08:21] Ant: Yeah, it basically does them automatically. By default, it will, you know, map every table, every column. When you want to start restricting things, well, there's two, there's two parts to this. There's one thing which I'm sure we'll get into, which is how is this secure since you are communicating direct from the client. But the other part is what you mentioned, giving like a reduced view of a particular bit of data. And for that, we just use Postgres views. So you define a view which might be, you know, it might have joins across a couple of different tables or it might just be a limited set of columns on one of your tables. And then you can choose to just expose that view.
[00:09:05] Jeremy: So it sounds like where you would typically create a controller and create a route, instead you create a view within your Postgres database and then PostgREST can take that view and create an endpoint for it, map it to that.
[00:09:21] Ant: Yeah, exactly (laughs).
[00:09:24] Jeremy: And, and PostgREST is an open source project. Right. I wonder if you could talk a little bit about sort of what its, its history was. How did you come to choose it?
[00:09:37] Ant: Yeah. I think, I think Paul probably read about it on Hacker News at some point. 
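Editor's sketch of the generated API described in this part of the conversation. PostgREST serves each table or view at a path named after it and takes filters as `column=operator.value` query parameters. The code below only builds such a URL and sends nothing; the base URL and the `active_customers` view with its columns are hypothetical names chosen for illustration, not anything mentioned in the interview.

```python
from urllib.parse import urlencode


def postgrest_url(base, relation, select=None, **filters):
    """Build a PostgREST-style read URL for a table or view.

    `filters` maps a column name to an (operator, value) pair,
    for example plan=("eq", "pro") or age=("gte", 18).
    """
    params = {}
    if select:
        # Limit the returned columns, like a SELECT column list.
        params["select"] = ",".join(select)
    for column, (operator, value) in filters.items():
        params[column] = f"{operator}.{value}"
    # Keep commas readable instead of percent-encoding them.
    return f"{base}/{relation}?{urlencode(params, safe=',')}"


url = postgrest_url(
    "https://example.invalid/rest/v1",  # hypothetical endpoint
    "active_customers",                 # hypothetical Postgres view
    select=["id", "name"],
    plan=("eq", "pro"),
)
print(url)
```

Exposing a view rather than a table, as Ant describes, changes nothing on the client side: the view name simply takes the place of the table name in the path.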
Anytime it appears on Hacker News, it just gets voted to the front page because it's, it's so awesome. And we, we got connected to the maintainer, Steve Chavez. At some point I think he just took an interest in, or we took an interest in PostgREST and we kind of got acquainted. And then we found out that, you know, Steve was open to work and this kind of like probably shaped a lot of the way we think about building out Supabase as a project and as a company, in that we then decided to employ Steve full time, but just to work on PostgREST, because it's obviously a huge benefit for us. We're very reliant on it. We want it to succeed because it helps our business. And then as we started to add the other components, we decided that we would then always look for existing tools, existing open source projects, before we decided to build something from scratch. So as we're starting to try and replicate the features of Firebase we would, and auth is a great example. We did a full audit of what are all the authorization, authentication open-source tools that are out there and which one, if any, would fit best. And we found that Netlify had built a library called GoTrue, written in Go, which did pretty much exactly what we needed. So we just adopted that. And now obviously, you know, we, we just have a lot of people on the team contributing to, to GoTrue as well.
[00:11:17] Jeremy: You touched on this a little bit earlier. Normally when you connect to a Postgres database your user has permission to, to basically everything I guess, by default, anyways. And so. So, how does that work? 
Where when you want to restrict people's permissions, make sure they only get to see records they're allowed to see, how is that all configured in PostgREST and what's happening behind the scenes?
[00:11:44] Ant: Yeah, we, the great thing about Postgres is it's got this concept of row level security, which actually, I don't think I even really looked at until we were building out this auth feature, where the security rules live in your database as SQL. So you do like a create policy query, and you say anytime someone tries to select or insert or update, apply this policy. And then how it all fits together is our auth server, GoTrue. Someone will basically make a request to sign in or sign up with email and password, and we create that user inside the database. They get issued a UUID. And they get issued a JSON web token, a JWT, which, you know, when they, when they have it on the client side, proves that they are this UUID, they have access to this data. Then when they make a request via PostgREST, they send the JWT in the authorization header. Then Postgres will pull out that JWT, check the sub claim, which is the UUID, and compare it to any rows in the database, according to the policy that you wrote. So, so the most basic one is you say in order to, to access this row, it must have a column UUID and it must match whatever is in the JWT. So we basically push the authorization down into the database, which actually has, you know, a lot of other benefits in that as you write new clients, you don't need to have, have it live, you know, on an API layer on the client. It's kind of just, everything is managed from the database.
[00:13:33] Jeremy: So the, the UUID you mentioned, that represents the user, correct?
[00:13:39] Ant: Yeah.
[00:13:41] Jeremy: Is that, does that map to a user in Postgres or is there some other way that you're mapping those permissions?
[00:13:50] Ant: Yeah. 
When, so when you connect GoTrue, which is the auth server, to your Postgres database for the first time, it installs its own schema. So you'll have an auth schema and inside will be auth.users with a list of the users. It'll have a, uh, auth.tokens which will store all the access tokens that it's issued. So, and one of the columns on the auth.users table will be UUID, and then whenever you write application specific schemas, you can just join a, do a foreign key relation to the auth.users table. So, so it all gets into schema design and, and hopefully we do a good job of having some good education content in the docs as well. Because one of the things we struggled with from the start was how much do we abstract away from SQL, away from Postgres, and how much do we educate? And we actually landed on the educate side because, I mean, once you start learning about Postgres, it becomes kind of a superpower for you as a developer. So we'd much rather have people discover us because we're a Firebase alternative for frontend devs, then we help them with things like schema design, learning about row level security. Because ultimately like every, if you try and abstract that stuff it gets kind of crappy. And maybe not such a great experience.
[00:15:20] Jeremy: To make sure I understand correctly. So you have GoTrue, which is, uh, a Netlify open-source project. That GoTrue project creates some tables in your, your database that has, like you've mentioned, the tokens, the, the different users. Somebody makes a request to GoTrue. Like here's my username, my password. GoTrue gives them back a JWT. And then from your front end, you send that JWT to the PostgREST endpoint. And from that JWT, it's able to know which user you are and then uses Postgres' built-in row level security to figure out which rows you're, you're allowed to bring back. Did I, did I get that right?
[00:16:07] Ant: That is pretty much exactly how it works. 
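Editor's sketch of the flow Jeremy has just summarized: an auth server signs a JWT whose `sub` claim identifies the user, and a row policy compares that claim with an owner column. This is a plain-Python stand-in using only the standard library. In the setup Ant describes, the policy is written in SQL and enforced inside Postgres. The secret, the `user_id` column name, and the sample rows are all assumptions made for the illustration.

```python
import base64
import hashlib
import hmac
import json


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWT segments require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def b64url_decode(text: str) -> bytes:
    """Decode a base64url segment, restoring the stripped padding."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sign_jwt(claims: dict, secret: bytes) -> str:
    """Stand-in for the auth server: issue an HS256-signed token."""
    header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = b64url(json.dumps(claims).encode())
    signature = hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{b64url(signature)}"


def verified_claims(token: str, secret: bytes) -> dict:
    """Check the signature, then return the decoded claims."""
    header, payload, signature = token.split(".")
    expected = hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, b64url_decode(signature)):
        raise ValueError("bad signature")
    return json.loads(b64url_decode(payload))


def visible_rows(rows, token, secret):
    """Mimic a policy of the form: the row's user_id must equal the JWT sub claim."""
    sub = verified_claims(token, secret)["sub"]
    return [row for row in rows if row["user_id"] == sub]


SECRET = b"demo-secret"  # hypothetical signing secret
rows = [
    {"id": 1, "user_id": "u-1", "note": "mine"},
    {"id": 2, "user_id": "u-2", "note": "someone else's"},
]
token = sign_jwt({"sub": "u-1"}, SECRET)
print(visible_rows(rows, token, SECRET))
```

Keeping the check next to the data is the design point Ant makes: any new client gets the same rule without re-implementing it in an API layer.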
And it's impressive that you garnered that without looking at a single diagram (laughs). But yeah, and, and, and obviously we, we provide a client library, supabase-js, which actually does a lot of this work for you. So you don't need to manually attach the JWT in a header. If you've authenticated with supabase-js, then for every request sent to PostgREST after that point, the header will just be attached automatically, and you'll be in a session as that user.
[00:16:43] Jeremy: And, and the users that we're talking about when we talk about Postgres' row level security. Are those actual users in PostgreSQL? Like if I was to log in with psql, I could actually log in with those users?
[00:17:00] Ant: They're not. You could potentially structure it that way. But it would be more advanced. It's, it's basically just users in, in the auth.users table, the way, the way it's currently done.
[00:17:12] Jeremy: I see, and PostgREST has the, that row level security is able to work with that table. You, you don't need to have actual Postgres users.
[00:17:23] Ant: Exactly. And, and it's, it's basically Turing complete. I mean, you can write extremely complex auth policies. You can say, you know, only give access to this particular admin group on a Thursday afternoon between six and 8:00 PM. You can get really, yeah, really as fancy as you want.
[00:17:44] Jeremy: Is that all written in SQL or are there other languages they allow you to use?
[00:17:50] Ant: Yeah. It's, the default is plain SQL. Within Postgres itself, you can use, I think you can use, like there's a Python extension. There's a JavaScript extension, which is a, I think it's a subset of, of JavaScript. I mean, this is the thing with Postgres, it's super extensible and people have probably got all kinds of interpreters. So you, yeah, you can use whatever you want, but the typical user will just use SQL.
[00:18:17] Jeremy: Interesting. 
And that applies to logic in general, I suppose, where if you were writing a Rails application, you might write Ruby. Um, if you're writing a Node application, you write JavaScript, but you're, you're saying in a lot of cases with PostgREST, you're actually able to do what you want to do, whether that's serialization or mapping objects, do that all through SQL.
[00:18:44] Ant: Yeah, exactly, exactly. And then obviously like there's a lot of awesome other stuff that Postgres has, like there's PostGIS, which if you're doing geo, if you've got like a geo application, it'll load it up with geo types for you, which you can just use. If you're doing like encryption and decryption, we just added PG libsodium, which is a new and awesome cryptography extension. And so you can use all of these, these all add like functions, like SQL functions, which you can kind of use in, in any parts of the logic or in the row level policies. Yeah.
[00:19:22] Jeremy: And something I thought was a little unique about PostgREST is that I believe it's written in Haskell. Is that right?
[00:19:29] Ant: Yeah, exactly. And it makes it fairly inaccessible to me as a result. But the good thing is it's got a thriving community of its own and, you know, there's people who contribute probably because it's written in Haskell. And it's, it's just a really awesome project and it's an excuse to, to contribute to it. But yeah. I, I think I did probably the intro course, like many people, and beyond that, it's just, yeah, kind of inaccessible to me.
[00:19:59] Jeremy: Yeah, I suppose that's the trade-off, right? Is you have a, a really passionate community about like people who really want to use Haskell and then you've got the, the, I guess the group like yourselves that looks at it and goes, oh, I don't, I don't know about this.
[00:20:13] Ant: I would, I would love to have the time to, to invest in it, uh, but not practical right now. 
[00:20:21] Jeremy: You talked a little bit about the GoTrue project from Netlify. I think I saw on one of your blog posts that you actually forked it. Can you sort of explain the reasoning behind doing that?
[00:20:34] Ant: Yeah, initially it was because we were trying to move extremely fast. So, so we did Y Combinator in 2020. And when you do Y Combinator, you get like a part, a group partner, they call it, one of the, the partners from YC, and they add a huge amount of external pressure to move very quickly. And, and our biggest feature that we were working on in that period was auth. And we just kept getting the question of like, when are you going to ship auth? You know, and every single week we'd be like, we're working on it, we're working on it. And um, and one of the ways we could do it was we just had to iterate extremely quickly and we didn't really have the time to, to upstream things correctly. And actually like the way we use it in our stack is slightly different. They connected to MySQL, we connected to Postgres. So we had to make some structural changes to do that. And the dream would be now that we, we spend some time upstreaming a lot of the changes. And hopefully we do get around to that. But the, yeah, the pace at which we've had to move over the last, uh, year and a half has been kind of scary and, and that's the main reason, but you know, hopefully now we're a little bit more established. We can hire some more people to, to just focus on GoTrue and, and bringing the two forks back together.
[00:22:01] Jeremy: It's just a matter of, like you said, speed, I suppose, because with PostgREST you, you chose to continue working off of the existing open source project, right?
[00:22:15] Ant: Yeah, exactly. Exactly. And I think the other thing is it's not a major part of Netlify's business, as I understand it. 
I think if it was, and if both companies had more resource behind it, it would make sense to obviously focus on, on the single codebase, but I think both companies don't contribute as much resource as, as we would like to, but um, but it's, it's for me, it's, it's one of my favorite parts of the stack to work on because it's written in Go and I kind of enjoy how that all fits together. So yeah. I, I like to dive in there.
[00:22:55] Jeremy: What about Go, or what about how it's structured, do you particularly enjoy about the, that part of the project?
[00:23:02] Ant: I think it's, so I actually learned Go through GoTrue and I'm, I have like a Python and C++ background. And I hate the fact that I don't get to use Python and C++ really in my day to day job. It's obviously a lot of TypeScript. And then when we inherited this code base, it was kind of, as I was picking it up I, it just reminded me a lot of, you know, a lot of the things I loved about Python and C++, and, and the tooling around it as well I just found to be exceptional. So, you know, you just do like a small amount of config, and it makes it very difficult to, to write bad code, if that makes sense. So the compiler will just boot you back if you try and do something silly, which isn't necessarily the case with, with JavaScript. I think TypeScript is a little bit better now, but yeah, I just, it just reminded me a lot of my Python and C days.
[00:24:01] Jeremy: Yeah, I'm not too familiar with Go, but my understanding is that there's, there's a formatter that's a part of the language, so there's kind of a consistency there. And then the language itself tries to get people to, to build things in the same way, or maybe have simpler ways of building things. Um, I don't, I don't know. Maybe that's part of the appeal.
[00:24:25] Ant: Yeah, exactly. And the package manager as well is great. It just does a lot of the importing automatically. 
And makes sure like all the declarations at the top are formatted correctly and, and are definitely there. So yeah, just all of that tool chain is just really easy to pick up.
[00:24:46] Jeremy: Yeah. And I, and I think compiled languages as well, when you have the static type checking by the compiler, you know, not having things blow up at runtime. That's, that's just such a big relief, at least for me in a lot of cases.
[00:25:00] Ant: And I just love the dopamine hit of when you compile something and it actually compiles. I lose that with, with working with JavaScript.
[00:25:11] Jeremy: For sure. One of the topics you mentioned earlier was how Supabase provides real-time database updates. And which is something that as far as I know is not natively a part of Postgres. So I wonder if you could explain a little bit about how that works and how that came about.
[00:25:31] Ant: Yeah. So, so Postgres, when you add replication databases, the way it does it is it writes everything to this thing called the write ahead log, which is basically all the changes that, uh, have, are going to be applied to, to the database. And when you connect to like a replication database, it basically streams that log across. And that's how the replica knows what, what changes to, to add. So we wrote a server which basically pretends to be a Postgres rep, replica, receives the write ahead log, encodes it into JSON. And then you can subscribe to that server over web sockets. And so you can choose whether to subscribe to changes on a particular schema or a particular table or particular columns, and even do equality matches on rows and things like this. And then we recently added the row level security policies to the real-time stream as well. So that was something that took us a while to, cause it was probably one of the largest technical challenges we've faced. 
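Editor's sketch of the subscription matching Ant describes: changes decoded from the write ahead log arrive as JSON-like records, and each subscriber receives only those matching its schema, its table, and any equality filters on row values. The event shape (`schema`, `table`, `type`, `record`) and all sample values are assumptions for illustration, not the wire format of Supabase's real-time server.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Subscription:
    """One client's interest in a stream of decoded change events."""
    schema: str
    table: Optional[str] = None                 # None means every table in the schema
    equals: dict = field(default_factory=dict)  # column -> required value

    def matches(self, change: dict) -> bool:
        if change["schema"] != self.schema:
            return False
        if self.table is not None and change["table"] != self.table:
            return False
        record = change.get("record", {})
        # Every equality filter must hold on the changed row.
        return all(record.get(column) == value for column, value in self.equals.items())


def route(changes, subscription):
    """Return the changes this subscriber should receive, in order."""
    return [change for change in changes if subscription.matches(change)]


# Hypothetical decoded change events.
changes = [
    {"schema": "public", "table": "messages", "type": "INSERT", "record": {"id": 1, "room": "a"}},
    {"schema": "public", "table": "messages", "type": "INSERT", "record": {"id": 2, "room": "b"}},
    {"schema": "public", "table": "profiles", "type": "UPDATE", "record": {"id": 7}},
]
room_a = Subscription(schema="public", table="messages", equals={"room": "a"})
print(route(changes, room_a))
```

Applying row level security to the stream, which Ant calls one of their hardest problems, would add a per-subscriber policy check on top of this filtering. The sketch leaves that out.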
But now that it's in, the real-time stream is, is fully secure and you can apply these, these same policies that you apply over the CRUD API as well.
[00:26:48] Jeremy: So for that part, did you have to look into the internals of Postgres and how it did its row level security and try to duplicate that in your own code?
[00:26:59] Ant: Yeah, pretty much. I mean it's, yeah, it's fairly complex and there's a guy on our team who, well, for him, it didn't seem as complex, let's say (laughs), but yeah, that's pretty much it. It's just a lot of, it's effectively a SQL, um, a Postgres extension itself, uh, which interprets those policies and applies them to, to the, to the write ahead log.
[00:27:26] Jeremy: And this piece that you wrote, that's listening to the write ahead log. What was it written in and, and how did you choose that, that language or that stack?
[00:27:36] Ant: Yeah. That's written in the Elixir framework, which is based on Erlang, very horizontally scalable. So any applications that you write in Elixir can kind of just scale horizontally, the message passing and, you know, go into the billions and it's no problem. So it just seemed like a sensible choice for this type of application where you don't know how large the WAL is going to be. So it could just be like a few changes per second. It could be a million changes per second, then you need to be able to scale out. And I think Paul, who's my co-founder, originally, he wrote the first version of it and I think he wrote it as an excuse to learn Elixir, which is how, a lot of, probably how PostgREST ended up being Haskell, I imagine. But uh, but it's meant that the Elixir community is still like relatively small. But it's a group of like very passionate and very, um, highly skilled developers. So when we hire from that pool, everyone who comes on board is just like, yeah, just, just really good and really enjoys working with Elixir. So it's been a good source of, a good source for hires as well. 
Just, just using those tools.
[00:28:53] Jeremy: With a feature like this, I'm assuming it's where somebody goes to their website. They make a web socket connection to your application and they receive the updates that way. How have you seen how far you're able to push that in terms of connections, in terms of throughput, things like that?
[00:29:12] Ant: Yeah, I don't actually have the numbers at hand. But we have, yeah, we have a team focused on obviously maximizing that, but yeah, I don't, I don't have those numbers right now.
[00:29:24] Jeremy: One of the last things you've, you've got on your website is a storage project, or a storage product, I should say. And I believe it's written in TypeScript, so I was curious, we've got PostgREST, which is in Haskell. We've got GoTrue in Go. Uh, we've got the real-time database part in Elixir. And so with storage, how did we finally get to TypeScript?
[00:29:50] Ant: (Laughs) Well, the policy we kind of landed on was best tool for the job. Again, the good thing about being open source is we're not resource constrained by the number of people who are in our team. It's by the number of people who are in the community and willing to contribute. And so for that, I think one of the guys just went through a few different options that we could have went with, Go just to keep it in line with a couple of the other APIs. But we just decided, you know, a lot of people, well, everyone in the team, like TypeScript is kind of just a given. And, and again, it was kind of down to speed, like what's the fastest, uh, we can get this up and running. And I think if we use TypeScript, it was, it was the best solution there. But yeah, but we just always go with whatever is best. Um, we don't worry too much, uh, about, you know, the resources we have because the open source community has just been so great in helping us build Supabase. 
And building Supabase is like building like five companies at the same time actually, because each of these vertical stacks could be its own startup, like the auth stack and the storage layer, and all of this stuff. And you know, each has, it does have its own dedicated team. So yeah. So we're not too worried about the variation in languages.
[00:31:13] Jeremy: And the storage layer, is this basically a wrapper around S3 or like what is that product doing?
[00:31:21] Ant: Yeah, exactly. It's, it's a wrapper around S3. It, it would also work with all of the S3 compatible storage systems. There's a few, Backblaze and a few others. So if you wanted to self host and use one of those alternatives, you could. We just have everything in our own S3 buckets inside of AWS. And then the other awesome thing about the storage system is that because we store the metadata inside of Postgres, so basically the object tree of what buckets and folders and files are there, you can write your row level policies against the object tree. So you can say this, this user should only access this folder and its, and its children, which was kind of, kind of an accident. We just landed on that. But it's one of my favorite things now about writing applications on Supabase, is the row level policies kind of work everywhere.
[00:32:21] Jeremy: Yeah, it's interesting. It sounds like everything, whether it's the storage or the authentication, it all comes back to Postgres, right? It's using the row level security. It's using everything that you put into the tables there, and everything's just kind of digging into that to get what it needs.
[00:32:42] Ant: Yeah. And that's why I say we are a database company. We are a Postgres company. We're all in on Postgres. We got asked in the early days, oh, well, would you also make it MySQL compatible, compatible with something else? And, but the amount of features Postgres has, if we just like continue to leverage them then it, it just makes the stack way more powerful than if we try to, you know, go thin across multiple different databases.
[00:33:16] Jeremy: And so that, that kind of brings me to, you mentioned how you're a Postgres company, so when somebody signs up for Supabase, they create their first instance. What's, what's happening behind the scenes? Are you creating a Postgres instance for them in a container, for example, how do you size it? That sort of thing.
[00:33:37] Ant: Yeah. So it's basically just EC2 under the hood for us. We, we have plans eventually to be multi-cloud. But again, going down to the speed of execution, the fastest way was to just spin up a dedicated instance, a dedicated Postgres instance per user on EC2. We do also package all of the APIs together in a second EC2 instance. But we're starting to break those out into clustered services. So for example, you know, not every user will use the storage API, so it doesn't make sense to run it for every user regardless. So we've, we've made that multitenant, the application code, and now we just run a huge global cluster which people connect through to access the S3 bucket. Basically, and we're gonna, we have plans to do that for the other services as well. So right now it's, you got two EC2 instances. But over time it will be just the Postgres instance and, and we wanted to give everyone a dedicated instance, because there's nothing worse than sharing database resource with all the users, especially when you don't know how heavily they're going to use it, whether they're going to be bursty. So I think one of the things we just said from the start is everyone gets a Postgres instance and you get access to it as well. You can use your Postgres connection string to, to log in from the command line and kind of do whatever you want. It's yours.
[00:35:12] Jeremy: So did it, did I get it right? 
That when I sign up and create a Supabase account, you're actually creating an EC2 instance for me specifically. So every customer gets their own isolated instance: their own CPU, their own RAM, that sort of thing.

[00:35:29] Ant: Yeah, exactly. And the way we've set up the monitoring as well is that we can expose basically all of that to you in the dashboard. So you have some control over the resource you want to use. If you want a more powerful instance, we can do that. A lot of that stuff is automated. So if someone scales beyond the allocated disk size, the disk will automatically scale up by 50% each time. And we're working on automating a bunch of these other things as well.

[00:36:03] Jeremy: So is it where, when you first create the account, you might create, for example, a micro instance, and then you have internal monitoring tools that see, oh, the CPU is getting hit pretty hard, so we need to migrate this person to a bigger instance, that kind of thing?

[00:36:22] Ant: Yeah, pretty much exactly.

[00:36:25] Jeremy: And is that something that the user would even see, or is it the case where you send them an email and go, "Hey, we notice you're hitting the limits here. Here's what's going to happen"?

[00:36:37] Ant: Yeah. In most cases it's handled automatically. There are people who come in and from day one they say, "Here's my requirements. I'm going to have this much traffic, and I'm going to have a hundred thousand users hitting this every hour." And in those cases we will over-provision from the start. But if it's just the self-service case, then it will start on a smaller instance and upgrade over time. And this is one of our biggest challenges over the next five years: we want to move to a more scalable Postgres. So cloud native Postgres.
But the cool thing about this is there's a lot of different companies and individuals working on this and upstreaming into Postgres itself. So for us, we don't need to, and we would never want to, fork Postgres and try to separate the storage and the compute. Rather, we're going to fund people who are already working on this so that it gets upstreamed into Postgres itself, and it's more cloud native.

[00:37:46] Jeremy: Yeah. So we talked a little bit about how Firebase was the original inspiration, and when you work with Firebase, you don't think about an instance at all, right? You just put data in, you get data out. And it sounds like in this case, you're working from the standpoint of, we're going to give you this single Postgres instance. As you hit the limits, we'll give you a bigger one. But at some point you will hit a limit where just that one instance is not enough. And I wonder if you have any plans for that, or if you're doing anything currently to handle that.

[00:38:28] Ant: Yeah. So the medium-term goal is to do replication, like horizontal scaling. We do that for some users already, but we manually set that up. We do want to bring that to the self-serve model as well, where you can just choose from the start: I want replicas in these zones and in these different data centers. But then, like I said, the long-term goal is that it's not based on horizontally scaling a number of instances; it's just that Postgres itself can scale out.
And I think we will get there. Honestly, at the rate at which the Postgres community is working, I think we'll be there in two years. And if we can contribute resource towards that goal, we'd love to do that. But for now, we're working on this intermediate solution of what people already do with Postgres, which is have read replicas to make it highly available.

[00:39:30] Jeremy: And with that, I suppose at least in the short term, the goal is that your monitoring software and your team are handling scaling up the instance or creating the read replicas. So to the user, it for the most part feels like a managed service. And then the next step would be to get something more similar to maybe Amazon's Aurora, I suppose, where you just pay per use.

[00:40:01] Ant: Yeah, exactly. Aurora was kind of the goal from the start. It's just a shame that it's proprietary, obviously.

[00:40:08] Jeremy: Right. But it sounds...

[00:40:10] Ant: The world would be a better place if Aurora was open source.

[00:40:15] Jeremy: Yeah. And it sounds like, as you said, there's people in the open source community that are trying to get there. It'll just take time. With all this, about making it feel seamless, making it feel like a serverless experience even though internally it really isn't, I'm guessing you must have a fair amount of monitoring or ways that you're making these decisions. I wonder if you can talk a little bit about what metrics you're looking at and what applications you have to help you make these decisions.

[00:40:48] Ant: Yeah, definitely. So we started with Prometheus, which is a metrics gathering tool. And then we moved to VictoriaMetrics, which was just easier for us to scale out. I think soon we'll be managing like a hundred thousand Postgres databases that will have been deployed on Supabase.
So definitely some scale. So this kind of tooling needs to scale to that as well. And then we have agents kind of everywhere, on each application and on the database itself. And we listen for things like the CPU and the RAM and the network I/O. We also poll Postgres itself. There's an extension called pg_stat_statements, which will give us information about what the intensive queries are that are running on that box. So we just collect as much of this as possible, which we then obviously use internally. We set alerts to know when we need to upgrade in a certain direction, but we also have an endpoint where the dashboard subscribes to these metrics as well, so the users themselves can see a lot of this information. I think at the moment we do a lot of the RAM, the CPU, that kind of stuff, but we're working on adding more and more of these observability metrics so people can know, because it also helps with, let's say you might be lacking an index on a particular table and not know about it. And so if we can expose that to you and give you alerts about that kind of thing, then it obviously helps with the developer experience as well.

[00:42:29] Jeremy: Yeah. And that brings me to something that I hear from platform-as-a-service companies, where if a user has a problem, whether that's a crash or a performance problem, sometimes it can be difficult to distinguish between, is it a problem in their application or is this a problem in Supabase? And I wonder how your support team approaches that.

[00:42:52] Ant: Yeah, it's a great question. And it's definitely something we deal with every day. I think because of where we're at as a company, we've always seen that we actually have a huge advantage in that we can provide really good support.
So anytime an engineer joins Supabase, we tell them, "Your primary job is actually frontline support. Everything you do afterwards is secondary." And so everyone does a four-hour shift per week of working directly with the customers to help determine this kind of thing. And where we are at the moment is we are happy to dive in and help people with their application code, because it helps our engineers learn about how it's being used and where the pitfalls are, where we need better documentation, where we need education. So that is all part of the product at the moment, actually. And like I said, because we're not a 10,000-person company, it's an advantage that we have, that we can deliver that level of support at the moment.

[00:44:01] Jeremy: What are some of the most common things you see happening? You mentioned indexing problems, but I'm wondering if there's any specific things that just come up again and again.

[00:44:15] Ant: I think the most common is people not batching their requests. So they'll write an application which needs to pull 10,000 rows, and they send 10,000 requests (laughs). That's a typical one for people just getting started, maybe. And then I think the other thing we faced in the early days was people storing blobs in the database, which we obviously solved by introducing file storage. But people would be trying to store 50-megabyte, hundred-megabyte files in Postgres itself, and then asking why the performance was so bad. So I think we've mitigated that one by introducing the blob storage.

[00:45:03] Jeremy: And you mentioned you have over a hundred thousand instances running. I imagine there have to be cases where an incident occurs, where something doesn't go quite right. And I wonder if you could give an example of one and how it was resolved.

[00:45:24] Ant: Yeah, it's a good question.
I think, yeah, we've improved the systems since then, but there was a period where our real-time server wasn't able to handle really large write-ahead logs. So there was a period where people would just make tons and tons of requests and updates to Postgres, and the real-time subscriptions were failing. But like I said, we have some really great Elixir devs on the team, so they were able to jump on that fairly quickly. And now the application is way more scalable as a result. And that's just kind of how the support model works: you have a period where everything is breaking, and then you can just tackle these things one by one.

[00:46:15] Jeremy: Yeah, I think anybody at an early startup is going to run into that, right? You put it out there and then you find out what's broken, you fix it, and you just get better and better as it goes along.

[00:46:28] Ant: Yeah. And the funny thing was this model of deploying EC2 instances. We had that in like the first week of starting Supabase, just me and Paul. And it was never intended to be the final solution. We just kind of did it quickly to get something up and running for our first handful of users. But it's scaled surprisingly well. And actually the things that broke as we started to get a lot of traffic and a lot of attention were just silly things. Like, we give everyone their own domain when they start a new project, so you'll have project-ref dot supabase dot in or co. And the things that were breaking were like, we'd run out of subdomains with our DNS provider. And those things always happen in periods of intense traffic. So we were on the front page of Hacker News, or we had a TechCrunch article, and then you discover that you've run out of subdomains and the last thousand people couldn't deploy their projects.
So that's always a fun challenge, because you are then dependent on the external providers as well, and their support systems. So yeah, I think we did a surprisingly good job of putting in good infrastructure from the start. But all of these crazy things just break when you get a lot of traffic.

[00:48:00] Jeremy: Yeah, I find it interesting that you mentioned how you started with creating the EC2 instances and it turned out that just worked. I wonder if you could walk me through a little bit about how it worked in the beginning. Was it the two of you going in and creating instances as people signed up? And then how did it go from there to where it is today?

[00:48:20] Ant: Yeah. So there's a good story about our first user, actually. Me and Paul used to contract for a company in Singapore, which was an NFT company. And so we knew the lead developer very well, and we also still had the Postgres credentials on our own machines. And the other funny thing is, when we first started, we didn't intend to host the database. We thought we were just going to host the applications that would connect to your existing Postgres instance. And so what we did was we hooked up the applications to the Postgres instance of this startup that we knew very well. And then we took the bus to their office and we sat with the lead developer, and we said, "Look, we've already set this thing up for you. What do you think?" You know when you think, ah, we've got the best thing ever, but it's not until you put it in front of someone and you see them contemplating it and you're like, oh, maybe it's not so good. Maybe we don't have anything. And we had that moment of panic of like, oh, maybe this isn't great. And then what happened was he didn't use us. He didn't become a Supabase user. He asked to join the team.
[00:49:45] Jeremy: Nice, nice.

[00:49:46] Ant: That was a good moment where we thought, okay, maybe we have got something, maybe this isn't terrible. So yeah, he became our first employee.

[00:49:59] Jeremy: And so that case was the very beginning, where you set everything up from scratch. Now that you have people signing up, and I don't know how many signups you get a day, did you write custom infrastructure or applications to do the provisioning, or is there an open source project that you're using to handle that?

[00:50:21] Ant: Yeah. It's actually mostly custom. And AWS does a lot of the heavy lifting for you. They just provide you with a bunch of API endpoints. So a lot of that is just written in TypeScript, fairly straightforward. And like I said, it was never intended to be the thing that lasts two years into the business. But it's just scaled surprisingly well. And I'm sure at some point we'll swap it out for some orchestration tooling like Pulumi or something like this. But actually, what we've got just works really well.

[00:50:59] Ant: Because we're so into Postgres, our queuing system is a Postgres extension called pg-boss. And then we have a fleet of workers which we manage on ECS. So it's just a bunch of VMs, basically, which subscribe to the queue, which lives inside the database, and perform all the work, whether it be a project creation, deletion, modification, a whole suite of these things.

[00:51:29] Jeremy: Very cool. And so even your provisioning is based on Postgres.

[00:51:33] Ant: Yeah, exactly (laughs).

[00:51:36] Jeremy: I guess in that case, did you say you're using the write-ahead log there in order to get notifications?

[00:51:44] Ant: We do use real time, and this is the fun thing about building Supabase: we use Supabase to build Supabase.
And a lot of the features start with things that we build for ourselves. So, the observability features. We have a huge logging division. We were very early users of a tool called Logflare, which is also written in Elixir. It's basically a log sink backed by BigQuery. And we loved it so much, and we became such Logflare power users, that we decided to eventually acquire the company. And now we can just offer Logflare to all of our customers as well, as part of using Supabase. So you can query your logs and get really good business intelligence on what your users are consuming from your database.

[00:52:35] Jeremy: The Logflare you're mentioning, though, you said that that's a log sink, and that's actually not going to Postgres, right? That's going to a different type of store.

[00:52:43] Ant: Yeah. That is going to BigQuery, actually.

[00:52:46] Jeremy: Oh, BigQuery. Okay.

[00:52:47] Ant: Yeah. And maybe eventually, and this is the cool thing about watching the Postgres progression, it's bringing transactional and analytical databases together. It's traditionally been a great transactional database, but if you look at a lot of the changes that have been made in recent versions, it's becoming closer and closer to an analytical database. So maybe at some point we will use it, but BigQuery works just great.

[00:53:18] Jeremy: Yeah. It's interesting to see. I know that we've had episodes on different extensions to Postgres where I believe they change out how the storage works. So it's really interesting how it's this one database, but it seems like it can take so many different forms.

[00:53:36] Ant: It's just so extensible, and that's why we're so bullish on it. Okay, maybe it wasn't always the best database, but now it seems like it is becoming the best database, and the rate at which it's moving, it's like, where's it going to be in five years?
And we're just very bullish on Postgres, as you can tell from the amount of mentions it's had in this episode.

[00:54:01] Jeremy: Yeah, we'll have to count how many times it's been said. I'm sure it's up there. Is there anything else we missed or that you think you should have mentioned?

[00:54:12] Ant: Some of the things we're excited about are cloud functions. It's the thing we just get asked for the most. Anytime we post anything on Twitter, you're guaranteed to get a reply which is like, "When functions?" And we're very pleased to say that it's almost there. So that will hopefully be a really good developer experience. Also, we launched a GraphQL Postgres extension where the resolver lives inside of Postgres. And that's still in early alpha, but I'm quite excited for when we can start offering that on the hosted platform as well. People will have that option to use GraphQL instead of, or as well as, the RESTful API.

[00:55:02] Jeremy: The common thread here is that with PostgreSQL you're able to take it really, really far, right? In terms of scale up, eventually you'll have the read replicas. Hopefully you'll have some kind of, I don't know what you would call Aurora, but it's almost like self-provisioning, maybe. Not sure how you'd describe it. But I wonder, as a company, like we talked about BigQuery, right? I wonder if there's any use cases that you've come across, either from customers or in your own work, where you're like, I just can't get it to fit into Postgres.

[00:55:38] Ant: Not very often, but sometimes we will respond to support requests and recommend that people use Firebase. If they really do have large amounts of unstructured data, document storage is kind of perfect for that.
We'll just say, maybe you should just use Firebase. So we definitely come across things like that. And like I said, we love Firebase, so we're definitely not trying to destroy it as a tool. I think it has its use cases where it's an incredible tool, and it provides a lot of inspiration for what we're building as well.

[00:56:28] Jeremy: All right. Well, I think that's a good place to wrap it up. But where can people hear more about you and hear more about Supabase?

[00:56:38] Ant: Yeah, so Supabase is at supabase.com. I'm on Twitter at Ant Wilson. Supabase is on Twitter at supabase. Just hit us up. We're quite active on there. And then definitely check out the repos at github.com/supabase. There's lots of great stuff to dig into. As we discussed, there's a lot of different languages, so whatever you're into, you'll probably find something where you can contribute.

[00:57:04] Jeremy: Yeah, and we sort of touched on this, but I think everything we've talked about, with the exception of the provisioning part and the monitoring part, is all open source. Is that correct?

[00:57:16] Ant: Yeah, exactly. And hopefully everything we build moving forward, including functions and GraphQL, will continue to be open source.

[00:57:31] Jeremy: And then I suppose the one thing I did mean to touch on is, what is the license for all the components you're using that are open source?

[00:57:41] Ant: It's mostly Apache 2 or MIT. And then obviously Postgres has its own Postgres license. So as long as it's one of those, then we're not too precious. As I said, we inherit a fair amount of projects. So we contribute to and adopt projects.
So as long as it's just very permissive, then we don't care too much.

[00:58:05] Jeremy: As far as the projects that your team has worked on, I've noticed that over the years we've seen a lot of companies move to things like the Business Source License, or there's all these different licenses that are not quite so permissive. And I wonder what your thoughts are on that for the future of your company and why you think that you'll be able to stay permissive.

[00:58:32] Ant: Yeah, I really, really, really hope that we can stay permissive forever. It's a philosophical thing for us. When we started the business, we were just very, as individuals, into the idea of open source. And if AWS come along at some point and offer hosted Supabase on AWS, then it will be a signal that we're doing something right. And at that point, I think we just need to be the best team to continue to move Supabase forward. And if we are that, and I think we will be, then hopefully we will never have to tackle this licensing issue.

[00:59:19] Jeremy: All right. Well, I wish you luck.

[00:59:23] Ant: Thanks. Thanks for having me.

[00:59:25] Jeremy: This has been Jeremy Jung for Software Engineering Radio. Thanks for listening.
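
Editor's note on the batching problem Ant describes in this episode (an application that needs 10,000 rows and sends 10,000 requests): the usual fix is to group the ids and make one round trip per group. What follows is a minimal sketch, not Supabase's client API. The names `chunked`, `fetch_all`, and `fetch_batch` are hypothetical, and `fetch_batch` stands in for whatever single call returns the rows for a list of ids.

```python
from typing import Callable, Iterator, Sequence


def chunked(ids: Sequence[int], size: int) -> Iterator[Sequence[int]]:
    """Yield consecutive slices of `ids`, each at most `size` long."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def fetch_all(ids: Sequence[int],
              fetch_batch: Callable[[Sequence[int]], list],
              batch_size: int = 1000) -> list:
    """Fetch rows for `ids` with one call per batch instead of one per id."""
    rows: list = []
    for batch in chunked(ids, batch_size):
        # `fetch_batch` is a hypothetical stand-in for a single round trip.
        rows.extend(fetch_batch(batch))
    return rows


if __name__ == "__main__":
    request_log = []

    def fake_fetch(batch):
        # Records each simulated request and returns one fake row per id.
        request_log.append(len(batch))
        return [{"id": i} for i in batch]

    rows = fetch_all(list(range(10_000)), fake_fetch)
    print(len(rows), "rows in", len(request_log), "requests")
```

With the default batch size of 1,000, the 10,000 ids above are fetched in 10 simulated requests rather than 10,000. The right batch size in practice depends on limits this sketch does not model, such as URL length or payload size.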

Le Podcast AWS en Français

Between two AWS Summit conferences, the AWS news from the last two weeks that caught my attention revolves around RDS databases, Kafka, and yet another new family of EC2 instances.

Screaming in the Cloud
Automating in Pre-Container Times with Michael DeHaan

May 5, 2022 40:46


About Michael

Michael is the creator of IT automation platforms Cobbler and Ansible, the latter allegedly used by ~60% of the Fortune 500, and at one time one of the top 10 contributed-to projects on GitHub.

Links Referenced:
Speaking Tech: https://michaeldehaan.substack.com/
michaeldehaan.net: https://michaeldehaan.net
Twitter: https://twitter.com/laserllama

Transcript

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.

Corey: This episode is sponsored by our friends at Revelo. Revelo is the Spanish word of the day, and it's spelled R-E-V-E-L-O. It means “I reveal.” Now, have you tried to hire an engineer lately? I assure you it is significantly harder than it sounds. One of the things that Revelo has recognized is something I've been talking about for a while, specifically that while talent is evenly distributed, opportunity is absolutely not. They're exposing a new talent pool to, basically, those of us without a presence in Latin America via their platform. It's the largest tech talent marketplace in Latin America with over a million engineers in their network, which includes—but isn't limited to—talent in Mexico, Costa Rica, Brazil, and Argentina. Now, not only do they wind up spreading all of their talent on English ability, as well as you know, their engineering skills, but they go significantly beyond that. Some of the folks on their platform are hands down the most talented engineers that I've ever spoken to. Let's also not forget that Latin America has high time zone overlap with what we have here in the United States, so you can hire full-time remote engineers who share most of the workday as your team.
It's an end-to-end talent service, so you can find and hire engineers in Central and South America without having to worry about, frankly, the colossal pain of cross-border payroll and benefits and compliance because Revelo handles all of it. If you're hiring engineers, check out revelo.io/screaming to get 20% off your first three months. That's R-E-V-E-L-O dot I-O slash screaming.

Corey: This episode is sponsored in part by LaunchDarkly. Take a look at what it takes to get your code into production. I'm going to just guess that it's awful because it's always awful. No one loves their deployment process. What if launching new features didn't require you to do a full-on code and possibly infrastructure deploy? What if you could test on a small subset of users and then roll it back immediately if results aren't what you expect? LaunchDarkly does exactly this. To learn more, visit launchdarkly.com and tell them Corey sent you, and watch for the wince.

Corey: Once upon a time, Docker came out and changed an entire industry forever. But believe it or not, for many of you, this predates your involvement in the space. There was a time where we had to manage computer systems ourselves with our hands—kind of—like in the prehistoric days, chiseling bits onto disk and whatnot. It was an area crying out for automation, as we started using more and more computers to run various websites. “Oh, that's a big website. It needs three servers now.” Et cetera.

The times have changed rather significantly. One of the formative voices in that era was Michael DeHaan, who's joining me today, originally one of the—or if not the creator of Cobbler, and later—for which you became better known—Ansible. First, thanks for joining me.

Michael: Thank you for having me. You're also making me feel very, very old there. So, uh, yes.

Corey: I hear you. I keep telling people, I'm in my mid-30s, and my wife gets incensed because I'm turning 40 in July. But still.
I go for the idea of yeah, the middle is expanding all the time, but it's always disturbing talking to people who are in our sector, who are younger than some of the code that we're using, which is just bizarre to me. We're all standing on the backs of giants. Like it or not, one of them's you.

Michael: Oh, well, thank you. Thank you very much. Yeah, I was, like, talking to some undergrads, I was doing a little bit of stuff helping out my alma mater for a little bit, and teaching somebody the REST lecture. I was like, “In another year, REST is going to be older than everybody in the room.” And then I was just kind of… scared.

Corey: Yeah. It's been a wild ride for basically everyone who's been around long enough if you don't fall off the teeter-totter and wind up breaking a limb somewhere. So, back in the bad old days, before cloud, things weren't constrained by how much room you had on your credit card like they are today with cloud, but instead by things like how much space you had in the data center, what kind of purchase order you could ram through your various accounting departments. And one of the big problems you have is, great. So, finally—never on time—Dell has shipped out a whole bunch of servers—or HP or Supermicro or whoever—and the remote hands—which is always distinct from smart hands, which says something very insulting, but they seem to be good about it—would put them into racks for you.

And great, so you'd walk in and see all of these brand new servers with nothing on them. How do we go ahead and configure these things? And by hand was how most of us started, and that means, oh, great, we're going to screw things up and not do them all quite the same, and it's just a treasure and a joy. Cobbler was something that you came up with that revolutionized how provisioning of bare-metal systems worked. Tell me about it.

Michael: Yeah, um, so it's basically just glue.
So, the story of how I came up with that is I was working for the Emerging Technologies Group at Red Hat, and I just joined. And they were like, “We have to have a solution to install Xen and KVM virtual machines.” So obviously, everybody's familiar with, like, EC2 and things now, but this was about people running non-VMware virtualization themselves. So, that was part of the problem, but in order to make that interesting, we really needed to have some automation around bare-metal installs.

And that's PXE boot. So, it's TFTP and DHCP protocol and all that kind of boring stuff. And there was glue that existed, but it was usually humans would have to click on buttons to—like Red Hat had system-config-netboot, but what really happened was sysadmins all wrote their own automation at, like, every single company. And the idea that I had, and it was sort of cemented by the fact that, like, my boss, a really good guy left for another company and I didn't have a boss for, like, a couple years, was like, I'm just going to make IRC my boss, and let's get all these admins together and build a tool we can share, right?

So, that was a really good experience, and it's just basically gluing all that stuff together to fully automate an install over a network so that when a system comes on, you can either pick it out from a menu; or maybe you've already got the MAC address and you can just say, “When you see this MAC address, go install this operating system.” And there's a kickstart file, or a preseed in the case of Debian, that says, “When you're booting up through the installer, basically, here's just the answers and go do these things.” And that install process is a lot slower than what we're used to, but for a bare-metal machine, that's a pretty good way to do it.

Corey: Yeah, it got to a point where you could walk through and just turn on all the servers in a rack and go out to lunch, come back, they would all be configured and ready to go.
And it sounds relatively basic the way we're talking about it now, but there were some gnarly cases. Like, “When I've rebooted the database server, why did it wipe itself and reprovision?” And it's, “Oh, dear.” And you have to make sure that things are—that there's a safety built into these things.

And you also don't want to have to wind up plugging in a keyboard and monitor to all of these individual machines one-by-one to hit yes and acknowledge the thing. And it was a colossal pain in the ass. That's one of the things that cloud has freed us from.

Michael: Yeah, definitely. And one of the nice things about the whole cloud environment is like, if you want to experiment with those ideas, like, I want to set up some DHCP or DNS, I don't have to have this massive lab and all the electricity and costs. But like, if I want to play with a load balancer, I can just get one. That kind of gives the experience of playing with all these data center technologies to everybody, which is pretty cool.

Corey: On some level, you can almost view the history of all these things as speeding things up. With a well-tuned Cobbler install, it still took multiple minutes, in some cases, tens of minutes to go from machine you're powering on to getting it provisioned and ready to go. Virtual machines dropped that down to minutes. And cloud, of course, accelerated that a bit. But then you wind up with things like Docker and it gets down to less than a second. It's the mean time to dopamine.

But in between the world of containers and bare-metal, there was another project—again, the one you're best known for—Ansible. Tell me about that because I have opinions on this whole space.

Michael: [laugh]. Yeah. So, how Ansible got started—well, I guess configuration management is pretty old, so the people writing their own scripts, CFEngine came out, Puppet was a much better CFEngine. I was working at a company and I kind of wanted another open-source project because I enjoyed the Cobbler experience.
So, I started Ansible on the side, kind of based on some frustrations around Puppet but also the desire to unify Capistrano kind of logic, which was like, “How do I push out my apps onto these servers that are already running,” with Puppet-style logic, which was like, “Is this computer's firewall configured correctly? And is the time set correctly?”
And you can obviously use that to install apps, but there's some places where that blurred together where a lot of people are using two different tools. And there's some prior art that I worked on called Func, which I wrote with Seth Vidal and Adrian Likins at Red Hat, which was, like, 50% of the Ansible idea, and we just never built the config management layer on top. So, the idea was make something really, really simple that just uses SSH, which was controversial at the time because people thought it, like, wouldn't scale, because I was having trouble with setting up Puppet security because, like, it had DNS or timing issues or whatever.
Corey: Yeah. Let's dive in a bit to what config management is first because it turns out that not everyone was living in the trenches in quite the same way that we were. I was a traveling trainer for Puppet for a summer once, and the best descriptor I found to explain it to people who are not in this space was, “All right, let's say that you go and you buy a new computer. What do you do? Well, you're going to install the applications you'd like to use, you're going to set up your own user account, you're going to set your password correctly, you're going to set up preferences, copy some files over so you have the stuff you care about. Great. Now, imagine you need to do that to a thousand computers and they all need to be the same.
How do you do that?” Well, that is the world of configuration management.
And there was sort of a bifurcation there, where there was the idea of, first, we're going to have configuration management that just describes what the system should look like, and that's going to run on a schedule or whatnot, and then you're going to have the other side of it, which is the idea of remote execution, of I want to run an arbitrary command on this server, or this set of servers, or all the servers, depending upon what it is. And depending on which side of that world you started on, you wound up wanting things from the other side of that space. Puppet, for example, is very oriented toward configuration management and the question became, well, can you use this for remote execution with arbitrary commands? And they wound up doing some work with MCollective, which was a very complicated and expensive way to say, “No, not really.” There was a need for things that needed to hang out in that space.
The two that really stuck out from that era were Ansible, which had its wild runaway success, and the one that I was smacking around for a bit, SaltStack, which never saw anywhere approaching that level of popularity.
Michael: Yeah, sure. I mean, I think that you hit it pretty much exactly right. And it's hard to say what makes certain things take off, but I think, like, the just SSH approach was interesting because, well for one, everybody's running it. But there was this belief that this would not scale. And I tried to optimize the heck out of that because I liked performance, but it turns out that wasn't really a business problem because if you can imagine you just wrote this little bit of automation, and you're going to run it against your entire infrastructure and you've got 30,000 machines, do you want that to—if you were to, like, run an update command on 30,000 machines at once, you're going to DDoS something. Definitely, right?
Corey: Yeah.
Suddenly you have 30,000 machines all talking to the same things at the same time. And you want to do them in batches or smear it across.
Michael: Right, so because that was there, like, you just add batch support in Ansible and things are fine, right? People want to target little small groups of things. So, like, that whole story wasn't true, and I think it was just a matter of testing this belief that everybody thought that we needed to have this whole network of things. And honestly, Salt's idea of using a message bus is great, but we took a little bit different approach with YAML because we have YAML variables in it, but they had something that compiled down to YAML. And I think those are some differences in the dialect and some things other people preferred, but—
Corey: And they use Jinja, at one point to wind up making it effectively Turing complete; you could wind up having this ridicu—like, loop flow control and loops and the rest. And it was an interesting exposure to things, but yikes, at some l—at the same time.
Michael: If you use all the language features in anything you can make something complicated, and too complicated. And I was like, I wanted automation to look like grocery lists. And when I started out, I said, “Hey, if anybody is doing this all day, for a day job, I will have failed.” And it clearly shows you that I have because there are people that are doing that all day. And the goal was, let me concentrate on dev and ops and my other things and keep this really, really simple.
And some people just, like, get really, really into that automation technology, which is—in my opinion—why some of the earlier stuff was really popular because sysadmins were bored, so they see something new and it's kind of like a Java developer finding Perl for the first time.
They're like, “I'm going to use all these things.” And they have all their little widgets, and it gets, like, really complicated.
Corey: The thing that I always found interesting and terrifying at the same time about Ansible was the fact that you did ride on top of SSH, which is great because every company already had a way of controlling access by SSH to IT systems; everyone uses it, so it has an awful lot of eyes on the security protocol and the rest. The thing that I found terrifying in the early days was that more or less every ops person would wind up checking this out onto their laptop or whatnot, so whenever they wanted to run something, they would just run it from their laptop over a VPN or whatnot from wherever they happened to be, and you wind up with a dueling banjos type of circumstance where people were often not doing it from a centralized place. And in time, best practices emerged where, okay, that is going to be the command and control server where that runs, and you log into it. And then you start guarding that with CI/CD flows and the rest. And like anything else, it wound up building some operational approaches to it.
Michael: Yeah. Like, I kind of think that created a problem that allowed us to sell a product, right, which was good. If you knew what you were doing, you could use Jenkins completely and you'd be fine, right, if you had some level of discipline and access control, and you wanted to wire that up. And if you think about cloud, this whole, like, shadow IT idea of, “I just want to do this thing, therefore I'm going to get an Amazon account,” it's kind of the same thing. It's like, “I want to use this config management, but it's not approved. Who can stop me?” Right?
And that kind of probably got us in the door in a few accounts that way. But yeah, it did definitely create the problem where multiple people could be running things at the same time.
So yeah, I mean, that's true.
Corey: And the idea of, “Hey, maybe I should be controlling these things in Git,” or some other form of version control was sort of one of those evolutionary ideas that, oh, we could treat this like code. In the early days of DevOps, that was a controversial thing. These days, you say you're not doing it and people look at you very strangely. And things were going reasonably well in that direction for a while. Then this whole Docker thing showed up, where, well, what if instead of having these long-lived servers where you have to install updates and run patches and maintain a whole user list on them, instead you had this immutable infrastructure that every time there was a change, you would just go ahead and deploy a brand new set of servers?
And you could do this in the olden days with virtual machines and whatnot; it just took a long time to push things out, so do I really want to roll the entire fleet for a two-line config change? Probably not, so we're going to batch it up, or maybe do this hybrid model. With Docker, it takes less than a second to wind up provisioning the—switching over to the new container series and you're done; you can keep going with that. That really solved a lot of these problems.
But there were companies that, like, the entire configuration management space, who suddenly found themselves in a really weird position. Some of them tried to fight the tide forever and say, “Oh, this is terrible because it means we don't have a business model anymore.” But you can only fight the future for so long. And I think today, we'd be hard-pressed to say that Docker hasn't won, on some level.
Michael: I mean, I think it has, like, the technology has won. But I guess the interesting thing is, config management now seems to be trying to pivot towards networking where I think the tool hasn't ever been designed for networking, so it's kind of a round peg, square hole. But it's all people have that unless they're buying something.
Or, like, deploying the undercloud because, like, people are still running essentially clouds on top of clouds to get their Kubernetes deployments going and those are monstrous. Or maybe to deploy a data layer; like, I know Kafka has gotten off of ZooKeeper, but the Kafka-ZooKeeper thing—and I don't remember ZooKeeper [unintelligible 00:14:37] require [unintelligible 00:14:38] or not, but managing those sort of long, persistent implications, it still has a little bit of a place where it exists.
But I mean, I think the whole immutable systems idea is theoretically completely great. I never was really happy with the whole Docker development workflow, and I think it does create a problem where people don't know what they're deploying and you kind of encourage that to where they could be deploying different versions of libraries, or—and that's kind of just a problem of the whole microservices thing in general where, “Did somebody change this?” And then I was working very briefly at one company where we essentially built a whole dashboard to detect service versions and what version of the base image everybody was on, and all these other things, and it can get out of hand, too. So, it's kind of like trading some problems for other problems, I think to me. But in general, containerization is good. I just wished the management glue around it was easy, right?
Corey: I wound up giving a talk at a conference a while back, 2015 or so, called, “Heresy in the Church of Docker,” and it was a throwaway five-minute lightning talk, and someone approached me afterwards with, “Hey, can you give the full version of that at ContainerCon?” “There's a full version? Yes. Yes, I can.” And it talked about a number of problems with the management layer and the rest.
Now, Kubernetes absolutely solves virtually every problem that I identified with it, but when you look at the other side of it, getting Kubernetes rolled out is effectively you get to cosplay being a cloud provider yourself.
It is incredibly complicated, and of course, we're right back to managing it all with YAML.
Michael: Right. And I think that's an interesting point, too, is I don't know who's exactly responsible for, like, the YAML explosion. And I like it as a data format; it's really good for humans. Cobbler originally used it more as internal storage, which I think was a mistake because, like, even—I was trying to avoid setting up a database at the time, so—because I knew if I had to require setting up a database in 2007 or 2008, I'd get way less users, so it used flat files.
A lot of the YAML dialects people are developing now are very, very nested and they require, like, loading a webpage for the docs, like, all the time and reading what's valid here, what's valid there. I think people learned the wrong lesson from Ansible's YAML usage, right? It was supposed to be, like, YAML's good for things that are grocery lists. And there's a lot of places where I didn't do a good job. But when you see methods taking 15 parameters and you have to constantly have the reference up, maybe that's a sign that you should do something else.
Corey: At least you saved us, on some level, from having to do this all in XML. But still, there are wrong ways and more wrong ways to do it. I don't think anyone could ever agree on the right way to approach these things.
Michael: Yeah. I mean, and YAML, at the time was a good answer because I knew I didn't want to write and maintain a parser as, like, a guy that was running a project. We had a lot of awesome contributors, but if I had to also maintain a DSL, not only does that mean that I have to write the code for this thing—which I, you know, observed slowing down some other projects—but also that I'd have to explain it to people. Looking kind of like Bash was not a bad thing.
Not having to know and learn something, so you can kind of feel really effective in about 15 minutes or something like that.
Corey: One of the things that I find really interesting about you personally is that you were starting off in a bare-metal world; Ansible was sort of wherever you wanted to run it. Great, as long as there are systems that can receive these things, we're great. And now the world has changed, and for better or worse, configuration management slash remote execution is not the problem it once was and it is not a best practice way of solving a lot of those problems either. But you aren't spending your time basically remembering the glory years. You're actively moving forward doing some fairly interesting stuff. How would you describe what you're into these days?
Michael: I tried to create a few projects to, like, kind of build other systems management things for the same audience for a while. I was building a build server and a new—trying to do some next-gen config stuff. And I saw people weren't interested. But I like having conversations with people, right, and I think one of the lessons from Ansible was how to explain highly technical things to technical audiences and cut out a lot of the marketing goo and all that; how to get people excited about an idea and make a community be really authentic. So, I've been writing about that for really, it's—rebooted blog is only a couple of weeks old. But also kind of trying to do some—helping out companies with some, like, basic marketing kind of stuff, right?
There's just this pattern that everybody has where every website starts with this little basic slogan and two buttons and then there's a bunch of adjectives, but it doesn't say anything. So, how can you have really good documentation, and how can you explain an idea? Because, like, really, the reason you're in it is not just to sell stuff, but it's to help people and to see them get excited about your ideas.
And there's just, like, we're not doing a good job in this, like, world where there's thousands upon thousands of applications, all competing at once to, like—how do you rise above that?
Corey: And that's always the hard part is at some point, this does become your identity and you become known for a thing. And when you start branching out from that thing, you bring the expertise from that area that you were in, but you start applying it to new things. I feel like so many companies get focused—and people get focused—on assuming that their audience is just like them, where they're coming in with the exact same biases, the exact same experiences. And given that basically no one was as deep in the weeds as you were when it came to configuration management, that meant that you were spending time in that side of the world, not in other pursuits which aligned in some ways more directly with people developing other things. So, I suspect this might be one of the weird things we have in common when we show up and see something new.
And a company is really excited. It's like, it's basically a few people talking [unintelligible 00:20:12] that both founders are technical. And they're super excited about something they can't quite articulate. And it's this, “Slow down. Tell me exactly what it is your product does.” And that's a hard thing to do because my default response was always that if I don't understand, that is clearly the way in which I am deficient somehow. But marketing is really about clear communication and there's not that much of it in our space, at least not for early-stage companies.
Michael: Yeah, I don't know why that is. I mean, I think there's this belief in that there's, like, this buyer audience where there's some vice president that's going to buy your stuff if you drop the right buzzwords. And 15 years ago, like, you had to say ‘synergy,' and now you say ‘time to value' or ‘total cost of ownership' or something. And I don't think that's true.
I mean, I think people use products that they like and they need to be shown them to try them out.
So like, why can't your webpage have a diagram and a screenshot instead of this, like, picture of a couple of people drinking coffee around a computer, right? It's basic stuff. But I agree with you, I kind of feel dumb when I'm looking at all these tech products that I should be excited about, and, like, the way that we get there is we ask questions. And the way that I've actually figured out what some of these things do is usually having to ask questions from someone who uses them that I randomly find in my diminishing circle of friends, right? And that's kind of busted.
So, Ansible definitely had a lot of privilege in the way that it was launched in the sense that I launched it off Cobbler list and Cobbler list started off of ET Management Tools, which was a company list. But people can do things like meetup groups really easily, they can give talks, they can get their blogs reblogged, and, you know, hope for some Hacker News or Reddit juice or whatever. But in order to get that to happen, you have to be able to talk to engineers that really want to know what you're doing, and they should be excited about it. So, learn to talk to them.
Corey: You have to speak their language but without going so deep in the weeds that the only people that understand it are the folks who are never going to use your product because they want to build it themselves. It's a delicate balance to strike.
Michael: And it's a difficult thing to do, too, when you know about it. So, when I was, like, developing all the Ansible docs, I've told people many times—and I hope it's true—that I, like, spent, like, 40% of my time just on the website and the docs, and whenever I heard somebody complain, I tried to fix it. But the idea was like, you can lose somebody really fast, but you kind of have to forget what you know about the product.
So, the worst person to sometimes look at that is the person that built it. So, you have to forget what you know, and try to see, like, what questions they're asking, what do they need to find out? How do they want to learn something?
And for me, I want to see a lot of pictures. A lot of people write a bunch of giant walls of text, or worse for me is when there's just these little pithy expressions and I don't know what they mean, right? And everybody's, like, kind of doing that these days.
Corey: This episode is sponsored in part by our friends at ChaosSearch. You could run Elasticsearch or Elastic Cloud—or OpenSearch as they're calling it now—or a self-hosted ELK stack. But why? ChaosSearch gives you the same API you've come to know and tolerate, along with unlimited data retention and no data movement. Just throw your data into S3 and proceed from there as you would expect. This is great for IT operations folks, for app performance monitoring, cybersecurity. If you're using Elasticsearch, consider not running Elasticsearch. They're also available now in the AWS marketplace if you'd prefer not to go direct and have half of whatever you pay them count towards your EDP commitment. Discover what companies like Equifax, Armor Security, and Blackboard already have. To learn more, visit chaossearch.io and tell them I sent you just so you can see them facepalm, yet again.
Corey: One thing that I've really found myself enjoying recently has been your Substack-based newsletter, Speaking Tech is what you call it.
And I didn't quite know what to expect when I signed up for it, but it's been a few weeks now, and you are more or less hitting across the board on a bunch of different things, ranging from engineering design patterns, to a teardown of a random company's entire website from a marketing and messaging perspective—which I just adore personally; like that is very aligned with how I see the world—
Michael: There's more of that coming.
Corey: Yeah, [unintelligible 00:23:17] a bunch of other stuff. Let's talk about, for example, the idea of those teardowns. I always found that I have to be somewhat careful in how I talk about it when I'm doing a tweet thread or something like that because you are talking about people's work, let's be clear here, and I tend to be a lot kinder to small, early-stage companies than I am to, you know, $1.6 trillion companies who really should have solved for this by now, on some level. But so much of it misses the mark of great, here's the way that I think about these things. Here's the way that I don't understand what the hell you're telling me.
An easy example of this for me, at least I'm curious to get your thoughts on it, I tend to almost always just skim what they're saying, great. Let's look at the pricing page because I find that speaks to people in a way that very often companies forget that they're speaking to customers.
Michael: Yeah, for sure. I always tried to find the product page lately, and then, like, the product page now is, like, a regurgitation of the homepage. But it's what you said earlier. I think I try to stay nice to everybody, but it's good to show people how to understand things by counterexample, to some extent, right? Like, oh, I've got some stuff coming out—I don't know when this is actually going to get published—but next week, where I was like just taking random snippets of home pages, and like, “What's everybody doing with the header these days?”
And there's just, like, ridiculous amounts of copying going on.
But it's not just for, like, people's companies because everybody listening here isn't going to have a company. If you have a project and you wanted to get it noticed, right, I think, like, in the early days, the projects that I paid attention to and got excited about were often the ones that spent time on their website and their messaging and their experience. So, everybody kind of understands you have to write a good readme now but some of, like, the early Ruby crowd, for instance, did awesome, awesome web pages. They know how to pick out fonts, and I still don't know how to pick out fonts. But—
Corey: I ask someone good at those things. That's how I pick ‘em.
Michael: Yeah, yeah. That's not my job; get somebody that's good at that. But all that matters, right? So, if you do invest a little bit in not promoting yourself, not promoting your company, but trying to help people and communicate to them, you can build that audience around your thing and it makes it a lot more interesting.
Corey: There's so many great tools out there that I find on GitHub that other people have to either point me to or I find it when I'm looking at it from a code-first perspective, just trying to find a particular example of the library being used, where they do such a terrible job of describing the problem that they solve, and it doesn't take much; it takes a paragraph or two at most. Or the idea that, “Oh, yeah, here's a way to do this thing. Just go ahead and get your credential file somewhere else.” Great. Could you maybe link to an example of how to do that?
It's the basic stuff; assume that someone who isn't you might possibly want to use this. And I'm not even slightly suggesting that you wind up talking your way through how to do all of that. Just link to somewhere that has a good write-up of it and call it good. Just don't get in the way of people's first-time user experiences.
Michael: Yeah, for sure.
And—
Corey: For some reason, that's a radical thought.
Michael: Yeah, I think one of the things the industry has—well, not the industry; it's not their problem to solve, but, like, we don't really have a way for people to find what's cool and interesting anymore. So, various people have their own little lists on GitHub or whatever, but there's just so many people posting on the one or two forums people read and it goes by in a day. So, it's really, really hard to get attention. Even your own circle of followers isn't really logging into Twitter or anything, or LinkedIn. Or there's all the congratulations for your five years of Acme Corp kind of posts, and it's really, really hard to get attention.
And I feel for everybody, so like, if somebody like GitHub or Microsoft is listening, and you wanted to build, like, a dashboard of here's the cool 15 projects for the week, kind of thing where everybody would see it, and start spotlighting some of these really cool new things, that would be awesome, right?
Corey: Whenever you see those roundups, that was things like Kubernetes and Docker. And great, I don't think those projects need the help in the same way.
Michael: No, no, they don't. It's like maybe somebody's cool data thing, or a cool visualization, or the other thing that's—it's completely random, but I used to write fun graphics programs for fun or games and libraries. And I don't see that anymore, right? Maybe if you find it, you can look for it, but the things that get people excited about programming. Maybe they have no commercial value at all, but the way that people discover stuff is getting so consolidated; it's about Docker and Kubernetes. And everyone's talking about these three things, and if you're not Google or you're not Facebook, it's really—or Amazon, obviously—it's hard to get attention.
Corey: Open-source on some level has changed from a community perspective.
And part of it is because once upon a time, you could start with the very low-level stuff and build something, get it up and working. And that's where things like Cobbler and Ansible came out of. Now it's, “Click the button and use the thing everyone else is using. And if you're not doing that, what are you doing over there?”
So, the idea of getting started tinkering with computers is built on top of so many frameworks and other things. And that's always been the case, but now it's much more apparent in some ways. “Okay, I'm going to go ahead and build out my first HTML file and serve it out using something in Node.” “Great, what is this NPM stuff that's scrolling past?” It's like, “The devil. That is the devil's own language you are seeing scroll past. And you don't need to worry about that; just pretend it's not there.”
But back when I was learning all this stuff, we were paying attention to things scrolling past, like, you know, compilation messages and the Linux boot story as it wound up scrolling past. Terrible story; the protagonist was unreliable, but all right. And you start learning how these things work when you start scratching at the things that you're just sort of hand-waving and glossing over. These days, it feels like every time I use a modern project, that's everything.
Michael: I mean, it is. And like what, React has, like, 2000 dependencies, right? So, how do you ever feel like you understand it? Or when recruiters are asking for ten years at Amazon. And then—or we find a library that it can only explain itself by being like this other library and requiring these other five.
And you read one of those, and it becomes, like, this… tree of knowledge that you have no way of possibly understanding. So, we've just built these stacks upon stacks upon stacks of things. And I tend to think I kind of believe in minimalism. And like, wouldn't it be cool if we just burned this all and start—you know, we burn the forest and let something new regrow.
But we tend to not do that. We just—now running a cloud on top of a cloud, and our JavaScript is thousands of miles high.
Corey: I really wish that there were better paths for getting started. Like, I used to think that the right way to wind up learning how all this stuff works is to do what I did: Start off as, you know, the grumpy sysadmin type, and then—or help desk—and then work your way up and the rest. Those jobs aren't there anymore, and it doesn't leave people in a productive environment. “Oh, you want to build a computer game. Great. For an iPhone? Terrific.” Where do you go to get started on that? It's a hard thing to do.
And people don't care at that scale, nor should they necessarily, on how to run your own servers. Back in the day when you wanted to have a blog on the internet, you were either reduced to using LiveJournal or MySpace, or you were running your own web server and had to learn how to make sure that it didn't become an attack platform. There was a learning curve that was fairly steep. Now, there are so many different paths to go down, you don't really need to know how any of these things even work.
Michael: Yeah, I think, like, one of the—I don't know whether DevOps means anything as a topic or not, but one of the original pieces around that movement was systems administrators learning to code things and really starting to enjoy it, whether that was Python or Ruby, and so on. And now it feels like we're gluing all the things together, but that's happening in App Dev as well, right? The number of people that can build a really, really good library from the ground up, like, something that has C bindings, that's a really, really small crowd. And most of it, what we're doing is gluing together other people's libraries and compensating for the flaws and bugs in them, and duct tape and error handling or whatever. And it feels like programming has changed a lot because of this—and it's good if you want to get an idea up quickly, no doubt.
But it's a different experience.
Corey: The problem I always ran into was similar to the problems I had with doing Debian packaging. It was always the, oh, great, there's going to be four or five different guides on how to do it—same story with RPM—and they're all going to be assuming different things, and you can cross over between them without realizing it. And then you just do something monstrous that kind of works until an actual Debian developer shoves you aside like you were a hazard to everyone around you. Let me do it for you. And there we go.
It's basically, get people to do work for you by being really bad at it. And I don't love that pattern, but I'm still reminded of that because there are so many different ways to achieve any outcome that, okay, I want to run a ridiculous Hotdog or Not Hotdog style website out there. Great. I can upload things. Well, Docker or serverless? What provider do I want to put it on? And oh, by the way, a lot of those decisions very early on are one-way doors that you don't realize you're crossing through, as well as not knowing what the nuances of all of those things are. And that's dangerous.
Michael: I think people are also learning the vendor as well, right? Some people get really engrossed in whether it's Amazon, or Google, or HashiCorp, or somebody's API, and you spend so much of your brain cells just learning how these people's systems work versus, like, general programming practices or whatever.
Corey: I make it a point to build something on other cloud providers that aren't Amazon every now and then, just because I don't want to wind up effectively embracing a monoculture.
Michael: Yeah, for sure. I mean, I think that's kind of the trend I see with people looking just at the Kubernetes stuff, or whatever, it's that I don't think it necessarily existed in web dev; there seems to be a lot of—still a lot of creativity and different frameworks there, but people are kind of… what's popular?
What gets me my next job, and that kind of thing. Whereas before it was… I wasn't necessarily a sysadmin; I kind of stumbled into building admin tools. I kind of made hammers not houses or whatever, but basically, everybody was kind of building their own tools and deciding what they wanted. Now, like, people that are wanting to make money are deciding what people want for them. And it's kind of not always the simplest, easiest thing.
Corey: So, many open-source projects now are—for example, one that I was dealing with recently was the AWS CLI. Great, like, I'm thrilled to throw in issues and challenges here, but I'm not going to spend significant time writing code against it because, one, it's basically impossible to get these things accepted when all the maintainers work at Amazon, and two, is it really an open-source project in the way that you and I think about community and the rest, when its sole purpose is basically to funnel money to Amazon faster? Like, that isn't really a community ethos I feel comfortable getting behind to be perfectly honest. They're a big company; they can afford to pay people to build these things out, full time.
Michael: Yeah. And GitHub, I mean, we all mostly, I think, appreciate the fact that we can host the Git repo and it's performant and everything, and we don't have blazing unicorns quite as often or whatever they used to have, but it kind of changed the whole open-source culture because we used to talk about things on mailing lists, like, what should this be, and there was like an upfront conversation, or it might happen on IRC. And now people are used to just saying, “I've got a problem. 
Fix it for me.” Or they're throwing code over the wall and it might not be the code or feature that you wanted because they're not really part of your thing.
So before, people would get really engrossed with, like, just a couple of projects, and if they were working on them as kind of like a collective of people working against different organizations, we'd talk about things, and they kind of knew what was going on. And now it's very easy to get a patch that you don't want and you're, like, “Oh, can you change all of these things?” And then somebody's, like, now they're offended because now they have to do all this extra work, whereas that conversation didn't happen. And GitHub could absolutely remodel themselves to encourage those kinds of conversations and communities, but part of the death of open-source and the fact that now it's, “Give me free code,” is because of that kind of absence of the—because we're looking at that as, like, the front of a community versus, like, a conversation.
Corey: I really want to appreciate your taking so much time out of your day to basically reminisce about some of these things. But on a forward-looking basis, if people want to learn more about how you see things, where's the best place to find you?
Michael: Yeah. So, if you're interested in my blog, it's pretty random, but it's michaeldehaan.substack.com. I run a small emerging consultancy thing off of michaeldehaan.net. And that's basically it. My Twitter is laserllama if you want to follow that. Yeah, thank you very much for having me. Great conversation. Definitely making all this technology feel old and busted, but maybe there's still some merit in going back—
Corey: Old and busted because it wasn't built this year? Great—
Michael: Yes.
Corey: —yes, it's legacy, which is a condescending engineering term for ‘it makes money.' Yeah, there's an entire universe of stuff out there. 
There are companies that are still toying with virtualization: “Is this something we get on board with?” There's nothing inherently wrong with that. I find that judging what a bunch of startups are doing or ‘company started today' is a poor frame of reference to look at what you should do with your 200-year-old insurance company.Michael: Yeah, like, [unintelligible 00:35:53] software engineering is just ridiculously new. Like, if you compare it to, like, bridge-building, or even electrical engineering, right? The industry doesn't know what it's doing and it's kind of stumbling around trying to escape local maxima and things like that.Corey: I will, of course, put links to where to find you into the [show notes 00:36:09]. Thanks again for being so generous with your time. It's appreciated.Michael: Yeah, thank you very much.Corey: Michael DeHaan, founder of Cobbler, Ansible, and oh, so much more than that. I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice—and/or smash the like and subscribe buttons on the YouTubes—whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice, smash the buttons as mentioned, and leave a loud, angry comment explaining what you hated about it that I will then summarily reject because it wasn't properly formatted YAML.Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.Announcer: This has been a HumblePod production. Stay humble.

BSD Now
453: TwinCat/BSD Hypervisor

May 5, 2022 45:13


Building Your Own FreeBSD-based NAS, Writing a device driver for Unix V6, EC2: What Colin Percival's been up to, Beckhoff releases TwinCAT/BSD Hypervisor, Writing a NetBSD kernel module, and more.

NOTES
This episode of BSDNow is brought to you by Tarsnap (https://www.tarsnap.com/bsdnow) and the BSDNow Patreon (https://www.patreon.com/bsdnow)

Headlines
Building Your Own FreeBSD-based NAS (https://klarasystems.com/articles/building-your-own-freebsd-based-nas-with-zfs/)
Writing a device driver for Unix V6 (https://mveg.es/posts/writing-a-device-driver-for-unix-v6/)

News Roundup
FreeBSD/EC2: What I've been up to (https://www.daemonology.net/blog/2022-03-29-FreeBSD-EC2-report.html)
Beckhoff has released its TwinCAT/BSD Hypervisor (https://www.automationworld.com/control/article/22144694/beckhoff-hypervisor-enables-virtual-machines-for-control-applications)
Writing a NetBSD kernel module (https://saurvs.github.io/post/writing-netbsd-kern-mod/)

Benedict's Git Finds
Projects
Run anything (like full blown GTK apps) under Capsicum (https://github.com/unrelentingtech/capsicumizer)
Twitter client for UEFI (https://github.com/arata-nvm/mitnal)
n³ The unorthodox terminal file manager (https://github.com/jarun/nnn)
OpenVi: Portable OpenBSD vi for UNIX systems (https://github.com/johnsonjh/OpenVi)
Gists and Articles
Step-by-step instructions on installing the latest NVIDIA drivers on FreeBSD 13.0 and above (https://gist.github.com/Mostly-BSD/4d3cacc0ee2f045ed8505005fd664c6e)
FreeBSD SSH Hardening (https://gist.github.com/koobs/e01cf8869484a095605404cd0051eb11)
GTFOBins is a curated list of Unix binaries that can be used to bypass local security restrictions in misconfigured systems (https://gtfobins.github.io)

Tarsnap
This week's episode of BSDNow was sponsored by our friends at Tarsnap, the only secure online backup you can trust your data to. Even paranoids need backups.

Feedback/Questions
Ben - Backing Up (https://github.com/BSDNow/bsdnow.tv/blob/master/episodes/453/feedback/Ben%20-%20Backing%20Up.md)
Ethan - Thanks (https://github.com/BSDNow/bsdnow.tv/blob/master/episodes/453/feedback/Ethan%20-%20Thanks.md)
Maxi - question about note taking (https://github.com/BSDNow/bsdnow.tv/blob/master/episodes/453/feedback/Maxi%20%20-%20question%20about%20note%20taking.md)

Send questions, comments, show ideas/topics, or stories you want mentioned on the show to feedback@bsdnow.tv

Screaming in the Cloud
The Magic of Tailscale with Avery Pennarun

May 4, 2022 41:29


About Avery
wvdial, bup, sshuttle, netselect, popularity-contest, redo, gfblip, GFiber, and now @Tailscale doing WireGuard mesh. Top search result for "epic treatise."

Links Referenced:
Webpage: https://tailscale.com
Tailscale Twitter: https://twitter.com/tailscale
Personal Twitter: https://twitter.com/apenwarr

Transcript
Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.
Corey: This episode is sponsored in part by LaunchDarkly. Take a look at what it takes to get your code into production. I'm going to just guess that it's awful because it's always awful. No one loves their deployment process. What if launching new features didn't require you to do a full-on code and possibly infrastructure deploy? What if you could test on a small subset of users and then roll it back immediately if results aren't what you expect? LaunchDarkly does exactly this. To learn more, visit launchdarkly.com and tell them Corey sent you, and watch for the wince.
Corey: This episode is sponsored by our friends at Revelo. Revelo is the Spanish word of the day, and it's spelled R-E-V-E-L-O. It means “I reveal.” Now, have you tried to hire an engineer lately? I assure you it is significantly harder than it sounds. One of the things that Revelo has recognized is something I've been talking about for a while, specifically that while talent is evenly distributed, opportunity is absolutely not. They're exposing a new talent pool to, basically, those of us without a presence in Latin America via their platform. 
It's the largest tech talent marketplace in Latin America with over a million engineers in their network, which includes—but isn't limited to—talent in Mexico, Costa Rica, Brazil, and Argentina. Now, not only do they wind up spreading all of their talent on English ability, as well as you know, their engineering skills, but they go significantly beyond that. Some of the folks on their platform are hands down the most talented engineers that I've ever spoken to. Let's also not forget that Latin America has high time zone overlap with what we have here in the United States, so you can hire full-time remote engineers who share most of the workday as your team. It's an end-to-end talent service, so you can find and hire engineers in Central and South America without having to worry about, frankly, the colossal pain of cross-border payroll and benefits and compliance because Revelo handles all of it. If you're hiring engineers, check out revelo.io/screaming to get 20% off your first three months. That's R-E-V-E-L-O dot I-O slash screaming.Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. Generally, at the start of these shows, I mention something about money. When I have a promoted guest, which means that they are sponsoring this episode, I talk about that. This is not that moment. There's no money changing hands here.And in fact, I'm about to talk about a product that I am a huge fan of, but I'm, also as of this recording, not paying for. So, one might think I'm the product, but no. Let's actually start by talking about money. My guest today is Avery Pennarun, the CEO of Tailscale, and as of today, being the day that this goes out, you folks have just raised $100 million in a Series B. First, thank you for joining me, followed immediately by congratulations.Avery: It's great to be here, and thank you. It's an exciting announcement that I hope we don't end up spending too much time talking about because money is a lot more boring than technology. 
But yeah, we are very happy, both to be here and to be making the announcement.Corey: Yeah. CRV and Insight Partners are the lead investors on the round. And it's great to see because I've been using Tailscale for a while now. And it is a transformative experience for the way that I think about these things. A while back, I wrote a Lambda layer that lets Lambda functions take advantage of it, but in fairness, I did write it, so anyone looking at that should—“Haha, that's why you're not a developer full-time. You're bad at it.” Yes, I am.But I can't stop raving about how useful Tailscale is, with the counterpoint that it's also very difficult to explain to people who are not—at least in my experience—broken in a very particular way, as I am. What is Tailscale? And what does it do?Avery: Right. Well, I mean, first of all, one of the things I really like about Tailscale and what we built is that, you know, even if you're not a super great developer—like you just described yourself—you can get excited about it, you can use it for things, you can build on top of it, and contribute back without having to understand every single little detail of what it does, right? Tailscale is something that a lot of people get excited about without having to know how it works; they just know what it gives them, right? The answer to what Tailscale is, is sort of… it can be hard to explain to people who don't know about the kinds of problems that it solves, but the super short answer is it connects all of your devices and virtual machines and containers to each other, wherever they are, without going through an intermediary, right? So, it minimizes latency and it maximizes throughput, and it minimizes pain. And it sounds like that should be hard, but you can get it all done in, like, five minutes.Corey: I have been using it for a while now. Originally, I was using it and federating through it I believe, via Google. 
I rebuilt and tore down the entire network in about five minutes, and instead started federating through GitHub. Nowadays, you apparently changed your position on that identity and you use third-party SSO sources, as well as retaining user information and login stuff yourselves, which is just, it's almost starved for choice, on some level. But I am such a fan of the product that I hope you'll forgive me if I talk for about a minute or so on how I use it and my experience of it.
Avery: Go for it.
Corey: So, I wind up firing up Tailscale, and I have a network that from any of my devices, I can talk to any other. I have a couple of EC2 machines hanging out in AWS, I have a Raspberry Pi that I use as a DNS server sitting in the other room, I have my iPad, I have my iPhone, I have my laptop, I have my desktop, I have a VM sitting over in Google Cloud, I have a different VM sitting over in Oracle Cloud. And all of these things can talk to each other directly over a secured network. I can override DNS and talk to these things just by the machine name, I can talk to them via the address that winds up being passed out to them through this. It is transformative. It works on IPv4 and IPv6; if I'm on a network without IPv6 access, using Tailscale, suddenly I can.
I can egress from almost any other node on this network. And adding a new device to this is effectively opening a link in a browser on either that device or a different one, clicking approve once I log in, and it's done. That is my experience of it, so far. Is that directionally correct as far as how you think about the product? Because again, I use DNS TXT records as a database for God's sake. I am probably not the world's foremost technical authority on the proper use of things.
Avery: Right. Yeah. I mean, that's a good description of what it does. I think it actually—it's weird, right? 
It's hard to get across in words just how simple it is, right?That one-minute description used a bunch of technical-sounding terminology that probably the listeners to your podcast will understand. But, like, the average tech person doesn't need to know any of those things in order to use Tailscale, right? You download it from the app store on your phone and your laptop. And you install Tailscale on both from the App Store. You log into your Google account or your GitHub account, and that's it. Those two devices are tied together in time and space; they can see each other. You can access a web server that you're running on your laptop from your phone without doing anything else, right?And then you can start a VM in AWS and you load Tailscale in there, and now that's part of your network. And so, there's—you don't need to know what IPv4 and IPv6 even are. You don't need to know what DNS even is. It just, you know, the magic sort of comes together. We do a ton of stuff behind the scenes to make that magic work. But it's this —one thing that one customer said to us one time is, like, “It makes the internet work the way you thought the internet worked until you learned how the internet worked.” If that makes sense.Corey: Right. It basically works on duct tape and toothpicks all spit together, and it's amazing that it works at all. I mean, this is going to sound relatively banal, but the way that I've used Tailscale the most is on my phone or on my iPad or on my Mac. I will connect to the Tailscale network by default, and when that is done, it passes out my pi-hole's IP address as the custom DNS server for the entire network. So, I don't see a whole bunch of ads, not just in browser, but in apps and the rest.And every once in a while when something is broken because an ad server is apparently critical to something, great, I turn off the VPN on that device, use the natural stuff. 
My experience of the internet gets worse as a result and the thing starts working again, then I turn it back on. It is more or less the thing that I use as a very strange-looking ad blocker, in some respects, that I can toggle on and off with the click of a button. But it's magic, it is effectively magic. From the device side, it's open up an app and toggle a switch, or it is grab from the menu bar on a Mac, there's an application that runs and just click the connect button or the disconnect button.There is no MFA every time you connect. There is no type in a username and password. There is no lengthy handshake. I hit connect and it is connected by the time I have moved the mouse back from the menu bar to the application I was working in. Whenever I show this to someone who uses a corporate VPN, they don't believe me.Avery: Right. Yeah, exactly. It's hard to believe. It's like, “Hey, did anything actually happen here?” Because we removed you know, for example, it doesn't by default catch all your traffic, it only catches the traffic to your private network, so it's safe to leave it on all the time because it's not interfering with what you're doing.What you're describing is using Pi-Hole, which is a Raspberry Pi-based DNS server that is an ad blocker, most people using Pi-Hole have one at home, so when they're at home they get ads blocked, but when they leave home they don't get their ads blocked. If you add Tailscale to that, you can use your Pi-Hole even when you're not at home, and it sort of makes it that much more useful. I think an important difference from, say, other services that you can use an adblocker or a privacy VPN is that we never see your traffic, right? Tailscale creates a private network between you and all your personal devices, and that private network is private even from us, right? We help you connect the devices to each other, but when your traffic goes to Pi-Hole, it's your Pi-Hole. It's not our adblocker. 
It's your adblocker, right, so we never see what traffic you're going to, we never see what DNS names you're looking up because it was just never made available to us, right?Corey: Right. But did you do—the level of visibility you have into my network is fascinating in a variety of different ways, but it is also equally fascinating—one of those ways—is that how limited it is. You know what devices I have, the last time they've connected, the version of Tailscale they're running, an IP address on it, and you also wind up seeing what services are advertised and available on those networks if I decide to enable that. Which is great for things like development; I'm going to be doing development in a local dev sense on an EC2 instance somewhere. And well, I don't want to set up a tunnel with SSH to wind up having to proxy traffic over there just so I can wind up hitting some high port that I bound to, and I certainly don't want to expose that to the general internet; that is a worst practice for all these things.And Tailscale magically makes this go away. I haven't done this in much depth yet with a variety of my team members, but when you start working on this with teams who are doing development work, someone can have something running on their laptop and just seamlessly share it with their colleagues. It's transformative, especially in an area where very often that colleague is not sitting in the same room getting the greasy fingerprints on your laptop screen.Avery: Yep. Yeah, exactly. So, you mentioned the services list which you have to specifically opt into, and the reason we did that is that, you know, the list of devices and hostnames and IP addresses, we have to collect because that's how the service works, right? You send us the information about your devices, and then we send the public keys for those devices to the other devices. 
We can't get out of collecting that, whereas the services list is purely an interesting add-on feature, and we decided that we didn't want to collect that by default because it would make people nervous about their privacy.
So, if you want that feature, you click it on; if you don't want it, don't turn it on. You can still share services with people inside your network; they just need to know that those services exist. You send them the URL or whatever and it'll work, but it doesn't show up as a list of things that we can see in that case. But yeah, sharing stuff between your coworkers is definitely… is a major use case for Tailscale, and for dev and infrastructure teams in particular. Like, you can—designers, for example, run a test version of the website on their laptop, and then they say, “Hey, visit this URL on my laptop.” And you don't have to be in the same office, you can both be sitting in different cafes in different cities. Tailscale will make it so that the connection between those two computers still works, even if they're both behind firewalls, even if they're both behind different NATs, and so on.
Corey: One of the things that astounded me the most: I am reluctant to completely trust things that are new that touch the network. Early on in my career, I made network engineering mistake 101, which is making a change to the firewall in your data center without having another way in. And then it's the drive across town, or calling remote hands to get them to let you back in, when you've locked things out. Because you folks are building these things on a pretty consistent clip; there are a lot of updates and releases across all of the platforms. And invariably, I find myself on some devices a version behind or so, just because of the pace of innovation. “Oh, great. We're updating the VPN client. Cool. So, I'm going to expect this thing to drop and I'm going to have to go in and jigger it to get it working again.”
That has never happened. 
I have finally given in to, I guess, the iron test of this, and I have closed SSH from the internet to most of these nodes. In fact, some of them sit —the Pi-Hole sitting at home, if you're not on my home network, there is no outside way in without breaking in. It is absolutely one of those things that disappears into the background in a way that I was extraordinarily surprised to find.Avery: Right. Well, that is something—I mean, I'm old and grumpy, I guess, is sort of the beginning part of all this, right? I've seen all this annoying stuff that happens with software. And, you know, and many of us, in fact, at Tailscale are old and grumpy, and we just didn't want to repeat those same things. So, first of all, network stuff to an even stronger degree than virtually any other kind of product, if your network stops working, everything stops working, right, so it's number one priority that Tailscale has to not mess up your network.Because if it does, you instantly lose faith. There's kind of like—Tailscale gives you this magical feeling when you first install it, but that feeling of magic goes away very quickly the first time it screws something up and you can't connect when you really need to. So, we put a huge amount of work into making sure that you can connect when you really need to. We have a lot of automated tests. One of our policies that I think is almost unheard of is that we intend to never deprecate support for older versions of the Tailscale client.And to this day, we're about three years into Tailscale, we've never deprecated an old client that anybody is using. So eventually, people—though in fact hard to believe, but eventually, people do stop using some old versions, so those ones don't work anymore, necessarily. But any version of Tailscale that is in use today is going to keep working as long as anybody is using it. We have a very, very, very strong backwards compatibility policy. 
Because the worst thing that I can imagine is having some Raspberry Pi sitting out in the void somewhere that I haven't looked at for two years, that whoops, Tailscale broke it, and now I can't connect to it, and now I have to go drive down there and fix it, right? It would be just insultingly terrible for that to happen.And we just make sure that doesn't happen. Another thing that people get excited about is, like, on a Debian system or whatever, if you've got the Debian package installed, you can do an apt-get upgrade. Tailscale upgrades and even your SSH session doesn't drop. Every now and then people [comment and was like 00:14:13] —Corey: That was the weirdest part. I was expecting it to go away or hang for a long period of time. And sure, I guess it might drop a packet or so, I've never bothered to look because it is so seamless.Avery: Right. Yeah, exactly. It's just, like, “Wait. Did anything even happen?” It's like, “Yes”—Corey: Right—Avery: —“Something happened. We upgraded it out from underneath you.”Corey: —my next thing is [crosstalk 00:14:28]—yeah, I grep Tailscale on the process table. Like, okay, is this just a stale thing that's existing [unintelligible 00:14:34] to bounce it? No, it has just been started. It was so seamless under the hood that it was amazing. There is something that is—a lot of things have been very deeply right on this.Something else that I think is worth pointing out is that if any company had the brainpower there to roll their own crypto, it would be you folks, but you don't. You're riding on top of WireGuard, an open-source project that does full-mesh VPNs with terrible user interfaces.Avery: Yep. So, you know, I guess disclosure. Back in 1997 when I started my first startup, I was not smart enough to not roll my own crypto. And therefore the VPN I wrote at the time definitely had giant security holes. It was also not that popular, so nobody found them. 
But I, you know eventually I found [crosstalk 00:15:21]—Corey: “Except a bank, which I really shouldn't disclose.” Kidding, I'm kidding. But yeah.Avery: [laugh]. No, no, no. The bank never used that software. [laugh]. But yeah. Nowadays, I've been through a lot, and I… I would not describe myself as a security expert. Although people often describe me as a security expert. I don't know what that means. But I am enough of an expert to know that I should not be rolling my own crypto. And the people who invented WireGuard, it's one of the—I feel like I'm overstating things, but I'm not—it's one of the biggest leaps forward in cryptography, in probably the history of computing. Now, it builds on a series of things that are part of the same leap forward, right? It's built on the protocol that Signal uses called the Noise Protocol, right? Signal and Noise are built on the Ed25519 curve, made by —or popularized by Dan Bernstein who's a major cryptographer in this area. Sometimes popular, sometimes—Corey: Oh, djb.Avery: —not popular. Yeah, exactly.Corey: He also, near and dear to my heart, wrote djbdns, which was a well-known, widely deployed DNS server, by which I of course mean database. Please, continue.Avery: Yep. [laugh]. I've been a huge fan of basically everything djb has ever made in the history of—Corey: Oh, you're a qmail person. I am on the postfix side of [unintelligible 00:16:37].Avery: Yep. Well, my first startup back in 1997, we made Linux-based server appliances for small businesses. And we use qmail, we use djbdns, we used a couple of other djb products. And you know, for the history of that product—you know, leaving aside my VPN that was a security hole—the djb stuff never had a single problem. That company was eventually acquired by IBM.One of the first things IBM did is, like, “Whoa, djb has a super-weird software license. We can't be doing this. 
Let's replace it with software that has a decent license.” So, they dropped out djbdns and started using BIND. Within a week, there was a security hole in BIND that affected all of these appliances that they now controlled, right?So, djb is a very big-brained, super genius in security, whatever you might think of his personality. And it's sort of like was the basis for this revolution in cryptography that WireGuard has sort of brought to the networking world. And it's hard to overstate. Just, like, the number of lines of code, there's something like 100 times less code to implement WireGuard than to implement IPsec. Like, that is very hard to believe, but it is actually the case.And that made it something really powerful to build on top of. Like, it's super hard for somebody like me to screw up the security of a WireGuard deployment, where it's very easy to screw up the security of an IPsec deployment.Corey: This episode is sponsored by our friends at Oracle Cloud. Counting the pennies, but still dreaming of deploying apps instead of “Hello, World” demos? Allow me to introduce you to Oracle's Always Free tier. It provides over 20 free services and infrastructure, networking, databases, observability, management, and security. And—let me be clear here—it's actually free. There's no surprise billing until you intentionally and proactively upgrade your account. This means you can provision a virtual machine instance or spin up an autonomous database that manages itself, all while gaining the networking, load balancing, and storage resources that somehow never quite make it into most free tiers needed to support the application that you want to build. With Always Free, you can do things like run small-scale applications or do proof-of-concept testing without spending a dime. You know that I always like to put asterisks next to the word free? This is actually free, no asterisk. Start now. 
Visit snark.cloud/oci-free that's snark.cloud/oci-free.Corey: I just want to call something out as well, that when I say that you folks definitely have the intellectual firepower to roll your own crypto should you choose to do so, but you chose not to, if anything, I'm understating it. To be clear, one of the blog posts you had somewhat recently out was how you are maintaining what is effectively your own fork of the Go programming language. Which is one of those things when someone hears that it's like, “I'm sorry, can you say that again? Because I am almost certain I misunderstood something.” What is the high-level version of that?Avery: Well, there's, I think, two important points there. One of them is that yes, we did fork the Go programming language; it's supposed to be a temporary fork because it allows us to do some experiments with the go back-end. And the primary reason we were able to do that is because we employ a couple of people who used to be on the core Go team. And that was not because we went out looking for people who used to be on the core Go team, that's just how it worked out. But because we do, it's easier for them to fork Go than it would be for the average person, and in many ways, it's easier for them to get their job done by just continuing to work on the codebase they've already worked on.But the second point is actually, as compilers go, the Go compiler is probably the very easiest one I've ever seen to be able to fork and edit. Like it's super-clear code, you're just editing Go code, which is already pretty easy. But they really put a ton of work into making it readable and understandable. So, like, average people actually can fork the Go compiler and not be completely bamboozled by how difficult everything is, right? Compared to, like, GCC where just building the thing is something that takes you weeks to learn how to do, right, Go is just, like, you run this script and build your compiler [unintelligible 00:19:35]—Corey: Yeah. 
Let me clear this quarter on my schedule so I can go ahead and do that. Yeah, no, thank you.

Avery: Yeah. I've built copies of GCC and it's absolutely nightmarish, right? And built people's forks of GCC for special embedded processors and stuff. And this is, like, a f—this is a career that you can specialize in, building GCC, right? There are people that do this, right? And the Go compiler, it's really—

Corey: Well, it's 40 years of load-bearing technical debt.

Avery: Yeah. Yeah. But the Go compiler, it's very nice; it's just a program that's written in Go, that compiles under Go, and then you end up with one binary, right? And as long as you have that binary, everything just works, right? And so, it's actually surprisingly easy to fork Go. I don't want to—you know, I wouldn't put that on the same level of difficulty as, like, not screwing up cryptography, if you're trying to do it yourself. [crosstalk 00:20:16]

Corey: [crosstalk 00:20:16] their own crypto algorithm that they themselves can't defeat. Yeah, it turns out that basically, breaking crypto is a team sport. Who knew?

Avery: Yeah. Exactly. Generally, with security, you have this problem a lot, right? It's a lot harder to build a system that nobody can break into, than it is to break into a random system, right? Because you know, the job of securing something against everybody is much harder than the job of finding something you can break into.

Corey: So, I did have a question about something you said earlier, where one of the use cases—one of the design goals—is not to have a breaking change to a point where an old device cannot still connect to the private network. But you do have a key expiry for devices where a device needs to relog in, and it can be anywhere between 3 and 180 days, as I look at it.
I don't know if some of the more enterprise-y options have longer options that they can set, but what happen—how do you not have to drive out to the back of beyond to re-authenticate that Raspberry Pi every six months?

Avery: Ah. So, this is something, it's at the policy layer, and we have not finished refining this to perfection, I would say, right now. What we do have though, if your key does expire, there's a button in the admin panel to say, like, boost this device for a little bit longer. Sort of unexpire it for another 30 minutes—I don't remember what the—how much time it is—then you can SSH into the device and do a proper key refresh on it without actually having to drive out there. Now, we did for one version, accidentally break the key reactivation feature so that if the client noticed its key is expired, it actually disconnected from the Tailscale network altogether and then didn't receive the message to, like, “Hey, could you please increase the length of your key?” That was fixable by power cycling it, which you could often get somebody to do without driving all the way out there. But we fixed that, so now that—

Corey: “Have you tried turning it off and back on again,” is still a surprisingly effective way of troubleshooting something.

Avery: Yeah, exactly. So, that wasn't—I mean, it was kind of annoying for some people. But yeah, the reason we use, by default, every key always expires is because unlimited time credentials are one of the worst security holes that people don't really acknowledge. Because technically, it'll never be the, like—you know, it'll never show up as the highest severity security hole that you have an unlimited time credential sitting in your home directory, but it is something that—well, I can tell a story.
There is a company that I heard about that had, you know—SSH keys are typically unlimited time credentials; the easiest way to do it is you run ssh-keygen, it puts something in your home directory, you copy the public key to all the devices you want to be able to log into, and then you never think about it again.

So, this is a company that, of course, every developer in their company had done this; they had a production network with a bunch of SSH keys in it. Some not very ethical employee worked there, had keys in their production systems, and eventually got fired. Now, of course, this company had good processes in place, they went through all the devices and took out this person's public key from all the devices. What they didn't know is that during lunch one day, this person had gone around to all their coworkers' workstations that hadn't been locked, downloaded the private keys for those people on his—

Corey: Oh no.

Avery: —computer before he got fired. And so, shortly after he got fired, their entire production network got wiped out. Now, they didn't have enough forensics at the time to know how it all got wiped out, so they spent some time putting it all back in place, this time with forensics. About a month later—they rebuilt everything from scratch, all new public keys and everything. You couldn't possibly have any backdoors in this system, right?

And then a month later, it all got wiped out again. This time, the forensics revealed, like, it was one of the existing employees, coming from a different country, that had gotten into their private production network and wiped everything out. How did that happen? It was because this person had years earlier, downloaded all their public—or private keys when he wandered around through the office. You can fix this problem instantly, by just expiring your keys and forcing rotation periodically, right?

SSH doesn't make that very easy.
You can, with SSH, set up SSH certificate authentication, which is a huge ordeal to get configured, but once it's working, it solves this particular problem, right? Tailscale [crosstalk 00:24:19]—

Corey: On Mac and iOS, there is a slight improvement to this that I'm a big fan of because I agree with you. I am lousy at rotating my keys, but there's an open-source project called Secretive that I use on the Mac that stores the private key in the Secure Enclave, which the Mac will not let out of it. And I have to use Touch ID to authenticate every time I want to connect to something. Which can get annoying from time to time, but there is no way for someone to copy that off. Historically, I would—

Avery: That's true.

Corey: Have a passphrase that was also tied to the key so if someone grabbed it off the disk, it still theoretically would not be usable. And that was—but again, that is an absolute vector that needs to be addressed and thought about. Key rotation is huge.

Avery: And you have to go through this effort to sort it all out, right? So Tailscale, we just have this policy: We don't do unlimited length credentials; we do key rotation for everything, and we just sort of set different time limits for this rotation depending on how picky you want to be about it. But any key expiry is much, much better than no key expiry. Even if you set it to a six-month key expiry, at least it's only a six-month window in which somebody could theoretically reuse your keys. And we can also rotate keys behind the scenes and so on.

So, in the SSH case, the way people use Tailscale, you stop opening the SSH port to the world. You only SSH when you're connected over Tailscale. The fact that your Tailscale keys rotate and expire over time is what protects your SSH session. So, you could keep using static SSH keys that never expire—don't try to figure out all this other complicated stuff, right—and you're still protected from these private SSH, like, unlimited length keys.
Now, that said, for servers, Tailscale does have a button where you can say, like, “Please stop expiring the key.” This is a server, nobody's ever going to get physical access to the machine.

The only thing we could do with the private key for this machine is allow other people to SSH into it, which is not very dangerous, right? It's pretty much, like, somebody stealing your SSH authorized keys file; like, it doesn't really matter. And for that case, you turn off the expiry altogether. But expiring keys is intended for use by, like, devices that employees are actually holding in their hands where if it expires, it's no big deal, you push the login button and it refreshes.

Corey: There's something that is very nice about dealing with something that is just so sensible. I mean, we've all—at least in the olden days of running sysadmin stuff, we had this problem where we would generate—or purchase, back in those days—SSL certificates and, great, they expire in a year or so. At the end of the year, people forget, and then it would expire and you'd run around fixing this. And the default knee-jerk response was, that was awful, let's get the next one for five years so we don't have to think about it for that long.

And it's always a wildcard and so it gets put all over the place, and you wind up with these problems. One of the things that Let's Encrypt has done super well is forcing a rotation every 90 days so you know where it is. It's just often enough you want to automate it. And ACM, the AWS Certificate Manager that they use, takes a slightly different approach. It doesn't give you the private key; it embeds it in other places so they can handle the rotation themselves.

And they start screaming in your email if they can't verify that it's time for renewal long before it hits. It's different approaches to the problem, but yeah, five years out, how should I know all the places the certificate has wound up in that intervening time? Most of the people who did it aren't there anymore.
And one day, surprise, a website breaks, either because its SSL cert isn't working, or one of the back-end services it depends on suddenly doesn't have that working. It's become a mess, so having a forced modernity to these things is important.

Avery: Right. It's forced modernity, and it's just basically, it's all behind the scenes. Like, you don't even think about the fact that Tailscale gave you a key because that is not relevant to your day-to-day life, right? You logged in, something happened, all these devices ended up on your network. What actually happened is that public and private keys—you know, a private key was generated, the public keys were distributed properly, things are getting rotated, but you don't have to care about all that stuff.

So, it's fun that Tailscale is what we call secure by default, right? People love to use it because it's easier, it makes their life easier, but security teams like it because actually, it changes the default security posture from, like, “Ugh, I'm going to have to tell everybody to please stop doing these five things because it always creates security holes,” to like, “Whoa, the thing that they're going to do most naturally is actually going to be safe.” Right? I really like that about it. You're not thinking about certificates, but their certificates are getting rotated exactly as they should be.

Corey: There's just something so nice about computers doing the heavy lifting for us. One of the weird things about Tailscale is it falls into a very strange spot where there is effectively zero maintenance burden on me, but I still toggle it on or off in scenarios often enough to remember that it's there and that I'm using it. It is the perfect sweet spot of being somewhat close to top of mind, but never in a sense that is, “Oh, I got to deal with this freaking thing again.” It never feels that way.
Logging into it, it has long-lived sessions in the browser, so it isn't one of those, ah, you have to go back to GitHub and re-authenticate and do all these other dog-and-pony show things. It just works. It is damn near a consumer level of ease of use, start to finish. The hard part, of course, is how on earth you explain this to someone [laugh] without a background in this space.

Avery: Yeah, exactly. It's something we ask ourselves sometimes is, like, well, you know, Tailscale is great for developers right now. It is easy enough to use, even for consumers, but, like, how would you explain it to consumers and find a good use case for consumers? And it's something that I think we are going to do eventually, but it hasn't been, up until now, a super high priority for us just because developers are sort of, like, the core audience, and we haven't even finished building a great product that does everything that they want yet. There is one little feature in Tailscale that's the beginning of something that's consumer-friendly; it's called Taildrop.

I don't know if you've seen this one. You can turn it on, and basically, it acts like AirDrop in Apple products, except you don't need to care about physical proximity and it works with every kind of device, not just Apple devices, right? So, you can add it as—it shows up in the share pane on your macOS or Windows or iOS device. You can use it from Linux, you just use it to send files of any type, and it sends them point to point not through a cloud provider so that we never see a copy of the file. It only goes between your devices over your encrypted network. So, that's something that consumers kind of like.

Corey: Feels like Tailprint for Bonjour could wind up being another aspect of this as well. And I'm still hoping for something almost Ansible-like where run the following command, whether it's pre-approved or not, on the following subset of things.
In my case, for example, I would love it if it would just automatically, when I press the button, update Tailscale across all of the nodes that support it, namely the Linux boxes. I don't think you can trigger an App Store update from within a sandboxed app on iOS, but I've been—

Avery: Right.

Corey: Surprised before. Yeah. But it's nice to be able to do some things.

Avery: Yeah. This is one of those—yeah, we get that request a lot for, like, can you push a button to auto-update Tailscale? It makes me really sad that we get this request because the need for this is a sign that all of the OS vendors have completely botched software updates, right? Like, the OS should be the thing updating your software on a good schedule based on a set of rules, and it shouldn't be the job of every single application to provide their own software update. It's actually a massive, embarrassing security hole that software can even update itself, right?

Because if it can update itself, then you know, imagine someone breaks into the production services of a company that is offering a particular program. They put malware into a version of the software, they put it into the software update server, and then they trigger everything in the network to push the software update to those devices. Now, you've got malware installed on all your devices, right? It's very strange that people ask for this as a feature. [laugh].

Tailscale currently does not have that feature; it doesn't push software updates on its own. But it's such a popular feature that I think we're going to have to implement it because everybody wants this because Windows, for example, is simply just never going to automatically update your software for you. We have to have these weird super-admin rights on your machine so that we can push software updates because nobody else will. I feel really weird about that.
You know, the security world should be protesting this more.

But instead, they're, like, asking, can you please put this feature in because I've got a checklist in my compliance thing that says, “Is all your software up-to-date?” I don't have a checklist item that says, “Does any of my software have super-admin rights that they shouldn't have?” Right? It's sort of, I guess, the next level of supply-chain management is the big word. Nobody—there is no supply chain management for software.

Corey: There isn't, for better or worse. I wish there were, but there simply is not. Ugh. Next year, maybe. We hope.

Avery: Yep. So, you have to trust your vendors, fundamentally, which I guess will always be true. That's true for Tailscale as well, right? Whether or not we include the software update pushing. If you're installing a VPN product provided by a vendor, you have to trust that we're going to put the right stuff into the software.

And the best—the only thing I can really do is just be honest about these issues and say, “Well, look, we try our best. We definitely try not to implement features that are going to turn into security holes for you.” And I think we do a lot better than most vendors do in that area. But it's very hard to be perfect because nobody knows how to do software supply chain well.

Corey: Ugh. I hear you. That's the nice thing, too. Honestly, the big reason I know I need to update these things and the reason I want to do it is actually you. Because whenever I log in and look at my devices in the Tailscale thing, there's a little icon next to the one that says there's an update available here.

And you have fixed a lot of the niceties on this, like, ah, there's an update available for the iOS version. It's, “Really? Because it's not available in the App Store yet,” as I sit there spamming the thing. That stopped happening. There's a lot of just very nice quality-of-life improvements that are easy to miss.

Avery: Yep, yeah, that's kind of weird.
We actually went a little overboard on the update available notifications for a while because there's always this trade-off, right? Like I said, we have a policy of never breaking old versions, so when people see the update available notification, they kind of panic. It's, like, “Oh no, I better install the update, before Tailscale cuts me off.” And, like, well, we're not actually ever going to cut you off, so you shouldn't have to worry about that stuff.

But on the other hand, you're not going to get the latest features and bug fixes unless you're running the latest version, so when people email us saying, “Hey, I'm using Tailscale from six months ago, and I have this problem,” the first thing our support team does is say, “Well, can you please try the latest one, and does the problem go away?” Because it's kind of inefficient debugging six-month-old software. So, one way we were trying to, like, minimize that cost is, like, hey, we could just tell people there's a new version available and then maybe they'll update it themselves. But that resulted in people panicking. Like, oh, no, I need to install the software really, really soon because I can't afford to break my network.

Corey: Right.

Avery: And because our system is based on WireGuard and this is—you know, I'll probably jinx it by saying this but, like, we've never had an actual security hole that we've had to issue a Tailscale update to resolve, right? People see the update available thing and, like, “Oh, no, I bet there's a whole bunch of vulnerabilities that they fixed.” It's like, “Well, no.” WireGuard has also never had a vulnerability, right? [laugh]. It's… yeah, it's, you know, sooner or later there probably will be one, and when there is one, we'll probably have to make the, you know, update notification in red or something instead of just the little icon on the admin panel.
But yeah, it's—

Corey: [laugh].

Avery: —we try [crosstalk 00:35:23]—

Corey: Nice job on jinxing it, by the way, I appreciate that.

Avery: Yeah, I know. I mean, I try my best. [laugh]. But I've actually been surprised. It's very much like my experience with all the djb stuff we used in the past.

Like, when we were using qmail and djbdns for years, there was never once a security hole, right? It's very interesting that it is possible to design software that never once has a security hole. And nobody does that, right? I mean, I would say I'm not as smart as djb; our software is probably, you know, not going to be as one hundred percent perfect as that, but we try really, really hard to aim for that as a goal.

Corey: Yeah. I really want to thank you for taking the time to speak with me about everything Tailscale is up to. And again, congratulations on your Series B. If people want to learn more, where should they go?

Avery: I guess, tailscale.com is the place. We also have @tailscale on Twitter. My own personal Twitter is @apenwarr, which you probably won't be able to spell unless you Google for me or something—

Corey: But it's in the [show notes 00:36:19], which makes this even easier.

Avery: It is? Ah, there you go. So yeah, there's lots of information. But the number one thing I tell people is, like, look, it is a lot easier to get started than you think it is. Even after you've heard it 100 times, nobody ever believes how easy it is to get started. Just go to the App Store, download the app, log into your account, and you're already done, right? Try that and you don't even have to read anything.

Corey: I would tear you apart for that statement if it weren't—if it were slightly less true than it is, but it is transformative. Give it a try. It's a strong endorsement from me. Thank you so much for your time. I appreciate it.

Avery: Thank you, too. Great talking to you, and talk next time.

Corey: Indeed. Avery Pennarun, CEO of Tailscale.
I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this show, please leave a five-star review on your podcast platform of choice, and smash the like and subscribe buttons, whereas if you've hated it, same thing—five-star review, smash the buttons—and also leave an angry bitter comment about how you are smart enough to roll your own crypto, so you don't understand why other people wouldn't do it.

Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.

Announcer: This has been a HumblePod production. Stay humble.