The story of cloud networking rarely gets told from the perspective of those building it inside unicorn startups, but that's exactly what this episode delivers. Richard Olson, cloud networking expert at Canva, takes us behind the scenes of building network infrastructure for one of the world's fastest-growing SaaS platforms.

Richard's fascinating career journey began with literally throwing rocks with phone lines into trees during his military service, progressing through network operations centers and pre-sales engineering before landing at AWS and eventually Canva. His unique perspective bridges traditional networking expertise with cloud-native development approaches.

Unlike enterprises migrating from legacy environments, Canva started entirely in the cloud with minimal networking considerations. Richard explains how this trajectory created different challenges, starting with overlapping 10.0.0.0/16 addresses across development environments and evolving to hundreds of VPCs requiring sophisticated connectivity solutions. By mid-2022, these networking challenges had grown complex enough to warrant forming a dedicated cloud networking team, which Richard helped establish.

The conversation takes a deep technical turn exploring Kubernetes networking challenges that even experienced network engineers might not anticipate. Richard explains why "Kubernetes eats IP addresses for breakfast" in cloud environments, detailing the complex interaction between VPC CIDR allocations, prefix delegations, and worker node configurations that can quickly exhaust even large IP spaces. This pressure is finally creating compelling business cases for IPv6 adoption after decades of slow uptake.

Whether you're managing cloud infrastructure today or planning your organization's network strategy for tomorrow, this episode offers invaluable insights into the evolution and challenges of cloud networking at unicorn scale. Listen now to understand why companies are increasingly forming dedicated cloud networking teams and the unique skill sets they require.

Connect with Richard: https://www.linkedin.com/in/richard-olson-au
Check out the Fortnightly Cloud Networking News: https://docs.google.com/document/d/1fkBWCGwXDUX9OfZ9_MvSVup8tJJzJeqrauaE6VPT2b0/
Visit our website and subscribe: https://www.cables2clouds.com/
Follow us on Twitter: https://twitter.com/cables2clouds
Follow us on YouTube: https://www.youtube.com/@cables2clouds/
Follow us on TikTok: https://www.tiktok.com/@cables2clouds
Merch Store: https://store.cables2clouds.com/
Join the Discord Study group: https://artofneteng.com/iaatj
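A quick back-of-the-envelope sketch (ours, not the episode's) helps make the "eats IP addresses for breakfast" claim concrete. In the AWS VPC CNI's prefix-delegation mode, each secondary ENI slot holds a /28 of 16 addresses; the ENI and per-ENI IP figures below are assumed values for one mid-sized instance type, so treat the numbers as illustrative only:

```python
# Rough, illustrative math (not from the episode) for how many VPC
# addresses worker nodes can pre-reserve with VPC CNI prefix delegation.
# ENI and IP-slot counts are assumptions; real limits vary by instance type.

PREFIX_SIZE = 16    # addresses in one delegated /28 prefix
MAX_ENIS = 3        # assumed ENI limit for the example instance type
IPS_PER_ENI = 10    # assumed IP slots per ENI; one slot is the primary IP

def addresses_reservable_per_node() -> int:
    secondary_slots = IPS_PER_ENI - 1   # the primary IP slot holds no prefix
    return MAX_ENIS * secondary_slots * PREFIX_SIZE

def nodes_until_exhaustion(cidr_prefix_len: int) -> float:
    total = 2 ** (32 - cidr_prefix_len)  # ignores AWS's reserved IPs per subnet
    return total / addresses_reservable_per_node()

if __name__ == "__main__":
    print(f"Per node: up to {addresses_reservable_per_node()} addresses reserved")
    print(f"Nodes to exhaust a /16 VPC: about {nodes_until_exhaustion(16):.0f}")
```

On those assumptions each node can reserve over 400 addresses, and a /16 VPC is exhausted by roughly 150 nodes, which is exactly the kind of pressure toward IPv6 the episode describes.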
In this episode, Meg Ashby, a senior cloud security engineer, shares how her team tackled AWS's centralized VPC interface endpoints, a design often seen as an anti-pattern. She explains how they turned this unconventional approach into a cost-efficient and scalable solution, all while maintaining granular controls and network visibility. She shares why centralized VPC endpoints are considered an AWS anti-pattern, how to implement granular IAM controls in a centralized model, and the challenges of monitoring and detecting VPC endpoint traffic.

Guest Socials: Meg's LinkedIn
Podcast Twitter: @CloudSecPod

If you want to watch videos of this LIVE STREAMED episode and past episodes, check out our other Cloud Security Social Channels:
- Cloud Security Podcast - Youtube
- Cloud Security Newsletter
- Cloud Security BootCamp

Questions asked:
(00:00) Introduction
(02:48) A bit about Meg Ashby
(03:44) What are VPC interface endpoints?
(05:26) Egress and Ingress for Private Networks
(08:21) Reasons for using VPC endpoints
(14:22) Limitations when using centralized endpoint VPCs
(19:01) Marrying VPC endpoints and IAM policy
(21:34) VPC endpoint specific conditions
(27:52) Is this solution for everyone?
(38:16) Does VPC endpoint have logging?
(41:24) Improvements for the next phase

Thank you to our episode sponsor Wiz. Cloud Security Podcast listeners can also get a free cloud security health scan by going to wiz.io/csp
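As one illustration of the granular-control idea discussed here, the minimal sketch below (an assumed example, not Meg's actual implementation) attaches a policy to a shared interface endpoint so only principals from a single AWS Organization can use it. The endpoint ID and organization ID are placeholders:

```python
# A minimal sketch of locking a centralized interface endpoint down to
# one AWS Organization via an endpoint policy. IDs are hypothetical.
import json
import boto3

ENDPOINT_ID = "vpce-0123456789abcdef0"   # placeholder endpoint ID
ORG_ID = "o-exampleorgid"                # placeholder Organization ID

policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "OrgPrincipalsOnly",
        "Effect": "Allow",
        "Principal": "*",
        "Action": "*",
        "Resource": "*",
        # Only allow requests whose caller belongs to the Organization.
        "Condition": {"StringEquals": {"aws:PrincipalOrgID": ORG_ID}},
    }],
}

ec2 = boto3.client("ec2")
ec2.modify_vpc_endpoint(VpcEndpointId=ENDPOINT_ID,
                        PolicyDocument=json.dumps(policy))
```

Real deployments would narrow Action and Resource further; this only shows where endpoint policies and IAM-style conditions meet.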
In this episode, David Lynam provides an overview of AWS Transit Gateway, which aims to simplify complex network connectivity between VPCs, VPNs, and on-premises networks. We discuss the limitations of using VPC peering and the benefits Transit Gateway provides through its hub-and-spoke model. The main components of Transit Gateway are explained, including attachments, route tables, associations, and route propagation. We go through some example use cases like sharing Transit Gateways across accounts, network isolation for compliance, routing traffic through security services, and bandwidth/scaling capabilities.

In this episode, we mentioned the following resources:
- How Amazon VPC Transit Gateways work

Do you have any AWS questions you would like us to address? Leave a comment here or connect with us on X/Twitter:
- https://twitter.com/eoins
- https://twitter.com/loige
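To see the components named above in one place, here is a minimal boto3 sketch, with placeholder IDs, wiring up a gateway, a VPC attachment, a custom route table, then an association and a propagation. It is a sketch only: real code must wait for each resource to reach the available state between steps.

```python
# Minimal sketch of the Transit Gateway building blocks: gateway,
# attachment, route table, association, propagation. IDs are placeholders.
import boto3

ec2 = boto3.client("ec2")

tgw = ec2.create_transit_gateway(
    Description="hub",
    Options={  # disable defaults so routing is fully explicit
        "DefaultRouteTableAssociation": "disable",
        "DefaultRouteTablePropagation": "disable",
    },
)["TransitGateway"]

attachment = ec2.create_transit_gateway_vpc_attachment(
    TransitGatewayId=tgw["TransitGatewayId"],
    VpcId="vpc-0123456789abcdef0",            # hypothetical VPC
    SubnetIds=["subnet-0123456789abcdef0"],   # one subnet per AZ in practice
)["TransitGatewayVpcAttachment"]

rt = ec2.create_transit_gateway_route_table(
    TransitGatewayId=tgw["TransitGatewayId"],
)["TransitGatewayRouteTable"]

# Association picks which route table the attachment's traffic consults;
# propagation advertises the attachment's CIDRs into a route table.
ec2.associate_transit_gateway_route_table(
    TransitGatewayRouteTableId=rt["TransitGatewayRouteTableId"],
    TransitGatewayAttachmentId=attachment["TransitGatewayAttachmentId"],
)
ec2.enable_transit_gateway_route_table_propagation(
    TransitGatewayRouteTableId=rt["TransitGatewayRouteTableId"],
    TransitGatewayAttachmentId=attachment["TransitGatewayAttachmentId"],
)
```

Keeping associations and propagations in separate, explicit route tables is also how the compliance-isolation use case mentioned above is usually built.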
Main Themes:
- The rise of multicloud 2.0: Organizations are moving beyond a single primary cloud and embracing a true multicloud strategy to leverage best-of-breed services from different providers.
- Kubernetes networking and security challenges: Multicloud Kubernetes deployments face issues with IP address exhaustion, overlapping IPs, egress security, and high-bandwidth secure inter-cluster connectivity.
- Aviatrix solutions for multicloud Kubernetes: Aviatrix offers a controller-based, intent-based networking and security platform that addresses these challenges with dynamic segmentation, secure egress, and hybrid connectivity.

Key Ideas and Facts:

Multicloud 2.0:
- Shifting landscape: The cloud landscape has evolved significantly in the 18 years since AWS launched. Organizations now have access to hyperscalers, regional clouds, and specialized clouds.
- True multicloud strategy: Organizations are adopting a true multicloud strategy to leverage the unique strengths of different cloud providers and enable developers to build better applications and services.
- Cloud 2.0: Many organizations are calling this shift "Cloud 2.0," driven by the need for distributed data, models, and applications, especially with the rise of GenAI and AI/ML applications.

Kubernetes Networking and Security Challenges:
- IP address exhaustion: Kubernetes is "IP hungry," leading to IP address exhaustion and challenges with overlapping IPs, especially in large deployments with thousands of VPCs.
- Egress security: Millions of VPCs have weak or non-existent egress security, posing a significant risk to sensitive data.
- Inter-cluster connectivity: Establishing high-bandwidth, secure connectivity between Kubernetes clusters across different clouds and on-premises environments is complex and challenging.

Aviatrix Solutions:
- Controller-based, intent-based networking: Aviatrix provides a centralized multicloud controller and uses intent-based policies to dynamically segment and secure traffic across Kubernetes clusters, regardless of the underlying IP addresses.
- Secure egress: Aviatrix replaces traditional NAT gateways with secure Aviatrix gateways, offering embedded NAT, visibility, and granular egress security policies based on Kubernetes resources.
- Dynamic scaling: Aviatrix automatically discovers and incorporates new Kubernetes resources into security policies as clusters scale up or down, eliminating manual configuration and ensuring consistent security.
- Hybrid connectivity: Aviatrix facilitates secure connectivity between cloud Kubernetes clusters and on-premises environments, including edge locations, enabling hybrid deployments for AI/ML and other workloads.

Customer Success:
- Large-scale deployments: Aviatrix has customers with thousands of island VPCs and overlapping IP spaces, successfully using its platform to manage their multicloud Kubernetes environments.
- Operational efficiency: Aviatrix simplifies operations with its controller-based approach, dynamic policy updates, and world-class SRE team handling upgrades and troubleshooting.

Key Quotes:
- Anirban Sengupta (Aviatrix): "Today every organization should embrace multicloud. That's the best way to get ahead with their competitors and help their developers."
- Anirban Sengupta (Aviatrix): "Networking and security should be top of mind... without connectivity and without security, you really can't have a multicloud strategy."
- Anirban Sengupta (Aviatrix): "Kubernetes is very IP hungry. There is exhaustion, IP address exhaustion is the key."
Call to Action: Organizations looking to embrace a true multicloud strategy and overcome the networking and security challenges of Kubernetes should consider Aviatrix's controller-based platform. Contact Aviatrix for a demo and learn how their solutions can help you achieve secure and efficient multicloud Kubernetes deployments.
In this episode, Simon Elisha and Brett Looney dive deep into the AWS Transit Gateway, a cloud-scale router that connects VPCs and other networking resources in AWS. They explain how Transit Gateway works, its advantages over VPC peering, and its scalability and resilience. They also touch on concepts like attachments, route tables, and VRF (virtual routing and forwarding). The conversation highlights the benefits of Transit Gateway for both experienced network administrators and newcomers to networking in the cloud. https://aws.amazon.com/transit-gateway
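The scalability advantage over VPC peering discussed above comes down to simple combinatorics: a full mesh needs one peering per VPC pair, while a Transit Gateway needs one attachment per VPC. A quick illustration:

```python
# Why hub-and-spoke beats full-mesh peering at scale: peerings grow
# quadratically with VPC count, attachments grow linearly.

def peering_connections(n_vpcs: int) -> int:
    return n_vpcs * (n_vpcs - 1) // 2   # every pair needs its own peering

def tgw_attachments(n_vpcs: int) -> int:
    return n_vpcs                        # one attachment per VPC

for n in (5, 20, 100):
    print(f"{n:>3} VPCs: {peering_connections(n):>5} peerings "
          f"vs {tgw_attachments(n):>3} TGW attachments")
```

At 100 VPCs that is 4,950 peering connections versus 100 attachments, which is the whole argument for the cloud-scale router model in one number.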
AWS Morning Brief for the week of Monday, August 12th with Mike Julian.

Links:
- Introducing AWS End User Messaging
- Amazon EFS now supports up to 30 GiB/s (a 50% increase) of read throughput
- Amazon RDS for Db2 supports loading data from Amazon S3
- AWS announces private IPv6 addressing for VPCs and subnets
- Announcing delegated administrator for Cost Optimization Hub
- OpenSearch optimized instance (OR1) is game changing for indexing performance and cost
This episode discusses solutions for securely accessing private VPC resources for debugging and troubleshooting. We cover traditional approaches like bastion hosts and VPNs, and newer solutions using containers and AWS services like Fargate, ECS, and SSM. We explain how to set up a Fargate task whose container image includes the necessary tools, enable ECS integration with SSM, and use SSM to start remote shells and port-forwarding tunnels into the container. This provides on-demand access without exposing resources on the public internet. We share a Python script to simplify the process, and suggest ideas for improvements like auto-scaling the container down when idle. Overall, this lightweight containerized approach can provide easier access for debugging compared to managing EC2 instances.
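The Python script from the episode is not reproduced here, but a wrapper along these lines is plausible. It shells out to the AWS CLI's `aws ecs execute-command` (ECS Exec rides on SSM under the hood), assuming the task was launched with execute-command enabled and the Session Manager plugin is installed locally; the cluster and container names are placeholders:

```python
# Not the episode's script -- a guess at the shape of such a wrapper.
# Assumes: awscli v2 + Session Manager plugin installed, and a Fargate
# task started with --enable-execute-command.
import subprocess
import sys

CLUSTER = "debug-cluster"   # hypothetical cluster name
CONTAINER = "toolbox"       # hypothetical container name

def open_shell(task_id: str) -> None:
    """Start an interactive shell in the running container via ECS Exec/SSM."""
    subprocess.run([
        "aws", "ecs", "execute-command",
        "--cluster", CLUSTER,
        "--task", task_id,
        "--container", CONTAINER,
        "--interactive",
        "--command", "/bin/bash",
    ], check=True)

if __name__ == "__main__":
    open_shell(sys.argv[1])   # usage: python shell.py <task-id>
```

Because the session is brokered by SSM, nothing here requires a public IP, a bastion, or an open inbound security-group rule.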
Welcome to episode 248 of the CloudPod Podcast – where the forecast is always cloudy! It's the return of our Cloud Journey Series! Plus, today we're talking shared VPCs and why you should avoid them, Amazon's new data centers (we think they forgot about the sustainability pledge), new threats to and from AI, and a quick preview of Next '24 programs – plus much more!

Titles we almost went with this week:
- The Cloud Pod Isn't a Basic Bitch
- New AWS Data Solutions Framework – or – How You Accidentally Spent $100k's
- A PSA on Shared VPCs in AWS
- Amazon Doesn't Even Pay Attention to Climate When it's on a Building
- Vector Search I Hardly Know Her
- Google Migs are Less Fun than Russian Migs
- AI Can Now Attack Us; Who Didn't See That Coming
- Who is Surprised That AWS is Using More Power Than the Rest of the State of Oregon
- Spend all the Dinero in Spain

A big thanks to this week's sponsor: We're sponsorless this week! Interested in sponsoring us and having access to a specialized and targeted market? We'd love to talk to you. Send us an email or hit us up on our Slack Channel.

AI is Going Great (or how ML Makes all Its Money)

01:24 Disrupting malicious uses of AI by state-affiliated threat actors
- In this week's chapter of AI nightmares, ChatGPT tells us how they are blocking the usage of AI by state-affiliated threat actors. Awesome; things went from bad to worse in one week. Cool. Cool cool cool.
- In partnership with Microsoft Threat Intelligence, they have disrupted five state-affiliated actors that sought to use their AI service in support of malicious cyber activities. These actors generally sought to use OpenAI services for querying open-source information, translating, finding coding errors, and running basic coding tasks.
- Charcoal Typhoon (China-affiliated) researched various companies and cybersecurity tools, debugged code and generated scripts, and created content likely for use in phishing campaigns.
- Salmon Typhoon (China-affiliated) translated technical papers, retrieved publicly available information on multiple intelligence agencies and regional threat actors, assisted with coding, and researched common ways processes could be hidden on a system.
- Crimson Sandstorm (Iran-affiliated) used OpenAI services for scripting support related to app and web development, generating content likely for spear-phishing campaigns, and researching common ways malware could evade detection.
- Emerald Sleet (North Korea-affiliated) identified experts and organizations focused on defense issues in the Asia-Pacific region, sought to understand publicly available vulnerabilities, and used OpenAI services for help with basic scripting tasks and drafting content that could be used in phishing campaigns.
- Forest Blizzard (Russia-affiliated) used OpenAI primarily for performing research on open-source data into satellite communication protocols and radar imaging technology, as well as for support with scripting tasks.
- OpenAI says the capabilities of the current models are limited, but they believe it's important to stay ahead of significant and evolving threats. To continue making sure their platform is used for good, they have a multi-pronged approach:
Evelyn Osman, Principal Platform Engineer at AutoScout24, joins Corey on Screaming in the Cloud to discuss the dire need for developers to agree on a standardized tool set in order to scale their projects and innovate quickly. Corey and Evelyn pick apart the new products being launched in cloud computing and discover a large disconnect between what the industry needs and what is actually being created. Evelyn shares her thoughts on why viewing platforms as products themselves forces developers to get into the minds of their users and produces a better end result.

About Evelyn

Evelyn is a recovering improviser currently role playing as a Lead Platform Engineer at Autoscout24 in Munich, Germany. While she says she specializes in AWS architecture and integration after spending 11 years with it, in truth she spends her days convincing engineers that a product mindset will make them hate their product managers less.

Links Referenced:

LinkedIn: https://www.linkedin.com/in/evelyn-osman/

Transcript

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.

Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. My guest today is Evelyn Osman, engineering manager at AutoScout24. Evelyn, thank you for joining me.

Evelyn: Thank you very much, Corey. It's actually really fun to be on here.

Corey: I have to say one of the big reasons that I was enthused to talk to you is that you have been using AWS—to be direct—longer than I have, and that puts you in a somewhat rarefied position where AWS's customer base has absolutely exploded over the past 15 years that it's been around, but at the beginning, it was a very different type of thing. Nowadays, it seems like we've lost some of that magic from the beginning. Where do you land on that whole topic?

Evelyn: That's actually a really good point because I always like to say, you know, when I come into a room, you know, I really started doing introductions like, "Oh, you know, hey," I'm like, you know, "I'm this director, I've done this XYZ," and I always say, you know, "I'm Evelyn, engineering manager, or architect, or however," and then I say, you know, "I've been working with AWS, you know, 11, 12 years," or now I can't quite remember.

Corey: Time becomes a flat circle. The pandemic didn't help.

Evelyn: [laugh] Yeah, I just, like, look at the year, and I'm like, "Jesus. It's been that long." Yeah. And usually, like you know, you get some odd looks like, "Oh, my God, you must be a sage." And for me, I'm… you see how different services kind of, like, have just been reinventions of another one, or they just take a managed service and make another managed service around it. So, I feel that there's a lot of where it's just, you know, wrapping it up in a pretty bow, and calling it something different, it feels like.

Corey: That's what I've been low-key asking people for a while now over the past year, namely, "What is the most foundational, interesting thing that AWS has done lately, that winds up solving for this problem of whatever it is you do as a company? What is it that has foundationally made things better that AWS has put out in the last service? What was it?" And the answers I get are all depressingly far in the past, I have to say.
What's yours?

Evelyn: Honestly, I think the biggest game-changer I remember experiencing was at an analyst summit in Stockholm when they announced Lambda.

Corey: That was announced before I even got into this space, as an example of how far back things were. And you're right. That was transformative. That was awesome.

Evelyn: Yeah, precisely. Because before, you know, we were always, like, trying to figure, okay, how do we, like, launch an instance, run some short code, and then clean it up. AWS is going to charge for an hour, so we need to figure out, you know, how to pack everything into one instance, run for one hour. And then they announced Lambda, and suddenly, like, holy shit, this is actually a game changer. We can actually write small functions that do specific things.

And, you know, you go from, like, microservices, like, to like, tiny, serverless functions. So, that was huge. And then DynamoDB along with that, really kind of like, transformed the entire space for us in many ways. So, back when I was at TIBCO, there were a few innovations around that, even, like, one startup inside TIBCO that quite literally, their entire product was just Lambda functions. And one of their problems was, they wanted to sell in the Marketplace, and they couldn't figure out how to sell Lambda on the marketplace.

Corey: It's kind of wild when we see just how far it's come, but also how much they've announced that doesn't change that much, to be direct. For me, one of the big changes that I remember that really made things better for customers—though it took a couple of years—was EFS. And even that's a little bit embarrassing because all that is, "All right, we finally found a way to stuff a NetApp into us-east-1," so now NFS, just like you used to use it in the 90s and the naughts, can be done responsibly in the cloud. And that, on some level, wasn't a feature launch so much as it was a concession to the ways that companies had built things and weren't likely to change.

Evelyn: Honestly, I found the EFS launch to be a bit embarrassing because, like, you know, when you look closer at it, you realize, like, the performance isn't actually that great.

Corey: Oh, it was horrible when it launched. It would just slam to a halt because you got the IOPS scaled with how much data you stored on it. The documentation explicitly said to use dd to start loading a bunch of data onto it to increase the performance. It's like, "Look, just sandbag the thing so it does what you'd want." And all that stuff got fixed, but at the time it looked like it was clown shoes.

Evelyn: Yeah, and that reminds me of, like, EBS's, like, gp2 when we're, like you know, we're talking, like, okay, provision IOPS with gp2. We just kept saying, like, just give yourself really big volume for performance. And it feels like they just kind of kept that with EFS. And it took years for them to really iterate off of that. Yeah, so, like, EFS was a huge thing, and I see us, we're still using it now today, and like, we're trying to integrate, especially for, like, data center migrations, but yeah, you always see that a lot of these were first more for, like, you know, data centers to the cloud, you know. So, first I had, like, EC2 classic. That's where I started. And I always like to tell a story that in my team, we're talking about using AWS, I was the only person fiercely against it because we did basically large data processing—sorry, I forget the right words—data analytics. There we go [laugh].

Corey: I remember that, too.
When it first came out, it was, "This sounds dangerous and scary, and it's going to be a flash in the pan because who would ever trust their core compute infrastructure to some random third-party company, especially a bookstore?" And yeah, I think I got that one very wrong.

Evelyn: Yeah, exactly. I was just like, no way. You know, I see all these articles talking about, like, terrible disk performance, and here I am, where it's like, it's my bread and butter. I'm specialized in it, you know? I write code in my sleep and such.

[Yeah, the interesting thing is, I was like, first, it was like, I can 00:06:03] launch services, you know, to kind of replicate when you get in a data center to make it feature comparable, and then it was taking all these complex services and wrapping it up in a pretty bow for—as a managed service. Like, EKS, I think, was the biggest one, if we're looking at managed services. Technically Elasticsearch, but I feel like that was the redheaded stepchild for quite some time.

Corey: Yeah, there was—Elasticsearch was a weird one, and still is. It's not a pleasant service to run in any meaningful sense. Like, what people actually want as the next enhancement that would excite everyone is, I want a serverless version of this thing where I can just point it at a bunch of data, I hit an API that I don't have to manage, and get Elasticsearch results back from. They finally launched a serverless offering that's anything but. You have to still provision compute units for it, so apparently, the word serverless just means managed service over at AWS-land now. And it just, it ties into the increasing sense of disappointment I've had with almost all of their recent launches versus what I felt they could have been.

Evelyn: Yeah, the interesting thing about Elasticsearch is, a couple of years ago, they came out with OpenSearch, a competing Elasticsearch after [unintelligible 00:07:08] kind of gave us the finger and changed the licensing. I mean, OpenSearch has actually become a really great offering if you run it yourself, but if you use their managed service, it can kind—you lose all the benefits, in a way.

Corey: I'm curious, as well, to get your take on what I've been seeing that I think could only be described as an internal shift, where it's almost as if there's been a decree passed down that every service has to run its own P&L or whatnot, and as a result, everything that gets put out seems to be monetized in weird ways, even when I'd argue it shouldn't be. The classic example I like to use for this is AWS Config, where it charges you per evaluation, and that happens whenever a cloud resource changes. What that means is that by using the cloud dynamically—the way that they supposedly want us to do—we wind up paying a fee for that as a result. And it's not like anyone is using that service in isolation; it is definitionally being used as people are using other cloud resources, so why does it cost money? And the answer is because literally everything they put out costs money.

Evelyn: Yep, pretty simple. Oftentimes, there's, like, R&D that goes into it, but the charges seem a bit… odd. Like from an S3 lens, was, I mean, that's, like, you know, if you're talking about services, that was actually a really nice one, very nice holistic overview, you know, like, I could drill into a data lake and, like, look into things. But if you actually want to get anything useful, you have to pay for it.

Corey: Yeah.
Everything seems to, for one reason or another, be stuck in this place where, "Well, if you want to use it, it's going to cost." And what that means is that it gets harder and harder to do anything that even remotely resembles being able to wind up figuring out where's the spend going, or what's it going to cost me as time goes on? Because it's not just what are the resources I'm spinning up going to cost, what are the second, third, and fourth-order effects of that? And the honest answer is, well, nobody knows. You're going to have to basically run an experiment and find out.

Evelyn: Yeah. No, true. So, what I… at AutoScout, we actually ended up doing is—because we're trying to figure out how to tackle these costs—is they—we built an in-house cost allocation solution so we could track all of that. Now, AWS has actually improved Cost Explorer quite a bit, and even, I think, Billing Conductor was one that came out [unintelligible 00:09:21], kind of like, do a custom tiered and account pricing model where you can kind of do the same thing. But even that also, there is a cost with it.

I think that was trying to compete with other, you know, vendors doing similar solutions. But it still isn't something where we see that either there's, like, arbitrarily low pricing there, or the costs itself doesn't really quite make sense. Like, AWS [unintelligible 00:09:45], as you mentioned, it's a terrific service. You know, we try to use it for compliance enforcement and other things, catching bad behavior, but then as soon as people see the price tag, we just run away from it. So, a lot of the security services themselves, actually, the costs, kind of like, goes—skyrockets tremendously when you start trying to use it across a large organization. And oftentimes, the organization isn't actually that large.

Corey: Yeah, it gets to this point where, especially in small environments, you have to spend more energy and money chasing down what the cost is than you're actually spending on the thing. There were blog posts early on that, "Oh, here's how you analyze your bill with Redshift," and that was a minimum 750 bucks a month. It's, well, I'm guessing that that's not really for my $50 a month account.

Evelyn: Yeah. No, precisely. I remember seeing that, like, entire ETL process is just, you know, analyze your invoice. Cost [unintelligible 00:10:33], you know, is fantastic, but at the end of the day, like, what you're actually looking at [laugh], is infinitesimally small compared to all the data in that report. Like, I think oftentimes, it's simply, you know, like, I just want to look at my resources and allocate them in a multidimensional way. Which actually isn't really that multidimensional, when you think about it [laugh].

Corey: Increasingly, Cost Explorer has gotten better. It's not a new service, but every iteration seems to improve it to a point now where I'm talking to folks, and they're having a hard time justifying most of the tools in the cost optimization space, just because, okay, they want a percentage of my spend on AWS to basically be a slightly better version of a thing that's already improving and works for free. That doesn't necessarily make sense. And I feel like that's what you get trapped into when you start going down the VC path in the cost optimization space. You've got to wind up having a revenue model and an offering that scales through software… and I thought, originally, I was going to be doing something like that.
At this point, I'm unconvinced that anything like that is really tenable.

Evelyn: Yeah. When you're a small organization you're trying to optimize, you might not have the expertise and the knowledge to do so, so when one of these small consultancies comes along, saying, "Hey, we're going to charge you a really small percentage of your invoice," like, okay, great. That's, like, you know, like, a few $100 a month to make sure I'm fully optimized, and I'm saving, you know, far more than that. But as soon as your invoice turns into, you know, it's like $100,000, or $300,000 or more, that percentage becomes rather significant. And I've had vendors come to me and, like, talk to me and is like, "Hey, we can, you know, for a small percentage, you know, we're going to do this machine learning, you know, AI optimization for you. You know, you don't have to do anything. We guaranteed buybacks your RIs." And as soon as you look at the price tag with it, we just have to walk away. Or oftentimes we look at it, and there are truly very simple ways to do it on your own, if you just kind of put some thought into it.

Corey: We were talking a bit before this show, and you taught me something new about GameLift, which I think is a different problem that AWS has been dealing with lately. I've never paid much attention to it because it is the—as I assume from what it says on the tin, oh, it's a service for just running a whole bunch of games at scale, and I'm not generally doing that. My favorite computer game remains to be Twitter at this point, but that's okay. What is GameLift, though, because you were shining a different light on it, which makes me annoyed that Amazon Marketing has not pointed this out.

Evelyn: Yeah, so I'll preface this by saying, like, I'm not an expert on GameLift. I haven't even spun it up myself because there's quite a bit of a price tag. I learned this fall while chatting with an SA who works in the gaming space, and it kind of like, I went, like, "Back up a second." If you think about, like, I'm, you know, like, World of Warcraft, all you have are thousands of game clients all over the world, playing the same game, you know, on the same server, in the same instance, and you need to make sure, you know, that when I'm running, and you're running, that we know that we're going to reach the same point the same time, or if there's one object in that room, that only one of us can get it. So, all these servers are doing is tracking state across thousands of clients.

And GameLift, when you think about your dedicated game service, it really is just multi-region distributed state management. Like, at the basic, that's really what it is. Now, there's, you know, quite a bit more happening within GameLift, but that's what I was going to explain is, like, it's just state management. And there are far more use cases for it than just for video games.

Corey: That's maddening to me because having a global session state store, for lack of a better term, is something that so many customers have built themselves repeatedly. They can build it on top of primitives like DynamoDB global tables, or alternately, you have a dedicated region where that thing has to live and everything far away takes forever to round-trip. If they've solved some of those things, why on earth would they bury it under a gaming-branded service? Like, offer that primitive to the rest of us because that's useful.

Evelyn: No, absolutely.
And honestly, I wouldn't be surprised if you peeled back the curtain with GameLift, you'll find a lot of—like, several other, you know, AWS services that it's just built on top of. I kind of mentioned earlier is, like, what I see now with innovation, it's like we just see other services packaged together and released as a new product.

Corey: Yeah, IoT had the same problem going on for years where there was a lot of really good stuff buried in there, like IoT Events. People were talking about using that for things like browser extensions and whatnot, but you need to be explicitly told that that's a thing that exists and is handy, but otherwise you'd never know it was there because, "Well, I'm not building anything that's IoT-related. Why would I bother?" It feels like that was one direction that they tended to go in.

And now they take existing services that are, mmm, kind of milquetoast, if I'm being honest, and then saying, "Oh, like, we have Comprehend that does, effectively, detection of themes, keywords, and whatnot, from text. We're going to wind up re-releasing that as Comprehend Medical." Same type of thing, but now focused on a particular vertical. Seems to me that instead of being a specific service for that vertical, just improve the baseline service and offer HIPAA compliance if it didn't exist already, and you're mostly there. But what do I know? I'm not a product manager trying to get promoted.

Evelyn: Yeah, that's true. Well, I was going to mention that maybe it's the HIPAA compliance, but actually, a lot of their services already have HIPAA compliance. And I've stared far too long at that compliance section on AWS's site to know this, but you know, a lot of them actually are HIPAA-compliant, they're PCI-compliant, and ISO-compliant, and you know, and everything. So, I'm actually pretty intrigued to know why they [wouldn't 00:16:04] take that advantage.

Corey: I just checked. Amazon Comprehend is itself HIPAA-compliant and is qualified and certified to hold Personal Health Information—PHI—Private Health Information, whatever the acronym stands for. Now, what's the difference, then, between that and Medical? In fact, the HIPAA section says for Comprehend Medical, "For guidance, see the previous section on Amazon Comprehend." So, there's no difference from a regulatory point of view.

Evelyn: That's fascinating. I am intrigued because I do know that, like, within AWS, you know, they have different segments, you know? There's, like, Digital Native Business, there's Enterprise, there's Startup. So, I am curious how things look over the engineering side. I'm going to talk to somebody about this now [laugh].

Corey: Yeah, it's the—like, I almost wonder, on some level, it feels like, "Well, we wound up building this thing in the hopes that someone would use it for something. And well, if we just use different words, it checks a box in some analyst's chart somewhere." I don't know. I mean, I hate to sound that negative about it, but it's… increasingly when I talk to customers who are active in these spaces around the industry vertical targeted stuff aimed at their industry, they're like, "Yeah, we took a look at it. It was adorable, but we're not using it that way. We're going to use either the baseline version or we're going to work with someone who actively gets our industry." And I've heard that repeated about three or four different releases that they've put out across the board of what they've been doing.
It feels like it is a misunderstanding between what the world needs and what they're able to or willing to build for us.

Evelyn: Not sure. I wouldn't be surprised, if we go far enough, it could probably be that it's just a product manager saying, like, "We have to advertise directly to the industry." And if you look at it, you know, in the backend, you know, it's an engineer, you know, kicking off a build and just changing the name from Comprehend to Comprehend Medical.

Corey: And, on some level, too, they're moving a lot more slowly than they used to. There was a time where they were, in many cases, if not the first mover, the first one to do it well. Take Code Whisperer, their AI-powered coding assistant. That would have been a transformative thing if GitHub Copilot hadn't beaten them to the punch, come out with new features, and frankly, in head-to-head experiments that I've run, came out way better as a product than what Code Whisperer is. And while I'd like to say that this is great, but it's too little too late. And when I talk to engineers, they're very excited about what Copilot can do, and the only people I see who are even talking about Code Whisperer work at AWS.

Evelyn: No, that's true. And so, I think what's happening—and this is my opinion—is that first you had AWS, like, launching really innovative new services, you know, that kind of like, it's like, "Ah, it's a whole new way of running your workloads in the cloud." Instead of, you know, basically, hiring a whole team, I just click a button, you have your instance, you use it, sell software, blah, blah, blah, blah. And then they went towards serverless, and then IoT, and then it started targeting large data lakes, and then eventually that kind of ran backwards towards security, after the umpteenth S3 data leak.

Corey: Oh, yeah. And especially now, like, so they had a hit in some corners with SageMaker, so now there are 40 services all starting with the word SageMaker. That's always pleasant.

Evelyn: Yeah, precisely. And what I kind of notice is… now they're actually having to run it even further back because they caught all the corporations that could pivot to the cloud, they caught all the startups who started in the cloud, and now they're going for the larger behemoths who have massive data centers, and they don't want to innovate. They just want to reduce this massive sysadmin team. And I always like to use the example of Bare Metal. When that came out in 2019, everybody—we all kind of scratched our heads. I'm like, really [laugh]?

Corey: Yeah, I could see where it makes some sense just for very specific workloads that involve things like specific capabilities of processors that don't work under emulation in some weird way, but it's also such a weird niche that I'm sure it's there for someone. My default assumption, just given the breadth of AWS's customer base, is that whenever I see something that they just announced, well, okay, it's clearly not for me; that doesn't mean it's not meeting the needs of someone who looks nothing like me. But increasingly, as I start exploring the industry and these services have time to percolate in the popular imagination and I still don't see anything interesting coming out with it, it really makes you start to wonder.

Evelyn: Yeah. But then, like, I think, like, roughly a year or something, right after Bare Metal came out, they announced Outposts. So, then it was like, another way to just stay within your data center and be in the cloud.

Corey: Yeah.
There's a bunch of different ways they have that, okay, here's ways you can run AWS services on-prem, but still pay us by the hour for the privilege of running things that you have living in your facility. And that doesn't seem like it's quite fair.

Evelyn: That's exactly it. So, I feel like now it's sort of in diminishing returns and sort of doing more cloud-native work compared to, you know, these huge opportunities, which is everybody who still has a data center for various reasons, or they're cloud-native, and they grow so big, that they actually start running their own data centers.

Corey: I want to call out as well, before we wind up being accused of being oblivious, that we're recording this before re:Invent. So, it's entirely possible—I hope this happens—that they announce something or several some things that make this look ridiculous, and we're embarrassed to have had this conversation. And yeah, they're totally getting it now, and they have completely surprised us with stuff that's going to be transformative for almost every customer. I've been expecting and hoping for that for the last three or four re:Invents now, and I haven't gotten it.

Evelyn: Yeah, that's right. And I think there are even new service launches that are actually missing fairly obvious things in a way. Like, mine is the Managed Workflow for Amazon—it's Managed Airflow, sorry. So, we were using Data Pipeline for, you know, big ETL processing, so it was an in-house tool we kind of built at Autoscout, we do platform engineering.

And it was deprecated, so we looked at a new—what to replace it with. And so, we looked at Airflow, and we decided this is the way to go, we want to use managed because we don't want to maintain our own infrastructure. And the problem we ran into is that it doesn't have support for shared VPCs. And we actually talked to our account team, and they were confused. Because they said, like, "Well, every new service should support it natively." But it just didn't have it. And that's, kind of, what, I kind of found is, like, there's—it feels—sometimes it's—there's a—it's getting rushed out the door, and it'll actually have a new managed service or new service launched out, but they're also sort of cutting some corners just to actually make sure it's packaged up and ready to go.

Corey: When I'm looking at this, and seeing how this stuff gets packaged, and how it's built out, I start to understand a pattern that I've been relatively down on across the board. I'm curious to get your take because you work at a fairly sizable company as an engineering manager, running teams of people who do this sort of thing. Where do you land on the idea of companies building internal platforms to wrap around the offerings that the cloud service providers that they use make available to them?

Evelyn: So, my opinion is that you need to build out some form of standardized tool set in order to actually be able to innovate quickly. Now, this sounds counterintuitive because everyone is like, "Oh, you know, if I want to innovate, I should be able to do this experiment, and try out everything, and use what works, and just release it." And that greatness [unintelligible 00:23:14] mentality, you know, it's like five talented engineers working to build something. But when you have, instead of five engineers, you have five teams of five engineers each, and every single team does something totally different.
You know, one uses Scala, another one TypeScript, another one, you know, .NET, and then there could have been a [last 00:23:30] one, you know, comes in, you know, saying they're still using Ruby.

And then next thing you know, you know, you have, like, incredibly diverse platforms for services. And if you want to do any sort of like hiring or cross-training, it becomes incredibly difficult. And actually, as the organization grows, you want to hire talent, and so you're going to have to hire, you know, a developer for this team, you're going to have to hire, you know, a Ruby developer for this one, a Scala guy here, a Node.js guy over there.

And so, this is where we say, "Okay, let's agree. We're going to be a Scala shop. Great. All right, are we running serverless? Are we running containerized?" And you agree on those things. So, that's already, like, the formation of it. And oftentimes, you start with DevOps. You'll say, like, "I'm a DevOps team," you know, or doing a DevOps culture, if you do it properly, but you always hit this scaling issue where you start growing, and then how do you maintain that common tool set? And that's where we start looking at, you know, having a platform… approach, but I'm going to say it's Platform-as-a-Product. That's the key.

Corey: Yeah, that's a good way of framing it because originally, the entire world needed that. That's what RightScale was when EC2 first came out. It was a reimagining of the EC2 console that was actually usable. And in time, AWS improved that to the point where RightScale didn't really have a place anymore in a way that it had previously, and that became a business challenge for them. But you have, what is it now, 2, 300 services that AWS has put out, and okay, great. Most companies are really only actively working with a handful of those. How do you make those available in a reasonable way to your teams, in ways that aren't distracting, dangerous, et cetera? I don't know the answer on that one.

Evelyn: Yeah. No, that's true. So, full disclosure. At AutoScout, we do platform engineering. So, I'm part of, like, the platform engineering group, and we built a platform for our product teams. It's kind of like, you need to decide to [follow 00:25:24] those answers, you know? Like, are we going to be fully containerized? Okay, then, great, we're going to use Fargate. All right, how do we do it so that developers don't actually—don't need to think that they're running Fargate workloads?

And that's, like, you know, where it's really important to have those standardized abstractions that developers actually enjoy using. And I'd even say that, before you start saying, "Ah, we're going to do platform," you say, "We should probably think about developer experience." Because you can do a developer experience without a platform. You can do that, you know, in a DevOps approach, you know? It's basically build tools that makes it easy for developers to write code. That's the first step for anything. It's just, like, you have people writing the code; make sure that they can do the things easily, and then look at how to operate it.
They don't have the insight, the knowledge, the approach, any of it, nor should they necessarily be expected to.

Evelyn: No, that's true. And what I see is, a lot of the times, it's a couple really talented engineers who are just getting shit done, and they get shit done however they can. So, it's basically like, if they're just trying to run the website, they're just going to write the code to get things out there and call it a day. And then somebody else comes along, has a heart attack when they see what's been done, and they're kind of stuck with it because there are no guardrails or paved path or however you want to call it.

Corey: I really hope—truly—that this is going to be something that we look back and laugh when this episode airs, that, "Oh, yeah, we just got it so wrong. Look at all the amazing stuff that came out of re:Invent." Are you going to be there this year?

Evelyn: I am going to be there this year.

Corey: My condolences. I keep hoping people get to escape.

Evelyn: This is actually my first one in, I think, five years. So, I mean, the last time I was there was when everybody's going crazy over pins. And I still have a bag of them [laugh].

Corey: Yeah, that did seem like a hot-second collectable moment, didn't it?

Evelyn: Yeah. And then at the—I think, what, the very last day, as everybody's heading to re:Play, you could just go into the registration area, and they just had, like, bags of them lying around to take. So, all the competing, you know, to get the requirements for a pin was kind of moot [laugh].

Corey: Don't you hate it at some point where it's like, you feel like I'm going to finally get this crowning achievement, it's like or just show up at the buffet at the end and grab one of everything, and wow, that would have saved me a lot of pain and trouble.

Evelyn: Yeah.

Corey: Ugh, scavenger hunts are hard, as I'm about to learn to my own detriment.

Evelyn: Yeah. No, true. Yeah. But I am really hoping that re:Invent proves me wrong. Embarrassingly wrong, and then all my colleagues can proceed to mock me for this ridiculous podcast that I made with you. But I am a fierce skeptic. Optimistic nihilist, but still a nihilist, so we'll see how re:Invent turns out.

Corey: So, I am curious, given your experience at more large companies than I tend to be embedded with for any period of time, how have you found that these large organizations tend to pick up new technologies? What does the adoption process look like? And honestly, if you feel like throwing some shade, how do they tend to get it wrong?

Evelyn: In most cases, I've seen it go… terrible. Like, it just blows up in their face. And I say that is because a lot of the time, an organization will say, "Hey, we're going to adopt this new way of organizing teams or developing products," and they look at all the practices. They say, "Okay, great. Product management is going to bring it in, they're going to structure things, how we do the planning, here's some great charts and diagrams," but they don't really look at the culture aspect.

And that's always where I've seen things fall apart. I've been in a room where, you know, our VP was really excited about team topologies and say, "Hey, we're going to adopt it." And then an engineering manager proceeded to say, "Okay, you're responsible for this team, you're responsible for that team, you're responsible for this team talking to, like, a team of, like, five engineers," which doesn't really work at all.
Or, like, I think the best example is DevOps, you know, where you say, "Ah, we're going to adopt DevOps, we're going to have a DevOps team, or have a DevOps engineer."

Corey: Step one: we're going to rebadge everyone with existing job titles to have the new fancy job titles that reflect it. It turns out that's not necessarily sufficient in and of itself.

Evelyn: Not really. The Spotify model. People say, like, "Oh, we're going to do the Spotify model. We're going to do squads, tribes, you know, and everything. It's going to be awesome, it's going to be great, you know, and nice, cross-functional."

The reason I say it fails on us every single time is because somebody wants to be in control of the process, and if the process is meant to encourage collaboration and innovation, that person actually becomes a chokehold for it. And it could be somebody that says, like, "Ah, I need to be involved in every single team, and listen to know what's happening, just so I'm aware of it." What ends up happening is that everybody defers to them. So, there is no collaboration, there is no innovation. DevOps, you say, like, "Hey, we're going to have a team to do everything, so your developers don't need to worry about it." What ends up happening is you're still an ops team, you still have your silos.

And that's always a challenge is you actually have to say, "Okay, what are the cultural values around this process?" You know, what is SRE? What is DevOps, you know? Is it seen as processes, is it a series of principles, platform, maybe, you know? We have to say, like—that's why I say, Platform-as-a-Product because you need to have that product mindset, that culture of product thinking, to really build a platform that works because it's all about the user journey.

It's not about building a common set of tools. It's the user journey of how a person interacts with their code to get it into a production environment. And so, you need to understand how that person sits down at their desk, starts the laptop up, logs in, opens the IDE, what they're actually trying to get done. And once you understand that, then you know your requirements, and you build something to fill those things so that they are happy to use it, as opposed to saying, "This is our platform, and you're going to use it." And they're probably going to say, "No." And the next thing, you know, they're just doing their own thing on the side.

Corey: Yeah, the rise of Shadow IT has never gone away. It's just, on some level, it's the natural expression, I think it's an immune reaction that companies tend to have when process gets in the way. Great, we have an outcome that we need to drive towards; we don't have a choice. Cloud empowered a lot of that and also has given tools to help rein it in, and as with everything, the arms race continues.

Evelyn: Yeah. And so, what I'm going to continue now, kind of like, toot the platform horn. So, Gregor Hohpe, he's a [solutions architect 00:31:56]—I always f- up his name. I'm so sorry, Gregor. He has a great book, and even a talk, called The Magic of Platforms, that if somebody is actually curious about understanding of why platforms are nice, they should really watch that talk.

If you see him at re:Invent, or a summit or somewhere giving a talk, go listen to that, and just pick his brain.
Because that's—for me, I really kind of strongly agree with his approach because that's really how, like, you know, as he says, like, boost innovation is, you know, where you're actually building a platform that really works.

Corey: Yeah, it's a hard problem, but it's also one of those things where you're trying to focus on—at least ideally—an outcome or a better situation than you currently find yourselves in. It's hard to turn down things that might very well get you there sooner, faster, but it's like trying to effectively cargo-cult the leadership principles from your last employer into your new one. It just doesn't work. I mean, you see more startups from Amazonians who try that, and it just goes horribly because without the cultural understanding and the supporting structures, it doesn't work.

Evelyn: Exactly. So, I've worked with, like, organizations, like, 4000-plus people, I've worked for, like, small startups, consulted, and this is why I say, almost every single transformation, it fails the first time because somebody needs to be in control and track things and basically be really, really certain that people are doing it right. And as soon as it blows up in their face, that's when they realize they should actually take a step back. And so, even for building out a platform, you know, doing Platform-as-a-Product, I always reiterate that you have to really be willing to just invest upfront, and not get very much back. Because you have to figure out the whole user journey, and what you're actually building, before you actually build it.

Corey: I really want to thank you for taking the time to speak with me today. If people want to learn more, where's the best place for them to find you?

Evelyn: So, I used to be on Twitter, but I've actually got off there after it kind of turned a bit toxic and crazy.

Corey: Feels like that was years ago, but that's beside the point.

Evelyn: Yeah, precisely. So, I would even just say because this feels like a corporate show, but find me on LinkedIn of all places because I will be sharing whatever I find on there, you know? So, just look me up on my name, Evelyn Osman, and give me a follow, and I'll probably be screaming into the cloud like you are.

Corey: And we will, of course, put links to that in the show notes. Thank you so much for taking the time to speak with me. I appreciate it.

Evelyn: Thank you, Corey.

Corey: Evelyn Osman, engineering manager at AutoScout24. I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice, and I will read it once I finish building an internal platform to normalize all of those platforms together into one.

Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business, and we get to the point. Visit duckbillgroup.com to get started.
AWS Morning Brief for the week of November 20, 2023 with Corey Quinn. Links:
re:Quinnvent
Wednesday night drinkup at Atomic Liquors
Nature Walk
Amazon CloudWatch Logs announces regular expression filter pattern support for Live Tail
Amazon EBS announces Snapshot Lock to protect snapshots from inadvertent or malicious deletions
Amazon MSK Serverless now supports all programming languages
Amazon Time Sync Service now supports microsecond-accurate time
AWS CloudTrail Lake announces new pricing option optimized for flexible retention
AWS Cost Explorer now provides more historical and granular data
AWS announces IPv6 tiered VPCs and subnets
AWS Lambda console now features a single pane view of metrics, logs, and traces
Announcing Research and Engineering Studio on AWS
Announcing PartyRock, an Amazon Bedrock Playground
Amazon Bedrock now provides access to Meta's Llama 2 Chat 13B model
Happy anniversary, Amazon CloudFront: 15 years of evolution and internet advancements
New – Multi-account search in AWS Resource Explorer
Introducing instance maintenance policy for Amazon EC2 Auto Scaling
The serverless attendee's guide to AWS re:Invent 2023
Amazon EKS and Kubernetes sessions at AWS re:Invent 2023
Optimize AZ traffic costs using Amazon EKS, Karpenter, and Istio
Editorial
Join us for a week of AWS Amplify launches
Today's Day Two Cloud kicks off an occasional series on cloud essentials. For the first episode we discuss the Virtual Private Cloud (VPC). A VPC is a fundamental construct of a public cloud. It's essentially your slice of the shared cloud infrastructure, and you can launch and run other elements within a VPC to support your workload. Ned Bellavance walks through key VPC components including regions and AZs, networking and IP addressing, paid add-ons, data egress and associated charges, monitoring and troubleshooting, and basic security controls. The post Day Two Cloud 209: Cloud Essentials – Virtual Private Clouds (VPCs) appeared first on Packet Pushers.
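For anyone who wants to poke at those components directly, here is a minimal boto3 sketch of the basic building blocks the episode covers; the region, CIDR blocks, and tag values are placeholder choices, not recommendations.

```python
import boto3

# Placeholder region and addressing; pick values that fit your own plan.
ec2 = boto3.client("ec2", region_name="us-east-1")

# A VPC is just a software-defined address space you carve out of the cloud.
vpc = ec2.create_vpc(CidrBlock="10.42.0.0/16")["Vpc"]
ec2.create_tags(Resources=[vpc["VpcId"]],
                Tags=[{"Key": "Name", "Value": "demo-vpc"}])

# Subnets pin a slice of that space to a specific Availability Zone.
subnet = ec2.create_subnet(VpcId=vpc["VpcId"],
                           CidrBlock="10.42.1.0/24",
                           AvailabilityZone="us-east-1a")["Subnet"]

# An internet gateway makes the space reachable; the gateway itself is
# free, but the data egress through it is where the charges show up.
igw = ec2.create_internet_gateway()["InternetGateway"]
ec2.attach_internet_gateway(InternetGatewayId=igw["InternetGatewayId"],
                            VpcId=vpc["VpcId"])

print(vpc["VpcId"], subnet["SubnetId"], igw["InternetGatewayId"])
```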
In this episode, Woody dives into the world of cloud security using open source systems with our special guest, Susan Hinrichs. Susan Hinrichs, Chief Scientist at Aviatrix, is a multifaceted professional with a strong background in the open source networking and security space. As a designer and implementer, she has contributed significantly to the development of the distributed cloud firewall. Susan's expertise extends well beyond traditional networking, encompassing diverse areas such as cloud routing, application security, policy-based traffic engineering, and distributed systems. Throughout this insightful conversation, Susan discusses the advantages of open source platforms, Aviatrix's contributions to the open source community, and the open source DNA of the Aviatrix Distributed Cloud Firewall. Susan and Woody also explore possible directions for Distributed Cloud Firewall and the role that AI and ML could play in network security. Learn more about Altitude and Host Woody: https://aviatrix.com/altitude/ Susan's LinkedIn: https://www.linkedin.com/in/shinrich/ Timestamps: [00:02:11] Group responsible for traffic termination and scrubbing. Used open source software and contributed back. [00:06:55] Extended Berkeley Packet Filter (eBPF) enables efficient traffic analysis in kernel space, particularly for dropping network traffic at low levels with minimal effort. It provides a more cost-effective alternative to iptables for implementing firewall policies. [00:10:07] Approach: not everyone is root, and processes don't run as root by default; they need to elevate explicitly. A complicated product made simple. [00:14:27] OpenStack's limitations revealed as enterprise-scale businesses require dedicated specialists, making it costly. Distributed cloud firewall innovates multicloud security. Scaling security in the cloud is challenging due to complexities at layer 3 and up the stack. [00:16:38] Distributed firewall challenges and solutions summarized. [00:21:53] Smart groups are created with tags on VMs, subnets, and VPCs. These groups are used to create rules for traffic routing. With Aviatrix fabric, gateways are protected, and traffic routes are understood. The controller analyzes gateways and enforces rules accordingly. Rules are pushed or pulled to the gateways. [00:26:15] Security group orchestration across different cloud platforms has limitations due to varying models and rule limits. Difficulties arise when translating intermixed allows and denies into only allows, potentially causing networks to split and requiring more rules. Despite extensive work, there are cases where policy expression is not possible. Other tools, like VMware and Cisco, offer similar orchestration capabilities, but the physical enforcement points may still restrict the unified view presented to customers. [00:30:30] Moving towards intrusion protection, analytics, and service mesh for enhanced security. [00:34:05] The impact of AI and machine learning on security systems. [00:35:16] AI helps with alarm fatigue and data correlation.
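As a rough illustration of the [00:06:55] point, here is a minimal XDP sketch using the bcc toolkit's Python bindings; this is a generic example of dropping traffic in kernel space, not Aviatrix's implementation, and the interface name and drop policy (all IPv4 UDP) are assumptions for demonstration.

```python
# Requires the bcc toolkit and root privileges; "eth0" is an assumed interface.
import time
from bcc import BPF

prog = r"""
#include <uapi/linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <linux/in.h>

// Runs in the kernel for every inbound packet, before the network stack
// sees it. Dropping here is far cheaper than an iptables rule evaluated
// much later in the packet's path.
int drop_ipv4_udp(struct xdp_md *ctx) {
    void *data = (void *)(long)ctx->data;
    void *data_end = (void *)(long)ctx->data_end;

    struct ethhdr *eth = data;
    if ((void *)(eth + 1) > data_end)
        return XDP_PASS;                 // truncated frame: let it through
    if (eth->h_proto != htons(ETH_P_IP))
        return XDP_PASS;                 // not IPv4

    struct iphdr *ip = (void *)(eth + 1);
    if ((void *)(ip + 1) > data_end)
        return XDP_PASS;

    return ip->protocol == IPPROTO_UDP ? XDP_DROP : XDP_PASS;
}
"""

b = BPF(text=prog)
b.attach_xdp("eth0", b.load_func("drop_ipv4_udp", BPF.XDP), 0)
try:
    time.sleep(3600)  # the policy is active while the program stays attached
finally:
    b.remove_xdp("eth0", 0)
```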
Jake Gold, Infrastructure Engineer at Bluesky, joins Corey on Screaming in the Cloud to discuss his experience helping to build Bluesky and why he's so excited about it. Jake and Corey discuss the major differences when building a truly open-source social media platform, and Jake highlights his focus on reliability. Jake explains why he feels downtime can actually be a huge benefit to reliability engineers, and why how he views abstractions based on the size of the team he's working on. Corey and Jake also discuss whether cloud is truly living up to its original promise of lowered costs. About JakeJake Gold leads infrastructure at Bluesky, where the team is developing and deploying the decentralized social media protocol, ATP. Jake has previously managed infrastructure at companies such as Docker and Flipboard, and most recently, he was the founding leader of the Robot Reliability Team at Nuro, an autonomous delivery vehicle company.Links Referenced: Bluesky: https://blueskyweb.xyz/ Bluesky waitlist signup: https://bsky.app TranscriptAnnouncer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. In case folks have missed this, I spent an inordinate amount of time on Twitter over the last decade or so, to the point where my wife, my business partner, and a couple of friends all went in over the holidays and got me a leather-bound set of books titled The Collected Works of Corey Quinn. It turns out that I have over a million words of shitpost on Twitter. If you've also been living in a cave for the last year, you'll notice that Twitter has basically been bought and driven into the ground by the world's saddest manchild, so there's been a bit of a diaspora as far as people trying to figure out where community lives.Jake Gold is an infrastructure engineer at Bluesky—which I will continue to be mispronouncing as Blue-ski because that's the kind of person I am—which is, as best I can tell, one of the leading contenders, if not the leading contender to replace what Twitter was for me. Jake, welcome to the show.Jake: Thanks a lot, Corey. Glad to be here.Corey: So, there's a lot of different angles we can take on this. We can talk about the policy side of it, we can talk about social networks and things we learn watching people in large groups with quasi-anonymity, we can talk about all kinds of different nonsense. But I don't want to do that because I am an old-school Linux systems administrator. And I believe you came from the exact same path, given that as we were making sure that I had, you know, the right person on the show, you came into work at a company after I'd left previously. So, not only are you good at the whole Linux server thing; you also have seen exactly how good I am not at the Linux server thing.Jake: Well, I don't remember there being any problems at TrueCar, where you worked before me. But yeah, my background is doing Linux systems administration, which turned into, sort of, Linux programming. And these days, we call it, you know, site reliability engineering. But yeah, I discovered Linux in the late-90s, as a teenager and, you know, installing Slackware on 50 floppy disks and things like that. 
And I just fell in love with the magic of, like, being able to run a web server, you know? I got a hosting account at, you know, my local ISP, and I was like, how do they do that, right?And then I figured out how to do it. I ran Apache, and it was like, still one of my core memories of getting, you know, httpd running and being able to access it over the internet and telling my friends on IRC. And so, I've done a whole bunch of things since then, but that's still, like, the part that I love the most.Corey: The thing that continually surprises me is just when I think I'm out and we've moved into a fully modern world where, oh, all I do is write code anymore, which I didn't realize I was doing until I realized if you call YAML code, you can get away with anything. And I keep finding myself getting dragged back in. It's the falling back to fundamentals in these weird moments of yes, yes, immutable everything, infrastructure as code, but when the server is misbehaving and you want to log in and get your hands dirty, the skill set rears its head yet again. At least that's what I've been noticing, at least as far as I've gone down a number of interesting IoT-based projects lately. Is that something you experience or have you evolved fully and not looked back?Jake: Yeah. No, what I try to do is on my personal projects, I'll use all the latest cool, flashy things, any abstraction you want, I'll try out everything, and then what I do at work is, I kind of have, like, a one or two year, sort of, lagging adoption of technologies, like, when I've actually shaken them out in my own stuff, then I use them at work. But yeah, I think one of my favorite quotes is, like, “Programmers first learn the power of abstraction, then they learn the cost of abstraction, and then they're ready to program.” And that's how I view infrastructure, very similar thing where, you know, certain abstractions like container orchestration, or you know, things like that can be super powerful if you need them, but like, you know, that's generally very large companies with lots of teams and things like that. And if you're not that, it pays dividends to not use overly complicated, overly abstracted things. And so, that tends to be [where 00:04:22] I end up most of the time.Corey: I'm sure someone's going to consider this to be heresy, but if I'm tasked with getting a web application up and running in short order, I'm putting it on an old-school traditional three-tier architecture where you have a database server, a web server or two, and maybe a job server that lives between them. Because is it the hotness? No. Is it going to be resume bait? Not really.But you know, it's deterministic as far as where things live. When something breaks, I know where to find it. And you can miss me with the, “Well, that's not webscale,” response because yeah, by the time I go from getting something up overnight to “this has to serve the entire internet,” there's probably a number of architectural iterations I'm going to be able to go through. The question is, what am I most comfortable with and what can I get things up and running with that's tried and tested?I'm also remarkably conservative on things like databases and file systems because mistakes at that level are absolutely going to show. 
Now, I don't know how much you're able to talk about the Blue-ski infrastructure without getting yelled at by various folks, but how modern versus… reliable—I guess that's probably a fair axis to put it on: modernity versus reliability—where on that spectrum, does the official Blue-ski infrastructure land these days?Jake: Yeah. So, I mean, we're in a fortunate position of being an open-source company working on an open protocol, and so we feel very comfortable talking about basically everything. Yeah, and I've talked about this a bit on the app, but the basic idea we have right now is we're using AWS, we have auto-scaling groups, and those auto-scaling groups are just EC2 instances running Docker CE—the Community Edition—for the runtime and for containers. And then we have a load balancer in front and a Postgres multi-AZ instance in the back on RDS, and it is really, really simple.And, like, when I talk about the difference between, like, a reliability engineer and a normal software engineer is, software engineers tend to be very feature-focused, you know, they're adding capabilities to a system. And the goal and the mission of a reliability team is to focus on reliability, right? Like, that's the primary thing that we're worried about. So, what I find to be the best resume builder is that I can say with a lot of certainty that if you talk to any teams that I've worked on, they will say that the infrastructure I ran was very reliable, it was very secure, and it ended up being very scalable because you know, the way we solve the, sort of, integration thing is you just version your infrastructure, right? And I think this works really well.You just say, “Hey, this was the way we did it now and we're going to call that V1. And now we're going to work on V2. And what should V2 be?” And maybe that does need something more complicated. Maybe you need to bring in Kubernetes, you maybe need to bring in a super-cool reverse proxy that has all sorts of capabilities that your current one doesn't.Yeah, but by versioning it, you just—it takes away a lot of the, sort of, interpersonal issues that can happen where, like, “Hey, we're replacing Jake's infrastructure with Bob's infrastructure or whatever.” I just say it's V1, it's V2, it's V3, and then I find that solves a huge number of the problems with that sort of dynamic. But yeah, at Bluesky, like, you know, the big thing that we are focused on is federation is scaling for us because the idea is not for us to run the entire global infrastructure for AT Proto, which is the protocol that Bluesky is based on. The idea is that it's this big open thing like the web, right? Like, you know, Netscape popularized the web, but they didn't run every web server, they didn't run every search engine, right, they didn't run all the payment stuff. They just did all of the core stuff, you know, they created SSL, right, which became TLS, and they did all the things that were necessary to make the whole system large, federated, and scalable. But they didn't run it all. And that's exactly the same goal we have.Corey: The obvious counterexample is, no, but then you take basically their spiritual successor, which is Google, and they build the security, they build—they run a lot of the servers, they have the search engine, they have the payments infrastructure, and then they turn a lot of it off for fun and… I would say profit, except it's the exact opposite of that. But I digress. 
I do have a question for you that I love to throw at people whenever they start talking about how their infrastructure involves auto-scaling. And I found this during the pandemic in that a lot of people believed in their heart-of-hearts that they were auto-scaling, but people lie, mostly to themselves. And you would look at their daily or hourly spend of their infrastructure and their user traffic dropped off a cliff and their spend was so flat you could basically eat off of it and set a table on top of it. If you pull up Cost Explorer and look through your environment, how large are the peaks and valleys over the course of a given day or week cycle?Jake: Yeah, no, that's a really good point. I think my basic approach right now is that we're so small, we don't really need to optimize very much for cost, you know? We have this sort of base level of traffic and it's not worth a huge amount of engineering time to do a lot of dynamic scaling and things like that. The main benefit we get from auto-scaling groups is really just doing the refresh to replace all of them, right? So, we're also doing the immutable server concept, right, which was popularized by Netflix.And so, that's what we're really getting from auto-scaling groups. We're not even doing dynamic scaling, right? So, it's not keyed to some metric, you know, the number of instances that we have at the app server layer. But the cool thing is, you can do that when you're ready for it, right? The big issue is, you know, okay, you're scaling up your app instances, but is your database scaling up, right, because there's not a lot of use in having a whole bunch of app servers if the database is overloaded? And that tends to be the bottleneck for, kind of, any complicated kind of application like ours. So, right now, the bill is very flat; you could eat off, and—if it wasn't for the CDN traffic and the load balancer traffic and things like that, which are relatively minor.Corey: I just want to stop for a second and marvel at just how educated that answer was. It's, I talk to a lot of folks who are early-stage who come and ask me about their AWS bills and what sort of things should they concern themselves with, and my answer tends to surprise them, which is, “You almost certainly should not unless things are bizarre and ridiculous. You are not going to build your way to your next milestone by cutting costs or optimizing your infrastructure.” The one thing that I would make sure to do is plan for a future of success, which means having account segregation where it makes sense, having tags in place so that when, “Huh, this thing's gotten really expensive. What's driving all of that?” Can be answered without a six-week research project attached to it.But those are baseline AWS Hygiene 101. How do I optimize my bill further, usually the right answer is go build. Don't worry about the small stuff. What's always disturbing is people have that perspective and they're spending $300 million a year. But it turns out that not caring about your AWS bill was, in fact, a zero interest rate phenomenon.Jake: Yeah. So, we do all of those basic things. I think I went a little further than many people would where every single one of our—so we have different projects, right? So, we have the big graph server, which is sort of like the indexer for the whole network, and we have the PDS, which is the Personal Data Server, which is, kind of, where all of people's actual social data goes, your likes and your posts and things like that. 
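Corey's flat-bill test from a moment ago is easy to run yourself; here is a minimal boto3 sketch, with placeholder dates, assuming Cost Explorer access in the account.

```python
import boto3

# Pull daily spend for a two-week window (dates are placeholders) and
# compare peak to trough. A ratio near 1.0 means a bill so flat you
# could eat off of it, whatever the auto-scaling story claims.
ce = boto3.client("ce", region_name="us-east-1")
resp = ce.get_cost_and_usage(
    TimePeriod={"Start": "2023-05-01", "End": "2023-05-15"},
    Granularity="DAILY",
    Metrics=["UnblendedCost"],
)

costs = [float(day["Total"]["UnblendedCost"]["Amount"])
         for day in resp["ResultsByTime"]]
peak, trough = max(costs), min(costs)
print(f"peak ${peak:,.2f} / trough ${trough:,.2f} "
      f"-> ratio {peak / trough:.2f}")
```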
And then we have a dev, staging, sandbox, prod environment for each one of those, right? And there's more services besides. But the way we have it is those are all in completely separated VPCs with no peering whatsoever between them. They are all on distinct IP addresses, IP ranges, so that we could do VPC peering very easily across all of them.Corey: Ah, that's someone who's done data center work before with overlapping IP address ranges and swore, never again.Jake: Exactly. That is when I had been burned. I have cleaned up my mess and other people's messes. And there's nothing less fun than renumbering a large complicated network. But yeah, so once we have all these separate VPCs and so it's very easy for us to say, hey, we're going to take this whole stack from here and move it over to a different region, a different provider, you know?And the other thing we're doing is, we're completely cloud agnostic, right? I really like AWS, I think they are the… the market leader for a reason: they're very reliable. But we're building this large federated network, so we're going to need to place infrastructure in places where AWS doesn't exist, for example, right? So, we need the ability to take an environment and replicate it in wherever. And of course, they have very good coverage, but there are places they don't exist. And that's all made much easier by the fact that we've had a very strong separation of concerns.Corey: I always found it fun that when you had these decentralized projects that were invariably NFT or cryptocurrency-driven over the past, eh, five or six years or so, and then AWS would take a us-east-1 outage in a variety of different and exciting ways, and all these projects would go down hard. It's, okay, you talk a lot about decentralization for having hard dependencies on one company in one data center, effectively, doing something right. And it becomes a harder problem in the fullness of time. There is the counterargument, in that when us-east-1 is having problems, most of the internet isn't working, so does your offering need to be up and running at all costs? There are some people for whom that answer is very much, yes. People will die if what we're running is not up and running. Usually, a social network is not on that list.Jake: Yeah. One of the things that is surprising, I think, often when I talk about this as a reliability engineer, is that I think people sometimes over-index on downtime, you know? They just, they think it's a much bigger deal than it is. You know, I've worked on systems where there was credit card processing where you're losing a million dollars a minute or something. And like, in that case, okay, it matters a lot because you can put a real dollar figure on it, but it's amazing how a few of the bumps in the road we've already had with Bluesky have turned into, sort of, fun events, right?Like, we had a bug in our invite code system where people were getting too many invite codes and it sort of caused a problem, but it was a super fun event. We all think back on it fondly, right? And so, outages are not fun, but they're not life and death, generally. And if you look at the traffic, usually what happens is after an outage traffic tends to go up. And a lot of the people that joined, they're just, they're talking about the fun outage that they missed because they weren't even on the network, right?So, it's like, I also like to remind people that eBay for many years used to have, like, an outage Wednesday, right? 
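A sketch of the per-environment address carving Jake describes above, using only the standard library; the service and environment names are stand-ins for illustration.

```python
import ipaddress

# Carve one 10.0.0.0/8 into non-overlapping /16s, one per service and
# environment, so any pair of VPCs can be peered later without the
# renumbering pain Jake mentions. Names here are hypothetical.
services = ["bgs", "pds"]
environments = ["dev", "staging", "sandbox", "prod"]

blocks = ipaddress.ip_network("10.0.0.0/8").subnets(new_prefix=16)
plan = {f"{svc}-{env}": next(blocks)
        for svc in services for env in environments}

for name, cidr in plan.items():
    print(f"{name:>14}: {cidr}")

# Sanity check: no two allocations overlap.
cidrs = list(plan.values())
assert not any(a.overlaps(b)
               for i, a in enumerate(cidrs) for b in cidrs[i + 1:])
```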
Whereas they could put a huge dollar figure on how much money they lost every Wednesday and yet eBay did quite well, right? Like, it's amazing what you can do if you relax the constraints of downtime a little bit. You can do maintenance things that would be impossible otherwise, which makes the whole thing work better the rest of the time, for example.Corey: I mean, it's 2023 and the Social Security Administration's website still has business hours. They take a nightly four to six-hour maintenance window. It's like, the last person out of the office turns off the server or something. I imagine some horrifying mainframe job that needs to wind up sweeping up after itself or running some compute jobs. But yeah, for a lot of these use cases, that downtime is absolutely acceptable.I am curious as to… as you just said, you're building this out with an idea that it runs everywhere. So, you're on AWS right now because yeah, they are the market leader for a reason. If I'm building something from scratch, I'd be hard-pressed not to pick AWS for a variety of reasons. If I didn't have cloud expertise, I think I'd be more strongly inclined toward Google, but that's neither here nor there. But the problem is these large cloud providers have certain economic factors that they all treat similarly since they're competing with each other, and that causes me to believe things that aren't necessarily true.One of those is that egress bandwidth to the internet is very expensive. I've worked in data centers. I know how 95th percentile commit bandwidth billing works. It is not overwhelmingly expensive, but you can be forgiven for believing that it is by looking at cloud environments. Today, Blue-ski does not support animated GIFs—however you want to mispronounce that word—they don't support embedded videos, and my immediate thought is, “Oh yeah, those things would be super expensive to wind up sharing.”I don't know that that's true. I don't get the sense that those are major cost drivers. I think it's more a matter of complexity than the rest. But how are you making sure that the large cloud provider economic models don't inherently shape your view of what to build versus what not to build?Jake: Yeah, no, I kind of knew where you were going as soon as you mentioned that because anyone who's worked in data centers knows that the bandwidth pricing is out of control. And I think one of the cool things that Cloudflare did is they stopped charging for egress bandwidth in certain scenarios, which is kind of amazing. And I think it's—the other thing that a lot of people don't realize is that, you know, these network connections tend to be fully symmetric, right? So, if it's a gigabit down, it's also a gigabit up at the same time, right? There's two gigabits that can be transferred per second.And then the other thing that I find a little bit frustrating on the public cloud is that they don't really pass on the compute performance improvements that have happened over the last few years, right? Like computers are really fast, right? So, if you look at a provider like Hetzner, they're giving you these monster machines for $128 a month or something, right? And then you go and try to buy that same thing on the public, the big cloud providers, and the equivalent is ten times that, right? 
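For readers who have not met 95th percentile billing, here is a toy version of the arithmetic Corey references; the traffic samples and commit price are made-up numbers for illustration.

```python
import random

# Classic colo transit billing: sample the interface rate every five
# minutes for a month, discard the top 5% of samples, and bill the
# highest remaining sample against a committed $/Mbps rate.
random.seed(42)
samples_mbps = [random.uniform(100, 900) for _ in range(30 * 24 * 12)]
rate_per_mbps = 0.50  # illustrative commit price, not a real quote

ranked = sorted(samples_mbps)
p95 = ranked[int(len(ranked) * 0.95) - 1]  # highest sample after the top 5% is discarded
print(f"billable: {p95:.0f} Mbps -> ${p95 * rate_per_mbps:,.2f}/month")

# The upshot: you can burst to line rate for roughly 36 hours a month
# for free, which is why per-GB cloud egress feels expensive next to it.
```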
And then if you add in the bandwidth, it's another multiple, depending on how much you're transferring.Corey: You can get Mac Minis on EC2 now, and you do the math out and the Mac Mini hardware is paid for in the first two or three months of spinning that thing up. And yes, there's value in AWS's engineering and being able to map IAM and EBS to it. In some use cases, yeah, it's well worth having, but not in every case. And the economics get very hard to justify for an awful lot of use cases.Jake: Yeah, I mean, to your point, though, about, like, limiting product features and things like that, like, one of the goals I have with doing infrastructure at Bluesky is to not let the infrastructure be a limiter on our product decisions. And a lot of that means that we'll put servers on Hetzner, we'll colo servers for things like that. I find that there's a really good hybrid cloud thing where you use AWS or GCP or Azure, and you use them for your most critical things, your relatively low-bandwidth things and the things that need to be the most flexible in terms of region and things like that—and security—and then for these, sort of, bulk services, pushing a lot of video content, right, or pushing a lot of images, those things, you put in a colo somewhere and you have these sort of CDN-like servers. And that kind of gives you the best of both worlds. And so, you know, that's the approach that we'll most likely take at Bluesky.Corey: I want to emphasize something you said a minute ago about CloudFlare, where, when R2, their object store alternative, first came out, I did an analysis on this to explain to people just why this was as big as it was. Let's say you have a one-gigabyte file and it blows up and a million people download it over the course of a month. AWS will come to you with a completely straight face, give you a bill for $65,000 and expect you to pay it. The exact same pattern with R2 in front of it, at the end of the month, you will be faced with a bill for 13 cents rounded up, and you will be expected to pay it, and something like 9 to 12 cents of that initially would have just been the storage cost on S3 and the single egress fee for it. The rest is that there is no egress cost tied to it.Now, is Cloudflare going to let you send petabytes to the internet and not charge you on a bandwidth basis? Probably not. But they're also going to reach out with an upsell and they're going to have a conversation with you. “Would you like to transition to our enterprise plan?” Which is a hell of a lot better than, “I got Slashdotted”—or whatever the modern version of that is—“And here's a surprise bill that's going to cost as much as a Tesla.”
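Corey's $65,000 figure is easy to sanity-check; the tier sizes and prices below approximate AWS's published internet egress rates at the time and may have changed, so treat them as assumptions.

```python
# A 1 GB object downloaded a million times is ~1 PB of egress.
def s3_egress_cost(gb: float) -> float:
    tiers = [            # (tier size in GB, $/GB), approximate list prices
        (10_240, 0.09),      # first 10 TB
        (40_960, 0.085),     # next 40 TB
        (102_400, 0.07),     # next 100 TB
        (float("inf"), 0.05) # beyond 150 TB
    ]
    cost, remaining = 0.0, gb
    for size, price in tiers:
        chunk = min(remaining, size)
        cost += chunk * price
        remaining -= chunk
        if remaining <= 0:
            break
    return cost

print(f"${s3_egress_cost(1_000_000):,.0f}")
# Roughly $54k from bandwidth alone; request fees and regional pricing
# push the real-world figure toward the $65,000 Corey cites. The R2
# pattern bills only storage and requests: a few cents.
```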
I think it's a very enlightened viewpoint to just say, “Hey, we're going to lower the costs, increase the efficiency, and then pass it on to customers, and then they will use more of our services as a result.” And I think Cloudflare is kind of leading the way in there, which I love.Corey: I do need to add something there—because otherwise we're going to get letters and I don't think we want that—where AWS reps will, of course, reach out and say that they have cut prices over a hundred times. And they're going to ignore the fact that a lot of these were a service you don't use, in a region you couldn't find on a map if your life depended on it, now going to be 10% less. Great. But let's look at the general case, where from C3 to C4—if you get the same size instance—it cut the price by a lot. C4 to C5, somewhat. C5 to C6 effectively is no change. And now, from C6 to C7, it is 6% more expensive like for like.And they're making noises about price performance still being better, but there are an awful lot of us who say things like, “I need ten of these servers to live over there.” That workload gets more expensive when you start treating it that way. And maybe the price performance is there, maybe it's not, but it is clear that “the bill always goes down” is not true.Jake: Yeah, and I think for certain kinds of organizations, it's totally fine the way that they do it. They do a pretty good job on price and performance. But for sort of more technical companies—especially—it's just you can see the gaps there that Hetzner is filling and that colocation is still filling. And I personally, you know, if I didn't need to do those things, I wouldn't do them, right? But the fact that you need to do them, I think, says kind of everything.Corey: Tired of wrestling with Apache Kafka's complexity and cost? Feel like you're stuck in a Kafka novel, but with more latency spikes and less existential dread by at least 10%? You're not alone.What if there was a way to 10x your streaming data performance without having to rob a bank? Enter Redpanda. It's not just another Kafka wannabe. Redpanda powers mission-critical workloads without making your AWS bill look like a phone number.And with full Kafka API compatibility, migration is smoother than a fresh jar of peanut butter. Imagine cutting as much as 50% off your AWS bills. With Redpanda, it's not a pipedream, it's reality.Visit go.redpanda.com/duckbill today. Redpanda: Because your data infrastructure shouldn't give you Kafkaesque nightmares.Corey: There are so many weird AWS billing stories that all distill down to you not knowing this one piece of trivia about how AWS works, either as a system, as a billing construct, or as something else. And there's a reason tracing these things down has become my career. And sometimes I'll talk to prospective clients, and they'll say, “Well, what if you don't discover any misconfigurations like that in our account?” It's, “Well, you would be the first company I've ever seen where that [laugh] was not true.” So honestly, I want to do a case study if we ever do find one.And I've never had to write that case study, just because it's the tax on not having the forcing function of building in data centers. There's always this idea that in a data center, you're going to run out of power, space, capacity, at some point and it's going to force a reckoning. The cloud has what distills down to infinite capacity; they can add it faster than you can fill it. So, at some point it's always just keep adding more things to it. 
There's never a let's clean out all of the cruft story. And it just accumulates and the bill continues to go up and to the right.Jake: Yeah, I mean, one of the things that they've done so well is handle the provisioning part, right, which is kind of what you're getting at there. One of the hardest things in the old days, before we all used AWS and GCP, is you'd have to sort of requisition hardware and there'd be this whole process with legal and financing and there'd be this big lag between the time you need a bunch more servers in your data center and when you actually have them, right, and that's not even counting the time it takes to rack them and get them, you know, on network. The fact that basically every developer now just gets an unlimited credit card they can just, you know, use is hugely empowering, and it's for the benefit of the companies they work for almost all the time. But it is an uncapped credit card. I know they actually support controls and things like that, but in general, the way we treated it—Corey: Not as much as you would think, as it turns out. But yeah, it's—yeah, and that's a problem. Because again, if I want to spin up $65,000 an hour worth of compute right now, the fact that I can do that is massive. The fact that I could do that accidentally when I don't intend to is also massive.Jake: Yeah, it's very easy to think you're going to spend a certain amount and then oh, traffic's a lot higher, or, oh, I didn't realize when you enable that thing, it charges you an extra fee or something like that. So, it's very opaque. It's very complicated. All of these things are, you know, the result of just building more and more stuff on top of more and more stuff to support more and more use cases. Which is great, but then it does create this very sort of opaque billing problem, which I think, you know, you're helping companies solve. And I totally get why they need your help.Corey: What's interesting to me about distributed social networks is that I've been using Mastodon for a little bit and I've started to see some of the challenges around a lot of these things, just from an infrastructure and architecture perspective. Tim Bray, former Distinguished Engineer at AWS, posted a blog post yesterday, and okay, well, if Tim wants to put something up there that he thinks people should read, I generally advise people to read it. I have yet to find him wasting my time. And I clicked it and got a, “Server over resource limits.” It's like wow, you're very popular. You wound up getting—got effectively Slashdotted.And he said, “No, no. Whenever I post a link to Mastodon, two thousand instances all hit it at the same time.” And it's, “Oh, yeah. The hug of death. That becomes a challenge.” Not to mention the fact that, depending upon architecture and preferences that you make, running a Mastodon instance can be extraordinarily expensive in terms of storage, just because it'll, by default, attempt to cache everything that it encounters for a period of time. And that gets very heavy very quickly. Does the AT Protocol—AT Protocol? I don't know how you pronounce it officially these days—take into account the challenges of running infrastructures designed for folks who have corporate budgets behind them? Or is that really a future problem for us to worry about when the time comes?Jake: No, yeah, that's a core thing that we talked about a lot in the recent, sort of, architecture discussions. 
I'm going to go back quite a ways, but there were some changes made about six months ago in our thinking, and one of the big things that we wanted to get right was the ability for people to host their own PDS, which is equivalent to, like, hosting a WordPress or something. It's where you post your content, it's where you post your likes, and all that kind of thing. We call it your repository or your repo. But we wanted to make it so that people could self-host that on a, you know, four-, five-, six-dollar-a-month droplet on DigitalOcean or wherever and that not be a problem, not go down when they got a lot of traffic.And so, the architecture of AT Proto in general, but the Bluesky app on AT Proto, is such that you really don't need a lot of resources. The data is all signed with your cryptographic keys—like, not something you have to worry about as a non-technical user—but all the data is authenticated. That's what—it's Authenticated Transfer Protocol. And because of that, it doesn't matter where you get the data, right? So, we have this idea of this big indexer that's looking at the entire network called the BGS, the Big Graph Server, and you can go to the BGS and get the data that came from somebody's PDS and it's just as good as if you got it directly from the PDS. And that makes it highly cacheable, highly conducive to CDNs and things like that. So no, we intend to solve that problem entirely.Corey: I'm looking forward to seeing how that plays out because the idea of self-hosting always kind of appealed to me when I was younger, which is why when I met my wife, I had a two-bedroom apartment—because I lived in Los Angeles, not San Francisco, and could afford such a thing—and the guest bedroom was always, you know, 10 to 15 degrees warmer than the rest of the apartment because I had a bunch of quote-unquote, “Servers” there, meaning deprecated desktops that my employer had no use for and said, “It's either going to e-waste or your place if you want some.” And, okay, why not? I'll build my own cluster at home. And increasingly over time, I found that it got harder and harder to do things that I liked and that made sense. I used to have a partial rack in downtown LA where I ran my own mail server, among other things.And when I switched to Google for email solutions, I suddenly found that I was spending five bucks a month at the time, instead of the rack rental, and I was spending two hours less a week just fighting spam in a variety of different ways because that is where my technical background lives. Being able to not have to think about problems like that, and just do the fun part was great. But I worry about the centralization that that implies. I was opposed to the idea because I didn't want to give Google access to all of my mail. And then I checked and something like 43% of the people I was emailing were at Gmail-hosted addresses, so they already had my email anyway. What was I really doing by not engaging with them? I worry that self-hosting is going to become passe, so I love projects that do it in sane and simple ways that don't require massive amounts of startup capital to get started with.Jake: Yeah, the account portability feature of AT Proto is super, super core. You can back up all of your data to your phone—the [AT 00:28:36] doesn't do this yet, but it most likely will in the future—you can back up all of your data to your phone and then you can synchronize it all to another server. 
So, if for whatever reason, you're on a PDS instance and it disappears—which is a common problem in the Mastodon world—it's not really a problem. You just sync all that data to a new PDS and you're back where you were. You didn't lose any followers, you didn't lose any posts, you didn't lose any likes.And we're also making sure that this works for non-technical people. So, you know, you don't have to host your own PDS, right? That's something that technical people can self-host if they want to, non-technical people can just get a host from anywhere and it doesn't really matter where your host is. But we are absolutely trying to avoid the fate of SMTP and, you know, other protocols. The web itself, right, is sort of… it's hard to launch a search engine because the—first of all, the bar is billions of dollars a year in investment, and a lot of websites will only let you crawl them at a higher rate if you're actually coming from a Google IP, right? They're doing reverse DNS lookups, and things like that to verify that you are Google.And the problem with that is now there's sort of this centralization with a search engine that can't be fixed. With AT Proto, it's much easier to scrape all of the PDSes, right? So, if you want to crawl all the PDSes out on the AT Proto network, they're designed to be crawled from day one. It's all structured data, we're working on, sort of, how you handle rate limits and things like that still, but the idea is it's very easy to create an index of the entire network, which makes it very easy to create feed generators, search engines, or any other kind of sort of big world networking thing out there. And then without making the PDSes have to be very high power, right? So, they can be low power and still scrapeable, still crawlable.Corey: Yeah, the idea of having portability is super important. Question I've got—you know, while I'm talking to you, it's, we'll turn this into technical support hour as well because why not—I tend to always historically put my Twitter handle on conference slides. When I had the first template made, I used it as soon as it came in and there was an extra n in the @quinnypig username at the bottom. And of course, someone asked about that during Q&A.So, the answer I gave was, of course, n+1 redundancy. But great. If I were to have one domain there today and change it tomorrow, is there a redirect option in place where someone could go and find that on Blue-ski, and oh, they'll get redirected to where I am now. Or is it just one of those 404, sucks to be you moments? Because I can see validity to both.Jake: Yeah, so the way we handle it right now is if you have a something.bsky.social name and you switch it to your own domain or something like that, we don't yet forward it from the old .bsky.social name. But that is totally feasible. It's totally possible. Like, the way that those are stored in what's called your [DID record 00:31:16] or [DID document 00:31:17] is that there's, like, a list that currently only has one item in general, but it's a list of all of your different names, right? So, you could have different domain names, different subdomain names, and they would all point back to the same user. And so yeah, so basically, the idea is that you have these aliases and they will forward to the new one, whatever the current canonical one is.Corey: Excellent. That is something that concerns me because it feels like it's one of those one-way doors, in the same way that picking an email address was a one-way door. 
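A sketch of the alias list Jake describes: did:plc documents are publicly resolvable over HTTP (plc.directory hosts them), though the DID below is a placeholder that will not resolve, and the field name follows the W3C DID document shape as best I understand it.

```python
import json
import urllib.request

# Placeholder DID; substitute a real did:plc identifier to try this.
did = "did:plc:examplexxxxxxxxxxxxxxxx"

with urllib.request.urlopen(f"https://plc.directory/{did}") as resp:
    doc = json.load(resp)

# 'alsoKnownAs' is the alias list Jake mentions: today it usually holds a
# single at:// handle, but the shape allows several, which is what would
# let an old handle forward to the current canonical one.
for alias in doc.get("alsoKnownAs", []):
    print(alias)
```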
I know people who still pay money to their ancient crappy ISP because they have a few mails that come in once in a while that are super-important. I was fortunate enough to have jumped on the bandwagon early enough that my vanity domain is 22 years old this year. And my email address still works, which, great, every once in a while, I still get stuff to, like, variants of my name I haven't used since 2005. And it's usually spam, but every once in a blue moon, it's something important, like, “Hey, I don't know if you remember me. We went to college together many years ago.” It's ho-ly crap, the world is smaller than we think.Jake: Yeah. I mean, I love that we're using domains, I think that's one of the greatest decisions we made is… is that you own your own domain. You're not really stuck in our namespace, right? Like, one of the things with traditional social networks is you're sort of, their domain.com/yourname, right?And with the way AT Proto and Bluesky work is, you can go and get a domain name from any registrar, there's hundreds of them—you know, we like Namecheap, you can go there and you can grab a domain and you can point it to your account. And if you ever don't like anything, you can change your domain, you can change, you know, which PDS you're on, it's all completely controlled by you. And there's nearly no way we as a company can do anything to change that. Like, that's all sort of locked into the way that the protocol works, which creates this really great incentive where, you know, if we want to provide you services or somebody else wants to provide you services, they just have to compete on doing a really good job; you're not locked in. And that's, like, one of my favorite features of the network.Corey: I just want to point something out because you mentioned oh, we're big fans of Namecheap. I am too, for weird half-drunk domain registrations on a lark. Like, “Why am I poor?” It's like, $3,000 a month of my budget goes to domain purchases, great. But I did a quick whois on the official Bluesky domain and it's hosted at Route 53, which is Amazon's, of course, premier database offering.But I'm a big fan of using an enterprise registrar for enterprise-y things. Wasabi, if I recall correctly, wound up having their primary domain registered through GoDaddy, and the public domain that their bucket equivalent would serve data out of got shut down for 12 hours because some bad actor put something there that shouldn't have been. And GoDaddy is not an enterprise registrar, despite what they might think—for God's sake, the word ‘daddy' is in their name. Do you really think that's enterprise? Good luck.So, the fact that you have a responsible company handling these central singular points of failure speaks very well to just your own implementation of these things. Because that's the sort of thing that everyone figures out the second time.Jake: Yeah, yeah. I think there's a big difference between corporate domain registration, and corporate DNS and, like, your personal handle on social networking. I think a lot of the consumer, sort of, domain registries—registrars—are great for consumers. And I think if you—yeah, you're running a big corporate domain, you want to make sure it's, you know, it's transfer locked and, you know, there's two-factor authentication and doing all those kinds of things right because that is a single point of failure; you can lose a lot by having your domain taken. So, I completely agree with you on there.Corey: Oh, absolutely. 
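As a companion sketch: pointing a domain at an account works, as best I understand the mechanism, by publishing the DID in DNS at _atproto.<your-domain> (a well-known HTTPS file is the alternative). You can check it with the third-party dnspython package; the domain here is a placeholder.

```python
import dns.resolver  # pip install dnspython

# Placeholder handle domain; a real Bluesky custom handle publishes a
# TXT record of the form "did=did:plc:..." at _atproto.<domain>.
domain = "example.com"

answers = dns.resolver.resolve(f"_atproto.{domain}", "TXT")
for record in answers:
    text = b"".join(record.strings).decode()
    if text.startswith("did="):
        print(f"{domain} -> {text[4:]}")
```

The registrar only has to let you create a TXT record, which is part of why any of the hundreds of registrars Jake mentions will do.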
I am curious about this to see if it's still the case or not because I haven't checked this in over a year—and they did fix it. Okay. As of at least when we're recording this, which is the end of May 2023, Amazon's Authoritative Name Servers are no longer half at Oracle. Good for them. They now have a bunch of Amazon-specific name servers on them instead of, you know, their competitor that they clearly despise. Good work, good work.I really want to thank you for taking the time to speak with me about how you're viewing these things and honestly giving me a chance to go ambling down memory lane. If people want to learn more about what you're up to, where's the best place for them to find you?Jake: Yeah, so I'm on Bluesky. It's invite only. I apologize for that right now. But if you check out bsky.app, you can see how to sign up for the waitlist, and we are trying to get people on as quickly as possible.Corey: And I will, of course, be talking to you there and will put links to that in the show notes. Thank you so much for taking the time to speak with me. I really appreciate it.Jake: Thanks a lot, Corey. It was great.Corey: Jake Gold, infrastructure engineer at Bluesky, slash Blue-ski. I'm Cloud Economist Corey Quinn and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice, along with an angry comment that will no doubt result in a surprise $60,000 bill after you posted.Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.
Avi Freedman, CEO at Kentik, joins Corey on Screaming in the Cloud to discuss the fun of solving for observability. Corey and Avi discuss how great simplicity can be deceiving, and Avi points out that with great simplicity comes great complexity. Avi discusses examples of this that he sees in Kentik customer environments, as well as the differences he sees in cloud environments from traditional data center environments. Avi also reveals his predictions for the future and how enterprise M&A will affect the way companies view data centers and VPCs. About AviAvi Freedman is the co-founder and CEO of network observability company Kentik. He has decades of experience as a networking technologist and executive. As a network pioneer in 1992, Freedman started Philadelphia's first ISP, known as netaxs. He went on to run network operations at Akamai for over a decade as VP of network infrastructure and then as chief network scientist. He also ran the network at AboveNet and was the CTO of ServerCentral.Links Referenced: Kentik: https://kentik.com Email: avi@kentik.com Twitter: https://twitter.com/avifreedman LinkedIn: https://www.linkedin.com/in/avifreedman TranscriptAnnouncer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.Corey: Most Companies find out way too late that they've been breached. Thinkst Canary changes this. Deploy Canaries and Canarytokens in minutes and then forget about them. Attackers tip their hand by touching 'em giving you the one alert, when it matters. With 0 admin overhead and almost no false-positives, Canaries are deployed (and loved) on all 7 continents. Check out what people are saying at canary.love today!Corey: Welcome to Screaming in the Cloud, I'm Corey Quinn. This promoted guest episode is brought to us by our friends at Kentik. And into my social grist mill, they have thrown Avi Freedman, their CEO. Avi, thank you for joining me.Avi: Thank you for having me, Corey. I've been a big fan for some time, I have never actually fallen off my seat laughing, but I've come close a couple times on some of your threads.Corey: You must have a great chair.Avi: I should probably upgrade it [laugh].Corey: [laugh]. I have been looking forward to this conversation for a while because you are one of those rare creatures who comes from a similar world to what I did where we were grumpy and old before our time because we worked on physical infrastructure in data centers, we basically wrangled servers into doing the things that we wanted them to do when hardware reliability was an aspiration rather than a reality. And we also moved on from that, in many ways. We are not blind to the modern order of how computers work. But you still run a lot of what you do in data centers, but many of your customers are in cloud. You speak both languages very fluently because of the unifying thread between all of this, which is, of course, the network. How did you wind up in, I guess we'll call it network hell.Avi: [laugh]. 
I mean, network hell was truly… in the '90s, when the internet was—I mean, the internet is sort of like the human body: the more you study it, the more amazing it is that it ever worked in the first place, not that it breaks sometimes—it was the bugs, and trying to put together the technology back then, you know, with what we had. Life is pretty good nowadays, other than the [laugh] immense complexity that has been unleashed on us by everyone taking the same technology and then writing it in their own software and giving it their own marketing names. And thus, you have multi-cloud networking. So, got into it because it's a problem that needs to be solved, right? There's no ESP that connects the applications together; the network still needs to make it work. And now people own some of it, and then more of it, they don't own, but they're still responsible for it. So, it's a fun problem to solve.Corey: The timing of this episode is apt because I've used Kentik myself for a few things over the years. And to be fair, using it for any of my personal networking problems is a bit like noticing, “Oh, I have a loose thread here on my shirt. Pass me the chainsaw.” It's, my environment is tiny and it's over-scoped. But I just earlier this week wound up having to analyze a day's worth of Flow Logs from one of my clients, and to do this, I had to spin up an EC2 instance with 128 gigs of RAM and then load the Flow Logs for that day into RAM, and then—not kidding—I ran into the OOM Killer because I ran out of RAM on this thing.It is, like, yeah, that's right. The network is chatty, the logs are immense, and it's easy to forget. Because the reason I was doing this was just to figure out what are the things that are talking to each other in this environment to drive up some aspects of data transfer costs. But that is an esoteric use case for this; it's not why most people tend to think about network observability. So, I'm going to ask you the blunt question up front here because it might be a really short episode. Do we have to care about networking in the least now that cloud is the default in most locations? It is just an API call away, isn't it?Avi: With great simplicity comes great complexity. So, to the people running infrastructure, to developers or architects, turning it all on, it looks like just API calls. But did you set the policies right? Can the things talk to each other? Are they talking in patterns that are causing you wild data transfer costs?All these things ultimately come back to some team that actually has to make it go. And can be pretty hard to figure that out, right, because it's not just the VPC Flow Logs. It's, what's the policy? It's, what are they talking to that maybe isn't in that cloud, that's maybe in another cloud? So, how do you bring it all together? Like, you could have—and maybe you should have—used Athena, right? You can put VPC Flow Logs in S3 buckets and use Athena and run SQL queries if all you want is your top talkers.Corey: Oh, I did. That's how I started, but Athena is, uh… it has some challenges. Let's just put it that way and leave it there. DuckDB is what I was using and I'm much happier with it for a variety of excellent reasons.Avi: Okay. Well, I'll tease you another time about, you know—I lost this battle at Kentik. We actually don't use swap, but I'm a big fan of having swap and monitoring it so the OOM Killer only does what you want or doesn't fire at all. 
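A sketch of the DuckDB approach Corey mentions, which aggregates the logs without first loading the whole day into RAM; the path is a placeholder, and the columns assume the default v2 space-delimited flow log format, so adjust for custom formats.

```python
import duckdb

# Each flow log file AWS delivers to S3 starts with a header line, hence
# skip=1; ignore_errors tolerates NODATA records where bytes is "-".
query = """
    SELECT srcaddr, dstaddr, SUM(bytes) AS total_bytes
    FROM read_csv('flowlogs/2023-05-01/*.log.gz',
                  delim=' ', header=false, skip=1, ignore_errors=true,
                  columns={'version': 'INT', 'account_id': 'VARCHAR',
                           'interface_id': 'VARCHAR', 'srcaddr': 'VARCHAR',
                           'dstaddr': 'VARCHAR', 'srcport': 'INT',
                           'dstport': 'INT', 'protocol': 'INT',
                           'packets': 'BIGINT', 'bytes': 'BIGINT',
                           'start_ts': 'BIGINT', 'end_ts': 'BIGINT',
                           'action': 'VARCHAR', 'log_status': 'VARCHAR'})
    GROUP BY srcaddr, dstaddr
    ORDER BY total_bytes DESC
    LIMIT 20
"""
print(duckdb.sql(query))  # top talker pairs by bytes, streamed from disk
```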
Corey: There's a counterargument of running an in-memory data store, and then, oh, we're going to use it as swap, though, so it's like, hang on, this just feels like running a normal database with extra steps.
Avi: Computers allow you to do amazing things and only occasionally slap you with it nowadays. It's pretty amazing. But back to the question. APIs make it easy to turn on, but not so easy to run. The observability that you get within a given cloud is typically very limited.
Google actually has the best. They show some topology and other things. I mean, a lot of what we do involves scraping API calls in the cloud to figure out what does this all mean, then convolving it with the VPC Flow Logs and making it look like a network: what are the gateways, and what are the rules being applied, and what can't talk to itself? If you just look at VPC Flow Logs like it's Syslog, good luck trying to figure out what VPCs are talking to each other. It's exactly the problem that you were describing.
So, the ease of turning it on is exactly inversely proportional to the ease of running it. And, you know, as a vendor, we think it's an awesome [laugh] problem, but we feel for our customers. And you know, occasionally it's a pain to get the IAM roles set up to scrape things and help them, but that's, you know, that's just part of the job.
Corey: It's fascinating to me, just looking from an AWS perspective, just how much work clearly has to be done to translate their Byzantine and very strange networking environment and concepts into things that customers see. Because in many cases, the virtual machines that we run on top of EC2, let alone anything higher level, are being lied to the entire time about what the actual topology of the environment is. It was most notable, for me at least, at re:Invent 2022, the most recent one, where they announced they have a TCP replacement, Scalable Reliable Datagram—SRD. It's a new protocol entirely. It's, "Oh, wow, can we use it?" "No." "Okay." Like, I get that it's a lot of work, I get you're excited about it. Are you going to talk to us about how it actually works? "Oh, absolutely not." So… okay, good for you, I guess.
Avi: Doesn't Amazon have to write a press release before they build anything, and doesn't the press release have to say, like, why people give a shit, why people care?
Corey: Yep. And their story on this was, oh, it enables us to be a lot faster at letting EBS volumes talk to some of our beefier instances.
Avi: [laugh].
Corey: And that's all well and good, don't get me wrong, but it's also, "Yay, it's more reliable," is a difficult message to send. I mean, it's hard enough when—and it's necessary because you've got to tacitly admit that reliability and performance haven't been all they could be. But when it's no longer an issue for most folks, now you're making them wonder, like, wait, how bad was it? It's just a strange message.
Avi: Yeah. One of my projects for this weekend is, I actually got a gaming PC and I'm going to try compression offload to the CUDA cores, because right now, we do compress and decompress with Intel cores. And like, if I'm successful there and we can get 30% faster subqueries—which doesn't really matter, you know, on the kind of massive queries we run—and 20% more use out of the computers that we actually run, I'm probably not going to do a press release about it. But good to see the pattern.
But you know, what you said is pretty interesting.
As people like Kentik, we have to put together—well, on Azure, you can have VPCs that cross regions, right? And in other places, you can't. And in Google, you have performance metrics that come out and you can get them very frequently, and in Amazon and Azure, you can't. Like, how do you take these kinds of telemetry that are all the same stuff underneath, but packaged up differently in different quanta and different things, and make it all look the same? It's actually pretty fun and interesting.
And it's pretty—you know, if you give some cloud engineers who focus on the infrastructure layer enough beers or alcohol or just room to talk, you can hear some funny stories. And it all made sense to somebody in the first place, but unpacking it and actually running it as a common infrastructure can be quite fun.
Corey: One of the things that I have found notable about your perspective is, particularly, that you're running all of the network ingest, to my understanding, in your data center environment. Because we talked about this when you were kind enough to invite me to your company all-hands offsite—presumably, I assume, when people do that, it's so they can beat me up in the alley, but that only happened twice. I was very pleasantly surprised.
Avi: [And you 00:09:23] made fun of us only three times, so you know, you beat us—
Corey: Exactly.
Avi: —but it was all enjoyed.
Corey: But always with love. Now, what I found fascinating was you and I sat down for a while and you talked about your data center architecture. And you asked me—since I don't have anything to sell you—is there an economical way that I could see running your environment on top of AWS? And the answer was sure, if by economical you mean an absolute minimum of six times what you're currently paying a year, sure you can get there. But it just does not make sense for any realistic approach to doing this.
And the reason I bring this up is that you're in a data center not because of religious beliefs of, "Well, this is good enough for my grandpappy, so it's good enough for me." It's because it solves the problem you have in a way that the cloud providers clearly cannot. But you also are not anti-cloud. So, many folks who are all-in on data centers seem to be doing it out of pure self-interest where, well, if everyone goes all-in on cloud, then we have nothing left to sell them. I've used AWS VPC Flow Logs. They have nothing that could even remotely be termed network observability. Your future is assured as long as people understand what it is that you're providing them and what value that adds. So yeah, if people keep going in a cloud direction, you're happy as houses.
Avi: We'll use the best tools for building our infrastructure that we can, right? We use cloud. In fact, we're just buying some reserved instances, which always, you know, I give it the hairy eyeball, but you know, we're probably always going to have our CI/CD bursty stuff in the cloud. We have performance testing regions on all the major clouds so that we can tell people what performance is to and from cloud. Like, that we have to use cloud for.
And if there's an always-on model which starts making sense in the cloud, then—I try not to be the first to use anything, but [laugh] we'll be one of the first to use it.
But every year, we talk to, you know, the major clouds, because we're customers of all of them—for, as I said, our testing infrastructure if nothing else, and you know, some of them for some other parts. For example, proxying VPC Flow Logs: we run infrastructure on Kubernetes in all—in the three biggest—to proxy VPC Flow Logs, you know, and so that's part of our bill. But if something's always on—you know, one of our storage servers, it's a $15,000 machine that, you know, realistically runs five years, but even if you assume it runs three years, we get financing for it, it costs a couple hundred dollars a month to host, and that's inclusive of our ops team that runs, sort of, everything—you just do the math. That same machine would be, you know, even not including data transfer, maybe $3,500 a month on cloud. The economics just don't quite make sense.
For burst, for things like CI/CD, test, seasonality, I think it's great. And if we have patterns like that, you know, we're the first to use it. So, it's just a question of using what's best. And a lot of our customers are in that realm, too. I would say some of them are a little over-rotated—you know, they've had big mandates to go one way or the other and don't have the right, you know, sort of nuanced view—but I think over time, that's going to fix itself. And yeah, as you were saying, like, the more people use cloud, the better we do, so it's just really a question of what's the best for us and our infrastructure at any given time.
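[Editor's note: the arithmetic behind Avi's comparison, written out using the rough figures quoted in the conversation; every number here is illustrative, not a quote.]

    # One always-on storage server, using the numbers from the conversation.
    server_cost = 15_000          # dollars, up front
    amortization_months = 36      # conservative 3-year life; 5 years is realistic
    hosting_per_month = 200       # "a couple hundred dollars a month," ops included

    on_prem_monthly = server_cost / amortization_months + hosting_per_month
    cloud_monthly = 3_500         # comparable always-on cloud machine, before data transfer

    print(f"on-prem: ~${on_prem_monthly:,.0f}/month")           # ~$617/month
    print(f"cloud:   ~${cloud_monthly:,.0f}/month")
    print(f"ratio:   ~{cloud_monthly / on_prem_monthly:.1f}x")  # ~5.7x, close to the "six times" above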
Corey: I think that is something that is not fully appreciated or well understood: I work with cloud technologies because, for what I do, it makes an awful lot of sense. But I've been lately doing a significant build-out in my home network from the perspective of, yeah, this makes sense for what I do. And I now have an increased number of workloads that I'm running here, and I've got to say, it feels a little strange, on some level, not to be paying AWS for something metered by the second whenever I'm running a job here. That always feels a little on the weird side. But I'm not suggesting I filled my house with servers either.
Avi: [unintelligible 00:13:18] going to report you to the House Un-Cloudian Activities Committee [laugh] for—
Corey: [laugh].
Avi: To straighten you out about your infrastructure use and beliefs. I do have to ask you—and I do have some foreknowledge of this—where is the controller for your network running? Is it running in your house or—
Corey: Oh, the WiFi controller lives in Ohio with all the other unpleasant things. I mean, even data transfer between Ohio and Virginia—if you're on AWS—is half-price because data wants to get out of Ohio just as much as the people do. And that's fine, but it can also fail out of band. I can chill that thing for a while and I'm not able to provision new equipment, I can't spin up new SSIDs, but—
Avi: Right. It's the same as Tailscale, which is, like, sufficiently indistinguishable from magic, but it's nice there's Headscale in case something happened to them. But yeah, you know, you just can't set up new stuff without SSHing in the old way while it's down. So.
Corey: And worst case, it goes away irretrievably: I can spin a new one up, I can pair everything locally, do it by repointing DNS locally, and life will go on. It's one of those areas where, like, I would not have this in Ohio if latency was a concern, if it was routing every packet out halfway across the country before it hit the general internet. That would be a challenge for me. But that's not what I'm doing.
Avi: Yeah, yeah. No, that makes sense. And I think also—
Corey: And I certainly pay AWS by the second for that thing. That's—I have a three-year savings plan for that thing, and if nothing else, it was useful for me just to figure out what the living hell was going on with the savings plan purchase project one year. That was just—it was a challenge to get that straightened out in some ways. Turns out that the high watermark of the console is a hundred-and-thirty-some-odd million dollars you can add to cart and click the buy button. Have fun.
Avi: My goodness. Okay, well.
Corey: The API goes up to $26.2 billion. Try that in a free tier account, preferably someone else's.
Avi: I would love to have such problems. Right now, that is not one of them. We don't spend that much on infrastructure.
Corey: Oh, that is more than Amazon's—AWS's, at least—quarterly revenue. So, if you wind up doing a $26.2 billion purchase, it's like—it's that old saw: you owe Amazon a million dollars, you have a problem. If you owe Amazon $26 billion, Amazon has a problem. Yeah, that's when Andy Jassy calls you 20 minutes after you make that purchase, and at least to me, he yells at me with a, "Listen here, asshole," and it sort of devolves from there.
Avi: Well, I do live in Seattle, so you know, they send the posse out, I'm pretty sure.
Corey: [laugh] I will be keynoting DevOpsDays Seattle on August 1st with a talk that might very well resonate with your perspective, "The Modern DevOps: A Million Ways to Die in Production."
Avi: That is very cool. I mean, ultimately, I think that's what cloud comes back to. When cloud was being formed, it was just other people's computers, storage, and network. I don't know if you'd argue that there's a politics, control plane, or a—
Corey: Oh, I would say, "Cloud? There's no cloud; just someone else's cost center."
Avi: Exactly. And so, how do you configure it? And back to the question of, should everything be on-prem or does cloud abstract it all, it's all the same stuff that we've been doing for decades and decades, just with other people's software and names, which you help decode. And then it's the question we've always had: what's the best thing to do? Do you like Wellfleet or Proteon? Now, do you like Azure [laugh] or Google or Amazon or somebody else, or running your own?
Corey: It's almost this generation's equivalent of vi versus Emacs.
Avi: Yes. I guess there could be a cloud equivalent. I use vi, but only because I'm a Lisp addict and I don't want to get stuck refining Eliza macros and connecting to ChatGPT in Emacs. So, you know. Someone just did an Emacs as PID 0—so basically, no init; the kernel boots straight into Emacs—and then someone, of course, had to do a vi as PID 0. And I have to admit, Emacs would be a lot more useful as a PID 0, even though I use vi.
Corey: I would say that—I mean, if you wind up writing in Emacs, and writing Lisp in it, then I've got to say every third thing you say becomes a parenthetical.
Avi: Exactly. Ha.
Corey: But I want to say that there's also a definite moving of data going on there that I think is at a scale that, for those of us working mostly in home labs and whatnot, can be hard to imagine.
And I see that just in terms of the volume of Flow Logs which, to be clear, are smaller than the data transfer they are representing in almost every case.
Avi: Almost every.
Corey: You see so much of the telemetry that comes out of there, and what customers are seeing and what their problems are, in different ways. It's not just Flow Logs; you ingest a whole bunch of different telemetry through a variety of modern and ancient and everything-in-between protocols to support, you know, the horror that is network equipment interoperability. And just—I feel like I can't do a terrific job of doing justice to describing just how comprehensive Kentik is once you get it set up as a product. What is on the wire has always been, for me, the arbiter of truth, because computers will lie to you, but it's very tricky to get them to lie and get the network story to cover for it.
Avi: Right. I mean, ultimately, that's one of the sources of truth. There's routing, there's performance testing, there's a whole lot of different things, and as you were saying, in any one of these slices of your—let's just pick the network—there's many different things that all mean the same but look different that you need to put together. You could—the nerd term would be, you know, normalizing. You need to take all this stuff and normalize it.
But traffic, we agree, that's where we started with. We call it the "what is": what's actually happening on the infrastructure. And that's the ancient stuff like IPFIX and NetFlow and sFlow. Some people would argue—you know, the IETF would say, "Oh, we're still innovating and it's still current"—but you know, it's certainly on-prem only. The major cloud vendors would say, "Oh, well, you can run the router—cloud routers—or you could run cloud versions of the big routers," but we don't really see that as a super common pattern today.
But what's really the difference between NetFlow and the VPC Flow Log? Well, some VPC Flow Logs have permit/deny because they're really firewall logs, but ultimately, it's: something went from here to there. There might not be a TCP flag, but there might be something else in cloud. And, you know, maybe there's RUM data, which is also another kind of traffic. And ultimately, all together, we try to take that and then the business metadata to say—whether it's NetBox in the old world or Kubernetes in the new world, or some other [unintelligible 00:19:49]—what application is this? What user is this?
So, you can ask questions about, why am I blowing up between these cloud regions? What applications are doing it, right? VPC Flow Logs by themselves don't know that, so you need to add that kind of metadata in.
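[Editor's note: a toy sketch of the normalize-then-enrich step Avi is describing: collapse each telemetry flavor into one record shape, then tag records from whatever inventory source exists. The field names, IPs, and application names below are all hypothetical.]

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Flow:
        src: str
        dst: str
        nbytes: int
        action: Optional[str]   # VPC Flow Logs carry ACCEPT/REJECT; NetFlow has no equivalent

    def from_vpc_flow_log(line: str) -> Flow:
        # Default-format line: version account-id interface-id srcaddr dstaddr
        # srcport dstport protocol packets bytes start end action log-status
        f = line.split()
        return Flow(src=f[3], dst=f[4], nbytes=int(f[9]), action=f[12])

    def from_netflow(rec: dict) -> Flow:
        # Simplified NetFlow-v5-style record; real field names vary by exporter
        return Flow(src=rec["srcaddr"], dst=rec["dstaddr"], nbytes=rec["dOctets"], action=None)

    # Hypothetical inventory, e.g. scraped from Kubernetes, NetBox, or a CSV
    APP_BY_IP = {"10.0.1.5": "checkout", "10.0.2.9": "search"}

    def enrich(flow: Flow) -> dict:
        # Attach business metadata so questions can be asked per application,
        # not per IP address
        return {
            "src_app": APP_BY_IP.get(flow.src, "unknown"),
            "dst_app": APP_BY_IP.get(flow.dst, "unknown"),
            "src": flow.src,
            "dst": flow.dst,
            "bytes": flow.nbytes,
            "action": flow.action,
        }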
And then there's performance testing, which is sort of the "what if." Something we do, ThousandEyes does, some other people do.
It's not the actual source of truth, but for example, if you're having a performance problem getting between, you know, us-east and Azure in the east, well, there's three other ways you can get there. If your actual traffic isn't getting there that way, then how do you know which one to use? Well, let's fire up some tests. There's all the metrics on what all of the devices are reporting, just like you get metrics from your machines and from your applications. And then there's stuff even up at the routing layer, which, God help you, hopefully you don't need to actually get in and debug, but sometimes you do. And sometimes, you know, your neighbor tells the mailman that that mail is for me and not for you, and they believe them, and then you have a big problem when your bills don't get paid.
The same thing happens in the cloud, the same thing happens on the internet [unintelligible 00:20:52] at the routing. So, the goal is: take all the different sources of it, make it the same within each type, and then pull it all together so you can look at a single place—you can look at a map, you can look at everything, whether it's the cloud, whether it's your own data centers, your own WAN, into the internet and in between—in a coherent way that understands your application. So, it's a small task that we've bit off, but you know, we have fun solving it.
Corey: Do you find, when you look at customer environments—and I don't mean to be disparaging here, truly I don't—but if you were to ask me to design something today, I would probably not even be using VPCs if I'm doing this completely greenfield. I would be a lot more cloud-first, et cetera, et cetera. Whereas in many cases, that is not the right path, especially if, you know, customers have the temerity to not be founded within the last 18 months—to have existed before AWS, in some ways. Do you find that the majority of what they're doing looks like they're treating the cloud like data centers, or do you find that they are leveraging cloud in ways that surprise you and would not be possible in traditional data centers? Because I can't shake the feeling that the network as a source of truth for figuring out what's really going on is very hard to beat.
Avi: Yes, for the most part, to both your assertion at the end and sort of the question. So, in terms of the question, for the most part, people think of VPCs as… you know, they could just equivalently be VLANs and [unintelligible 00:22:21], right? I've got policies, and I have these things that are talking to each other, and everything else is not local. And I've got—you know, it's not a perfect mapping to physical interfaces and VLANs, but it's the equivalent of that.
And that is sort of how people think about it. In the data center, you'd call it micro-segmentation; in the cloud, you call it clouding, but you know, it's just applying all the same policies and saying this stuff can talk to each other and not. Which is always sort of interesting if you don't actually know what is talking [laugh] to each other to apply those policies. Which is a lot of what, you know, Kentik gets brought in for first. I think where we see the cloud-native thinking, which is overlaid on top of that—you could call it overlay, I guess—is service mesh.
Now, putting aside the question of what's going to be a service mesh, what's going to be a network mesh, where there's something like [unintelligible 00:23:13] sit, the idea that there's a way that you look at traffic above the packets—at, you know, layers three up through seven—that can do things like load balancing, do things like telemetry, do things like policy enforcement, that is a layer that we see very commonly that a lot of the old-school folks have wanted: you know, they want their F5s and they want their F5 scripts. And they're like, "Why can't I have this in the cloud?"—which I guess you could buy from F5 if you really want—but that's pretty common.
Now, not everything's a sidecar anymore, and there's still debates about what's going on there, but that's pretty common even where the underlying cloud just looks like it could be a data center.
And that seems to be state of the art for, I would say, our traditional enterprise customers, for sure. Our web company customers and, you know, service providers use cloud more for their OTT and some other things. As we work with them, they're a little bit more likely to be on-prem, you know, historically. But remember, in the enterprise, there's still a lot of M&A going on—I think that's even going to pick up in the next couple of years—and a lot of what they're doing is lift-and-shift of [laugh] actual data centers. And my theory is, it's got to be easier to just make it look like VPCs than completely redo it.
Corey: I'd say that there's reasons that things are the way that they are. Like, ignoring that this is the better approach from a technical perspective entirely—because that's often not the only answer—it's: we have assurances we made as part of audit compliance regimes, of our SOC 2, of how we handle certain things and what those controls are. And yeah, it's not hard for even a junior employee, most of the time, to design a reasonable architecture on a whiteboard. The problem is, how do you take something pre-existing and get it to a state that closely resembles that while not turning it off for a long time?
Avi: Right. And I think we're starting to see some things that probably shouldn't exist, like, people trying to do VXLAN as overlays into and between VPCs because that was how their data s—you know, they're more modern on the data center side and trying to do that. But generally, I think people have an understanding they need to be designing architecture for greenfield things that aren't too far bleeding edge, unless it's, like, a pure developer shop, and also can map to the least-common-denominator kinds of infrastructure that people have. Now, sometimes that may be serverless, which means, you know, more CDN use and abstracted layers in front, but for, you know, running your own components, we see a lot of differences but also a lot of commonality. It's differences at the micro but commonality at the macro. And I don't know what you see in your practice. So.
Corey: I will say that what I see in practice is that there's a dichotomy where you have born-in-the-cloud companies where 80% of their spend is on a single workload and you can do a whole bunch of deep optimizations, and then you see the conglomerate approach where it's giant spend, but it's all very diffuse across 1,500 different applications. And different philosophies, different processes, different cultures give rise to a lot of these things. I will say that if I had a magic wand, I would—and again, the fact that you sponsor and promote this episode is deeply appreciated. Thank you—
Avi: You're welcome.
Corey: —but it does not mean that you get to compromise my authenticity and objectivity. You can't buy my opinion, just my attention. But I will say this: I would love it if my customers used Kentik, because it is one of the best things I've ever seen to describe what is talking to what, at that scale and in volume, without going super deep into the weeds. Now, obviously, I'm not allowed to start rolling out random things into customer environments. That's how I get sued to death. But, ugh, I wish it was there.
Avi: You probably shouldn't set up IAM rules without asking them, yes.
That wouldn't be bad.
Corey: There's a reason that the only writable stuff that I have access to is generating reports in Cost Explorer.
Avi: [laugh]. Okay.
Corey: Everything else is read-only. All we do is have conversations with folks. It sets context for those conversations. I used to think that we'd be doing this as a software offering. I no longer believe that actually solves the in-depth problems that people have.
Avi: Well, I appreciate the praise. I even take some of the backhanded praise-slash-critique at the beginning, because we think a lot about—you know, we did design for these complex and often hybrid infrastructures, and it's true, we didn't design it for the two-or-four-router, you know, infrastructure. If we had bootstrapped longer, if we'd done some other things, we might have done it that way. We don't want to be exclusionary. It's just sort of how we focus.
But in the kind of customers that you have, these are things that we're thinking about: what can we do to make it easier to onboard? Because people have these massive challenges seeing the traffic and understanding it, and the cost and security and the performance. But to do more with the VPC Flow Logs, we need to get some of those metrics. We think about, should we make an open-source thing? I don't know how much you've seen the concern that people have, universally across cloud providers, that they turn on something like Kentik and they're going to hit their API rate limiter. Which is, like—really? You can't build a cache for that at the scale that these guys run at, the large cloud providers? I don't really understand that. But it is what it is.
We spent a lot of time thinking about that because of security policy, and getting the kind of metrics that we need. You know, if we open-sourced some of that, would it make it easier to plug into people's observability infrastructure? We'd like to get that onboarding time down, even for those more complex infrastructures. But you know, the payoff is there, you know? It only takes a day of elapsed time and one hour or so. It's just you've got to get a lot of approvals to get the kind of telemetry that you need to make sense of this in some environments.
Corey: Oh, yes. And that's part of the problem, too. It's like, you could talk about one of those big environments where you have 1,500 apps all talking to each other. You can't make sense of any of it without talking to people and having context and occasionally getting a little bit of [unintelligible 00:29:07] just what these things are named. But at that point, you're just speculating wildly. And, you know, it's an engineering trap, where I'm just going to guess rather than asking someone who knows the answer because I don't want to look foolish. You just spend three weeks chasing your own tail. Who's the foolish one?
Avi: We're not in a competitive business to yours—
Corey: [laugh].
Avi: —but I do often ask, when we're starting off, "So, can you point us at the source of truth that describes what all your applications are?" And usually, they're, like, "[laugh]. No." But you know, at the same time, to make sense of this stuff, you also need that metadata, and that's something that we designed to be able to take.
Now, Kubernetes has some of that. You may have some of it in ServiceNow, which a lot of people use. You may have it in your own text file, a CSV somewhere. It may be in NetBox, which we've seen people actually use for the cloud—more on the web company and service provider side, but even some traditional enterprises are starting to use it.
So, a lot of what we have to do as a vendor is put all that together, because yeah, when you're running across multiple environments and thousands of applications, ultimately scrying at IP addresses and VPC IDs is not going to be sufficient.
So, the good news is, almost everybody has those sources, and we just try to drag it out of them and pull it back together. And for now, we refuse to actually try to get into that business, because it's not a—it seems sort of like, you know, SAP, where you're going to be sending consultants forever, and not as interesting as the problems we're trying to solve.
Corey: I really want to thank you, not just for supporting the show of course, but also for coming here to suffer my slings and arrows. If people want to learn more, where's the best place for them to find you? And please don't respond with an IP address.
Avi: 127.0.0.1. You're welcome at my home at any time.
Corey: There's no place like localhost.
Avi: There's no place like localhost. Indeed. So, the company is kentik.com, K-E-N-T-I-K. I am avi@kentik.com. I am @avifreedman on Twitter and LinkedIn and some other things. And happy to chat with nerds—infrastructure nerds, cloud nerds, network nerds, software nerds—and debate, maybe not vi versus Emacs, but should you swap space or not, and what should your cloud architecture look like?
Corey: And we will, of course, put links to that in the [show notes 00:31:20].
Avi: Thank you.
Corey: Thank you so much for being so generous with your time. I really appreciate it.
Avi: Thank you for having this forum. And I will let you know when I am down in San Francisco with some time.
Corey: I would be offended if you didn't take the time to at least say hello. Avi Freedman, CEO at Kentik. I'm Cloud Economist Corey Quinn, and this has been a promoted guest episode of Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice, along with an angry comment saying how everything, start to finish, is somehow because of the network.
Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.
Jeremy Snyder, Founder of FireTail, joins Corey on Screaming in the Cloud to discuss his career journey and what led him to start FireTail. Jeremy reveals what's changed in cloud since he was an AE at AWS, and walks through how the need for customization in cloud security has led to a boom in the number of security companies out there. Corey and Jeremy also discuss the costs of cloud security, and Jeremy points out some of his observations in the world of cloud security pricing and packaging.

About Jeremy
Jeremy is the founder and CEO of FireTail.io, an end-to-end API security startup. Prior to FireTail, Jeremy worked in M&A at Rapid7, a global cyber leader, where he worked on the acquisitions of 3 companies during the pandemic. Jeremy previously led sales at DivvyCloud, one of the earliest cloud security posture management companies, and also led AWS sales in Southeast Asia. Jeremy started his career with 13 years in cyber and IT operations. Jeremy has an MBA from Mason, a BA in computational linguistics from UNC, and has completed additional studies in Finland at Aalto University. Jeremy speaks 5 languages and has lived in 5 countries. Once, Jeremy went 5 days without seeing another human, but saw plenty of reindeer.

Links Referenced:
FireTail: https://firetail.io
Email: jeremy@firetail.io

Transcript
Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.
Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. My guest today is Jeremy Snyder, who's the founder at FireTail. Jeremy, thank you for joining me today. I appreciate you taking the time from your day to suffer my slings and arrows.
Jeremy: My pleasure, Corey. I'm really happy to be here.
Corey: So, we'll get to a point where we talk about what you're up to these days, but first, I want to dive into the jobs of yesteryear, because over a decade ago, you did a stint at AWS doing sales. And not to besmirch your hard work, but it feels like at the time, that must have been a very easy job. Because back then, it really felt across the board like the sales motion was basically responding to, "Well, why should we do business with you?" And the response is, "Oh, you misunderstand. You have 87 different accounts scattered throughout your organization. I'm just here to give you visibility, governance, and possibly some discounting over that." It feels like times have changed in a lot of ways since then. Is that accurate?
Jeremy: Well, yeah, but I will correct a couple of things in there. In my days—
Corey: Oh, please.
Jeremy: —almost nobody had more than one account. I was in the one-account, no-VPCs, you-only-separate-your-workloads-by-tagging days of AWS. So, our job was actually a lot harder at the time, because people couldn't wrap their heads around the lack of subnetting, the lack of workload segregation. All of that was really, like, brand new to people, and so you were trying to tell them, like, "Hey, you're going to be launching something on an EC2 instance that's in the same subnet as everybody else's EC2 instance." And people were really worried about lateral traffic and sniffing and what their neighbors or other customers on AWS could see.
And by the way, I mean, this was the customers who even believed it was real. You know, a lot of the conversations we went into with people were, "Oh, so Amazon bought too many servers and you're trying to sell us excess capacity."
Corey: That legend refuses to die.
Jeremy: And, you know, it is a legend. That is not at all the genesis of AWS. And you know, the genesis is pretty well publicized at this point; you can just google "how did AWS start?" You can find accurate stuff around that.
Corey: I did it a few years ago with multiple Amazon execs and published it, and they said definitively that that story was not true. And you can say a lot about AWS folks—and I assure you, I do—but I also do not catch them lying to my face, ever. And as soon as that changes, well, now we're going to have a different series of [laugh] conversations that are a lot more pointed. But they've earned some trust there.
Jeremy: Yeah, I would agree. And I mean, look, I saw it internally; the way that Amazon built stuff was at such a breakneck pace. The challenge that they had—and this is the published version of events for why AWS got created—was that developers needed a place to test code. And that was something that they could not get until they got EC2, or could not get in a reasonable enough timeframe for it to be, you know, real-time valid or relevant for what was going on with the company. So, you know, that really is the genesis of things, and you know, the early services—SQS, S3, EC2—they all really came out of that journey. But yeah, in our days at AWS, there was a lot of ease, in the sense that lots of customers had pent-up frustrations with their data center providers or their colo providers, and lots of customers would experience bursts and they would have capacity constraints and they would need a lot of the features that AWS offered, but we had to overcome a lot of technical misunderstandings and trust issues—you know, "Oh, hey, Amazon just wants to sniff our data and they want to see what we're up to"—and explain to them how encryption works and why they have their own keys and all these things. You know, we had to go through a lot of that. So, it wasn't super easy, but there was some element of it where, you know, just demand actually did make some aspects easy.
Corey: What have you seen change between, well, I guess ten years ago, and now? And let's be clear, you don't work in AWS sales, but you also are not oblivious to what the market is doing.
Jeremy: For sure. For sure. I left AWS in 2011 and I've stayed in the cloud ecosystem pretty much ever since. I did spend some time working for a systems integrator where all we did was migrate customers to AWS. And then I spent about five, six years working on cloud security, primarily focused on AWS, a lot of GCP, a little bit of Azure.
So yeah, I mean, I certainly stay up to date with what's going on in the state of cloud. I mean, look, cloud has evolved from this kind of, you know, developer-centric, very easy-to-launch type of platform into a fully-fledged enterprise IT platform, and all of the management structures and all of the kind of bells and whistles that you would want—that you probably wanted from your old VMware networks but never really got—they're all there now. It is a very different ballgame in terms of what the platform actually enables you to do, but fundamentally, a lot of the core building-block constructs and the primitives are still kind of driving the heart of it.
It's just a lot of nicer packaging.
What I think is really interesting is actually how customers' usage of cloud platforms has changed over time. And I always think of it in terms of, going back to my days, what did I see from my customers? And it was kind of like: month zero, "I just don't believe you." Like, "This thing can't be real, I don't trust it, et cetera." Month one is, I'm going to assign some developer to work on some very low-priority, low-risk workload. In my days, that was SharePoint, by the way. Like, nine times out of ten, the first workload that customers stood up was a SharePoint instance that they had to share across multiple locations.
Corey: That thing falls over all the time anyway. May as well put it in the cloud where it can do so without taking too much else down with it. Was that the thinking, or?
Jeremy: Well, and the other thing about it at the time, Corey, was that, like, so many customers worked in this, like, remote-first world, right? And so, SharePoint was inevitably hosted at somebody's office. And so, the workers at that office were so privileged over the workers everywhere else. The performance gap between consuming SharePoint in one location versus another was, like, night and day. So, you know, employees in headquarters were like, "Yeah, SharePoint's great." Employees in branch offices were like, "This thing is terrible," you know? "It's so slow. I hate it, I hate it, I hate it."
And so, cloud actually became, like, this neutral location to move SharePoint to that kind of had equal performance for every office. And so, that was, I think, one of the reasons. And it was also, you know, it had capacity problems, and customers were, right at that point, uploading tons of static documents to it, like Word documents, Office attachments, et cetera, and so they were starting to have some of these, like, real disk-sprawl problems with SharePoint. So, that was kind of the month-one problem. And only after they get through kind of months two, three, and four, and they go through, "I don't understand my bill," and, "Help me understand security implications," do they think about, like, "Hey, should we go back and look at how we're running that SharePoint stuff and maybe do it more efficiently and, like, move those static Office documents onto S3?" And so on, and so on.
And that's kind of one of the big things that, I would say, has changed from, like, 2011 to now: there's enough sophistication around understanding that, like, you don't just translate what you're doing in your office or in your data center to what you're doing on cloud. Or if you do, you're not getting the most out of your investment.
Corey: I'm curious to get your take on how you have seen cloud adoption patterns differ, specifically tied to geo. I mean, I tend to see it from a world where there's a bifurcation between born-in-the-cloud SaaS-type companies, where one workload is 80% of their bill or whatnot, and the big enterprises, where the largest single component is 3%. So, it's a very different slice there. But I'm curious what you would see from a sales perspective, looking across a lot of different geographic boundaries, because we're all, on some level, biased based upon where we tend to spend our time doing business. I'm in San Francisco, which is its very own strange universe that has a certain perspective about itself that is occasionally accurate, but not usually. But it's a big world out there.
Jeremy: It is.
One thing that I would say is interesting: I spent my AWS days based in Singapore, living in Singapore at the time, and I was working with customers across Southeast Asia. And to your point, Corey, one of the most interesting things was this little bit of a leapfrog effect. Data centers in Asia-Pac, especially in places like the Philippines, were just terrible.
You know, the Philippines had, like, the second-highest electricity rates in Asia at the time, only behind Japan, even though the GDP-per-capita gap between those two countries is really large. And yet you're paying, like, these super-high electricity rates. Secondarily, data centers in the Philippines were prone to flooding. And so, a lot of companies in the Philippines never went the data center route. You know, they just hosted servers in their offices—you know, they had a bunch of desktop machines in a cubicle, that kind of situation—because, like, data centers themselves were cost-prohibitive.
So, you saw this effect a little bit like cell phones in a lot of the developing world. Landline infrastructure was too expensive or never got done for whatever reason, and people went straight to cell phones. So actually, what I saw in a lot of emerging markets in Asia was: screw the data center; we're going to go straight to cloud. So, I saw a lot of Asia-Pac get a little bit ahead of places like Europe, where you had, for instance, a lot of long-term data center contracts and you had customers really locked in. And we saw this over the next—let's say between, like, 2014 and 2018, when I was working with a systems integrator and then started working on cloud security.
We saw that US customers and Asia-Pac customers didn't have these obligations; European customers, a lot of them were still working off their leases, and still, you know, "I'm locked into, let's just say, Equinix Frankfurt for another five years before I can think about cloud migration." So, that's definitely one aspect that I observed. The second thing, I think, is, like, the earlier you started, the earlier you reached the point where you realized that actually there is value in a lot of managed services, and there actually is value in getting away from the kind of server mindset around EC2.
Corey: It feels like there's a lot of, I want to call it legacy thinking, in some ways, except that's unfair because legacy remains a condescending engineering term for something that makes money. The problem that you have is that you get bound by choices you didn't necessarily realize you were making, and then something becomes revenue-bearing. And now there's a different way to do it, or you learn more about the platform, or the platform itself evolves, and, "Oh, I'm going to rewrite everything to take advantage of this," isn't happening. So, it winds up feeling like, yeah, we're treating the cloud like a data center. And sometimes that's right; sometimes that's a problem, but ultimately, it still becomes a significant challenge. I mean, there's no way around it. And I don't know what the right answer is, I don't know what the fix is going to be, but it always feels like I'm doing something wrong somewhere.
Jeremy: I think a lot of customers go through that same set of feelings, and they realize that they have the active-runway problem: where, you know, how do you do maintenance on an active runway? You kind of can't, because you've got flights going in and out.
And I think you're seeing this in your part of the world at SFO, with a lot of the work that got done in, like, 2018, 2019, where they kind of had to close down a runway and had, like, near misses because they consolidated all flights onto the one active runway, right? It is a challenge. And I actually think that some of the evolution that I've seen our customers go through over the last, like, two, three years is starting to get away from that challenge.
So, to your point, when you have revenue-bearing workloads that you can't really modify and things are pretty tightly coupled, it is very hard to make change. But when you start to have it where things are broken down into more microservices, it makes it a lot easier to cycle out Service A for Service B—or, let's say more accurately, Service A1 for Service A2—where you can kind of just, like, plug and play different APIs, and maybe, you know, repoint services at the new stuff as they come online. But getting to that point is definitely a painful process. It does require architectural changes, and often those architectural changes aren't at the infrastructure level; they're actually inside the application, or they're between things like applications and third-party dependencies, where the customers may not have full control over the dependencies, and that does become a real challenge for people to break down and start to attack. You've heard of the Strangler Methodology?
Corey: Oh, yes. Both in terms of the Boston Strangler, as well—
Jeremy: [laugh]. Right.
Corey: As the Strangler design pattern.
Jeremy: Yeah, yeah. But I think, like, getting to that is challenging until—like, once you understand that you want to do that, it makes a lot of sense. But getting to the starting point for that journey can be really challenging for a lot of customers, because it involves stakeholders that are often not involved in infrastructure conversations, and organizational dysfunction can really creep in there, where you have teams that don't necessarily play nice together—not for any particular reason, but just because historically they haven't had to. So, that's something that I've seen, and it definitely takes a little bit of cultural work to overcome.
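[Editor's note: a toy illustration of the strangler-style cutover Jeremy describes, where routes are repointed from a legacy service to its replacement one at a time. Hosts, paths, and names are all hypothetical.]

    # Routes migrate one prefix at a time from the legacy backend to the
    # replacement; everything not yet migrated keeps hitting the old service.
    MIGRATED_PREFIXES = {"/api/v1/users", "/api/v1/billing"}   # assumed route names

    LEGACY_BASE = "http://legacy.internal:8080"                # hypothetical hosts
    REPLACEMENT_BASE = "http://service-a2.internal:8080"

    def pick_backend(path: str) -> str:
        """Send migrated paths to the new service, everything else to legacy."""
        if any(path.startswith(p) for p in MIGRATED_PREFIXES):
            return REPLACEMENT_BASE + path
        return LEGACY_BASE + path

    assert pick_backend("/api/v1/users/42").startswith(REPLACEMENT_BASE)
    assert pick_backend("/api/v1/orders/9").startswith(LEGACY_BASE)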
Corey: When you take a look across the board of cloud adoption, it's interesting to have seen the patterns that wound up unfolding. Your career path, though, seems to have gotten away from selling cloud and into some strange directions, leading up to what you're doing now, where you founded FireTail. What do you folks do?
Jeremy: We do API security. And it really is kind of the culmination of, like, the last several years and what we saw. I mean, to your point, we saw customers going through kind of Phase One, Two, Three of cloud adoption. Phase One, the, you know, for lack of a better phrase, lift-and-shift; and Phase Two, the kind of first step on the path towards quote-unquote, "Enlightenment," where they start to see that, like, actually, we can get better operational efficiency if we, you know, move our databases off of EC2 and onto RDS, and we move our static content onto S3.
And then Phase Three, where they realize, actually, EC2 kind of sucks, and it's a lot of management overhead, it's a lot of attack surface, I hate having to bake AMIs. What I really want to do is just drop some code on a platform and run my application. And that might be serverless. That might be containerized, et cetera. But one path or the other, where we pretty much always see customers ending up is with an API sitting on a network.
And that API is doing two things: it front-ends a data set, and it front-ends a set of functionality, in most cases. So, what that really means is that the thing that sits on the network—that does represent the attack surface, both in terms of accessing data or in terms of, let's say, like, abusing an application—is an API. And that's what led us to where I am today, what led me and my co-founder Riley to, you know, start the company and try to make it easier for customers to build more secure APIs. So yeah, that's kind of the change that I've observed over the last few years that really, as you said, led to what I'm doing now.
Corey: There is a lot of, I guess, challenge in the entire space when we bound that to—even API security, though as soon as you go down the security path, it starts seeming like there's a massive problem, just in terms of proliferation of companies that each do different things, that each focus on different parts of the story. It feels like everything winds up spitting out huge amounts of security-focused, or at least security-adjacent, telemetry. Everything has findings on top of that, and at least in the AWS universe, "Oh, we have a service that spits out a lot of that stuff. We're going to launch another service on top of it that, of course, costs more money, that then winds up organizing it for you. And then another service on top of that that does the same thing yet again." And it feels like we're building a tower of these things that are just… shouldn't it just be a feature in the original underlying thing that turns down the noise? "Well, yes, but then we couldn't sell you three more things around it."
Jeremy: Yeah, I mean—
Corey: Agree? Disagree?
Jeremy: I don't entirely disagree. I think there is a lot of validity in what you just said there. I mean, if you look at, like, the proliferation of even the security services, and you see GuardDuty and Config and Security Hub, or things like log analysis with Athena or log analysis with an ELK stack, or OpenSearch, et cetera—I mean, you see all this proliferation of services around that. I do think the thing to bear in mind is that, for most customers, like, security is not one-size-fits-all. Security is fundamentally kind of a risk-management exercise, right? If it wasn't a risk-management exercise, then all security would really be about is, like, keeping your data off of networks and making sure that, like, none of your data could ever leave.
But that's not how companies work. They do interact with the outside world, and so then you kind of always have this decision and this trade-off to make about how much data you expose. And so, when you have that decision, then it leads you down a path of determining what data is important to your organization and what would be most critical if it were breached. And so, the point of all of that is honestly that, like, security is not the same for you as it is for me, right?
And so, to that end, you might be all about Security Hub and Config, with basic checks across all your accounts and all your active regions, and I might be much more about—let's say I'm quote-unquote, "digital-native, cloud-native," blah, blah, blah—I really care about detection and response on top of events.
And so, I only care about log aggregation and, let's say, GuardDuty or Athena analysis on top of that, because I feel like I've got all of my security configurations in Infrastructure as Code. So, there's not a right and wrong answer, and I do think that's part of why there are a gazillion security services out there.
Corey: On some level, I've been of the opinion for a while now that the cloud providers themselves should not necessarily be selling security services directly because, on some level, that becomes an inherent conflict of interest. Why make the underlying platform more secure or easier to use from a security standpoint when you can now turn that into a revenue source? I used to make comments that Microsoft Defender was a classic example of getting this right because they didn't charge for it, and a bunch of antivirus companies screamed and whined about it. And then, of course, Microsoft's like, "Oh, Corey's saying nice things about us. We can't have that." And they started charging for it. So okay, that more or less completely subverts my entire point. But it still feels squicky.
Jeremy: I mean, I kind of doubt that's why they started charging for it. But—
Corey: Oh, I refuse to accept that I'm not that influential. There we are.
Jeremy: [laugh]. Fair enough.
Corey: Yeah, I just can't get away from the idea that it feels squicky when the company providing the infrastructure now makes doing the secure thing on top of it into an investment decision.
Jeremy: Yeah.
Corey: "Do you want the crappy, insecure version of what we build or do you want the top-of-the-line secure version?" That shouldn't be a choice people have to make. Because people don't care about security until right after they really should have cared about security.
Jeremy: Yeah. Look, and I think the changes to S3 configuration, for instance, kind of bear out your point. Like, it shouldn't be the case that you have to go through a lot of extra steps to not make your S3 data public; it should always be the case that, like, you have to go through a lot of steps if you want to expose your data. And then you have explicitly made a set of choices on your own to make some data public, right? So, I kind of agree with the underlying logic. I think the counterargument, if there is one to be made, is that it's not up to them to define what is and is not right for your organization.
Because again, going back to my example, what is secure for you may not be secure for me, because we might have very different modes of operation, we might have very different modes of building our infrastructure, deploying our infrastructure, et cetera. And I think every cloud provider would tell you, "Hey, we're just here to enable customers." Now, do I think that they could be doing more? Do I think that they could have more secure defaults? You know, in general, yes, of course, they could. And really, like, the fundamentals of what I worry about are people building insecure applications, not so much people deploying infrastructure with bad configurations.
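[Editor's note: the safer default Jeremy is describing now exists as S3 Block Public Access; a minimal sketch of switching it on account-wide with boto3 follows. The account ID is a placeholder.]

    import boto3

    # Turn on all four public-access blocks for the whole account, so that
    # exposing a bucket requires deliberately undoing this first.
    s3control = boto3.client("s3control")
    s3control.put_public_access_block(
        AccountId="111122223333",  # placeholder account ID
        PublicAccessBlockConfiguration={
            "BlockPublicAcls": True,
            "IgnorePublicAcls": True,
            "BlockPublicPolicy": True,
            "RestrictPublicBuckets": True,
        },
    )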
Corey: It's funny, we talk about this now. Earlier today, I was lamenting some of the detritus from some of my earlier builds, where I've been running some of these things in my old legacy single account for a while now. And the build service is dramatically overscoped, just because trying to get the security permissions right was an exercise in frustration at the time. It was, "Nope, that's not it. Nope, blocked again."
So, I finally said to hell with it, overscoped it massively, and then with a "TODO: fix this later," which of course never happened. And if there's ever a breach on something like that, I know that I'll have AWS wagging its finger at me and talking about the shared responsibility model, but it's really kind of a disaster plan of their own making, because there's not a great way to say easily and explicitly—or honestly, by default the way Google Cloud does—okay, by default, everything in this project can talk to everything in this project, but the outside world can't talk to any of it, which I think is where a lot of people start off. And the security purists love to say, "That's terrible. That won't work at a bank." You're right, it won't, but a bank has a dedicated security apparatus, internally. They can address those things, whereas your individual student learner does not. And that's how you wind up with open S3 bucket monstrosities left and right.
Jeremy: I think a lot of security fundamentalists would say that what you just described about that Google project structure defeats zero trust, and you know, that on its own is actually a bad thing. I might counterargue and say that, like, hey, you can have a GCP project as a zero-trust, like, first principle, you know? That can be the building block of zero trust for your organization, and then it's up to you to explicitly create these trust relationships to other projects, and so on. But the thing in what you said that really kind of does resonate with me, in particular as an area that AWS—and really, in this case, just AWS—should have done better or should do better, is IAM permissions. Because every developer in the world that I know has had that exact experience that you described, which is: they get to a point where they're like, "Okay, this thing isn't working. It's probably something with IAM."
And then they try one thing, two things, and usually on the third or fourth try, they end up with a star permission, and maybe a comment in that IAM policy or maybe a Jira ticket that, you know, gets filed into the backlog of, "Review those permissions at some point in the future," which pretty much never happens. So, IAM in particular, I think, is one where, like, Amazon should do better, or should at least make it, like, easy for us to kind of graphically build an IAM policy that is scoped to the least permissions required, et cetera. That one, I'll a hundred percent agree with your comments and your statement.
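[Editor's note: for contrast with the star-permission-plus-TODO pattern described here, a sketch of what scoping a build role down can look like with boto3. The role name, policy name, bucket, and actions are all illustrative.]

    import json
    import boto3

    # Instead of Action "*" on Resource "*", name the calls the build actually
    # makes and the one bucket it touches. All names below are hypothetical.
    policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject"],
            "Resource": "arn:aws:s3:::example-build-artifacts/*",
        }],
    }

    iam = boto3.client("iam")
    iam.put_role_policy(
        RoleName="example-build-role",
        PolicyName="scoped-artifact-access",
        PolicyDocument=json.dumps(policy),
    )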
Corey: As you take a look across the largest, I guess, environments you see, as well as some of the folks who are just getting started in this space, it feels like, on some level, it's two different universes. Do you see points of commonality? Do you see that there is an opportunity to get the individual learner who's just starting on their cloud journey to do things that make sense without breaking the bank, that they then can basically have instilled in them as they start scaling up, as they enter corporate environments where security budgets are different orders of magnitude? Because it seems to me that my options for everything that I've looked at start at tens of thousands of dollars a year, or are a bunch of crappy things I find on GitHub somewhere. And it feels like there should be something between those two.
Jeremy: In terms of training, or in terms of, like, tooling to build—
Corey: In terms of security software across the board, which I know—
Jeremy: Yeah.
Corey: —is sort of a vague term. Like, I first discovered this when trying to find something to make sense of CloudTrail logs. It was a bunch of sketchy things off GitHub or a bunch of very expensive products. Same thing with VPC flow logs, same thing with trying to parse other security alerting and aggregate things in a sensible way. Like, very often it's, oh, there's a few very damning log lines surrounded by a million lines of nonsense that no one's going to look through. It's the needle-in-a-haystack problem.
Jeremy: Yeah, well, I'm really sorry if you spent much time trying to analyze VPC flow logs because that is just an exercise in futility. First of all, the level of information that's in them is pretty useless, and the SLA on actually, like, log delivery, A, whether it'll actually happen, and B, whether it will happen in a timely fashion, is just pretty much non-existent. So—
Corey: Oh, from a security perspective I agree wholeheartedly, but remember, I'm coming from a billing perspective, where it's—
Jeremy: Ah, fair enough.
Corey: —huh, we're taking a petabyte in and moving 300 petabytes between availability zones. It's great. It's a fun game called "find whatever is chatty" because, on some level, it's like, run two of whatever that is—or three—rather than having it replicate. What is the deal here? And just try to identify, especially in the godforsaken hellscape that is Kubernetes, what is that thing that's talking? And sometimes flow logs are the only real tool you've got, other than oral freaking tradition.
Jeremy: But God forbid you forgot to tag your [ENI 00:24:53] so that the flow log can actually be attributed to, you know, what workload is responsible for it behind the scenes. And so yeah, I mean, I think that's a—boy, that's a case study in, like, a miserable job that I don't think anybody would really want to have in this day and age.
Corey: The timing of this is apt. I sent out my newsletter for the week a couple hours before this recording, and in the bottom section, I asked anyone who's got an interesting solution for solving what's talking to what with VPC flow logs, please let me know, because I found this original thing that AWS put up as part of their workshops and a lab to figure this out, but other than that, it's more or less guess-and-check. What is the hotness? It's been a while since I explored the landscape. And now we see if the audience is helpful or disappoints me. It's all on you folks.
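For what it's worth, one low-cost way to play "find whatever is chatty" is to point Athena at the flow logs and rank talker pairs by bytes. A minimal sketch, assuming the logs already land in S3 and an Athena table named vpc_flow_logs exists over them; the table, database, and results bucket names here are assumptions:

    import boto3

    # Rank source/destination pairs by total bytes moved, busiest first.
    query = """
    SELECT srcaddr, dstaddr, SUM(bytes) AS total_bytes
    FROM vpc_flow_logs
    GROUP BY srcaddr, dstaddr
    ORDER BY total_bytes DESC
    LIMIT 25
    """

    athena = boto3.client("athena")
    athena.start_query_execution(
        QueryString=query,
        QueryExecutionContext={"Database": "default"},
        ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
    )

Cross-referencing the top pairs against your subnet CIDRs will usually point at the cross-AZ replication traffic Corey describes.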
Jeremy: Isn't the hotness to segregate every microservice into an account and run it through a load balancer so that it's, like, much more properly tagged and it's also consumable on an account-by-account basis for better attribution?
Corey: And then everything you see winds up incurring a direct fee when passing through that load balancer, instead of the same thing within the same subnet being able to talk to one another for free.
Jeremy: Yeah, yeah.
Corey: So, at scale—so yes, for visibility, you're absolutely right. From a "I would like to spend less money giving it directly to Amazon" perspective, not so much.
Jeremy: [unintelligible 00:26:08] spend more money for the joy of attribution of workload?
Corey: Not to mention as well that coming into an environment that exists and is scaled out—which is sort of a prerequisite for me going in on a consulting project—and saying, "Oh, you should rebuild everything using serverless and microservice principles," is a great way to get thrown out of the engagement in the first 20 minutes. Because yes, in theory, anyone can design something great, that works, that solves a problem on a whiteboard, but most of us don't get to throw the old thing away and build fresh. And when we do, great, I'm greenfielding something; there's always constraints and challenges down the road that you don't see coming. So, you finally wind up building the most extensible thing in the universe that can handle all these things, and your business dies before you get to MVP because that takes time, energy, and effort. There are many more companies that have died due to failure to find product-market fit than have died because, "Oh hey, your software architecture was terrible." If you hit the market correctly, there is budget to fix these things down the road, whereas your code could be pristine and your company's still dead.
Jeremy: Yeah. I don't really have a solution for you on that one, Corey [laugh].
Corey: [laugh].
Jeremy: I will come back to your one question—
Corey: I was hoping you did.
Jeremy: Yeah, sorry. I will come back to the question about, you know, how should people kind of get started in thinking about assessing security. And you know, to your point, look, I mean, I think Config is a low-ish cost, but should it cost anything? Probably not, at least for, like, basic CIS Foundations Benchmark checks. I mean, like, if the best practice that Amazon tells everybody is, "Turn on these 40-ish checks at last count," you know, maybe those 40-ish checks should just be free and included and on in everybody's account for any account that you tag as production, right? Like, I will wholeheartedly agree with that sentiment, and it would be a trivial thing for Amazon to do, with one kind of caveat—and this is something that I think a lot of people don't necessarily understand—collecting all the required data for security is actually really expensive. Security is an extremely data-intensive thing at this day and age. And I have a former coworker who used to hate the expression that security is data science, but there is some truth in it at this point, other than the kind of magic around it is not actually that big because there's not a lot of, let's say, heuristic analysis or magic that goes into the queries, et cetera. A lot of security is very rule-based. It's a lot of, you know, just binary checks: is this bit set to zero or one? And some of those things are, like, relatively simple, but what ends up inevitably happening is that customers want more out of it. They don't just want to know, is my security good or bad? They want to know things like: is it good or bad now relative to last week? Has it gotten better or worse over time?
And so, then you start accumulating lots of data and time-series data, and that becomes really expensive. And secondarily, the thing that's really starting to happen more and more in the security world is correlation of multiple layers of data: infrastructure with applications, infrastructure with operating system, infrastructure with OS and app vulnerabilities, infrastructure plus vulnerabilities plus Kubernetes configurations plus the API sitting at the edge of that. Because realistically, for so many organizations that are built out at scale, the truth of the matter is, just on their operating system vulnerabilities, they're going to have tens of thousands, if not millions, of individual items to deal with, and no human can realistically prioritize those without some context around it. And that is where the data management becomes really expensive.
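As a point of reference for the "40-ish checks" Jeremy mentions, turning one of them on yourself is a few lines against the Config API. A sketch using one identifier from AWS's published list of managed rules; it assumes a configuration recorder is already running in the account:

    import boto3

    config = boto3.client("config")

    # Enable one AWS-managed rule; SourceIdentifier comes from the
    # published catalog of managed Config rules.
    config.put_config_rule(
        ConfigRule={
            "ConfigRuleName": "s3-bucket-public-read-prohibited",
            "Source": {
                "Owner": "AWS",
                "SourceIdentifier": "S3_BUCKET_PUBLIC_READ_PROHIBITED",
            },
        }
    )

Each rule evaluation is also a billable event, which is exactly the per-evaluation pricing Corey pushes back on next.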
Corey: I hear you. Particularly the complaints about AWS Config, which many things like Control Tower set up for you. And on some level, it is a tax on using the cloud as the cloud should be used, because it charges for evaluation of changes to your environment. So, if you're spinning things up all the time and then turning them down when they're not in use, that incurs a bunch of Config charges, whereas if you treat it like a big dumb version of your data center where you just spin [unintelligible 00:30:13] things forever, your Config charge is nice and low. When you start seeing it entering the top ten of your spend on services, something is very wrong somewhere.
Jeremy: Yeah. I would actually say, like, a good compromise in my mind would be that it should be included with something like Business Support. If you pay for support with AWS, why not include Config, or some level of Config, for all the accounts that are in scope for your production support? That would seem like a very reasonable compromise.
Corey: A lot of folks have it enabled but don't see any direct value from it either, so it's one of those things where not knowing how to turn it off becomes a tax on what you're doing, in some cases. And SCPs, often via Control Tower, don't allow you to do that. So, you're training people who are learning this in their test environments to avoid it, but you want them to be using it at scale in an enterprise environment. So, I agree with you, there has to be a better way to deliver that value to customers. Because, yeah, when this thing is now 3 or 4% of your cloud bill and it's not adding that much value, folks…
Jeremy: Yeah, one thing I will say just on that point, and, like, it's a super small semantic nitpick that I have: I hate when people talk about security as a tax, because I think it tends to kind of engender the wrong types of relationships to security. Because if you think about taxes, two things about them. One is that they're kind of prescribed for you, and so in some sense, this kind of Control Tower implementation is similar because, you know, it's hard for you to turn off, et cetera, but on the other hand, like, you don't get to choose how that tax money is spent. And really, like, you get to set your security budget as an organization. Maybe this Control Tower Config scenario is a slight outlier on that side, but you know, there are ways to turn it off, et cetera.
The other thing, though, is that, like, people tend to relate to tax as this thing that they really, really hate. It comes once a year, you should really do everything you can to minimize it and to, like, not spend any time on it or on getting it right. And in fact, like, there's a lot of people who kind of like to cheat on taxes, right? And so, like, you don't really want people to have that kind of mindset of, like, pay as little as possible, spend as little time as possible, and yes, let's cheat on it. Like, that's not how I hope people are addressing security in their cloud environments.
Corey: I agree wholeheartedly, but if you have a service like Config, for example—that's what we're talking about—and it isn't adding value to you, and you just don't know what it does, how it works, then it [unintelligible 00:32:37]—or more or less how to turn it off, then it does effectively become directly in line of a tax, regardless of how people want to view the principle of taxation. It's a—yeah, security should not be a tax. I agree with you wholeheartedly. The problem is, is it is—
Jeremy: It should be an enabler.
Corey: —unclea—yeah, the relationship between Config and security in many cases is fairly attenuated in a lot of people's minds.
Jeremy: Yeah. I mean, I think if you don't have, kind of, ideas in mind for how you want to use it or consume it, let's say as an assessment against your own environment, then it's particularly vexing. So, if you don't know, like, "Hey, I'm going to use Config. I'm going to use Config for this set of rules. This is how I'm going to consume that data and how I'm going to then, like, pass the results on to people to make change in the organization," then it's particularly useless.
Corey: Yeah. I really want to thank you for taking the time to speak with me. If people want to learn more, where's the best place for them to find you?
Jeremy: Easy, breezy. We are just firetail.io. That's 'fire' like the, you know, flaming substance, and 'tail' like the tail of an animal, not like a story. But yeah, just firetail.io. And if you come now, we've actually got, like, a white paper that we just put out around API security, kind of analyzing ten years of API-based data breaches and trying to understand what actually went wrong in most of those cases. And you're more than welcome to grab that off of our website. And if you have any questions, just reach out to me. I'm just jeremy@firetail.io.
Corey: And we'll put links to all of that in the [show notes 00:34:03]. Thank you so much for your time. I appreciate it.
Jeremy: My pleasure, Corey. Thanks so much for having me.
Corey: Jeremy Snyder, founder and CEO at Firetail. I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice, along with an angry comment pointing out that listening to my nonsense is a tax on you going about your day.
Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.
AWS networking provides a broad set of services that enable you to connect and secure your applications and data. With AWS networking, you can connect your on-premises networks to AWS, create virtual private clouds (VPCs), and use a variety of networking services to route traffic, control access, and monitor your network.
* Scalability: AWS networking is designed to scale with your needs. You can add or remove resources as your traffic demands change.
* Reliability: AWS networking is built on a highly reliable global infrastructure, so your applications can stay available even if one of your network components has a problem.
* Security: AWS networking provides a variety of security features to protect your applications and data. You can control who has access to your network and what they can do.
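As a minimal illustration of the VPC building block described above, creating and tagging a VPC is a couple of API calls; the CIDR block and name here are arbitrary examples:

    import boto3

    ec2 = boto3.client("ec2")

    # Create a VPC and give it a Name tag; everything else stays default.
    vpc_id = ec2.create_vpc(CidrBlock="10.0.0.0/16")["Vpc"]["VpcId"]
    ec2.create_tags(
        Resources=[vpc_id],
        Tags=[{"Key": "Name", "Value": "example-vpc"}],
    )
    print(f"Created {vpc_id}")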
In this episode of AWS Bites, we explore the future of Virtual Private Clouds (VPCs) in the context of the zero-trust security trend. We'll dive into the pros and cons of using VPCs, including their usefulness when dealing with sensitive data or when you need fine-grained control over your network environment. But let's be real, sometimes VPCs can be a bit of a headache. We'll discuss why you might want to avoid them, including the added complexity they can bring to your network environment. Fear not, we'll also provide a summary of when to use and when not to use VPCs, as well as alternatives to using VPCs, such as services that don't require them. So, are you ready to talk VPCs!?
On this episode of The Cloud Pod, the team discusses the upcoming 2023 in-person Google Cloud conference, the accessibility of AWS CloudTrail Lake for non-AWS activity events, the new updates from Azure Chaos studio, and the comparison between Oracle Cloud service and other Cloud providers. They also highlight the application and importance of VPCs in CCOE. A big thanks to this week's sponsor, Foghorn Consulting, which provides full-stack cloud solutions with a focus on strategy, planning and execution for enterprises seeking to take advantage of the transformative capabilities of AWS, Google Cloud and Azure. This week's highlights
About Chris
Chris Farris has been in the IT field since 1994, primarily focused on Linux, networking, and security. For the last 8 years, he has focused on public cloud and public-cloud security. He has built and evolved multiple cloud security programs for major media companies, focusing on enabling the broader security team's objectives of secure design, incident response, and vulnerability management. He has developed cloud security standards and baselines to provide risk-based guidance to development and operations teams. As a practitioner, he's architected and implemented multiple serverless and traditional cloud applications focused on deployment, security, operations, and financial modeling.
Chris now does cloud security research for Turbot and evangelizes for the open-source tool Steampipe. He is one of the organizers of the fwd:cloudsec conference (https://fwdcloudsec.org) and has given multiple presentations at AWS conferences and BSides events. When not building things with AWS's building blocks, he enjoys building Legos with his kid and figuring out what interesting part of the globe to travel to next. He opines on security and technology on Twitter and his website https://www.chrisfarris.com
Links Referenced:
Turbot: https://turbot.com/
fwd:cloudsec: https://fwdcloudsec.org/
Steampipe: https://steampipe.io/
Steampipe blog: https://steampipe.io/blog
Transcript
Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.
Corey: Tailscale SSH is a new, and arguably better, way to SSH. Once you've enabled Tailscale SSH on your server and user devices, Tailscale takes care of the rest. So you don't need to manage, rotate, or distribute new SSH keys every time someone on your team leaves. Pretty cool, right? Tailscale gives each device in your network a node key to connect to your VPN, and uses that same key for SSH authorization and encryption. So basically you're SSHing the same way that you're already managing your network. So what's the benefit? Well, built-in key rotation, the ability to manage permissions as code, connectivity between any two devices, and reduced latency. You can even ask users to re-authenticate SSH connections for that extra bit of security to keep the compliance folks happy. Try Tailscale now - it's free forever for personal use.
Corey: This episode is sponsored by our friends at Logicworks. Getting to the cloud is challenging enough for many places, especially maintaining security, resiliency, cost control, agility, etc, etc, etc. Things break, configurations drift, technology advances, and organizations, frankly, need to evolve. How can you get to the cloud faster and ensure you have the right team in place to maintain success over time? Day 2 matters. Work with a partner who gets it - Logicworks combines the cloud expertise and platform automation to customize solutions to meet your unique requirements. Get started by chatting with a cloud specialist today at snark.cloud/logicworks. That's snark.cloud/logicworks
Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. My guest today is someone that I have been meaning to invite slash drag onto this show for a number of years.
We first met at re:Inforce the first year that they had such a thing, Amazon's security conference for cloud, as is Amazon's tradition, named after an email subject line. Chris Farris is a cloud security nerd at Turbot. He's also one of the organizers of fwd:cloudsec, another security conference named after an email subject line, with a lot more self-awareness than any of Amazon's stuff. Chris, thank you for joining me.
Chris: Oh, thank you for dragging me on. You can let go of my hair now.
Corey: Wonderful, wonderful. That's why we're all having the thinning hair going on. People just use it to drag us to and fro, it seems. So, you've been doing something that I'm only going to describe as weird lately because your background—not that dissimilar from mine—is as a practitioner. You've been heavily involved in the security space for a while, and lately I keep seeing an awful lot of things with your name on them getting sucked up by the giant app surveillance apparatus deployed to the internet, looking for basically any mention of AWS, that I wind up using to write my newsletter and feed the content grist mill every year. What are you doing and how'd you get there?
Chris: So, what I'm doing right now is, I'm in marketing. It's kind of a, you know, "Oops, I'm sorry I did that."
Corey: Oh, the running gag is, you work in DevRel; that means, "Oh, you're in marketing, but they're scared to tell you that." You're self-aware.
Chris: Yeah.
Corey: Good for you.
Chris: I'm willing to admit that I'm in marketing now. And I've been a cloud practitioner since probably 2014, cloud security since about 2017. And then I just decided, the problem that we have in the cloud security community is a lot of us are just kind of sitting in a corner in our companies and solving problems for our companies, but we're not solving the problems at scale. So, I wanted a job that would allow me to reach a broader audience and help a broader audience. Where I see cloud security—or cloud in general—falling down is that Amazon makes it really hard for you to do your side of shared responsibility, and so we need to be out there helping customers understand what they need to be doing. So, I am now at a company called Turbot and we're really trying to promote cloud security.
Corey: One of the first promoted guest episodes of this show was David Boeke, your CTO, and one of the things that I regret is that I've sort of lost track of Turbot over the past few years because, yeah, one or two things might have been going on during that timeline as I look back at having kids in the middle of a pandemic and the deadly plague o'er land. And suddenly, every conversation takes place over Zoom, which is like, "Oh, good, it's like a happy hour, only instead, now it's just like a conference call for work." It's like 'Conference Calls: The Drinking Game' is never the great direction to go in. But it seems the world is recovering. We're going to be able to spend some time together at re:Invent, by all accounts, that I'm actively looking forward to. As of this recording, you're relatively new to Turbot, and I figured out that you were going there because, once again, content hits my filters. You wrote a fascinating blog post that hits on an interest of mine that I don't usually talk about much because it's off-putting to some folk, and these days, I don't want to get yelled at any more than I have to: the experience of traveling, I believe it was to an all-hands on the other side of the world.
Chris: Yep. So, my first day on the job at Turbot, I was landing in Kuala Lumpur, Malaysia, having left the United States 24 hours before—or was it 48? It's hard to tell when you go to the other side of the planet and the time zones have also shifted—and having left my prior company the day before that. But yeah, Turbot traditionally has an annual event where we all get together in person. We're a completely remote company, but once a year, we all get together in person at our Integrate event. And so, that was my first day on the job. And then, you know, it was basically two weeks of reasonably intense hackathons, building out a lot of stuff that hopefully will show up open-source shortly. And then yeah, meeting all of my coworkers. And that was nice.
Corey: You've always had a focus, through all the time that I've known you and all the public content that you've put out there that has come across my desk, that seems to center around security. It's sort of an area that I give a nod to more often than I would like, on some level, but that tends to be your bread and butter. Your focus seems to be almost overwhelmingly on what I would call AWS security. Is that fair to say, or is that a mischaracterization of how you view it slash what you actually do? Because, again, we have these parasocial relationships with voices on the internet. And it's like, "Oh, yeah, I know all about that person." Yeah, you've met them once and all you know other than that is what they put on Twitter.
Chris: You follow me on Twitter. Yeah, I would argue that yes, a lot of what I do is AWS-related security because in the past, a lot of what I've been responsible for is cloud security in AWS. But I've always worked for companies that were multi-cloud; it's just that 90% of everything was Amazon, and so therefore 90% of my time, 90% of my problems, 90% of my risk was all in AWS. I've been trying to break out of that. I've been trying to understand the other clouds. One of the nice aspects of this role and working on Steampipe is I am now experimenting with other clouds. The whole goal here is to be able to scale our ability as an industry and as security practitioners to support multiple clouds. Because whether we want to or not, we've got it. And so, even though 90% of my spend, 90% of my resources, 90% of my applications may be in AWS, that 10% that I'm ignoring is probably more than 10% of my risk, and we really do need to understand and support the major clouds equally.
Corey: One post you had recently that I find myself in wholehearted agreement with is on the adoption of Tailscale in the enterprise. I use it for all of my personal nonsense and it is transformative. I like the idea of what that portends for multi-cloud, or poly-cloud, or whatever the hell we're calling it this week, sorts of architectures, where historically one of the biggest problems in getting two clouds to speak to one another and manage them in an intelligent way is that the security models are different, the user identity stuff is different as well, and the network stuff has always been nightmarish. Well, with Tailscale, you don't have to worry about that in the same way at all. You can, more or less, ignore it, turn on host-based firewalls for everything and just allow Tailscale. And suddenly, okay, I don't really have to think about this in the same way.
Chris: Yeah. And you get the micro-segmentation out of it, too, which is really nice. I will agree that I had not looked at Tailscale until I was asked to look at Tailscale, and then it was just like, "Oh, I am completely redoing my home network on that." But looking at it, it's going to scare some old-school network engineers, it's going to impact their livelihoods, and that is going to make them very defensive. And so, what I wanted to do in that post was kind of address, as a practitioner, if I was looking at this with an enterprise lens, what are the concerns you would have on deploying Tailscale in your environment? A lot of those were, you know, around user management. I think the big one that is—it's a new thing in enterprise security—is kind of this host profiling, which is, hey, before I let your laptop on the network, I'm going to go make sure that you have antivirus and some kind of EDR, XDR, blah-DR agent, so that you know we have a reasonable thing that you're not going to just go and drop [unintelligible 00:09:01] on the network, and next thing you know, we're Maersk. Tailscale, that's going to be their biggest thing that they are going to have to figure out: how do they work with some of these enterprise concerns and things along those lines. But I think it's an excellent technology, it was super easy to set up. And the ability to fine-tune and microsegment is great.
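For a flavor of what that fine-tuning looks like, Tailscale expresses it as a JSON policy of ACLs; once a policy is defined, any access not matched by an accept rule is denied. A minimal sketch rendered as a Python dict, with hypothetical group, tag, and port choices:

    import json

    # Anything not matched by an accept rule below is blocked.
    acl_policy = {
        "acls": [
            # Platform engineers may reach production databases on Postgres only.
            {"action": "accept", "src": ["group:platform"], "dst": ["tag:prod-db:5432"]},
            # Everyone may reach their own devices on any port.
            {"action": "accept", "src": ["autogroup:member"], "dst": ["autogroup:self:*"]},
        ],
    }

    print(json.dumps(acl_policy, indent=2))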
Corey: Wildly so. They occasionally sponsor my nonsense. I have no earthly idea whether this episode is one of them because we have an editorial firewall—they're not paying me to say any of this stuff, like, "And this is brought to you by whatever." Yeah, that's the sponsored ad part. This is just, I'm in love with the product. One of the most annoying things about it to me is that I haven't found a reason to give them money yet because the free tier for my personal stuff is very comfortably sized, and I don't have a traditional enterprise network or anything like that people would benefit from over here. Now, one area in cloud security where I think I have potentially been misunderstood, so I want to take at least this opportunity to clear the air on it a little bit: by all accounts, I've spent the last, mmm, few months or so just absolutely beating the crap out of Azure. Before I wind up adding a little nuance and context to that, I'd love to get your take on what, by all accounts, has been a pretty disastrous year-and-a-half for Azure security.
Chris: I think it's been a disastrous year-and-a-half for Azure security. Um—[laugh].
Corey: [laugh]. That was something of a leading question, wasn't it?
Chris: Yeah, no, I mean, it is. And if you think, though, back, Microsoft's repeatedly had these—the ebb and flow of security disasters. You know, Code Red back in, whatever, the 2000s, NT 4.0 patching back in the '90s. So, I think we're just hitting one of those peaks again, or hopefully, we're hitting the peak and not [laugh] just starting the uptick. A lot of what Azure has built is stuff that they already had, commercial off-the-shelf software; they wrapped multi-tenancy around it, gave it a new SKU under the Azure name, and called it cloud. So, am I super-surprised that somebody figured out how to leverage a Jupyter notebook to find the back-end credentials to drop the firewall tables to go find the next guy over's Cosmos DB? No, I'm not.
Corey: I find their failures to be less egregious on a technical basis because, let's face it, let's be very clear here, this stuff is hard. I am not pretending for even a slight second that I'm a better security engineer than the very capable, very competent people who work there. This stuff is incredibly hard. And I'm not—
Chris: And very well-funded people.
Corey: Oh, absolutely, yeah. They make more than I do, presumably. But it's one of those areas where I'm not sitting here trying to dunk on them, their work, their efforts, et cetera, and I don't do a good enough job of clarifying that. My problem is the complete radio silence coming out of Microsoft on this. If AWS had a series of issues like this, I'm hard-pressed to imagine a scenario where they would not have much more transparent communications; they might very well trot out a number of their execs to go on a tour to talk about these things and what they're doing systemically to change it. Because six of these in, it's like, okay, this is now a cultural problem. It's not one rando engineer wandering around the company screwing things up on a rotational basis. It's, what are you going to do? It's unlikely that firing Steven is going to be your fix for these things. So, that is part of it. And then most recently, they wound up having a blog post on the MSRC, the Microsoft Security Resource Center, I believe that acronym is? The [mrsth], whatever; it sounds like a virus you pick up in a hospital. But the problem that I have with it is that they spent most of that being overly defensive and dunking on SOCRadar, the vulnerability researcher who found this and reported it to them. And they had all kinds of quibbles with how it was done, what they did with it, et cetera, et cetera. It's, "Excuse me, you're the ones that left customer data sitting out there in the Azure equivalent of an S3 bucket, and you're calling other people out for basically doing your job for you? Excuse me?"
Chris: But it wasn't sensitive customer data. It was only the contract information, so therefore it was okay.
Corey: Yeah, if I put my contract information out there and try to claim it's not sensitive information, my clients will laugh and laugh as they sue me into the Stone Age.
Chris: Yeah, well, clearly, you don't have the same level of clickthrough terms that Microsoft is able to negotiate because, you know, [laugh].
Corey: It's awful as well; it doesn't even work because, "Oh, it's okay, I lost some of your data, but that's okay because it wasn't particularly sensitive." Isn't that kind of up to you?
Chris: Yes. And if, A, I'm actually, you know, a big AWS shop, and I'm looking at Azure and I've got my negotiations in there, and Amazon gets wind that I'm negotiating with Azure, that's not going to go well for me and my business. So no, this kind of material is incredibly sensitive. And that was an incredibly tone-deaf response on their part. But you know, to some extent, it was more of a response than we've seen from some of the other Azure multi-tenancy breakdowns.
Corey: Yeah, at least they actually said something. I mean, there is that. It's just—it's wild to me. And again, I say this as an Azure customer myself. Their computer vision API is basically just this side of magic, as best I can tell, and none of the other providers have anything like it. That's what I want. But, you know, it almost feels like that service is under NDA because no one talks about it when they're using this service. I did a whole blog post singing its praises and no one from that team reached out to me to say, "Hey, glad you liked it." Not that they owe me anything, but at the same time it's incredible.
Why am I getting shut out? It's like, does this company just have an entire policy of not saying anything ever to anyone at any time? It seems it.
Chris: So, a long time ago, I came to this realization that you can see it even if you just look at the terminology of the three providers. Amazon has accounts. Why does Amazon have AWS accounts? Because they're a retail company, and that's what you signed up with to buy your underwear. Google has projects because they were, I guess, a developer-first thing, and that was how they thought about it: "Oh, you're going to go build something. Here's your project." What does Microsoft have? Microsoft Azure Subscriptions. Because they are still about the corporate enterprise IT model of: it's really about how much we're charging you, not really about what you're getting. So, given that you're not a big enterprise IT customer, and you don't—I presume—do lots and lots of golfing at expensive golf resorts, you're probably not fitting their demographic.
Corey: You're absolutely not. And that's wild to me. And yet, here we are.
Chris: Now, what's scary is they are doing so many interesting things with artificial intelligence… that if… their multi-tenancy boundaries are as bad as we're starting to see, then what else is out there? And more and more, we as carbon-based life forms are relying on Microsoft and other cloud providers to build AI, and that's kind of a scary thing. Go watch Satya's keynote at Microsoft Ignite and he's showing you all sorts of ways that AI is going to start replacing the gig economy. You know, it's not just Tesla and self-driving cars at this point. DALL-E is going to replace the independent graphic designer. They've got things coming out in their office suite that are going to replace the mom-and-pop marketing shops that are generating menus and doing marketing plans for your local restaurants or whatever. There's a whole slew of things where they're really trying to replace people.
Corey: That is a wild thing to me. And part of the problem I have in covering AWS is that I have to differentiate in a bunch of different ways between AWS and its Amazon corporate parent. And they have that problem, too, internally. Part of the challenge they have, in many cases, is that perks you give to employees have to scale to one-and-a-half million people, many of them in fulfillment center warehouse things. And that is a different type of problem than a company like, for example, Google has, where most of their employees tend to be in office-job-style environments. That's a weird thing, and I don't know how to even start conceptualizing things operating at that scale. Everything that they do is definitionally a very hard problem when you have to make it scale to that point. What all of the hyperscale cloud providers do is, from where I sit, complete freaking magic. The fact that it works as well as it does is nothing short of a modern-day miracle.
Chris: Yeah, and it is more than just throwing hardware at the problem, which was my on-prem solution to most of the things. "Oh, hey. We need higher availability? Okay, we're going to buy two of everything." We called it the Noah's Ark model, and we have an A side and a B side. And, "Oh, you know what? Just in case, we're going to buy some extra capacity and put it in a different city so that, you know, we can just fail from our primary city to our secondary city." That doesn't work at the cloud provider scale.
And really, we haven't seen a major cloud outage—I mean, like, a bad one—in quite a while.
Corey: This episode is sponsored in part by Honeycomb. When production is running slow, it's hard to know where problems originate. Is it your application code, users, or the underlying systems? I've got five bucks on DNS, personally. Why scroll through endless dashboards while dealing with alert floods, going from tool to tool to tool that you employ, guessing at which puzzle pieces matter? Context switching and tool sprawl are slowly killing both your team and your business. You should care more about one of those than the other; which one is up to you. Drop the separate pillars and enter a world of getting one unified understanding of the one thing driving your business: production. With Honeycomb, you guess less and know more. Try it for free at honeycomb.io/screaminginthecloud. Observability: it's more than just hipster monitoring.
Corey: The outages are always fascinating, just from the way that they are reported in the mainstream media. And again, this is hard, I get it. I am not here to crap on journalists. They, for some ungodly, unknowable reason, have decided not to spend their entire career focusing on the nuances of one very specific, very deep industry. I don't know why. But as [laugh] a result, they wind up getting a lot of their baseline facts wrong about these things. And that's fair. I'm not here to necessarily act as an Amazon spokesperson when these things happen. They have an awful lot of very well-paid people who can do that. But it is interesting just watching the blowback and the reaction whenever there's an outage: the conversation is never "Does Amazon or Azure or Google suck?" It's, "Does cloud suck as a whole?" That's part of the reason I care so much about Azure getting their act together. If it were just torpedoing Microsoft's reputation, then, well, that's sad, but okay. But it extends far beyond that, to a point where it's almost as if the enterprise groundhog sees the shadow of a data breach and then we get six more years of data center build-outs instead of moving things to the cloud. I spent too many years working in data centers, and I have the scars from the cage nuts and crimping patch cables frantically in the middle of the night to prove it. I am thrilled at the fact that I don't believe I will ever again have to frantically drive across town in the middle of the night to replace a hard drive before the rest of the array degrades. Cloud has solved those problems beautifully. I don't want to go back to the Dark Ages.
Chris: Yeah, and I think that there's a general potential that we could start seeing this big push towards going back on-prem for effectively sovereign data reasons, whether it's a country that has said, "You cannot store your data about our citizens outside of our borders"—and either they're doing that because they do not trust US Silicon Valley privacy or whatever, or because if the data is within their borders, then their secret police agents can come knocking on the door at two in the morning to go find out what some dissident's viewing habits might have been. I see sovereign cloud as this thing that may be a step back from this ubiquitous thing that we have right now in Amazon, Azure, and Google. And so, as we start getting to the point in the history books where we start seeing maps with lots of flags, I think we're going to start seeing a bifurcation of cloud as just a whole thing. We see it already right now.
The AWS China partition is not owned by Amazon, it is not run by Amazon, it is not controlled by Amazon. It is controlled by the communist government of China. And nobody is doing business in Russia right now, but if they had not done what they had done earlier this year, we might very well have seen somebody spinning up a cloud provider that is completely controlled by and in the Russian government.
Corey: Well, yes or no, but I want to challenge that assessment for a second, because I've had conversations with a number of folks about this where people say, "Okay, great. Like, is the alt-right, for example, going to have better options now that there might be a cloud provider spinning up there?" Or, "Well, okay, what about a new cloud provider to challenge the dominance of the big three?" And there are all these edge cases, either geopolitically or politically based upon—or folks wanting to wind up approaching it from a particular angle—but if we were hired to build out an MVP of a hyperscale cloud provider, the budget for that MVP would look like 100 billion dollars at this point, to get started and just get up to a point of critical mass before you could actually see if this thing has legs. And we'd probably burn through almost all of that before doing a single dime in revenue.
Chris: Right. And then you're doing that in small markets. Outside of the China partition, these are not massively large markets. I think Oracle is going down an interesting path with its idea of Dedicated Cloud and Oracle Alloy [unintelligible 00:22:52].
Corey: I like a lot of what Oracle's doing, and if younger me heard me say that, I don't know how hard I'd hit myself, but here we are. Their free tier for Oracle Cloud is amazing, their data transfer prices are great, and their entire approach of, "We'll build an entire feature-complete region in your facility and charge you what, from what I can tell, is a very reasonable amount of money," works. And it is feature-complete, not, "Well, here are the three services that we're going to put in here and everything else is… well, it's just sort of a toehold there so you can start migrating it into our big cloud." No. They're doing it right from that perspective. The biggest problem they've got is the word Oracle at the front end and their, I would say, borderline addiction to big-E enterprise markets. I think the future of cloud looks a lot more like cloud-native companies being founded, because those big enterprises are starting to describe themselves in similar terminology. And as we've seen in the developer ecosystem, as go startups, so do big companies a few years later. Walk around any big company that's undergoing a digital transformation, you'll see a lot more Macs on desktops, for example. You'll see CI/CD processes in place as opposed to, "Well, oh, you want something new? It's going to be eight weeks to get a server rack downstairs, and accounting is going to have 18 pages of forms for you to fill out." No, it's "click the button," or—
Chris: Don't forget the six months of just getting the financial CapEx approvals.
Corey: Exactly.
Chris: You have to go through the finance thing before you even get to start talking to techies about when you get your server. I think Oracle is in an interesting place, though, because it is embracing the fact that it is number four, and so therefore it's: we are going to work with AWS, we are going to work with Azure, our database can run in AWS or it can run in our cloud, we can interconnect directly, natively, seamlessly with Azure.
If I were building a consumer-based thing and I was moving into one of these markets where one of these governments was demanding something like a sovereign cloud, Oracle is a great place to go and throw—okay, all of our front-end consumer whatever is all going to sit in AWS because that's what we do for all other countries. For this one country, we're just going to go and build this thing in Oracle, and we're going to leverage Oracle Alloy or whatever, and now suddenly, okay, their data is in their country and it's subject to their laws, but I don't have to re-architect to go into one of these, you know, little countries with tinhorn dictators.
Corey: It's the way to do multi-cloud right, from my perspective. I'll use a component service in a different cloud, but I'm under no illusions that in doing that I'm increasing my resiliency. I'm not removing single points of failure; I'm adding them. And I make that trade-off on a case-by-case basis, knowingly. But there is a case for some workloads—probably not yours if you're listening to this; assume not, but when you have more context, maybe so—where, okay, we need to be across multiple providers for a variety of strategic or contextual reasons for this workload. That does not mean everything you build needs to be able to do that. It means you're going to make trade-offs for that workload, and understanding the boundaries of where that starts and where that stops is going to be important. That is not the worst idea in the world for a given appropriate workload, that you can optimize stuff into a container and then run it, more or less, anywhere that can take a container. But that is also not the majority of most people's workloads.
Chris: Yeah. And I think what that comes back to, from the security practitioner standpoint, is you have to support not just your primary cloud, your favorite cloud, the one you know; you have to support any cloud. And whether that's, you know, hey, congratulations, your developers want to use Tailscale because it bypasses a ton of complexity in getting these remote island VPCs from this recent acquisition integrated into your network, or because you're going into a new market and you have to support Oracle Cloud in Saudi Arabia, then you as a practitioner have to kind of support any cloud. And so, one of the reasons that I joined, and am working on, and am so excited about Steampipe is it kind of does give you that. It is a uniform interface to not just AWS, Azure, and Google, but all sorts of clouds, whether it's GitHub or Oracle or Tailscale. So, that's kind of the message I have for security practitioners at this point: I tried, I fought, I screamed and yelled and ranted on Twitter against, you know, doing multi-cloud, but at the end of the day, we were still multi-cloud.
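A minimal sketch of what that uniform interface looks like in practice: the local Steampipe service exposes each cloud as Postgres tables you can query with plain SQL. It assumes the service is running with the AWS plugin installed; the connection details here are assumptions (steampipe service start prints the real ones), and the table and column names follow the plugin's published schema:

    import psycopg2  # pip install psycopg2-binary

    # Steampipe runs a local Postgres endpoint once started with
    # `steampipe service start`.
    conn = psycopg2.connect(
        host="localhost", port=9193, dbname="steampipe", user="steampipe"
    )
    cur = conn.cursor()

    # The same SQL idiom works against aws_*, azure_*, gcp_*, github_* tables.
    cur.execute(
        "select name, region from aws_s3_bucket where bucket_policy_is_public"
    )
    for name, region in cur.fetchall():
        print(f"public bucket policy: {name} ({region})")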
Corey: Where I see these things evolving is that, yeah, as a practitioner, we're increasingly having to work across multiple providers, but not to a stupendous depth. That's the intimidating thing that scares the hell out of people. I still remember my first time with the AWS console, being so overwhelmed with the number of services, and there were 12. Now, there are hundreds, and I still feel that same sense of being overwhelmed, but I also have the context now to realize that over half of all customer spend globally is on EC2. That's one service. Yes, you need, like, five more to get it to work, but okay. And once you go through learning that to get started—and there's a lot of moving parts around it—like, "Oh, God, I have to do this for every service?" No. Take Route 53—my favorite database, but most people use it as a DNS service—you can go start to finish on basically everything that service does that a human being is going to use in less than four hours, and then you're more or less ready to go. Everything is not the hairy beast that is EC2. And most of those services are not for you, whoever you are, whatever you do. Most AWS services are not for you. Full stop.
Chris: Yes and no. I mean, as a security practitioner, you need to know what your developers are doing, and I've worked in large organizations with lots of things, and I would joke that, oh yeah, I'm sure we're using every service but the IoT ones, and then I go and look at our bill, and I was like, "Oh, why are we dropping that much on IoT?" Oh, because they wanted to use the Managed MQTT service.
Corey: Ah, I start with the bill because the bill is the source of truth.
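A sketch of that starting point, pulling one month of spend grouped by service through Cost Explorer; the dates are placeholders:

    import boto3

    ce = boto3.client("ce")

    # Group one month of spend by service, biggest line items first.
    resp = ce.get_cost_and_usage(
        TimePeriod={"Start": "2023-01-01", "End": "2023-02-01"},
        Granularity="MONTHLY",
        Metrics=["UnblendedCost"],
        GroupBy=[{"Type": "DIMENSION", "Key": "SERVICE"}],
    )

    groups = resp["ResultsByTime"][0]["Groups"]
    amount = lambda g: float(g["Metrics"]["UnblendedCost"]["Amount"])
    for g in sorted(groups, key=amount, reverse=True):
        print(f"{g['Keys'][0]}: {amount(g):.2f}")

A surprise IoT line item near the top of that output is exactly how the Managed MQTT discovery Chris describes tends to happen.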
Chris: Yes, they wanted to use the Managed MQTT service. Okay, great. So, we're now in IoT. But how many of those things have resource policies, how many of those things can be made public, and how many of those things is your CSPM actually checking for and telling you that, hey, a developer has gone out somewhere and made this SageMaker notebook public, or this MQTT topic public? And so, that's where, you know, you need to have that level of depth, and then you've got to have that level of depth in each cloud. To some extent, if the cloud is just the core basic VMs, object storage, maybe some networking, and a managed relational database, it's super simple to understand what all you need to do to build a baseline to secure that. As soon as you start adding in all of the fancy services that AWS has… I re—
Corey: Yeah, migrating your Step Functions workflow to another cloud is going to be a living goddamn nightmare. Migrating something that you stuffed into a container and run on EC2 or Fargate is probably going to be a lot simpler. But there are always nuances.
Chris: Yep. But the security profile of a Step Function is significantly different. So, you know, there's not much you can do wrong there, yet.
Corey: You say that now, but wait for their next security breach, and then we start calling them Stumble Functions instead.
Chris: Yeah. I say that. And the next thing you know, we're going to have something like Lambda [unintelligible 00:30:31] show up, and I'm just going to be able to put my Step Function on the internet unauthenticated. Because, you know, that's what Amazon does: they innovate, but they don't necessarily warn security practitioners ahead of their innovation that, hey, we're about to release this thing. You might want to prepare for it and adjust your baselines, or talk to your developers, or here's a service control policy that you can drop in place to, you know, suppress it for a little bit. No, it's like, "Hey, these things are there," and by the time you see the tweets or read the documentation, you've got some developer who's put it in production somewhere. And then it becomes a lot more difficult for you as a security practitioner to put the brakes on it.
Corey: I really want to thank you for spending so much time talking to me. If people want to learn more and follow your exploits—as they should—where can they find you?
Chris: They can find me at steampipe.io/blog. That is where all of my latest rants, raves, research, and how-tos show up.
Corey: And we will, of course, put a link to that in the [show notes 00:31:37]. Thank you so much for being so generous with your time. I appreciate it.
Chris: Perfect, thank you. You have a good one.
Corey: Chris Farris, cloud security nerd at Turbot. I'm Cloud Economist Corey Quinn and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice along with an angry, insulting comment, and be sure to mention exactly which Azure communications team you work on.
Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.
Announcer: This has been a HumblePod production. Stay humble.
About Harry
Harry has worked at Sysdig for over 6 years, helping organizations mature their journey to cloud native. He's witnessed the evolution of bare metal, VMs, and finally Kubernetes establishing itself as the de facto standard for container orchestration. He is part of the product team building Sysdig's troubleshooting and cost offering, helping customers increase their confidence operating and managing Kubernetes. Previously, Harry ran, and later sold, a cloud hosting provider where he was working hands-on with systems administration. He studied information security and lives in the UK.
Links Referenced:
Sysdig: https://sysdig.com/
Transcript
Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.
Corey: This episode is brought to us by our friends at Pinecone. They believe that all anyone really wants is to be understood, and that includes your users. AI models combined with the Pinecone vector database let your applications understand and act on what your users want… without making them spell it out. Make your search application find results by meaning instead of just keywords, your personalization system make picks based on relevance instead of just tags, and your security applications match threats by resemblance instead of just regular expressions. Pinecone provides the cloud infrastructure that makes this easy, fast, and scalable. Thanks to my friends at Pinecone for sponsoring this episode. Visit Pinecone.io to understand more.
Corey: This episode is brought to you in part by our friends at Veeam. Do you care about backups? Of course you don't. Nobody cares about backups. Stop lying to yourselves! You care about restores, usually right after you didn't care enough about backups. If you're tired of the vulnerabilities, costs, and slow recoveries when using snapshots to restore your data, assuming you even have them at all living in AWS-land, there is an alternative for you. Check out Veeam, that's V-E-E-A-M, for secure, zero-fuss AWS backup that won't leave you high and dry when it's time to restore. Stop taking chances with your data. Talk to Veeam. My thanks to them for sponsoring this ridiculous podcast.
Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. This promoted episode has been brought to us by our friends at Sysdig, and they have sent one of their principal product managers to suffer my slings and arrows. Please welcome Harry Perks.
Harry: Hey, Corey, thanks for hosting me. Good to meet you.
Corey: An absolute pleasure, and thanks for basically being willing to suffer all of the various nonsense I'm about to throw in your direction. Let's start with origin stories; I find that those tend to wind up resonating the most. Back when I first noticed Sysdig coming into the market, because it was just launching at that point, it seemed like it was a… we'll call it an innovative approach to observability, though I don't recall that we used the term observability back then. It more or less took a look at whatever an application was doing, almost at a system call level, tracing what was going on as those requests worked on an individual system, and then providing those in a variety of different forms to reason about.
Is that directionally correct as far as the origin story goes, or am I misremembering an evening event I went to what feels like half a lifetime ago?
Harry: I'd say the latter, but just because it's a funnier answer. But that's correct. So, Sysdig was created by Loris Degioanni, one of the founders of Wireshark. And when containers and Kubernetes were being incepted, you know, it created this problem where you lacked visibility into what's going on inside these opaque boxes, right? These black boxes which are containers. So, we started using system calls as a source of truth for… I don't want to say observability, but observability, and using those system calls to essentially see what's going on inside containers from the outside. And leveraging system calls, we were able to pull up metrics, such as the golden signals of applications running in containers, and network traffic. So, it's a very simple way to instrument applications. And that was really how monitoring started. And then Sysdig kind of morphed into a security product.
Corey: What was it that drove that transformation? Because generally speaking, a product that's in a particular space aimed at a particular niche pivoting into something that feels as orthogonal as security is not something you see all that often. What did you folks see that wound up driving that change?
Harry: The same challenges that were being presented by containers and microservices for monitoring were the same challenges for security. So, for runtime security, it was very difficult for our customers to be able to understand what the heck is going on inside the container. Is a crypto miner being spun up? Is there malicious activity going on? So, it made logical sense to use that same data source - system calls - to understand the monitoring and the security posture of applications.
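For a concrete flavor of that system-call approach, Falco, the open-source runtime security engine Sysdig created, expresses detections as rules over syscall events. A minimal sketch, written out from Python for illustration; the miner process names are assumptions, not an exhaustive list:

    # spawned_process, container, proc.name, and the %field outputs are
    # standard Falco constructs; the process list below is illustrative.
    rule = """
    - rule: Possible crypto miner launched
      desc: A process with a well-known miner name started inside a container
      condition: spawned_process and container and proc.name in (xmrig, minerd)
      output: "Miner-like process (command=%proc.cmdline container=%container.name)"
      priority: WARNING
    """

    with open("miner_rules.yaml", "w") as f:
        f.write(rule)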
There are different ways to implement observability, but really, you know, you can't improve the performance, and you can't improve the security posture, of things you can't see, right? So, how do I make sure I can see everything? And what I do with that data is really what observability means to me. Corey: The entire observability space across the board is really one of those areas that is defined, on some level, by outliers within it. It's easy to wind up saying that any given observability tool will—oh, it alerts you when your application breaks. The problem is that the interesting stuff is often found in the margins, in the outlier products that wind up emerging from it. What is the specific area of that space where Sysdig tends to shine the most? Harry: Yeah, so you're right. The outliers typically cause problems and often you don't know what you don't know. And I think if you look at Kubernetes specifically, there is a whole bunch of new problems and challenges and things that you need to be looking at that didn't exist five to ten years ago, right? There are new things that can break. You know, you've got a pod that's stuck in a CrashLoopBackOff. And hey, I'm a developer who's running my application on Kubernetes. I've got this pod in a CrashLoopBackOff. I don't know what that means. And then suddenly I'm being expected to alert on these problems. Well, how can I alert on things that I didn't even know were a problem? So, one of the things that Sysdig is doing on the observability side is we're looking at all of this data and we're actually presenting opinionated views that help customers make sense of that data. Almost like, you know, I could present this data and give it to my grandma, and she would say, “Oh, yeah, okay. You've got these pods in CrashLoopBackOff, you've got these pods that are being CPU throttled. Hey, you know, I didn't know I had to worry about CPU limits, or, you know, memory limits, and now I'm suffering, kind of, OOM kills.” So, I think one of the things that's quite unique about Sysdig on the monitoring side that a lot of customers are getting value from is kind of demystifying some of those challenges and making a lot of that data actionable. Corey: At the time of this recording, I've not yet bothered to run Kubernetes in anger, by which I, of course, mean production. My production environment is of course called ‘Anger' similarly to the way that my staging environment is called ‘Theory' because things work in theory, but not in production. That is going to be changing in the first quarter of next year, give or take. The challenge with that, though, is that so much has changed—we'll say—since the evolution of Kubernetes into something that is mainstream production in most shops. I stopped working in production environments before that switch really happened, so I'm still at a relatively amateurish level of understanding around a lot of these things. I'm still thinking about old-school problems, like, “Okay, how big do I make each one of the nodes in my Kubernetes cluster?” Yeah, if I get big systems, it's likelier that there will be economies of scale that start factoring in, and fewer nodes to manage, but it does increase the blast radius if one of those nodes gets affected by something that takes it offline for a while. I'm still at the very early stages of trying to wrap my head around the nuances of running these things in a production environment. Cost is, of course, a separate argument.
My clients run it everywhere, and I can reason about it surprisingly well for something that does not lend itself to easy understanding in any sense of the word; you almost have to intuit its existence just by looking at the AWS bill. Harry: No, I like your observations. And I think the last part there around costs is something that I'm seeing a lot in the industry and in our customers: okay, suddenly, you know, I've got a great monitoring posture, or observability posture, whatever that really means. I've got a great security posture. As customers are maturing in their journey to Kubernetes, suddenly there are a bunch of questions that are being asked from the top—and we've kind of seen this internally—such as, “Hey, what is the ROI of each customer?” Or, “What is the ROI of a specific product line or feature that we deliver to our customers?” And we couldn't answer those problems. And we couldn't answer those problems because we're running a bunch of applications and software on Kubernetes, and when we received our billing reports from the multiple different cloud providers we use—Azure, AWS, and GCP—we just received a big fat bill that was compute, and we were unable to kind of break that down by the different teams and business units, which is a real problem. And it was one of the problems that we really wanted to start solving, both for internal use, but also for our customers as well. Corey: Yeah, when you have a customer coming in, the easy part of the equation is well, how much revenue are we getting from a customer? Well, that's easy enough to just wind up polling your finance group and, “Yeah, how much have they paid us this year?” “Great. Good to know.” Then it gets really confusing over on the cost side because it gets into a unit economic model that I think most shops don't have a particularly advanced understanding of. If we have another hundred customers sign up this month, what will it cost us to service them? And what are the variables that change those numbers? It really gets into a fascinating model where people, more or less, do some gut checks and some rounding, but there are a bunch of areas where people get extraordinarily confused, start to finish. Kubernetes is very much one of them because from a cloud provider's perspective, it's just a single-tenant app that is really gnarly in terms of its behavior, it does a bunch of different things, and from the bill alone, it's hard to tell that you're even running Kubernetes unless you ask. Harry: Yeah, absolutely. And there was a survey from the CNCF recently that said 68% of folks are seeing increased Kubernetes costs—of course—and 69% of respondents said that they have no cost monitoring in place or just cost estimates, which is simply not good enough, right? People want to break down that line item to those individual business units and teams. Which is a huge gap that cloud providers aren't filling today. Corey: Where do you see most of the cost issue breaking down? I mean, there's some of the stuff that we are never allowed to talk about when it comes to cost, which is the realistic assessment that the people who work on technology cost more than the technology itself. There's a certain—how do we put this—unflattering perspective that a lot of people are deploying Kubernetes into environments because they want to bolster their own resume, not because it's the actual right answer to anything that they have going on. So, that's a little hit or miss, on some level.
I don't know that I necessarily buy into that, but you take a look at the compute and storage, you look at the data transfer side, which it seems almost everyone mostly tends to ignore, despite the fact that Kubernetes itself has no zone affinity, so it has no idea whether its internal communication is free or expensive, and it just adds up to a giant question mark. Then you look at Kubernetes architecture diagrams, or God forbid the CNCF landscape diagram, and realize, oh, my God, there are more of these things than there are Pokémon, and people give up any hope of understanding it other than just saying, “It's complicated,” and accepting that that's just the way that it is. I'm a little less fatalistic, but I also think it's a heck of a challenge. Harry: Absolutely. I mean, the economics of cloud, right? Why is ingress free, but egress is not free? Why is it so difficult to [laugh] understand that intra-AZ traffic is billed completely separately from public traffic, for example? And I think network costs are one thing that is extremely challenging for customers. One, they don't even have that visibility into what is the network traffic: what is internal traffic, what is public traffic. But then there's also a whole bunch of other challenges that are causing Kubernetes costs to rise, right? You've got folks that struggle with setting the right requests for Kubernetes, which ultimately blows up the scale of a Kubernetes cluster. You've got the complexity of AWS, for example, the economics of instance types, you know? I don't know whether I need to be running ten m5.xlarge versus four Graviton instances. And this ability to, kind of, size a cluster correctly, as well as size a workload correctly, is very, very difficult, and customers are not able to establish that baseline today. And obviously, you can't optimize what you can't see, right, so I think a lot of customers struggle with that visibility. But then the complexity means that it's incredibly difficult to optimize those costs. Corey: You folks are starting to dip your toes in the Kubernetes costing space. What approach are you taking? Harry: Sysdig builds products Kubernetes-first. So, if you look at what we're doing on the monitoring space, we really kind of pioneered what customers want to get out of Kubernetes observability, and then we were doing similar things for security, making sure our security product is, [I want to say,] Kubernetes-native. And what we're doing on the cost side of things is, of course, there are a lot of cost products out there that will give you the ability to slice and dice by AWS service, for example, but they don't give you that Kubernetes context to then break those costs down by teams and business units. So at Sysdig, we've already been collecting usage information—resource usage information: requests, the container CPU, the memory usage—and a lot of customers have been using that data today for right-sizing, but one of the things they said was, “Hey, I need to quantify this. I need to put a big fat dollar sign in front of some of these numbers we're seeing so I can go to these teams and management and actually prompt them to right-size.” So, it's quite simple. We're essentially augmenting that resource usage information with cost data from cloud providers.
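To make that concrete, here is a minimal sketch of the kind of arithmetic Harry is describing: take a workload's requested CPU and memory, compare it against observed usage, and attach a dollar figure to the gap. The workload names, usage numbers, and per-unit prices are all invented for illustration; real inputs would come from your metrics store and your cloud bill, and nothing here reflects Sysdig's actual implementation.

```python
# Hypothetical sketch of pricing out Kubernetes request waste.
# All numbers below are made up; real ones would come from a metrics
# store (requests and p95 usage) and the cloud provider's price list.

COST_PER_CPU_HOUR = 0.024   # USD per vCPU-hour (assumed)
COST_PER_GIB_HOUR = 0.003   # USD per GiB-hour of memory (assumed)
HOURS_PER_MONTH = 730

workloads = [
    # (name, cpu_request, cpu_p95_usage, mem_request_gib, mem_p95_usage_gib)
    ("checkout-api", 2.0, 0.4, 4.0, 1.1),
    ("batch-worker", 1.0, 0.9, 8.0, 2.5),
]

for name, cpu_req, cpu_used, mem_req, mem_used in workloads:
    # Anything requested but not used is reserved capacity nobody else
    # can schedule onto, so it still shows up on the bill.
    wasted_cpu = max(cpu_req - cpu_used, 0.0)
    wasted_mem = max(mem_req - mem_used, 0.0)
    monthly_waste = (
        wasted_cpu * COST_PER_CPU_HOUR + wasted_mem * COST_PER_GIB_HOUR
    ) * HOURS_PER_MONTH
    print(f"{name}: ~${monthly_waste:,.2f}/month requested but unused")
```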
So, instead of customers saying, “Hey, I'm wasting one terabyte of memory,” they can say, “Hey, I'm wasting 500 bucks on memory each month.” So, it's very much Kubernetes-specific, using a lot of Kubernetes context and metadata. Corey: This episode is sponsored in part by our friends at Uptycs, because they believe that many of you are looking to bolster your security posture with CNAPP and XDR solutions. They offer both cloud and endpoint security in a single UI and data model. Listeners can get Uptycs for up to 1,000 assets through the end of 2023 (that is next year) for $1. But this offer is only available for a limited time on UptycsSecretMenu.com. That's U-P-T-Y-C-S Secret Menu dot com. Corey: Part of the whole problem that I see across the space is that the way to solve some of these problems internally, when you start trying to divide costs between different teams, has been well, we're just going to give each one their own cluster, or their own environment. That does definitely solve the problem of shared services. The counterpoint is it solves them by making every team individually incur them. That doesn't necessarily seem like the best approach in every scenario. One thing I have learned, though, is that, for some customers, that is the right approach. Sounds odd, but that's the world we live in, where context absolutely matters a lot. I'm very reluctant these days to say at a glance, “Oh, you're doing it wrong.” You eat a whole lot of crow when you do that, it turns out. Harry: I see this a lot. And I see customers giving their own business units their own AWS account, which I kind of feel is a step backwards, right? I don't think you're properly harnessing the power of Kubernetes and creating this, kind of, shared tenancy model when you're giving a team their own AWS account. I think it's important we break down those silos. You know, there's so much operational overhead with maintaining these different accounts, but there must be a better way to address some of these challenges. Corey: It's one of those areas where “it depends” becomes the appropriate answer to almost anything. I'm a fan of having almost every workload have its own AWS account within the same shared AWS organization, then with shared VPCs, which tend to work out. But that does add some complexity to observing how things interact there. One of the guidances that I've given people is to assume that, in any architecture diagram you ever put up there, there will in the future be an AWS account boundary between any two resources, because someone's going to be doing it somewhere. And that seems to be something that AWS themselves are just slowly starting to awaken to as well. It's getting easier and easier every week to wind up working with multiple accounts in a more complicated structure. Harry: Absolutely. And I think when you start to adopt a multi-cloud strategy, suddenly, you've got so many more increased dimensions. I'm running an application in AWS, Azure, and GCP, and now suddenly, I've got all of these subaccounts. That is an operational overhead that I don't think jibes very well, considering there is such a shortage of folks that are real experts—I want to say experts—in operating these environments.
And that's really, you know, I think, one of the challenges that isn't being spoken about enough today. Corey: It feels like so much of the time, Kubernetes winds up being an expression of the same thing that getting into microservices was, which is, “Well, we have a people problem, we're going to solve it with this approach.” Great, but then you wind up with people adopting it who don't have the context that applied when the stuff was originally built and designed. Like with monorepos. Yeah, it was a problem when you had 5,000 developers all trying to work on the same thing and stomping on each other, so breaking that apart made sense. But the counterpoint of where you wind up with companies with 20 developers and 200 microservices starts to be a little… okay, has this pendulum swung too far? Harry: Yeah, absolutely. And I think that when you've got so many people being thrown at a problem, there are lots of kinds of changes being made, there are new deployments, and I think things can spiral out of control pretty quickly, especially when it comes to costs. “Hey, I'm a developer and I've just made this change. And how do I understand, you know, what is the financial impact of this change?” “Has this blown up my network costs because suddenly, I'm not traversing the right network path?” Or, suddenly, I'm consuming so much more CPU, and actually, there is a physical compute cost to this. There are a lot of cooks in the kitchen and I think that is causing a lot of challenges for organizations. Corey: You've been working in product for a while, and one of my favorite parts of being in a position where you are so close to the core of what it is your company does is that you find it's almost impossible to not continue learning things, just based upon how customers take what you built and the problems that they experience—both the ones that they bring you in to solve, and of course, the new and exciting problems that you wind up causing for them—or, to be more charitable, surfacing problems that they didn't realize already existed. What have you learned lately from your customers that you didn't see coming? Harry: One of the biggest problems that I've been seeing is—I speak to a lot of customers, and I've maybe spoken to 40 or 50 customers over the last, you know, few months, about a variety of topics, whether it's observability in general or, you know, on the financial side, Kubernetes costs—and what I hear about time and time again, regardless of the vertical or the size of the organization, is that the platform teams, the people closest to Kubernetes, know their stuff. They get it. But a lot of their internal customers, so the internal business units and teams, they, of course, don't have the same kind of clarity and understanding, and these are the people that are getting the most frustrated. I've been shipping software for 20 years and now I'm modernizing applications, I'm starting to use Kubernetes, I've got so many new different things to learn about that I'm simply drowning in problems, in cloud-native problems. And I think we forget about that, right? Too often, we kind of spend time throwing fancy technology at the people such as, you know, the DevOps engineers, the platform teams, but a lot of internal customers are struggling to leverage that technology to actually solve their own problems.
They can't make sense of this data and they can't make the right changes based off of that data. Corey: I would say that is a very common affliction of Kubernetes, where so often it winds up handling things that are now abstracted away to the point where we don't need to worry about them. That's true right up until the point where they break, and now you have to go diving into the magic. That's one of the reasons that I was such a fan of Sysdig when it first came out: the idea that it was getting into what I viewed at the time as operating system fundamentals and actually seeing what was going on, abstracted away from the vagaries of the code and a lot more into what system calls it's making. Great, okay, now I'm starting to see a lot of calls that it shouldn't necessarily be making, or it's thrashing in a particular way. And it's almost impossible to get to that level of insight—historically—through traditional observability tools, but being able to take a look at what's going on from a more fundamental point of view was extraordinarily helpful. I'm optimistic if you can get to a point where you're able to do that with Kubernetes, given its enraging ecosystem, for lack of a better term. Whenever you wind up rolling out Kubernetes, you've also got to pick some service delivery stuff, some observability tooling, some log routers, and so on and so forth. It feels like by the time you're running anything in production, you've made so many choices along the way that the odds that anyone else has made the same choices you have are vanishingly small, so you're running your own bespoke unicorn somewhere. Harry: Absolutely. Flip a coin. And that's probably one [laugh] of the solutions that you're going to throw at a problem, right? And you keep flipping that coin and then suddenly, you're going to reach a combination that nobody else has done before. And you're right, the knowledge that you have gained from, I don't know, Corey Quinn Enterprises is probably not going to ring true at Harry Perks Enterprise Limited, right? There is a whole different set of problems and technology and people there—you know, of course, you can bring some of that knowledge along; there are some common denominators—but every organization is ultimately using technology in different ways. Which is problematic, right, to the people that are actually pioneering some of these cloud-native applications. Corey: Given my professional interest, I am curious about what it is you're doing as you start moving a little bit away from the security and observability sides and into cost observability. How are you approaching that? What are the mistakes that you see people making, and how are you meeting them where they are? Harry: The biggest challenge that I am seeing is with sizing workloads and sizing clusters. And I see this time and time again. Our product shines a light on the capacity utilization of compute. And what it really boils down to is two things. Platform teams are not using the correct instance types, or the right combination of instance types, to run the workloads for their application teams, but also application developers are not setting things like requests correctly. Which makes sense. Again, I flip a coin and maybe that's the request I'm going to set. I used to size a VM with one gig of memory, so now I'm going to size my pod with one gig of memory. But it doesn't really work like that.
And of course, that request is essentially my slice of the pizza that's been carved out. And even if I don't eat that entire slice of pizza, it's mine; nobody else can use it. So, what we're trying to do is really help customers with that challenge. So, if I'm a developer, I would be looking at the historical usage of our workloads—maybe it's the maximum usage or, you know, the p99 or the p95—and then setting my workload request to that. You keep doing that over the course of the different teams' applications you have and suddenly, you start to establish this baseline of what is the compute actually needed to run all of these applications. And that helps me answer the question, what should I size my cluster to? And that's really important because until you've established that baseline, you can't start to do things like cluster reshaping to pick a different combination of instance types to power your cluster. Corey: On some level, a lack of diversity in instance types is a bit of a red flag, just because it generally means that someone said, “Oh, yeah, we're going to start with this default instance size and then we'll adjust as time goes on,” and spoilers, just like anything else labeled ‘TODO' in your codebase, it never gets done. So, you find yourself pretty quickly in a scenario where some workloads are struggling to get the resources they need inside of whatever that default instance size is, and on the other hand, you wind up with some things that are more or less running a cron job once a day but sitting there completely idle and running the whole time, regardless. And optimization and right-sizing in a lot of these scenarios is a little bit tricky. I've been something of a, I'll say, pessimist when it comes to the idea of right-sizing EC2 instances, just because so many historical workloads are challenging to get recertified on newer instance families and the rest, whereas when we're running on Kubernetes already, presumably everything's built in such a way that it can stop existing in a stateless way and the service still continues to work. If not, it feels like there are some necessary Kubernetes prerequisites that may not have circulated fully internally yet. Harry: Right. And to make this even more complicated, you've got applications that may be more memory-intensive or CPU-intensive, so understanding the ratio of CPU-to-memory requirements for their applications, depending on how they've been architected, makes this more challenging, right? I mean, pods are jumping around and that makes it incredibly difficult to track these movements and actually pick the instances that are going to be most appropriate for my workloads and for my clusters. Corey: I really want to thank you for being so generous with your time. If people want to learn more, where's the best place for them to find you? Harry: sysdig.com is where you can learn more about what Sysdig is doing as a company and our platform in general. Corey: And we will, of course, put a link to that in the show notes. Thank you so much for your time. I appreciate it. Harry: Thank you, Corey. Hope to speak to you again soon. Corey: Harry Perks, principal product manager at Sysdig. I'm Cloud Economist Corey Quinn and this is Screaming in the Cloud.
If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice along with an angry, insulting comment that we will lose track of because we don't know where it was automatically provisioned.Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.Announcer: This has been a HumblePod production. Stay humble.
On today's Network Break we discuss new AWS previews for secure remote access and for connecting applications and services across VPCs. We also discuss a serious outage at Hive Social, Open RAN 5G coming to fighter jets, a promise from Broadcom not to raise prices if the VMware acquisition goes through, and more IT news.
About Sam: Sam Nicholls is Veeam's Director of Public Cloud Product Marketing, with 10+ years of sales, alliance management, and product marketing experience in IT. Sam has evolved from his on-premises storage days and is now laser-focused on spreading the word about cloud-native backup and recovery, packing in thousands of viewers on his webinars, blogs, and webpages. Links Referenced: Veeam AWS Backup: https://www.veeam.com/aws-backup.html Veeam: https://veeam.com Transcript: Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud. Corey: This episode is sponsored in part by our friends at Chronosphere. Tired of observability costs going up every year without getting additional value? Or being locked in to a vendor due to proprietary data collection, querying, and visualization? Modern-day, containerized environments require a new kind of observability technology that accounts for the massive increase in scale and attendant cost of data. With Chronosphere, choose where and how your data is routed and stored, query it easily, and get better context and control. 100% open-source compatibility means that no matter what your setup is, they can help. Learn how Chronosphere provides complete and real-time insight into ECS, EKS, and your microservices, wherever they may be, at snark.cloud/chronosphere. That's snark.cloud/chronosphere. Corey: This episode is brought to us by our friends at Pinecone. They believe that all anyone really wants is to be understood, and that includes your users. AI models combined with the Pinecone vector database let your applications understand and act on what your users want… without making them spell it out. Make your search application find results by meaning instead of just keywords, your personalization system make picks based on relevance instead of just tags, and your security applications match threats by resemblance instead of just regular expressions. Pinecone provides the cloud infrastructure that makes this easy, fast, and scalable. Thanks to my friends at Pinecone for sponsoring this episode. Visit Pinecone.io to understand more. Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. This promoted guest episode is brought to us by and sponsored by our friends over at Veeam. And as a part of that, they have thrown one of their own to the proverbial lion. My guest today is Sam Nicholls, Director of Public Cloud over at Veeam. Sam, thank you for joining me. Sam: Hey. Thanks for having me, Corey, and thanks to everyone joining and listening in. I do know that I've been thrown into the lion's den, and I am [laugh] hopefully well-prepared to answer anything and everything that Corey throws my way. Fingers crossed. [laugh]. Corey: I don't think there's too much room for criticizing here, to be direct. I mean, Veeam is a company that is solidly and thoroughly built around a problem that absolutely no one cares about. I mean, what could possibly be wrong with that? You do backups, which no one ever cares about. Restores, on the other hand, people care very much about restores. And that's when they learn, “Oh, I really should have cared about backups at any point prior to 20 minutes ago.” Sam: Yeah, it's a great point. It's kind of like taxes and insurance.
It's almost like, you know, something that you have to do that you don't necessarily want to do, but when push comes to shove, and something's burning down, a file has been deleted, someone's made their way into your account and, you know, is making a right mess in there, that's when you really, kind of, care about what you mentioned, which is the recovery piece: the speed of recovery, the reliability of recovery. Corey: It's been over a decade, and I'm still sore about losing my email archives from 2006 to 2009. There's no way to get them back. I ran my own mail server; it was an iPhone setting that said, “Oh, yeah, automatically delete everything in your trash folder—or archive folder—after 30 days.” It was just a weird default setting back in that era. I didn't realize it was doing that. Yeah, painful stuff. And we learned the hard way in some of these cases. Not that I really have much need for email from that era of my life, but every once in a while it still bugs me. Which speaks to the point that the people who are the most fanatical about backing things up are the people who have been burned by not having a backup. And I'm fortunate in that it wasn't someone else's data with which I had been entrusted that really cemented that lesson for me. Sam: Yeah, yeah. It's a good point. I can remember a few years ago, my wife migrated a very aging, polycarbonate white Mac to one of the shiny new aluminum ones and thought everything was good— Corey: As the white polycarbonate Mac becomes yellow, then yeah, all right, you know, it's time to replace it. Yeah. So yeah, so she wiped the drive, and what happened? Sam: That was the moment where she learned the value and importance of backup; she backs everything up now. I fortunately have never gone through it. But I'm employed by a backup vendor and that's why I care about it. But it's incredibly important to have, of course. Corey: Oh, yes. My spouse has many wonderful qualities, but one that drives me slightly nuts is she's something of a digital packrat, where the hard drives on her laptop will periodically fill up. And I used to take the approach of, oh, you can be more efficient about it. And I realized no, telling other people they're doing it wrong is generally poor practice, whereas just buying bigger drives is way easier. Let's go ahead and do that. It's a small price to pay for domestic tranquility. And there's a lesson in that. We can map that almost perfectly to the corporate world where you folks tend to operate. You're not doing home backup, last time I checked; you are doing public cloud backup. Actually, I should ask that. Where do you folks start and where do you stop? Sam: Yeah, no, it's a great question. You know, we started over 15 years ago when virtualization, specifically VMware vSphere, was really the up-and-coming thing, and, you know, a lot of folks were there trying to utilize agents to protect their vSphere instances, just like they were doing with physical Windows and Linux boxes. And, you know, it kind of got the job done, but was it the best way of doing it? No. And that's why Veeam was pioneered; it was this agentless, image-based backup for vSphere. And, of course, you know, in the last 15 years, we've seen lots of transitions—of course, we're here at Screaming in the Cloud with you, Corey, so AWS—as well as a number of other public cloud vendors we can help protect, as well as a number of SaaS applications like Microsoft 365, and metadata and data within Salesforce.
So, Veeam's really kind of come a long way from just virtual machines to taking a global look at the entirety of modern environments, and how can we best protect each and every single one of those without trying to take a square peg and fit it in a round hole? Corey: It's a good question and a common one. We wind up with an awful lot of folks who are confused by the proliferation of data. And I'm one of them, let's be very clear here. It comes down to a problem where backups are a multifaceted, deep problem, and I don't think that people necessarily think of it that way. But I take a look at all of the different, even AWS services that I use for my various nonsense, and which ones can be used to store data? Well, all of them. Some of them, you have to hold it in a particularly wrong sort of way, but they all store data. And in various contexts, a lot of that data becomes very important. So, what service am I using, in which account, and in what region am I using it, and you wind up with data sprawl, where it's a tremendous amount of data that you can generally only track down by looking at your bills at the end of the month. Okay, so what am I being charged, and for what service? That seems like a good place to start, but where is it getting backed up? How do you think about that? So, some people, I think, tend to ignore the problem, which we're seeing less and less, but other folks tend to go to the opposite extreme: we're just going to back up absolutely everything, and we're going to keep that data for the rest of our natural lives. It feels to me that there's probably an answer that is more appropriate somewhere nestled between those two extremes. Sam: Yeah, snapshot sprawl is a real thing, and it gets very, very expensive very, very quickly. You know, your snapshots of EC2 instances are snapshots of those attached EBS volumes. Five cents per gig per month doesn't sound like a lot, but when you're dealing with thousands of snapshots for thousands of machines, it gets out of hand very, very quickly. And you don't know when to delete them. Like you say, folks are just retaining them forever and dealing with this unfortunate bill shock. So, you know, where to start is automating the lifecycle of a snapshot, right: from its creation—how often do we want to be creating them—through retention—how long do we want to keep these for, and where do we want to keep them, because there are other storage services outside of just EBS volumes—and then, of course, the ultimate: deletion. And that's important even from a compliance perspective as well, right? You've got to retain data for a specific number of years; I think healthcare is like seven years, but then you've— Corey: And then not a day more. Sam: Yeah, and then not a day more because that puts you out of compliance, too. So, policy-based automation is your friend, and we see a number of folks building these policies out: gold, silver, bronze tiers based on criticality of data and compliance, and really just kind of letting the machine do the rest. And you can focus on not babysitting backup. Corey: What was it that led to the rise of snapshots? Because back in my very early days, there was no such thing. We wound up using a bunch of servers stuffed in a rack somewhere and virtualization was not really in play, so we had file systems on physical disks. And how do you back that up?
Well, you have an agent of some sort that basically looks at all the files and, according to some ruleset that it has, copies them off somewhere else. It was slow, it was fraught, it had a whole bunch of logic that was pushed out to the very edge, and forget about restoring that data in a timely fashion or even validating that a lot of those backups worked, other than via checksum. And God help you if you had data that was constantly in a state of flux, where anything changing during the backup run would leave your backups in an inconsistent state. That, on some level, seems to have largely been solved by snapshots. But what's your take on it? You're a lot closer to this part of the world than I am. Sam: Yeah, snapshots—I think folks have turned to snapshots for the speed, the lack of impact that they have on production performance, and again, just the ease of accessibility. We have access to all different kinds of snapshots for EC2, RDS, EFS, throughout the entirety of our AWS environment. So, I think snapshots are kind of like the default go-to for folks. They can help deliver those very, very quick RPOs, especially in, for example, databases, like you were saying, that change very, very quickly, where we all of a sudden are stranded with a crash-consistent backup or snapshot versus an application-consistent snapshot. And then they're also very, very quick to recover from. So, snapshots are very, very appealing, but they absolutely do have their limitations. And I think, you know, it's not one or the other; it's that they've got to go hand-in-hand with something else. And typically, that is an image-based backup that is stored in a separate location to the snapshot, because that snapshot is not independent of the disk that it is protecting. Corey: One of the challenges with snapshots is most of them are created in a copy-on-write sense. It takes basically an instant, frozen point in time. Back once upon a time, when we ran MySQL databases on top of the NetApp Filer—which worked surprisingly well—we would have a script that would automatically quiesce the database so that it would be in a consistent state, snapshot the filer, and then un-quiesce it, which took less than a second, start to finish. And that was awesome, but then you had this snapshot type of thing. It wasn't super portable, it needed to reference a previous snapshot in some cases, and AWS takes the same approach, where the first snapshot captures every block, then subsequent snapshots wind up only taking up as much size as has changed since the previous snapshot. So, large quantities of data that generally don't get accessed a whole lot have remarkably small subsequent snapshot sizes. But that's not at all obvious from the outside looking at these things. They're not the most portable thing in the world. But it's definitely the direction that the industry has trended in. So, rather than having a cron job fire off an AWS API call to take snapshots of my volumes as sort of the baseline approach that we all started with, what is the value proposition that you folks bring? And please don't say it's, “Well, cron jobs are hard and we have a friendlier interface for that.” Sam: [laugh].
I think it's really starting to look at the proliferation of those snapshots, understanding what they're good at and what they are good for within your environment—as previously mentioned, low RPOs, low RTOs: how quickly can I take a backup, how frequently can I take a backup, and more importantly, how quickly can I restore—but then looking at their limitations. So, I mentioned that they are not independent of that disk, so that certainly does introduce a single point of failure, as well as being not so secure. We've kind of touched on the cost component of that as well. So, what Veeam can come in and do is then take an image-based backup of those snapshots, right—so you've got your initial snapshot and then your incremental ones—we'll take the backup from that snapshot, and then we'll start to store that elsewhere. And that is likely going to be in a different account. We can look at the Well-Architected Framework, with AWS deeming accounts a security boundary, so having that cross-account function is critically important so you don't have that single point of failure. Locking down with IAM roles is also incredibly important so we haven't just got a big wide-open door between the two. But that data is then stored in a separate account—potentially in a separate region, maybe in the same region—in Amazon S3 storage. And S3 has the wonderful benefit of being still relatively performant, so we can have quick recoveries, but it is much, much cheaper. You're dealing with 2.3 cents per gig per month, instead of— Corey: To start, and it goes down from there with sizeable volumes. Sam: Absolutely, yeah. You can go down to S3 Glacier, where you're looking at, I forget how many points and zeros and nines it is, but it's fractions of a cent per gig per month, but it's going to take you a couple of days to recover that da— Corey: Even Infrequent Access cuts that in half. Sam: Oh yeah. Corey: And let's be clear, these are snapshot backups; you probably should not be accessing them on a consistent, sustained basis. Sam: Well, exactly. And this is where it's almost like having your cake and eating it as well. Compliance or regulatory mandates or corporate mandates are saying you must keep this data for this length of time. Keeping that—you know, let's just say it's three years' worth of snapshots—in EBS is going to be incredibly expensive. What's the likelihood of you needing to recover something from two years—actually, even two months ago? It's very, very small. So, the performance part of S3 is something you don't need to take as much into consideration. Can you recover? Yes. Is it going to take a little bit longer? Absolutely. But it's going to help you meet those retention requirements while keeping your backup bill low, avoiding that bill shock, right—spending tens and tens of thousands every single month on snapshots. This is what I mean by kind of having your cake and eating it. Corey: I somewhat recently had a client where EBS snapshots were one of the driving costs behind their bill—one of their largest single line items. And I want to be very clear here, because one of those people who listen to this might be thinking, “Well, hang on. Wait, they're telling stories about us, even though they're not naming us by name?” Yeah, there were three of you in the last quarter. So, at that point, it becomes clear it is not about something that one individual company has done and more about an overall driving trend.
I am personalizing it a little bit by referring to it as one company when there were three of you. This is a narrative device, not me breaking confidentiality. Disclaimer over. Now, when you talk to people about, “So, tell me why you've got 80 times more snapshots than you do EBS volumes?” the answer is, “Well, we wanted to back things up and we needed to get hourly backups to a point, then daily backups, then monthly, and so on and so forth. And when this was set up, there wasn't a great way to do this natively and we don't always necessarily know what we need versus what we don't. And the cost of us backing this up, well, you can see it on the bill. The cost of us deleting too much and needing it as soon as we do? Well, that cost is almost incalculable. So, this is the safe way to go.” And they're not wrong in anything that they're saying. But the world has definitely evolved since then. Sam: Yeah, yeah. It's a really great point. Again, it just folds back into my whole having-your-cake-and-eating-it conversation. Yes, you need to retain data; it gives you that kind of nice, warm, cozy feeling—it's a nice blanket on a winter's day—that irrespective of what happens, you're going to have something to recover from. But the question is, does that need to be living on an EBS volume as a snapshot? Why can't it be living on much, much more cost-effective storage that's going to give you the warm and fuzzies, but is going to make your finance team much, much happier? [laugh]. Corey: One of the inherent challenges I think people have is that snapshots by themselves are almost worthless, in that I have an EBS snapshot, it is sitting there now, it's costing me an undetermined amount of money because it's not exactly clear on a per-snapshot basis exactly how large it is, and okay, great. Well, I'm looking for a file that was not modified since X date, as it was at this time. Well, great, you're going to have to take that snapshot, restore it to a volume and then go exploring by hand. Oh, it was the wrong one. Great. Try it again, with a different one. And after, like, the fifth or sixth in a row, you start doing a binary-search approach on this thing. But it's expensive, it's time-consuming, it takes forever, and it's not a fun user experience at all. Part of the problem is it seems that historically, backup systems have no context or contextual awareness whatsoever around what is actually contained within that backup. Sam: Yeah, yeah. I mean, you kind of highlighted two of the steps. It's more like a ten-step process to do, you know, granular file- or folder-level recovery from a snapshot, right? You've got to, like you say, determine the point in time when, you know, you knew the last time that it was around, then you're going to have to determine the volume size, the region, the OS, you're going to have to create an EBS volume of the same size and region from that snapshot, create the EC2 instance with the same OS, connect the two together, boot the EC2 instance, mount the volume, search for the files to restore, download them manually, at which point you have your file back. It's not back in the machine where it was; it's now been downloaded locally to whatever machine you're accessing that from. And then you've got to tear it all down. And that is, again, like you say, predicated on the fact that you knew exactly that that was the right time. It might not be, and then you have to start from scratch from a different point in time.
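For a sense of how much ceremony is involved, here is a rough boto3 sketch of just the first leg of the dance Sam describes: turning a snapshot back into a volume and attaching it to a scratch instance. The snapshot ID, instance ID, availability zone, and device name are all placeholders, and this illustrates the manual process being described, not anything a backup vendor ships.

```python
# A rough sketch of the first steps of manual file recovery from an EBS
# snapshot. All IDs, the region, the AZ, and the device name are invented.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

SNAPSHOT_ID = "snap-0123456789abcdef0"   # the point in time you *hope* is right
INSTANCE_ID = "i-0123456789abcdef0"      # a scratch instance with the same OS
AVAILABILITY_ZONE = "us-east-1a"         # must match the instance's AZ

# 1. Create a volume from the snapshot.
volume = ec2.create_volume(
    SnapshotId=SNAPSHOT_ID,
    AvailabilityZone=AVAILABILITY_ZONE,
)
volume_id = volume["VolumeId"]

# 2. Wait until the volume is available, then attach it.
ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])
ec2.attach_volume(
    VolumeId=volume_id,
    InstanceId=INSTANCE_ID,
    Device="/dev/sdf",
)

# From here you still have to log in, mount the filesystem, hunt for the
# file, copy it out, detach, and delete the volume. And if you guessed the
# wrong snapshot, you get to repeat the whole thing from the top.
```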
So, backup tooling from backup vendors that have been doing this for many, many years knew about this problem long, long ago, and really seeks to not only automate the entirety of that process but make the whole e-discovery—the search, the location of those files—much, much easier. I don't necessarily want to do a vendor pitch, but I will say with Veeam, we have explorer-like functionality, whereby it's just a simple web browser. Once that machine is all spun up again—an automatic process—you can just search for your individual file or folder, locate it, download it locally, or inject it back into the instance where it was through Amazon Kinesis or AWS Kinesis—I forget the right terminology for it; some of it's AWS, some of it's Amazon. But by-the-by, the whole recovery process, especially at a file or folder level, is much more pain-free, but also much faster. And that's ultimately what people care about: how reliable is my backup? How quickly can I get stuff online? Because the time that I'm down is costing me an indescribable amount of time or money. Corey: This episode is sponsored in part by our friends at Redis, the company behind the incredibly popular open source database. If you're tired of managing open source Redis on your own, or if you are looking to go beyond just caching and unlocking your data's full potential, these folks have you covered. Redis Enterprise is the go-to managed Redis service that allows you to reimagine how your geo-distributed applications process, deliver, and store data. To learn more from the experts in Redis how to be real-time, right now, from anywhere, visit redis.com/duckbill. That's R-E-D-I-S dot com slash duckbill. Corey: Right, the idea of RPO versus RTO: recovery point objective and recovery time objective. With an RPO, it's great, disaster strikes right now: how long is it acceptable for it to have been since the last time we backed up data to a restorable point? Sometimes it's measured in minutes, sometimes it's measured in fractions of a second. It really depends on what we're talking about. For payments databases, the RPO basically asymptotically approaches zero. The RTO is okay, how long is acceptable before we have that data restored and are back up and running? And that is almost always a longer time, but not always. And there's a different series of trade-offs that go into that. But both of those also presuppose that you've already dealt with the existential question of is it possible for us to recover this data. And that's where I know that you obviously have a position on this that is informed by where you work, but I don't, and I will call this out as what I see in the industry: AWS Backup is compelling to me except for one fatal flaw that it has, and that is it starts and stops with AWS. I am not a proponent of multi-cloud. Lord knows I've gotten flack for that position a bunch of times, but the one area where it makes absolute sense to me is backups. Have your data, in a rehydrate-the-business-level state, backed up somewhere that is not your primary cloud provider, because you're otherwise single-point-of-failure-ing through a company, through the payment instrument you have on file with that company, and in the blast radius of someone who can successfully impersonate you to that vendor. There has to be a gap of some sort for the truly business-critical data. Yes, egress to other providers is expensive, but you know what also is expensive? Irrevocably losing the data that powers your business.
Is it likely? No, but I would much rather do it than have to justify why I'm not doing it.Sam: Yeah. Wasn't likely that I was going to win that 2 billion or 2.1 billion on the Powerball, but [laugh] I still play [laugh]. But I understand your standpoint on multi-cloud and I read your newsletters and understand where you're coming from, but I think the reality is that we do live in at least a hybrid cloud world, if not multi-cloud. The number of organizations that are sole-sourced on a single cloud and nothing else is relatively small, single-digit percentage. It's around 80-some percent that are hybrid, and the remainder of them are your favorite: multi-cloud.But again, having something that is one hundred percent sole-source on a single platform or a single vendor does expose you to a certain degree of risk. So, having the ability to do cross-platform backups, recoveries, migrations, for whatever reason, right, because it might not just be a disaster like you'd mentioned, it might also just be… I don't know, the company has been taken over and all of a sudden, the preference is now towards another cloud provider and I want you to refactor and re-architect everything for this other cloud provider. If all that data is locked into one platform, that's going to make your job very, very difficult. So, we mentioned at the beginning of the call, Veeam is capable of protecting a vast number of heterogeneous workloads on different platforms, in different environments, on-premises, in multiple different clouds, but the other key piece is that we always use the same backup file format. And why that's key is because it enables portability.If I have backups of EC2 instances that are stored in S3, I could copy those onto on-premises disk, I could copy those into Azure, I could do the same with my Azure VMs and store those on S3, or again, on-premises disk, and any other endless combination that goes with that. And it's really kind of centered around, like control and ownership of your data. We are not prescriptive by any means. Like, you do what is best for your organization. We just want to provide you with the toolset that enables you to do that without steering you one direction or the other with fee structures, disparate feature sets, whatever it might be.Corey: One of the big challenges that I keep seeing across the board is just a lack of awareness of what the data that matters is, where you see people backing up endless fleets of web server instances that are auto-scaled into existence and then removed, but you can create those things at will; why do you care about the actual data that's on these things? It winds up almost at the library management problem, on some level. And in that scenario, snapshots are almost certainly the wrong answer. One thing that I saw previously that really changed my way of thinking about this was back many years ago when I was working at a startup that had just started using GitHub and they were paying for a third-party service that wound up backing up Git repos. Today, that makes a lot more sense because you have a bunch of other stuff on GitHub that goes well beyond the stuff contained within Git, but at the time, it was silly. It was, why do that? Every Git clone is a full copy of the entire repository history. Just grab it off some developer's laptop somewhere.It's like, “Really? 
You want to bet the company, slash your job, slash everyone else's job, on that being feasible and doable, or do you want to spend the 39 bucks a month or whatever it was to wind up getting that out the door now so we don't have to think about it, and have them validate that it works?” And that was really a shift in my way of thinking because, yeah, backing up things can get expensive when you have multiple copies of the data living in different places, but what's really expensive is not having a company anymore. Sam: Yeah, yeah, absolutely. We can tie it back to my insurance dynamic earlier where, you know, it's something that you know that you have to have, but you don't necessarily want to pay for it. Well, you know, just like with insurance, there are multiple different ways to go about recovering your data, and it's only in crunch time that you really care about what it is that you've been paying for, right, when it comes to backup? Could you get your backup through a git clone? Absolutely. Could you get your data back—how long is that going to take you? How painful is that going to be? What's going to be the impact to the business while you're trying to figure that out, versus, like you say, the 39 bucks a month, a year, or whatever it might be, to have something purpose-built for that, that is going to make the recovery process as quick and painless as possible and just get things back up online? Corey: I am not a big fan of the fear, uncertainty, and doubt approach, but I do practice what I preach here in that, yeah, there is a real fear around data loss. It's not, “People are coming to get you, so you absolutely have to buy whatever it is I'm selling,” but it is something you absolutely have to think about. My core consulting proposition is that I optimize the AWS bill. And sometimes that means spending more. Okay, that one S3 bucket is extremely important to you and you say you can't sustain the loss of it ever, so One Zone is not an option. Where is it being backed up? Oh, it's not? Yeah, I suggest you spend more money and back that thing up if it's as irreplaceable as you say. It's about doing the right thing. Sam: Yeah, yeah. It's interesting, and it's going to be hard for you to prove the value of doing that when you are driving their bill up while you're trying to bring it down. But again, you have to look at something that's not itemized on that bill, which is going to be the impact of downtime. I'm not going to pretend to try and recall the exact figures, because it also varies depending on your business, your industry, the size, but the impact of downtime is massive financially: tens of thousands of dollars per hour for small organizations, millions and millions of dollars per hour for much larger organizations. The backup component of that is relatively small in comparison, so have something that is purpose-built, that is going to protect your data and help mitigate that impact of downtime. Because that's ultimately what you're trying to protect against. The recovery piece that you're buying is the most important piece. And like you, I would say, at least be cognizant of it and evaluate your options and what you can live with and what you can live without.
And I think this is why we see varying backup options as well, you know? You're not going to try and apply the same data protection policies to each and every single workload within your environment because they've all got different types of workload criticality. And like you say, some of them might not even need to be backed up at all, just because they don't have data that needs to be protected. So, you need something that is going to be flexible enough to apply across the entirety of your environment, protect it with the right policy, in terms of how frequently you protect it, where you store it, and how often, or when, you are eventually going to delete it, and apply that on a workload-by-workload basis. And this is where the joy of things like tags comes into play as well.Corey: One last thing I want to bring up is that I'm a big fan of watching for companies saying the quiet part out loud. And one area in which they do this—because they're forced to by brevity—is in the title tag of their website. I pull up veeam.com and I hover over the tab in my browser, and it says, "Veeam Software: Modern Data Protection."And I want to call that out because you're not framing it as explicitly backup. So, the last topic I want to get into is the idea of security. Because I think it is not fully appreciated on a lived-experience basis—although people will of course agree to this when they're having ivory tower whiteboard discussions—that every place your data lives is a potential place for a security breach to happen. So, you want to have your data living in a bunch of places ideally, for backup and resiliency purposes. But you also want it to be completely unworkable or illegible to anyone who is not authorized to have access to it.How do you balance those trade-offs yourself given that what you're fundamentally saying is, "Trust us with your Holy of Holies when it comes to things that power your entire business?" I mean, I can barely get some companies to agree to show me their AWS bill, let alone the data that contains all of the stuff that could destroy our company.Sam: Yeah. Yeah, it's a great question. Before I explicitly answer that piece, I will just say that modern data protection does absolutely have a security component to it, and I think that backup absolutely needs to be a—I'm going to say this in air quotes—a "first-class citizen" of any security strategy. I think when people think about security, their mind goes to the preventative, like how do we keep these bad people out?This is going to be a bit of the FUD that you love, but ultimately, the bad guys on the outside have an infinite number of attempts to get into your environment and only have to be right once to get in and start wreaking havoc. You, on the other hand, as the good guy with your cape and whatnot, have got to be right each and every single one of those times. And we as humans are fallible, right? None of us are perfect, and it's incredibly difficult to defend against these ever-evolving, more complex attacks. So, if someone does get in, having a clean, verifiable, recoverable backup is really going to be the only thing that is going to save your organization, should that actually happen.And what's key to a secure backup? I would say separation, isolation of backup data from the production data, and I would say utilizing things like immutability. In AWS, we've got Amazon S3 Object Lock, so it's that write once, read many state for whatever retention period you put on it.
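To make that concrete, here is a minimal sketch of the write-once state Sam describes, assuming boto3; the bucket and file names are placeholders, and note that Object Lock has to be enabled when the bucket is created rather than retrofitted onto an existing bucket:

```python
import datetime
import boto3

s3 = boto3.client("s3")

# Object Lock can only be used on a bucket created with it enabled;
# enabling it also turns on versioning automatically. (This call assumes
# the default us-east-1 region; others need a CreateBucketConfiguration.)
s3.create_bucket(
    Bucket="example-backup-bucket",  # placeholder name
    ObjectLockEnabledForBucket=True,
)

# Write a backup object that cannot be overwritten or deleted until
# the retention date passes, even by an administrator.
with open("2022-11-01-full.bak", "rb") as f:  # placeholder backup file
    s3.put_object(
        Bucket="example-backup-bucket",
        Key="backups/2022-11-01-full.bak",
        Body=f,
        ObjectLockMode="COMPLIANCE",
        ObjectLockRetainUntilDate=datetime.datetime.now(datetime.timezone.utc)
        + datetime.timedelta(days=30),
    )
```

In COMPLIANCE mode even the root user cannot shorten the retention window, which is the property that matters here; GOVERNANCE mode is the looser variant that specially privileged principals can override.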
So, the data that they're seeking to encrypt, whether it's in production or in their backup, they cannot encrypt it. And then the other piece that I think is coming more and more into play, and is almost table stakes, is encryption, right? And we can utilize things like AWS KMS for that encryption.But that's there to help defend against the exfiltration attempts. Because these bad guys are realizing, "Hey, people aren't paying me my ransom because they're just recovering from a clean backup, so now I'm going to take that backup data, I'm going to leak the personally identifiable information, trade secrets, or whatever on the internet, and that's going to put them in breach of compliance and give them a hefty fine that way unless they pay me my ransom." So, encryption means they can't read that data. Not only can they not change it, but they can't read it, which is equally important. So, I would say those are the three big things for me on what's needed for backup to make sure it is clean and recoverable.Corey: I think that is one of those areas where people need to put additional levels of thought in. I think that if you have access to the production environment and have full administrative rights throughout it, you should definitionally not—at least with that account and ideally not you at all personally—have access to alter the backups. Full stop. I would say, on some level, there should not be the ability to alter backups for some particular workloads, the idea being that if you get hit with a ransomware infection, it's pretty bad, let's be clear, but if you can get all of your data back, it's more of an annoyance than it is, again, the existential business crisis that becomes something that redefines you as a company if you still are a company.Sam: Yeah. Yeah, I mean, we can turn to a number of organizations. Code Spaces always springs to mind for me, I love Code Spaces. It was kind of one of those precursors to—Corey: It's amazing.Sam: Yeah, but they were running on AWS and they had everything, production and backups, all stored in one account. The attackers got into the account. "We're going to delete your data if you don't pay us this ransom." They were like, "Well, we're not paying you the ransoms. We got backups." Well, they deleted those, too. And, you know, unfortunately, Code Spaces isn't around anymore. But it really kind of goes to show just the importance of at least logically separating your data across different accounts and not having that god-like access to absolutely everything.Corey: Yeah, when you talked about Code Spaces, I was in [unintelligible 00:32:29] talking about GitHub Codespaces specifically, where they have their developer workstations in the cloud. They're still very much around, at least last time I saw, unless you know something I don't.Sam: Precursor to that. I can send you the link—Corey: Oh oh—Sam: You can share it with the listeners.Corey: Oh, yes, please do. I'd love to see that.Sam: Yeah. Yeah, absolutely.Corey: And it's been a long and strange time in this industry. Speaking of links for the show notes, I appreciate you spending so much time with me. Where can people go to learn more?Sam: Yeah, absolutely. I think veeam.com is kind of the first place that people gravitate towards. Me personally, I'm kind of a hands-on learning kind of guy, so we always make free product available.And then you can find that on the AWS Marketplace. Simply search 'Veeam' through there. A number of free products; we don't put time limits on them, we don't put feature limitations.
You can back up ten instances, including your VPCs, which we actually didn't talk about today, but which I do think is important. But I won't waste any more time on that.Corey: Oh, configuration of these things is critically important. If you don't know how everything was structured and built out, you're basically trying to re-architect from first principles based upon archaeology.Sam: Yeah [laugh], that's a real pain. So, we can help protect those VPCs and we actually don't put any limitations on the number of VPCs that you can protect; it's always free. So, if you're going to use it for anything, use it for that. But hands-on, marketplace, if you want more documentation, want to learn more, want to speak to someone, veeam.com is the place to go.Corey: And we will, of course, include that in the show notes. Thank you so much for taking so much time to speak with me today. It's appreciated.Sam: Thank you, Corey, and thanks to all the listeners tuning in today.Corey: Sam Nicholls, Director of Public Cloud at Veeam. I'm Cloud Economist Corey Quinn and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice, along with an angry insulting comment that takes you two hours to type out but then you lose it because you forgot to back it up.Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.Announcer: This has been a HumblePod production. Stay humble.
An airhacks.fm conversation with Mark Sailes (@MarkSailes3) about: CRaC API, C1 and C2 compilers, GraalVM and Random, CRaC and Stateful EJB beans, Lambda SnapStart and snapshotting the Firecracker VM, the CRaC resource interface and listener methods, priming the critical path, Quarkus with MicroProfile AWS on Lambda CDK template, Plain Java AWS Lambda with CDK template, SDK calls in the beforeCheckpoint hook, SnapStart state never leaves the region, SnapStart state is cached in caches within Availability Zones, SnapStart is available within VPCs, only versioned AWS Lambdas can be optimized, Provisioned Concurrency and SnapStart, The Other Feature of AWS Lambda Provisioned Concurrency — Saving Money, A serverless journey: AWS Lambda under the hood, provisioned concurrency and EC2 reserved instances, AWS Lambda function starts at bare metal, Mark Sailes on twitter: @MarkSailes3
About ClintonClinton Herget is Field CTO at Snyk, the leader in Developer Security. He focuses on helping Snyk's strategic customers on their journey to DevSecOps maturity. A seasoned technologist, Clinton spent his 20-year career prior to Snyk as a web software developer, DevOps consultant, cloud solutions architect, and engineering director. Clinton is passionate about empowering software engineers to do their best work in the chaotic cloud-native world, and is a frequent conference speaker, developer advocate, and technical thought leader.Links Referenced: Snyk: https://snyk.io/ duckbillgroup.com: https://duckbillgroup.com TranscriptAnnouncer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.Corey: This episode is brought to us in part by our friends at Pinecone. They believe that all anyone really wants is to be understood, and that includes your users. AI models combined with the Pinecone vector database let your applications understand and act on what your users want… without making them spell it out.Make your search application find results by meaning instead of just keywords, your personalization system make picks based on relevance instead of just tags, and your security applications match threats by resemblance instead of just regular expressions. Pinecone provides the cloud infrastructure that makes this easy, fast, and scalable. Thanks to my friends at Pinecone for sponsoring this episode. Visit Pinecone.io to understand more.Corey: This episode is brought to you in part by our friends at Veeam. Do you care about backups? Of course you don't. Nobody cares about backups. Stop lying to yourselves! You care about restores, usually right after you didn't care enough about backups. If you're tired of the vulnerabilities, costs, and slow recoveries when using snapshots to restore your data, assuming you even have them at all living in AWS-land, there is an alternative for you. Check out Veeam, that's V-E-E-A-M, for secure, zero-fuss AWS backup that won't leave you high and dry when it's time to restore. Stop taking chances with your data. Talk to Veeam. My thanks to them for sponsoring this ridiculous podcast.Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. One of the fun things about establishing traditions is that the first time you do it, you don't really know that that's what's happening. Almost exactly a year ago, I sat down for a previous promoted guest episode much like this one, with Clinton Herget at Snyk—or Synic; however you want to pronounce that. He is apparently a scarecrow of some sorts because when last we spoke, he was a principal solutions engineer, but like any good scarecrow, he was outstanding in his field, and now, as a result, is a Field CTO. Clinton, thanks for coming back, and let me start by congratulating you on the promotion. Or consoling you depending upon how good or bad it is.Clinton: You know, Corey, a little bit of column A, a little bit of column B. But very glad to be here again, and frankly, I think it's because you insist on mispronouncing Snyk as Synic, and so you get me again.Corey: Yeah, you could add a couple of new letters to it and just call the company [Synack 00:01:27]. Now, it's a hard pivot to a networking company.
So, there's always options.Clinton: I acknowledge what you did there, Corey.Corey: I like that quite a bit. I wasn't sure you'd get it.Clinton: I'm a nerd going way, way back, so we'll have to go pretty deep in the stack for you to stump me on some of this stuff.Corey: As we did with the, "I wasn't sure you'd get it." See, that one sailed right past you. And I win. Chalk another one up for me in the networking pun wars. Great, we'll loop back to that later.Clinton: I don't even know where I am right now.Corey: [laugh]. So, let's go back to a question that one would think that I'd already established a year ago, but I have the attention span of basically a goldfish, let's not kid ourselves. So, as I'm visiting the Snyk website, I find that it says different words than it did a year ago, which is generally a sign that is positive; when nothing's been updated including the copyright date, things are going really well or really badly. One wonders. But no, now you're talking about Snyk Cloud, you're talking about several other offerings as well, and my understanding of what it is you folks do no longer appears to be completely accurate. So, let me be direct. What the hell do you folks do over there?Clinton: It's a really great question. Glad you asked me on a year later to answer it. I would say at a very high level, what we do hasn't changed. However, I think the industry has certainly come a long way in the past couple years and our job is to adapt to that. Snyk—again, pronounced like a pair of sneakers sneaking around—is a developer security platform. So, we focus on enabling the people who build applications—which as of today, means modern applications built in the cloud—to have better visibility, and ultimately a better chance of mitigating the risk that goes into those applications when it matters most, which is actually in their workflow.Now, you're exactly right. Things have certainly expanded in that remit because the job of a software engineer is very different, I think, this year than it even was last year, and that's continually evolving over time. As a developer now, I'm doing a lot more than I was doing a few years ago. And one of the things I'm doing is building infrastructure in the cloud, I'm writing YAML files, I'm writing CloudFormation templates to deploy things out to AWS. And what happens in the cloud has a lot to do with the risk to my organization associated with those applications that I'm building.So, I'd love to talk a little bit more about why we decided to make that move, but I don't think that represents a watering down of what we're trying to do at Snyk. I think it recognizes that the developer security vision fundamentally can't exist without some understanding of what's happening in the cloud.
In fact, I don't believe that you're offering a hosted managed service at the moment, are you?Clinton: No, the cloud part fundamentally refers to a new product, an offering that looks at the security risks being introduced into cloud infrastructure by the engineers who are now writing infrastructure as code. We previously had an infrastructure-as-code security product, and that served alongside our static analysis tool, which is Snyk Code, our open-source tool, and our container scanner, recognizing that the kinds of vulnerabilities you can potentially introduce in writing cloud infrastructure are not only bad for the organization on their own—I mean, nobody wants to create an S3 bucket that's wide open to the world—but also, those misconfigurations can increase the blast radius of other kinds of vulnerabilities in the stack. So, I think what it does is it recognizes that, as you and I think your listeners well know, Corey, there's no such thing as the cloud, right? The cloud is just a bunch of fancy software designed to abstract away from the fact that you're running stuff on somebody else's computer, right?Corey: Unfortunately, in this case, the fact that you're calling it Snyk Cloud does not mean that you're doing what so many other companies in that same space do; if it did, it would have led to a really short interview because I have no faith that it's the right path forward, especially for you folks, where it's, "Oh, you want to be secure? You've got to host your stuff on our stuff instead. That's why we called it cloud." That's the direction that I've seen a lot of folks try and pivot in, and I always find it disastrous. It's, "Yeah, well, at Snyk if we run your code or your shitty applications here in our environment, it's going to be safer than if you run it yourself on something untested like AWS." And yeah, those stories hold absolutely no water. And may I just say, I'm gratified that's not what you're doing?
Not only do I have a Log4Shell vulnerability, right, in one of my open-source dependencies, but that artifact, that application is actually getting deployed to a VPC that has ingress from the internet," right? So, not only do I have remote code execution in my application, but it's being put in an enclave that actually allows it to be exploited. You can only know that if you're actually looking at what's really happening in the cloud, right?So, not only does Snyk Cloud allow us to provide an additional layer of security by looking at what's misconfigured in that cloud environment and help your developers make remediations by saying, "Here's the actual IaC file that caused that infrastructure to come into existence," but we can also say, here's how that affects the risk of other kinds of vulnerabilities at different layers in the stack, right? Because it's all software; it's all connected. Very rarely does a vulnerability translate one-to-one into risk, right? They're compound because modern software is compound. And I think what developers lack is tooling that fits into their workflow, that understands what it means to be a software engineer, and that actually helps them make better choices rather than punishing them after the fact for guessing and making bad ones.Corey: That sounds awesome at a very high level. It is very aligned with how executives and decision-makers think about a lot of these things. Let's get down to brass tacks for a second. Assume that I am the type of developer that I am in real life, by which I mean shitty. What am I going to wind up attempting to do that Snyk will flag and, in other words, protect me from myself and warn me that I'm about to commit a dumb?Clinton: First of all, I would say, look, there's no such thing as a non-shitty developer, right? And I built software for 20 years and I decided that's really hard. What's a lot easier is talking about building software for a living. So, that's what I do now. But fundamentally, the reason I'm at Snyk is I want to help people who are in the kinds of jobs that I had for a very long time, which is to say, you have a tremendous amount of anxiety because you recognize that the success of the organization rests on your shoulders, and you're making hundreds, if not thousands, of decisions every day without the right context to understand fully how the results of those decisions are going to affect the organization that you work for.So, I think every developer in the world has to deal with this constant cognitive dissonance of saying, "I don't know that this is right, but I have to do it anyway because I need to clear that ticket because that release needs to get into production." And it becomes really easy to short-sightedly do things like pull an open-source dependency without checking whether it has any CVEs associated with it because that's the version that's easiest to implement with your code that already exists. So, that's one piece. Snyk Open Source is designed to traverse that entire tree of dependencies in open-source all the way down, all the hundreds and thousands of packages that you're pulling in, to say not only, here's a vulnerability that you should really know is going to end up in your application when it's built, but also here's what you can do about it, right?
Here's the upgrade you can make, here's the minimum viable change that actually gets you out of this problem, and to do so when it's in the right context, which is, you know, as you're making that decision for the first time, right, inside your developer environment.That also applies to things like container vulnerabilities, right? I have even less visibility into what's happening inside a container than I do inside my application. Because I know, say, I'm using an Ubuntu or a Red Hat base image, but I have no idea what all the Linux packages on it are, let alone what vulnerabilities are associated with them, right? So, being able to detect that I've got a version of OpenSSL 3.0 with a potentially serious vulnerability associated with it before I've actually deployed that container out into the cloud very much helps me as a developer.Because I'm limiting the rework or the refactoring I would have to do by otherwise assuming I'm making a safe choice or guessing at it, and then only finding out after I've written a bunch more code that relies on that decision that I have to go back and change it, and then rewrite all of the things that I wrote on top of it, right? So, it's identifying the layer in the stack where that risk could be introduced, and then also seeing how it's affected by all of those other layers because modern software is inherently complex. And that complexity is what drives both the risk associated with it, and also things like efficiency, which I know your audience is, for good reason, very concerned about.Corey: I'm going to challenge you on an aspect of this because on the tin, the way you describe it, it sounds like, "Oh, I already have something that does that. It's the GitHub Dependabot story where it winds up sending me a litany of complaints every week." And we are talking, if I did nothing other than read this email that day, that would be a tremendously efficient processing of that entire thing because so much of it is stuff that is ancient and archived, and specific aspects of the vulnerabilities are just not relevant. And you talk about the OpenSSL 3.0 issues that just recently came out.I have no doubt that somewhere in the most recent email I've gotten from that thing, it's buried two-thirds of the way down, like all the complaints like the dishwasher isn't loaded, you forgot to take the trash out, that baby needs a change, the kitchen is on fire, and the vacuuming, and the r—wait, wait. What was that thing about the kitchen? Seems like one of those things is not like the others. And it just gets lost in the noise. Now, I will admit to putting my thumb a little bit on the scale here because I've used Snyk before myself and I know that you don't do that. How do you avoid that trap?Clinton: Great question. And I think really, the key to the story here is, developers need to be able to prioritize, and in order to prioritize effectively, you need to understand the context of what happens to that application after it gets deployed. And so, this is a key part of why getting the data out of the cloud and bringing it back into the code is so important. So, for example, take an OpenSSL vulnerability. Do you have it on a container image you're using, right? So, that's question number one.Question two is, is there actually a way that code can be accessed from the outside? Is it included or is it called? Is the method activated by some other package that you have running on that container? Is that container image actually used in a production deployment?
Or does it just go sit in a registry and no one ever touches it?What are the conditions required to make that vulnerability exploitable? You look at something like Spring Shell, for example: yes, you need a certain version of spring-beans in a JAR file somewhere, but you also need to be running a certain version of Tomcat, and you need to be packaging those JARs inside a WAR in a certain way.Corey: Exactly. I have a whole bunch of Lambda functions that provide the pipeline system that I use to build my newsletter every week, and I get screaming concerns about issues in, for example, a version of the markdown parser that I've subverted. Yeah, sure. I get that, on some level, if I were just giving it random untrusted input from the internet and random ad hoc users, but I'm not. It's just me when I write things for that particular Lambda function.And I'm not going to be actively attempting to subvert the thing that I built myself and no one else should have access to. And looking through the details of some of these things, it doesn't even apply to the way that I'm calling the libraries, so it's just noise, for lack of a better term. It is not something that basically ever needs to be adjusted or fixed.Clinton: Exactly. And I think cutting through that noise is so key to creating developer trust in any kind of tool that scans an asset and provides you with what, in theory, is a list of actionable steps, right? I need to be able to understand what is the thing, first of all. There's a lot of tools that do that, right, and we tend to mock them by saying things like, "Oh, it's just another PDF generator. It's just another thousand pages that you're never going to read."So, getting the information in the right place is a big part of it, but so is filtering out all of the noise by saying, we looked at not just one layer of the stack, but multiple layers, right? We know that you're using this open-source dependency and we also know that the method that contains the vulnerability is actively called by your application in your first-party code because we ran our static analysis tool against that. Furthermore, because we looked at your cloud context and connected to your AWS API—we're big partners with AWS and very proud of that relationship—we can tell that there's inbound internet access available to that service, right? So, you start to build a compound case that maybe this is something that should be prioritized, right? Because there's a way into the asset from the outside world, there's a way into the vulnerable functions through the labyrinthine, you know, spaghetti of my code to get there, and the conditions required to exploit it actually exist in the wild.But you can't just run a single tool; you can't just run Dependabot to get that prioritization. You actually have to look at the entire holistic application context, which includes not just your dependencies, but what's happening in the container, what's happening in your first-party, your proprietary code, what's happening in your IaC, and I think most importantly for modern applications, what's actually happening in the cloud once it gets deployed, right? And that's sort of the holy grail of completing that loop to bring the right context back from the cloud into code to understand what change needs to be made, and where, and most importantly why. Because it's a priority that actually translates into organizational risk to get a developer to pay attention, right?
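To make that compound case tangible, here is a toy sketch of signal-based prioritization in Python. This is emphatically not Snyk's actual scoring model; the CVE numbers are real ones that come up in this conversation, but the discount factors are arbitrary assumptions chosen purely for illustration:

```python
from dataclasses import dataclass

@dataclass
class Finding:
    cve: str
    severity: float         # base severity, e.g., a CVSS score from 0-10
    method_reachable: bool  # static analysis: is the vulnerable method actually called?
    internet_exposed: bool  # cloud context: is there inbound access to the service?
    in_production: bool     # is the artifact actually deployed anywhere?

def priority(f: Finding) -> float:
    """Toy compound score: each absent condition discounts the base severity."""
    score = f.severity
    score *= 1.0 if f.method_reachable else 0.2  # discount factors are made up,
    score *= 1.0 if f.internet_exposed else 0.3  # chosen only for this sketch
    score *= 1.0 if f.in_production else 0.1
    return score

findings = [
    Finding("CVE-2021-44228", 10.0, True, True, True),   # Log4Shell, fully exposed
    Finding("CVE-2022-3602", 7.5, False, False, False),  # OpenSSL 3.0, image never deployed
]
for f in sorted(findings, key=priority, reverse=True):
    print(f"{f.cve}: priority {priority(f):.2f}")
```

The point of the sketch is that the same high-severity CVE can land at the top or the bottom of the list depending on reachability and exposure, which is exactly why a single-layer scanner produces noise.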
I mean, that is the key to, I think, any security concern: how do you get engineering mindshare and trust that this is actually what you should be paying attention to and not a bunch of rework that doesn't actually make your software more secure?Corey: One of the challenges that I see across the board is that—well, let's back up a bit here. I have in previous episodes talked in some depth about my position that when it comes to the security of various cloud providers, Google is number one, and AWS is number two. Azure is a distant third because it's busy figuring out which Crayons taste the best; I don't know. But the reason is not because of any inherent attribute of their security models, but rather that Google massively simplifies an awful lot of what happens. It automatically assumes that resources in the same project should be able to talk to one another, so I don't have to painstakingly configure that.In AWS-land, all of this must be done explicitly; no one has time for that, so we over-scope permissions massively and never go back and rein them in. It's a configuration vulnerability more than an underlying inherent weakness of the platform. Because complexity is the enemy of security in many respects. If you can't fit it all in your head to reason about it, how can you understand the security ramifications of it? AWS offers a tremendous number of security services. Many of them, when taken in some totality of their pricing, cost more than any breach they could be expected to prevent. Adding more stuff that adds more complexity in the form of Snyk sounds like it's the exact opposite of what I would want to do. Change my mind.Clinton: I would love to. I would say, fundamentally, I think you and I—and by 'I,' I mean Snyk and, you know, Corey Quinn Enterprises Limited—I think we fundamentally have the same enemy here, right, which is the cyclomatic complexity of software, right, which is how many different pathways the bits have to travel down to reach the same endpoint, right, the same goal. The more pathways there are, the more risk is introduced into your software, and the more inefficiency is introduced, right? And then I know you'd love to talk about how many different ways there are to run a container on AWS, right? It's either 30 or 400 or eleventy-million.I think you're exactly right that that complexity is great for, first of all, selling cloud resources, but also, I think, for innovating, right, for building new kinds of technology on top of that platform. The cost that comes along with that is a lack of visibility. And I think we are just now, as we approach the end of 2022 here, coming to recognize that fundamentally, the complexity of modern software is beyond the ability of a single engineer to understand. And that is really important from a security perspective, from a cost control perspective, especially because software now creates its own infrastructure, right? You can't just secure the artifact and secure the perimeter that it gets deployed into and say, "I've done my job.
Nobody can breach the perimeter and there's no vulnerabilities in the thing because we scanned it and that thing is immutable forever because it's pets, not cattle.”Where I think the complexity story comes in is to recognize like, “Hey, I'm deploying this based on a quickstart or CloudFormation template that is making certain assumptions that make my job easier,” right, in a very similar way that choosing an open-source dependency makes my job easier as a developer because I don't have to write all of that code myself. But what it does mean is I lack the visibility into, well hold on. How many different pathways are there for getting things done inside this dependency? How many other dependencies are brought on board? In the same way that when I create an EKS cluster, for example, from a CloudFormation template, what is it creating in the background? How many VPCs are involved? What are the subnets, right? How are they connected to each other? Where are the potential ingress points?So, I think fundamentally, getting visibility into that complexity is step number one, but understanding those pathways and how they could potentially translate into risk is critically important. But that prioritization has to involve looking at the software holistically and not just individual layers, right? I think we lose when we say, “We ran a static analysis tool and an open-source dependency scanner and a container scanner and a cloud config checker, and they all came up green, therefore the software doesn't have any risks,” right? That ignores the fundamental complexity in that all of these layers are connected together. And from an adversaries perspective, if my job is to go in and exploit software that's hosted in the cloud, I absolutely do not see the application model that way.I see it as it is inherently complex and that's a good thing for me because it means I can rely on the fact that those engineers had tremendous anxiety, we're making a lot of guesses, and crossing their fingers and hoping something would work and not be exploitable by me, right? So, the only way I think we get around that is to recognize that our engineers are critical stakeholders in that security process and you fundamentally lack that visibility if you don't do your scanning until after the fact. If you take that traditional audit-based approach that assumes a very waterfall, legacy approach to building software, and recognize that, hey, we're all on this infinite loop race track now. We're deploying every three-and-a-half seconds, everything's automated, it's all built at scale, but the ability to do that inherently implies all of this additional complexity that ultimately will, you know, end up haunting me, right? If I don't do anything about it, to make my engineer stakeholders in, you know, what actually gets deployed and what risks it brings on board.Corey: This episode is sponsored in part by our friends at Uptycs. Attackers don't think in silos, so why would you have siloed solutions protecting cloud, containers, and laptops distinctly? Meet Uptycs - the first unified solution that prioritizes risk across your modern attack surface—all from a single platform, UI, and data model. Stop by booth 3352 at AWS re:Invent in Las Vegas to see for yourself and visit uptycs.com. That's U-P-T-Y-C-S.com. My thanks to them for sponsoring my ridiculous nonsense.Corey: When I wind up hearing you talk about this—I'm going to divert us a little bit because you're dancing around something that it took me a long time to learn. 
When I first started fixing AWS bills for a living, I thought that it would be mostly math, by which I mean arithmetic. That's the great secret of cloud economics. It's addition, subtraction, and occasionally multiplication and division. No, turns out it's much more psychology than it is math. You're talking in many aspects about, I guess, what I'd call the psychology of a modern cloud engineer and how they think about these things. It's not a technology problem. It's a people problem, isn't it?Clinton: Oh, absolutely. I think it's the people that create the technology. And I think the longer you persist in what we would call the legacy viewpoint, right, not recognizing what the cloud is—which is fundamentally just software all the way down, right? It is abstraction layers that allow you to ignore the fact that you're running stuff on somebody else's computer—once you recognize that, you realize, oh, if it's all software, then the problems that it introduces are software problems that need software solutions, which means that it must involve activity by the people who write software, right? So, now that you're in that developer world, it unlocks, I think, a lot of potential to say, well, why don't developers tend to trust the security tools they've been provided with, right?I think a lot of it comes down to the question you asked earlier in terms of the noise, the lack of understanding of how those pieces are connected together, or the lack of context, or not even, frankly, caring about looking beyond the single-point solution of the problem that solution was designed to solve. But more importantly than that, not recognizing what it's like to build modern software, right, all of the decisions that have to be made on a daily basis with very limited information, right? I might not even understand where that container image I'm building is going in the universe, let alone what's being built on top of it and how much critical customer data is being touched by the database that that container now has the credentials to access, right? So, I think in order to change anything, we have to back way up and say, problems in the cloud are software problems and we have to treat them that way.Because if we don't, if we continue to represent the cloud as some evolution of the old environment where you just have this perimeter that's pre-existing infrastructure that you're deploying things onto, and there's a guy with a neckbeard in the basement who is unplugging cables from a switch and plugging them back in and that's how networking problems are solved, I think you miss the idea that all of these abstraction layers introduce the very complexity that needs to be solved back in the build space. But that requires visibility into what actually happens when it gets deployed. The way I tend to think of it is, there's this firewall in place. Everybody wants to say, you know, we're doing DevOps or we're doing DevSecOps, right? And that's a lie a hundred percent of the time, right? No one is actually, I think, adhering completely to those principles.Corey: That's why one of the core tenets of ClickOps is lying about doing anything in the console.Clinton: Absolutely, right? And that's why shadow IT becomes more and more prevalent the deeper you get into modern development, not less and less prevalent, because it's fundamentally hard to recognize the entirety of the potential implications, right, of a decision that you're making.
So, it's a lot easier to just go in the console and say, "Okay, I'm going to deploy one EC2 to do this. I'm going to get it right at some point." And that's why every application that's ever been produced by human hands has a comment in it that says something like, "I don't know why this works but it does. Please don't change it."And then three years later, because that developer has moved on to another job, someone else comes along and looks at that comment and says, "That should really work. I'm going to change it." And they do and everything fails, and they have to go back and fix it the original way and then add another comment saying, "Hey, this person above me, they were right. Please don't change this line."I think every engineer listening right now knows exactly where that weak spot is in the applications that they've written and they're terrified of that.And I think any tool that's designed to help developers fundamentally has to get into the mindset, get into the psychology of what that is like: of not fundamentally being able to understand what those applications are doing all of the time, but having to write code against them anyway, right? And that's what leads to, I think, the fear that you're going to get woken up because your pager is going to go off at 3 a.m. because the building is literally on fire and it's because of code that you wrote. We have to solve that problem and it has to be those people whose psychology we get into to understand, how are you working and how can we make your life better, right? And I really do think it comes with that: the noise reduction, the understanding of complexity, and really just being humble and saying, like, "We get that this job is really hard and that the only way it gets better is to begin admitting that to each other."Corey: I really wish that there were a better way to articulate a lot of these things. This is the reason that I started doing a security newsletter; it's because cost and security are deeply aligned in a few ways. One of them is that you care about them a lot right after you failed to care about them sufficiently, but the other is that you've got to build guardrails in such a way that doing the right thing is easier than doing it the wrong way, or you're never going to gain any traction.Clinton: I think that's absolutely right. And you used the key term there, which is guardrails. And I think that's where, in their heart of hearts, that's where every security professional wants to be, right? They want to be defining policy, they want to be understanding the risk posture of the organization and nudging it in a better direction, right? They want to be talking up to the board, to the executive team, and creating confidence in that risk posture, rather than talking down or off to the side—depending on how that org chart looks—to the engineers and saying, "Fix this, fix that, and then fix this other thing." A, B, and C, right?I think the problem is that everyone in a security role at an organization of any size at this point is doing 90% of the latter and only about 10% of the former, right? They're acting as gatekeepers, not as guardrails. They're not defining policy, they're spending all of their time creating Jira tickets and all of their time tracking down who owns the piece of code that got deployed to this pod on EKS that's throwing all these errors on my console, and how can I get the person to make a decision to actually take an action that stops these notifications from happening, right?
So, all they're doing is throwing footballs down the field without knowing if there's a receiver there, right, and I think that takes away from the job that our security analysts really should be doing, which is creating those guardrails, which is having confidence that the policy they set is readily understood by the developers making decisions, and that that's happening in an automated way without them having to create friction by bothering people all the time. I don't think security people want to be [laugh] hated by the development teams that they work with, but they are. And the reason they are is, I think, fundamentally, we lack the tooling, we lack—Corey: They are the barrier method.Clinton: Exactly. And we lack the processes to get the right intelligence in a way that's consumable by the engineers when they're doing their job, and not after the fact, which is typically when the security people have done their jobs.Corey: It's sad but true. I wish that there were a better way to address these things, and yet here we are.Clinton: If only there were a better way to address these things.Corey: [laugh].Clinton: Look, I wouldn't be here at Snyk if I didn't think there were a better way, and I wouldn't be coming on shows like yours to talk to the engineering communities, right, people who have walked the walk, right, who have built those Terraform files that contain these misconfigurations, not because they're bad people or because they're lazy, or because they don't do their jobs well, but because they lacked the visibility, they didn't have the understanding that that default is actually insecure. Because how would I know that otherwise, right? I'm building software; I don't see myself as an expert on infrastructure, right, or on Linux packages or on cyclomatic complexity or on any of these other things. I'm just trying to stay in my lane and do my job. It's not my fault that the software has become too complex for me to understand, right?But my management doesn't understand that and so I constantly have white knuckles worrying that, you know, the next breach is going to be my fault. So, I think the way forward really has to be, how do we make our developers stakeholders in the risk being introduced by the software they write to the organization? And that means everything we've been talking about: it means prioritization; it means understanding how the different layers of the stack affect each other, especially the cloud pieces; it means an extensible platform that lets me write code against it to inject my own reasoning, right? The piece that we haven't talked about here is that risk calculation doesn't just involve technical aspects; there's also business intelligence that's involved, right? What are my critical applications, right, what actually causes me to lose significant amounts of money if those services go offline?We at Snyk can't tell that. We can't run a scanner to say these are your crown jewel services that can't ever go down, but you can know that as an organization. So, where we're going with the platform is opening up the extensible process, creating APIs for you to be able to affect that risk triage, right, so that as the creators of guardrails, as the security team, you are saying, "Here's how we want our developers to prioritize.
Here are all of the factors that go into that decision-making." And then you can be confident that in their environment, back over in developer-land, when I'm looking at IntelliJ or, you know, on my local command line, I am seeing the guardrails that my security team has set for me and I am confident that I'm fixing the right thing, and frankly, I'm grateful because I'm fixing it at the right time and I'm doing it in such a way and with a toolset that actually is helping me fix it rather than just telling me I've done something wrong, right, because everything we do at Snyk focuses on identifying the solution, not necessarily identifying the problem.It's great to know that I've got an unencrypted S3 bucket, but it's a whole lot better if you give me the line of code and tell me exactly where I have to copy and paste it so I can go on to the next thing, rather than spending an hour trying to figure out, you know, where I put that line and what I actually have to change it to, right? I often say that the most valuable currency for a developer, for a software engineer, it's not money, it's not time, it's not compute power or anything like that, it's the right context, right? I actually have to understand what the implications of the decision that I'm making are, and I need that to be in my own environment, not after the fact, because what creates friction within an organization is when I could have known earlier and I could have known better, but instead, I had to guess, I had to write a bunch of code that relies on the thing that was wrong, and now I have to redo it all for no good reason other than that the tooling just hadn't adapted to the way modern software is built.Corey: So, one last question before we wind up calling it a day here. We are now heavily into what I will term pre:Invent, where we're starting to see a whole bunch of announcements come out of the AWS universe in preparation for what I'm calling Crappy Cloud Hanukkah this year because I'm spending eight nights in Las Vegas. What are you doing these days with AWS specifically? I know I keep seeing your name in conjunction with their announcements, so there's something going on over there.Clinton: Absolutely. No, we're extremely excited about the partnership between Snyk and AWS. Our vulnerability intelligence is utilized as one of the data sources for AWS Inspector, particularly around open-source packages. We're doing a lot of work around things like the code suite, building Snyk into CodePipeline, for example, to give developers using that code suite earlier visibility into those vulnerabilities. And really, I think the story kind of expands from there, right?So, we're moving forward with Amazon, recognizing that it is, you know, sort of the de facto. When we say cloud, very often we mean AWS. So, we're going to have a tremendous presence at re:Invent this year, and I'm going to be there as well. I think we're actually going to have a bunch of handouts with your face on them, is my understanding. So, please stop by the booth; we would love to talk to folks, especially because we've now released the Snyk Cloud product and really completed that story. So, anything we can do to talk about how that additional context of the cloud helps engineers, because it's all software all the way down, those are absolutely conversations we want to be having.Corey: Excellent. And we will, of course, put links to all of these things in the [show notes 00:35:00] so people can simply click, and there they are.
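As an aside for readers following along at home, the unencrypted-bucket fix Clinton keeps returning to really can amount to a single API call. Here is a hedged sketch using boto3, rather than whatever remediation Snyk itself would emit; the bucket name and key alias are placeholders:

```python
import boto3

s3 = boto3.client("s3")

# Turn on default encryption for every newly written object in the bucket,
# using SSE-KMS with a customer-managed key (the alias is a placeholder).
s3.put_bucket_encryption(
    Bucket="example-app-bucket",
    ServerSideEncryptionConfiguration={
        "Rules": [
            {
                "ApplyServerSideEncryptionByDefault": {
                    "SSEAlgorithm": "aws:kms",
                    "KMSMasterKeyID": "alias/example-app-key",
                }
            }
        ]
    },
)
```

One nuance worth surfacing in context: default encryption applies only to newly written objects, so anything already sitting in the bucket stays unencrypted until it is rewritten.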
Thank you so much for taking all this time to speak with me. I appreciate it.Clinton: All right. Thank you so much, Corey. Hope to do it again next year.Corey: Clinton Herget, Field CTO at Snyk. I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice, along with an angry comment telling me that I'm being completely unfair to Azure, along with your favorite tasting color of Crayon.Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.Announcer: This has been a HumblePod production. Stay humble.
An airhacks.fm conversation with Victor Orozco (@tuxtor) about: focus on Jakarta EE and DevOps, faster release cycles, Apache Cactus - the test container, daily releases and DevOps challenges, the perfect Sun servers, the deprecated Java EE deployment J2EE API, JSR 88: Java EE Application Deployment, onboarding of new developers is harder today, lean Java EE code is reusable in the serverless world, Heroku and OpenShift started the serverless movement, blog post: How To Push Java EE 6 Applications To The Cloud In 5 Minutes, portability of Java workloads in the clouds, Kubernetes vs. Docker Compose, the costs of the clouds, or Kubernetes vs. serverless, Kubernetes on Linode, Kubernetes is a monolith in the cloud, running private VPCs, the Payara Cloud and the rethinking of clustering, back to efficient monoliths, the plain Quarkus CDK Lambda template, a Quarkus AWS Lambda looks like an old GlassFish application, buying CPU with RAM, Java's dependencies are easy to manage, Java's serverless comeback, Victor Orozco on twitter: @tuxtor, Victor's company: nabenik
About CaseyCasey spends his days leveraging AWS to help organizations improve the speed at which they deliver software. With a background in software development, he has spent the past 20 years architecting, building, and supporting software systems for organizations ranging from startups to Fortune 500 enterprises.Links Referenced: "17 Ways to Run Containers in AWS": https://www.lastweekinaws.com/blog/the-17-ways-to-run-containers-on-aws/ "17 More Ways to Run Containers on AWS": https://www.lastweekinaws.com/blog/17-more-ways-to-run-containers-on-aws/ kubernetestheeasyway.com: https://kubernetestheeasyway.com snark.cloud/quinntainers: https://snark.cloud/quinntainers ECS Chargeback: https://github.com/gaggle-net/ecs-chargeback twitter.com/nektos: https://twitter.com/nektos TranscriptAnnouncer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.Corey: This episode is sponsored by our friends at Revelo. Revelo is the Spanish word of the day, and it's spelled R-E-V-E-L-O. It means "I reveal." Now, have you tried to hire an engineer lately? I assure you it is significantly harder than it sounds. One of the things that Revelo has recognized is something I've been talking about for a while, specifically that while talent is evenly distributed, opportunity is absolutely not. They're exposing a new talent pool to, basically, those of us without a presence in Latin America via their platform. It's the largest tech talent marketplace in Latin America with over a million engineers in their network, which includes—but isn't limited to—talent in Mexico, Costa Rica, Brazil, and Argentina. Now, not only do they wind up screening all of their talent on English ability as well as, you know, their engineering skills, but they go significantly beyond that. Some of the folks on their platform are hands down the most talented engineers that I've ever spoken to. Let's also not forget that Latin America has high time zone overlap with what we have here in the United States, so you can hire full-time remote engineers who share most of the workday with your team. It's an end-to-end talent service, so you can find and hire engineers in Central and South America without having to worry about, frankly, the colossal pain of cross-border payroll and benefits and compliance because Revelo handles all of it. If you're hiring engineers, check out revelo.io/screaming to get 20% off your first three months. That's R-E-V-E-L-O dot I-O slash screaming.Corey: Couchbase Capella Database-as-a-Service is flexible, full-featured, and fully managed with built-in access via key-value, SQL, and full-text search. Flexible JSON documents align to your applications and workloads. Build faster with blazing-fast in-memory performance and automated replication and scaling while reducing cost. Capella has the best price performance of any fully managed document database. Visit couchbase.com/screaminginthecloud to try Capella today for free and be up and running in three minutes with no credit card required. Couchbase Capella: make your data sing.Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. My guest today is someone that I had the pleasure of meeting at re:Invent last year, but we'll get to that story in a minute.
Casey Lee is the CTO of a company called Gaggle, which is—as they frame it—saving lives. Now, that seems to be a relatively common position that an awful lot of different tech companies take. "We're saving lives here." It's, "You show banner ads and some of them are attack platforms for JavaScript malware. Let's be serious here." Casey, thank you for joining me, and what makes the statement that Gaggle saves lives not patently ridiculous?Casey: Sure. Thanks, Corey. Thanks for having me on the show. So Gaggle, we're an ed-tech company. We sell software to school districts, and school districts use our software to help protect their students while the students use their school-issued Google or Microsoft accounts.So, we're looking for signs of bullying, harassment, self-harm, and potentially suicide from K-12 students while they're using these platforms. They will take the thoughts, concerns, and emotions they're struggling with and write them in their school-issued accounts. We detect that and then we notify the school districts, and they get the students the help they need before they can do any permanent damage to themselves. We protect about 6 million students throughout the US. We ingest a lot of content.Last school year, we ingested over 6 billion files and about an equal number of emails. We're looking for concerning content, and then we have humans review the stuff that our machine learning algorithms detect and flag. About 40 million items had to go in front of humans last year, which resulted in about 20,000 of what we call PSSes. These are Possible Student Situations where students are talking about harming themselves or harming others. And that resulted in what we like to track as lives saved: 1400 incidents last school year where a student was dealing with suicidal ideation and planning to take their own life. We detect that and get them help within minutes before they can act on it. That's what Gaggle has been doing. We're using tech, solving tech problems, and also saving lives as we do it.Corey: It's easy to lob a criticism at some of the things you're alluding to, the idea of, oh, you're using machine learning on student data for young kids, yadda, yadda, yadda. Look at the outcome, look at the privacy controls you have in place, and look at the outcomes you're driving to. Now, I don't necessarily trust a number of school administrations not to become heavy-handed and overbearing with it, but let's be clear, that's not the intent, and that is not what the success stories you have alluded to are about. I've got to say I'm a fan, so thanks for doing what you're doing. I don't say that very often to people who work in tech companies.Casey: Cool. Thanks, Corey.Corey: But let's rewind a bit because you and I had passed like ships in the night on Twitter for a while, but last year at re:Invent something odd happened. First, my business partner procrastinated at getting his ticket—that's not the odd part; he does that a lot—but then suddenly ticket sales slammed shut and none were to be had anywhere. You reached out with a, "Hey, I have a spare ticket because someone can't go. Let me get it to you." And I said, "Terrific. Let me pay you for the ticket and take you to dinner."You said, "Yes on the dinner, but I'd rather you just look at my AWS bill and don't worry about the cost of the ticket." "All right," said I. I know a deal when I see one. We grabbed dinner at the Venetian. I said, "Bust out your laptop." And you said, "Oh, I was kidding." And I said, "Great. I wasn't.
Bust it out.” And you went from laughing to taking notes in about the usual time that happens when I start looking at these things. But how was your recollection of that? I always tend to romanticize some of these things. Like, “And then everyone's restaurant just turned, stopped, and clapped the entire time.” Maybe that part didn't happen.
Casey: Everything was right up until the clapping part. That was a really cool experience. I appreciate you walking through that with me. Yeah, we've got lots of opportunity to save on our AWS bill here at Gaggle, and in that little bit of time that we had together, I think I walked away with no more than a dozen ideas for where to shave some costs. The most obvious one, the first thing that you keyed in on, is we had RIs coming due that weren't really well-optimized and you steered me towards savings plans. We put that in place and we're able to apply those savings plans not just to our EC2 instances but also to our serverless spend as well. So, that was a very worthwhile and cost-effective dinner for us. The thing that was most surprising though, Corey, was your approach. Your approach to how to review our bill was not what I thought at all.
Corey: Well, what did you expect my approach was going to be? Because this always is of interest to me. Like, do you expect me to, like, whip a portable machine learning rig out of my backpack full of GPUs or something?
Casey: I didn't know if you had, like, some secret tool you were going to hit, or if nothing else, I thought you were going to go for the Cost Explorer. I spend a lot of time in Cost Explorer, that's my go-to tool, and you wanted nothing to do with Cost Exp—I think I was actually pulling up Cost Explorer for you and you said, “I'm not interested. Take me to the bills.” So, we went right to the billing dashboard, you started opening up the invoices, and I thought to myself, “I don't remember the last time I looked at an AWS invoice.” I just, it's noise; it's not something that I pay attention to. And I learned something, that you get a real quick view of both the cost and the usage. And that's what you were keyed in on, right? And you were looking at things relative to each other. “Okay, I have no idea about Gaggle or what they do, but normally, for a company that's spending x amount of dollars in EC2, why is your data transfer cost the way it is? Is that high or low?” So, you're looking for kind of relative numbers, but it was really cool watching you slice and dice that bill through the dashboard there.
Corey: There are a few things I tie together there. Part of it is that this is sort of a surprising thing that people don't think about but start with big numbers first, rather than going alphabetically because I don't really care about your $6 Alexa for Business spend. I care a bit more about the $6 million, or whatever it happens to be at EC2—I'm pulling numbers completely out of the ether, let's be clear; I don't recall what the exact magnitude of your bill is and it's not relevant to the conversation. And then you see that and it's like, “Huh. Okay, you're spending $6 million on EC2. Why are you spending 400 bucks on S3? Seems to me that those two should be a little closer aligned. What's the deal here? Oh, God, you're using eight petabytes of EBS volumes. Oh, dear.” And just, it tends to lead to interesting stuff. Break it down by region, service, and use case—or usage type, rather—is what shows up on those exploded bills, and that's where I tend to start.
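That first pass is easy to reproduce programmatically if the invoices aren't handy. Here is a minimal sketch against the Cost Explorer API; ironically, it's the tool Corey waves off, but it exposes the same rank-services-by-spend view. The date range and top-ten cutoff are arbitrary choices, not anything from the episode:

```python
# Minimal sketch of the "big numbers first" pass using boto3's Cost
# Explorer API: rank services by spend, then eyeball them relative to
# each other. The date range and top-ten cutoff are arbitrary examples.
import boto3

ce = boto3.client("ce")  # assumes AWS credentials are already configured

resp = ce.get_cost_and_usage(
    TimePeriod={"Start": "2022-03-01", "End": "2022-04-01"},
    Granularity="MONTHLY",
    Metrics=["UnblendedCost"],
    GroupBy=[{"Type": "DIMENSION", "Key": "SERVICE"}],
)

groups = resp["ResultsByTime"][0]["Groups"]
costs = {g["Keys"][0]: float(g["Metrics"]["UnblendedCost"]["Amount"])
         for g in groups}

# Biggest line items first: a multi-million-dollar EC2 line next to a
# few hundred bucks of S3 is exactly the relative mismatch worth probing.
for service, amount in sorted(costs.items(), key=lambda kv: -kv[1])[:10]:
    print(f"{service:45s} ${amount:>12,.2f}")
```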
It also is one of the easiest things to wind up having someone throw into a PDF and email my way if I'm not doing it in a restaurant with, you know, people standing around clapping.
Casey: [laugh]. Right.
Corey: I also want to highlight that you've been using AWS for a long time. You're a Container Hero; you are not bad at understanding the nuances and depths of AWS, so I value praise from you around this stuff very highly. This stuff is not intuitive, it is deeply nuanced, and you have a business outcome you are working towards that invariably is not oriented day in, day out around, “How do I get these services for less money than I'm currently paying?” But that is how I see the world, and I tend to live in a very different space just based on the nature of what I do. It's sort of a case study in the advantage of specialization. But I know remarkably little about containers, which is how we wound up reconnecting about a week or so before we did this recording.
Casey: Yeah. I saw your tweet; you were trying to run some workload—container workload—and I could hear the frustration on the other end of Twitter when you were shaking your fist at—
Corey: I should not tweet angrily, and I did in this case. And, eh, every time I do I regret it. But it played well with the people, so that does help. I believe my exact comment was, “‘me: I've got this container. Run it, please.' ‘Google Cloud: Run. You got it, boss.' AWS has 17 ways to run containers and they all suck.” And that's painting with an overly broad brush, let's be clear, but that was at the tail end of two or three days of work trying to solve a very specific, very common business problem that I was just beating my head against a wall over, again and again and again. And it took less than half an hour from start to finish with Google Cloud Run and I didn't have to think about it anymore. And it's one of those moments where you look at this and realize that the future is here, we just don't see it in certain ways. And you took exception to this. So please, let's dive in because 280 characters of text after half a bottle of wine is not the best context to have a nuanced discussion that leaves friendships intact the following morning.
Casey: Nice. Well, I just want to make sure I understand the use case first because I was trying to read between the lines on what you needed, but let me take a guess. My guess is you got your source code in GitHub, you have a Dockerfile, and you want to be able to take that repo from GitHub and just have it continuously deployed somewhere in Run. And you don't want to have headaches with it; you just want to push more changes up to GitHub, Docker Build runs and updates some service somewhere. Am I right so far?
Corey: Ish, but think a little further up the stack. It was in service of this show. So, this show, as people who are listening to this are probably aware by this point, periodically has sponsors, which we love: We thank them for participating in the ongoing support of this show, which empowers conversations like this. Sometimes a sponsor will come to us with, “Oh, and here's the URL we want to give people.” And it's, “First, your company name is a misspelling of a common English word; there are three sublevels within the domain, and then you have a complex UTM tagging tracking co—yeah, you realize people are driving to work when they're listening to this?” So, a while back I built a link shortener, snark.cloud, because is it the shortest thing in the world?
Not really, but it's easily understandable when I say that, and people hear it for what it is. And that's been running for a long time as an S3 bucket full of redirects, behind CloudFront. So, I wind up adding a zero-byte object with a redirect parameter on it, and it just works. Now, the challenge that I have here as a business is that I am increasingly prolific these days. So, anything that I am not directly required to be doing, I probably shouldn't necessarily be the one to do it. And care and feeding of those redirect links is a prime example of this. So, I went hunting, and the things that I was looking for were, obviously: do the redirect. Now, if you pull up GitHub, there are hundreds of solutions here. There are AWS blog posts. One that I really liked and almost got working was Eric Johnson's three-part blog post on how to do it serverlessly, with API Gateway and DynamoDB, no Lambdas required. I really liked aspects of what that was, but it was complex, I kept smacking into weird challenges as I went, and front end is just baffling to me. Because I needed a front end app for people to be able to use here; I need to be able to secure that because it turns out that if you just have a URL that anyone who stumbles across it can use to redirect things to other places, well, you've just empowered a whole bunch of spam email, and you're going to find that service abused, and everyone starts blocking it, and then you have trouble. Nothing survives the first encounter with jerks. And I was getting more and more frustrated, and then, with a few creative search terms, I found something on GitHub by a Twitter engineer who used to work at Google Cloud. And as a client, it doesn't build any kind of custom web app. Instead, as a database, it uses not S3 objects, not Route 53—the ideal database—but a Google sheet, which sounds ridiculous, but every business user here knows how to use that.
Casey: Sure.
Corey: And it looks for the two columns. The first one is the slug after the snark.cloud, and the second is the long URL. And it has a TTL of five seconds on cache, so make a change to that spreadsheet and, five seconds later, it's live. Everyone gets it, I don't have to build anything new, I just put it somewhere the relevant people can access it, gave them a tutorial and a giant warning on it, and everyone gets that. And it just works well. It was, “Click here to deploy. Follow the steps.” And the documentation was a little, eh, okay, I had to undo it once and redo it again. Getting the domain registered—er, ported over—took a bit of time, and there were some weird SSL errors as the certificates were set up, but once all of that was done, it just worked. And I tested the heck out of it, and cold starts are relatively low, and the entire thing fits within the free tier. And it is reminiscent of the magic that I first saw when I started working with some of the cloud providers' services, years ago. It's been a long time since I had that level of delight with something, especially after three days of frustration. It's one of the, “This is a great service. Why are people not shouting about this from the rooftops?” That was my perspective. And I put it out on Twitter and oh, Lord, did I get comments. What was your take on it?
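The moving parts Corey describes are small enough to sketch. This is not the actual project he deployed; it's a minimal stand-in, assuming the sheet is published as CSV at a placeholder URL:

```python
# Minimal sketch of a sheet-backed link shortener: two columns (slug,
# long URL), a short-lived cache, and a redirect handler. SHEET_CSV_URL
# is a placeholder for a Google Sheet published as CSV.
import csv
import io
import time
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

SHEET_CSV_URL = "https://docs.google.com/spreadsheets/d/EXAMPLE/export?format=csv"
CACHE_TTL_SECONDS = 5  # the five-second TTL described above

_cache = {"fetched_at": 0.0, "links": {}}

def links():
    # Re-fetch the sheet at most once per TTL window.
    if time.time() - _cache["fetched_at"] > CACHE_TTL_SECONDS:
        with urllib.request.urlopen(SHEET_CSV_URL) as resp:
            rows = csv.reader(io.TextIOWrapper(resp, encoding="utf-8"))
            _cache["links"] = {row[0].strip().lower(): row[1].strip()
                               for row in rows if len(row) >= 2 and row[0]}
        _cache["fetched_at"] = time.time()
    return _cache["links"]

class Redirector(BaseHTTPRequestHandler):
    def do_GET(self):
        # Lowercasing the slug is the case-insensitivity tweak Corey
        # mentions in a moment; 302 is his temporary-redirect tweak.
        target = links().get(self.path.lstrip("/").lower())
        if target:
            self.send_response(302)
            self.send_header("Location", target)
        else:
            self.send_response(404)
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("", 8080), Redirector).serve_forever()
```

Casey: Well, so my take was, when you're evaluating a platform to use for running your applications, how fast it can get you to Hello World is not necessarily the best way to go. I just assumed you're wrong.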
I assumed that, of the 17 ways AWS has to run containers, Corey just doesn't understand them. And so I went after it. And I said, “Okay, let me see if I can find a way that solves his use case, as I understand it, through a quick tweet.” And so I turned to App Runner; I saw that App Runner does not meet your needs because you have to somehow get your Docker image pushed up to a repo. App Runner can take an image that's already been pushed up and deploy it for you, or it can build from source, but neither of those fit the way I understood your use case.
Corey: Having used App Runner before via the Copilot CLI, it is, as best I can tell, the closest to achieving what I want. But also let's be clear that I don't believe there's a free tier; there needs to be a load balancer in front of it, so you're starting with 15 bucks a month for this thing. Which is not the end of the world. Had I known at the beginning that all of this was going to be there, I would have just signed up for a bit.ly account and called it good. But here we are.
Casey: Yeah. I tried Copilot. Copilot is a great developer experience, but it also is just pulling together tons of—I mean, just trying to do a Copilot service deploy, VPCs are being created and tons of IAM roles are being created, code pipelines, there's just so much going on. I was like 20 minutes into it, and I said, “Yeah, this is not fitting the bill for what Corey was looking for.” Plus, it doesn't solve your use case the way I understood it, which is that you don't want to worry about builds; you just want to push code and have new Docker images get built for you.
Corey: Well, honestly, let's be clear here, once it's up and running, I don't want to ever have to touch the silly thing again.
Casey: Right.
Corey: And that, so far, has been the case, after I forked the repo and made a couple of changes to it that I wanted to see. One of them was to render the entire thing case insensitive because I get that one wrong a lot, and the other is I wanted to change the permanent 301 redirect to a temporary 302 redirect because occasionally, sponsors will want to change where it goes in the fullness of time. And that is just fine, but I want to be able to support that and not have to deal with old cached data. So, getting that up and running was a bit of a challenge. But the way that it worked was by following the instructions in the GitHub repo. The developer environment that spun up in Google's Cloud Shell was just spectacular. It prompted me for a few things and it told me step by step what to do. This is the sort of thing I could have given a basically non-technical user, and they would have had success with it.
Casey: So, I tried it as well. I said, “Well, okay, if I'm going to respond to Corey here and challenge him on this, I need to try Cloud Run.” I had no experience with Cloud Run. I had a small example repo that loosely mapped what I understood you were trying to do. Within five minutes, I had Cloud Run working. And I was surprised that anytime I pushed a new change, within 45 seconds the change was built and deployed. So, here's my conclusion, Corey. Google Cloud Run is great for your use case, and AWS doesn't have the perfect answer. But here's my challenge to you.
I think that you just proved why there are 17 different ways to run containers on AWS: because there are that many different types of users with different needs, and you just happen to be number 18, who hasn't gotten the right attention yet from AWS.
Corey: Well, let's be clear, like, my gag about 17 ways to run containers on AWS was largely a joke, and it went around the internet three times. So, I wrote a list of them on the blog post of “17 Ways to Run Containers in AWS” and people liked it. And then a few months later, I wrote “17 More Ways to Run Containers on AWS” listing 17 additional services that all run containers. And my favorite email that I think I've ever received in feedback was from a salty AWS employee, saying that one of them didn't really count because of some esoteric reason. And it turns out that when I'm trying to make the point that you have a sarcastic number of ways to run containers, pointing out that, well, one of them isn't quite valid doesn't really shatter the argument, let's be very clear here. So, I appreciate the feedback, I always do. And it's partially snark, but there is an element of truth to it in that customers don't want to run containers, by and large. That is what they do in service of a business goal. And they want their application to run, which in turn serves the business goal that continues to abstract out into, “Remain a going concern via the current position the company stakes out.” In your case, it is saving lives; in my case, it is fixing horrifying AWS bills and making fun of Amazon at the same time, and in most other places, there are somewhat more prosaic answers to that. But containers are simply an implementation detail, to some extent—to my way of thinking—of getting to that point. An important one [unintelligible 00:18:20], let's be clear; I was very anti-container for a long time. I wrote a talk, “Heresy in the Church of Docker” that then was accepted at ContainerCon. It's like, “Oh, boy, I'm not going to leave here alive.” And the honest answer is, many years later, that Kubernetes solves almost all the criticisms that I had, with the downside of, well, first you have to learn Kubernetes, and that continues to be mind-bogglingly complex from where I sit. There's a reason that I've registered kubernetestheeasyway.com and repointed it to ECS, Amazon's container service that does not require you to cosplay as a cloud provider yourself. But even ECS has a number of challenges to it, I want to be very clear here. There are no silver bullets in this. And you're completely correct in that I have a large, complex environment, and the application is nuanced, and I'm willing to invest a few weeks in setting up the baseline underlying infrastructure on AWS with some of these services, ideally not all of them at once because that's something a lunatic would do, but getting them up and running. The other side of it, though, is that if I am trying to evaluate a cloud provider's handling of containers and how this stuff works, the reason that everyone starts with a Hello World-style example is that it delivers, ideally, the mean time to dopamine. There's a reason that Hello World doesn't have 18 different dependencies across a bunch of different databases and message queues and all the other complicated parts of running a modern application. Because you just want to see how it works out of the gate.
And if getting that baseline empty container that just returns the string ‘Hello World' is that complicated and requires that much work, my takeaway is not that this user experience is going to get better once I make the application itself more complicated. So, I find that off-putting. My approach has always been to find something that I can get the easy, minimum viable thing up and running on, and then, as I expand, know that you'll be there to catch me as my needs intensify and become ever more complex. But if I can't get the baseline thing up and running, I'm unlikely to be super enthused about continuing to beat my head against the wall like, “Well, I'll just make it more complex. That'll solve the problem.” Because it often does not. That's my position.
Casey: Yeah, I agree that the dopamine hit is valuable in getting you attached to and wanting to invest in whatever tech stack you're using. The challenge is the second part of that: will it grow with me and scale with me and support the complex edge cases that I have? And the problem I've seen is a lot of organizations will start with something that's very easy to get started with and then quickly outgrow it, and then come up with all sorts of weird Rube Goldberg-type solutions. Because they jumped all in before seeing—I've got kind of an example of that. I'm happy to announce that there's now 18 ways to run containers on AWS. Because in your use case, in the spirit of AWS customer obsession, I hear your use case, I've created an open-source project that I want to share called Quinntainers—
Corey: Oh, no.
Casey: —and it solves—yes. Quinntainers is live and is ready for the world. So, now we've got 18 ways to run containers. And if you have Corey's use case of, “Hey, here's my container. Run it for me,” now we've got one command that you can run to get things going for you. I can share a link for you and you could check it out. This is a [unintelligible 00:21:38]—
Corey: Oh, we're putting that in the [show notes 00:21:37], for sure. In fact, if you go to snark.cloud/quinntainers, you'll find it.
Casey: You'll find it. There you go. The idea here was this: there is a real use case that you had, and I looked, and AWS does not have an out-of-the-box simple solution for you. I agree with that. And Google Cloud Run does. Well, the answer from AWS would have been, “Well, then here, we need to make that solution.” And so that's what this was: a way to demonstrate that it is a solvable problem. AWS has all the right primitives, just that use case hadn't been covered. So, how does Quinntainers work? Real straightforward: it's a command-line—it's an NPM tool. You just run npx quinntainer; it sets up a GitHub Actions role in your AWS account, it then creates a GitHub Actions workflow in your repo, and then uses the Quinntainer GitHub action—a reusable action—that creates the image for you every time you push to the branch, pushes it up to ECR, and then automatically pushes up that new version of the image to App Runner for you. So, now it's using App Runner under the covers, but it's providing that nice developer experience that you are getting out of Cloud Run. Look, is Quinntainer really the right way to go with running containers? No, I'm not making that point at all. But the point is it is a—
Corey: It might very well be.
Casey: Well, if you want to show a good Hello World experience, Quinntainer's the best because within 30 seconds, your app is now set up to continuously deliver containers into AWS for your very specific use case.
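For a rough sense of what that final hop looks like under the covers, here is a sketch of creating an App Runner service from an image that is already in ECR, roughly the step such a tool automates on each push. The account ID, region, repo, port, and role ARN are all placeholders, not anything from the episode:

```python
# Rough sketch of the final hop such a tool automates: point App Runner
# at an image already pushed to ECR and let it auto-deploy on new pushes.
# The account ID, region, repo, port, and role ARN are placeholders.
import boto3

apprunner = boto3.client("apprunner")

apprunner.create_service(
    ServiceName="my-quinntainer-app",
    SourceConfiguration={
        "ImageRepository": {
            "ImageIdentifier": "123456789012.dkr.ecr.us-east-1.amazonaws.com/my-app:latest",
            "ImageRepositoryType": "ECR",
            "ImageConfiguration": {"Port": "8080"},
        },
        # Redeploy automatically whenever a new image lands in ECR.
        "AutoDeploymentsEnabled": True,
        # Role App Runner assumes to pull from the private ECR repo.
        "AuthenticationConfiguration": {
            "AccessRoleArn": "arn:aws:iam::123456789012:role/AppRunnerECRAccessRole"
        },
    },
)
```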
The problem is, it's not going to grow for you. I mean, it was something I did over the weekend just for fun; it's not something that would ever be worthy of hitching up a real production workload to. So, the point there is, you can build frameworks and tools that are very good at getting that initial dopamine hit, but then are not necessarily going to be there for you as you mature and get more complex.
Corey: And yet, I've tilted a couple of times at the windmill of integrating GitHub Actions in anything remotely resembling a programmatic way with AWS services, as far as instance roles go. Are you using permanent credentials for this as stored secrets or are you doing the [OIDC 00:23:50] handoff?
Casey: OIDC. So, what happens is the tool creates the IAM role for you with the trust policy on GitHub's OIDC provider, sets all that up for you in your account, locks it down so that just your repo and your main branch are able to assume the role, and the role is set up just to allow deployments to App Runner and the ECR repository. And then that's it. At that point, it's out of your way. And you just git push, and a couple minutes later, your updates are now running in App Runner for you.
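A condensed sketch of the kind of trust policy Casey is describing; the org, repo, account ID, and role name here are placeholders:

```python
# Condensed sketch of the trust policy Casey describes: GitHub's OIDC
# provider is the only allowed principal, and the subject claim pins the
# role to one repo's main branch. Org, repo, IDs, and names are placeholders.
import json
import boto3

trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {
            "Federated": "arn:aws:iam::123456789012:oidc-provider/"
                         "token.actions.githubusercontent.com"
        },
        "Action": "sts:AssumeRoleWithWebIdentity",
        "Condition": {
            "StringEquals": {
                # Only workflows on main in this one repo may assume the role.
                "token.actions.githubusercontent.com:sub":
                    "repo:example-org/example-repo:ref:refs/heads/main",
                "token.actions.githubusercontent.com:aud": "sts.amazonaws.com",
            }
        },
    }],
}

iam = boto3.client("iam")
iam.create_role(
    RoleName="github-deploy-role",
    AssumeRolePolicyDocument=json.dumps(trust_policy),
)
# Permissions attached to this role would then be scoped to just ECR
# pushes and App Runner deployments, per the episode.
```

Corey: This episode is sponsored in part by our friends at Vultr. Optimized cloud compute plans have landed at Vultr to deliver lightning-fast processing power, courtesy of third-gen AMD EPYC processors, without the IO or hardware limitations of a traditional multi-tenant cloud server. Starting at just 28 bucks a month, users can deploy general-purpose, CPU-, memory-, or storage-optimized cloud instances in more than 20 locations across five continents. Without looking, I know that once again, Antarctica has gotten the short end of the stick. Launch your Vultr optimized compute instance in 60 seconds or less on your choice of included operating systems, or bring your own. It's time to ditch convoluted and unpredictable giant tech company billing practices and say goodbye to noisy neighbors and egregious egress forever. Vultr delivers the power of the cloud with none of the bloat. Screaming in the Cloud listeners can try Vultr for free today with $150 in credit when they visit getvultr.com/screaming. That's G-E-T-V-U-L-T-R dot com slash screaming. My thanks to them for sponsoring this ridiculous podcast.
Corey: Don't undersell what you've just built. This is something that—is this what I would use for a large-scale production deployment? Obviously not, but it has streamlined and made incredibly accessible things that previously have been very complex for folks to get up and running. One of the most disturbing themes behind some of the feedback I got was, at one point someone said, “Well, have you tried running a Docker container on Lambda?” Because now it supports containers as a packaging format. And I said no because I spent a few weeks getting Lambda up and running back when it first came out and I've basically been copying and pasting what I got working ever since, the way most of us do. And the response was, “Oh, that explains a lot.” With the implication being that I'm just a fool. Maybe, but let's be clear, I am never the only person in the room who doesn't know how to do something; I'm just loud about what I don't know. And the failure mode of a bad user experience is that a customer feels dumb. And that's not okay because this stuff is complicated, and when a user has a bad time, it's a bug. I learned that in 2012 from Jordan Sissel, the creator of LogStash.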
He has been an inspiration to me for the last ten years. And that's something I try to live by: if a user has a bad time, something needs to get fixed. Maybe it's the tool itself, maybe it's the documentation, maybe it's the way that GitHub repo's readme is structured, so that it just makes it accessible. Because I am not a trailblazer in most things, nor do I intend to be. I'm not the world's best engineer by a landslide. Just look at my code and you'd dispute the fact that I'm an engineer at all. But “if it's bad and it works, how bad is it?” is sort of the other side of it. So, my problem is that there needs to be a couple of things. Ignore for a second the aspect of making it the right answer to get something out of the door. I want to take this container and just run it, and you and I both reach for App Runner as the default AWS service that does this because I've been swimming in the AWS waters a while and you're a frickin' AWS Container Hero, where it is expected that you know what most of these things do. For someone who shows up on the containers webpage—which, by the way, lists I believe 15 ways to run containers on mobile and 19 ways to run containers on non-mobile, which is just fascinating in its own right—it's overwhelming, it's confusing, and it's not something that makes it abundantly clear what the golden path is. First, get it up and working, get it running, then you can add nuance and flavor and the rest, and I think that's something that's gotten overlooked in our mad rush to pretend that we're all Google engineers, circa 2012.
Casey: Mmm. I think people get stressed out when they try to run containers in AWS because they think, “What is that golden path?” You said golden path. And my advice to people is there is no golden path. And the great thing about AWS is they do continue to invest in the solutions they come up with. I'm still bitter about Google Reader.
Corey: As am I.
Casey: Yeah. I put so much time into getting my perfect set of RSS feeds and then I had to find somewhere else to—with AWS, the different offerings that are available for running containers, those are there intentionally; it's not by accident. They're there to solve specific problems, so the trick is finding what works best for you, and don't feel like one is better than another or is going to get more attention than the others. And they each have different use cases. And I approach it this way. I've seen a couple of different people do some great flowcharts—I think Forrest did one, Vlad did one—on ways to make the decision on how to run your containers. And I break it down to three questions. I ask people, first of all, where are you going to run these workloads? If someone says, “It has to be in the data center,” okay, cool, then ECS Anywhere or EKS Anywhere, and we'll figure out if Kubernetes is needed. If they have specific requirements, so if they say, “No, we can run in the cloud, but we need privileged mode for containers,” or, “We need EBS volumes,” or, “We want really small container sizes,” like, less than a quarter vCPU or less than half a gig of RAM—or if you have custom log requirements—Fargate is not going to work for you, so you're going to run on EC2. Otherwise, run it on Fargate. But that's the first question. Figure out where you are going to run your containers. That leads to the second question: what's your control plane? But those are different, sort of related but different questions. And I only see six options there.
That's App Runner for your control plane, Lightsail for your control plane, ROSA if you're invested in OpenShift already, EKS if you either have momentum in Kubernetes or you have a bunch of engineers that have a bunch of experience with Kubernetes—if you don't have either, don't choose it—or ECS. The last option is Elastic Beanstalk, but let's leave that as: if you're not currently invested in Elastic Beanstalk, don't start today. But I look at those as, okay, so I—first question, where am I going to run my containers? Second question, what do I want to use for my control plane? And there are different pros and cons to each of those. And then the third question, how do I want to manage them? What tools do I want to use for managing deployment? All those other tools like Copilot or App2Container or Proton, those aren't my control plane; those aren't where I run my containers; that's how I manage, deploy, and orchestrate all the different containers. So, I look at it as those three questions. But I don't know, what do you think of that, Corey?
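Casey's three questions collapse into a small decision tree. Here is a toy sketch of it in code; it encodes one engineer's heuristic as described above, heavily simplified, not an official AWS flowchart:

```python
# Toy encoding of Casey's three-question flowchart. One engineer's
# heuristic, heavily simplified; not an official AWS decision tree.
from dataclasses import dataclass

@dataclass
class Workload:
    on_prem: bool = False           # "it has to be in the data center"
    needs_privileged: bool = False  # privileged mode for containers
    needs_ebs: bool = False         # EBS volumes
    tiny_tasks: bool = False        # < 1/4 vCPU or < 512 MiB RAM
    custom_logs: bool = False       # custom log requirements
    openshift_shop: bool = False
    k8s_momentum: bool = False      # existing momentum or deep experience

def where_to_run(w: Workload) -> str:
    """Question 1: where do the containers run?"""
    if w.on_prem:
        return "ECS Anywhere / EKS Anywhere"
    if w.needs_privileged or w.needs_ebs or w.tiny_tasks or w.custom_logs:
        return "EC2-backed"  # Fargate can't cover these cases
    return "Fargate"

def control_plane(w: Workload) -> str:
    """Question 2: what's the control plane? (App Runner and Lightsail
    also live at this layer for simpler needs; Beanstalk only if you're
    already invested.)"""
    if w.openshift_shop:
        return "ROSA"
    if w.k8s_momentum:
        return "EKS"
    return "ECS"  # the default when neither Kubernetes condition holds

# Question 3, how to manage deployments (Copilot, App2Container, Proton),
# layers on top of whatever the first two answers are.
w = Workload(tiny_tasks=True)
print(where_to_run(w), "+", control_plane(w))
```

Corey: I think you're onto something. I think that is a terrific way of exploring that question. I would argue that setting up a framework like that—one or very similar—is what the AWS containers page should be, just coming from the perspective of what is the neophyte customer experience. On some level, you almost need a slider of choose-your-level-of-experience, ranging from, “What's a container?” to, “I named my kid Kubernetes because I make terrible life decisions,” and anywhere in between.
Casey: Sure. Yeah, well, and I think that really dictates the control plane level. So, for example, Lightsail, where does Lightsail fit? To me, the value of Lightsail is the simplicity. I'm looking at a monthly pricing: seven bucks a month for a container. I don't know how [unintelligible 00:30:23] works, but I can think in terms of monthly pricing. And it's tailored towards a console user, someone who just wants to click in and point to an image. That's a very specific user; there's thousands of customers that are very happy with that experience, and they use it. App Runner presents that scale to zero. That's one of the big selling points I see with App Runner. Likewise with Google Cloud Run. I've got that scale to zero. I can't do that with ECS, or EKS, or any of the other platforms. So, if you've got something that has a ton of idle time, I'd really be looking at those. I would argue, and I think I did the math, Google Cloud Run is about 30% more expensive than App Runner.
Corey: Yeah, if you disregard the free tier, I think that, running persistently at all times throughout the month without the scale-to-zero cold starts, it would cost something like 40-some-odd bucks a month or something like that. Don't quote me on it. Again, and to be clear, I wound up doing this very congratulatory and complimentary tweet about them on, I think it was, Thursday, and then they immediately apparently took one look at this and said, “Holy shit. Corey's saying nice things about us. What do we do? What do we do?” Panic. And the next morning, they raised prices on a bunch of cloud offerings. Whew, that'll fix it. Like—
Casey: [laugh].
Corey: Di-, did you miss the direction you're going on here? No, that's the exact opposite of what you should be doing. But here we are. Interestingly enough, to tie our two conversation threads together, when I look at an AWS bill, unless you're using Fargate, I can't tell whether you're using Kubernetes or not because EKS is a small charge.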
In almost every case, it's just a small charge for the control plane, or Fargate under it. Everything else just manifests as EC2 spend from the perspective of the cloud provider. If you're running a Kubernetes cluster, it is a single-tenant application that can have some very funky behaviors, like cross-AZ chatter back and forth, because there's no internal mechanism to say talk to the free thing rather than the two-cents-a-gigabyte thing. It winds up spinning up and down in a bunch of different ways, and the behavior patterns, because of how placement works, are not necessarily deterministic, depending upon workload. And that becomes something that people find odd when it's, “Okay, we looked at our bill for a week; what can you say?” “Well, first question: are you running Kubernetes at all?” And they're like, “Who invited these clowns?” Understand, we're not prying into your workloads, for a variety of excellent legal and contractual reasons here. We are looking at how they behave, and for specific workloads, once we have a conversation with the engineering team, yeah, we're going to dive in, but it is not at all intuitive from the outside to make any determination whether you're running containers, or whether you're running VMs that you just haven't done anything with in 20 years, or what exactly is going on. And that's just an artifact of the billing system.
Casey: We ran into this challenge at Gaggle. We don't use EKS, we use ECS, but we have some shared clusters, lots of EC2 spend, and it's hard to figure out which team is creating the services that are running that up. We actually ended up creating a tool—we open-sourced it—ECS Chargeback, and what it does is it looks at the CPU and memory reservations for each task definition, and then prorates the overall charge of the ECS cluster, and then creates metrics in Datadog to give us a breakdown of cost per ECS service. And it also measures what we like to refer to as waste, right? Because if you're reserving four gigs of memory but your utilization never goes over two gigs, we're paying for that reservation, but you're underutilizing it. So, we're able to also show which services have the highest degree of waste, not just utilization, so it helps us go after it. But this is a hard problem. I'd be curious, how do you approach these shared ECS resources and slicing and dicing those bills?
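The proration idea is plain arithmetic. Here is a stripped-down sketch with invented services and dollar figures; the real ECS Chargeback reads task definitions and publishes results to Datadog, and may weight things differently:

```python
# Stripped-down sketch of the ECS Chargeback idea: prorate a shared
# cluster's cost by each service's CPU/memory reservations, and flag
# "waste" where reservations far exceed observed utilization.
# Services, weights, and dollar figures are invented examples.
CLUSTER_MONTHLY_COST = 10_000.00  # whole-cluster EC2 spend for the month

services = {
    # service: (reserved CPU units, reserved MiB, peak utilized MiB)
    "api":    (1024, 4096, 1800),
    "worker": (2048, 8192, 7900),
    "cron":   (256,  1024,  120),
}

total_cpu = sum(cpu for cpu, _, _ in services.values())
total_mem = sum(mem for _, mem, _ in services.values())

for name, (cpu, mem, used_mem) in services.items():
    # Split the bill by weighting CPU and memory reservations equally.
    share = 0.5 * (cpu / total_cpu) + 0.5 * (mem / total_mem)
    cost = share * CLUSTER_MONTHLY_COST
    waste = 1 - (used_mem / mem)  # reserved-but-unused memory fraction
    print(f"{name:8s} ${cost:>9,.2f}  memory waste {waste:.0%}")
```

Corey: Everyone has a different approach, too. There is no unifiable, correct answer. A previous show guest, Peter Hamilton, over at Remind had done something very similar and open-sourced a bunch of these things. Understanding what your spend is, is important on this, and it comes down to getting at the actual business concern because in some cases, effectively dead reckoning is enough. You take a look at the cluster that is really hard to attribute because it's a shared service. Great. It is 5% of your bill. First pass, why don't we just agree that it is a third for Service A, two-thirds for Service B, and we'll call it mostly good at that point? That can be enough in a lot of cases. With scale, [laugh] you're just sort of hand-waving over many millions of dollars a year there. How about we get into some more depth? And then you start instrumenting and reporting to something, be it CloudWatch, be it Datadog, be it something else, and understanding what the use case is. In some cases, customers have broken apart shared clusters for that specific reason. I don't think that's necessarily the best approach from an engineering perspective, but again, this is not purely an engineering decision.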
It comes down to serving the business need. And if you're taking up partial credits on that cluster, for a tax credit for R&D for example, you want that position to be extraordinarily defensible, and spending a few extra dollars to ensure that it is, is the right business decision. I mean, again, we're pure advisory; we advise customers on what we would do in their position, but people often mistake that to mean we're going to go for the lowest possible price—bad idea—or that we're going to wind up doing this from a purely engineering-centric point of view. It's: be aware that in almost every case, with some very notable weird exceptions, the AWS bill costs significantly less than the payroll expense that you have of people working on the AWS environment in various ways. People are more expensive, so the idea of, well, you can save a whole bunch of engineering effort by spending a bit more on your cloud, yeah, let's go ahead and do that.
Casey: Yeah, good point.
Corey: The real mark of someone who's senior enough is that their answer to almost any question is, “It depends.” And I feel I've fallen into that trap as well. Much as I'd love to sit here and say, “Oh, it's really simple. You do X, Y, and Z.” Yeah… honestly, my answer, the simple answer, is I think that we orchestrate a cyber-bullying campaign against AWS through the AWS wishlist hashtag; we get people to harass their account managers with repeated requests for, “Hey, could you go ahead and [dip 00:36:19] that thing in—give that a plus-one for me in whatever internal system you're using?” Just because this is a problem we're seeing more and more. Given that it's an unbounded growth problem, we're going to see it more and more for the foreseeable future. So, I wish I had a better answer for you, but yeah, “that stuff's super hard” is honest, but it's also not the most useful answer for most of us.
Casey: I'd love feedback from anyone, from you or your team, on that tool that we created. I can share a link after the fact. ECS Chargeback is what we call it.
Corey: Excellent. I will follow up with you separately on that. That is always worth diving into. I'm curious to see new and exciting approaches to this. Just be aware that we have an obnoxious talent sometimes for seeing these things and, “Well, what about”—and asking about some weird corner edge case that either invalidates the entire thing, or you're like, “Who on earth would ever have a problem like that?” And the answer is always, “The next customer.”
Casey: Yeah.
Corey: For a bounded problem space like the AWS bill. Every time I think I've seen it all, I just have to talk to one more customer.
Casey: Mmm. Cool.
Corey: In fact, the way that we approached your teardown in the restaurant is how we launched our first pass approach. Because the value in something like that is different than the value of a six-to-eight-week-long, deep-dive engagement into every nook and cranny. And—
Casey: Yeah, for sure.
It was valuable to us.
Corey: Yeah, having someone come in to just spend a day with your team, diving into it up one side and down the other, it seems like a weird thing, like, “How much good could you possibly do in a day?” And the answer in some cases is—we had Honeycomb saying that in a couple of days of something like this, we wound up blowing 10% off their entire operating budget for the company; it led to an increased valuation. Liz Fong-Jones has said—on multiple occasions—that the company would not be what it is without our efforts on their bill, which is just incredibly gratifying to hear. It's easy to get lost in the idea of, well, it's the AWS bill. It's just making big companies spend a little bit less to another big company. And that's not exactly, you know, saving the lives of K through 12 students here.
Casey: It's opening up opportunities.
Corey: Yeah. It's about optimizing for the win for everyone. Because now AWS gets a lot more money from Honeycomb than they would if Honeycomb had not continued on their trajectory. It's, you can charge customers a lot right now, or you can charge them a little bit over time and grow with them in a partnership context. I've always opted for the second model rather than the first.
Casey: Right on.
Corey: But here we are. I want to thank you for taking so much time out of, well, several days now to argue with me on Twitter, which is always appreciated, particularly when it's, you know, constructive—thanks for that—
Casey: Yeah.
Corey: For helping me get my business partner to re:Invent, although then he got me that horrible 1,000-piece puzzle of the Cloud Native Computing Foundation landscape and now I don't ever want to see him again—so you know, that happens—and of course, spending the time to write Quinntainers, which is going to be at snark.cloud/quinntainers as soon as we're done with this recording. Then I'm going to kick the tires and send some pull requests.
Casey: Right on. Yeah, thanks for having me. I appreciate you starting the conversation. I would just conclude with: I think that, yes, there are a lot of ways to run containers in AWS; don't let it stress you out. They're there with intention; they're there by design. Understand them. I would also encourage people to go a little deeper, especially if you've got a significantly large workload. You've got to get your hands dirty. As a matter of fact, there's a hands-on lab that a company called Liatrio does. They call it their Night Lab; it's a free, one-day, hands-on lab where you run legacy monolithic Java applications on Kubernetes; it gives you first-hand experience and gets all the way up into observability and doing things like canary deployments. It's a great, great lab. But you've got to do something like that to really get your hands dirty and understand how these things work. So, don't sweat it; there's not one right way. There's a way that will probably work best for each user, so just take the time to understand the ways and make sure you're applying the one that's going to give you the most runway for your workload.
It's because there are customer needs driving that, and this is another way of meeting those needs. I think there could be better guidance, but I also understand that there are a lot of nuanced perspectives here and that… hell is someone else's workflow—
Casey: [laugh].
Corey: —and there's always value in broadening your perspective a bit on those things. If people want to learn more about you and how you see the world, where's the best place to find you?
Casey: Probably on Twitter: twitter.com/nektos, N-E-K-T-O-S.
Corey: That might be the first time Twitter has been described as the best place for anything. But—
Casey: [laugh].
Corey: Thank you once again for your time. It is always appreciated.
Casey: Thanks, Corey.
Corey: Casey Lee, CTO at Gaggle and AWS Container Hero. And apparently writing code in anger to invalidate my points, which is always appreciated. Please do more of that, folks. I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, or in the YouTube comments, which is always a great place to go read, whereas if you've hated this podcast, please leave a five-star review in the usual places and an angry comment telling me that I'm completely wrong, and then launch your own open-source tool to point out exactly what I've gotten wrong this time.
Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.
Announcer: This has been a HumblePod production. Stay humble.
About Tim
Timothy William Bray is a Canadian software developer, environmentalist, political activist, and one of the co-authors of the original XML specification. He worked for Amazon Web Services from December 2014 until May 2020, when he quit due to concerns over the termination of whistleblowers. Previously he has been employed by Google, Sun Microsystems, and Digital Equipment Corporation (DEC). Bray has also founded or co-founded several start-ups such as Antarctica Systems.

Links Referenced:
Textuality Services: https://www.textuality.com/
Blog post on cloud lock-in: https://www.tbray.org/ongoing/When/202x/2022/01/30/Cloud-Lock-In
@timbray: https://twitter.com/timbray
tbray.org: https://tbray.org
duckbillgroup.com: https://duckbillgroup.com

Transcript
Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.
Corey: This episode is sponsored in part by our friends at Vultr. Spelled V-U-L-T-R, because they're all about helping save money, including on things like, you know, vowels. So, what they do is they are a cloud provider that provides surprisingly high-performance cloud compute at a price that—while, sure, they claim it's better than AWS pricing—when they say that, they mean it is less money. Sure, I don't dispute that, but what I find interesting is that it's predictable. They tell you in advance on a monthly basis what it's going to cost. They have a bunch of advanced networking features. They have nineteen global locations and scale things elastically. Not to be confused with openly, because apparently elastic and open can mean the same thing sometimes. They have had over a million users. Deployments take less than sixty seconds across twelve pre-selected operating systems. Or, if you're one of those nutters like me, you can bring your own ISO and install basically any operating system you want. Starting with pricing as low as $2.50 a month for Vultr cloud compute, they have plans for developers and businesses of all sizes, except maybe Amazon, who stubbornly insists on having something to scale all on their own. Try Vultr today for free by visiting vultr.com/screaming, and you'll receive $100 in credit. That's V-U-L-T-R dot com slash screaming.
Corey: Couchbase Capella Database-as-a-Service is flexible, full-featured, and fully managed with built-in access via key-value, SQL, and full-text search. Flexible JSON documents align to your applications and workloads. Build faster with blazing-fast in-memory performance and automated replication and scaling while reducing cost. Capella has the best price performance of any fully managed document database. Visit couchbase.com/screaminginthecloud to try Capella today for free and be up and running in three minutes with no credit card required. Couchbase Capella: make your data sing.
Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. My guest today has been on the show a year or two ago, but today, we're going in a bit of a different direction. Tim Bray is a principal at Textuality Services. Once upon a time, he was a Distinguished Engineer slash VP at AWS, but let's be clear, he isn't solely focused on one company; he also used to work at Google.
Also, there is scuttlebutt that he might have had something to do, at one point, with the creation of God's true language, XML. Tim, thank you for coming back on the show and suffering my slings and arrows.
Tim: Oh, you're just fine. Glad to be here.
Corey: [laugh]. So, the impetus for having this conversation is, you had a blog post somewhat recently—by which I mean, January of 2022—where you talked about lock-in and multi-cloud, two subjects near and dear to my heart, mostly because I have what I thought was a fairly countercultural opinion. You seem to have a very closely aligned perspective on this. But let's not get too far ahead of ourselves. Where did this blog post come from?
Tim: Well, I advise a couple of companies, and one of them happens to be using GCP and the other happens to be using AWS, and I get involved in a lot of industry conversations, and I noticed that multi-cloud is a buzzword. If you go and type multi-cloud into Google, you get, like, a page of people saying, “We will solve your multi-cloud problems. Come to us and you will be multi-cloud.” And I was not sure what to think, so I started writing to find out what I would think. And I think it's not complicated anymore. I think multi-cloud is a reality in most companies. I think that many mainstream, non-startup companies are really worried about cloud lock-in, and that's not entirely unreasonable. So, it's a reasonable thing to think about, and it's a reasonable thing to try and find the right balance between avoiding lock-in and not slowing yourself down. And the issues were interesting. What was surprising is that I published that blog piece saying what I thought were some kind of controversial things, and I got no pushback. Which was, you know, why I started talking to you and saying, “Corey, you know, does nobody disagree with this? Do you disagree with this? Maybe we should have a talk and see if this is just the new conventional wisdom.”
Corey: There's nothing worse than almost trying to pick a fight, only to have no one actually take you up on the opportunity. That always feels a little off. Let's break it down into two issues because I would argue that they are intertwined, but not necessarily the same thing. Let's start with multi-cloud because it turns out that there's just enough nuance to—at least where I sit on this position—that whenever I tweet about it, I wind up getting wildly misinterpreted. Do you find that as well?
Tim: Not so much. It's not a subject I have really had too much to say about, but it does mean lots of different things. And so it's not totally surprising that that happens. I mean, some people think when you say multi-cloud, you mean, “Well, I'm going to take my strategic application, and I'm going to run it in parallel on AWS and GCP because that way, I'll be more resilient and other good things will happen.” And then there's another thing, which is that, “Well, you know, as my company grows, I'm naturally going to be using lots of different technologies and that might include more than one cloud.” So, there's a whole spectrum of things that multi-cloud could mean. So, I guess when we talk about it, we probably owe it to our audiences to be clear what we're talking about.
Corey: Let's be clear, from my perspective, the common definition of multi-cloud is that whatever the person talking is trying to sell you at that point in time is, of course, what multi-cloud is.
If it's a third-party dashboard, for example, “Oh, yeah, you want to be able to look at all of your cloud usage on a single pane of glass.” If it's a certain—well, I guess, certain, not a given—cloud provider, they understand that if you go all-in on a cloud provider, it's probably not going to be them, so they're, of course, going to talk about multi-cloud. And if it's AWS, where they are the 8,000-pound gorilla in the space, “Oh, yeah, multi-cloud's terrible. Put everything on AWS. The end.” It seems that most people who talk about this have a very self-serving motivation that they can't entirely escape. That bias does reflect itself.
Tim: That's true. When I joined AWS, which was around 2014, the PR line was a very hard line: “Well, multi-cloud, that's not something you should invest in.” And I've noticed that the conversation online has become much softer. And I think one reason for that is that going all-in on a single cloud is at least possible when you're a startup, but if you're a big company, you know, an insurance company, a tire manufacturer, that kind of thing, you're going to be multi-cloud, for the same reason that they already have COBOL on the mainframe and Java on the old Sun boxes, and Mongo running somewhere else, and five different programming languages. And that's just the way big companies are; it's a consequence of M&A, it's a consequence of research projects that succeeded, of one kind or another. I mean, lots of big companies have been trying to get rid of COBOL for decades, literally, [laugh] and not succeeding at doing that. So—
Corey: It's ‘legacy,' which is, of course, the condescending engineering term for, “It makes money.”
Tim: And works. And so I don't think it's realistic to, as a matter of principle, not be multi-cloud.
Corey: Let's define our terms a little more closely because very often, people like to pull strange gotchas out of the air. Because when I talk about this, I'm talking about—like, when I speak about it off the cuff, I'm thinking in terms of where do I run my containers? Where do I run my virtual machines? Where does my database live? But you can also move in a bunch of different directions. Where do my Git repositories live? What Office suite am I using? What am I using for my CRM? Et cetera, et cetera. Where do you draw the boundary lines? Because it's very easy to talk past each other if we're not careful here.
Tim: Right. And, you know, let's grant that if you're a mainstream enterprise, you're running your Office automation on Microsoft, and they're twisting your arm to use the cloud version, so you probably are. And if you have any sense at all, you're not running your own Exchange Server, so let's assume that you're using Microsoft Azure for that. And you're running Salesforce, and that means you're on Salesforce's cloud. And a lot of other Software-as-a-Service offerings might be on AWS or Azure or GCP; they don't even tell you. So, I think probably the crucial issue that we should focus our conversation on is my own apps, my own software that is my core competence that I actually use to run the core of my business. And typically, that's the only place where a company would and should invest serious engineering resources to build software. And that's where the question comes: where should that software that I'm going to build run?
And should it run on just one cloud, or—
Corey: I found that when I gave a conference talk on this, in the before times, I had to have an ever-lengthier section about, “I'm speaking in the general sense; there are specific cases where it does make sense for you to go in a multi-cloud direction.” And when I'm talking about multi-cloud, I'm not necessarily talking about Workload A lives on Azure and Workload B lives on AWS through mergers, or weird corporate approaches, or shadow IT that—surprise—is now revenue-bearing; well, I guess we have to live with it. There are a lot of different divisions doing different things and you're going to see that a fair bit. And I'm not convinced that's a terrible idea as such. I'm talking about the single workload that we're going to spread across two or more clouds, intentionally.
Tim: That's probably not a good idea. I just can't see that being a good idea, simply because you get into a problem of just terminology and semantics. You know, the different providers mean different things by the word ‘region' and the word ‘instance,' and things like that. And then there's the people problem. I mean, I don't think I personally know anybody who would claim to be able to build and deploy an application on AWS and also on GCP. I'm sure some people exist, but I don't know any of them.
Corey: Well, Forrest Brazeal was deep in the AWS weeds and now he's the head of content at Google Cloud. I will credit him that he probably has learned to smack an API around over there.
Tim: But you know, you're going to have a hard time hiring a person like that.
Corey: Yeah. You can count these people almost as individuals.
Tim: And that's a big problem. And you know, in a lot of cases, it's clearly the case that our profession is talent-starved—I mean, the whole world is talent-starved at the moment, but our profession in particular—and a lot of the decisions about what you can build and what you can do are highly contingent on who you can hire. And if you can't hire a multi-cloud expert, well, you should not deploy, [laugh] you know, a multi-cloud application. Now, having said that, I just want to dot this i here and say that it can be made to kind of work. I've got this one company I advise—I wrote about it in the blog piece—that used to be on AWS and switched over to GCP. I don't even know why; this happened before I joined them. And they have a lot of applications, and then they have some integrations with third-party partners which they implemented with AWS Lambda functions. So, when they moved over to GCP, they didn't stop doing that. So, this mission-critical, latency-sensitive application of theirs runs on GCP and calls out to AWS to make calls into their partners' APIs and so on. And it works fine. Solid as a rock, reliable, low latency. And so I talked to a person I know over on the AWS side, and they said, “Oh, yeah sure, you know, we talked to those guys. Lots of people do that. We make sure, you know, the connections are low latency and solid.” So, technically speaking, it can be done. But for a variety of business reasons—maybe the most important one being expertise and who you can hire—it's probably just not a good idea.
Corey: One of the areas where I think there is an exception case is if you are a SaaS provider. Let's pick a big, easy example: Snowflake, which is a data warehouse. They've got to run their data warehousing application in all of the major clouds because that is where their customers are.
And it turns out that if you're going to send a few petabytes into a data warehouse, you really don't want to be paying cloud egress rates to do it because it turns out, you can just bootstrap a second company for that much money.
Tim: Well, Zoom would be another example, obviously.
Corey: Oh, yeah. Anything that's heavy on data transfer is going to be a strange one. And there's being close to customers; gaming companies are another good example of this, where a lot of the game servers themselves will be spread across a bunch of different providers, purely based on latency metrics around what is close to certain customer clusters.
Tim: I can't disagree with that. You know, I wonder how large a segment that is, though; I think you're talking about core technology companies. Now, of the potential customers of the cloud providers, how many of them are core technology companies, like the kind we're talking about, who have such a need, and how many are just people who want to run their manufacturing and product design and stuff? And for those, buying into a particular cloud is probably a perfectly sensible choice.
Corey: I've also seen regulatory stories about this. I haven't been able to track them down specifically, but there is a pervasive belief that one interpretation of UK banking regulations stipulates that you have to be able to get back up and running within 30 days on a different cloud provider entirely. And also, they have the regulatory requirement that, I believe, the data remain in-country. So, that's a little odd. And honestly, when it comes to best practices and how you should architect things, I'm going to take a distinct backseat to legal requirements imposed upon you by your regulator. But let's be clear here, I'm not advising people to go and tell their auditors that they're wrong on these things.
Tim: I had not heard that story, but you know, it sounds plausible. So, I wonder if that is actually in effect, which is to say, could a huge British banking company, in fact, do that? Could they, in fact, decamp from Azure and move over to GCP or AWS in 30 days? Boy.
Corey: That is what one bank I spoke to over there was insistent on. A second bank I spoke to in that same jurisdiction had never heard of such a thing, so I feel like a lot of this is subject to auditor interpretation. Again, I am not an expert in this space. I do not pretend to be—I know I'm that rarest of all breeds: a white guy with a microphone in tech who admits he doesn't know something. But here we are.
Tim: Yeah, I mean, I imagine it could be plausible if you didn't use any higher-level services, and you just, you know, rented instances and were careful about which version of Linux you ran and were just running a bunch of Java code, which actually, you know, describes the workload of a lot of financial institutions. So, it should be a matter of getting… all the right instances configured and the JVM configured and launched. I mean, there are no… architecturally terrifying barriers to doing that. Of course, to do that, it would mean you would have to avoid using any of the higher-level services that are particular to any cloud provider and basically just treat them as people you rent boxes from, which is probably not a good choice for other business reasons.
Corey: Which can also include things as seemingly low-level as load balancers, just based upon different provisioning modes, failure modes, and the rest.
You're probably going to have a more consistent experience running HAProxy or nginx yourself to do it. But Tim, I have it on good authority that this is the old way of thinking, and that Kubernetes solves all of it. And through the power of containers and powers combining and whatnot, that frees us from being beholden to any given provider and our workloads are now all free as birds.Tim: Well, I will go as far as saying that if you are in the position of trying to be portable, probably using containers is a smart thing to do because that's a more tractable level of abstraction that does give you some insulation from, you know, which version of Linux you're running and things like that. The proposition that configuring and running Kubernetes is easier than configuring and running [laugh] JVM on Linux [laugh] is unsupported by any evidence I've seen. So, I'm dubious of the proposition that operating at the Kubernetes level, at the [unintelligible 00:14:42] level; you know, there's good reasons why some people want to do that, but I'm dubious of the proposition that it really makes you more portable in an essential way.Corey: Well, you're also not the target market for Kubernetes. You have worked at multiple cloud providers, and I feel like the real advantage of Kubernetes is for people who want to pretend that they do, so they can act out a sort of cosplay of being their own cloud provider by running all the intricacies of Kubernetes. I'm halfway kidding, but there is an uncomfortable element of truth to that in some of the conversations I've had with some of its more, shall we say, fanatical adherents.Tim: Well, I think you and I are neither of us huge fans of Kubernetes, but my reasons are maybe a little different. Kubernetes does some really useful things. It really, really does. It allows you to take n VMs, and pack m different applications onto them in a way that takes reasonably good advantage of the processing power they have. And it allows you to have different things running in one place with different IP addresses.It sounds straightforward, but that turns out to be really helpful in a lot of ways. So, I'm actually kind of sympathetic with what Kubernetes is trying to be. My big gripe with it is that I think that good technology should make easy things easy and difficult things possible, and I think Kubernetes fails the first test there. I think the complexity that it involves is out of balance with the benefits you get. There's a lot of really, really smart people who disagree with me, so this is not a hill I'm going to die on.Corey: This is very much one of those areas where reasonable people can disagree. I find the complexity to be overwhelming; it has to collapse. At this point, finding someone who can competently run Kubernetes in production is a bit hard to do, and they tend to be extremely expensive. You aren't going to find a team of those people at every company that wants to do things like this, and they're certainly not going to be able to find it in their budget in many cases. So, it's a challenging thing to do.Tim: Well, that's true. And another thing is that once you step onto the Kubernetes slope, you start looking at Istio and Envoy and [fabric 00:16:48] technology. And we're talking about extreme complexity squared at that point.
But you know, here's the thing: back in 2018, I think it was, in his keynote, Werner said that the big goal is that all the code you ever write should be application logic that delivers business value, which you know rep—Corey: Didn't CGI say the same thing? Didn't—like, isn't there, like, a long history, dating back longer than I believe either of us have been alive, of “With this, all you're going to write is business logic”? That was the Java promise. That was the Google App Engine promise. Again and again, we've had that carrot dangled in front of us, and it feels like the reality with Lambda is, the only code you will write is not necessarily business logic, it's getting the thing to speak to the other service you're trying to get it to talk to because a lot of these integrations are super finicky. At least back when I started learning how this stuff worked, they were.Tim: People understand where the pain points are and are indeed working on them. But I think we can agree that if you believe in that as a goal—which I still do; I mean, we may not have got there, but it's still a worthwhile goal to work on. We can agree that wrangling Istio configurations is not such a thing; it's not [laugh] directly value-adding business logic. To the extent that you can do that, I think serverless provides a plausible way forward. Now, you can be all cynical about, “Well, I still have trouble making my Lambda talk to my other thing.” But you know, I've done that, and I've also deployed JVM on bare metal kind of thing.You know what? I'd rather do things at the Lambda level. I really would. Because capacity forecasting is a horribly difficult thing, we're all terrible at it, and the penalties for being wrong are really bad. If you under-specify your capacity, your customers have a lousy experience, and if you over-specify it, and you have an architecture that makes you configure for peak load, you're going to spend bucket-loads of money that you don't need to.Corey: “But you're then putting your availability in the cloud providers' hands.” “Yeah, you already were. Now, we're just being explicit about acknowledging that.”Tim: Yeah. Yeah, absolutely. And that's highly relevant to the current discussion because if you use the higher-level serverless functions, if you decide, okay, I'm going to go with Lambda and Dynamo and EventBridge and that kind of thing, well, that's not portable at all. I mean, the APIs are totally idiosyncratic for AWS, and GCP's equivalent, and Azure's—what do they call it? Permanent functions or something-or-other functions. So yeah, that's part of the trade-off you have to think about. If you're going to do that, you're definitely not going to be multi-cloud in that application.Corey: And in many cases, one of the stated goals for going multi-cloud is that you can avoid the downtime of a single provider. People love to point at the big AWS outages with a, “See? They were down for half a day.” And there is a societal question of what happens when everyone is down for half a day at the same time, but in most cases, what I'm seeing is that instead of getting rid of a single point of failure, you're introducing a second one. If either one of them is down, your application's down, so you've doubled your outage surface area.On the rare occasions where you're able to map your dependencies appropriately, great. Are your third-party critical providers all doing the same? If you're an e-commerce site and Stripe processes your payments, well, they're public about being all-in on AWS.
So, if you can't process payments, does it really matter that your website stays up? It becomes an interesting question. And those are the ones that you know about, let alone the third- and fourth-order dependencies that are almost impossible to map unless everyone is as diligent as you are. It's a heavy, heavy lift.Tim: I'm going to push back a little bit. Now, for example, this company I'm advising that's running GCP and calling out to Lambda is in that position: if either GCP or Lambda goes off the air, they're down. On the other hand, if you've got somebody like Zoom, they're probably running parallel full stacks on the different cloud providers. And if you're doing that, then you can at least plausibly claim that you're in a good place because if Dynamo has an outage—and everything relies on Dynamo—then you shift your load over to GCP or Oracle [laugh] and you're still on the air.Corey: Yeah, but what is “up,” as well? Because Zoom loves to sign me out on my desktop whenever I log into it on my laptop, and vice versa, and I wonder if that authentication and login system is also replicated full-stack to everywhere it goes, and what the fencing on that looks like, and how the communication between all those things works. I wouldn't doubt that it's possible that they've solved for this, but I also wonder how thoroughly they've really tested all of that, too. Not because I question them any; just because this stuff is super intricate as you start tracing it down into the nitty-gritty levels of the madness that consumes all these abstractions.Tim: Well, right, that's conventional wisdom that is really wise and true, which is that if you have software that is alleged to do something like allow you to get going on another cloud, unless you've tested it within the last three weeks, it's not going to work when you need it.Corey: Oh, it's like a DR exercise: The next commit you make breaks it. Once you have the thing working again, it sits around as a binder, and it's a best guess. And let's be serious, a lot of these DR exercises presume that you're able to, for example, change DNS records on the fly, or be able to get a virtual machine provisioned in less than 45 minutes—because when there's an actual outage, surprise, everyone's trying to do the same things—there's a lot of stuff in there that gets really wonky at weird levels.Tim: A related, similar exercise is people who want to be on AWS but want to be multi-region. It's actually, you know, a fairly similar kind of problem. If I need to be able to fail out of us-east-1—well, God help you, because if you need to, everybody else needs to as well—but you know, would that work?Corey: Before you go multi-cloud, go multi-region first. Tell me how easy it is because then you have full-feature parity—presumably—between everything; it should just be a walk in the park. Send me a postcard once you get that set up and I'll eat a bunch of words. And it turns out, basically, no one does.Tim: Mm-hm.Corey: Another area of lock-in around a lot of this stuff that I think makes it very hard to go multi-cloud is the security model and how it interfaces with various aspects. In many cases, I'm seeing people doing full-on network overlays. They don't have to worry about the different security group models and VPCs and all the rest. They can just treat everything as a node sitting on the internet, and the only thing it talks to is an overlay network.
Which is terrible, but that seems to be one of the only ways people are able to build things that span multiple providers with any degree of success.Tim: Well, that is painful because, much as we all like to scoff and so on at the degree of complexity you get into there, it is the case that your typical public cloud provider can do security better than you can. They just can. It's a fact of life. And if you're using a public cloud provider and not taking advantage of their security offerings and infrastructure, that's probably dumb. But if you really want to be multi-cloud, you kind of have to, as you said.In particular, this gets back to the problem of expertise because it's hard enough to hire somebody who really understands IAM deeply and how to get that working properly; try and find somebody who can understand that level of thing on two different cloud providers at once. Oh, gosh.Corey: This episode is sponsored in part by LaunchDarkly. Take a look at what it takes to get your code into production. I'm going to just guess that it's awful because it's always awful. No one loves their deployment process. What if launching new features didn't require you to do a full-on code and possibly infrastructure deploy? What if you could test on a small subset of users and then roll it back immediately if results aren't what you expect? LaunchDarkly does exactly this. To learn more, visit launchdarkly.com and tell them Corey sent you, and watch for the wince.Corey: Another point you made in your blog post was the idea of lock-in, of people being worried that going all-in on a provider was setting them up to be, I think ‘Oracle' is the term that was tossed around, where once you're dependent on a provider, what's to stop them from cranking the pricing knobs until you squeal?Tim: Nothing. And I think that is a perfectly sane thing to worry about. Now, in the short term, based on my personal experience working with, you know, AWS leadership, I think that it's probably not a big short-term risk. AWS is clearly aware that most of the growth is still in front of them. You know, the amount of all of IT that's on the cloud is still pretty small, and so the thing to worry about right now is growth.And they are really, really, genuinely, sincerely focused on customer success and will bend over backwards to deal with the customers' problems as they are. And I've seen places where people have negotiated a huge multi-year enterprise agreement based on Reserved Instances or something like that, and then realize, oh, wait, we need to switch our whole technology stack, but you've got us by the RIs, and AWS will say, “No, no, it's okay. We'll tear that up and rewrite it and get you where you need to go.” So, in the short term, between now and 2025, would I worry about my cloud provider doing that? Probably not so much.But let's go a little further out. Let's say it's, you know, 2030 or something like that, and at that point, you know, Andy Jassy has decided to be a full-time sports mogul, and Satya Nadella has gone off to be a recreational sailboat owner or something like that, and private equity operators come in and take very significant stakes in the public cloud providers, and get a lot of their guys on the board, and you have a very different dynamic. And you have something that starts to feel like Oracle, where their priority isn't, you know, optimizing for growth and customer success; their priority is optimizing for a quarterly bottom line, and—Corey: Revenue extraction becomes the goal.Tim: That's absolutely right.
And this is not a hypothetical scenario; it's happened. Most large companies do not control the amount of money they spend per year to have desktop software that works. They pay whatever Microsoft's going to say they pay because they don't have a choice. And a lot of companies are in the same situation with their database.They don't get to set their database budget. Oracle comes in and says, “Here's what you're going to pay,” and that's what you pay. You really don't want to be in a situation like that with your cloud, and that's why I think it's perfectly reasonable for somebody who is doing a cloud transition at a major financial or manufacturing or service provider company to have an eye to this. You know, let's not completely ignore the lock-in issue.Corey: There is a significant scale with enterprise deals and contracts. There is almost always a contractual provision that says if you're going to raise a price with any cloud provider, there's a fixed period of time of notice you must give before it happens. I feel like the first mover there winds up getting soaked because everyone is going to panic and migrate in other directions. I mean, Google tried it with the Google Maps API, not quite Google Cloud, but it also scared the bejesus out of a whole bunch of people who were, “Wait. Is this a harbinger of things to come?”Tim: Well, not in the short term, I don't think. And I think, you know, Google Maps [is absurdly 00:26:36] underpriced. That's a hellishly expensive service. And it's supposed to pay for itself by, you know, advertising on maps. I don't know about that.I would see that as the exception rather than the rule. I think that it's reasonable to expect cloud prices, nominally at least, to go on decreasing for at least the short term, maybe even the medium term. But that can't go on forever.Corey: It also feels to me, having looked at an awful lot of AWS environments, that if there were to be some sort of regulatory action or some really weird outage for a year that meant that AWS could not onboard a single new customer, their revenue year-over-year would continue to increase purely by organic growth because there is no forcing function that turns the thing off when you're done using it. In fact, they can migrate things around to hardware that works, and they can continue billing you for the things sitting there idle. And there is no governance path on that. So, on some level, winding up doing a price increase is going to cause a massive company focus on fixing a lot of that. It feels on some level like it is drawing attention to a thing that they don't really want to draw attention to, from a purely revenue extraction story.When CentOS walked their ten-year support timeline back to two years, with an idea that it would drive [unintelligible 00:27:56] adoption, suddenly a lot of people looked at their environment and saw they had old [unintelligible 00:28:00] they weren't using. It was massively short-sighted, and it massively irritated a whole bunch of people who needed that support in the short term but who, by the renewal, were going to be on to Ubuntu or something else. It feels like it's going to backfire massively, and I'd like to imagine the strategists of whoever takes the reins of these companies are going to be smarter than that. But here we are.Tim: Here we are. And you know, it's interesting you should mention regulatory action. At the moment, there are only three credible public cloud providers.
It's not obvious that Google's really in it for the long haul, as, last time I checked, they were claiming to maybe be breaking even on it. That's not a good number, you know? You'd like there to be more than that.And if it goes on like that, eventually, some politician is going to say, “Oh, maybe they should be regulated like public utilities,” because they kind of are, right? And I would think that anybody who did get into Oracle-izing would, you know, accelerate that happening. Having said that, we do live in the atmosphere of 21st-century capitalism, and growth is the God that must be worshiped at all costs. Who knows. It's a cloudy future. Hard to see.Corey: It really is. I also want to be clear, on some level, that with Google's current position, if they weren't taking at least a small loss on these things, I would worry. Like, wait, you're trying to catch AWS and you don't have anything better to invest that money into than just, well, time to start taking profits from it? So, I can see both sides of that one.Tim: Right. And as I keep saying, I've already said it once during this slot, you know, the total cloud spend in the world is probably on the order of one or two hundred billion per annum, and global IT is in multiple trillions. So, [laugh] there's a lot more space for growth. Years and years' worth of it.Corey: Yeah. The challenge, too, is that people are worried about this long-term strategic point of view. So, one thing you talked about in your blog post is the idea of using hosted open-source solutions. Like, instead of using Kinesis, you'd wind up using Kafka, or instead of using DynamoDB, you use their managed Cassandra service—or, as I think of it, Amazon Basics Cassandra—and effectively going down the path of letting them manage this thing, but you then have a theoretical Exodus path. Where do you land on that?Tim: I think that speaks to a lot of people's concerns, and I've had conversations with really smart people about that who like that idea. Now, to be realistic, it doesn't make migration easy because you've still got all the CI and CD and monitoring and management and scaling and alarms and alerts and paging and et cetera, et cetera, et cetera, wrapped around it. So, it's not as though you could just pick up your managed Kafka off AWS and drop a huge installation onto GCP easily. But at least, you know, your data plane APIs are the same, so a lot of your code would probably still run okay. So, it's a plausible path forward. And when people say, “I want to do that,” well, it does mean that you can't go all serverless. But it's not a totally insane path forward.Corey: So, one last point in your blog post that I think a lot of people think about only after they get bitten by it is the idea of data gravity. I alluded earlier in our conversation to data egress charges, but my experience has been that where your data lives is effectively where the rest of your cloud usage tends to aggregate. How do you see it?Tim: Well, it's a real issue, but I think it might perhaps be a little overblown. People throw the term petabytes around, and people don't realize how big a petabyte is. A petabyte is just an insanely huge amount of data, and the notion of transmitting one over the internet is terrifying. And there are lots of enterprises that have multiple petabytes around, and so they think, “Well, you know, it would take me 26 years to transmit that, so I can't.”And they might be wrong. The internet's getting faster all the time. Did you notice?
I've been able to move some—for purely personal projects—insane amounts of data, and it gets there a lot faster than it used to. Secondly, in the case of AWS Snowmobile, we have an existence proof that you can do exabyte-ish scale data transfers in the time it takes to drive a truck across the country.Corey: Inbound only. Snowmobiles are not—at least according to public examples—valid for Exodus.Tim: But you know, this is the kind of place where regulatory action might come into play if what the people were doing was seen to be abusive. I mean, there's an existence proof you can do this thing. But here's another point. So, I suppose you have, like, 15 petabytes—that's an insane amount of data—deployed in your corporate application. So, are you actually using that to run the application, or is a huge proportion of that stuff just logs and data of various kinds that's been gathered and is being used in analytics applications and AI models and so on?Do you actually need all that data to actually run your app? And could you, in fact, just pick up the stuff you need for your app, move it to a different cloud provider from there, and leave your analytics on the first one? Not a totally insane idea.Corey: It's not a terrible idea at all. It comes down to the idea as well of, when you're trying to run a query against a bunch of that data, do you need all the data to transit, or just the results of that query? It's a question of, can you move the compute closer to the data as opposed to the data to where the compute lives?Tim: Well, you know, a lot of those people who have those huge data pools have it sitting on S3, and a lot of it migrated off into Glacier, so it's not as if you could get at it in milliseconds anyhow. I just ask myself, “How much data can anybody actually use in a day? In the course of satisfying some transaction requests from a customer?” And I think it's not petabytes. It just isn't.Now, there are—okay, there are exceptions. There's the intelligence community, there's the oil drilling community, there are some communities who genuinely will use insanely huge seas of data on a routine basis, but you know, I think that's kind of a corner case, so before you shake your head and say, “Ah, they'll never move because of the data gravity,” you know… you need to prove that to me and I might be a little bit skeptical.Corey: And I think that is probably a very fair request. Just tell me what it is you're going to be doing here to validate the idea that is in your head, because the most interesting lies I've found customers tell aren't intentionally to me or anyone else; they're to themselves. The narrative of what they think they're doing from the early days takes root, and never mind the fact that, yeah, it turns out that now that you've scaled out, maybe development isn't 80% of your cloud bill anymore. You learn things and your understanding of what you're doing has to evolve with the evolution of the applications.Tim: Yep. It's a fun time to be around. I mean, it's so great; right at the moment, lock-in just isn't that big an issue. And let's be clear—I'm sure you'll agree with me on this, Corey—if you're a startup and you're trying to grow and scale and prove you've got a viable business, and show that you have exponential growth and so on, don't think about lock-in; just don't go near it. Pick a cloud provider, pick whichever cloud provider your CTO already knows how to use, and just go all-in on them, and use all their most advanced features and be serverless if you can.
It's the only sane way forward. You're short of time, you're short of money, you need growth.Corey: “Well, what if you need to move strategically in five years?” You should be so lucky. Great. Deal with it then. Or, “Well, what if we want to sell to retail as our primary market and they hate AWS?”Well, go all-in on a provider; probably not that one. Pick a different provider and go all in. I do not care which cloud any given company picks. Go with what's right for you, but then go all in because until you have a compelling reason to do otherwise, you're going to spend more time solving global problems locally.Tim: That's right. And we've never actually said this, probably because it's something that both you and I know at the core of our being, but it probably needs to be said that being multi-cloud is expensive, right? Because the nouns and verbs that describe what clouds do are different in Google-land and AWS-land; they're just different. And it's hard to think about those things. And you lose the capability of using the advanced serverless stuff. There are a whole bunch of costs to being multi-cloud.Now, maybe if you're existentially afraid of lock-in, you don't care. But for, I think, most normal people, ugh, it's expensive.Corey: Pay now or pay later, you will pay. Wouldn't you ideally like to see that dollar go as far as possible? I'm right there with you because it's not just the actual infrastructure costs that are expensive; it costs something far more dear and expensive, and that is the cognitive expense of having to think about both of these things, not just how each cloud provider works, but how each one breaks. You've done this stuff longer than I have; I don't think that either of us trusts a system that we don't understand the failure cases for and how it's going to degrade. It's, “Oh, right. You built something new and awesome. Awesome. How does it fall over? What direction is it going to hit, so what side should I not stand on?” It's based on an understanding of what you're about to blow holes in.Tim: That's right. And you know, I think particularly if you're using AWS heavily, you know that there are some things that you might as well bet your business on because, you know, if they're down, so is the rest of the world, and who cares? And, other things, eh, maybe take a little chance there. So, understanding failure modes, understanding your stuff, you know, the cost of sharp edges, understanding manageability issues. It's not obvious.Corey: It's really not. Tim, I want to thank you for taking the time to go through this, frankly, excellent post with me. If people want to learn more about how you see things, and I guess how you view the world, where's the best place to find you?Tim: I'm on Twitter, just @timbray, T-I-M-B-R-A-Y. And my blog is at tbray.org, and that's where that piece you were just talking about is, and that's kind of my online presence.Corey: And we will, of course, put links to it in the [show notes 00:37:42]. Thanks so much for being so generous with your time. It's always a pleasure to talk to you.Tim: Well, it's always fun to talk to somebody who has shared passions, and we clearly do.Corey: Indeed. Tim Bray, principal at Textuality Services. I'm Cloud Economist Corey Quinn and this is Screaming in the Cloud.
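A footnote to make the “nouns and verbs” point concrete: the sketch below is not code from the episode, and the table name and event shape are assumptions, but it shows how provider-specific even a tiny serverless handler is. The Lambda calling convention, boto3, and the DynamoDB API are all AWS dialect with no drop-in GCP or Azure equivalent.

```python
# A sketch of the lock-in Tim describes: every line below is AWS dialect.
# "orders" is a hypothetical DynamoDB table, and the event shape assumes
# an SQS event source. Porting this to GCP or Azure means rewriting it.
import boto3

table = boto3.resource("dynamodb").Table("orders")

def handler(event, context):  # AWS Lambda's calling convention
    for record in event.get("Records", []):  # SQS delivers batches this way
        table.put_item(Item={"pk": record["messageId"], "body": record["body"]})
    return {"statusCode": 200}
```

Swap in Cloud Functions and Firestore, and the handler signature, the client library, and the event shape all change, which is the migration cost, and the cognitive expense, the two of them are pricing in above.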
If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice along with an angry comment that you then need to take to all of the other podcast platforms out there purely for redundancy, so you don't get locked into one of them.Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.Announcer: This has been a HumblePod production. Stay humble.
About Aidan: Aidan is an AWS enthusiast, due in no small part to sharing initials with the cloud. He's been writing software for over 20 years and getting paid to do it for the last 10. He's still not sure what he wants to be when he grows up.Links: Stedi: https://www.stedi.com/ GitHub: https://github.com/aidansteele Blog posts: https://awsteele.com/ Ipv6-ghost-ship: https://github.com/aidansteele/ipv6-ghost-ship Twitter: https://twitter.com/__steele TranscriptAnnouncer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.Corey: Couchbase Capella Database-as-a-Service is flexible, full-featured, and fully managed with built-in access via key-value, SQL, and full-text search. Flexible JSON documents align to your applications and workloads. Build faster with blazing-fast in-memory performance and automated replication and scaling while reducing cost. Capella has the best price performance of any fully managed document database. Visit couchbase.com/screaminginthecloud to try Capella today for free and be up and running in three minutes with no credit card required. Couchbase Capella: make your data sing.Corey: Today's episode is brought to you in part by our friends at MinIO, the high-performance Kubernetes-native object store that's built for the multi-cloud, creating a consistent data storage layer for your public cloud instances, your private cloud instances, and even your edge instances, depending upon what the heck you're defining those as, which depends probably on where you work. Getting that unified is one of the greatest challenges facing developers and architects today. It requires S3 compatibility, enterprise-grade security and resiliency, the speed to run any workload, and the footprint to run anywhere, and that's exactly what MinIO offers. With superb read speeds in excess of 360 gigs and a 100 megabyte binary that doesn't eat all the data you've got on the system, it's exactly what you've been looking for. Check it out today at min.io/download, and see for yourself. That's min.io/download, and be sure to tell them that I sent you.Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. I'm joined this week by someone who, honestly, feels like they're after my own heart. Aidan Steele by day is a serverless engineer at Stedi, but by night, he is an absolute treasure and a delight because not only does he write awesome third-party tooling and blog posts and whatnot around the AWS ecosystem, but he turns them into the most glorious, intricate, and technical shit posts that I think I've ever seen. Aidan, thank you for joining me.Aidan: Hi, Corey, thanks for having me. It's an honor to be here. Hopefully, we get to talk some AWS, and maybe also talk some nonsense as well.Corey: I would argue that in many ways, those things are one and the same. And one of the things I always appreciated about how you approach things is, you definitely seem to share that particular ethos with me. And there's been a lot of interesting content coming out from you in recent days.
The thing that really wound up showing up on my radar in a big way was back at the start of January—2022, for those listening to this in the glorious future—about using IPv6 for multi-factor auth, which is so… I don't even have the adjectives to throw at this because, one, it is ridiculous; two, it is effective; and three, it is just: who thinks like that? What is this and what did you—what monstrosity have you built?Aidan: So, what did I end up calling it? I think it was ipv6-ghost-ship. And I think I called it that because I'd recently watched, oh, what was that series that was recently on Apple TV? Uh, the Isaac Asimov—Corey: If it's not Paw Patrol, I have no idea what it is because I have a four-year-old who is very insistent about these things. It is not so much a TV show as it is a way of life. My life is terrible. Please put me out of my misery.Aidan: Well, at least it's not Bluey. That's the one I usually hear about. That's Australia's greatest export. But one of the plot devices was a ship that would teleport around the place, and you could never predict where it was next. And so no one could access it. And I thought, “Oh, what about if I use the IPv6 address space?”Corey: Oh, Foundation?Aidan: That's the one. Foundation. That's how the name came about. The idea, honestly, it was because I saw—when was it?—sometime last year, AWS added support for those IP address prefixes. IPv4 prefixes were small; very useful and important, but IPv6, with more than 2 trillion IP addresses per instance, I thought there's got to be fun to be had there.Corey: 281 trillion, I believe is the—Aidan: 281 trillion.Corey: Yeah. It is a sarcastically large space. And that also has effectively, I would say in an InfoSec sense, killed port scanning, the idea of “I'm going to scan the IP range and see what's there,” just because that takes such a tremendous amount of time. Now here, in reality, you also wind up with people using compromised resources, and yeah, it turns out, I can absolutely scan trillions upon trillions of IP addresses as long as I'm using your AWS account and associated credit card with which to do it. But here in the real world, it is not an easily discoverable problem space.Aidan: Yeah. I made it as a novelty, really. I was looking for a reason to learn more about IPv6 and subnetting because it was a term I'd heard, a thing I didn't really understand, and the way I learn things is by trying to build them, realizing I have no idea what I'm doing, googling the error messages, reluctantly looking at the documentation, and then repeating until I've built something. And yeah, and then I built it, published it, and it seemed to be pretty popular. It struck a chord. People retweeted it. It tickled your fancy. I think it spoke to something in all of us who are trying not to take our jobs too seriously, you know, who know we can have a little fun with this ludicrous tech that we get to play with.Corey: The idea being, you take the multi-factor auth code that your thing generates, and that is the last series of octets for the IP address you wind up going towards, and that is such a large problem space that you're not going to find it in time, so whatever it is automatically connects to that particular IP address because that's the only one that's going to be listening for a 30 to 60-second span for the connection to be established. It is a great idea because SSH doesn't support this stuff natively. There's no good two-factor auth approach for this. And I love it.
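The core trick is small enough to sketch. What follows is an illustration of the idea rather than Aidan's actual ipv6-ghost-ship code; the prefix is a documentation range and the secret is made up. It folds the current TOTP code into the host bits of an IPv6 address, so the one address that answers rotates with the code.

```python
# Illustration only, not the real ipv6-ghost-ship implementation:
# derive "the one address that answers right now" from a TOTP code.
import ipaddress
import pyotp  # third-party TOTP library: pip install pyotp

PREFIX = ipaddress.IPv6Network("2001:db8:dead:beef::/64")  # assumed /64 delegation
totp = pyotp.TOTP("JBSWY3DPEHPK3PXP")  # made-up base32 shared secret

def current_address() -> ipaddress.IPv6Address:
    code = int(totp.now())  # six digits, rotates every 30 seconds
    return PREFIX.network_address + code  # fold the code into the host bits

print(current_address())  # the listener binds only this address for this window
```

Clock accuracy suddenly matters for the same reason it does in any TOTP flow, which is exactly where the conversation goes next.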
I'd be scared to death to run this in production for something that actually matters.And we also start caring a lot more about how accurate the clocks on those instances are, all of a sudden. But, oh, I just love the concept so much because it hits on the ethos of—I think—what so much of the cloud does, where these really are fundamental building blocks that we can use to build incredible, awe-inspiring things that are globe-spanning, and also ridiculousness. And there's so much value in being able to do the same thing, sometimes at the same time.Aidan: Yeah, it's interesting, you mentioned, like, never using it in prod, and I guess when I was building it, I thought, you know, that would be apparent. Like, “Yes, this is very neat, but surely no one's going to use it.” And I did see someone raise an issue on the GitHub project which was talking about clock skew. And I mentioned—Corey: Here at the bank where I'm running this in production, we're—Aidan: [laugh].Corey: —having some trouble with the clock. Yeah, it's—Aidan: You know, I mentioned that the underlying 2FA library did account for clock skew, 30 seconds either way, but it made me realize I might need to put a disclaimer on the project. While the code is probably reasonably sound, I personally wouldn't run it in production, and it was more meant to be a piece of performance art or something to tickle one's fancy and then move on, not to roll it out. But I don't know, different strokes for different folks.Corey: I have gotten a lot better about calling out my ridiculous shitpost things when I do them. And the thing that really drove that home for me was talking about using DNS TXT records to store information about what server a virtual machine lives on—or container or whatnot—thus using Route 53 as a database. And that was a great gag, and then someone did a Reddit post of “This seems like a really good idea, so I'm going to start doing it, and I'm having these questions.”And at that point, it's like, “Okay, I've got to break character at this point.” And it's, yeah, “Hi. That's my joke. Don't do it because X, Y, and Z are your failure modes; there are better tools for it. So yeah, there are ways you can do this with DNS, but it's not generally a great idea, and there are some risk factors to it. And okay, A, B, and C are the things you don't want to do, so let's instead do it in a halfway intelligent way because it's only funny if everyone's laughing. Otherwise, we fall into this trap of people taking you seriously and feeling bad as a result when it doesn't work in production. So, calling it out as ‘this is a joke' tends to put a lot of that aside. It also keeps people from feeling left out.Aidan: Yeah. I realized that because the next novelty project I did a few days later—not sure if you caught it—was a Rick Roll over ICMPv6 packets, where if you ran ping6 against a certain IP range, it would return the lyrics to music's greatest treasure. So, I think it was hopefully a bit more self-evident that this should never be taken seriously. Who knows, I'm sure someone will find a use for it in prod.Corey: And I was looking through this; this is great. I love some of the stuff that you're doing because it's just fantastic. And I started digging a bit more into things you had done. And at that point, it was whoa, whoa, whoa, wait a minute.
Back in 2020, you found an example of an issue with AWS's security model where CloudTrail would just start—if asked nicely—spewing other people's credential sets and CloudTrail events and whatnot into your account.And, A, that's kind of a problem. B, it was something that didn't make that big of a splash when it came out—I don't even think I linked to it at the time—and, C, after the recent revelations around CloudFormation and Glue that the fine folks at Orca Security found, it shows this wasn't a one-off, because you'd done this a year beforehand. We now have an established track record of cross-account data sharing and, potentially, exploits, and I'm looking at this and, I've got to level with you, I felt incredibly naive because I had assumed that since we hadn't heard of this stuff in any real big sense, it simply didn't happen.So, when we heard about Azure: obviously, it's because Azure is complete clown shoes and the excellent people at AWS would never make these sorts of mistakes. Except we now have evidence that they absolutely did and didn't talk about it publicly. And I've got to level with you: I feel more than a little bit foolish, betrayed, naive for all this. What's your take on it?Aidan: Yeah, so just to clarify, it wasn't actually in your account. The new AWS custom resource execution model was that you would upload a Lambda function that would run in an Amazon-managed account. And so that immediately set off my spidey sense because executing code in someone else's account seems fraught with peril. And so—Corey: Yeah, you can do all kinds of horrifying things there, like, use it to run containers.Aidan: Yeah. [laugh]. Thankfully, I didn't do anything that egregious. I stayed inside the Lambda function, but I look—I poked around at what credentials it had, and it would use CloudWatch to reinvoke itself, and CloudTrail kept recording events in that account. And I won't go into all the details, but it ended up being that you could see credentials being recorded in CloudTrail in that account, and I could, sort of, funnel them out of there.When I found this, I was a little scared, and I don't think I'd reported an issue to AWS before, so I didn't want to go too far and do anything that could be considered malicious. So, I didn't actively seek out other people's credentials.Corey: Yeah, as a general rule, it's best once you discover things like that to do the right thing and report it, not proceed to, you know, inadvertently commit felonies.Aidan: Yeah. Especially because it was my first time, I felt better safe than sorry. So, I didn't see other credentials, but I had no reason to believe that I wouldn't see them if I kept looking. I reported it to Amazon. Their security team was incredibly professional, made me feel very comfortable reporting it, and let me know when, you know, they'd remediated it, which was a matter of days later.But afterwards, it left me feeling a little surprised because I was able to publish about it, and a few people responded, you know, the sorts of people who pay close attention to the industry, but Amazon didn't publish anything as far as I was aware. And it changed the way I felt about AWS security because, like you, I sort of felt that AWS more or less had a pretty perfect track record. They would have advisories about possible [Xen 00:12:04] exploits, and so on. But they'd never published anything about potential for compromise.
And it makes me wonder how many of these things might have been reported in the past where the third-party researcher either didn't end up publishing, or they published and it just disappeared into the blogosphere and I hadn't seen it.Corey: They have a big ‘earn trust' principle over there, and I think that they always focus on the trust portion of it, but I think what got overlooked is the earn. When people are giving you trust that you haven't earned, on some level, the right thing to do is to call it out and be transparent around these things. Yes, I know, Wall Street's going to be annoyed, and headlines, et cetera, et cetera, but I had always had the impression that had there been a cross-account vulnerability or a breach of some sort, they would communicate this and they would have their executives go on a speaking tour about it to explain how defense-in-depth mitigated some of it, and/or lessons learned, and/or what else we can learn. But it turns out that wasn't what was happening at all. And I feel like they have been given trust that was unearned, and now I am not happy with it.I suddenly have a lot more of a, I guess, skeptical position toward them as a result, and I have very little tolerance left for what has previously been a staple of the AWS security discussions, which is an executive getting on stage for a while and droning on about the shared responsibility model with the very strong implication that, “Oh, yeah, we're fine. It's all on your side of the fence that things are going to break.” Yeah, turns out, that's not so true. You just know about the things on your side of the fence in a way that you don't about the things that are on theirs.Aidan: Yeah, it's an interesting one. Like, I think about it and I think, “Well, they never made an explicit promise that they would publish these things,” so, on one hand, I say to myself, “Oh, maybe that's on me for making that assumption.” But, I don't know, I feel like the way we felt was justified. Maybe naive in hindsight, but then, you know, I guess… I'm still not sure how to feel because, like, I think about recent issues and how a couple of AWS Distinguished Engineers jumped on Twitter, and to their credit were extremely proactive in engaging with the community.But is that enough? It might be enough to, say, set my mind at ease or your mind at ease because we are, [laugh] to put it mildly, highly engaged, perhaps a little too engaged in the AWS space, but Twitter's very ephemeral. Very few of AWS's customers—Corey: Yeah, I can't link to tweets by distinguished engineers to present to an executive leadership team as an official statement from Amazon. I just can't.And so the lesson we can take from this is, okay, so, “Well, we never actually said this.” “So, let me get this straight. You're content to basically let people assume whatever they want until they ask you an explicit question around these things. Really? Is that the lesson you want me to take from this? Because I have a whole bunch of very explicit questions that I will be asking you going forward, if that is, in fact, your position. And you are not going to like the fact that I'm asking these questions.”Even if the answer is a hard no, people who did not have this context are going to wonder why people are asking those questions. It's a massive footgun here for them if that is the position that they intend to have. I want to be clear as well; this is also a messaging problem.
It is not, in any way, a condemnation of their excellent folks working on the security implementation themselves. This stuff is hard and those people are all-stars. I want to be very clear on this. It is purely around the messaging and positioning of the security posture.Aidan: Yeah, yeah. That's a good clarification because, like you, my understanding is that the service teams are doing a really stellar, above-average job, industry-wide, and the AWS Security Response Teams, I have absolute faith in them. It is a matter of messaging. And I guess what particularly brings it to front-of-mind is, it was earlier this month, or maybe it was last month, I received an email from a company called Sourcegraph. They do code search.I'm not even a customer of theirs yet, you know? I'm on a free trial, and I got an email that—I'm paraphrasing here—was something to the effect of, we discovered that it was possible for your code to appear in other customers' code search results. It was discovered by one of our own engineers. We found that the circumstances hadn't cropped up, but we wanted to tell you that it was possible. It didn't happen, and we're working on making sure it won't happen again.And I think about how radically different that is, where they didn't have a third-party researcher forcing their hand; they could have very easily swept it under the rug, but they were so proactive that, honestly, that's probably what's going to tip me over the edge into becoming a customer. I mean, other than them having a great product. But yeah, it's a big contrast. It's how I like to see other companies work, especially Amazon.Corey: This episode is sponsored in part by our friends at Sysdig. Sysdig is the solution for securing DevOps. They have a blog post that went up recently about how an insecure AWS Lambda function could be used as a pivot point to get access into your environment. They've also gone in-depth on a bunch of other approaches to how DevOps and security are inextricably linked. To learn more, visit sysdig.com and tell them I sent you. That's S-Y-S-D-I-G dot com. My thanks to them for their continued support of this ridiculous nonsense.Corey: The two companies that I can think of that have had security problems have been CircleCI and Travis CI. Circle had an incredibly transparent early-on blog post, they engaged with customers on the forums, and they did super well. Travis basically denied and stonewalled for ages, and now the only people who use Travis are there because they haven't found a good way to get off of it yet. It is effectively DOA. And I don't think those two things are unrelated.Aidan: Yeah. No, that's a great point. Because you know, I've been in this industry long enough. You have to know that humans write code and humans make mistakes—I know I've made more than my fair share—and I'm not going to write off a company for making a mistake. It's entirely in their response. And yeah, you're right. That's why Circle is still a trustworthy business that should earn people's business, and why I recommend everyone move away from Travis.Corey: Yeah, I like Orca Security as a company and as a product, but at the moment, I am not their customer. I am AWS's customer. So, why the hell am I hearing it from Orca and not AWS when this happens?Aidan: Yeah, yeah. It's… not great. On one hand, I'm glad I'm not in charge of finding a solution to this because I don't have the skills or the expertise to manage that communication.
Because, like I think you said in the past, there are a lot of different audiences that they have to communicate with. They have to communicate with the stock market, they have to communicate with execs, they have to communicate with developers, and each of those audiences demands a different level of detail, a different focus. And it's tricky. And how do you manage that? But, I don't know, I feel like you have an obligation when people place that level of trust in you.Corey: It's just a matter of doing right by your customers, on some level.Aidan: Yeah.Corey: How long have you been working in AWS-side environments? Clearly, this is not like, “Well, it's year two,” because if so, I'm going to feel remarkably behind.Aidan: [laugh]. So, I've been writing code in some capacity or another for 20 years. It took about five years to get anyone to pay me to do so. But yeah, I guess the start of my professional career—and by ‘professional,' I mean it in the strictest term: getting paid money; not that I [laugh] am necessarily a professional—coincided with the launch of AWS. So, I hadn't experienced the before times of data centers, never had to think about Direct Connect, but it means I have been using AWS since sometime in 2008.I was just looking at my bill earlier; I saw that my first bill was for $70. It was—I was using a c1.xlarge, which was 80 cents an hour, and it had eight CPU cores. And to put that in context at the time—Corey: Eight vCPUs, technically I believe.Aidan: And it basically is—Corey: —or were they using the [eCPU 00:20:31] model back then?Aidan: Yeah, no, that was vCPUs. But to me, that was extraordinary. You know, I was somewhere just after high school. It was—the Netflix Prize was around. If you're not sure what that was: Netflix had this open competition where they said anyone who could improve upon their movie recommendation algorithm could win a million dollars.And obviously, being a teenager, I had a massive ego and [laugh] no self-doubt, so I thought I could win this, but I just didn't have enough CPUs or RAM on my laptop. And so when EC2 launched and I could pay 80 cents an hour, rather than signing up for a 12-month contract with a colocation company, it was just a dream come true. I was able to run my terrible algorithms, but I could run them eight times faster. Unfortunately and obviously, I didn't win because it turns out, I'm not a world-class statistician. But—Corey: Common mistake. I make that mistake myself all the time.Aidan: [laugh]. Yeah. I mean, you know, I think I was probably 19 at the time, so I had—my ego did make me think I was one, but it turned out not to be so. But I think what really blew my mind was that me, a nobody, could create an account with Amazon and get access to these incredibly powerful machines for less than a dollar. And so I was hooked.Since then, I've worked at companies that are AWS customers. I've worked at places that have zero EC2 servers, worked at places that have had thousands, and places in between. And it's got to a point, actually, where, I guess, my career is so entwined with AWS that, one, my initials are actually AWS, but also—and this might sound ridiculous, and it's probably just a sign of my privilege—that I wouldn't consider working somewhere that used another cloud. Not—Corey: No, I think that's absolutely the right approach.Aidan: Yeah.Corey: I had a Twitter thread on this somewhat recently, and I'm going to turn it into a blog post because I got some pushback.
If I were looking at doing something and coming into the industry right now, my first choice would be Google Cloud because its developer experience is excellent. But I'm not coming to this without any experience. I have spent a decade or so learning not just how it works, but also how it breaks, understanding the failure modes and what they're going to look like and what it's good at and what it's not. That's the valuable stuff for running things in a serious way.Aidan: Yeah. It's an interesting one. And I mean, for better or worse, AWS is big. I'm sure you will know much better than I do the exact numbers, but if a junior developer came to me and said, “Which cloud should I learn, or should I learn all of them?” I mean, you're right, Google Cloud does have a better developer experience, especially for new developers, but when I think about the sheer number of jobs that are available for developers, I feel like I would be doing them a disservice by not suggesting AWS, at least in Australia. It seems they've got such a huge footprint that you'll always be able to find a job working as an AWS-familiar engineer. It seems like that would be less the case with Google Cloud or Azure.Corey: Again, I am not sitting here suggesting that anyone should, “Oh, clouds are insecure. We're going to run our own stuff in our own data centers.” That is ridiculous in this era. They are still going to do a better job of security than any of us will individually, let's be clear here. And it empowers and unlocks an awful lot of stuff.But I think their privileged position as these hyperscale providers that are the default choice for building things comes with a significant level of responsibility that I am displeased to discover they've been abdicating. And I don't love that.Aidan: Yeah, it's an interesting one, right, because, like you're saying, they have access and expertise that people doing it themselves will never match. So, you know, I'm never going to hesitate to recommend people use AWS on account of security because your company's security posture will almost always be better for using AWS and following their guidelines, and so on. But yeah, like you say, with great power comes significant responsibility to earn trust and retain that trust by admitting and publicizing when mistakes are made.Corey: One last topic I want to get into with you is one that you and I have talked about very briefly elsewhere, that I feel like you and I are both relatively up-to-date on AWS intricacies. I think that we are both better than the average bear at working with the platform. But I know that I feel this way, and I suspect you do too, that VPCs have gotten confusing as hell. Is that just me? Am I a secret moron that no one bothered to ever tell me this, and I should update my own self-awareness?Aidan: [laugh]. Yeah, it's… I mean, that's been the story of my career with AWS. When I started, VPCs didn't exist. It was EC2 Classic—well, I guess at the time, it was just EC2—and it was simple. You launched an instance and you had an IP address.And then along came VPCs, and I think at the time, I thought something to the effect of, “This seems like needless complexity. I'm not going to bother learning this. It will never be relevant.” In the end, that wasn't true. I worked on much larger deployments where VPCs made fantastic sense and made a lot of things possible, but I still didn't go into the weeds.Since then, AWS has announced that EC2 Classic will be retired; an end of an era.
I'm not personally still running anything in EC2 Classic, and I think they've done an incredible job of maintaining support for this long, but VPC complexity has certainly been growing year-on-year since then. I recently was using the AWS console—like we all do and no one ever admits to—to edit a VPC subnet route table. And I clicked the drop-down box for a target, and I was overwhelmed by the number of options. There were NAT gateways, internet gateways, carrier gateways, I think there was a thing called a wavelength gateway, an ENI, and… [laugh] I think I was surprised because I just scrolled through the list, and I thought, “Wow, that is a lot of different options. Why is that?”Especially because it's not so relevant to me. But I realized a big thing AWS has been doing lately is trying to make themselves available to people who haven't used the cloud yet. And they have these complicated networking needs, and it seems like they're trying to—reasonably successfully—make anything possible. But with that comes, you know, additional complexity.Corey: I appreciate that the capacity is there, but there has to be an abstraction model for getting rid of some of this complexity because otherwise, the failure mode is you wind up with this amazingly capable thing that can build marvels, but you also need to basically have a PhD in some of these things to wind up tying it all together. And if you bring someone else in to do it, then you have no idea how to run the thing. You're effectively a golden retriever trying to fly a space shuttle.Aidan: Yeah. It's interesting, like, clearly, they must be acutely aware of this because they have default VPCs, and for many use cases, that's all people should need. But as soon as you want, say, a private subnet, then you need to either modify that default VPC or create a new one, and it's sort of going from 0 to 100 complexity extremely quickly because, you know, you need to create route tables to everyone's favorite NAT gateways, and it feels like the on-ramp needs to be not so steep. Not sure what the solution is; I hope they find one.Corey: As do I. I really want to thank you for taking the time to speak with me about so many of these things. If people want to learn more about what you're up to, where's the best place to find you?Aidan: Twitter's the best place. On Twitter, my username is @__Steele, which is S-T-E-E-L-E. From there, I'll at least speculate on the latest releases or link to some of the silly things I put on GitHub. Sometimes they're not-so-silly things. But yeah, that's where I can be found. And I'd love to chat to anyone about AWS. It's something I can geek out about all day, every day.Corey: And we will certainly include links to that in the [show notes 00:29:50]. Thank you so much for taking the time to speak with me today. I really appreciate it.Aidan: Well, thank you so much for having me. It's been an absolute delight.Corey: Aidan Steele, serverless engineer at Stedi, and shit poster extraordinaire. I'm Cloud Economist Corey Quinn and this is Screaming in the Cloud.
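A footnote on that VPC tangent: the jump Aidan describes from a default VPC to a private subnet comes down to hand-wiring route tables. Here is a hedged boto3 sketch; all IDs are hypothetical, and the NAT gateway and subnet are assumed to already exist.

```python
# Sketch of the route-table plumbing a private subnet needs.
import boto3

ec2 = boto3.client("ec2")

# A private subnet gets its own route table...
rt = ec2.create_route_table(VpcId="vpc-0123456789abcdef0")
rt_id = rt["RouteTable"]["RouteTableId"]

# ...whose default route points at a NAT gateway rather than an internet gateway...
ec2.create_route(
    RouteTableId=rt_id,
    DestinationCidrBlock="0.0.0.0/0",
    NatGatewayId="nat-0123456789abcdef0",
)

# ...and which must be explicitly associated with the subnet.
ec2.associate_route_table(RouteTableId=rt_id, SubnetId="subnet-0123456789abcdef0")
```

Multiply that by subnets and availability zones, plus the ever-growing list of gateway types in that console drop-down, and the steep on-ramp he mentions is easy to see.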
If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice along with an immediate request to correct the record about what I'm not fully understanding about AWS's piss-weak security communications.Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.Announcer: This has been a HumblePod production. Stay humble.
In this episode, Mike and Ken talk about the magic of software-defined things and how skill crossover is becoming a thing of the future. Maybe history is repeating itself. Whether it's endpoint detection and response, physical security, disaster recovery, networks, or a firewall, it seems like everything has a software-defined equivalent. Developers and Application Security engineers are being called on more and more to know things they didn't have to know even five years ago. The team digs into this topic by looking at it through two lenses: what skills engineers need, and how software brings its own set of pros and cons to cloud and modern infrastructure.
Links: S3 Bucket Negligence Award: http://saharareporters.com/2022/01/10/exclusive-hacker-breaks-nimc-server-steals-over-three-million-national-identity-numbers Anyone in a VPC, any VPC, anywhere: https://Twitter.com/santosh_ankr/status/1481387630973493251 A disgruntled developer corrupts their own NPM libs ‘colors' and ‘faker', breaking thousands of apps: https://www.bleepingcomputer.com/news/security/dev-corrupts-npm-libs-colors-and-faker-breaking-thousands-of-apps/ “Top ten security best practices for securing backups in AWS”: https://aws.amazon.com/blogs/security/top-10-security-best-practices-for-securing-backups-in-aws/ Glue: https://aws.amazon.com/security/security-bulletins/AWS-2022-002/ CloudFormation: https://aws.amazon.com/security/security-bulletins/AWS-2022-001/ S3-credentials: https://simonwillison.net/2022/Jan/18/weeknotes/ Transcript: Corey: This is the AWS Morning Brief: Security Edition. AWS is fond of saying security is job zero. That means it's nobody in particular's job, which means it falls to the rest of us. Just the news you need to know, none of the fluff.Corey: This episode is sponsored in part by my friends at Thinkst Canary. Most companies find out way too late that they've been breached. Thinkst Canary changes this and I love how they do it. Deploy canaries and canary tokens in minutes, and then forget about them. What's great is then attackers tip their hand by touching them, giving you one alert, when it matters. I use it myself and I only remember this when I get the weekly update with a, “We're still here, so you're aware,” from them. It's glorious. There is zero admin overhead to this, there are effectively no false positives unless I do something foolish. Canaries are deployed and loved on all seven continents. You can check out what people are saying at canary.love. And, their Kube config canary token is new and completely free as well. You can do an awful lot without paying them a dime, which is one of the things I love about them. It is useful stuff and not a, “Oh, I wish I had money.” It is spectacular. Take a look. That's canary.love because it's genuinely rare to find a security product that people talk about in terms of love. It really is a neat thing to see.Canary.love. Thank you to Thinkst Canary for their support of my ridiculous, ridiculous nonsense.Corey: So, yesterday's episode put the boots to AWS, not so much for the issues that Orca Security uncovered, but rather for its poor communication around the topic. Now that that's done, let's look at the more mundane news from last week's cloud world. Every day is a new page around here, full of opportunity and possibility in equal measure.This week's S3 Bucket Negligence Award goes to the Nigerian government for exposing millions of their citizens to a third party who most assuredly did not follow coordinated disclosure guidelines. Whoops.There's an interesting tweet—the situation is still unfolding at the time of this writing—but it looks like making an API Gateway ‘Private' doesn't mean, “To your VPCs,” but rather, “To anyone in a VPC, any VPC, anywhere.” This is evocative of the way that, “Any Authenticated AWS User,” for S3 buckets caused massive permissions issues industry-wide.And a periodic and growing concern is one of software supply chain—which is a fancy way of saying, “We're all built on giant dependency chains”—what happens when, say, a disgruntled developer corrupts their own NPM libs ‘colors' and ‘faker', breaking thousands of apps across the industry, including some of the AWS SDKs?
How do we manage that risk? How do we keep developers gruntled?Corey: Are you building cloud applications with a distributed team? Check out Teleport, an open-source identity-aware access proxy for cloud resources. Teleport provides secure access for anything running somewhere behind NAT: SSH servers, Kubernetes clusters, internal web apps, and databases. Teleport gives engineers superpowers.Get access to everything via single sign-on with multi-factor, list and see all of the SSH servers, Kubernetes clusters, or databases available to you in one place, and get instant access to them using tools you already have. Teleport ensures best security practices like role-based access, preventing data exfiltration, providing visibility, and ensuring compliance. And best of all, Teleport is open-source and a pleasure to use. Download Teleport at goteleport.com. That's goteleport.com.AWS had a couple of interesting things. The first is “Top ten security best practices for securing backups in AWS”. People really don't consider the security implications of their backups anywhere near seriously enough. It's not ‘live' but it's still got—by definition—a full set of your data just waiting to be harvested by nefarious types. Be careful with that.And of course, AWS had two security bulletins, one about its Glue issues, one about its CloudFormation issues. The former allowed cross-account access to other tenants. In theory. In practice, AWS did the responsible thing and kept every access event logged, going back for the full five years of the service's life. That's remarkably impressive.And lastly, I found an interesting tool called S3-credentials last week, and what it does is it helps generate tightly-scoped IAM policies that were previously limited to a single S3 bucket, but now are limited to a single prefix within that bucket. You can also make those credential sets incredibly short-lived. More things like this, please. I just tend to over-scope things way too much. And that's what happened Last Week in AWS: Security. Please feel free to reach out and tell me exactly what my problem is.Corey: Thank you for listening to the AWS Morning Brief: Security Edition with the latest in AWS security that actually matters. Please follow AWS Morning Brief on Apple Podcasts, Spotify, Overcast—or wherever the hell it is you find the dulcet tones of my voice—and be sure to sign up for the Last Week in AWS newsletter at lastweekinaws.com.Announcer: This has been a HumblePod production. Stay humble.
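(To make that last item concrete: below is a minimal sketch of a prefix-scoped policy in the spirit of what the S3-credentials tool generates. The bucket and prefix are hypothetical, and the tool's actual output may differ. The wrinkle worth knowing: s3:ListBucket is a bucket-level action, so it has to be constrained with an s3:prefix condition, while object-level actions are constrained through the object ARN.)

import json

BUCKET = "my-bucket"        # hypothetical bucket name
PREFIX = "reports/2022/"    # only keys under this prefix are in scope

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            # Listing applies to the bucket itself, so scope it by prefix.
            "Effect": "Allow",
            "Action": "s3:ListBucket",
            "Resource": f"arn:aws:s3:::{BUCKET}",
            "Condition": {"StringLike": {"s3:prefix": [f"{PREFIX}*"]}},
        },
        {
            # Object operations are scoped through the ARN path.
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:PutObject"],
            "Resource": f"arn:aws:s3:::{BUCKET}/{PREFIX}*",
        },
    ],
}
print(json.dumps(policy, indent=2))

Attach a policy like that to a role you assume with a short session duration and you get the incredibly short-lived, tightly-scoped credential sets described above.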
About Pete: Pete does many startup things at Allma. Links: Last Tweet in AWS: https://lasttweetinaws.com Twitter: https://twitter.com/petecheslock LinkedIn: https://www.linkedin.com/in/petecheslock/ Transcript: Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.Corey: This episode is sponsored in part by LaunchDarkly. Take a look at what it takes to get your code into production. I'm going to just guess that it's awful because it's always awful. No one loves their deployment process. What if launching new features didn't require you to do a full-on code and possibly infrastructure deploy? What if you could test on a small subset of users and then roll it back immediately if results aren't what you expect? LaunchDarkly does exactly this. To learn more, visit launchdarkly.com and tell them Corey sent you, and watch for the wince.Corey: This episode is sponsored in part by our friends at Redis, the company behind the incredibly popular open source database that is not the BIND DNS server. If you're tired of managing open source Redis on your own, or you're using one of the vanilla cloud caching services, these folks have you covered with the go-to managed Redis service for global caching and primary database capabilities: Redis Enterprise. To learn more and deploy not only a cache but a single operational data platform for one Redis experience, visit redis.com/hero. That's r-e-d-i-s.com/hero. And my thanks to my friends at Redis for sponsoring my ridiculous nonsense. Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. I am joined—as is tradition for a post-re:Invent wrap-up, a month or so later, once everything has had time to settle—by my friend and yours, Pete Cheslock. Pete, how are you?Pete: Hi, I'm doing fantastic. New year; new me. That's what I'm going with.Corey: That's the problem. I keep hoping for that, but every time I turn around, it's still me. And you know, honestly, I wouldn't wish that on anyone.Pete: Exactly. [laugh]. I wouldn't wish you on me either. But somehow I keep coming back for this.Corey: So, in two-thousand twenty—or twenty-twenty, as the children say—re:Invent was fully virtual. And that felt weird. Then re:Invent 2021 was a hybrid event which, let's be serious here, is not really those things. They had a crappy online thing and then a differently crappy thing in person. But it didn't feel real to me because you weren't there.That is part of the re:Invent tradition. There's a midnight madness thing, there's a keynote where they announce a bunch of nonsense, and then Pete and I go and have brunch on the last day of re:Invent and decompress, and more or less talk smack about everything that crosses our minds. And you weren't there this year. I had to backfill you with Tim Banks. You know, the person that I backfilled you with here at The Duckbill Group as a principal cloud economist.Pete: You know, you got a great upgrade in hot takes, I feel like, with Tim.Corey: And in other ways, too, but it's rude of me to say that to you directly. So yeah, his hot takes are spectacular. He was going to be doing this with me, except you cannot mess with tradition. You really can't.Pete: Yeah. I'm trying to think how many—is this the third year?
It's at least three.Corey: Third or fourth.Pete: Yeah, it's at least three. Yeah, it was, I don't want to say I was sad to not be there because, with everything going on, it's still weird out there. But I am always—I'm just that weird person who actually likes re:Invent, but not for, I feel like, the reasons people think. Again, I'm such an extroverted-type person that it's so great to have this, like, serendipity to re:Invent. The people that you run into and the conversations that you have, and prior—like in 2019, I think was a great example because that was the last one I had gone to—you know, having so many conversations so quickly because everyone is there, right? It's like this magnet that attracts technologists, and venture capital, and product builders, and all this other stuff. And it's all compressed into, like, you know, that five-day span, I think is the biggest part that makes it so great.Corey: The fear in people's eyes when they see me. And it was fun; I had a pair of masks with me. One of them was a standard mask, and no one recognizes anyone because, masks, and the other was a printout of my ridiculous face, which was horrifyingly uncanny, but also made it very easy for people to identify me. And depending upon how social I was feeling, I would wear one or the other, and it worked flawlessly. That was worth doing. They really managed to thread the needle, as well, before Omicron hit, but after the horrors of last year. So, [unintelligible 00:03:00]—Pete: It really—Corey: —if it were going on right now, it would not be going on right now.Pete: Yeah. I talk about really—yeah—really just hitting it timing-wise. Like, not that they could have planned for any of this, but like, as things were kind of not too crazy and before they got all crazy again, it feels like wow, like, you know, they really couldn't have done the event at any other time. And it's like, purely due to luck. I mean, absolute one hundred percent.Corey: That's the amazing power of frugality. Because the reason is, it's the week after Thanksgiving every year when everything is dirt cheap. And, you know, if there's one thing that I one-point-seve—sorry, their stock's in the toilet—a $1.6 trillion company is very concerned about, it is saving money at every opportunity.Pete: Well, the one thing that was most curious about—so I was at the first re:Invent in—what—2012 I think it was, and there was—it was quaint, right?—there were 4,000 people there, I want to say. It was in the thousands of people. Now granted, still a big conference, but it was in the Sands Convention Center. It was in that giant room, the same number of people, where, you know, people's booths were like tables, like, eight-by-ten tables, right? [laugh].It had almost a DevOpsDays feel to it. And I was kind of curious if this one had any of those feelings. Like, did it evoke it being more quaint and personable, or was it just as soulless as it probably has been in recent years?
Now I could sit alone at a hotel and do some catch-up and all the rest, but, all right, I'd kind of like to go home. I'm not used to being on the road that much.Pete: Yeah, I think we're all a little bit out of practice. You know, I haven't been on a plane in years. I mean, the travel I've done more recently has been in my car from point A to point B. Like, direct, you know, thing. Actually, a good friend of mine who's not in technology at all had to travel for business, and, you know, he also has young kids who are under five, so when he got back, he actually hid in a room in their house and quarantined himself in the room. But they—I thought, this is kind of funny—they never told the kids he was home. Because they knew that like—Corey: So, they just thought the house was haunted?Pete: [laugh].Corey: Like, “Don't go in the west wing,” sort of level of nonsense. That is kind of amazing.Pete: Honestly, like, we were hanging out with the family because they're our neighbors. And it was like, “Oh, yeah, like, he's in the guest room right now.” Kids have no idea. [laugh]. I'm like, “Oh, my God.” I'm like, I can't even imagine. Yeah.Corey: So, let's talk a little bit about the releases of re:Invent. And I'm going to lead up with something that may seem uncharitable, but I don't think it necessarily is. There wasn't the usual torrent of new releases for ridiculous nonsense in the same way that there have been previously. There was no, this service talks to satellites in space. I mean, sure, there was some IoT stuff to manage fleets of cars, and giant piles of robots, and cool, I don't have those particular problems; I'm trying to run a website over here.So okay, great. There were enhancements to a number of different services that were in many cases appreciated, in other cases, irrelevant. Werner said in his keynote that it was about focusing on primitives this year. And, “Why do we have so many services? It's because you asked for it… as customers.”Pete: [laugh]. Yeah, you asked for it.Corey: What have you been asking for, Pete? Because I know what I've been asking for and it wasn't that. [laugh].Pete: It's amazing to see a company continually say yes to everything, and somehow, despite their best efforts, be successful at doing it. No other company could do that. Imagine any other software technology business out there that just builds everything the customers ask for. Like from a product management business standpoint, that is, like, rule 101 is, “Listen to your customers, but don't say yes to everything.” Like, you can't do everything.Corey: Most companies can't navigate the transition between offering the same software in the Cloud and in a customer's facility. So, it's like, “Ooh, an on-prem version, I don't know, that almost broke the company the last time we tried it.” Whereas you have Amazon whose product strategy is, “Yes,” being able to put together a whole bunch of things. I also will challenge the assertion that it's the primitives that customers want. They don't want to build a data center out of popsicle sticks themselves. They want to get something that solves a problem.And this has been a long-term realization for me. I used to work at Media Temple as a senior systems engineer running WordPress at extremely large scale. My websites now run on WordPress, and I have the good sense to pay WP Engine to handle it for me, instead of doing it myself because it's not the most productive use of my time. I want things higher up the stack.
I assure you I pay more to WP Engine than it would cost me to run these things myself from an infrastructure point of view, but not in terms of my time.What I see sometimes as the worst of all worlds is that AWS is trying to charge for that value-added pricing without adding the value that goes along with it because you've still got to build a lot of this stuff yourself. It's still a very janky experience, you're reduced to googling random blog posts to figure out how this thing is supposed to work, and the best documentation comes from external sources. Whereas with a company that's built around offering solutions like this, great. In the fullness of time, I really suspect that if this doesn't change, their customers are going to just be those people who build solutions out of these things. And let those companies capture the up-the-stack margin. Which I have no problem with. But they do because Amazon is a company that lies awake at night actively worrying that someone, somewhere, who isn't them might possibly be making money somehow.Pete: I think MongoDB is a perfect example of—like, look at their stock price over the last whatever, years. Like, they, I feel like everyone called for the death of MongoDB every time Amazon came out with their new things, yet, they're still a multi-billion dollar company because I can just—give me an API endpoint and you scale the database. Theirs is—Corey: Look at all the high-profile hires that Mongo was making out of AWS, and I can't shake the feeling they're sitting there going, “Yeah, who's losing important things out of production now?” It's like everyone is exodus-ing there. I did one of those ridiculous graphics naming all the people that went over there, with the hurricane evacuation traffic picture, and there's one car going the other way that I just labeled with, “Re:Invent sponsorship check,” because yeah, they have a top tier sponsorship and it was great. I've got to say I've been pretty down on MongoDB for a while, for a variety of excellent reasons based upon, more or less, how they treated customers who were in pain. And I'd mostly written it off.I don't do that anymore. Not because I inherently believe the technology has changed, though I'm told it has, but because of the number of people who I deeply respect who are going over there and telling me, no, no, this is good. Congratulations. I have often said you cannot buy authenticity, and I don't think that they are, but the people who are working there, I do not believe that these people are, “Yeah, well, you bought my opinion.” You can buy their attention, not their opinion. If someone changes their opinion based upon where they work, I kind of question everything they're telling me. It's, like, “Oh, you're just here to sell something you don't believe in? Welcome aboard.”Pete: Right. Yeah, there's an interview question I like to ask, which is, “What's something that you used to believe in very strongly that you've more recently changed your mind on?” And out of politeness, because it usually throws people back a little bit and they're like, “Oh, wow. Like, let me think about that,” I'm like, “Okay, while you think about that I want to give you mine.”Which is: in the past, my strongly held belief was we had to run everything ourselves. “You own your availability,” was the line. “No, I'm not buying Datadog.
I can build my own metric stack just fine, thank you very much.” Like, “No, I'm not going to use these outsourced load balancers or databases because I need to own my availability.”And what I realized is that all of those decisions lead to actually delivering and focusing on things that were not the core product. And so now, like, I've really flipped 180, that, if any—anything that you're building that does not directly relate to the core product, i.e., how your business makes money, should one hundred percent be outsourced to an expert that is better than you. Mongo knows how to run Mongo better than you.Corey: “What does your company do?” “Oh, we handle expense reports.” “Oh, what are you working on this month?” “I'm building a load balancer.” It's like that doesn't add the value. Don't do that.Pete: Right. Exactly. And so it's so interesting, I think, to hear Werner say that, you know, we're just building primitives, and you asked for this. And I think that concept maybe would have worked years ago, when you had a lot of builders who needed tools, but I don't think we have any, like, we don't have as many builders as before. Like, I think we have people who need more complete solutions. And that's probably why all these businesses are being super successful against Amazon.Corey: I'm wondering if it comes down to a cloud economic story, specifically that my cloud bill is always going to be variable and it's difficult to predict, whereas if I just use EC2 instances, and I build load balancers or whatnot, myself, well, yeah, it's a lot more work, but I can predict what my staff compensation costs are more accurately than I can predict what a CapEx charge would be or what the AWS bill is going to be. I'm wondering if that might in some way shape it?Pete: Well, I feel like the way people get better at managing their costs, right, is you'll eventually move to a world where, like, “Yep, okay, first, we turned off waste,” right? Like, step one is waste. Step two is, like, understanding your spend better to optimize but, like, step three, like, the galaxy brain meme of Amazon cost stuff is all, like, unit economics stuff, where you're trying to better understand the actual cost to deliver an actual feature. And yeah, I think that actually gets really hard when you give—kind of spread your product across, like, a slew of services that have varying levels of costs, varying levels of tagging, so you can attribute it. Like, it's really hard. Honestly, it's pretty easy if I have 1000 EC2 servers with very specific tags, I can very easily figure out what it costs to deliver product. But if I have—Corey: Yeah, if I have Corey build it, I know what Corey is going to cost, and I know how many servers he's going to use. Great, if I have Pete build it, Pete's good at things, it'll cut that server bill in half because he actually knows how to wind up being efficient with things. Okay, great. You can start calculating things out that way. I don't think that's an intentional choice that companies are making, but I feel like that might be a natural outgrowth of it.Pete: Yeah. And there's still I think a lot of the, like, old school mentality of, like, the, “Not invented here,” the, “We have to own our availability.” You can still own your availability by using these other vendors. And honestly, it's really heartening to see so many companies realize that and realize that I don't need to get everything from Amazon.
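(A rough sketch of what that galaxy-brain unit-economics step looks like in practice, assuming you have activated a cost allocation tag; the key "product" below is a hypothetical stand-in for whatever your tagging scheme uses.)

import boto3

ce = boto3.client("ce")  # Cost Explorer

resp = ce.get_cost_and_usage(
    TimePeriod={"Start": "2022-01-01", "End": "2022-02-01"},
    Granularity="MONTHLY",
    Metrics=["UnblendedCost"],
    # "product" is a hypothetical cost allocation tag key.
    GroupBy=[{"Type": "TAG", "Key": "product"}],
)

for group in resp["ResultsByTime"][0]["Groups"]:
    tag = group["Keys"][0]  # e.g. 'product$checkout'; an empty value means untagged spend
    cost = float(group["Metrics"]["UnblendedCost"]["Amount"])
    print(f"{tag}: ${cost:,.2f}")

The catch, per Pete's point, is that this only works as well as the tagging underneath it.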
And honestly, like, in some things, like I look at a cloud Amazon bill, and I think to myself, it would be easier if you just did everything from Amazon versus having these ten other vendors, but those ten other vendors are going to be a lot better at running the product that they build, right, as a service, than you probably will be running it yourself. Or even Amazon's, like, you know, interpretation of that product.Corey: A few other things that came out that I thought were interesting, at least the direction they're going in. The changes to S3 intelligent tiering are great, with instant retrieval on Glacier. I feel like that honestly was—they talk a good story, but I feel like that was a competitive response to Google offering the same thing. That smacks of a large company with its use case saying, “You got two choices here.” And they're like, “Well, okay. Crap. We're going to build it then.”Or alternately, they're looking at the changes that they're making to intelligent tiering, they're now shifting that to being the default as far as recommendations go. There are a couple of drawbacks to it, but not many, and it's getting easier now to not have the mental overhead of trying to figure out exactly what your lifecycle policies are. Yeah, there are some corner cases where, okay, if I adjust this just so, then I could save 10% on that monitoring fee or whatnot. Yeah, but look how much work that's going to take you to curate and make sure that you're not doing something silly. That feels like it is such an in-the-margins issue. It's like, “How much data are you storing?” “Four exabytes.” Okay, yeah. You probably want some people doing exactly that, but that's not most of us.Pete: Right. Well, there's absolutely savings to be had. Like, if I had an exabyte of data on S3—which there are a lot of people who have that level of data—then it would make sense for me to have an engineering team whose sole purpose is purely optimizing our data lifecycle for that data. Until a point, right? Until you've optimized the 80%, basically. You optimize the first 80, that's probably, air-quote, “Easy.” The last 20 is going to be incredibly hard, maybe you never even do that.But at lower levels of scale, I don't think the economics actually work out to have a team managing your data lifecycle on S3. But the fact that now AWS can largely do it for you in the background—now, there's so many things you have to think about and, like, you know, understand even what your data is there because, like, not all data is the same. And since S3 is basically like a big giant database you can query, you got to really think about some of that stuff. But honestly, what I—I don't know if—I have no idea if this is even being worked on, but what I would love to see—you know, hashtag #AWSwishlist—is, now we have countless tiers of EBS volumes, EBS volumes that can be dynamically modified without touching, you know, the physical host. Meaning with an API call, you can change from gp2 to gp3, or io-whatever, right?Corey: Or back again if it doesn't pan out.Pete: Or back again, right? And so for companies with large amounts of spend, you know, the economics make sense that you should have a team that is analyzing your volume usage and modifying that daily, right? Like, you could modify that daily, and I don't know if there's anyone out there that's actually doing it at that level. And they probably should.
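(Pete's "API call" is literal: EC2's ModifyVolume changes a volume's type in place, with no detach and no downtime. Here is a minimal sketch of the daily sweep he is imagining. Note that AWS generally allows only one modification per volume every six hours, and gp3 is not automatically right for every workload, so treat this as an illustration rather than a recommendation.)

import boto3

ec2 = boto3.client("ec2")

# Find every gp2 volume and request an in-place retype to gp3.
paginator = ec2.get_paginator("describe_volumes")
for page in paginator.paginate(Filters=[{"Name": "volume-type", "Values": ["gp2"]}]):
    for volume in page["Volumes"]:
        ec2.modify_volume(VolumeId=volume["VolumeId"], VolumeType="gp3")
        print(f"requested gp2 -> gp3 for {volume['VolumeId']}")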
Like, if you got millions of dollars in EBS, like, there's legit savings that you're probably leaving on the table without doing that. But that's what I'm waiting for Amazon to do for me, right? I want intelligent tiering for EBS because if you're telling me I can API call and you'll move my data and make that better, make that [crosstalk 00:17:46] better [crosstalk 00:17:47]—Corey: Yeah, it could be like their auto-scaling for DynamoDB, for example. Gives you the capacity you need 20 minutes after you needed it. But fine, whatever because if I can schedule stuff like that, great, I know what time of day the runs are going to kick off that beat up the disks. I know when end-of-month reporting fires off. I know what my usage pattern is going to be, by and large.Yeah, part of the problem too, is that I look at this stuff, and I get excited about it with the intelligent tiering… at The Duckbill Group we've got a few hundred S3 buckets lurking around. I'm thinking, “All right, I've got to go through and do some changes on this and implement all of that.” Our S3 bill's something like 50 bucks a month or something ridiculous like that. It's a—no, that really isn't a thing. Like, I have a screenshot bucket that I have an app installed—I think called Dropshare—that hooks up to—anytime I drag—I hit a shortcut, I drag with the mouse to select whatever I want and boom, it's up there and the URL is now copied to my clipboard, I can paste that wherever I want.And I'm thinking like, yeah, there's no cleanup on that. There's no lifecycle policy that's aging anything out. I should really go back and age some of it out and do the rest and start doing some lifecycle management. It—I've been using this thing for years and I think it's now a whopping, what, 20 cents a month for that bucket. It's—I just don't—Pete: [laugh].Corey: —I just don't care, other than the voice in the back of my mind, “That's an unbounded growth problem.” Cool. When it hits 20 bucks a month, then I'll consider it. But until then I just don't. It does not matter.Pete: Yeah, I think yeah, scale changes everything. Start adding some zeros and percentages turn into meaningful numbers. And honestly, back on the EBS thing, the one thing that really changed my perspective of EBS, in general, is—especially coming from the early days, right? One terabyte volume, it was a hard drive in a thing. It was a virtual LUN on a SAN somewhere, probably.Nowadays, and even, like, many years after those original EBS volumes, like all the limits you get in EBS, those are actually artificial limits, right? If you're like, “My EBS volume is too slow,” it's not because, like, the hard drive it's on is too slow. That's an artificial limit that is likely put in place due to your volume choice. And so, like, once you realize that in your head, then your concept of how you store data on EBS should change dramatically.Corey: Oh, AWS had a blog post recently talking about, like, with io2 and the limits and everything, and there was architecture thinking of, okay, “So, let's say this is insufficient and the quarter-million IOPS a second that you're able to get is not there.” And I'm sitting there thinking, “That is just a ludicrous data volume and data interactivity model.” And it's one of those, like, I'm sitting here trying to think about, like, I haven't had to deal with a problem like that in a decade, just because it's, “Huh.
Turns out getting this one thing that's super fast is kind of expensive.” If you parallelize it out, that's usually the right answer, and that's how the internet has mostly evolved. But there are use cases for which that doesn't work, and I'm excited to see it. I don't want to pay for it in my view, but it's nice to see it.Pete: Yeah, it's kind of fun to go into the Amazon calculator and price out one of the, like, io2 volumes and, like, maxed out. It's like, I don't know, like $50,000 a month or a hun—like, it's some just absolutely absurd number. But the beauty of it is that if you needed that value for an hour to run some intensive data processing task, you can have it for an hour and then just kill it when you're done, right? Like, that is what is most impressive.Corey: I copied 130 gigs of data to an EFS volume, which was—[unintelligible 00:21:05] EFS has gone from “This is a piece of junk,” to one of my favorite services. It really is, just because of its utility and different ways of doing things. I didn't have the foresight to just use a second EFS volume for this. So, I was unzipping a whole bunch of small files onto it. Great.It took a long time for me to go through it. All right, now that I'm done with that I want to clean all this up. My answer was to ultimately spin up a compute node and wind up running a whole bunch of—like, 400 simultaneous rm -rf runs on that thing. And it was just, like, this feels foolish and dumb, but here we are. And I'm looking at the stats on it because the instance was—all right, at that point, the load average [on the instance 00:21:41] was like 200, or something like that, and the EFS volume was like, “Ohh, wow, you're really churning on this. I'm now at, like, 5% of the limit.” Like, okay, great. It turns out I'm really bad at computers.Pete: Yeah, well, that's really the trick is, like, yeah, sure, you can have a quarter-million IOPS per second, but, like, what's going to break before you even hit that limit? Probably many other things.Corey: Oh, yeah. Like, feels like on some level if something gets to that point, it's a misconfiguration somewhere. But honestly, that's the thing I find weirdest about the world in which we live is that at a small scale—if I have a bill in my $5 a month shitposting account, great. If I screw something up and cost myself a couple hundred bucks in misconfiguration it's going to stand out. At large scale, it doesn't matter if—you're spending $50 million a year or $500 million a year on AWS and someone leaks your creds, and someone spins up a whole bunch of Bitcoin miners somewhere else, you're not going to see that on your bill until they're mining basically all the Bitcoin. It just gets lost in the background.Pete: I'm waiting for those—I'm actually waiting for the next level of them to get smarter because maybe you have, like, an aggressive tagging system and you're monitoring for untagged instances, but the move here would be, first get the creds and query for, like, the most used tags and start applying those tags to your Bitcoin mining instances. My God, it'll take—Corey: Just clone a bunch of tags. Congratulations, you now have a second BI Elasticsearch cluster that you're running yourself. Good work.Pete: Yeah. Yeah, people won't find that until someone comes along after the fact. Like, “Why do we have two of these things?” And you're like—[laugh].Corey: “Must be a DR thing.”Pete: It's maxed-out CPU.
Yeah, exactly.Corey: [laugh].Pete: Oh, the terrible ideas—please, please, hackers, don't take our terrible ideas.Corey: I had a, kind of, whole thing I did on Twitter years ago, talking about how I would wind up using the AWS Marketplace for an embezzlement scheme. Namely, I would just wind up spinning up something that had, like, a five-cent an hour charge or whatnot on just, like, basically a rebadge of the CentOS Community AMI or whatnot. Great. And then write a blog post, not attached to me, that explains how to do a thing that I'm going to be doing in production in a week or two anyway. Like, “How to build an auto-scaling group,” and reference that AMI.Then if it ever comes out, like, “Wow, why are we having all these marketplace charges on this?” “I just followed the blog post like it said here.” And it's like, “Oh, okay. You're a dumbass. The end.”That's the way to do it. A month goes by and suddenly it came out that someone had done something similar. They wound up rebadging these community things on the marketplace and charging big money for it, and I'm sitting there going like that was a joke. It wasn't a how-to. But yeah, every time I make these jokes, I worry someone's going to do it.Pete: “Welcome to large-scale fraud with Corey Quinn.”Corey: Oh, yeah, it's—fraud at scale is really the important thing here.Corey: This episode is sponsored by our friends at Oracle. HeatWave is a new high-performance accelerator for the Oracle MySQL Database Service. Although I insist on calling it “my squirrel.” While MySQL has long been the world's most popular open source database, shifting from transacting to analytics required way too much overhead and, ya know, work. With HeatWave you can run your OLTP and OLAP—don't ask me to ever say those acronyms again—workloads directly from your MySQL database and eliminate the time-consuming data movement and integration work, while also performing 1100X faster than Amazon Aurora, and 2.5X faster than Amazon Redshift, at a third of the cost. My thanks again to Oracle Cloud for sponsoring my ridiculous nonsense.Corey: I still remember a year ago now—re:Invent 2021 was it, or was it 2020? Whatever they came out with, I want to say it wasn't gp3, or maybe it was, regardless, there was a new EBS volume type that came out that you were playing with to see how it worked and you experimented with it—Pete: Oh, yes.Corey: —and the next morning, you looked at the—I checked Slack and you're like, well, my experiments yesterday cost us $5,000. And at first, like, the—my response is instructive on this because, first, it was, “Oh, my God. What's going to happen now?” And it's like, first, hang on a second.First off, that seems suspect but assume it's real. I assumed it was real at the outset. It's “Oh, right. This is not my personal $5-a-month toybox account. We are a company; we can absolutely pay that.” Because it's like, I could absolutely reach out, call in a favor. “I made a mistake, and I need a favor on the bill, please,” to AWS.And I would never live it down, let's be clear. For a $7,000 mistake, I would almost certainly eat it. As opposed to having to prostrate myself like that in front of Amazon. I'm like, no, no, no. I want one of those like—if it's like, “Okay, you're going to, like, set back the company roadmap by six months if you have to pay this. Do you want to do it?” Like, [groans] “Fine, I'll eat some crow.”But okay. And then followed immediately by, wow, if Pete of all people can mess this up, customers are going to be doomed here.
We should figure out what happened. And I'm doing the math. Like, Pete, “What did you actually do?” And you're sitting there and you're saying, “Well, I had like a 20 gig volume that I did this.” And I'm doing the numbers, and it's like—Pete: Something's wrong.Corey: “How sure are you when you say ‘gigabyte,' that you were—that actually means what you think it did? Like, were you off by a lot? Like, did you mean exabytes?” Like, what's the deal here?Pete: Like, multiple factors.Corey: Yeah. How much—“How many IOPS did you give that thing, buddy?” And it turned out what happened was that when they launched this, they had mispriced it in the system by a factor of a million. So, it was fun. I think by the end of it, all of your experimentation was somewhere between five and seven cents. Which—Pete: Yeah. It was a—Corey: Which is why you don't work here anymore because no one costs me seven cents of money to give to Amazon—Pete: How dare you?Corey: —on my watch. Get out.Pete: How dare you, sir?Corey: Exactly.Pete: Yeah, that [laugh] was amazing to see, as someone who has done—definitely made screw-ups that have cost real money—you know, S3 list requests are always a fun one at scale—but that one was supremely fun to see the—Corey: That was a scary one because another one they'd done previously was they had messed up Lightsail pricing, where people would log in, and, like, “Okay, so what is my Lightsail instance going to cost?” And I swear to you, this is true, it was saying—this was back in 2017 or so—the answer was, like, “$4.3 billion.” Because when you see that you just start laughing because you know it's a mistake. You know that they're not going to actually demand that you spend $4.3 billion for a single instance—unless it's running SAP—and great.It's just, it's a laugh. It's clearly a mispricing, and it's clearly a bug that's going to get—it's going to get fixed. I just spun up this new EBS volume that no one fully understands yet and it cost me thousands of dollars. That's the sort of thing that no, no, I could actually see that happening. There are instances now that cost something like 100 bucks an hour or whatnot to run. I can see spinning up the wrong thing by mistake and getting bitten by it. There's a bunch of fun configuration mistakes you can make that will, “Hee, hee, hee. Why can I see that bill spike from orbit?” And that's the scary thing.Pete: Well, it's the original CI and CD problem of the per-hour billing, right? That was super common of, like, yeah, like, an i3, you know, 16XL server is pretty cheap per hour, but if you're charged per hour and you spin up a bunch for five minutes. Like, it—you will be shocked [laugh] by what you see there. So—Corey: Yeah. Mistakes will show. And I get it. It's also people as individuals are very different psychologically than companies are. With companies it's one of those, “Great, we're optimizing to bring in more revenue and we don't really care about saving money at all costs.”Whereas people generally have something that looks a lot like a fixed income in the form of a salary or whatnot, so it is easier for us to cut spend than it is for us to go out and make more money. Like, I don't want to get a second job, or pitch my boss on stuff, and yeah. So, all in all, rounding out the rest of what happened at re:Invent, they—this is the problem is that they have a bunch of minor things like SageMaker Inference Recommender. Yeah, I don't care.
Anything—Pete: [laugh].Corey: —[crosstalk 00:28:47] SageMaker I mostly tend to ignore, for safety. I did like the way they described Amplify Studio because they made it sound like a WYSIWYG drag-and-drop, build-a-React-app tool. It's not. It basically—you can do that in Figma and then it can hook it up to some things in some cases. It's not what I want it to be, which is Honeycode, except good. But we'll get there some year. Maybe.Pete: There's a lot of stuff that was—you know, it's the classic, like, preview, which sure, like, from a product standpoint, it's great. You know, they have a level of scale where they can say, “Here's this thing we're building,” which could be just a twinkle in a product manager's eye, call it a preview, and get thousands of people who would be happy to test it out and give you feedback, and it's a, it's great that you have that capability. But I often look at so much stuff and, like, that's really cool, but, like, can I, can I have it now? Right? Like—or you can't even get into the preview plan, even though, like, you have that specific problem. And it's largely just because either, like, your scale isn't big enough, or you don't have a good enough relationship with your account manager, or I don't know, countless other reasons.Corey: The thing that really throws me, too, is the pre-announcements that come a year or so in advance, like, the smaller Outposts are finally available, but it feels like when they do too many pre-announcements or no big marquee service announcements, as much as they talk about, “We're getting back to fundamentals,” no, you have a bunch of teams that blew the deadline. That's really what it is; let's not call it anything else. Another one that I think is causing trouble for folks—I'm fortunate in that I don't do much work with Oracle databases, or Microsoft SQL databases—but they extended RDS Custom to Microsoft SQL at the [unintelligible 00:30:27] SQL server at re:Invent this year, which means once this comes down to things I actually use, we're going to have a problem because historically, the lesson has always been if I want to run my own databases and tweak everything, I do it on top of an EC2 instance. If I want a managed database, relational database service, great, I use RDS. RDS Custom basically gives you root into the RDS instance. Which means among other things, yes, you can now use RDS to run containers.But it lets you do a lot of things that are right in between. So, how do you position this? When should I use RDS Custom? Can you give me an easy answer to that question? And they used a lot of words to say, no, they cannot. It's basically completely blowing apart the messaging and positioning of both of those services in some unfortunate ways. We'll learn as we go.Pete: Yeah. Honestly, it's like why, like, why would I use this? Or how would I use this? And this is I think, fundamentally, what's hard when you just say yes to everything. It's like, they in many cases, I don't think, like, I don't want to say they don't understand why they're doing this, but if it's not like there's a visionary who's like, this fits into this multi-year roadmap.That roadmap is largely—if that roadmap is largely generated by the customers asking for it, then it's not like, oh, we're building towards this Northstar of RDS being whatever.
You might say that, but your roadmap's probably getting moved all over the place because, you know, this company that pays you a billion dollars a year is saying, “I would give you $2 billion a year for all of my Oracle databases, but I need this specific thing.” I can't imagine a scenario that they would say, “Oh, well, we're building towards this Northstar, and that's not on the way there.” Right? They'd be like, “New Northstar. Another billion dollars, please.”Corey: Yep. Probably the worst release of re:Invent, from my perspective, is RUM, Real User Monitoring, for CloudWatch. And I, to be clear, I wrote a shitposting Twitter threading client called Last Tweet in AWS. Go to lasttweetinaws.com. You can all use it. It's free; I just built this for my own purposes. And I've instrumented it with RUM. Now, Real User Monitoring is something that a lot of monitoring vendors use, and also CloudWatch now. And what that is, is it embeds a listener into the JavaScript that runs on client load, and it winds up looking at what's going on—loading times, et cetera—so you can see when users are unhappy. I have no problem with this. Other than that, you know, liking users? What's up with that?Pete: Crazy.Corey: But then, okay, now, what this does is unlike every other RUM tool out there, which charges per session—meaning, I am going to be… doing a web page load—it charges per data item, which includes HTTP errors, or JavaScript errors, et cetera. Which means that if you have a high-transaction-volume site and suddenly your CDN takes a nap like Fastly did for an hour last year, suddenly your bill is stratospheric for this because errors abound and cascade, and you can have thousands of errors on a single page load for these things, and it is going to be visible from orbit. At least with a per-session basis thing, when you start to go viral, you understand that, “Okay, this is probably going to cost me some more on these things, and oops, I guess I should write less compelling content.” Fine. This is one of those, one misconfiguration away and you are wailing and gnashing of teeth. Now, this is a new service. I believe that they will waive these surprise bills in the event that things like that happen. But it's going to take a while and you're going to be worrying the whole time if you've rolled this out naively. So it's—Pete: Well and—Corey: —I just don't like the pricing.Pete: —how many people will actively avoid that service, right? And honestly, choose a competitor because the competitor could be—the competitor could be five times more expensive, right, on face value, but it's the certainty of it. It's the uncertainty of what Amazon will charge you. Like, no one wants a surprise bill. “Well, a vendor is saying that they'll give us this contract for $10,000. I'm going to pay $10,000, even though RUM might be a fraction of that price.”It's honestly, a lot of these, like, product analytics tools and monitoring tools, you'll often see the price be, like, you know, MAU, Monthly Active User, you know, or some sort of user-based pricing, like, the number of people coming to your site. You know, and I feel like at least then, if you are trying to optimize for lots of users on your site, and more users means more revenue, then you know, if your spend is going up, but your revenue is also going up, that's a win-win. But if it's like someone—you know, your third-party vendor dies and you're spewing out errors, or someone, you know, upgraded something and it spews out errors.
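(Some back-of-the-envelope arithmetic on why that pricing model scares people. The numbers are assumptions for illustration only: CloudWatch RUM's list price was on the order of $1 per 100,000 events at the time, and the traffic figures are invented.)

PRICE_PER_EVENT = 1.00 / 100_000   # assumed: roughly $1 per 100k RUM events

sessions_per_hour = 50_000         # hypothetical traffic during a CDN outage
errors_per_page_load = 1_000       # cascading JS/HTTP errors, each a billable event

events = sessions_per_hour * errors_per_page_load
print(f"{events:,} events -> ${events * PRICE_PER_EVENT:,.2f} for one bad hour")
# 50,000,000 events -> $500.00 for one bad hour; a per-session model would
# have billed that same hour as 50,000 sessions, error storm or not.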
Errors that no one would normally see—that's the thing. Like, unless you're popping open that JavaScript console, you're not seeing any of those errors, yet somehow it's like directly impacting your bottom line? Like that doesn't feel [crosstalk 00:35:06].Corey: Well, there is something vaguely Machiavellian about that. Like, “How do I get my developers to care about errors on consoles?” Like, how about we make it extortionately expensive for them not to. It's, “Oh, all right, then. Here we go.”Pete: And then talk about now you're in a scenario where you're working on things that don't directly impact the product. You're basically just sweeping up the floor and then trying to remove errors that maybe don't actually affect it and aren't actually errors.Corey: Yeah. I really do wonder what the right answer is going to be. We'll find out. Again, we live, we learn. But it's also, how long does it take a service that has bad pricing at launch, or an unfortunate story around it to outrun that reputation?People are still scared of Glacier because of its original restore pricing, which was non-deterministic for any sensible human being, and in some cases led to, “I'm used to spending 20 to 30 bucks a month on this. Why was I just charged two grand?”Pete: Right.Corey: Scare people like that, they don't come back.Pete: I'm trying to actually remember which service it is that basically gave you an estimate, right? Like, turn it on for a month, and it would give you an estimate of how much this was going to cost you when billing started.Corey: It was either Detective or GuardDuty.Pete: Yeah, it was—yeah, that's exactly right. It was one of those two. And honestly, that was unbelievably refreshing to see. You know, like, listen, you have the data, Amazon. You know what this is going to cost me, so, like, don't make me spend all this time to go and figure out the cost. If you have all this data already, just tell me, right?And if I look at it and go, “Yeah, wow. Like, turning this on in my environment is going to cost me X dollars. Like, yeah, that's a trade-off I want to make, I'll spend that.” But you know, with some of the—and that—a little bit of a worry on some of the intelligent tiering on S3 is that the recommendation is likely going to be everything goes to intelligent tiering first, right? It's the gp3 story. Put everything on gp3, then move it to the proper volume, move it to an sc or an st or an io. Like, gp3 is where you start. And I wonder if that's going to be [crosstalk 00:37:08].Corey: Except I went through a wizard yesterday to launch an EC2 instance and its default on the free tier is gp2.Pete: Yeah. Interesting.Corey: Which does not thrill me. I also still don't understand for the life of me why in some regions, the free tier is a t2 instance, when t3 is available.Pete: They're uh… my guess is that they've got some free t—they got a bunch of t2s lying around. [laugh].Corey: Well, one of the most notable announcements at re:Invent that most people didn't pay attention to is their ability now to run legacy instance types on top of Nitro, which really speaks to what's going on behind the scenes of we can get rid of all that old hardware and emulate the old m1 on modern equipment. So, because—you can still have that legacy, ancient instance, but now you're going—now we're able to wind up greening our data centers, which is part of their big sustainability push, with their ‘Sustainability Pillar' for the Well-Architected Framework.
They're talking more about what the green choices in cloud are. Which is super handy, not just because of the economic impact, but because we could use this pretty directly to reverse engineer their various margins on a per-service or per-offering basis. Which I'm not sure they're aware of yet, but oh, they're going to be.And that really winds up being a win for the planet, obviously, but also something that is—that I guess puts a little bit of choice on customers. The challenge I've got is, with my serverless stuff that I build out, if I spend—the Google search I make to figure out what the most economic, most sustainable way to do that is, is going to have a bigger carbon impact than the app itself. That seems to be something that is important at scale, but if you're not at scale, it's one of those, don't worry about it. Because let's face it, the cloud providers—all of them—are going to have a better sustainability story than you are, running this in your own data centers, or on a Raspberry Pi that's always plugged into the wall.Pete: Yeah, I mean, you got to remember, Amazon builds their own power plants to power their data centers. Like, that's the level they play at, right? Their economies of scale are so entirely—they're so entirely different than anything that you could possibly even imagine. So, it's something that, like, I'm sure people will want to choose for. But, you know, I would honestly say, like, if we really cared about our computing costs and the carbon footprint of it, I would love to actually know the carbon footprint of all of the JavaScript trackers that when I go to various news sites, and it loads, you know, the whatever thousands of trackers and tracking you all over, like, what is the carbon impact of some of those choices that I actually could control, like, as either a consumer or business person?Corey: I really hope that it turns into something that makes a meaningful difference, and it's not just greenwashing. But we'll see. In the fullness of time, we're going to figure that out. Oh, they're also launching some mainframe stuff. They—like that's great.Pete: Yeah, those are still a thing.Corey: I don't deal with a lot of customers that are doing things with that in any meaningful sense. There is no AWS/400, so all right.Pete: [laugh]. Yeah, I think honestly, like, I did talk to a friend of mine who's in a big old enterprise and has a mainframe, and they're actually replacing their mainframe with Lambda. Like they're peeling off—which is, like, a great move—taking the monolith, right, and peeling off the individual components of what it can do into these discrete Lambda functions. Which I thought was really fascinating. Again, it's a five-year-long journey to do something like that. And not everyone wants to wait five years, especially if their support's about to run out for that giant box in the, you know, giant warehouse.Corey: The thing that I also noticed—and this is probably the—I guess, one of the—talk about swing and a miss on pricing—they have a—what is it?—there's a VPC IP Address Manager, which tracks the IP addresses assigned to your VPCs that are allocated versus not, and it's 20 cents a month per IP address. It's like, “Okay.
So, you're competing against a Google Sheet or an Excel spreadsheet”—which is what people are using for these things now—“Only you're making it extortionately expensive?”Pete: What kind of value does that provide for 20—I mean, like, again—Corey: I think Infoblox or someone like that offers it where they become more cost-effective as soon as you hit 500 IP addresses. And it's just—like, this is what I'm talking about. I know it does not cost AWS that kind of money to store an IP address. You can store that in a Route 53 TXT record for less money, for God's sake. And that's one of those, like, “Ah, we could extract some value pricing here.”Like, I don't know if it's a good product or not. Given its pricing, I don't give a shit because it's going to be too expensive for anything beyond trivial usage. So, it's a swing and a miss from that perspective. It's just, looking at that, I laugh, and I don't look at it again.Pete: See I feel—Corey: I'm not usually price sensitive. I want to be clear on that. It's just, that is just Looney Tunes, clown shoes pricing.Pete: Yeah. It's honestly, like, in many cases, I think the thing that I have seen, you know, in the past few years is, in many cases, it can honestly feel like Amazon is nickel-and-diming their customers in so many ways. You know, the explosion of making it easy to create multiple Amazon accounts has a direct impact on waste in the cloud because there's a lot of stuff you have to have per account. And the more accounts you have, those costs grow exponentially as you have these different places. Like, you kind of lose out on the economies of scale you'd have with a smaller number of accounts.And yeah, it's hard to optimize for that. Like, if you're trying to reduce your spend, it's challenging to say, “Well, by making a change here, we'll save, you know, $10,000 in this account.” “That doesn't seem like a lot when we're spending millions.” “Well, hold on a second. You'll save $10,000 per account, and you have 500 accounts,” or, “You have 1000 accounts,” or something like that.Or almost the cost avoidance angle: this cost is growing unbounded in all of your accounts. It's tiny right now. So, like, now would be the time you want to do something with it. But like, again, for a lot of companies that have adopted the practice of endless Amazon accounts, they've almost gone, like, it's the classic, like, you know, I've got 8000 GitHub repositories for my source code. Like, that feels just as bad as having one GitHub repository for everything. I don't know what the balance is there, but anytime these different types of services come out, it feels like, “Oh, wow. Like, I'm going to get nickeled and dimed for it.”Corey: This ties into the re:Post launch, which is a rebranding of their forums, where, okay, great, it was a little crufty and it needed modernizing, but it still ties your identity to an IAM account, or the root email address for an Amazon account, which is great. This is completely worthless because as soon as I change jobs, I lose my identity, my history, the rest, on this forum. I'm not using it. It shows that there's a lack of awareness that everyone is going to have multiple accounts with which they interact, and that people are going to deal with the platform longer than any individual account will. It's just a continual swing and a miss on things like that.And it gets back to the billing question of, “Okay.
And it gets back to the billing question of, "Okay. When I spin up an account, do I want them to just continue billing me—because don't turn this off; this is important—or do I want there to be a hard boundary where if you're about to charge me, turn it off. Turn off the thing that's about to cost me money." And people hem and haw like this is an insurmountable problem, but I think the way to solve it is, let me specify that intent when I provision the account. Where it's, "This is a production account for a bank. I really don't want you turning it off." Versus, "I'm a student learner who thinks that a Managed NAT Gateway might be a good thing. Yeah, I want you to turn off my demo Hello World app that will teach me what's going on, rather than surprising me with a five-figure bill at the end of the month." Pete: Yeah. It shouldn't be that hard. I mean, but again, I guess everything's hard at scale. Corey: Oh, yeah. Oh yeah. Pete: But still, I feel like every time I log into Cost Explorer and I look at—and this is years and it's still not fixed. Not that it's even possible to fix—but on the first day of the month, you look at Cost Explorer, and look at what Amazon is estimating your monthly bill is going to be. It's like because of your, you know— Corey: Your support fees, and your RI purchases, and savings plans purchases. Pete: [laugh]. All those things happened, right? First of the month, and it's like, yeah, "Your bill's going to be $800,000 this month." And it's like, "Shouldn't it be, like, $1,000?" Like, you know, it's the little things like that, that always— Corey: The one-off charges, like, "Oh, your Route 53 zone," and all the stuff that gets charged on a monthly cadence, which, fine, whatever. I mean, I'm okay with it, but it's also the, like, "be careful when that happens"—I feel like there's a way to make that user experience less jarring. Pete: Yeah, because that problem—I mean, in my scenario, companies that I've worked at, there have been multiple times that a non-technical person will look at that data and go into immediate freakout mode, right? And that's never something that you want to have happen because now that's just adding a lot of stress and anxiety into a company, and with inaccurate data. Like, the answer you're giving someone is just wrong. Perhaps you shouldn't even give it to them if it's that wrong. [laugh]. Corey: Yeah, I'm looking forward to seeing what happens this coming year. We're already seeing promising stuff. To give people a sense of how far in advance these things are recorded—late last night, AWS released a new console experience. When you log into the AWS console now, there's a new beta thing. And I gave it some grief on Twitter because I'm still me, but I like the direction it's going. It lets you customize your view with widgets and whatnot. And until they start selling widgets on the Marketplace or having sponsored widgets you can't remove, I like it, which is no guarantee at some point. But it shows things like, I can move the cost stuff, I can move the outage stuff around, I can have the things that are going on in my account, and who I am determines how I shift this around. If I'm a finance manager, cool. I can remove all the stuff that's like, "Hey, you want to get started spinning up an EC2 instance?" "Absolutely not. Do I want to get told, like, how to get certified? Probably not. Do I want to know what the current bill is and whether—and my list of favorites that I've pinned, whatever services are there?
Yeah, absolutely do." This is starting to get there. Pete: Yeah, I wonder if it really is a way to start hedging on organizations having a wider group of people accessing AWS. I mean, in previous companies, I absolutely gave access to the console for tools like QuickSight, for tools like Athena, for the DataBrew stuff, the Glue DataBrew. Giving non-technical people access to be able to do these, like, UI-driven ETL tasks means a wider group of the company is getting access into Amazon. So, I think anything that Amazon does to improve that experience for, you know, the non-SREs, not just the people who would traditionally log in, that is an investment definitely worth making. Corey: "Well, what could non-engineering types possibly be doing in the AWS console?" "I don't know, jackhole, maybe paying the bill? Just a thought here." It's that there are people who look at these things from a variety of different places, and you have such sprawl in the AWS world that there are different personas by a landslide. If I'm building Twitter for Pets, you probably don't want to be pitching your mainframe migration services to me the same way that you would if I were a 200-year-old insurance company. Pete: Yeah, exactly. And the number of those products is going to grow, the number of personas is going to grow, and, yeah, they'll have to do something if they want to actually maintain that experience so that every person can have, kind of, the experience that they want, and not be distracted, you know? "Oh, what's this? Let me go test this out." And it's like, you know, a one-time charge of $10,000 because, like, that's how it's charged. You know, that's not an experience that people like. Corey: No. They really don't. Pete, I want to thank you for spending the time to chat with me again, as is our tradition. I'm hoping we can do it in person this year, when we go to re:Invent again at the end of 2022. Or that no one goes in person. But this hybrid nonsense is for the birds. Pete: Yeah. I very much would love to get back to another one, and yeah, like, I think there could be an interesting kind of merging here of our annual re:Invent recap slash live brunch stream, you know, hot takes after a long week. [laugh]. Corey: Oh, yeah. The real way that you know that it's a good joke is when one of us says something, the other one sprays scrambled eggs out of their nose. Yeah, that's the way to do it. Pete: Exactly. Exactly. Corey: Pete, thank you so much. If people want to learn more about what you're up to—hopefully, you know, come back. We miss you, but you're unaffiliated, you're a startup advisor. Where can people find you to learn more, if they for some unforgivable reason don't know who or what a Pete Cheslock is? Pete: Yeah. I think the easiest place to find me is always on Twitter. I'm just @petecheslock. My DMs are always open and I'm always down to expand my network and chat with folks. And yeah, right now, I'm just, as I jokingly say, professionally unaffiliated. I do some startup advisory work and have been largely just kind of—honestly checking out the state of the economy. Like, there are a lot of really interesting companies out there, and some interesting problems to solve. And, you know, trying to spend some of my time learning more about what companies are up to nowadays.
So yeah, if you've got some interesting problems, you know, you can follow my Twitter or go to LinkedIn if you want some great, you know, business hot takes and, basically, shitposting. Corey: Same thing. Pete, thanks so much for joining me, I appreciate it. Pete: Thanks for having me. Corey: Pete Cheslock, startup advisor, professionally unaffiliated, and recurring re:Invent analyst pal of mine. I'm Cloud Economist Corey Quinn and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice along with an angry comment calling me a jackass, because do you know how long it took me personally to price CloudWatch RUM? Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started. Announcer: This has been a HumblePod production. Stay humble.
Cloud Posse holds public "Office Hours" every Wednesday at 11:30am PST to answer questions on all things related to DevOps, Terraform, Kubernetes, and CI/CD. Basically, it's like an interactive "Lunch & Learn" session where we get together for about an hour and talk shop. These are totally free and just an opportunity to ask us (or our community of experts) any questions you may have.
You can register here: https://cloudposse.com/office-hours
Join the conversation: https://slack.cloudposse.com/
Find out how we can help your company:
https://cloudposse.com/quiz
https://cloudposse.com/accelerate/
Learn more about Cloud Posse:
https://cloudposse.com
https://github.com/cloudposse
https://sweetops.com/
https://newsletter.cloudposse.com
https://podcast.cloudposse.com/
[00:00:00] Intro
[00:01:10] AWS Launched Cloud Control API (with Terraform provider support)
https://twitter.com/lukehoban/status/1443658655467900936?s=21
https://github.com/hashicorp/terraform-provider-awscc
[00:10:58] Centrally manage alternate contacts for member accounts
https://aws.amazon.com/about-aws/whats-new/2021/10/programmatically-manage-alternate-contacts-aws-accounts/
[00:12:49] Cloudflare announced R2 Storage
https://blog.cloudflare.com/introducing-r2-object-storage/
[00:14:57] How to Deploy AWS Config Conformance Packs the Hard Way (and the Easy Way with Cloud Posse)
https://aws.amazon.com/blogs/mt/how-to-deploy-aws-config-conformance-packs-using-terraform/?utm_campaign=weekly.tf&utm_medium=email&utm_source=Revue%20newsletter
https://github.com/cloudposse/terraform-aws-config/tree/master/modules/conformance-pack
[00:17:27] Another private module registry and remote state for Terraform
https://www.modulehub.io/
[00:18:50] GitHub supports reusing workflows
https://github.blog/changelog/2021-10-05-github-actions-dry-your-github-actions-configuration-by-reusing-workflows/
[00:22:55] AWS CloudWatch - How to set up metrics for each SNS queue
[00:25:40] What to do if running out of IPs in VPCs
[00:32:50] How to resize a PV in Kubernetes when defined in Terraform
[00:39:18] How can you use Office 365 Authentication with Cognito
[00:42:10] Troubleshooting an instance launch failure that's using an encrypted AMI
[00:47:08] Any SRE articles worth reading?
[00:48:30] What if us-east-1 were down like Facebook this week?
[00:52:47] Outro
#officehours, #cloudposse, #sweetops, #devops, #sre, #terraform, #kubernetes, #aws
Support the show (https://cloudposse.com/office-hours/)
On The Cloud Pod this week, the team wishes there was something else on tap, not just NetApp. Also, AWS Storage Day has come and gone again, and Azure is springing into the enterprise cloud. A big thanks to this week's sponsors: Foghorn Consulting, which provides full-stack cloud solutions with a focus on strategy, planning and execution for enterprises seeking to take advantage of the transformative capabilities of AWS, Google Cloud and Azure. JumpCloud, which offers a complete platform for identity, access, and device management — no matter where your users and devices are located. This week's highlights
On The Cloud Pod this week, everyone's favorite guessing game is back, with the team making their predictions for AWS Summit and re:Inforce — which were not canceled, as they led us to believe last week. A big thanks to this week's sponsors: Foghorn Consulting, which provides full-stack cloud solutions with a focus on strategy, planning and execution for enterprises seeking to take advantage of the transformative capabilities of AWS, Google Cloud and Azure. JumpCloud, which offers a complete platform for identity, access, and device management — no matter where your users and devices are located. This week's highlights
About Rich: Rich Burroughs is a Senior Developer Advocate at Loft Labs where he's focused on improving workflows for developers and platform engineers using Kubernetes. He's the creator and host of the Kube Cuddle podcast where he interviews members of the Kubernetes community. He is one of the founding organizers of DevOpsDays Portland, and he's helped organize other community events. Rich has a strong interest in how working in tech impacts mental health. He has ADHD and has documented his journey on Twitter since being diagnosed. Links: Loft Labs: https://loft.sh Kube Cuddle Podcast: https://kubecuddle.transistor.fm LinkedIn: https://www.linkedin.com/in/richburroughs/ Twitter: https://twitter.com/richburroughs Polywork: https://www.polywork.com/richburroughs Transcript Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud. Corey: This episode is sponsored in part by Cribl LogStream. Cribl LogStream is an observability pipeline that lets you collect, reduce, transform, and route machine data from anywhere, to anywhere. Simple, right? As a nice bonus, it not only helps you improve visibility into what the hell is going on, but also helps you save money almost by accident. Kind of like not putting a whole bunch of vowels and other letters that would be easier to spell in a company name. To learn more visit: cribl.io Corey: This episode is sponsored in part by Thinkst. This is going to take a minute to explain, so bear with me. I linked to an early version of their tool, canarytokens.org, in the very early days of my newsletter, and what it does is relatively simple and straightforward. It winds up embedding credentials, files, that sort of thing in various parts of your environment, wherever you want to; it gives you fake AWS API credentials, for example. And the only thing that these things do is alert you whenever someone attempts to use them. It's an awesome approach. I've used something similar for years. Check them out. But wait, there's more. They also have an enterprise option that you should be very much aware of: canary.tools. You can take a look at this, but what it does is it provides an enterprise approach to drive these things throughout your entire environment. You can get a physical device that hangs out on your network and impersonates whatever you want to. When it gets Nmap scanned, or someone attempts to log into it, or access files on it, you get instant alerts. It's awesome. If you don't do something like this, you're likely to find out that you've gotten breached the hard way. Take a look at this. It's one of those few things that I look at and say, "Wow, that is an amazing idea. I love it." That's canarytokens.org and canary.tools. The first one is free. The second one is enterprise-y. Take a look. I'm a big fan of this. More from them in the coming weeks. Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. Periodically, I like to have, well, let's call it fun, at the expense of developer advocates; the developer relations folks; DevRelopers, as I insist on pronouncing it. But it's been a while since I've had one of those come on the show and talk about things that are happening in that universe.
So, today we're going back to change that a bit. My guest today is Rich Burroughs, who's a Senior Developer Advocate—read as Senior DevReloper—at Loft Labs. Rich, thanks for joining me. Rich: Hey, Corey. Thanks for having me on. Corey: So, you've done a lot of interesting things in the space. I think we first met back when you were at Sensu, you did a stint over at Gremlin, and now you're over at Loft. Sensu was monitoring things, Gremlin was about chaos engineering and breaking things on purpose, and when you're monitoring things that are breaking, that, of course, leads us to Kubernetes, which is what Loft does. I'm assuming. That's probably not your marketing copy, though, so what is it you folks do? Rich: I was waiting for your Kubernetes trash talk. I knew that was coming. Corey: Yeah. Oh, good. I was hoping I could sort of sneak it around in there. Rich: [laugh]. Corey: But yeah, you know me too well. Rich: By the way, I'm not dogmatic about tools, right? I think Kubernetes is great for some things and for some use cases, but it's not the best tool for everything. But what we do is we really focus a lot on the experience of developers who are writing applications that run in Kubernetes clusters, and also on the platform teams that are having to maintain the clusters. So, we really are trying to address the speed bumps, the things that people bang their shins on when they're trying to get their app running in Kubernetes. Corey: Part of the problem I've always found is that the thing that people bang their shins on is Kubernetes. And it's one of those, "Well, it's sort of in the title, so you can't really avoid it. The only way out is through." You could also say, "It's better never begin; once begun, better finish." The same thing seems to apply to technology in a whole bunch of different ways. And that's always been a strange thing for me where I would have bet against Kubernetes. In fact, I did—because it was incredibly complicated, and it came out of Google, not that someone needed to tell me. It was very clearly a Google-esque product. And we saw it sort of take the world by storm, and we are all senior YAML engineers now. And here we are. And now you're doing developer advocacy, which means you're at least avoiding the problem of actually working with Kubernetes day in, day out yourself, presumably. But instead, you're storytelling about it. Rich: You know, I spent a good part of my day a couple days ago fighting with my Kubernetes cluster in Docker Desktop. So, I still feel the pain some, but it's a different kind of pain. I'm not maintaining it in production. I actually had the total opposite experience to you. So, my introduction to Kubernetes was seeing Kelsey Hightower talk about it in, like, 2015. And I was just hooked. And the reason that I was hooked is because of what Kubernetes did—and I think especially the service primitive—is that it encoded a lot of these operational patterns that had developed into the actual platform. So, things like how you check if an app is healthy, if it's ready to start accepting requests. These are things that I was doing in the shops that I was working at already, but we had to roll it ourselves; we had to invent a way to do that.
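For readers who haven't seen what those encoded patterns look like, here is a minimal sketch of the liveness and readiness checks being described, using the official Kubernetes Python client; the paths, port, and image are illustrative assumptions, not anything from the episode:

from kubernetes import client

# Liveness: "is the app healthy?" The kubelet restarts the container when this fails.
liveness = client.V1Probe(
    http_get=client.V1HTTPGetAction(path="/healthz", port=8080),
    initial_delay_seconds=5,
    period_seconds=10,
)

# Readiness: "is the app ready for traffic?" A Service only routes to ready pods.
readiness = client.V1Probe(
    http_get=client.V1HTTPGetAction(path="/ready", port=8080),
    period_seconds=5,
)

# Attach both probes to a container spec; Kubernetes runs the checks for you.
container = client.V1Container(
    name="web",
    image="example.com/web:1.0",  # illustrative image
    liveness_probe=liveness,
    readiness_probe=readiness,
)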
But when Kelsey started talking about Kubernetes, it became apparent to me that the people who designed this thing had a lot of experience running applications in distributed systems, and they understood what you needed to be able to do that competently. Corey: There's something to be said for packaging and shipping expertise, and it does feel like we're on a bit of a cusp, where the complexity has risen and risen and risen, and it's always a sawtooth graph where things get so complicated that you then are paying people a quarter-million dollars a year to run the thing. And then it collapses in on itself. And the complexity is still there, but it's submerged to a point where you don't need to worry about it anymore. And it feels like we're a couple years away from Kubernetes hitting that, but I do view that as inevitable. Is that, basically, completely out to sea? Is that something that you think is directionally correct, or something else? Rich: I mean, I think that the thing that's been there for a long time is, how do we take this platform and make it actually usable for people? And that's a lot more about the whole CNCF ecosystem than Kubernetes itself. How do we make it so that we can easily monitor this thing, that we can have observability, that we can deploy applications to it? And I think what we've seen over the last few years is that, even more than Kubernetes itself, the tools that allow you to do those other things that you need to do to be able to run applications have exploded and gotten a lot better, I think. Corey: The problem, of course, is the explosion part of it because we look at the other side, at the CNCF landscape diagram, and it is a hilariously overwrought picture of all of the different offerings and products and tools in the space. There are something like 400 blocks on it, the last time I checked. It looks like someone's idea of a joke. I mean, I come up with various shitposts, and I'm sort of embarrassed I've never come up with one anywhere near that funny. Rich: I left SRE a few years ago, and this actually is one of the reasons. So, the explosion in tools gave me a huge amount of imposter syndrome. And I imagine I'm not the only one because you're on Twitter, you're hanging around, you're seeing people talk about all these cool tools that are out there, and you don't necessarily have a chance to play with them, let alone use them in production. And so what I would find myself doing is I would compare myself to these people who were experts on these tools. Somebody who actually invented the thing, like Joe Beda or something like that. And it's obviously unfair to do because I'm not that person. But my brain just wants to do that. You see people out there that know more than you, and a lot of times I would feel bad about it. And it's an issue, I really think it is.
And then it becomes tedium, where the day starts to look too much alike. And I start looking for other problems elsewhere in the organization, and it turns out that when you don't have strategic visibility into what other orgs are doing but tell them what they're doing wrong, you're not a popular person; and you're often wrong. And that was a source of some angst in my case. The reason I started what I do now is because I was looking to do something different where no two days look alike, and I sort of found that. Do you find that with respect to developer advocacy, or does it fall into some repetitive pattern? Not that there's anything wrong with that; I wish I had the capability to do that, personally. Rich: So, it's interesting that you mentioned this because I've talked pretty publicly about the fact that I was diagnosed with ADHD a few months ago. You talked about the fact that you have it as well. I loved your Twitter thread about it, by the way; I still recommend it to people. But I think the real issue for me was that as I got more advanced in my career, people assumed that because you have 'senior' in your title, you're a good project manager. It's just assumed that as you grow technically and move into more senior roles, you're going to own projects. And I was just never good at that. I was always very good at reactive things; I think I was good at being on call, I think I was good at responding to incidents. Corey: Firefighting is great for someone with our particular predilections. It's, "Oh, great. There's a puzzle to solve. It's extremely critical that we solve it." And it gets the adrenaline moving. It's great. "Cool, now fill out a bunch of Jira tickets." And those things will sit there unfulfilled until the day I die. Rich: Absolutely. And it's still not a problem that I've solved. I'll preface this with the "kids, don't try this at home" advice because everybody's situation is different. I'm a white guy in the industry with a lot of privilege; I've developed a really good network over the years; I don't spend a lot of time worried about what happens if I lose my job, right, or how am I going to get another one. But when I got this latest job that I'm at now, I was pretty open with the CEO who interviewed me—it's a very small company, I'm like employee number four. And so when we talked ahead of time, I was very clear with him about the fact that bored Rich is bad. If Rich gets bored with what he's doing, if he's not engaged, it's not going to be good for anyone involved. And so— Corey: He's going to go find problems to solve, and they very well may not align with the problems that you need solved. Rich: Yeah, I think my problem is more that I disengage. Like, I lose my passion for what it is that I'm doing. And so I've been pretty intentional about trying to kind of change it up, make different kinds of content. I happen to be at this place that has four open-source projects, right, along with our commercial project. And so, so far at least, there's been plenty for me to talk about. I haven't had to worry about being bored so far. Corey: Small companies are great for that because everyone does everything to some extent; you start spreading out. And the larger a company gets, the smaller your remit is. The argument I always made against working at Google, for example, was, let's say that I went in with evil in mind on day one.
I would not be able—regardless of how long I was there, how high in the hierarchy I climbed—to take down google.com for one hour—the search engine piece. If I can't have that much impact intentionally, then the question really becomes how much impact can I have in a positive direction with everyone supposedly working in concert with me? And the answer I always came up with was not that much, not in the context of a company like that. It's hard for me to feel relevant to a large company. For better or worse, the thing that keeps me engaged is, "You know, if I get this wrong enough, we don't have a company anymore." That's sort of the right spot for me. Rich: [laugh]. Yeah, I mean, it's interesting because I had been at a number of startups in the last few years that were fairly early stage, and when I was looking for work this last time, my impulse was to go the opposite direction, to go to a big company, you know, something that was going to be a little more stable, maybe. But I just was so interested in what these folks were building. And I really clicked with Lukas, the CEO, when we talked, and I ended up deciding to go this route. But there's a flip side to that. There's a lot of responsibility that comes with that, too. Part of me wanted to avoid being in that spotlight, in a way; part of me wanted to back off and be one of the million people building things. But I'm happy that I made this choice, and like I said, it's been working out really well, so far. Corey: You seem to be. You seem happy, which is always a nice thing to be able to pick up from someone in how they go about these things. Talk to me a little bit about what Loft does. You're working on some virtual cluster nonsense that mostly sails past me. Can you explain it using small words? Rich: [laugh]. Yeah, sure. So, if you talk to people who use Kubernetes a lot, you are— Corey: They seem sad all the time. But please continue. Rich: One of the reasons that they're sad is because of multi-tenancy in Kubernetes; it just wasn't designed with that sort of model in mind. And so what you end up with is a couple of different things that happen. Either people build these shared clusters and feel a whole lot of pain trying to share them, because people commonly use namespaces to isolate things, and that model doesn't completely work. There are objects like CRDs that are global, that don't live in the namespace, and that can cause pain. Or the other option people go with is that they just spin up a whole bunch of clusters. So, every team or every developer gets their own cluster, and then you've got all this cluster sprawl, and you've got costs, and it's not great for the environment. And so what we are really focused on with the virtual cluster stuff is that it provides people what looks like a full-blown Kubernetes cluster, but it just lives inside a namespace on your host cluster. So, it actually uses K3s, from the Rancher folks, the SUSE folks. And literally, this K3s API server sits in the namespace. And as a user, it looks to you like a full-blown Kubernetes cluster. Corey: Got it. So, basically a lightweight [unintelligible 00:13:31] that winds up stripping out some of the overwrought complexity. Do you find that it winds up then becoming a less high-fidelity copy of production? Rich: Sure. It's not one-to-one, but nothing ever is, right? Corey: Right. It's a question of whether people admit it or not, and where they're willing to make those trade-offs. Rich: Right.
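One way to see how literally that works: from the host cluster's perspective, the entire virtual cluster is just ordinary pods in one namespace. A minimal sketch with the Kubernetes Python client, assuming a virtual cluster was created in a namespace called vcluster-dev (the namespace name is hypothetical):

from kubernetes import client, config

# Load credentials for the *host* cluster from ~/.kube/config.
config.load_kube_config()
v1 = client.CoreV1Api()

# The virtual cluster's K3s API server and its synced workloads all
# show up here as regular pods in a single namespace.
for pod in v1.list_namespaced_pod(namespace="vcluster-dev").items:
    print(pod.metadata.name)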
And it's a lot closer to production than using Docker Compose or something like that. So yeah, like you said, it's all about trade-offs, and I think that everything that we do as technical people is about trade-offs. You can give everybody their own Kubernetes cluster, you know, run it in GKE or AWS, and there's going to be a cost associated with that, not just financially, but in terms of the headaches for the people administering things. Corey: The hard part from where I've always been sitting has just been—because again, I deal with large-scale build-outs; I come in in the aftermath of these things—that people look at the large Kubernetes environments that they've built and it's expensive, and you look at it from the cloud provider perspective, and it's just one big noisy application that doesn't make much sense from the outside because it's obviously not a single application. And it's chatty across availability zone boundaries, so it costs two cents per gigabyte. It has no [affinity 00:14:42] for what's nearby, so instead of talking to the service that is three racks away, it talks to the thing over an expensive link. And that has historically been a problem. And there is some progress being made in that direction, but it's mostly been a collective hand-waving around it. And then you start digging into it in other directions from an economics perspective, and there, at large scale in the extreme corner cases, it always becomes this, "Oh, it's more trouble than it's worth." But that is probably unfair for an awful lot of the real-world use cases that don't rise to my level of attention. Rich: Yeah. And I mean, like I said earlier, I think that it's not the best use case for everything. I'm a big fan of the HashiCorp tools. I think Nomad is awesome. A lot of people use it, they use it for other things. I think that one of the big use cases for Nomad is, like, running batch jobs that need to be scheduled. And there are people who use Nomad and Kubernetes both. Or you might use something like Cloud Run or App Runner, whatever works for you. But like I said, speaking as someone who spent literally decades figuring out how to operate software and operating it, I feel like the great thing about this platform is the fact that it does sort of encode those practices. I actually have a podcast of my own. It's called Kube Cuddle. I talk to people in the Kubernetes community. I had Kelsey Hightower on recently, and the thing that Kelsey will tell you, and I agree with him completely, is that, you know, we talk about the complexity in Kubernetes, but all of that complexity, or a lot of it, was there already. We just dealt with it in other ways. So, in the old days, I was the Kubernetes scheduler. I was the guy who knew which app ran on which host, and deployed them and did all that stuff. And that's just not scalable. It just doesn't work. Corey: This episode is sponsored by our friends at Oracle Cloud. Counting the pennies, but still dreaming of deploying apps instead of "Hello, World" demos? Allow me to introduce you to Oracle's Always Free tier. It provides over 20 free services and infrastructure: networking, databases, observability, management, and security. And - let me be clear here - it's actually free. There's no surprise billing until you intentionally and proactively upgrade your account.
This means you can provision a virtual machine instance or spin up an autonomous database that manages itself, all while gaining the networking, load balancing, and storage resources that somehow never quite make it into most free tiers but are needed to support the application that you want to build. With Always Free you can do things like run small-scale applications or do proof-of-concept testing without spending a dime. You know that I always like to put asterisks next to the word free. This is actually free. No asterisk. Start now. Visit https://snark.cloud/oci-free, that's https://snark.cloud/oci-free. Corey: The hardest part has always been the people aspect of things, and I think folks have tried to fix this through a lens of, "The technology will solve the problem, and that's what we're going to throw at it, and see what happens by just adding a little bit more code." But increasingly, it doesn't work. It works for certain problems, but not for others. I mean, take a look at the Amazon approach, where every team communicates via APIs—there are no shared data stores or anything like that—and their entire problem is a lack of internal communication. That's why they launch services that do basically the same thing as each other: because no one bothers to communicate with one another. And half my job now is introducing Amazonians to one another. It empowers some amazing things, but it has some serious trade-offs. And this goes back to our ADHD aspect of the conversation. Rich: Yeah. Corey: The thing that makes you amazing is also the thing that makes you suck. And I think that manifests in a bunch of different ways. I mean, the fact that I can switch between a whole bunch of different topics and keep them all in state in my head is helpful, but it also makes me terrible at an awful lot of different jobs, where I don't come back to finish things, like completing the Jira ticket (to hit on Jira a second time in the same recording). Rich: Yeah, I'm the same way, and I think that you're spot on. I think that we always have to keep the people in mind. You know, when I made this decision to come to Loft Labs, I was looking at the tools and the tools were cool, but it wasn't just that. It's that they were addressing problems that people I know have. You hear these stories all the time about people struggling with the multi-tenancy stuff, and I could see very quickly that the people building the tools were thinking about the people using them, and I think that's super important. Corey: As I check your LinkedIn profile, turns out, no, we met back in your Puppet days, the same era that I was a traveling trainer, teaching people how to Puppet and hoping not to get myself ejected from the premises for using sarcastic jokes about the company that I was conducting the training for. And that was fun. And then I worked at a bunch of places, you worked in a bunch of places, and you mentioned a few minutes ago that we share this privilege where if one of us loses our job, the next one is not going to be a difficult thing for us to find, given the skill set that we have, the immense privilege that we enjoy, and the way that this entire industry works. Now, I will say that has changed somewhat since starting my own company. It's no longer just, "Well, I'm going to land on my feet." Rich: Right. Corey: Yeah, but I've got a bunch of people who are counting on me not to completely pooch this up. So, that's the thing that keeps me awake at night, now.
But I'm curious, do you feel like that's given you the flexibility to explore a bunch of different company types and a bunch of different roles, and stretch yourself a little, with the understanding that, yeah, okay, if you've never lasted five years at the same company, that's not an inherent problem? Rich: Yeah, it's interesting. I've had conversations with people about this. If you do look up my LinkedIn, you're going to see that a lot of the recent jobs have been less than two years: year, year and a half, things like that. And I think that I do have some of that freedom, now. Those exits haven't always been by choice, right? And that's part of what happens in the industry, too. I think I've been laid off, like, four or five times now in my career. The worst one by far was when the bubble burst back in 2000. I was working at WebMD, and they ended up closing our office here. Corey: You were Doctor Google.
I often wonder if instead of messing around and computers, my passion had been oil painting, for example. Would I have had anything approaching to the standard of living I have now?The answer is, “Of course not.” It would have been a very different story. And that says many deeply troubling things about our society across the board. I don't know how to fix any of them. I'm one of those people that rather than sitting here talking how the world should be; I deal with the world as I encounter it.And at times, that doesn't feel great, but it is the way that I've learned to cope, I guess, with the existential angst. I'm envious in some ways of the folks who sit here saying, “No, we demand a better world.” I wish I shared their optimism or ability to envision it being different than it is, but I just don't have it.Rich: Yeah, I mean, there are oil painters who make a whole lot of money, but it's not many of them, right?Corey: Yeah, but you shouldn't have to be dead first.Rich: [laugh]. I used to… know a painter who Jim Carrey bought one of his big canvases for quite a lot of money. So, they're not all dead. But again, your point is very valid. We are in this bubble in the tech industry where people do make on average, I think, a lot more money than people do in many other kinds of jobs.And I recently started thinking about possibly going into ADHD coaching. So, I have an ADHD coach myself; she has made a very big difference in my life so far. And I actually have started taking classes to prepare for possibly getting certified in that. And I'm not sure that I'm going to do it. I may stay in tech.I may do some of both. It doesn't have to be either-or. But it's been really liberating to just have this vision of myself working outside of tech. That's something that I didn't consider was even possible for quite a long time.Corey: I have to confess I've never had an ADHD coach. I was diagnosed when I was five years old and back then—my mother had it as well, and the way that it was dealt with in the '50s and '60s when she was growing up was, she had a teacher once physically tie her to a chair. Which—Rich: Oh, my gosh.Corey: —is generally frowned upon these days. And coaching was never a thing. They decided, “Oh, we're going to medicate you to the gills,” in my case. And that was great. I was basically a zombie for a lot of my childhood.When I was 17, I took myself off of it and figured I'd white-knuckle it for the next 10 years or so. Again, everyone's experience is different, but for me, didn't work, and it led to some really interesting tumultuous times in my '20s. I've never explored coaching just because it feels like so much of what I do is the weirdest aspects of different areas of ADHD. I also have constraints upon me that most folks with ADHD wouldn't have. And conversely, I have a tremendous latitude in other areas.For example, I keep dropping things periodically from time to time; I have an assistant. Turns out that most people, they bring in an assistant to help them with stuff will find themselves fired because you're not supposed to share inside company data with someone who is not an employee of that company. But when you own the company, as I do, it's well, okay, I'm not supposed to share confidential client data or give access to it to someone who's not an employee here. “Da da da da da. Welcome aboard. Your first day is Monday.”And now I've solved that problem in a way that is not open to most people. 
That is a tremendous benefit and I'm constantly aware of how much privilege is just baked into that. It's a hard thing for me to reconcile, so I've never explored the coaching angle. I also, on some level—and this is an area that I understand is controversial and I in no way, shape or form, mean any—want anyone to take anything negative away from this. There are a number of people I know where ADHD is a cornerstone of their identity, where that is the thing that they are.That is the adjective that gets hung on them the most—by choice, in many cases—and I'm always leery about going down that path because I'm super strange ever on a whole bunch of different angles, and even, “Oh, well he has ADHD. Does that explain it?” No, not really. I'm still really, really strange. But I never wanted to go down that path of it being, “Oh, Corey. The guy with ADHD.”And again, part of this is growing up with that diagnosis. I was always the odd kid, but I didn't want to be quote-unquote, “The freak” that always had to go to the nurse's office to wind up getting the second pill later in the day. I swear people must have thought I had irritable bowel syndrome or something. It was never, “Time to go to the nurse, Corey.” It was one of those [unintelligible 00:27:12]. “Wow, 11:30. Wow, he is so regular. He must have all the fiber in his diet.” Yeah, pretty much.Rich: I think that from reading that Twitter thread of yours, it sounds like you've done a great job at mitigating some of the downsides of ADHD. And I think it's really important when we talk about this that we acknowledge that everybody's experience is different. So, your experience with ADHD is likely different than mine. But there are some things that a lot of us have in common, and you mentioned some of them, that the idea of creating that Jira ticket and never following through, you put yourself in a situation where you have people around you and structures, external structures, that compensate for the things that you might have trouble with. And that's kind of how I'm looking at it right now.My question is, what can I do to be the most successful Rich Burroughs that I can be? And for me right now, having that coach really helps a lot because being diagnosed as an adult, there's a lot of self-image problems that can come from ADHD. You know that you failed at a lot of things over time; people have often pointed that out to you. I was the kid in high school who the counselors or my teachers were always telling me I needed to apply myself.Corey: “If you just tried harder and suck a little less, then you'll be much better off.” Yeah, “Just to apply yourself. You have so much potential, Rich.” Does any of that ring a bell?Rich: Yeah, for sure. And, you know, something my coach said to me not too long ago, I was talking about something and I said to her, I can't do X. Like, I'm just not—it's not possible. And her response was, “Well, what if you could?” And I think that's been one of the big benefits to me is she helps me think outside of my preconceptions of what I can do.And then the other part of it, that I'm personally finding really valuable, is having the goal setting and some level of accountability. She helps with those things as well. So, I'm finding it really useful. I'm sure it's not for everybody. 
And like we said, everybody's experience with ADHD isn't the same, but one of the things that has happened since I started talking about getting diagnosed, and what I've learned since then, is I've had a bunch of people come to me. And it's usually on Twitter; it's usually in DMs; you know, they don't want to talk about it publicly themselves, but they'll tell me that they saw my tweets and they went out and got diagnosed, or their kid got diagnosed. And when I think about the difference that could make in someone's life, if you're a kid and you actually get diagnosed and hopefully get better treatment than it sounds like you did, it could make a really big positive impact. And that's the reason that I'm considering doing it myself: because I found that so rewarding. Some of these messages I get, I'm almost in tears when I read them. Corey: Yeah. The reason I started talking about it more is that I was hoping that I could serve as something of, if not a beacon of inspiration, at least a cautionary tale of what not to do. But you never know if you ever get there or not. People come up and say that things you've said or posted have changed the trajectory of how they view their careers and that you've had a positive impact on their life. And, I mean, you want to talk about weird gremlins in our own minds? I always view that as just the nice things people say because they feel like they should. And that is ridiculous, but that's the voice in my head that's like, "You aren't shit, Corey, you aren't shit," that's constantly whispering in my ear. And it's, I don't know if you can ever outrun those demons. Rich: I don't think I can outrun them. I don't think that the self-image issues I have are ever going to just go away. But one thing I would say is that since I've been diagnosed, I feel like I'm able to be at least somewhat kinder to myself than I was before, because I understand how my brain works a little bit better. I already knew about the things that I wasn't good at. Like, I knew I wasn't a good project manager; I knew that already. What I didn't understand is some of the reasons why. I'm not saying that it's all because of ADHD, but it's definitely a factor. And just knowing that there's some reason for why I suck sometimes is helpful. It lets me let myself off the hook, I guess, a little bit. Corey: Yeah, I don't have any answers here. I really don't. I hope that it becomes more clear in the fullness of time. I want to thank you for taking so much time to speak with me about all these things. If people want to learn more, where can they find you? Rich: I'm @richburroughs on Twitter, and also on Polywork, which I've been playing around with and enjoying quite a bit. Corey: I need to look into that more. I have an account but I haven't done anything with it, yet. Rich: It's a lot of fun, and I think that, speaking of ADHD, one of the things that occurred to me is that I'm very bad at remembering the things that I accomplish. Corey: Oh, my stars, yes. People ask me what I do for a living and I just stammer like a fool. Rich: Yeah. And it's literally this map of, like, all the content I've been making. And so I'm able to look at that and, I think, appreciate what I've done and maybe pat myself on the back a little bit. Corey: Which is important. Thank you so much again for your time, Rich. I really appreciate it. Rich: Thanks for having me on, Corey. This was really fun. Corey: Rich Burroughs, Senior Developer Advocate at Loft Labs.
I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice along with a comment telling me what the demon on your shoulder whispers into your ear and that you can drive them off by using their true name, which is Kubernetes.Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.Announcer: This has been a HumblePod production. Stay humble.
Links: Alkira: https://www.alkira.com/ Transcript Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud. Corey: This episode is sponsored in part by LaunchDarkly. Take a look at what it takes to get your code into production. I'm going to just guess that it's awful because it's always awful. No one loves their deployment process. What if launching new features didn't require you to do a full-on code and possibly infrastructure deploy? What if you could test on a small subset of users and then roll it back immediately if results aren't what you expect? LaunchDarkly does exactly this. To learn more, visit launchdarkly.com and tell them Corey sent you, and watch for the wince. Corey: If you're familiar with Cloud Custodian, you'll love Stacklet. Which is made by the same people who made Cloud Custodian, but puts something useful on top of it so you don't need to be a YAML expert to work with it. They're hosting a webinar called "Governance as Code: The Guardrails for Cloud at Scale" because it's a new paradigm that enables organizations to use code to manage and automate various aspects of governance. If you're interested in exploring this, you should absolutely make it a point to sign up, because they're going to have people who know what they're talking about—just kidding, they're going to have me talking about this. It's going to be on Thursday, July 22nd at 1pm Eastern. To sign up, visit snark.cloud/stackletwebinar and I'll talk to you on Thursday, July 22nd. Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. On this promoted episode, we're returning to something I did a while back on the AWS Morning Brief. I took us on a twelve-week exploration of networking in the cloud and how that wound up impacting how companies do business. Today, my guest is cloud networking evangelist Rasam Tooloee, and he works over at a company called Alkira. Rasam, thanks for joining me. Rasam: Thank you for having me, Corey. Pleasure. Corey: So, let's start with the obvious. What is a cloud networking evangelist? I've heard of people evangelizing all kinds of things, some of which make more sense than others, but this is the first evangelism title that actually made me sit up and say, "Ooh, this is relevant to my interests." Rasam: That's funny. A cloud networking evangelist, to me—you know, what I consider my charter is really helping our customers and prospective customers understand that networking the way they're used to doing it historically—if you look at legacy networking and the way that networking has evolved, specifically in the cloud—is just not sufficiently agile; it's not natively enterprise-grade from a visibility, control, and compliance perspective. And that there's just a better way of doing networking for this cloud era that we're in. And I find enterprises that I talk to every day struggle with the complexity associated with how to get their properties into the cloud.
Many of them have become natively multi-cloud just by default, you know, some through business imperatives and priorities, some through acquisitions, but ultimately, in the context of networking, that has led to a whole lot of complexity that they grapple with, and the evangelist in me is looking to help them find the better options that are out there for them. Corey: In the earlier days, before I got into cloud, I was deep into configuration management. And oh, we manage all of these systems via configuration drift detection, and every time they run, they remediate the drift, and it's great. Cool, so how do we wind up managing the networking equipment? Well, there's this thing called RANCID. It's made out of some horrifying Perl, and if you turn on 'strict,' the whole thing breaks. And it was this awful sort of dark-ages approach to networking. It felt like the DevOps movement towards agility really didn't come to networking in any meaningful sense for a while after that. Is that accurate? Or was I just hanging out in the wrong shops? Rasam: No, that's absolutely accurate. For sure. Corey: Your career has been fascinating. You went from Cisco, where you presumably worked on networking because that's kind of the thing they're known for, then Salesforce, which is sort of definitionally SaaS, as it says on the tin. You went to cloud with Microsoft for a while, and now you're at Alkira, where you're sort of in the perfect center of all three of those things. Tell me a little about how you got to where you are? Rasam: Yeah. Well, I have my roots in networking. I worked for Cisco—a great company—for a long time, and really got the opportunity to tackle networking from many different facets, both core networking as well as some of the advanced technologies that Cisco forayed into, and absolutely loved the ride and learned so much. And then there came a time where SaaS was clearly the next big wave. And being at Cisco, I watched that wave grow from afar, and at a certain point in my career, I decided to take the leap and go to the SaaS leader at the time, Salesforce, and really learn that business from the ground up, both in terms of the underlying constructs of how the SaaS business model works, but also the core business that Salesforce has in CRM. And then from there, I transitioned to Microsoft because SaaS was the tip of the spear that launched the cloud revolution, but there's a lot more to cloud than just SaaS. And going to Microsoft really helped me to understand that business from the ground up. I've always had this inclination of really being intrigued and curious about what's next, and trying to take that next leap before I have the opportunity to really enjoy the uptrend of the ride. I'm looking for the next emerging significant innovation, but what's interesting is, come full circle, I'm back in networking. Which kind of begs the question of, like, what happened? And to me, what happened was, in Alkira, I find the coming together of everything that is fascinating to me about cloud and SaaS with where I really earned my chops in technology, which is networking, and really solving this problem for this era, this moment in time, in a truly innovative way.
So, I feel like it's actually—it may look like full circle, but it's actually a continuous trend of seeking out the next innovative thing. Corey: Back when I wound up getting into networking, the reason I did it was because first, it was the 2008 financial crisis and no one was hiring; we had just had a salary freeze and I was demoralized at my job. But I realized that as a systems administrator, I was always sort of hand-waving over the networking pieces. And all right, let's figure out how this whole thing works. So, I got my CCNA, my Cisco Certified Network Associate cert, back in the days when that didn't have a whole bunch of different derivative adjectives after it, telling you exactly what kind. And what happened next was that, okay, now I understand it a lot better. Every time I find myself basically scratching my head, trying to figure out exactly what the deal is with something that I'm working on technically, and I don't understand what's there, I dig deeper into that, and I often discover that it makes everything else make a bit more sense. And then came cloud. And now we have cloud networking, and anyone who tells you they understand how cloud networking works is generally lying to you. It feels like it is complexity stacked on top of complexity, and these days, it more or less distills down to: you fire up your cloud provider of choice, you click a few buttons in the console, and really hope you did it right. I'm guessing that you have not automated the clicking of the buttons in the console, so how do you folks approach it? What have you done that's different? Rasam: So, to build on your point about the complexity of cloud networking, there are a number of reasons why it is so cumbersome and so complex for enterprises to tackle the challenge of cloud networking. One is, it tends to be rather rudimentary in nature, and there's a lot of manual effort involved; there's hop-by-hop configuration, and you have to do unnatural things to solve for some basic challenges. For example, we often find enterprises service-chaining firewalls so they can have symmetric traffic routing. And they will do things like have a separate path for ingress and egress traffic; the segmentation is extremely hard. So, one part of it is—and this is probably my personal opinion—I don't think cloud was built with the idea of solving for the networking from the ground up, even though that's so important to how people are going to have to manage their compute and storage. It was all about compute and storage, and networking was kind of an afterthought. And that shows. It shows in the way that you actually have to configure your cloud network. Now, the other thing that really complicates it is that the fundamentals of networking don't change, but the way in which the fundamentals are applied, the vernacular that's used to describe it, and the UI that's used to control it vary from cloud to cloud.
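To make "hop-by-hop configuration" concrete: steering one VPC's traffic through an inspection appliance means editing every route table individually, then repeating the exercise for the return path, the next VPC, the next region, and the next cloud. A rough boto3 sketch of that toil; the VPC ID and appliance network interface are placeholders, and a real deployment would scope the route far more carefully:

import boto3

ec2 = boto3.client("ec2")

VPC_ID = "vpc-0123456789abcdef0"         # placeholder
APPLIANCE_ENI = "eni-0123456789abcdef0"  # placeholder firewall interface

# Find every route table in the VPC...
tables = ec2.describe_route_tables(
    Filters=[{"Name": "vpc-id", "Values": [VPC_ID]}]
)["RouteTables"]

# ...and point each one's default route at the appliance, one hop at a time.
for table in tables:
    ec2.create_route(
        RouteTableId=table["RouteTableId"],
        DestinationCidrBlock="0.0.0.0/0",
        NetworkInterfaceId=APPLIANCE_ENI,
    )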
So, just because you learn Azure doesn't mean you know AWS, and doesn't mean you know GCP. So, if you are becoming multi-cloud, now you've got to go learn all this stuff separately, and then, as an amplification of the challenge associated with the hop-by-hop configuration, when you bring up a region, for example, in cloud provider A, that doesn't mean that you all of a sudden did most of the heavy lifting required for region B; you've got to go do the same thing in region B. Corey: Oh, I had a client that had a very deeply skilled networking team that spent months without much success trying to get Terraform to set up IPsec between GCP and AWS at one point, and ultimately they gave up in disgust. My argument about, 'Oh, we're going to go multi-cloud to avoid lock-in.' 'Well sorry, you already have lock-in, both in terms of what your staff is up to speed on, but also the identity model, the security model, and critically, the networking approach.' Rasam: Yeah, absolutely. And back to your original question of how we do it differently. So, what we have done is really looked at the problem differently, through a new way of thinking. Again, this goes back to my prior point that the network isn't sufficiently agile, and the reason it's not agile is all the reasons that I explained. When our founders, who come from decades of experience in networking, looked at this problem, they looked at the native value proposition of cloud—which in our mind is agility. Cloud is a competitive imperative these days; that's where innovation is happening, and we see enterprises every day increase their investment in cloud because if they don't, they're going to get left behind; they're going to get left behind because the next digital disruptor or their competitor is going to do, in cloud, the things that their customers expect, and the things that are required to truly compete in today's marketplace. So, because it's a competitive imperative, and at the heart of it is agility, our founders contrasted that with what networking was like in the cloud era, which is largely fragmented and siloed, highly complex, slow to deploy, to your point, often CapEx-heavy because you're making substantial investments in things like colocation and dedicated bandwidth. There are a lot of delays and limitations, like I talked about, in terms of the various constructs between the cloud providers. They contrasted that with DevOps and said, 'Okay, look. DevOps is all about automation. It's about rapid iteration. It's about abstracting the underlying complexity on an elastic platform that scales with you. You can actually go into cloud with minimum upfront investments, test and iterate, and then scale as you need to with velocity and agility. That's what cloud is about. That's the way DevOps has adapted to the constructs of cloud.
But that's not the case for the network, so how do we rethink networking from the ground up so that it's more in line with the business imperative of why businesses go to the cloud in the first place?' And to do that, what they really did was design a unified, multi-cloud fabric that delivers a full stack of networking services meeting the vast majority of the use cases that an enterprise would have from a networking perspective, and does so in a way that's natively multi-cloud, and in a way that natively addresses some of the complexities with things like security, compliance, visibility, control, et cetera. Corey: And I've been very vocal about opposing multi-cloud as a best practice, and people sometimes are surprised to discover that as soon as I find a customer who's doing multi-cloud, I dive right into discussions about that, and, 'We thought you were going to yell at us.' Look, do I think it's a best practice in the general sense? No, but you have specific constraints, and you have an environment that is how it is, and sitting here saying, 'Oh, you should have made a different series of decisions six years ago,' turns out not to be the most compelling story. And there are always specifics that override general guidance. So, whether I like multi-cloud or not from a guidance perspective, I don't think that I can intelligently deny the reality that it very much exists in an awful lot of places. And sitting here just trying to be a purist by going through one cloud, whatever it happens to be, and nothing else, doesn't really solve any pain that customers have. Hybrid is and will be a big story for a long time. In my more cynical moments, I tend to view hybrid as, 'Well, we tried to do an all-in cloud migration and got stuck halfway through because it turns out it's hard to move some things, so we gave up and called it hybrid and now we're calling it good.' That might be overly cynical, but it takes time to move these things. It takes time to wind up wrapping around a bunch of different environments. So, if you have something that makes it a lot, I guess, more straightforward to reason about and around the network layer, that really feels like a great equalizer, because that is one of the most differentiated aspects of all the different clouds. Rasam: Yeah, absolutely. I mean, the proof is in the pudding, right? So, we find getting cloud networking enterprise-ready, from a security, governance, and compliance perspective, a high-availability perspective, a disaster-recovery perspective, to be a monumental challenge. And for an enterprise, it could be an effort of months, or sometimes years, for a single cloud, much less multi-cloud. And just because you did it with Cloud A doesn't make Cloud B all that much easier. And I agree with you; I think multi-cloud isn't necessarily an easy and desirable place to find yourself, but that's beside the point, because enterprises are finding themselves there for a myriad of reasons. It could be business imperatives, partnerships, acquisitions; it just happens. And when it happens, you need the best possible strategies and tools to deal with it. And for us the proof is in the pudding, because we've had customers contract the amount of time it would have taken them to get from Cloud A to Cloud B from months and months to a matter of weeks.
We can provision something that would take multiple weeks of change control and manual effort, and do it in a matter of hours. So, I don't want to overstate how much the technology simplifies things, but the technology does literally simplify things that much. There's still business process involved, there's still change control involved, there's still the human element of making sure that the change is well orchestrated, but the actual process of getting your cloud networking and multi-cloud networking up and running is simplified in a way that I think you have to see to believe. And, you know, the proof is in the pudding, and when we have a chance to actually demonstrate that to our prospective customers, it truly is game-changing. Corey: It's clear that you've built something that works. You have a laundry list of customers on your website who are referenced customers, and these are logos and names people recognize. It's not, 'Oh, wow. That sounds like you made half of those up, and weren't three of those the big evil corporation in some movie somewhere?' No, these are real companies solving real problems. And digging a bit into what you've built before you came on the show, it is clear that you folks offer a TCO story that lowers the total cost of ownership, but lies, damn lies, and TCO analyses tend to be the three forms of lies people tell. I'm much more interested in the story of how you accelerate time-to-market because, speaking as someone who focuses on AWS bills and cost reduction, it always takes a backseat to accelerating features being released. So, there's a capability story that goes along with this, which it sounds like there very much is. That's the real win; the fact that it saves money is almost icing on the cake. Rasam: Yeah, absolutely. You're right, the Holy Grail is time-to-market, which really, for me, is very much synonymous with this idea of agility and the ability to pivot, and to get to the next iterative desired outcome for your organization, whatever that may be, quickly. That's consistent with this idea of velocity, and iterative testing, and scale that the cloud provides. For example, recently, I've been working with one of our prospective customers whose real underlying challenge is, 'Look, I've already built this really robust infrastructure from a cloud networking perspective. It is really colo-centric; that's my model for my cloud interconnects, but I am now in a global expansion phase. I need to go to all these new geographies, and if I were to do what I just did to build out my cloud networking footprint, I'm looking at a substantial CapEx investment and a substantial amount of time and runway to get that operational, and I just don't have the CapEx or the time for that.' So— Corey: What, they can't just copy and paste the config from one to the other again and again and again, in the true StackOverflow tradition? Rasam: Or get the circuits dropped in the colo, or get all that hardware delivered, and deal with all the complexities of international customs control, et cetera, et cetera. So, what we bring to them as a value proposition is the fact that our points of presence are virtual; they're software-defined constructs that run atop the hyperscale cloud providers. We can spin them up anywhere in the world where the hyperscale cloud providers have a footprint, and we are in many regions across the globe. And if we're not in one, we can get one up and running in a matter of days.
And most of that time is actually just spent testing it to make sure that it is operationally viable; the actual provisioning and turn-up is very, very quick. So, the ability for us to be a virtual PoP for this particular customer and give them the ability to quickly expand into brand new geos, in a way that also concurrently, natively streamlines and simplifies the complexities of cloud networking that we've already covered, is extremely attractive to them. And from a time-to-service perspective, it's taking their ability to deliver the needed services in the cloud to their business users from something that would have taken months and months to something that can be up and running in a matter of weeks. Corey: If your mean time to WTF for a security alert is more than a minute, it's time to look at Lacework. Lacework will help you get your security act together for everything from compliance service configurations to container app relationships, all without the need for PhDs in AWS to write the rules. If you're building a secure business on AWS with compliance requirements, you don't really have time to choose between antivirus or firewall companies to help you secure your stack. That's why Lacework is built from the ground up for the Cloud: low effort, high visibility and detection. To learn more, visit lacework.com. Corey: Can you give me an example of a customer pain point that you've resolved? Because, again, you have customers willing to say nice things about you, but one of the challenges I've often found with a lot of the, shall we say, larger, more enterprise-y offerings is, 'Well, what did you actually do for the customer?' And the answer requires two hours and at least 40 PowerPoint slides, and at the end, you say you get it just to get the person to stop talking. What is the value, the better outcome, that you've delivered for a customer? Rasam: Yeah, sure. So, our customer, Koch Industries—and they're a public reference for us; you can check out their story more in-depth on our website—but they were your traditional enterprise, originally designed for cloud using a hub-and-spoke architecture, which consisted of using the data centers as the focal point for data center interconnect, cloud interconnect, high-speed bandwidth, private LAN, et cetera, that comprised their overall architecture. And over time, they simplified—and I use the word simplified loosely here—but they simplified with a more cloud-native, cloud-transit type of architecture, where they leveraged more of the default capabilities and networking services in cloud, which helped considerably. There was a ramp involved in learning the native-cloud constructs and the associated networking and security aspects of that, but over time, they did simplify. They were able to condense their overall provisioning time of a cloud interconnect from what they originally shared with us was eighteen months down to about six, and consolidated across about a dozen transit hubs from a cloud networking perspective.
But then, as we discussed previously in the podcast, when they took a step back and looked at it, what they still saw was an enormous level of complexity in networking, an enormous level of complexity in operations, and they were still seeking a better way: a way that was operationally viable in the long run with a lower total cost of ownership, with the ability to consume networking services in a way that moved at the speed of business, more in line with how they were consuming cloud compute and storage, and in line with the speed and agility with which their business wanted to move. And that's where Alkira came into the picture, and there was a real alignment of vision between how they saw their networking strategy moving forward and how Alkira delivered services. Long story short, they are now able to take their planning process down from six months to a matter of weeks, and the actual provisioning process of cloud networking to a matter of hours, sometimes less. And that has brought immense value to their business and to their IT organization, again, in terms of agility, in terms of total cost of ownership, in terms of visibility and control, in terms of governance. And another added benefit was that historically they were single cloud, AWS, but in the process of their journey with Alkira, the need came up to go into Azure for some Azure-native services, in a scenario where the data still resided inside of AWS, and that request historically would have meant months and months of due diligence to get the environment up and running; in their case, they were able to do it all within a day because they were already leveraging the Alkira multi-cloud platform. So, a tremendous amount of value for them across a myriad of fronts that, again, has been pivotal to their long-term strategy and how they address cloud networking moving forward. Corey: If we go back to the early days of cloud, we started off with some of the advanced stuff like, you know, virtual machines—some places called them instances—and there was a lot of competitive variation between them. 'Well, these instances cost a fifth of what this other cloud provider's do.' 'Yes, but that other cloud provider's [unintelligible 00:19:47] don't fall over every 20 minutes and have persistent disk.' In the fullness of time, everything has sort of commoditized to the point where now, in many cases, if you're just running a bunch of virtual machines on cloud providers, it's largely a matter of price. The same story has happened in many respects with object store. Do you think that the network will eventually wind up commoditizing as well, or do you think that there are still going to be significant variances as the rest of the cloud world grows up on top of that bedrock foundation? Rasam: I think that's a brilliant question, and I think the answer is yet undetermined. I don't think it's clear. I think there are a lot of different approaches to trying to solve for the challenge of networking in a cloud-first world, right? And most of the solutions on the market address some subset of the problem: some get you to the edge of cloud, some really reside on the edge and try to interconnect you to the various clouds that you want to be in, some are meant to help you orchestrate your cloud footprint once you're in the cloud.
The underlying challenge remains that, at its core, cloud networking itself remains extremely complex and extremely siloed. If you zoom out and look at your traditional enterprise architecture, it's a bunch of siloed solutions that have been stitched together to meet the end-to-end workflow. Well, cloud is kind of a microcosm of that. The same thing happens in cloud: you have a lot of manual intervention, stitching together the various pieces to meet the end-to-end workflow. None of the existing approaches on the market are really operationalizing cloud from a networking perspective the way DevOps and containerization have done with compute and storage, making it a seamless part of an end-to-end infrastructure-as-code strategy. So, I think everyone is really trying to tackle that problem in a way that, hopefully, the end state will be one that is aligned with the underlying value proposition of what cloud brings to an enterprise. But how that is going to end up looking, and whether it ends up being a singular sort of end-to-end infrastructure-as-code strategy that ties the pieces together elegantly, or ends up being all these various piece-parts that each solve a best-of-breed problem but still need to get stitched together, I think, remains to be seen. Corey: One of the things that I think networking has had in common with, or is at least spiritually aligned with, the world of security is that when it isn't working: 'Well, we're going to go ahead and make things broader and broader and broader, and we're going to go ahead and grant everything access to everything, and once we get it working, then we're going to go back and dial that back down because we want to be secure.' Yeah, no one ever remembers to go back and dial things back down. Once it's working, we're on to the next ticket, in many cases. So, the complexity doesn't just act as a drag on feature velocity; it also acts as a significant security risk in many environments. How do you folks tackle that, or think about that? Or is that one of those, 'Oh, that's the best kind of problem: someone else's.' Rasam: I think at the root of that problem is the visibility and control problem, because it's easy to do something—to turn some knobs to get something up and running and then forget about it. And if you don't ever go and touch that part of your network again, then you can easily end up in a situation like the one that you described. And that's why we really think of solving this problem as needing a new paradigm and a new way of thinking, which is a unified fabric, end-to-end, in a multi-cloud world, with a full stack of network services that addresses the vast majority of the use cases that an enterprise would have. So, we're literally giving you a single user interface for full visibility and control, end-to-end, for all of your networking use cases, be they on-prem, for your remote users, for your branches, or any of the clouds that you might be in. Corey: When you find that you're talking to your prospective customers that, in the fullness of time, become actual customers, and they wind up going from, 'Okay, this might work,' to, 'This is awesome,' what do you find they're most surprised about during the adoption? And secondly, what do you think their biggest misunderstanding along the way was? Rasam: You know, the way that you leverage the Alkira Network Cloud—which is what we call it.
We call it Alkira Network Cloud because it is in fact a network cloud that delivers your full stack of network services in a cloud model. But the way you leverage the Alkira Network Cloud is you go through a really simple, multi-step workflow. So, we have this concept of cloud exchange points, which you can think of as virtual PoPs, and they reside all over the world. So, the first thing you do is you pick your virtual PoP or PoPs—you can have one or multiple of them, as many as you need—and the next thing you do is you attach your sites to this fabric. And there are multiple ways you can do that. You can do that through high-speed dedicated connectivity like AWS Direct Connect, you can do it by extending your SD-WAN fabric into the Alkira fabric, you can do it through IPsec connections, you can do it through remote access for your users. But that's the first step. And then the next step is to attach your cloud VPCs or VNets. So, you go through a process of providing your credentials for your cloud properties, and you attach the cloud properties to the Alkira fabric, and in the middle, there's the step of defining your segments. So, you define logically what your segments will be, and then you assign your sites, or your users, or your cloud properties to those segments. And literally, I mean, that's five steps, and at the end of those five steps, you've just established end-to-end multi-cloud connectivity from your sites, and branches, and data centers, and users to your cloud properties, end-to-end, with full visibility and control. And usually, that process can take 30 minutes, if you have all of your credentials and the necessary data lined up for what you're connecting and the sequence that you want to go through, and at the end of that half-hour, people that are new to the platform will stop and say, 'That's it? We're done? It can't be that easy.' And in fact, it was that easy. And that's really the big aha moment for a lot of our enterprise customers that see the platform for the first time: 'Wait. This is way, way different than anything I've seen before.' Corey: Your website has a 30-minute challenge for configuring a network, and I haven't run myself through it yet with a stopwatch, but the fact that you can even make that claim means that there's something radically different because, frankly, it takes that long just to find the networking section of the console in many of the cloud providers. Something you just said was—talking about your enterprise clients: do you find that you're generally working in the enterprise space, or do you tend to have offerings that make sense at the SMB scale? In other words, when is it time to start talking to you folks? Invariably, 'After someone probably should have,' seems to be a common refrain, but at what scale does Alkira begin to make sense? Rasam: Yeah. I think I use the term 'enterprise' sort of more generically than your large enterprise. Corey: Oh, to me, a big company is anything with more than 200 people, so I'm the wrong person to ask on that score. But yeah. Rasam: Yeah, and I would say I agree with you, and that's kind of the definition of when I say enterprise for me. Because networking is a horizontal problem. Every company needs networking, and no matter what the size of your organization, if you're going into cloud, you're going to have to deal with the challenges of cloud and operationalizing the challenges of cloud.
Now, the larger you are and the more clouds you're in, the greater the complexity you have to deal with and the greater the burden of operationalizing that complexity. So, we deal with large enterprises that are deep into their cloud journey and find themselves backed into complexity and looking to simplify. And we also have enterprises that are born in—I'm sorry: when I say enterprise, I'm talking about customers that are born in the cloud, startups that are really looking for a simplified networking solution, operationally aligned with the way that they're intending to leverage cloud. So really, if you're getting into cloud, and you're getting into cloud networking, and you have a cloud-first strategy, regardless of the size of your organization, the chances are pretty good that Alkira is going to be a good fit for you. Corey: Thank you so much for taking the time to speak with me today. If people want to learn more about what you're up to, how you view these things, or basically take it for a spin themselves, where can they find you? Rasam: On alkira.com. So, www dot alkira—A-L-K-I-R-A dot com—and take a look at our resources page. It's packed with great content. And like I said earlier, you really have to see this to believe it, so we're happy to show you; request a demo and we'll get online for you and take you through the journey. Corey: Excellent. Well, thank you so much for taking the time to speak with me. I really do appreciate your being so generous with your time. Rasam: Thank you, Corey. I really appreciate it. Corey: Rasam Tooloee, cloud networking evangelist. I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you hated this podcast, please leave a five-star review on your podcast platform of choice along with a comment containing the proper Terraform configuration to get IPsec working between two different clouds. Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started. Announcer: This has been a HumblePod production. Stay humble.
Jesse: Welcome to Meanwhile in Security where I, your host Jesse Trucks, guide you to better security in the cloud. Announcer: If your mean time to WTF for a security alert is more than a minute, it's time to look at Lacework. Lacework will help you get your security act together for everything from compliance service configurations to container app relationships, all without the need for PhDs in AWS to write the rules. If you're building a secure business on AWS with compliance requirements, you don't really have time to choose between antivirus or firewall companies to help you secure your stack. That's why Lacework is built from the ground up for the cloud: low effort, high visibility, and detection. To learn more, visit lacework.com. That's lacework.com. Jesse: Don't be stupid. Focus on your real risks, not hacker-movie risks. It is easy to get caught up in the hype of advanced persistent threats and the latest in obscure attack methodologies, to the point where you spend all of your energy and time hunting for these in your systems. This stuff is right out of the latest bad hacking movie. It's a colossal waste of time for most of us. Spend your time on learning and monitoring things based on your real risk, not your overblown sense of self-importance that the latest international crime ring of nation-state-backed hackers wants to breach your defenses. News flash: APTs probably don't care about you. If you make it fairly easy to get your data and use your resources, of course you'll get popped. That's like leaving your wallet on a bench in the park; of course someone will take it. Raise the barrier to entry for obtaining your resources and you reduce opportunistic crime, just like locking your car at night protects from casual pilfering through your things. Meanwhile, in the news. Amazon Sidewalk Mesh Network Raises Security, Privacy Concerns. Tangential to cloud security, these types of networks worry me for privacy and physical security concerns more than cybersecurity for the device and users. As this article says, privacy and security are separate issues. Conflating the two can compromise one or the other or both. Don't confuse privacy and security as being one and the same. This Week in Database Leaks: Cognyte, CVS, Wegmans. I routinely hammer on securing your cloud storage and other ways to minimize self-exposure of sensitive data for a reason. You should be scared of the implications of these exposures in terms of business risk, reputation loss, and regulatory violations and fines. In other words, don't be stupid. Data is Wealth: Data Security is Wealth Protection. Ignore the shilling of services as usual and take in the message: protecting your data is your prime directive. Ask yourself every morning, 'How will I protect my data today?' Doing anything else is doing it wrong. Google Workspace Adds Client-Side Encryption. This means you can store encrypted data in your Google accounts without Google having access to the contents of your data. This is a big deal. Take advantage of this if you use Google for document creation and storage. Corey: This episode is sponsored by ExtraHop. ExtraHop provides threat detection and response for the Enterprise (not the starship). On-prem security doesn't translate well to cloud or multi-cloud environments, and that's not even counting IoT. ExtraHop automatically discovers everything inside the perimeter, including your cloud workloads and IoT devices, detects these threats up to 35 percent faster, and helps you act immediately.
Ask for a free trial of detection and response for AWS today at extrahop.com/trial. Jesse: Cybersecurity Tips for Business Travelers: Best Practices for 2021. I plan to avoid a return to routine business travel, but if you want to get back on the road, or don't have a choice about it, do it safely. If you don't want US Customs and Border Protection agents searching your devices, wipe your phone before reaching customs. You can set your device to wipe after too many failed passcode entries, then back up your phone right before boarding or departing the plane, and wipe it on the way to customs by tapping one number over and over as you walk off the plane. 2021 Verizon Data Breach Incident Report insights. The annual Verizon Data Breach Investigations Report—known as the DBIR—has incredible and useful insights for all tech workers, not just security practitioners. Once again, humans are the weak link. I know spending more time educating your people than hunting for APTs is boring sauce, but you'll be better off. One in Five Manufacturing Firms Targeted by Cyberattacks. If you create real-world goods, you are a prize target. Don't be fooled into thinking you're safer because it's harder to steal things in meatspace than in cyberspace. Confidential Computing: The Future of Cloud Computing Security. Using hardware-level security is still possible in the cloud. Most of us don't need to encrypt everything on a system or everything running in memory, but some of us do need to be that paranoid. However, don't do this unless you really, truly have a business case for it; to implement it, check out services like AWS CloudHSM for encryption of in-use memory and data. Many Mobile Apps Intentionally Using Insecure Connections for Sending Data. Don't use insecure transport in your apps. Encrypt your data in transit. Eventually, consumers will have ways to disable all apps that don't use basic security measures like proper authentication without stored credentials, or that use unencrypted channels. Don't be stupid. Are you sensing a theme of the week? The Art and Strategy of Becoming More Cyber Resilient. Resiliency in IT architectures and applications is becoming the only way to survive the modern distributed world, especially in cybersecurity. You need to change your whole paradigm to be risk- and recovery-based, not just the old-school defender attitude of building lots of walls. Cyber is the New Cold War & AI is the Arms Race. The whole AI marketing trope gets old. Ugh. But the message is accurate. There is too much data even in small systems to manage detection and protection without advanced math hunting for anomalous things that go bump in the night. We are in an arms race and we are at war. If nothing else, I like this article because it says what many of us in security always say: 'It isn't if you get popped; it's when you get popped.' The Future of Machine Learning and Cybersecurity. A reality check on using advanced math for security monitoring and analysis is important. Use it, but don't rely on it too much. Like with all things in life, find balance between known-attack analysis and mathematically finding potential attack indicators. And now for the tip of the week. Use a virtual private cloud, or VPC, for any systems or services not requiring direct public interaction. All three of the biggest public cloud providers have these available. Both AWS and GCP use the term VPC, but Azure calls it an Azure Virtual Network, or VNet.
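In AWS terms, the two-network layout in this tip can be sketched minimally with boto3; the CIDR ranges here, and the choice to wire only the public subnet to an internet gateway, are illustrative assumptions rather than a hardened production design.

```python
# Minimal sketch: one VPC with a private subnet (compute/storage) and a
# public subnet (internet-facing). CIDRs are arbitrary examples; error
# handling, tagging, and security groups are omitted for brevity.
import boto3

ec2 = boto3.client("ec2")

vpc_id = ec2.create_vpc(CidrBlock="10.0.0.0/16")["Vpc"]["VpcId"]

# Private subnet: no route to an internet gateway, so nothing in it is
# directly reachable from the outside.
private = ec2.create_subnet(VpcId=vpc_id, CidrBlock="10.0.1.0/24")["Subnet"]

# Public subnet: attach an internet gateway and route 0.0.0.0/0 through it.
public = ec2.create_subnet(VpcId=vpc_id, CidrBlock="10.0.2.0/24")["Subnet"]
igw = ec2.create_internet_gateway()["InternetGateway"]
ec2.attach_internet_gateway(
    InternetGatewayId=igw["InternetGatewayId"], VpcId=vpc_id
)

rt = ec2.create_route_table(VpcId=vpc_id)["RouteTable"]
ec2.create_route(
    RouteTableId=rt["RouteTableId"],
    DestinationCidrBlock="0.0.0.0/0",
    GatewayId=igw["InternetGatewayId"],
)
ec2.associate_route_table(
    RouteTableId=rt["RouteTableId"], SubnetId=public["SubnetId"]
)
```

A NAT gateway for outbound-only traffic from the private subnet is the usual next step but is left out here; the point is only the split between a routable and a non-routable network.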
This is as simple as setting up a private network for your compute and storage systems and adding a second, public-facing network for your outside interactions with users and external services. They're easy to implement, and you get significant improvements in security and risk-profile reduction quickly using VPCs. This is the cloud version of keeping your things hidden behind a firewall on-prem. And that's it for the week. Securely yours, Jesse Trucks. Jesse: Thanks for listening. Please subscribe and rate us on Apple Podcasts, Google Podcasts, Spotify, or wherever you listen to podcasts. Announcer: This has been a HumblePod production. Stay humble.
(Ran) Hello and welcome to episode 412 of Reversim (רברס עם פלטפורמה). Today's date is June 13th, 2021 - I'd say it's a somewhat historic date: apparently a government was formed today; we don't know whether it will last, but the vote took place just today and we're in suspense about what comes next [worried that the number of governments will overtake the number of Reversim episodes?]. Today we're recording a podcast with Yinon from Via - you'll introduce yourself in a moment - and we also have a special guest: filling in for Ori, we have Yonatan from Outbrain - hi Yonatan! Hello and welcome - Yonatan has been at Outbrain for so-and-so many years now and has also been a guest on past episodes, but by now that's really ancient history [say, 328 The tension between Agility and Ownership, or Final Class 23: IDEs, or 131 uijet, or 088 Final Class 2 ... there are more]. So first of all - since 412 is Precondition Failed - everyone knows that, right? I absolutely did not have to look it up before the show, not at all, I knew it by heart... - so Yonatan, tell us about your Precondition. Or, in other words, who you are... (Yonatan) So first of all - I'm a long-time Reversim listener; I think that when I joined Outbrain 10 years ago, the podcast was one of the reasons. I joined as a Backend engineer, and for the past five years I've been leading development at Outbrain. (Ran) 'Leading development at Outbrain' - is that your way of being modest and saying you're the head of development? (Yonatan) The head of development... (Ran) Nice. Well - development at Outbrain is a large group, you have plenty of work, and thank you for coming. (Yonatan) Gladly. (Ran) So, Yinon from Via, and today we're going to talk about Serverless - but from other angles, angles we haven't covered yet. Before we get into the topic - tell us a bit about yourself. (Yinon) Hi there, I'm Yinon, nice to meet you. I've been at Via for about three years and a bit. Just for background - we mentioned Preconditions - I came to Via from Ravello; for those who don't know it, Ravello was quite a big supporter of Reversim, which is also how I got to know the podcast, and how I got to you, Ran... I'll tell you a bit about Via; I think they've already been guests on the podcast [indeed - 360 Via], but again, maybe from my own personal angle - I always like to tell everyone a tall tale about Via... (Ran) By the way, that guest was a product person, so it was less technological - and today we're going to be much more technological. (Yinon) Great - that's my introduction. Via actually started like this... I always tell this tall tale, which nobody in Via's hallways has confirmed or denied. As the story goes, our CTO and founder, Oren, was walking down Allenby street one day, trying to catch a shared taxi toward Tel Aviv, toward the university - and discovered that, back then, this was pretty complicated. Seven or eight years ago, catching a shared taxi wasn't that easy - you had to find one, figure out where to go, how to board it, what to do with it, and as soon as you boarded you had to pay that fare, and it was terribly complicated... And what went through his head was 'wow, this thing is a cool idea, it's a sort of charming Middle-Eastern chaos' for whoever was already into public transit - and on the other hand, how nice it would be if there were some app or something that at least let you order a shared ride, get with it from wherever you are to wherever you want, that told you how to get there and what to do, and paid for you... feels like a dream. So he went and made it come true, and that's how Via began its life - in New York... From there we rolled on to many other places - we started as a sort-of consumer service, and from there it evolved into various Pre-booking and Paratransit services. Today we also handle vehicles like School Buses - that is, the entire school-bus operation of the City of New York - so we can really do a great many things. And all of this is built on top of one single technology stack which, as Ran hinted earlier, runs nearly all of it on Serverless. (Ran) Right, and we'll get into that story in a moment... by the way, I assume many of the listeners know Via, and your employees have also given talks at conferences in the past - there's an interesting mix there of technology, algorithmics, Data Science and other things - and of course a product story as well. It may be that whoever hears your pitch says 'ah! That's Uber!' or others - but it's not...
This time we won't get into that, because we're not doing an episode about the product. There are differences, but we won't go into them [and also - 360 Via]. And now - let's get into the technology: one of the interesting things about Via is that the entire technology stack, or most of it, runs on top of a Serverless platform. So tell us - how is your stack built? (Yinon) Let's start historically, then - historically, Via, like every good Israeli high-tech company, an Israeli startup, naturally began with a Monolith, which to our joy or our sorrow exists to this day: that famous Monolith called 'Via Server', a very original name... and that Via Server originally ran on a lot of EC2 machines on Amazon - pretty standard, just as you'd expect, long before Kubernetes. What we discovered is that as you grow - and I mentioned earlier the places Via has grown into - the stack itself started to become very, very expensive... On the one hand, we would run into a lot of scale-up problems - we had to scale up and grow that Monolith to support traffic that kept growing - and on the other hand, if we left it at a very high scale, AWS enjoyed it very much and Via rather less... What's more, at Via we look at the usage pattern, mostly originally but also today - there are periods with a peak. You can picture it in Tel Aviv too; we also operate the Bubble service, so you can picture the morning hours, roughly from 07:00 to 09:30 - anyone who tries to catch a Bubble knows it's not simple, the vehicles are very, very full, and you are effectively bombarding our servers... and the same thing happens in the evening hours. On the other hand - anyone who tries to catch a Bubble around 12:00 has a very easy time of it, finds one very, very quickly - simply because there's less demand; fewer people want to get to and from work. (Ran) But it sounds like you know in advance what the peak hours are going to be... why not simply scale up and scale down, when you even have five minutes to warm up the servers, if you know ahead of time...? (Yinon) Great - that's exactly what we said too... 'Excellent! Life is easy - we know how to do warm-up and scale-down.' The problem is that for that, you need to predict a little and understand what the traffic will really be - and each time, just like in development, you need to add buffers... 'How much will it be tomorrow, exactly? Will there be 1,000 riders tomorrow? So let's put up 10 servers, or 15 servers...' And then you wake up the next morning - and there's rain, and rain is a blow... so suddenly the 1,000 riders turn into 2,000... and if you look at a city at the scale of New York, which is obviously a much bigger city, there it turns from 10,000 riders into 100,000, suddenly... in one blow, suddenly everyone is hammering. And if there's no rain - wonderful; it means someone has to notice at 06:00 in the morning and say 'wow, we'd better grow the servers even more' - and you can imagine how many times someone missed it, or missed by a few hours, or missed their estimate and instead of 100 servers added only 50 - and we lost out. (Ran) I understand the painful point - it's getting up at 06:00 in the morning... I mean, if this were a company that's active at noon, maybe nobody would care, and you wouldn't have gotten to Serverless, but getting up at 06:00 in the morning - that's quite a story... (Yinon) That's a really excellent reason to move to something else... The second problem is that even after we'd brought up those 50 servers - someone also had to remember to take them down afterwards... It's not always as breezy as 'we'll spin up now and remember later', because people forget, and the next day they don't... and suddenly the AWS bill doesn't look like a hit. So we looked for a solution that would also let us do the scaling automatically. On the one hand you say 'fine, so there are more modern solutions'... Kubernetes arrived at a later stage, but even before it there were Auto-Scaling Group solutions on AWS; you could spin up with those too. The problem is that when you look at it for a moment - a Monolith like that, even though it's written in Python, which comes up relatively fast - still, until it comes up and does its init, and downloads... you can think a bit about what Via does: you need to download the maps, download configurations, prepare all sorts of things...
and the warm-up and build time of the Container is not short - it can take whole minutes, depending of course on the amount of traffic and on the size of the things that need to be loaded - and that, of course, really hurts. Doing scale-up that relies only on that up-front auto-scaling is not fast enough, and there's a not-short-enough period in which you lose. You lose money, you lose traffic - and the user trying to take a ride gets genuinely poor service. And that's the reason we started looking at other things. (Ran) But a load time like that... surely Yonatan is about to jump in and argue - 'wait! You're not a Monolith! There are Microservices!'... so why did the Monolith need to load all the maps of the entire world and all the other things? True, it's heavy - but there are other solutions for that too, not only Serverless... (Yinon) Great - this is the stage where we really looked - and thanks, Yonatan, for the question... - we looked and said 'okay, one solution is indeed to say fine, let's build this with all sorts of Microservices.' In fact, you could look at it and say that's the solution we chose - the only question now is what its transport is, what the pipeline is through which we actually bring that thing up. One option was to say 'okay, we'll write the code as a collection of small servers, in Python...' - and by the way, for some of the things that's exactly what we did. There are places where... Via isn't dogmatic, saying 'Serverless is the only way'; that's not the right way for us to look at it. We say that in places where it can be done easily - not by bringing up a whole, heavy Service on top of Kubernetes, but by looking at the advantages of other things - those were the places where we looked at Serverless. If you look just at why we chose to go with Serverless in specific parts, we actually set ourselves a few interesting points - we said we want as little DevOps involvement as possible, because DevOps is an expensive thing and a complicated thing - not only on the people side; the very time invested in DevOps, in bringing up environments and arranging them, is very expensive. Even in a very successful environment like Kubernetes, which really does have many advantages - there is still a great deal of configuration you have to do. You need to understand the parameters by which we determine scale-up and scale-down and scale-in - and actually configure the server, fine-tune it constantly, in order to reach the results we'd want to reach. So even in a classic Microservices environment like that, with Containers and Pods, most of that configuration work is on us, our responsibility... (Ran) I think there's a law, a conservation-of-energy rule of technology: work doesn't disappear - it changes form... If you previously had to wire cables, today you have to configure VPCs, and if you previously had to configure some machine, today you have to produce some script or some configuration in Terraform or whatever other tool... The specialization changes, but the work doesn't disappear. (Yonatan) You said you're not dogmatic - that is, you don't say this is the only way. Are there things where you still use... does the Monolith still play a role? Are the Microservices still around? Or is it... (Yinon) First of all, the Monolith still exists - not in every Deployment and not everywhere, but it still lives at the heart of some of our Deployments. And we still have several of the other services that are standard Microservices, with Containers, some written in Java, some in Python, and they still exist as classic microservices, Docker Containers inside Kubernetes. Mainly in places with very high memory consumption - where we need to pull down... say, to hold the map - a map, naturally, is an object that takes a great deal of memory, and there it actually turns out more convenient for us to hold it, for example, inside a Container. So we have entire Kubernetes stacks as well. That said, in places where it's logic, or where it's 'Container Glue' - and that, by the way, is a lot of what Via does...
Think, for instance, of drivers driving around the city and reporting their location to us - they need to constantly report where they are and receive instructions - that's something that doesn't require much memory. What it really requires is the ability to scale up and scale in according to the number of drivers currently out on the road. So instead of bringing up Containers that know how to handle that thing, we discovered it's much easier for us to bring up 'micro-micro-micro-Containers', or 'Nano-Containers' of a sort - which is, practically speaking, Lambda... that's exactly what it does. (Yonatan) So that's a use case of, say, receiving the drivers' locations and writing them somewhere, I assume? Are there other use cases - say, if I want to order a ride, is that also... (Yinon) That's on Serverless too, absolutely. That will run Serverless as well. A request will arrive at some Lambda, sitting, for that matter, behind either an ALB or some API Gateway. It's wired directly into the Lambda - and from there, effectively, a 'chain of Lambdas' can run... A request arrives - it identifies who the rider is - from there it goes to our booking hub, which is essentially the hub that 'talks' to the vehicles - a booking is made - it passes to some Lambda that knows how to handle payments; naturally we need to check that you're even allowed to board the ride... it also passes from there to wherever the driver receives their instructions; from there it turns to the mapping service - which, to remind you, is a Container - and we tell it 'please generate a new route for the driver' that will lead you to pick up that rider. And in the end it is translated into instructions back to the driver - and the driver receives an instruction and will keep transmitting those reports to us, which we call, by their very original name, Heartbeats... and that arrives back into our systems and continues the same flow I mentioned earlier.
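To make the shape of such a chain concrete, here is a minimal sketch - not Via's actual code; the queue name, environment variable, and payload fields are invented for illustration - of one link that accepts a request and hands it off asynchronously through SQS:

```python
# Hypothetical sketch of one link in a Lambda chain: receive a booking
# request (e.g., via API Gateway), do a short piece of work, and pass the
# result on through SQS for the next Lambda to consume asynchronously.
import json
import os

import boto3

sqs = boto3.client("sqs")
# Queue URL injected via environment; the name is illustrative only.
NEXT_QUEUE_URL = os.environ["BOOKING_QUEUE_URL"]

def handler(event, context):
    body = json.loads(event["body"])
    rider_id = body["rider_id"]  # hypothetical payload field

    # ... short synchronous work here: identify the rider, validate ...

    # Hand off to the next stage instead of calling it synchronously, so
    # this Lambda finishes (and stops billing) as soon as the send returns.
    sqs.send_message(
        QueueUrl=NEXT_QUEUE_URL,
        MessageBody=json.dumps({"rider_id": rider_id, "action": "book"}),
    )
    return {"statusCode": 202, "body": json.dumps({"status": "queued"})}
```

The next function in the chain would be triggered by that queue rather than called synchronously, which is exactly what keeps each function's billed time short, as comes up next.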
(Ran) Let's talk for a moment about money, economics... Earlier you mentioned that there were EC2 instances, and you had to scale them up and then maybe forgot to scale them down, and it costs money and so on... I've had the chance to work with Serverless, at a startup much smaller than Via, and that was also a few years ago, five years or so, and back then it was clear that Serverless is expensive... I mean - there are advantages on the operations side, there's an operational model, there's a programming model that's healthy; I liked all of those things very much - but one thing was clear: it's going to be very expensive once we scale up. If there's nothing in the air, then true, it costs nothing - if nobody calls your function, then a tree that falls in the forest with nobody hearing it didn't really fall... so it's clearly easier than maintaining an EC2 container. But the moment there's significant traffic and the function is being called all the time, it was clear, at least back then, that it's also going to be much, much more expensive than maintaining your own Microservice. What do the economics of this look like today? (Yinon) Let me tell it to you this way... let's actually start with a story and slowly work our way in that direction. At the start of the COVID crisis [the first round...], you can imagine what its effect was on ride services... in one blow, one morning, within more or less a week, we went from hundreds of thousands of riders in New York to roughly ten. Nobody rode, nobody moved - and that was across the whole world, not only in New York. On the other hand, if you look for a moment at the economics, at what cost money - in one blow all our Lambdas dropped to zero; we stopped paying for them entirely. Whereas all those Containers - the ones that remained in the Monolith and the like - left quite a few dollars there, which kept flowing straight into Mr. Bezos's pockets... [a little empathy - the man has a spaceship to build]. So at least at the scale-up/scale-in level, the economics are super successful - at no point do we need to keep 'spares' around to cope with load 'just in case...'. You can talk about warm-ups, but you don't keep spares. And on the other hand, even while it's up and doing its work, we see that the actual processing time, in which the Lambda really works, is very low - mainly because we use asynchronous calls. If you work in a fashion that's more asynchronous - that is, incoming calls spend most of their time moving between Lambdas inside queues, with no synchronous calls going out - suddenly the time the Lambda actually runs is very, very short. Specifically, by the way - a few months ago AWS changed their billing model from a minimum of 100 milliseconds to a minimum of 1 millisecond per Lambda, and that significantly improved the cost, especially of short Lambdas, which is a lot of what we do. (Ran) I see - meaning that if, for example, the Lambdas... say I describe some flow of a Lambda calling a Lambda and so on - if each of them waits for the next synchronously, then you pay the bill for all of them along the way; but if it happens asynchronously, passing through SQS or any other mechanism - then you pay only for the minimal processing time. Okay, I understand... (Yonatan) What's also interesting here, Ran, is that there's a direct link between the business - which is the traffic you receive - and the cost, whereas with Services it's harder to make that link. The Service is alive the whole time, even if it receives no traffic, even if you're not 'receiving money'. (Yinon) Exactly - among Via's cities there are entire cities where you have no service during certain hours of the day. The Bubble service, for that matter, also runs only during daytime hours; it ends around 22:00-23:00 at night [even a bit later]. Consider that all night long there's some server up, and it does need to answer questions: if some rider opens the Bubble app at 02:00 in the morning, we'll give them the answer that there's currently no service - but for that, some server needs to be up... So the more of these things we move to Serverless - if someone makes a request, they'll get an answer, but if not, there's no need to bring the Service up at all. (Ran) So you're saying that because you're characterized by very high elasticity - maybe COVID is an extreme example, but even in the day-to-day there's elasticity: there are weekends, there are different hours across the day, there are holidays... in any case, elasticity in a fairly significant way - that makes the Serverless model more worthwhile for you. I wonder - I don't even know if there is an answer - but I wonder: for someone with a workload that's relatively balanced around the clock, would it pay off for them too? (Yinon) It's always a question of what your workload really is - and how much of what you do is really broken down into Microservices all the way. What do I mean? For that matter, if you look for a moment at that Via service, then on the one hand what really takes an enormous share of the calls and the traffic are those driver Heartbeats - that's something we know is very, very heavy in terms of the number of calls being made and in terms of the processing time running inside - even if the processing itself is very short. So if you have some service in which you hold those Heartbeats together with another service, you're effectively doing a very strong coupling of a single server to two layers together. In the Serverless world, it's terribly easy to do that breakdown, really into 'Nano-Services' - a Service that perhaps has no right to exist on its own, but scaling up a small piece out of the Service is terribly easy. (Ran) Okay, right, fine - that was the economic consideration. Now let's look at the methodological consideration. I'll ask it like this - did your developers become better because they work Stateless? Did they become better because they're forced to run under constraints like those of Lambda functions? Or in other words - how do you see a methodology like this affecting the way you develop, the quality of the development, the quality of the code, and so on? (Yinon) There are several answers to that... On the one hand, yes - the developers, having no other choice, have to think about a world in which there's no central memory, no sharing between... the only sharing between Containers is actually something external, so it makes people think as much as possible about how state is held and what to do with it. We really did come up with a lot of directions and solutions for that, some of which, by the way, are economic considerations as well...
For that matter, we discovered that working with databases that are more Serverless in character, like DynamoDB, turns out much cheaper for us - and also more convenient for traffic bursts - than using a 'MySQL-style' database, which has a harder time scaling up. And Dynamo, for that matter, is also 'if you didn't use it - you didn't pay', so if you didn't read, nothing happened - whereas with MySQL, even in successful variants like Aurora, you pay as long as the instance is up; nothing will help. On top of that, we discovered there are many ways to improve reads from the database too - we work, of course, with some ElastiCache, or with some S3 as a local cache, which helps us cope with bursts of 'suddenly thousands of Lambdas trying to attack the database' [the next Netflix movie?]. And Dynamo, for that matter, copes with that quite nicely; that's what it's built for. Whereas Aurora, which is quite a successful database, enjoys a traffic burst like that rather less. There are several solutions for this at AWS that work and know how to solve the problem - some of them we built ourselves, for that matter - we brought up a cache in Redis on top of the thing, using ElastiCache. Another option is to put a Proxy in front of the database - and effectively do Connection Pooling in front of the database itself. That really lets us run, again, a lot of load with a lot of spikiness... (Ran) Let's dwell for a moment on this Connection Pooling situation - I have to respond, because I got burned by this too... exactly the situation you're describing: Lambda functions, with Aurora and MySQL behind them - and thousands of Lambda functions trying to connect to the database... Now - if all those thousands were merely threads inside the same process, then there's Connection Pooling, and according to... suppose you decide that the database permits 300 Connections, then 300 threads talk to the database, and the rest wait in line. But here - a Lambda doesn't know how to 'wait in line'... so they start to fail... (Yinon) Lambdas really are a somewhat selfish creature in that respect - they don't exactly 'look around'. And there really are two solutions that we found and used - one of them is a built-in AWS solution; they have a Proxy intended to solve exactly these problems. You basically put... think of it as a kind of machine you place in front of the database - effectively an EC2 in front of the database. The Lambda connects to that machine - and the machine itself holds a Connection Pool - and it tells the Lambda 'okay, wait a second, I'll catch you on the next free Connection in the Pool'. And conceptually, it behaves very much like what you mentioned earlier inside such a Monolith... it effectively lets you keep using Aurora [obligatory reference to The Robots of Dawn...]. (Yonatan) By the way, you know, just to complete the picture and the motivation - it's not only that they bombard the database and some of them fail; why do you create a Connection Pool in the first place? To save the connection-establishment time, which over TCP is expensive time - but a Lambda can't do that... a Lambda has to create the Connection anew every time, and then you pay again in latency - and ultimately in dollars too... And that's just one example, the Connection Pool - I think any local cache... anything where in Microservices you could use a local cache, here you're in trouble; you need other solutions. (Yinon) Right... so there are a few ways... again, when we ran into a similar problem - in a place, by the way, where the database was Read-Mostly - the solution was actually to use ElastiCache as a sort of cache on top of the database. It let us declare the database itself as much smaller - and all you need is a Lambda that, 'once every'... once every relevant refresh interval, would simply refresh the cache. Quite simple - read from the database, push into the cache... and effectively work directly against ElastiCache. We came across a few more solutions along the way, by the way - there's a solution called EFS, which basically lets you hold a File System that the Lambdas share, and access is much easier; you don't need to hold a Connection, you simply go directly to the data. And with that same Proxy, it's really useful for holding a TCP Connection open to the database, and then you only need to create a small Connection to that little gadget.
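As a concrete illustration of the read-mostly pattern Yinon describes - a hand-rolled sketch, not Via's code; the endpoint variable, key scheme, and TTL are assumptions - the cache client is created once at module scope, so warm invocations of the same container reuse it, and the database is touched only on a cache miss:

```python
# Sketch of a read-through cache in front of a read-mostly database.
# The Redis endpoint (e.g., an ElastiCache node) and key naming are
# illustrative; redis-py is assumed to be bundled with the deployment.
import json
import os

import redis

# Created once per container, outside the handler, so warm invocations
# of this Lambda reuse the same connection instead of reconnecting.
cache = redis.Redis(host=os.environ["CACHE_HOST"], port=6379)

def get_city_config(city_id, db_lookup, ttl_seconds=300):
    """Return config from cache; on a miss, read from the DB and cache it."""
    key = f"city-config:{city_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)
    value = db_lookup(city_id)  # the slow path: one read from the database
    cache.set(key, json.dumps(value), ex=ttl_seconds)
    return value
```

The module-scope client is the same trick that lets a warm Lambda hold on to a database connection between invocations; the pooling proxy Yinon describes (AWS's RDS Proxy) complements it on the server side for the cases that do have to reach the database.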
(Ran) Sometimes it happens that you do want control over a Server... I mean: we're talking about Serverless, and you run inside some Container. But really - sometimes you want to set the amount of memory, play with the TCP stack, do all sorts of optimizations on File Descriptors, and so on... What do you do when you reach a state like that? What do you do when you feel you're already 'scratching the glass ceiling' inside your Lambda? (Yinon) First of all, I'll ask you - why? What's the motivation? Because usually, with us, at the end of the day, the business is not that... we don't busy ourselves fiddling with the guts of some file... if there's no choice then there's no choice, but we haven't found, so far, any place where that was needed. That is, we preferred the ease of No-Ops, where all you need to set for a Lambda is its amount of memory - and it runs. I'm exaggerating, of course, and there are a few more things you can set - whether it runs inside a VPC or outside a VPC, there are Security Groups, and so on - but broadly, once you've set them once, that's that, and there's almost nothing left to fiddle with. Through the amount of memory you effectively set the... not the memory, but the overall Performance of the Lambda. Broadly, when I think about it - when you set the memory, you determine how many Lambdas run on one AWS Container, and the fewer Lambdas run - that is, each takes more memory and fewer Lambdas run on the Container - the more resources you have within the Container: CPU, network card - and that, effectively, is the control you have. All in all - we found that when you play a bit with the memory, it's well above and beyond what we need in terms of the control we have inside the Server. For things that are really fine-grained - I agree, Lambda isn't a fit. That's exactly why we go to other places, like Containers and Pods. (Yonatan) If you look back a bit, there once were these 'monsters' - there was WebSphere and JBoss and even Tomcat... you would write your code, do some... they called it WAR and EAR and all sorts of curses... and deploy it into some container. And when Microservices arrived, things went in a rather different direction... instead of being a 'guest' inside some runtime that someone else maintains, which is very big and complex and where your span of control is small, you became the owner of the... if you run Python, you became the owner of the Python process, of the JVM - and in some way Serverless takes you a bit back, at least 'back' in terms of fashion... you once again become a guest inside some runtime that someone else holds and configures - how are you doing with this 'retro'?... (Yinon) I'm crazy about this retro, because the one who holds and configures this giant server is not me... it's the big guns at AWS, who know exactly what they want - and they're quite good in this world... that's why I live with it in peace. If I had to maintain that JBoss or that WebSphere, we probably wouldn't be talking today... Since AWS do it, and we... at the end of the day they really do know exactly what they're doing and they're good at it, so I live with it quite peacefully. I am admittedly at their mercy, and that sounds a bit fatalistic, but at the end of the day, if there's anyone whose hands are good to be in, it's probably the folks at AWS, who do a pretty good job. And what I really gain from it is that I no longer need to deal with complicated configurations; all in all, I deploy my code, it works - and that's pretty much the story. Especially when we feel... I mean, AWS are very open about their implementation and about what they're willing to share, to the level that they let you run any runtime you want, bring your own runtime... if you really want to run Fortran code on Lambda, then no problem, you can do it [seriously...] - so it's closed on one side, but there's a lot of openness from the other side. (Yonatan) And I suppose that if you really want to be the owner of the process, you'll build a Microservice that solves the problem, if you need to configure the who-knows-how-many Descriptors you need... (Yinon) Exactly, and by the way - there, too, it's a bit different, but in a certain sense you're Hosted inside Kubernetes... that is - you have more control over the Process, you really do control the runtime, you have the Pod...
On the other hand, you still have some 'landlord' who tells you 'listen, you're not exactly doing what you... I'm still the landlord here'. (Yonatan) Right. (Ran) Are you still a 'Python shop'? Or... (Yinon) Still a Python shop... (Ran) Meaning - in principle, Lambda lets you diversify your target languages even more easily - but that's an opportunity you haven't taken yet. (Yinon) Right - apart from the fact that what really has an effect, and we can talk about this a bit too, is the Cold Start... Look for a moment at when a Lambda comes up: when a Lambda spins itself up, it has to do a lot of configuration and setup. And bringing up a Python runtime is, except perhaps for Node, the fastest runtime there is. Significantly faster, for that matter, than bringing up a Java Lambda - and we have a few of those too, by the way, for historical reasons - and you really do see that the Java runtime, until it comes up... it's heavy. On the other hand, if you use Lambda, one of the drawbacks - which is also an advantage, in a certain sense - is that the Lambda runtime is Single-threaded, or at least Single-Core - there's no real, full support there for Multi-threading, for things running in parallel. You can look at several Processes inside a Lambda, but not Multi-threading - which makes the need to use asyncio, or all sorts of complicated threads in Java, redundant, for that matter. And we also looked a bit in the past at Go-lang - a modern, cool language like that [wait for the next Bumpers...] - writing Containers in it is pretty cool, but writing it inside a Lambda is pretty pointless... that is - you can't profit there at all from all the async machinery that sits inside Go, all the Async Functions. (Ran) Right... technically it's possible, you just don't gain anything. (Yinon) Exactly - it's still Single-threaded, so it's completely synchronous. (Ran) What does the developer experience look like? I mean - what happens if Production is suddenly slow, or things suddenly get lost, or suddenly... I don't know, a request gets a Timeout, or things like that? How do you debug a process? How do you do Tracing?... How do you debug Lambda functions that are scattered across, I don't know how many... how many do you have? (Yinon) The last time I counted, we had a good few thousand functions. (Yonatan) ...Surely you're starting to miss the Monolith, where you could set a Break-point and see exactly what happens... (Yinon) ...Exactly; it is indeed an experience that is... at first daunting... when you look at it for the first time - I remember looking at it and saying 'okay, what do I do?'... I open CloudWatch and try to dig through the logs... not a very interesting experience, not that much fun. And indeed, one of the things Lambda forces on you is clear observability - so there are AWS-internal tools - X-Ray, for that matter, which AWS push very hard. A likeable little tool that helps you do Distributed Tracing. The main problem with X-Ray is that you have to work in order to make it work... that is, part of the work is to actually put Tracing capability inside... (Ran) ...instrumentation. (Yinon) ...instrumentation like that, exactly... and we preferred a tool that does the instrumentation for us. We dug around a bit and settled on Epsagon. For those who don't know Epsagon - a very interesting tool that, with very little work, lets you get in and do Tracing of all of our Lambdas together, and connect them to one another. It uses a library called Jaeger to do the Distributed Tracing - a fairly well-known open-source library; their implementation is simply quite good. It effectively lets us see calls that begin in one Lambda and end in another Lambda, at the far end of the stack, by way of all the other Lambdas. Both in terms of their Tracing and in terms of the Payloads that passed within the Lambdas - from the outside inward, through the various SQSes, calls to the Database, and so on. It effectively also lets you see the logs - and see Performance: how long each call took, inside.
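Epsagon's actual wrapping is its own product; purely to illustrate the idea of automatic instrumentation that Yinon describes, a hand-rolled sketch of it might wrap each handler, time it, and emit a structured trace record, carrying a correlation ID through the event payload so spans from different Lambdas can be joined later:

```python
# Hand-rolled illustration of handler instrumentation (NOT Epsagon's API):
# wrap the handler, time it, and print a structured trace line (which
# Lambda ships to CloudWatch Logs), propagating a trace ID in the payload.
import functools
import json
import time
import uuid

def traced(handler):
    @functools.wraps(handler)
    def wrapper(event, context):
        # Reuse the caller's trace ID if present, so spans can be joined
        # across the Lambda chain; otherwise start a new trace.
        trace_id = event.get("trace_id") or str(uuid.uuid4())
        start = time.time()
        try:
            return handler(event, context)
        finally:
            print(json.dumps({
                "trace_id": trace_id,
                "function": context.function_name,
                "duration_ms": round((time.time() - start) * 1000, 2),
            }))
    return wrapper

@traced
def handler(event, context):
    ...  # business logic goes here
```

A real tracer also has to thread that ID through SQS message attributes and database calls, which is precisely the asynchronous gap Ran raises next.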
(Ran) But from my familiarity with Jaeger, it's excellent when you're talking about gRPC or HTTP, all the synchronous things; but for asynchronous things, for example a hop through SQS or a hop through Kafka, there you already have to invent a solution yourself... So they wrapped that for you?

(Yinon) They wrapped the whole thing, and they handled it very nicely: you can literally see the calls to SQS and the hop outward, the outbound call from it. They actually put quite a lot into the tracing... they extended Jaeger into their own tracing; that might be worth a separate episode... But yes, we use Epsagon and see full observability, end to end. The only places where it breaks are places where we didn't add some four lines to the Serverless Framework, which is what we use to deploy the Lambdas; in those places we don't get their automatic wrapping, and that's where you really see when it breaks and how painful it is. And in places like that which we identify, it's very simple to add that automatic tracing: it's literally a few lines, adding a small module in Node, and hop! It does automatic tracing and in effect gives us truly full observability of everything.

(Ran) Okay, so that's tracing in production; but what does the development experience look like? Say I now need to write some new service, or a new function: do I just deploy it to the cloud and see what happens, or is there something local?

(Yinon) ...That's pretty much it. There are of course the basic things: if you write in Python, then unit testing is a fairly standard thing that we obviously have to write; it's even a sort of compiler substitute, for lack of an alternative. There's a bit of linting and such; but anyone who knows Python, and I assume very few here don't, knows that without a unit test or two to check that the code holds up, you can very easily deploy some nonsense... You can run the unit tests for simple local debugging. Usually, though, what we do is simply deploy straight to the cloud from the dev environment and run a batch of HTTP calls against the Lambdas to see that it all works; Postman works overtime... There are a few options for running locally, that is, AWS too has an option to run a sort of local Lambda server, to "bring Lambda up locally". Is it very convenient? It's not... it's not a hit, and we found it much easier and more convenient to deploy straight to the dev environment and run everything from there.
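[Editor's note: the "unit test as a compiler substitute" workflow might look like this; a minimal pytest sketch with a hypothetical handler module and event shape, exercising the function locally before deploying it to the dev environment:]

    # test_handler.py -- runs with plain `pytest`, no cloud access required.
    import json
    from unittest.mock import MagicMock

    import handler  # hypothetical module exposing handler.handle(event, context)

    def test_handle_returns_200_for_a_valid_event():
        event = {"body": json.dumps({"rider_id": "r-123"})}  # hypothetical shape
        response = handler.handle(event, MagicMock())
        assert response["statusCode"] == 200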
(Ran) That means you have some copy of the production environment... I mean, running your own function is one thing; the question is whether it depends on other functions? On other queues? On other databases? That's where things start to get more complicated. So you do all of that straight in the cloud? Not on a local workstation?

(Yinon) Correct. You can think of it, in effect, as a kind of sandbox containing the Lambdas we have in the world. We deploy the code straight there and test against one another. Naturally, every Lambda is the responsibility of some team; every set of Lambdas, or every Service, really; it's not just one Lambda, we look at a set of however-many Lambdas as a particular Service that has some purpose. We deploy the various services into the cloud and use convention to get from service to service and do all the wiring between the different Lambdas.

(Yonatan) In a certain sense, with microservices too, from a certain scale onward, you're in a fairly similar problem... I mean, once you have a few hundred of them, you no longer bring them all up on the laptop; and here too it depends on your cloud, really.

(Ran) I agree 100%... I think the problem, or the challenge, of data-based services is to recreate some production-like environment. That is, if you want some copy of the production environment, with production data, but without harming production, and also without paying the cost of production. Sometimes the databases are enormous, and you don't really want a full copy; so you take the subset of the data that is exactly what you need but no more than that, and you also make sure not to harm production; that's a non-trivial challenge for anyone dealing with large volumes of data, regardless of Lambda or no Lambda.

How long have you been at Via?

(Yinon) Three years...

(Ran) Okay... when you arrived, were you already in the Serverless world?

(Yinon) That was exactly the beginning, the stage at which we looked at it for the first time.

(Ran) I'll tell you why I'm asking: I'm trying to imagine a veteran developer, some other experienced developer, joining Via now. Do you find, say, when you look at developers hired recently, and I'm not talking about youngsters who've never written code but about people who are already... you know, "battle-hardened veterans"...

(Yonatan) Who've already taken production down a few times...

(Ran) Yes... Do you see that they, you know, look at this whole Serverless world, and now they need to write some new function: do you see them fighting their natural instincts, or does it simply come naturally to them, and they "shed" some heavy weight they've been carrying on their shoulders until now and blossom in the Serverless environment?

(Yinon) I'd say they pretty much blossom... I mean, there's always that first dizzying transition that goes, as you mentioned earlier: "Wait, I don't have a Connection Pool", "I don't have here... I need to understand for a moment how this arrives; am I already deploying this to the cloud? What happened to me? Isn't it a bit early?". So there really are those few days of "wait, wait, how do I do things here?", but it really takes very little time. At Via, you usually start writing code within less than two weeks; that is, you need to set up some local environment, check that everything works, that everything is installed and in order, and then learn the territory a bit, both Via's world and the Serverless world. But within really less than two weeks you get a task and, okay, you deploy for the first time and for the second time, and from there it reaches production pretty quickly.

(Yonatan) There's also, I assume, the advantage that the scope of code you need to know in order to make a change is probably smaller; I mean, it probably depends on lots of other things, but it has already been narrowed down for you, up front, to a certain set of functions or Services...

(Yinon) Yes, so it's probably easier to break things down into small microservices, because the cost of standing up a microservice is almost nothing. You don't need to deploy some new Pod or create something new; it's "okay, copy the serverless.yml", which is a simple yml that just defines the Service itself, with hardly any settings in it at all. And from there you deploy a new Service very, very easily, which in effect lets us run a great many microservices. In my group alone there are between 80 and 100 microservices and... And Growing...

(Yonatan) Is there some optimization that AWS offers, say, where they detect Lambdas that call one another... frequently, and essentially drop the network between them so that they run In-process?

(Yinon) A cool idea... but not that I know of... What we actually do a lot is: if we have many places where we call the same code over and over and over, we simply package it as Packages. That is, instead of packaging it as a separate Lambda, we package it as a Package, and then reuse it in different places.

(Ran) Which, in "Lambda speak", is a library, right? I mean, there's a limit on function size, so for that AWS offers Packages, which are like a Library...

(Yinon) No... Lambda itself has, built in, what's called Layers... With Layers, AWS in effect lets you set up, in the AWS configuration, literal "layers" of a Lambda, which can be deployed as part of the Lambda when the Lambda itself is deployed into the container. A pretty cool idea; we don't use it much... we use plain Node Packages to repackage our shared code, for several reasons. Layers had all sorts of technical limitations: you could have only 5 Layers, and if you need a sixth, you're stuck. There's the issue that a cold start doesn't always reinitialize the Layer... so your control there isn't strong enough. We felt more comfortable working with Node Packages, with orderly versioning, where each Lambda knows when it moves up to the next version...

(Ran) Wait, did you say Node Packages? Aren't we in Python?...

(Yinon) Sorry... Python... you're right, 100%...

(Ran) We almost caught you...

(Yinon) You almost caught me... By the way, we also have Lambdas in Node; we wrote a few Lambdas in JavaScript. There are teams that preferred working in JavaScript. Not many... it works, by the way, just as well as Python, though I'm more a fan of Python than of Node, which is why my team works mainly in Python.
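[Editor's note: packaging shared code as a versioned Python package, as Yinon describes, might look like this; the module and function names are hypothetical:]

    # via_common/queues.py -- an internal package, published with a version
    # number and installed into each Lambda's bundle at deploy time via pip.
    import json

    import boto3

    _sqs = boto3.client("sqs")

    def publish(queue_url, payload):
        """Serialize and send a message; reused by many Lambdas."""
        _sqs.send_message(QueueUrl=queue_url, MessageBody=json.dumps(payload))

Each consuming Lambda then pins a version (say, via-common==1.4.2, a hypothetical name) and upgrades on its own schedule, which is exactly the control over rollout that Layers did not give the team.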
(Ran) Right... By the way, Yonatan, maybe we should come back to your question: you asked whether, when there are functions that call one another frequently, such an optimization is possible. But then, I mean... (a) it's a good idea, but in Yinon's case it probably wouldn't help, because they do everything asynchronously and put everything on a Queue, so you have to send to the Queue in any case...

(Yonatan) Wix actually gave a talk not long ago, about their vision for the Serverless world, and they gave exactly that example. [A month ago: Beyond Serverless and DevOps, Aviran Mordo] I mean, the opportunity for that optimization. So it's worth...

(Ran) So what, the Queue also sits inside the Host?

(Yonatan) I don't know, I think it was... you'd have to ask Aviran; as I understood it, it was more in the realm of "future plans"...

(Ran) I see. Well, Yinon, listen, fascinating... We're really coming up on the end here; give us a few "closing words"; I'm sure you're hiring...

(Yinon) Of course... we certainly are, like probably every other company in the country, but yes, we're hiring. We're hiring, by the way, in all sorts of places around the country: in Tel Aviv as well as in Jerusalem, and we also pretty much work From anywhere, so we'd be very glad to hear from you if this interests you. And yes: the Serverless world is fascinating in my eyes; it really changed my thinking from the moment I arrived at Via, and it truly lets me do quite a few things very, very quickly and easily. And again, if we sum it up: we are not dogmatic about this. We very, very much believe in Serverless as one of the technologies that help us move the product forward, but, you know: what works, works... If it works, don't break it...

(Ran) Well, great. So thank you very much, this was interesting, and see you around.

(Yinon) Thanks Ran, thanks Yonatan.

Reversim Summit 2021: the call for submissions is now open!
The audio file is available here. Happy listening, and many thanks to Ofer Porer for the transcription.
Confluent Cloud isn't just for public access anymore. As the requirement for security across sectors increases, so does the need for virtual private cloud (VPC) connections. It is becoming more common today to come across Apache Kafka® implementations with the latest private link connectivity option. In the past, most Confluent Cloud users were satisfied with public connectivity paths and VPC peering. However, enabling private links on the cloud is increasingly important for security across networks and even the reliability of stream processing. Dan LaMotte, who since this recording became a staff software engineer II, and his team are focused on making secure connections for customers to utilize Confluent Cloud. This is done by allowing two VPCs to connect without sharing their own private IP address space. There's no crossover between them, and it lends itself to entirely secure, unidirectional connectivity from customer to service provider without sharing IPs. But why do clients still want to peer if they have the option to interface privately? Dan explains that peering has long been the base architecture for this type of connection. Peering, at its core, is just a point-to-point cloud connection between two VPCs. With global connectivity becoming more commonplace and the rise of globally distributed teams, networks are often not even based in the same region. Regardless of region, however, organizations must take the level of security into account. At a high level, peering and transit gateways are the new baseline for these use cases, and this is where Kafka's private links come in handy. Private links allow team members to connect to Confluent Cloud almost instantly without depending on the public internet. You can directly connect all of your multi-cloud options and microservices within your own secure space that is private to the company and to specific IP addresses. Also, the connection must be initiated on the client side, as an added security measure. With the option of private links, you can now also build microservices that use functionality that wasn't available in the past, such as:
- Multi-cloud clusters
- Product enhancements with IP ranges
- Unlimited IP space
- Scalability
- Load balancing
You no longer need to segment your workflow, thanks to completely secure connections between teams that are otherwise disconnected from one another.
EPISODE LINKS
Securing the Cloud with VPC Peering ft. Daniel LaMotte
Software Engineer, Cloud Networking [Remote – AMER]
Software Engineer, Cloud Networking [Remote – USA]
eBPF documentation
Join the Confluent Community
Learn more with Kafka tutorials, resources, and guides at Confluent Developer
Live demo: Kafka streaming in 10 minutes on Confluent Cloud
Use 60PDCAST to get an additional $60 of free Confluent Cloud usage (details)
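[Editor's note: from the client's point of view, a private link changes where the bootstrap hostname resolves, not the Kafka protocol itself. A minimal sketch with the confluent_kafka Python client; the endpoint and credentials below are placeholders, not real values:]

    from confluent_kafka import Producer

    # Over a private link, the bootstrap name resolves to private IPs inside
    # your VPC; the client still initiates the one-way, outbound connection.
    producer = Producer({
        "bootstrap.servers": "lkc-xxxxx.region.provider.confluent.cloud:9092",
        "security.protocol": "SASL_SSL",
        "sasl.mechanisms": "PLAIN",
        "sasl.username": "<api-key>",     # placeholder credentials
        "sasl.password": "<api-secret>",
    })

    producer.produce("orders", b'{"id": 1}')
    producer.flush()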
About Chadd
Chadd Kenney is the Vice President of Product at Clumio. Chadd has 20 years of experience in technology leadership roles, most recently as Vice President of Products and Solutions for Pure Storage. Prior to that role, he was the Vice President and Chief Technology Officer for the Americas helping to grow the business from zero in revenue to over a billion. Chadd also spent 8 years at EMC in various roles from Field CTO to Principal Engineer. Chadd is a technologist at heart, who loves helping customers understand the true elegance of products through simple analogies, solutions use cases, and a view into the minds of the engineers that created the solution.

Links:
Clumio: https://clumio.com/
Clumio AWS Marketplace: https://aws.amazon.com/marketplace/pp/prodview-ifixh6lnreang

Transcript
Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.

Corey: This episode is sponsored in part by ChaosSearch. As basically everyone knows, trying to do log analytics at scale with an ELK stack is expensive, unstable, time-sucking, demeaning, and just basically all-around horrible. So why are you still doing it—or even thinking about it—when there’s ChaosSearch? ChaosSearch is a fully managed scalable log analysis service that lets you add new workloads in minutes, and easily retain weeks, months, or years of data. With ChaosSearch you store, connect, and analyze and you’re done. The data lives and stays within your S3 buckets, which means no managing servers, no data movement, and you can save up to 80 percent versus running an ELK stack the old-fashioned way. It’s why companies like Equifax, HubSpot, Klarna, Alert Logic, and many more have all turned to ChaosSearch. So if you’re tired of your ELK stacks falling over before it suffers, or of having your log analytics data retention squeezed by the cost, then try ChaosSearch today and tell them I sent you. To learn more, visit chaossearch.io.

Corey: This episode is sponsored in part by our friends at Lumigo. If you’ve built anything from serverless, you know that if there’s one thing that can be said universally about these applications, it’s that it turns every outage into a murder mystery. Lumigo helps make sense of all of the various functions that wind up tying together to build applications. It offers one-click distributed tracing so you can effortlessly find and fix issues in your serverless and microservices environment. You’ve created more problems for yourself; make one of them go away. To learn more, visit lumigo.io.

Corey: Welcome to Screaming in the Cloud. I’m Corey Quinn. Periodically, I talk an awful lot about backups and that no one actually cares about backups, just restores; usually, they care about restores right after they discover they didn’t have backups of the thing that they really, really, really wish that they did. Today’s promoted guest episode is sponsored by Clumio. And I’m speaking to their VP of product, Chadd Kenney. Chadd, thanks for joining me.

Chadd: Thanks for having me. Super excited to be here.

Corey: So, let’s start at the very beginning. What is a Clumio? Possibly a product, possibly a service, probably not a breakfast cereal, but again, we try not to judge.

Chadd: [laugh]. Awesome.
Well, Clumio is a Backup as a Service offering for the enterprise, focused in on the public cloud. And so our mission is, effectively, to help simplify data protection and make it a much, much better experience to the end-user, and provide a bunch of values that they just can’t get today in the public cloud, whether it’s in visibility, or better protection, or better granularity. And we’ve been around for a bit of time, really focused in on helping customers along their journey to the cloud.Corey: Backups are one of those things where people don’t spend a lot of time and energy thinking about them until they are, I guess, befallen by tragedy in some form. Ideally, it’s something minor, but occasionally it’s, “Oh, yeah. I used to work at that company that went under because there was a horrible incident and we didn’t have backups.” And then people go from not caring to being overzealous converts. Based upon my focus on this, you can probably pretty safely guess which side of that [chasm 00:02:04] I fall into. But let’s start, I guess, with positioning; you said that you are backup for the enterprise. What does that mean exactly? Who are your customers?Chadd: We’ve been trying to help customers into their cloud journey. So, if you think about many of our customers are coming from the on-prem data center, they have moved some of their applications, whether they’re lift-and-shift applications, or whether they’ve, kind of, stalled doing net-new development on-prem and doing all net new development in the public cloud. And we’ve been helping them along the way and solving one fundamental challenge, which is, “How do I make sure my data is protected? How do I make sure I have good compliance and visibility to understand, you know, is it working? And how do I be able to restore as fast as possible in the event that I need it?”And you mentioned at the beginning backup is all about restore and we a hundred percent agree. I feel like today, you get this [unintelligible 00:02:51] together a series of solutions, whether it’s a script, or it’s a backup solution that’s moved from on-prem, or it’s a snapshot orchestrator, but no one’s really been able to tackle the problem of, help me provide data protection across all of my accounts, all of my regions, all of my services that I’m using within the cloud. And if you look at it, the enterprise has transitioned dramatically to the cloud and don’t have great solutions to latch on to solve this fundamental problem. And our mission has been exactly that: bring a whole bunch of cool innovation. We’re built natively in the public cloud; we started off on a platform that wasn’t built on a whole bunch of EC2 instances that look like a box that was built on-prem, we built the thing mostly on Lambda functions, very event-driven. All AWS native services. We didn’t build anything proprietary data structure for our environment. And it’s really been able to build a better user experience for our end customers.Corey: I guess there’s an easy question to start with, of why would someone consider working with Clumio instead of AWS Backup, which came out a few months after re:Invent, I want to say 2018, but don’t quote me on that; may have been 2019. But it has the AWS label on the tin, which is always a mark of quality.Chadd: [laugh]. Well, there’s definitely a fair bit to be desired on the AWS Backup front. And if you look at it, what we did is we spent, really, before going into development here, a lot of time with customers to just understand where those pains are. 
And I’ve nailed it, kind of, to four or five different things that we hear consistently. One is that there’s near zero insights; “I don’t know what’s going on with it. I can’t tell whether I’m compliant or not compliant, or protecting not enough or too much.”They haven’t really provided sufficient security on being able to airgap my data to a point where I feel comfortable that even one of my admins can’t accidentally fat-finger a script and delete, you know, whether the primary copy or secondary copy. Restore times have a lot to be desired. I mean, you’re using snapshots. You can imagine that doesn’t really give you a whole bunch of fine-grained granularity, and the timeframe it takes to get to it—even to find it—is kind of a time-consuming game. And they’re not cheap.The snapshots are five cents per gig per month. And I will say they leave a lot to be desired for that cost basis. And so all of this complexity kind of built-in as a whole has given us an opportunity to provide a very different customer experience. And what the difference between these two solutions are is we’ve been providing a much better visibility just in the core solution. And we’ll be announcing here, on May 27, Clumio Discover which gives customers so much better visibility than what AWS Backup has been able to deliver.And instead of them having to create dashboards and other solutions as a whole, we’re able to give them unique visibility into their environment, whether it’s global visibility, ensuring data is protected, doing cost comparisons, and a whole bunch of others. We allow customers to be able to restore data incredibly faster, at fine-grained granularities, whether it’s at a file level, directory level, instance level, even in RDS we go down to the record level of a particular database with direct query access. And so the experience just as a whole has been so much simpler and easier for the end consumer, that we’ve been able to add a lot of value well beyond what AWS Backup uses. Now, that being said, we still use snapshots for operational recovery at some level, where customers can still use what they do today but what Clumio brings is an enhanced version of that by actually using airgap protection inside of our service for those datasets as well. And so it allows you to almost enhance AWS Backup at some level if you think about it. Because AWS Backups really are just orchestrating the snapshots; we can do that exact same thing, too, but really bring the airgap protection solution on top of that as well.Corey: I’ve talked about this periodically on the show. But one of the last, I guess, big migration projects I did when I was back in my employee days—before starting this place—was a project I’d done a few times, which was migrating an environment from EC2-Classic into a VPC world. Back in the dark times, before VPCs were a thing, EC2-Classic is what people used. And they were not just using EC2 in those environments, they were using RDS in this case. And the way to move an RDS database is to stop everything, take a final snapshot, then restore that snapshot—which is the equivalent of backup—to the new environment.How long does that take? It is non-deterministic. In the fullness of time, it will be complete. That wasn’t necessarily a disaster restoration scenario, it was just a migration, and there were other approaches we theoretically could have taken, but this was the one that we decided to go with based upon a variety of business constraints. 
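[Editor's note: the stop, snapshot, restore migration Corey describes can be scripted; a minimal boto3 sketch with hypothetical instance, snapshot, and subnet-group identifiers. The final waiter is the "non-deterministic" wait he mentions, made explicit:]

    import boto3

    rds = boto3.client("rds")

    # Take the final snapshot of the old instance...
    rds.create_db_snapshot(
        DBInstanceIdentifier="legacy-classic-db",
        DBSnapshotIdentifier="migration-final",
    )
    rds.get_waiter("db_snapshot_available").wait(
        DBSnapshotIdentifier="migration-final"
    )

    # ...then restore it into the target environment (here, into a VPC).
    rds.restore_db_instance_from_db_snapshot(
        DBInstanceIdentifier="app-db-vpc",
        DBSnapshotIdentifier="migration-final",
        DBSubnetGroupName="app-vpc-subnets",
    )
    rds.get_waiter("db_instance_available").wait(
        DBInstanceIdentifier="app-db-vpc"
    )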
And it’s awkward when you’re sitting there, just waiting indefinitely for, it turns out, about 45 minutes in this case, and you think everything’s going well, but there’s really nothing else to do during those moments.And that was, again, a planned maintenance, so it was less nerve-wracking than when the site is down and people are screaming. But it’s good to have that expectation brought into it. But it was completely non-transparent; there was no idea what was going on, and in actual disasters, things are never that well planned or clear-cut. And at some level, the idea of using backup restoration as a migration strategy is kind of a strange one, but it’s a good way of testing backups. If you don’t test your backups, you don’t really have them in the first place. At least, that’s always been my philosophy. I’m going to theorize, unless this is your first day in business, that you sort of feel the same way, given your industry.Chadd: Definitely. And I think the interesting parts of this is that you have the validation that backups are occurring, which is—you need visibility on that functioning, at some level; like, did it actually happen? And then you need the validation that the data is actually in a state that I can recover—Corey: Task failed successfully.Chadd: [laugh]. Exactly. And then you need validation that you can actually get to the data. So, there’s snapshots which give you this full entire thing, and then you got to go find the thing that you’re looking for within it. I think one of the values that we’ve really taken advantage of here is we use a lot of the APIs within AWS first to get optimization in the way that we access the data.So, as an example—on your EC2 example—we use EBS direct APIs, and we do change block tracking off of that, and we send the data from the customer’s tenancy into our service directly. And so there’s no egress charges, there’s no additional cost associated to it; it just goes into our service. And the customer pays for what they consume only. But in doing that, they get a whole bunch of new values. Now, you can actually get file-level indexing, I can search globally for files in an instance without having to restore the entire thing, which seems like that would be a relatively obvious thing to get to.But we don’t stop there. You could restore a file, you could go browse the file system, you could restore to an AMI, you could restore to another EC2 instance, you could move it to another account. In RDS, not an easy service to protect, I will say. You know, you get this game of, “I’ve got to restore the entire instance and then go find something to query the thing.” And our solution allows you direct query access, so we can see a schema browser, you can go see all of your databases that are in it, you can see all the tables, the rows in the table, you can do advanced queries to join across tables to go [unintelligible 00:10:00] results.And that experience, I think, is what customers are truly looking forward to be able to provide additional values beyond just the restoration of data. I’ll give you a fun example that a SaaS customer was using. They have a centralized customer database that keeps all of the config information across all of the tenants.Corey: I used to do something very similar with Route 53, and everyone looks at me strangely when I say it, but it worked at the time. There are better approaches now.
But yeah, very common pattern.Chadd: And so you get into a world where it’s like, I don’t want to restore this entire thing at that point in time to another instance, and then just pull the three records for that one customer that they screwed up. Instead, it would be great if I could just take those three records from a solution and then just imported into the database. And the funny part of this is that the time it takes to do all these things is one component, the accidentally forgetting to delete all the stuff that I left over from trying to restore the data for weeks at a time that now I pay for in AWS is just this other thing that you don’t ever think about. It’s like, inefficiencies built in with the manual operations that you build into this model to actually get to the datasets. And so we just think there’s a better way to be able to see and understand datasets in AWS.Corey: One of my favorite genres of fiction is reading companies’ DR plans for how they imagine a disaster is going to go down. And it’s always an exercise in hilarity. I was not invited to those meetings anymore after I had the temerity to suggest that maybe if the city is completely uninhabitable and we have to retreat to a DR site, no one cares about this job that much. Or if us-east-one has burned to the ground over in AWS land, that maybe your entire staff is going to go quit to become consultants for 100 times more money by companies that have way bigger problems than you do. And then you’re not invited back.But there’s usually a certain presumed scale of a disaster, where you’re going to swing into action and exercise your DR plan. Okay, great. Maybe the data center is not a smoking crater in the ground; maybe even the database is largely where; what if you lost a particular record or a particular file somewhere? And that’s where it gets sticky, in a lot of cases because people start wondering, “Do I just spend the time and rebuild that file from scratch, kind of? Do I do a full restore of the”—all I have is either nothing or the entire environment. You’re talking about row-level restores, effectively, for RDS, which is kind of awesome and incredible. I don’t think I’ve ever seen someone talking about that before. How does that map as far as, effectively, a choose-your-own-disaster dial?Chadd: [laugh]. There’s a bunch of cool use cases to this. You’ve definitely got disaster recovery; so you’ve got the instance where somebody blew something away and you only need a series of records associated to it; maybe the SQL query was off. You’ve got compliance stuff. Think about this for a quick sec: you’ve got an RDS instance that you’ve been backing up, let’s say you keep it for just even a year.How many versions of that RDS database has AWS gone through in that period of time so that when you go restore that actual snapshot, you’ve got to rev the thing to the current version, which would take you some time [laugh] to get up and running, before you can even query the thing. And imagine if you do that, like, years down the road, if you’re keeping databases out there, and your legal team’s asking for a particular thing for discovery, let’s say. And you’ve got to now go through all of these iterations to try to get it back. The thing we decided to do that was genius on the [unintelligible 00:13:19] team was, we wanted to decouple the infrastructure from the data. 
So, what we actually do is we don’t have a database engine that’s sitting behind this.We’re exporting the RDS snapshot into a Parquet file, and the Parquet file then gets queried directly from Athena. And that allows us to allow customers to go to any timeframe to be able to pull not-specific database engine data into—whether it’s a restore function, or whether I want to migrate to a new database engine, I can pull that data out and re-import it into some other engine without having to have that infrastructure be coupled so closely to the dataset. And this was, really, kind of a way for customers to be able to leverage those datasets in all sorts of different ways in the future, with being able to query the data directly from our platform.Corey: It’s always fun talking to customers and asking them questions that they look at me as if I’ve grown a second head, such as, “Okay. So, in what disaster scenario are you going to need to restore your production database to a state that was in nine months ago?” And they look at me like I’ve just asked a ridiculous question because, of course, they’re never going to do that. If the database is restored to a copy that backed up more than 15 minutes or so in the past, there are serious problems. That’s why the recovery point objective—or RPO—of what is your data window of loss when you do a restore is so important for these plannings.And that’s great. “Okay then, why do you have six years of snapshots of your database taken on an interval going back all that time, if you’re never going to ever restore to any of them?” “Well, something compliance.” Yeah. There are better stories for that. But people start keeping these things around almost as digital packrats, and then they wind up surprised that their backup bill has skyrocketed. I’m going to go out on a limb presume—because if not, this is going to be a pretty awkward question—that you do not just backup management but also backup aging as far as life cycles go.Chadd: Yeah. So, there’s a couple different ways that are fun for us is we see multiple different tiers within backup. So, you’ve got the operational recovery methodology, which is what people usually use snapshots for. And unfortunately, you pay that at a pretty high premium because it’s high value. You’re going to restore a database that maybe went corrupt, or got somehow updated incorrectly or whatever else, and so you pay a high number for that for, let’s say, a couple days; or maybe it’s just even a couple hours.The unfortunate part is, that’s all you’ve got, really, in AWS to play with. And so, if I need to keep long-term retention, I’m keeping this high-value item now for a long duration. And so what we’ve done is we’ve tried to optimize the datasets as much as possible. So, on EC2 and EBS, we’ll dedupe and compress the datasets, and then store them in S3 on our tenancy. And then there’s a lower cost basis for the customer.They can still use operational recovery, we’ll manage that as part of the policy, but they can also store it in an airgap protected solution so that no one has access to it, and they can restore it to any of the accounts that they have out there.Corey: Oh, separating access is one of those incredibly important things to do, just because, first, if someone has completely fat-fingered something, you want to absolutely constrain the blast radius. 
But two, there is the theoretical problem of someone doing this maliciously, either through ransomware or through a bad actor—external or internal—or someone who has compromised one of your staff’s credentials. The idea being that people with access to production should never be the people who have access to, in some cases, the audit logs, or the backups themselves in some cases. So, having that gap—an airgap as you call it—is critical.Chadd: Mm-hm. The only way to do this, really, in AWS—and a lot of customers are doing this and then they move to us—is they replicate their snapshots to another account and vault them somewhere else. And while that works, the downside—and it’s not a true airgap, in a sense; it’s just effectively moving the data out of the account that it was created in. But you double the cost, so that sucks because you’re keeping your local copy, and then the secondary copy that sits on the other account. The admins still have access to it, so it’s not like it’s just completely disconnected from the environment. It’s still in the security sphere, so if you’re looking at a ransomware attack, trust me, they’ll find ways to get access to that thing and compromise it. And so you have vulnerabilities that are kind of built into this altogether.Corey: “So-what’s-your-security-approach-to-keeping-those-two-accounts-separated?” “The sheer complexity that it takes to wind up assuming a role in that other account that no one’s going to be able to figure it out because we’ve tried for years and can’t get it to work properly.” Yeah, maybe that’s not plan A.Chadd: Exactly. And I feel like while you can [unintelligible 00:17:33] these things together in various scripts, and solutions, and things, people are looking for solutions, not more complexity to manage. I mean, if you think about this, backup is not usually the thing that is strategic to that company’s mission. It’s something that protects their mission, but not drives their mission. It is our mission and so we help customers with that, but it should be something we can take off their hands and provide as a service versus them trying to build their own backup solution as a whole.Corey: This episode is sponsored by ExtraHop. ExtraHop provides threat detection and response for the Enterprise (not the starship). On-prem security doesn’t translate well to cloud or multi-cloud environments, and that’s not even counting IoT. ExtraHop automatically discovers everything inside the perimeter, including your cloud workloads and IoT devices, detects these threats up to 35 percent faster, and helps you act immediately. Ask for a free trial of detection and response for AWS today at extrahop.com/trial.Corey: Back when I was an employee if I was being honest, people said, “So, what is the number one thing you’re always sure to do on a disaster recovery plan?” My answer is, “I keep my resume updated.” Because, on some level, you can always quit and go work somewhere else. That is honest, but it’s also not the right answer in many cases. You need to start planning for these things before you need them.No one cares about backups until right after you really needed backups. And keeping that managed is important. There are reasons why architectures around this stuff are the way that they are, but there are significant problems around how a lot of AWS implements these things. 
I wound up having to use a backup about a month or so ago when some of my crappy code deleted data—imagine that—from a DynamoDB table, and I have point-in-time restores turned on. Cool. So, I just roll it back half an hour and that was great. The problem is, there was about four megabytes of data in that table, and it took an hour to do the restore into a new table and then migrate everything back over, which was a different colossal pain. And I’m sure there are complicated architectural reasons under the hood, but it’s like, that is almost as slow as someone who’s retyped it all by hand, and it’s an incredibly frustrating experience. You also see it with EBS snapshots: you backup an EBS volume with a snapshot—it just copies the data that’s there. Great—every time there’s another snapshot taken, it just changes the delta. And that’s the storage it gets built to. So, what does that actually cost? No one really knows. They recently launched direct APIs for EBS snapshots; you can start at least getting some of that data out of it if you just write a whole bunch of code—preferably in a Lambda function because that’s AWS’s solution for everything—but it’s all plumbing solution where you’re spending all your time building the scaffolding and this tooling. Backups are right up there with monitoring and alerting for the first thing I will absolutely hurl to a third party.Chadd: I a hundred percent agree. It’s—Corey: I know you’re a third-party. You’re, uh, you’re hardly objective on this.Chadd: [laugh].Corey: But again, I don’t partner with anyone. I’m not here to shill for people. You can’t buy my opinion on these things. I’ve been paying third parties to back things up for a very long time because that’s what makes sense.Chadd: The one thing that I think, you know, we hit on at the beginning a little bit was this visibility challenge—and this was one of the big launch around Clumio Discover that’s coming out on May 27th there—is we found out that there was near-zero visibility, right? And so you’re talking about the restore times, which is one key component, but [laugh]—Corey: Yeah, then you restore after four hours and discover you don’t have what you thought you did.Chadd: [laugh]. And so, I would love to see, like, am I backing things up? How much am I paying for all of these things? Can I get to them fast? I mean, the funny thing about the restore that I don’t think people ever talk about—and this is one of the things that I think customers love the most about Clumio—is, when you go to restore something, even that DynamoDB database you talked about earlier, you have to go actually find the snapshot in a long scroll.So first, you had to go to the service, to the account, and scroll through all of the snapshots to find the one that you actually want to restore with—and by the way, maybe that’s not a monster amount for you, but in a lot of companies that could be thousands, tens of thousands of snapshots they’re scrolling through—and they’ve got a guy yelling at them to go restore this as soon as possible, and they’re trying to figure out which one it is; they hunt-and-peck to find it. Wouldn’t it be nice if you just had a nice calendar that showed you, “Here’s where it is, and here’s all the different backups that you have on that point in time.” And then just go ahead and restore it then?Corey: Save me from the world of crappy scripts for things like this that you find on GitHub. 
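[Editor's note: the EBS snapshot "plumbing" Corey alludes to looks roughly like this with the EBS direct APIs; a minimal boto3 sketch of change block tracking between two snapshots of the same volume. The snapshot IDs are placeholders:]

    import boto3

    ebs = boto3.client("ebs")  # the EBS direct APIs live on their own client

    # Diff two snapshots: only the blocks that changed are returned.
    changed, token = [], None
    while True:
        kwargs = dict(
            FirstSnapshotId="snap-0aaaaaaaaaaaaaaaa",
            SecondSnapshotId="snap-0bbbbbbbbbbbbbbbb",
        )
        if token:
            kwargs["NextToken"] = token
        resp = ebs.list_changed_blocks(**kwargs)
        changed.extend(resp["ChangedBlocks"])
        token = resp.get("NextToken")
        if not token:
            break

    # Fetch the raw bytes of one changed block (blocks are 512 KiB each).
    block = changed[0]
    data = ebs.get_snapshot_block(
        SnapshotId="snap-0bbbbbbbbbbbbbbbb",
        BlockIndex=block["BlockIndex"],
        BlockToken=block["SecondBlockToken"],
    )["BlockData"].read()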
And again, no disrespect to the people writing these things, but it’s clear that people are scratching their own itch. That’s the joy of open-source. Yeah, this is the backup script—or whatever it is—that works on the ten instances I have in my environment. That’s great.You roll that out to 600 instances and everything breaks. It winds up hitting rate limits as it tries to iterate through a bunch of things rather than building a queue and working through the rest of it. It’s very clearly aimed at small-scale stuff and built by people who aren’t working in large-scale environments. Conversely, you wind up with sort of the Google problem when you look at solving it for just the giant environments. Great, that you wind up with this overengineered, enormously unwieldy solution. Like, “Oh yeah, the continental saw. We use it to wind up cutting continents in half.” It’s, “I’m trying to cut a sandwich in half here. What’s the problem here?”It becomes a hard problem. The idea of having something that scales and has decent user ergonomics is critically important, especially when you get to something as critical as backups and restores. Because you’re not generally allowed to spend months on end building a backup solution at a company, and when you’re doing a restore, it’s often 3 a.m. and you’re bleary-eyed and panicked, which is not the time to be making difficult decisions; it’s the time to be clicking the button.Chadd: A hundred percent agree. I think the lack of visibility, this being a solution, less a problem I’m trying to solve [laugh] on my own is, I think, one area no one’s really tackled in the industry, especially around data protection. I will say people have done this on-prem at a decent level, but it just doesn’t exist inside the public cloud today. Clumio Discover, as an example, is one thing that we just heard constantly. It was like, “Give me global visibility to see everything in one single pane of glass across all my accounts, ensure all of my data is protected, optimized the way that I’m spending in data protection, identify if I’ve got massive outliers or huge consumers, and then help me restore faster.”And the cool part with Discover is that we’re actually giving this away to customers for free. They can go use this whether they’re using AWS Backup or us, and they can now see all of their environment. And at the same time, they get to experience Clumio as a solution in a way that is vastly different than what they’re experiencing today, and hopefully, they’ll continue to expand with us as we continue to innovate inside of AWS. But it’s a cool value for them to be able to finally get that visibility that they’ve never had before.Corey: Did, you know, that AWS users can have multiple accounts and have resources in those accounts in multiple regions?Chadd: Oh, yeah. Lots of them.Corey: Yeah. Because—the reason that you know that, apparently, is that you don’t work for AWS Backup where, last time I checked, there are still something like eight or nine regions that they are not present in. And you have to wind up configuring this, in many cases, separately, and of course, across multiple accounts, which is a modern best practice: separate things out by account. There we go. But it is absolutely painful to wind up working with.Sure, it’s great for small-scale test accounts where I have everything in a single account and I want to make sure that data doesn’t necessarily go on walkabout. Great. 
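[Editor's note: a rough cousin of the orphan snapshot report Chadd describes can be approximated with two paginated EC2 calls; a minimal boto3 sketch that treats any self-owned snapshot whose source volume no longer exists as an orphan:]

    import boto3

    ec2 = boto3.client("ec2")

    # Volumes that still exist in this account/region.
    live_volumes = {
        v["VolumeId"]
        for page in ec2.get_paginator("describe_volumes").paginate()
        for v in page["Volumes"]
    }

    # Self-owned snapshots whose source volume is gone.
    orphans = [
        (s["SnapshotId"], s["StartTime"])
        for page in ec2.get_paginator("describe_snapshots").paginate(OwnerIds=["self"])
        for s in page["Snapshots"]
        if s["VolumeId"] not in live_volumes
    ]

    for snap_id, started in sorted(orphans, key=lambda o: o[1]):
        print(f"{snap_id} (created {started:%Y-%m-%d}) has no live source volume")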
But I can’t scale that in the same way without creating a significant management problem for myself.Chadd: Yeah, just the amount of accounts that we see in enterprises is nuts. And with people managing this at an account level, it’s unbearable. And with no visibility, you’re doing this without really an understanding of whether you’re successfully executing this across all of those accounts at any point in time. And so this is one of the areas that we really want to help enterprises with. It’s, not only make the protection simple but also validate that it’s actually occurring. Because I think the one thing that no one likes to talk about in this is the whole compliance game, right? Like—Corey: Yeah, doing something is next to useless; you got to prove that you’re doing the thing.Chadd: Yeah. I got an auditor who shows up once a quarter and says, “Show me this backup.” And then I got to go fumble to try to figure out where that is. And, “Oh, my God. It’s not there. What do I tell the guy?” Well, wouldn’t it be nice if you had this global compliance report that showed you whether you were compliant, or if it wasn’t—which, you know, maybe it wasn’t for a snapshot that you created—at least would tell you why. [laugh]. Like, an RPO was exceeded on the amount of time it took to take the snapshot. Okay, well, that’s good to know. Now, I can tell the guy something other than just make something up because I have no information.Corey: So, you’d have multiple snapshots in flight simultaneously; always a great plan. Talk to me a little bit about Discover, your new offering? What is it? What led to it?Chadd: I love talking to customers, for one, and we spend a lot of time understanding where the gaps exist in the public cloud. And our job is to help fill those gaps with really cool innovation. And so the first one we heard was, “I cannot see multiple services, regions, accounts in one view. I had to go to each one of the services to understand what’s going on in it versus being able to see all of my assets in one view. I’ve got a lot of fragment reporting. I’ve got no compliance view whatsoever. I can’t tell if I’m over-protecting or under-protecting.”Orphan snapshots are the bane of many people’s existence, where they’ve taken snapshots at some point, deleted an EC2 instance, and they pay monthly for these things. We’ve got an orphan snapshot report. It will show you all of the snapshots that exist out there with no EC2 instance associated to it, and you can go take action on it. And so, what Discover came from is customers saying, “I need help.” And we built a solution to help them.And it gives them actionable insights, globally, across their entire set of accounts, across various different services, and allows them to do a whole bunch of fun stuff, whether it’s actionable and, “Help me delete all my orphan snapshots,” to, “I’ve got a 30-day retention period. Show me every snapshot that’s over 30 days. I’d like to get rid of that one, too.” Or, “How much are my backups costing me in snapshots today?”Corey: Yeah, today, the answer is, “[mumble].”Chadd: [laugh]. And imagine being able to see that with, effectively, a free tool that gives you actionable insights. That’s what Discovery is. 
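[Editor's note: the compliance check Chadd jokes about ("an RPO was exceeded") reduces to comparing the newest snapshot per volume against a threshold; a minimal boto3 sketch with a hypothetical 24-hour policy:]

    from datetime import datetime, timedelta, timezone

    import boto3

    ec2 = boto3.client("ec2")
    RPO = timedelta(hours=24)  # hypothetical policy: at least one snapshot a day
    now = datetime.now(timezone.utc)

    latest = {}  # newest snapshot time per source volume
    for page in ec2.get_paginator("describe_snapshots").paginate(OwnerIds=["self"]):
        for snap in page["Snapshots"]:
            vol = snap["VolumeId"]
            if vol not in latest or snap["StartTime"] > latest[vol]:
                latest[vol] = snap["StartTime"]

    for vol, taken in latest.items():
        if now - taken > RPO:
            print(f"RPO exceeded for {vol}: last snapshot at {taken.isoformat()}")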
And so you pair that with Clumio Protect, which is our backup solution, and you’ve got a really awesome solution to be able to see everything, validate it’s working, and actually go protect it, whether it’s operational recovery, or a true airgap solution, of which it’s really hard to pull off in AWS today.Corey: What problem that’s endemic to the backup space is that from a customer perspective, you are either invisible, or you have failed them. There are remarkably few happy customers talking about their experience with their backup vendor. So, as a counterpoint to that, what do the customers love about you, folks?Chadd: So, first and foremost, customers love the support experience. We are a SaaS offering, and we manage the backups completely for the end-user; there’s no cloud infrastructure the customer has to manage. You know, there’s a lot of these fake SaaS offerings out there where I better deploy a thing and manage it in my tenancy. We’ve created an experience that allows our support organization to help customers proactively support it, and we become an extension to those infrastructure teams, and really help customers to make sure they have great visibility and understanding what’s going on in their environment. The second part is just a completely new customer experience.You’ve got simplicity around the way that I add accounts, I create a policy, I assign a tag, and I’m off and running. There’s no management or hand-holding that you need to do within the system. The system scales to any size environment, and you know, you’re off and running. And if you want to validate anything, you can validate it via compliance reports, audit reports, activity reports. And you can see all of your accounts, data assets, in one single pane of glass, and now with Clumio Discover, you get the ability to be able to see it in one single view and see history, footprint, and all sorts of other fun stuff on top of it. And so it’s a very different user experience than what you see in any other solution that’s out there for data protection today.Corey: Thank you so much for taking the time to speak with me today. If people want to learn more about Clumio and kick the tires for themselves, what should they do?Chadd: So, we are on AWS Marketplace, so you can get us up and running there and test us out. We give you $200 of free credits, so you can not only use our operational recovery, which is, kind of, snapshot management, similar database backup, which is free. You can check out Clumio Discover, which is also free, and see all of your accounts and environments in one single pane of glass with some awesome actionable insights, as we mentioned. And then you can reach out to us directly on clumio.com, where you can see a whole bunch of great content, blog posts, and the like, around our solution and service. And we’re looking forward to hearing from you.Corey: Excellent. And we will, of course, throw links to that in the [show notes 00:29:57]. Thank you so much for taking the time to speak with me today. I appreciate it.Chadd: Well, thank you so much for having me. I had an awesome time. Thank you.Corey: Chadd Kenney, VP of product at Clumio. I’m Cloud Economist Corey Quinn, and this is Screaming in the Cloud. 
If you’ve enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you’ve hated this podcast, please leave a five-star review on your podcast platform of choice along with a very long-winded comment that you accidentally lose because the page refreshes, and you didn’t have a backup.Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.Announcer: This has been a HumblePod production. Stay humble.
About Rebecca Marshburn
Rebecca's interested in the things that interest people—What's important to them? Why? And when did they first discover it to be so? She's also interested in sharing stories, elevating others' experiences, exploring the intersection of physical environments and human behavior, and crafting the perfect pun for every situation. Today, Rebecca is the Head of Content & Community at Common Room. Prior to Common Room, she led the AWS Serverless Heroes program, where she met the singular Jeremy Daly, and guided content and product experiences for fashion magazines, online blogs, AR/VR companies, education companies, and a little travel outfit called Airbnb.

Twitter: @beccaodelay
LinkedIn: Rebecca Marshburn
Company: www.commonroom.io
Personal work (all proceeds go to the charity of the buyer's choice): www.letterstomyexlovers.com

Watch this episode on YouTube: https://youtu.be/VVEtxgh6GKI

This episode sponsored by CBT Nuggets and Lumigo.

Transcript:
Rebecca: What a day today is! It's not every day you turn 100 times old, and on this day we celebrate Serverless Chats 100th episode with the most special of guests. The gentleman whose voice you usually hear on this end of the microphone, doing the asking, but today he's going to be doing the telling, the one and only, Jeremy Daly, and me. I'm Rebecca Marshburn, and your guest host for Serverless Chats 100th episode, because it's quite difficult to interview yourself. Hey Jeremy!

Jeremy: Hey Rebecca, thank you very much for doing this.

Rebecca: Oh my gosh. I am super excited to be here, couldn't be more honored. I'll give your listeners, our listeners, today, the special day, a little bit of background about us. Jeremy and I met through the AWS Serverless Heroes program, where I used to be a coordinator for quite some time. We support each other in content, conferences, product requests, road mapping, community-building, and most importantly, I think we've supported each other in spirit, and now I'm the head of content and community at Common Room, and Jeremy's leading Serverless Cloud at Serverless, Inc., so it's even sweeter that we're back together to celebrate this Serverless Chats milestone with you all, the most important, important, important, important part of the podcast equation, the serverless community. So without further ado, let's begin.

Jeremy: All right, hit me up with whatever questions you have. I'm here to answer anything.

Rebecca: Jeremy, I'm going to ask you a few heavy hitters, so I hope you're ready.

Jeremy: I'm ready to go.

Rebecca: And the first one's going to ask you to step way, way, way, way, way back into your time machine, so if you've got the proper attire on, let's do it. If we're going to step into that time machine, let's peel the layers, before serverless, before containers, before cloud even, what is the origin story of Jeremy Daly, the man who usually asks the questions.

Jeremy: That's tough. I don't think time machines go back that far, but it's funny, when I was in high school, I was involved with music, and plays, and all kinds of things like that. I was a very creative person. I loved creating things, that was one of the biggest sort of things, and whether it was music or whatever and I did a lot of work with video actually, back in the day. I was always volunteering at the local public access station. And when I graduated from high school, I had no idea what I wanted to do. I had used computers at the computer lab at the high school.
I mean, this is going back a ways, so it wasn't everyone had their own computer in their house, but I went to college and then, my first, my freshman year in college, I ended up, there's a suite-mate that I had who showed me a website that he built on the university servers.And I saw that and I was immediately like, "Whoa, how do you do that"? Right, just this idea of creating something new and being able to build that out was super exciting to me, so I spent the next couple of weeks figuring out how to do HTML, and this was before, this was like when JavaScript was super, super early and we're talking like 1997, and everything was super early. I was using this, I eventually moved away from using FrontPage and started using this thing called HotDog. It was a software for HTML coding, but I started doing that, and I started building websites, and then after a while, I started figuring out what things like CGI-bins were, and how you could write Perl scripts, and how you could make interactions happen, and how you could capture FormData and serve up different things, and it was a lot of copying and pasting.My major at the time, I think was psychology, because it was like a default thing that I could do. But then I moved into computer science. I did computer science for about a year, and I felt that that was a little bit too narrow for what I was hoping to sort of do. I was starting to become more entrepreneurial. I had started selling websites to people. I had gone to a couple of local businesses and started building websites, so I actually expanded that and ended up doing sort of a major that straddled computer science and management, like business administration. So I ended up graduating with a degree in e-commerce and internet marketing, which is sort of very early, like before any of this stuff seemed to even exist. And then from there, I started a web development company, worked on that for 12 years, and then I ended up selling that off. Did a startup, failed the startup. Then from that startup, went to another startup, worked there for a couple of years, went to another startup, did a lot of consulting in between there, somewhere along the way I found serverless and AWS Cloud, and then now it's sort of led me to advocacy for building things with serverless and now I'm building sort of the, I think what I've been dreaming about building for the last several years in what I'm doing now at Serverless, Inc.Rebecca: Wow. All right. So this love story started in the 90s.Jeremy: The 90s, right.Rebecca: That's an incredible, era and welcome to 2021.Jeremy: Right. It's been a journey.Rebecca: Yeah, truly, that's literally a new millennium. So in a broad way of saying it, you've seen it all. You've started from the very HotDog of the world, to today, which is an incredible name, I'm going to have to look them up later. So then you said serverless came along somewhere in there, but let's go to the middle of your story here, so before Serverless Chats, before its predecessor, which is your weekly Off-by-none newsletter, and before, this is my favorite one, debates around, what the suffix "less" means when appended to server. When did you first hear about Serverless in that moment, or perhaps you don't remember the exact minute, but I do really want to know what struck you about it? What stood out about serverless rather than any of the other types of technologies that you could have been struck by and been having a podcast around?Jeremy: Right. 
And I think I gave you maybe too much of a surface level of what I've seen, because I talked mostly about software, but if we go back, I mean, hardware was one of those things where hardware, and installing software, and running servers, and doing networking, and all those sort of things, those were part of my early career as well. When I was running my web development company, we started by hosting on some hosting service somewhere, and then we ended up getting a dedicated server, and then we outgrew that, and then we ended up saying, "Well maybe we'll bring stuff in-house". So we did on-prem for quite some time, where we had our own servers in the T1 line, and then we moved to another building that had a T3 line, and if anybody doesn't know what that is, you probably don't need to anymore.But those are the things that we were doing, and then eventually we moved into a co-location facility where we rented space, and we rented electricity, and we rented all the utilities, the bandwidth, and so forth, but we had Blade servers and I was running VMware, and we were doing all this kind of stuff to manage the infrastructure, and then writing software on top of that, so it was a lot of work. I know I posted something on Twitter a few weeks ago, about how, when I was, when we were young, we used to have to carry a server on our back, uphill, both ways, to the data center, in the snow, with no shoes, and that's kind of how it felt, that you were doing a lot of these things.And then 2008, 2009, as I was kind of wrapping up my web development company, we were just in the process of actually saying it's too expensive at the colo. I think we were paying probably between like $5,000 and $7,000 a month between the ... we had leases on some of the servers, you're paying for electricity, you're paying for all these other things, and we were running a fair amount of services in there, so it seemed justifiable. We were making money on it, that wasn't the problem, but it just was a very expensive fixed cost for us, and when the cloud started coming along and I started actually building out the startup that I was working on, we were building all of that in the cloud, and as I was learning more about the cloud and how that works, I'm like, I should just move all this stuff that's in the co-location facility, move that over to the cloud and see what happens.And it took a couple of weeks to get that set up, and now, again, this is early, this is before ELB, this is before RDS, this is before, I mean, this was very, very early cloud. I mean, I think there was S3 and EC2. I think those were the two services that were available, with a few other things. I don't even think there were VPCs yet. But anyways, I moved everything over, took a couple of weeks to get that over, and essentially our bill to host all of our clients' sites and projects went from $5,000 to $7,000 a month, to $750 a month or something like that, and it's funny because had I done that earlier, I may not have sold off my web development company because it could have been much more profitable, so it was just an interesting move there.So we got into the cloud fairly early and started sort of leveraging that, and it was great to see all these things get added and all these specialty services, like RDS, and just taking the responsibility because I literally was installing Microsoft SQL server on an EC2 instance, which is not something that you want to do, you want to use RDS. 
It's just a much better way to do it. But anyways, so I was working for another startup, this was like startup number 17 or whatever it was, and we had this incident where ... we had a pretty good setup. I mean, everything was on EC2 instances, but we were using DynamoDB for some caching layers for certain things. We were using a sharded MySQL database for product information, and so forth.

So the system was pretty resilient. It handled all of the load testing we did and things like that, but then we actually got featured on Good Morning America, and they mentioned our app, it was the Power to Mobile app. So we get mentioned on Good Morning America. I think it was Good Morning America. The Today Show? Good Morning America, I think it was. One of those morning shows, anyways. We got about 10,000 sign-ups in less than a minute, which was amazing, it was just this huge spike in traffic, which was great. The problem was, we had this really weak point in our system where we had to basically get a lock on the database in order to get an incremental ID, and so essentially what happened is the database choked, and as soon as the database choked, just to create user accounts, other users couldn't sign in, and there were all kinds of problems, so we basically lost out on all of this capability.

So I spent some time doing a lot of research and trying to figure out, how do you scale that? How do you scale something that fast? How do you have that resilience in there? And there are all kinds of ways that we could have done it with traditional hardware, it's not like it wasn't possible to do with a slightly better strategy, but as I was digging around in AWS, I'm looking around at some different things, and I was always in the console because we were using Dynamo and some of those things, and I came across this thing that said "Lambda," with a little "new" tag next to it. I'm like, what the heck is this?

So I click on that and I start reading about it, and I'm like, this is amazing. We don't have to spin up a server, we don't have to use Chef, or Puppet, or anything like that to spin up these machines. We can basically just say, when X happens, do Y, and it enlightened me, and this was early 2015, so this would have been right after Lambda went GA. I had never heard of Lambda as part of the preview, I mean, I wasn't in that re:Invent, I don't know, what would you call that? Vortex, maybe, is a good way to describe the event.

Rebecca: Vortex sounds about right. That's about how it feels by the end.

Jeremy: Right, exactly. So I wasn't really in that group yet, I wasn't part of that community, so I hadn't heard about it, and as I started playing around with it, I immediately saw the value there, because, for me, as someone who again had managed servers, and had built out really complex networking too, I think some of the things you don't think about when you're on-prem managing your own stuff are all the things the cloud now manages for you. I mean, we had firewalls, and we had to do all the firewall rules ourselves, right?
I mean, I know you still have to do security groups and things like that in AWS, but the level of complexity is a lot lower when you're in the cloud, and of course there are so many great services and systems that help you do that now.

But just the idea of saying, "Wait a minute, so if I have something happen, like a user signup, for example, I don't have to worry about provisioning all the servers that I need in order to handle that," and again, it wasn't so much the server aspect of it as it was the database aspect of it, but one of the things that was interesting about the idea of serverless, too, was this asynchronous nature of it, this idea of being more event-driven, and that things don't have to happen immediately necessarily. So that just struck me as something that seemed like it would reduce a lot, and again, this term has been overused, but the undifferentiated heavy lifting. We use that term over and over again, but there is not a better term for that, right?

Because there were just so many things that you have to do as a developer, as an ops person, somebody who is trying to straddle teams, or just a PM, or whatever you are, so many things that you have to do in order to get an application running in the first place, and then even more you have to do in order to keep it up and running, and then even more if you start thinking about distributing it, or scaling it, or disaster recovery. I mean, there's a million things you have to think about, and I saw serverless immediately as this opportunity to say, "Wait a minute, this could reduce a lot of that complexity and manage all of that for you," and then, again, literally let you focus on the things that actually matter for your business.

Rebecca: Okay. As someone who worked, how should I say this, in metatech, or the technology of technology in the serverless space, when you say that you were starting to build that without ELB even, or RDS, my level of anxiety is like, I really feel like I'm watching a slow horror film. I'm like, "No, no, no, no, no, you didn't, you didn't, you didn't have to do that, did you?"

Jeremy: We did.

Rebecca: So I applaud you for making it to the end of the film and still being with us.

Jeremy: Well, the other thing ...

Rebecca: Only one protagonist does that.

Jeremy: Well, the other thing that's interesting too, about serverless, and where it was in 2015 when Lambda went GA, and this will give you some anxiety: there was no API Gateway. So there was no way to actually trigger a Lambda function from a web request. There was no VPC access in Lambda functions, which meant you couldn't connect to a database. The only thing you could do is connect via HTTP, so you could connect to DynamoDB or things like that, but you could not connect directly to RDS, for example. So if you go back and you look at the timeline of when these things were released, just from 2015, you literally feel like a caveman thinking about what you could do back then. It's banging two sticks together versus where we are now and the capabilities that are available to us.

Rebecca: Yeah, you're sort of in Plato's cave, right, and you're looking up and you're like, "It's quite dark in here," and Lambda's up there, outside, sowing seeds, being like, "Come on out, it's dark in there".
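(A minimal sketch of the "when X happens, do Y" model described above, assuming an API Gateway-style signup event and a DynamoDB table; the event shape, table name, and field names are illustrative, not code from the show. It also sidesteps the lock-on-an-incremental-ID bottleneck from the Good Morning America story by using random IDs.)

```python
# Illustrative only: a Lambda handler that runs whenever a signup event
# arrives (X) and creates the user account (Y). No servers to provision.
import json
import uuid

import boto3  # AWS SDK for Python, bundled with the Lambda runtime

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("users")  # hypothetical table name


def handler(event, context):
    body = json.loads(event.get("body") or "{}")
    # A random UUID needs no database lock, unlike an incremental ID,
    # so 10,000 sign-ups in a minute don't serialize on one counter.
    user_id = str(uuid.uuid4())
    table.put_item(Item={"id": user_id, "email": body.get("email")})
    return {"statusCode": 201, "body": json.dumps({"id": user_id})}
```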
All right, so I imagine you discovering Lambda through the console is not a sentence you hear every day, or general console discovery of a new product that will then change the way that you build, and so I'm guessing maybe one of the reasons why you started your Off-by-none newsletter or Serverless Chats, right, is to be like, "How do I help tell others about this without them needing to discover it through the console?" But I'm curious what your why is. Why first the Off-by-none newsletter, which is one of my favorite things to receive every week, thank you for continuing to write such great content, and then why Serverless Chats? Why are we here today? Why are we at number 100? Which I'm so excited about every time I say it.

Jeremy: And it's kind of crazy to think about all the people I've gotten a chance to talk to. But so, I think if you go back, I started writing blog posts maybe in 2015, so I haven't been doing it that long, and I certainly wasn't prolific. I wasn't consistently writing a blog post every week, or two a week, like some people do now, which is kind of crazy. I don't know how they do that, I mean, it's hard enough writing the newsletter every week, never mind writing original content, but I started writing about serverless. I think it wasn't until the beginning of 2018, maybe the end of 2017, and there was already a lot of great content out there. I mean, Ben Kehoe was very early into this and a lot of his stuff I read very early.

I mean, there are just so many people that were very early in the space, Paul Johnston, just so many people, right, and I started reading what they were writing and I was like, "Oh, I've got some ideas too, I've been experimenting with some things, I feel like I've gotten to a point where what I could share could be potentially useful." So I started writing blog posts, and I think one of the earlier blog posts I wrote was, I want to say 2017, maybe early 2018, but it was a post about serverless security, and what was great about that post was that it actually got me connected with Ory Segal, who had started PureSec, and he and I became friends, and that was the other great thing too, is just becoming part of this community was amazing.

So many awesome people that I've met. But so, I saw all this stuff people were writing and these things people were doing, and I got to maybe August of 2018, and I said to myself, I'm like, "Okay, I don't know if people are interested in what I'm writing." I wasn't writing a lot, but I was writing a little bit, and I wasn't sure people were overly interested in what I was writing, and again, there was that idea of the imposter syndrome. Certainly everything was very early, so I felt a little bit more comfortable.
I always felt like, well, maybe nobody knows what they're talking about here, so if I throw something into the fold it won't be too, too bad. But certainly, I was reading things by other people that I was interested in, and I thought to myself, I'm like, "Okay, if I'm interested in this stuff, other people have to be interested in this stuff," but it wasn't easy to find, right?

I mean, there was sort of a serverless Twitter, if you want to use that terminology, where a lot of people tweet about it and so forth. Obviously it's gotten very noisy now because people slapped that term on way too many things, but I don't want to have that discussion. But so, I'm reading all this great stuff and I'm like, "I really want to share it," and I'm like, "Well, I guess the best way to do that would just be a newsletter."

I had an email list for my own personal site that had a couple of hundred people on it, and I'm like, "Well, let me just turn it into this thing, and I'll share these stories, and maybe people will find them interesting." And I know this is going to sound a little bit corny, but I have two teenage daughters, so I'm allowed to be this dad-jokey type. I remember when I started writing the first version of this newsletter, and I said to myself, I'm like, "I don't want this to be a newsletter." I was toying around with the idea of calling it an un-newsletter. I didn't want it to just be another list of links that you click on, and I know that's interesting to some people, but I felt like there was an opportunity to opine on it, to look at the individual links, and maybe even tell a story as part of all of the links that were shared that week, and I thought that would be more interesting than just getting a list of links.

And I'm sure you've seen over the last 140 issues, or however many we're at now, that there have been changes in the way that we've formatted it, and we've tried new things, and things like that. But ultimately, and this goes back to the corny thing, one of the first things that I wanted to do was thank people for writing this stuff. I wanted to basically say, "Look, this is not just about you writing some content." This is big, this is important, and I appreciate it. I appreciate you for writing that content, and I wanted to make it more of a celebration, really, of the community and the people that were early contributors to that space, and that's one of the reasons why I did the Serverless Star thing.

I thought, if somebody writes a really good article some week, and it really hits me, or somebody else says, "Hey, this person wrote a great article," or whatever, I wanted to celebrate that person and call them out. Because that's one of the things, too: writing blog posts or posting things on social media without a good following, or without the dopamine hit of people liking it, or retweeting it, and things like that, can be a pretty lonely place.
I mean, I know I feel that way sometimes when you put something out there, and you think it's important, or you think people might want to see it, and just not enough people see it.

It's even worse, I mean, 240 characters, or whatever it is to write a tweet is one thing, or 280 characters, but if you're spending time putting together a tutorial, or you put together a really good thought piece, or story, or use case, or something where you feel like this is worth sharing, because it could inspire somebody else, or it could help somebody else, could get them past a bump, it could make them think about something a different way, or get them over a hump, or whatever. I mean, that's just the kind of thing where I think people need that encouragement, and I think people deserve that encouragement for the work that they're doing, and that's what I wanted to do with Off-by-none, is make sure that I got that out there, and to just try to amplify those voices the best that I could. The other place where it's progressed, and I guess maybe I'm getting ahead of myself, but the other place where it's progressed and I thought was really interesting, was finding people ...

There are the heavy hitters in the serverless space, right? The ones we all know, and you can name them all, and they are great, and they produce amazing content, and they do amazing things, but they have pretty good engines to get their content out, right? I mean, some people who write for the AWS blog, they're on the AWS blog, right, so they're doing pretty well in terms of getting their things out there, and they've got pretty good engines.

There are some good dev advocates too, that just have good Twitter followings and things like that. Then there's that guy who writes the story, I don't know, he's in India or he's in Poland or something like that. He writes this really good tutorial on how to do this odd edge case for serverless. And you go and you look at their Medium and they've got two followers on Medium, five followers on Twitter or something like that. And that, to me, just seems unfair, right? I mean, they've written a really good piece and it's worth sharing, right? And it needs to get out there. I don't have a huge audience. I know that. I mean, I've got a good following on Twitter. I feel like a lot of my Twitter followers, we can have good conversations, which is what you want on Twitter.

The newsletter has continued to grow. We've got a good listener base for this show here. So, I don't have a huge audience, but if I can share that audience with other people and get other people to the forefront, then that's important to me. And I love finding those people and those ideas that other people might not see because they're not looking for them. So, if I can be part of that and help share that, it's not only a responsibility, it's incredibly rewarding. So ...

Rebecca: Yeah, I have to ... I mean, it is your 100th episode, so hopefully I can give you some kudos, but if celebrating others' work is one of your main tenets, you nail it every time. So ...

Jeremy: I appreciate that.

Rebecca: Just wanted you to know that. So, that's sort of the genesis, of course, of both of these, right?

Jeremy: Right.

Rebecca: That underpins the foundational how-to: how to share both works, or how to share others' work through different channels. I'm wondering how it transformed. There's this newsletter, and then of course it also has this other component, which is Serverless Chats.
And that moment when you were like, "All right, this newsletter, this narrative that I'm telling behind serverless, highlighting all of these different authors from all these different global spaces, I'm going to start ... You know what else I want to do? I don't have enough to do, I'm going to start a podcast." How did we get here?

Jeremy: Well, so the funny thing is, now that I think about it, I think it just goes back to this tenet of fairness, this idea where I was fortunate, and I was able to go down to New York City and go to ServerlessDays New York in late 2018. I was able to ... Tom McLaughlin actually got me connected with a bunch of great people in Boston. I live just outside of Boston. We got connected with a bunch of great people, and we started ServerlessDays Boston for 2019, and we were on that committee. I started traveling and going to conferences and meeting people. I went to re:Invent in 2018, which I know a lot of people just don't have the opportunity to do. And the interesting thing was that I was pulling aside brilliant people, either in the hallway at a conference, or more likely for a very long, deep discussion that we would have about something at a pub in Northern Ireland or something like that, right?

I mean, these were opportunities that I was getting, that I was privileged enough to get, and I'm like, these are amazing conversations. Just things that, for me, I know changed the way I think. And one of the biggest things that I try to do is evolve my thinking. What I thought a year ago is probably not what I think now. Maybe call it flip-flopping, whatever you want to call it, but I think that evolving your thinking is the most progressive thing that you can do, and starting to understand as you gain new perspectives. And I was talking to people that I never would have talked to if I was just sitting here in my home office, or at the time, I mean, I was at another office, but still, I wasn't getting that context. I wasn't getting that experience, and I wasn't getting those stories that literally changed my mind and made me think about things differently.

And so, here I was in this privileged position, being able to talk to these amazing people, and in some cases it's funny, because they're celebrities in their own right, right? I mean, these are the people where other people think of them and it's almost like they're a celebrity, and these people, I think they deserve fame, don't get me wrong. But as someone who has been on that side of it as well, it's ... I don't know, it's weird. It's weird to have fans in a sense. Again, you can be my friend, you don't have to be my fan. But that's how I felt about ...

Rebecca: I'm a fan of my friends.

Jeremy: So, a fan and my friend. So, having talked to these other people and having these really deep conversations on serverless, and going beyond serverless, too. Actually, I had quite a few conversations with some people that have nothing to do with serverless. Actually, Peter Sbarski and I, every time we get together, we only talk about the value of going to college, for some reason. I don't know why. It usually has nothing to do with serverless. So, I'm having these great conversations with these people and I'm like, "Wow, I wish I could share these. I wish other people could have this experience," because I can tell you right now, there are people who can't travel, especially a lot of people outside of the United States. They ...
it's hard to travel to the United States sometimes.

So, these conversations are going on, and I thought to myself, I'm like, "Wouldn't it be great if we could just have these conversations and let other people hear them, hopefully without bar glasses clinking in the background?" And so I said, "You know what? Let's just try it. Let's see what happens. I'll do a couple of episodes. If it works, it works. If it doesn't, it doesn't. If people are interested, they're interested." But that was the genesis of it. I mean, it just goes back to this idea where I felt a little selfish having conversations and not being able to share them with other people.

Rebecca: It's the very Jeremy Daly tenet slogan, right? You got to share it. You got to share it ...

Jeremy: Got to share it, right?

Rebecca: The more he shares it, the more it celebrates it. I love that. I think you do ... Yeah, you do a great job giving a megaphone so that more people can hear. So, in case you need a reminder, actually, I'll ask you, I know what the answer is to this, but do you know the answer? What was your very first episode of Serverless Chats? What was the name, and how long did it last?

Jeremy: What was the name?

Rebecca: Oh yeah. Oh yeah.

Jeremy: Oh, well I know ... Oh, I remember now. Well, I know it was Alex DeBrie. I absolutely know that it was Alex DeBrie, because ...

Rebecca: Correct on that.

Jeremy: If you do not know Alex DeBrie, not only is he an AWS Data Hero, as well as the author of The DynamoDB Book, but he's also like the most likable person on the planet. It is really hard, if you've ever met Alex, that you wouldn't remember him. Alex and I started communicating, again, we met through the serverless space. I think he was actually working at Serverless Inc. at the time when we first met, and I think I finally met him in person at re:Invent 2018. But he and I have collaborated on a number of things and so forth. So, let me think what the name of it was. "Serverless Purity Versus Practicality" or something like that. Is that close?

Rebecca: That's exactly what it was.

Jeremy: Oh, all right. I nailed it. Nailed it. Yes!

Rebecca: Wow. Well, it's a great title. And I think ...

Jeremy: Don't ask me what episode number 27 was, though, because there's no way I could tell you that.

Rebecca: And just for fun, it was 34 minutes long and you released it on June 17th, 2019. So, you've come a long way in a year and a half. That's some kind of wildness. So it makes sense, like, "THE," capital, all caps, bold, italic, author for databases, Alex DeBrie. Makes sense why you selected him as your guest. I'm wondering if you remember any of the ... What do you remember most about that episode? What was it like planning it? What was the reception of it? Anything funny happen recording it or releasing it?

Jeremy: Yeah, well, I mean, so the funny thing is that I was incredibly nervous. I still am, actually. With a lot of guests that I have, I'm still incredibly nervous when I'm about to do the actual interview. And I think it's partially because I want to do justice to the content that they're presenting and to their expertise, and I feel like there's a responsibility to them. But I also feel like the guests that I've had on, some of them are just so smart, and the things they say, I'm just in awe of some of the things that come out of these people's mouths. And I'm like, "This is amazing and people need to hear this."
And so, I feel like we've had really good episodes and we've had some okay episodes, but I feel like I want to try to keep that level up, because I owe that to my listeners, to make sure that there's a high-quality episode, high-quality information, that they're going to get out of it.

But going back to the planning of the initial episodes, I actually had six episodes recorded before I even released the first one. And the reason why I did that was because I said, "All right, there's no way that I can record an episode and then wait a week, and then record another episode and wait a week." And I thought batching them would be a good idea. And so, very early on, I had Alex, and I had Nitzan Shapira, and I had Ran Ribenzaft, and I had Marcia Villalba, and I had Erik Peterson from CloudZero. So, I had a whole bunch of these episodes, and I reached out to, I think, eight or nine people, and I said, "I'm doing this thing, would you be interested in it?" Whatever, and we did planning sessions, which is still a thing that I do today, it's still part of the process.

So, whenever I have a guest on, if you are listening to an episode and you're like, "Wow, how did they just keep the thing going ..." It's not scripted. I don't want people to think it's scripted, but we do review the outline and we go through some talking points to make sure that, again, it's a high-quality episode and that the guest says all the things that the guest wants to say. A lot of it is spontaneous, right? I mean, the language is spontaneous, but we do try to plan these episodes ahead of time so that we make sure that we get the content out and we talk about all the things we want to talk about. But with Alex, it was funny.

He was actually the first of the six episodes that I recorded. I hadn't quite picked who I was going to release first yet, but I recorded with Alex first, and it was an easy, easy conversation. And the reason why it was an easy conversation was because we had talked a number of times, right? It was that in-a-pub talking, that friendly chat. So, that was a pretty easy conversation. And I remember the first several conversations I had: I knew Nitzan very well, I knew Ran very well, I knew Erik very well. Erik helped plan ServerlessDays Boston with me. And I knew Marcia very well. Marcia actually had interviewed me when we were in Vegas for re:Invent 2018.

So, those were very comfortable conversations, and so it actually was a lot easier to do, which probably gave me a false sense of security. I was like, "Wow, these came out pretty well." The conversations worked pretty well. And also it was super easy because I was just doing audio, and once you add the video component into it, it gets a little bit more complex. But yeah, I mean, I don't know if there's anything funny that happened during it, other than the fact that I was incredibly nervous when we recorded those, because I just didn't know what to expect. If anybody wants to know, "Hey, how do you just jump right into podcasting?" I didn't. I actually was planning on, how can I record my voice? How can I get comfortable behind a microphone? And so, one of the things that I did was I started creating audio versions of my blog posts and posting them on SoundCloud.

So, I did that for a couple of blog posts that I did.
And that just helped make me feel a bit more comfortable about being able to record, and getting a little bit more comfortable, even though I still can't stand the sound of my own voice, but hopefully that doesn't bother other people.

Rebecca: That is an amazing ... I think we so often talk about ideas around where you want to go, and you have this vision, and that's your goal, and it's a constant reminder to be like, "How do I make incremental steps to actually get to that goal?" And I love that as a life hack, like, "Hey, start with something you already know that you wrote and feel comfortable in, and say it out loud, and say it out loud again, and say it out loud again." And you may never love your voice, but you will at least feel comfortable saying things out loud on a podcast.

Jeremy: Right, right, right. I'm still working on the "ums" and "ahs." I still do that, and I don't edit those out. That's another thing too, actually, that one of the things I do want people to know about this podcast is these are authentic conversations, right? I feel like I'm the most authentic person that I know. I just want authenticity. I want that out of the guests. The idea of putting together an outline is just so that we can put together a high-quality episode, but everything is authentic, and that's what I want out of people. I just want that authenticity, and one of the things that I felt kept that was leaving in the "ums" and "ahs," you know what I mean? It's just one of those things where I know a lot of podcasts will edit those out and it sounds really polished and finished.

Again, I figured if we can get the clinking glasses out from the background of a bar and at least have the conversation, that's what I'm trying to achieve. And we do very little editing. We do cut things out here and there, especially if somebody makes a mistake or they want to start something over again. We will cut that out because we want, again, high-quality episodes. But yeah, authenticity is deeply important to me.

Rebecca: Yeah, I think it probably certainly helps that neither of us are robots, because robots wouldn't say "um" so many times. As I say, "uh." So, let's talk about ... Alex DeBrie was your first guest, but there have been a hundred episodes, right? So, from, I might say, the best guest, as the hundredth episode guest, which is our very own Jeremy Daly, but let's go back to ...

Jeremy: I appreciate that.

Rebecca: Your guests, one to 99. And I mean, you've chatted with some of the most thoughtful, talented serverless builders and architects in the industry, and across coincident spaces like ML and voice technology, chaos engineering, databases. So, you started with Alex DeBrie and databases, and then I'm going to list off some names here, but there are so many more, right? There are the Gunnar Grosches and the Alexandra Abbases, and Ajay Nair, and Angela Timofte, James Beswick, Chris Munns, Forrest Brazeal, Aleksandar Simovic, and Slobodan Stojanovic. There are just so many more. And I'm wondering, across those hundred conversations, or 99 plus your own today, if you had to distill those into two or three lessons, what have you learned that sticks with you? If there are emerging patterns or themes across these very divergent and convergent thinkers in the serverless space?

Jeremy: Oh, that's a tough question.

Rebecca: You're welcome.

Jeremy: So, yeah, put me on the spot here.
So, yeah, I mean, I think one of the things that I've seen, no matter what it's been, whether it's ML, or it's chaos engineering, or it's observability and things like that, I think the common thing that threads all of it is trying to solve problems and make people's lives easier. Every one of those solutions is ... we always talk about abstractions, and higher-level abstractions, and we no longer have to write ones and zeros on punch cards or whatever. We can write languages that are either compiled or interpreted. And then the cloud comes along and there are things we don't have to do anymore, that just get taken care of for us.

And you keep building these higher levels of abstraction, and I think that's a lot of what ... You've got this underlying concept of letting somebody else handle things for you, and then you've got this whole group of people that are coming at it from a number of different angles and saying, "Well, how will that apply to my use case?" And I think a lot of those things are very, very specific. I think things like voice technology, where the fact that serverless powers voice technology is only interesting insofar as the voice technology is probably the more interesting part; the fact that serverless powers it just means it's a really simple vehicle to do that. It basically removes this whole idea of saying, "I'm building voice technology, or I'm building a voice app, why do I need to worry about setting up servers and all this kind of stuff?"

It just takes that away. It takes that out of the equation. And I think that's the perfect idea of saying, "How can you take your use case, fit serverless in there, and apply it in a way that gets rid of all that extra overhead that you shouldn't have to worry about?" And the same thing is true of machine learning, and SageMaker, and things like that. Yeah, you're still running instances of it, or you still have to do some of these things, but now there are SageMaker endpoints and some other things happening, so it's moving in that direction as well. But then you have those really high-level services, like the NLU API from IBM, which is Watson Natural Language Understanding.

You've got speech recognition, you've got the vision API, you've got sentiment analysis, so you've got a lot of different services that are very specific to machine learning and solving a discrete problem there, but then basically relying on serverless, or at least presenting it in a way that's serverless, where you don't have to worry about it, right? You don't have to run all of these Jupyter notebooks and things like that to do machine learning for a lot of cases. This is one of the things I talked about with Alexandra Abbas: these higher-level APIs are just taking a lot of that responsibility, or a lot of that heavy lifting, off of your plate, and allowing you to really come down and focus on the things that you're doing.

So, going back to that, I do think the common theme that I see is this idea of worrying about servers, and worrying about patching things, and worrying about networking, all that stuff. For so many people now, that's just not even a concern. They don't even think about it. And that's amazing, to think of compute, or data, or networking as a utility that is now just available to us, right?
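(To make the higher-level-API point concrete: a single call to a managed NLP service replaces running your own models or notebooks. Amazon Comprehend is used below as a representative example rather than a service named in the conversation; any of the high-level APIs mentioned, like Watson NLU, follow the same shape.)

```python
# Illustrative: sentiment analysis as a managed, serverless-style API call.
import boto3

comprehend = boto3.client("comprehend", region_name="us-east-1")

result = comprehend.detect_sentiment(
    Text="Serverless lets me focus on the problem, not the servers.",
    LanguageCode="en",
)
print(result["Sentiment"])  # e.g. "POSITIVE" -- no model, no notebook, no server
```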
And I mean, again, going back to my roots, taking it for granted is something that I think a lot of people do, but I think that's also maybe a good thing, right? Just don't think about it. I mean, there are people who are still going to be engineers, people sitting in a data center somewhere, racking servers and doing it, that's going to be forever, right?

But for the things that you're trying to build, that's unimportant to you. That is the furthest thing from your concern. You want to focus on the problem that you're trying to solve. And so I think that's a lot of what I've seen from talking to people: they are literally trying to figure out, "Okay, how do I take what I'm doing, my use case, my problem, how do I take that to the next level, by being able to spend my cycles thinking about that, as opposed to how I'm going to serve it up to people?"

Rebecca: Yeah, I think it's the mantra, right, of simplify, simplify, simplify, or maybe even, to credit Bruce Lee, be like water. You're like, "How do I be like water in this instance?" Well, it's not by setting up servers, it's by doing what I like to be doing. So, you've interviewed these incredible folks. Is there anyone left on your list? I'm sure there ... I mean, I know that you have a large list. Are there a few key folks where you're like, "If this is the moment, I'm going to ask them. I'm going to say, on the hundredth episode, 'Dear so-and-so, I would love to interview you for Serverless Chats.'" Who are you asking?

Jeremy: So, this is something that, again, we have a stretch list of guests that we attempt to reach out to every once in a while, just to say, "Hey, if we get them, we get them." But I have a long list of people that I would absolutely love to talk to. I think number one on my list is certainly Werner Vogels. I mean, I would love to talk to Dr. Vogels about a number of things, maybe even beyond serverless. I'm just really interested, more from a curiosity standpoint, of like, "Just how do you keep that in your head?" That vision of where it's going. And I'd love to drill down more into the vision, because I do feel like there's a marketing aspect of it that's pushing on him, of like, "Here's what we have to focus on because of market adoption and so forth, even if the technology wants to move in a certain way." It'd be really interesting to talk to him about that.

And I'd love to talk to him more, too, about developer experience and so forth, because one of the things that I love about AWS is that it gives you so many primitives, but at the same time, the thing I hate about AWS is that it gives you so many primitives. So, you have to think about 800 services. I know it's not that many, but like, what is it? 200 services? Something like that, that all need to kind of connect together. And I love that there's that diversity in those capabilities, it's just, from a developer standpoint, it's really hard to choose which ones you're supposed to use, especially when several services overlap. So, I'm just curious. I mean, I'd love to talk to him about that and see what the vision is in terms of, is that the idea, just to be a salad bar, to be the Golden Corral of cloud services, I guess, right?

Where you can choose whatever you want, and probably take too much, and then not use a lot of it. But I don't know if that's part of the strategy. I think there are some interesting questions we could dig into there.
Another person from AWS that I actually want to talk to, and I haven't reached out to her yet just because, I don't know, I just haven't reached out to her yet, is Brigid Johnson. She is an IAM expert, and I saw her speak at re:Inforce 2019, it must have been 2019, in Boston. And it was like she was speaking a different language, but she knew IAM so well, and I am not a fan of IAM. I mean, I'm a fan of it in the sense that it's necessary and it's great, but I can't wrap my head around so many different things about it. It's such a ...

It's an ongoing learning process, and when it comes to things like being able to use tags to elevate permissions, just crazy things like that. Anyways, I would love to have a conversation with her, because I'd really like to dig down into what the essence of IAM is. What are the things that you really have to think about with least privilege, especially applying it to serverless services and so forth? And maybe have her help me figure out how to do some of the cross-role IAM things that I'm trying to do. I'd certainly love to speak to Jeff Barr. I did meet Jeff briefly. We talked for a minute, but I would love to chat with him.

I think he sets a shining example of what a developer advocate is. First of all, he's probably the only person alive who knows every service at AWS and has actually tried it, because he writes all those blog posts about it. So it would just be great to pick his brain on that stuff. Also, Adrian Cockcroft would be another great person to talk to, just on this idea of what he's done with microservices, thinking about his role with Netflix and some of those other things, and how all that came together, I think, would be a really interesting conversation. I've seen this in so many of his presentations, where he's talked about the objections: what were the objections to Lambda, and how have you solved those objections? And here are the things that we've done.

And again, the methodology of that would be really interesting to know. There are a couple of other people, too. Oh, Sam Newman, who wrote Building Microservices. That was my Bible for quite some time. I had it on my iPad and had a whole bunch of bookmarks and things like that. And if anybody wants to know, one of my most popular posts that I've ever written was the ... what is it? 16, 17 architectural patterns for serverless, or serverless microservice patterns on AWS. I can't even remember the name of my own posts. But that post was very, very popular. And I know Matt Coulter, who did the CDK ... what the heck was that? CDKpatterns.com. That was one of the things where he said that post was instrumental for him in seeing those patterns and being able to use those patterns and so forth.

If anybody wants to know, a lot of those patterns and those ideas, and the confidence that I had with presenting those patterns, a lot of that came from Sam Newman's work in his Building Microservices book. So again, credit where credit is due, and I think that would be a really fascinating conversation. And then Simon Wardley, I would love to talk to. I actually met Lin Clark in Vegas as well. She was instrumental with the WebAssembly stuff, and I'd love to talk to her. Merritt Baer. There are just so many people. I'm probably just naming too many people now.
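(The tag-based permissions mentioned above, often called attribute-based access control or ABAC, look roughly like the sketch below: access is granted only when the caller's tag matches the resource's tag. The policy content and names are illustrative assumptions, not anything from the episode.)

```python
# Illustrative ABAC sketch: a policy granting DynamoDB access only when
# the caller's "team" principal tag matches the resource's "team" tag.
import json

import boto3

policy_document = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "dynamodb:*",
            "Resource": "*",
            # Tags, not hand-maintained resource lists, decide access
            "Condition": {
                "StringEquals": {
                    "aws:ResourceTag/team": "${aws:PrincipalTag/team}"
                }
            },
        }
    ],
}

iam = boto3.client("iam")
iam.create_policy(
    PolicyName="team-scoped-dynamodb",  # hypothetical policy name
    PolicyDocument=json.dumps(policy_document),
)
```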
But there are a lot of people that I would love to have a chat with and just pick their brains.

And also, one of the things that I've been thinking about a lot on the show as well is the term "serverless," good or bad for some people. Some of the conversations we have go outside of serverless a little bit, right? They're sort of peripheral to it. I think that a lot of things are peripheral to serverless now, and there are a lot of conversations to be had with people who are building with serverless. Actual real-world examples.

One of the things I love hearing was Yan Cui's "Real-World Serverless" podcast, where he actually talks to people who are building serverless things and building them in their organizations. That is super interesting to me, and I would actually love to have some of those conversations here as well. So if anyone's listening and you have a really interesting story to tell about serverless, or something peripheral to serverless, please reach out and send me a message, and I'd be happy to talk to you.

Rebecca: Well, the good news is, it sounds like, A, you've got at least another hundred episodes planned out already.

Jeremy: Most likely. Yeah.

Rebecca: And B, what a testament to Sam Newman. That's pretty great when your work is referred to as the Bible by someone, in terms of a tome, a treasure trove of learnings or parables or teachings. And wow, what a list of other folks, especially AWS power ... Actually, not AWS powerhouses. Powerhouses who happen to work at AWS, and who I think have paved the way for a ton of ways of thinking and even communicating, right? So I think Jeff Barr, as far as setting the bar, raising the bar if you will, for how to teach others: not so high-level, or high-level enough where you can follow along with him, right? Not so high-level where it feels like you can't achieve what he's showing other people how to do.

Jeremy: Right. And I just want to comment on the Jeff Barr thing.

Rebecca: Of course.

Jeremy: Because again, that's my point. That's one of the reasons why I love what he does, and he's so perfect for that position, because he's relatable and he presents things in a way that isn't like, "Oh, well, yeah, of course, this is how you do this." It's always presented in a way to make it accessible. And even for services that I'm not interested in, that I know I probably will never use, I generally will read Jeff's post, because I feel it gives me a good overview, right?

Rebecca: Right.

Jeremy: It just gives me a good overview to understand whether or not that service is even worth looking at, and that's certainly something I don't get from reading the documentation.

Rebecca: Right. He's inviting you to come with him in understanding this, which is so neat. So I bet we can find all the Twitter handles for these folks and put them in the show notes. And I'm especially ... I'm just going to say here that Werner Vogels's Twitter handle is @Werner. So maybe for your hundredth, all the listeners, everyone listening to this, we can say, "Hey, @Werner, I heard that you're the number one guest that Jeremy Daly would like to interview." And I think if we get enough folks saying that to @Werner ... Did I say that right, @Werner, just @Werner?

Jeremy: I think you did.

Rebecca: Anyone, if you can hear it.

Jeremy: Now listen, he did retweet my serverless musical that I did.
So ...

Rebecca: That's right.

Jeremy: I'm sort of on his radar, maybe.

Rebecca: Yeah. And honestly, he loves serverless, especially with the number of customers, and the types of customers, that are doing incredible things with it. So I think we've got a chance, Jeremy. I really do. That's what I'm trying to say.

Jeremy: That's good to know. You're welcome anytime. He's welcome anytime.

Rebecca: We do say that: @Werner, you are welcome anytime. Right. So let's go back to the genesis, not necessarily the genesis of the concept, right, but the genesis of the technology that spurred all of these other technologies, which is AWS Lambda. I don't think we'd be having these conversations, right, if AWS Lambda had not been released in late 2014, and then gone GA, I believe, in 2015.

Jeremy: Right.

Rebecca: And so subsequently the serverless paradigm was thrust into the spotlight. And that seems like eons ago, but also three minutes ago.

Jeremy: Right.

Rebecca: And so I'm wondering ... Let's talk about its evolution a bit, and how, if you've been following it for this long and building with it for this long, you've covered topics from serverless CI/CD pipelines to observability. We already talked about how it's impacted voice technologies, or how it's made things easy: you can build voice technology without having to care about what that technology is running on.

Jeremy: Right.

Rebecca: You've even talked about things like the future, and climate change, and how it relates to serverless. So some of those related conversations that you were just talking about wanting to have, or having had with previous guests. So as a host who thinks about these topics every day, I'm wondering if there's a topic that serverless hasn't touched yet, or one that you hope it will soon. Those types of themes, those threads that you want to pull in the next 100 episodes.

Jeremy: That's another tough question. Wow. You've got good questions.

Rebecca: That's what I said. Heavy hitters. I told you I'd be bringing it.

Jeremy: All right. Well, I appreciate that. So that's actually a really good question. I think the evolution of serverless has seen its ups and downs, and I think one of the nice things is, you look at something like serverless that was so constrained when it first started. And it still has constraints, which are good, but those constraints get lifted. We just talked about Adrian's talks, about how it's like, "Well, I can't do this, or I can't do that," and then, "Okay, we'll add some feature so that you can do that." And I think that, for the most part, and I won't call out anything specific, the evolution of serverless, and the evolution of Lambda and what it can do, has been thoughtful. And by that I mean that it was sort of like, how do we evolve this in a way that doesn't create too much complexity, and still holds true to the serverless ethos of being fairly easy, of just writing code?

And then still evolve it to open up these other use cases and edge cases. And I think that, for the most part, it has held true to that, that it has been mostly, I guess, a smooth ride. There are several examples, though, where it didn't. And I said I wasn't going to call anything out, but I'm going to call this out: I don't think RDS Proxy was great. I think it works really well, but I don't think that's the solution to the problem. It's a band-aid. It works really well, and congrats to the engineers who did it.
I think there's a story about how two different teams were trying to build it at the same time, actually. But either way, I look at that and I say, "That's a good solution to the problem, but it's not the solution to the problem."

And so I think serverless has stumbled in a number of ways like that. I also feel the EFS integration is super helpful, but I'm not sure that's the best way to share state. But regardless, there are a whole bunch of things that we still need to do with serverless, a whole bunch of things that we still need to add and build, and maybe better ways to do things that we need to figure out. But I think, in terms of something that doesn't get talked about a lot, it's the developer experience of serverless. And again, I'm not trying to pitch anything here, but that's literally what I'm trying to work on right now in my current role: the developer experience of serverless. Even though there was this thoughtful approach to adding things, to try to check those things off the list, to say it can't do this, so we're going to make it able to do that by adding X, Y, and Z.

As amazing as that has been, it has added layers and layers of complexity. And I'll go way, way back to 1997, in my dorm room. CGI-bins, if people are not familiar with those, ran on a Linux server, and they were a way to run a Perl script or other types of scripts, the same way you'd run PHP, or Node, or Ruby, or whatever now. So it would run a programming language for you, run a script, and then serve that information back. And of course, you had to actually know the ins and outs, the inputs and outputs. It was more complex than it is now.

But the point is that, back then, once you had the script written, all you had to do was use a thing called FTP, which I'm sure some people don't even know what that is anymore. File Transfer Protocol, where you would basically say, take this file from my local machine and put it on this server, which is a remote machine. And the second you did that, magically it was updated, and you had this thing happening. And I remember there were a lot of jokes, way back in probably 2017, 2018, that serverless was like the new CGI-bin, or just CGI-bins reborn, whatever. But more as a criticism of it, right? And I actually liked that comparison. I felt, you know what? I remember the days where I just wrote code and I just put it on some other server where somebody else was dealing with it, and I didn't even have to think about that stuff.

We're a long way from that now. But that's how serverless felt to me, one of the first times that I started interacting with it, and I felt there was something special about it. And I also felt the constraints of serverless, especially the idea of not having state. People rely on things because they're there. But when you don't have something and you're forced to think differently, and to make a change or find a way to work around it, sometimes workarounds turn into best practices. And that's one of the things that I saw with serverless: people were figuring out pretty quickly how to build applications without state. And then I think the problem is that you had a lot of people who came along, who were maybe big customers of AWS.
I don't know, I'm not going to say that you might be influenced by large customers, I know lots of places are, who said, "We need this." And maybe the will gets bent, right? Because you can only fight gravity for so long. And so those are the kinds of things where I feel some of the stuff has been patchwork, and those patchwork things haven't ruined serverless. It's still amazing. It's still awesome what you can do, and of course, we're still really just focusing on FaaS here, with everything else that's built, with all the APIs and so forth, and everything else that's serverless in the full-service ecosystem. There are still a lot of amazing things there. But I do feel we've become so complex with building serverless applications that ... the Hello World is super easy, but if you're trying to build an actual application, it's a whole new mindset.

You've got to learn a whole bunch of new things, and not only that, but you have to learn the cloud. You have to learn all the details of the cloud, right? You need to know all these different things. You need to know CloudFormation, or Serverless Framework, or SAM, or something like that, in order to get the stuff into the cloud. You need to understand the infrastructure that you're working with. You may not need to manage it, but you still have to understand it. You need to know what its limitations are. You need to know how it connects. You need to know what the failover states are like.

There are so many things that you need to know, and to me, that's a burden. That's adding new types of undifferentiated heavy lifting that shouldn't be there. And that's the conversation that I would like to keep having moving forward: how do you go back to a developer experience where all this stuff is taken away? And again, to call out Werner again, he constantly says serverless is about writing code, but ask anybody who builds serverless applications: you're doing a lot more than writing code right now. And I would love to see us bring the conversation back to, how do we get back there?

Rebecca: Yeah, I think it kind of goes back to ... You and I have talked about this notion of an ode to simplicity, and it's sort of what you want to write into your ode, right? If we're going to have an ode to simplicity, how do we make sure that we keep the simplicity inside of the ode?

Jeremy: Right.

Rebecca: So I've got ... I don't know if you've seen these.

Jeremy: I don't know.

Rebecca: But before I get to some wrap-up questions, more from the brainwaves of Jeremy Daly, I don't want to forget to call out some long-time listener questions. They wrote in via Twitter, and they wanted to perhaps pick your brain on a few things.

Jeremy: Okay.

Rebecca: So I don't know if you're ready for this.

Jeremy: A-M-A. A-M-A.

Rebecca: I don't know if you've seen these. Yeah, these are going to put you in the ...

Jeremy: A-M-A-M. Wait, A-M-A-A? Ask me almost anything? No, go ahead. Ask me anything.

Rebecca: A-M-A-A. A-M-J. No. Anyway, we got it. Ask Jeremy almost anything.

Jeremy: There you go.

Rebecca: So there are just three to tackle for today's episode that I'm going to lob at you. One is from Ken Collins: "What will it take to get you back to a relational database with Lambda?"

Jeremy: Ooh, I'm going to tell you right now, and without a doubt: Aurora Serverless v2. I played around with that right after re:Invent 2020, when it had just come out, right? I'm trying to remember what year it is at this point.

Rebecca: Yes.
Indeed.

Jeremy: Right when that came out. And I had spent a lot of time with Aurora Serverless v1, I guess if you want to call it that. I spent a lot of time with it. I used it on a couple of different projects. I had a lot of really good success with it. I had the same pains as everybody else did when it came to scaling, just the slowness of the scaling, and some of the step-downs and some of those things. There were certainly problems with it. But v2, just the early, early preview version of v2, was a marvel of engineering, and the way that it worked was absolutely fascinating.

And I know it's getting close, I think, to being GA, and when that becomes GA, I think I will have a new outlook on whether or not I can fit RDS into my applications. I will say, though, I don't think that transactional applications should be using relational databases. One of the things that was sort of a nice thing about moving to serverless, speak ...
Carmen joins us to talk about networking services on AWS and how they compare to the ones we traditionally find in data centers. Carmen walks us through Security Groups, NACLs, Transit Gateway, routing tables, VPCs, VPNs, and more.
Carmen Pino Cuevas - https://www.linkedin.com/in/carmen-pino-cuevas/
Carmen is a Solutions Architect who helps small and medium-sized businesses in Spain deploy and optimize their workloads on AWS. She previously worked on the design and implementation of IP networks for an American service provider.
Rodrigo Asensio - https://twitter.com/rasensio
Based in Barcelona, Spain, Rodrigo leads an Enterprise-segment Solutions Architecture team that helps large customers in Iberia move to the cloud and take advantage of its benefits.
Links
Networking on AWS Home: https://aws.amazon.com/products/networking/
Security Groups: https://docs.aws.amazon.com/vpc/latest/userguide/VPC_SecurityGroups.html
Networking blog home: https://aws.amazon.com/blogs/networking-and-content-delivery/
Transit Gateway: https://aws.amazon.com/transit-gateway/?whats-new-cards.sort-by=item.additionalFields.postDateTime&whats-new-cards.sort-order=desc
SHOW: Season 1, Show 3
OVERVIEW: From the creators of the Internet's #1 Cloud Computing podcast, The Cloudcast, Aaron Delp (@aarondelp) and Brian Gracely (@bgracely) introduce this new podcast, Cloudcast Basics.
What does networking mean in the cloud? Connectivity between nodes, VPCs (private groupings), DNS, load balancing and auto-scaling, CDN services, private links (into the cloud), internet-facing IPs, private IPs, data transfer within AZs and across regions, service mesh
How is networking allocated? IP addressing, bandwidth, traffic/transfers, private groupings, capacity
How was networking allocated before cloud computing?
What does the cloud computing provider do with a networking offering (provider responsibilities vs. customer responsibilities)? Lots of variety, depending on the service
Why are there so many variations of networking? (connectivity, security, scaling, local networking, global networking, etc.)
Does it matter where the networking is located? How do clouds organize the networking (availability zones, regions, etc.)?
How much does networking cost in the cloud? What are the various ways you can buy networking?
Examples:
AWS - https://aws.amazon.com/products/networking/
Azure - https://azure.microsoft.com/en-us/product-categories/networking/
Google Cloud - https://cloud.google.com/products/networking
Oracle Cloud - https://www.oracle.com/cloud/networking/
IBM Cloud - https://www.ibm.com/cloud/network
Digital Ocean - https://www.digitalocean.com/products/ (networking)
SUBSCRIBE: Please subscribe anywhere you get podcasts (Apple Podcasts, Google Podcasts, Spotify, Stitcher, Amazon Music, Pandora, etc.).
CLOUD NEWS OF THE WEEK - http://bit.ly/cloudcast-cnotw
LEARNING CLOUD COMPUTING: Here are some great places to begin your cloud journey, if you're interested in getting hands-on experience with the technology, or you'd like to build your skills towards a certification.
CBT Nuggets - Training and Certifications
A Cloud Guru - Training and Certifications
Cloud Academy - Training and Certifications
Katakoda - Self-Paced, Interactive Learning
GitHub - Code Samples and Collaboration
FEEDBACK?
Web: Cloudcast Basics
Email: show at cloudcastbasics dot net
Twitter: @cloudcastbasics
Roger Hoover, one of the first engineers to work on Confluent Cloud, joins Tim Berglund (Senior Director of Developer Experience, Confluent) to chat about the evolution of Confluent Cloud, all the stages that it’s been through, and the lessons he’s learned on the way. He talks through the days before Confluent Platform was created, and how he contributed to Apache Kafka® to run it on OpenStack (the feature used to separate advertised hostnames from the internal hostnames). The Confluent Cloud control plane now runs in over 40 regions. Under the covers, Roger and his team are managing tens of thousands of resources at the cloud provider layer: everything from creating VPCs, VMs, volumes, and DNS records to managing software artifacts, such as which version of Kafka is running, and user management. Confluent Cloud is a complex application and distributed system spread across the entire world, but Roger reveals how it's done. EPISODE LINKS: Building Confluent Cloud – Here’s What We’ve Learned; Join the Confluent Community Slack; Learn more with Kafka tutorials, resources, and guides at Confluent Developer; Live demo: Kafka streaming in 10 minutes on Confluent Cloud; Use 60PDCAST to get an additional $60 of free Confluent Cloud usage (details)
Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.11.09.373845v1?rss=1 Authors: Suyama, H., Egger, V., Lukas, M. Abstract: Social discrimination in rats requires activation of the intrinsic bulbar vasopressin system, but it is unclear how this system comes into operation. Here we show that a higher number of bulbar vasopressin cells (VPCs) is activated by stimulation with a conspecific compared to rat urine, indicating that VPC activation depends on more than olfactory cues during social interaction. In vitro slice electrophysiology combined with pharmacology and immunohistochemistry then demonstrated that centrifugal cholinergic inputs from the diagonal band of Broca can enable olfactory nerve-evoked action potentials in VPCs via muscarinic neuromodulation. Finally, such muscarinic activation of the vasopressin system is essential for vasopressin-dependent social discrimination, since recognition of a known rat could be blocked by a muscarinic antagonist and rescued by additional application of vasopressin. For the first time, we demonstrate that top-down cholinergic modulation of bulbar VPC activity in a social context is crucial for individual social discrimination in rats. Copyright belongs to the original authors. Visit the link for more info
Michael shares his learnings about IPv6 on AWS. Enabling IPv6 is highly recommended for public endpoints like CloudFront and ALB. On top of that, Michael explains how to enable IPv6 for your VPCs.
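As a rough sketch of what enabling IPv6 on an existing VPC involves with boto3 (all IDs and the /64 slice below are placeholders, and the accompanying route-table and security-group changes are left out):

```python
# Hedged sketch of enabling IPv6 on an existing VPC with boto3.
# IDs and the /64 are placeholders; routing and security updates
# are intentionally omitted.
import boto3

ec2 = boto3.client("ec2")

# 1. Attach an Amazon-provided IPv6 CIDR block (a /56) to the VPC.
ec2.associate_vpc_cidr_block(
    VpcId="vpc-0123456789abcdef0",              # placeholder
    AmazonProvidedIpv6CidrBlock=True,
)

# 2. Carve a /64 out of that block for one subnet.
ec2.associate_subnet_cidr_block(
    SubnetId="subnet-0123456789abcdef0",        # placeholder
    Ipv6CidrBlock="2600:1f14:aaaa:bb00::/64",   # placeholder slice of the /56
)

# 3. Have instances launched in that subnet receive IPv6 addresses.
ec2.modify_subnet_attribute(
    SubnetId="subnet-0123456789abcdef0",
    AssignIpv6AddressOnCreation={"Value": True},
)
```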
Link to bioRxiv paper: http://biorxiv.org/cgi/content/short/2020.09.03.281311v1?rss=1 Authors: Conchou, L., Lucas, P., Deisig, N., Demondion, E., RENOU, M. Abstract: Olfaction allows insects to communicate with pheromones even in complex olfactory landscapes. It is generally admitted that, due to the binding selectivity of the receptors, general odorants should weakly interfere with pheromone detection. However, laboratory studies show that volatile plant compounds (VPCs) modulate responses to the pheromone in male moths. We used extracellular electrophysiology and calcium imaging to measure the responses to the pheromone of receptor and central neurons in male Agrotis ipsilon while exposed to simple or composite backgrounds of VPCs. Maps of activity were built using calcium imaging to visualize which areas in the antennal lobes (ALs) were affected by VPCs. To mimic a natural olfactory landscape, short pheromone puffs were delivered over VPC backgrounds. We chose a panel of VPCs with different chemical structures and physicochemical properties, representative of the odorant variety encountered by a moth. We evaluated the intrinsic activity of each VPC and compared the impact of VPC backgrounds at the antenna and antennal lobe levels. Then, we prepared binary, ternary and quaternary combinations to determine whether blend activity could be deduced from that of its components. Our data confirm that a VPC background interferes with the moth pheromone system in a dose-dependent manner. Interference with the neuronal coding of the pheromone signal starts at the periphery. VPCs showed differences in their capacity to elicit Phe-ORN firing responses that cannot be explained by differences in stimulus intensities, because we adjusted the source concentrations to vapor pressures. Thus, these differences must be attributed to the selectivity of ORs or other olfactory proteins. The neuronal network in the ALs, which reformats the ORN input, did not improve pheromone salience. We postulate that the AL network might have evolved to increase sensitivity and encode fast changes over a wide range of concentrations, possibly at some cost to selectivity. Comparing three- or four-component blends to binary blends or single compounds indicated that a blend shows the activity of its most active compound. Thus, although the diversity of a background might increase the probability of including a VPC that interacts with the pheromone system, chemical diversity does not seem to be a prominent factor per se. Global warming is significantly affecting plant metabolism, so the emissions of VPCs and the resulting odorscapes are modified. Increases in atmospheric mixing rates of VPCs will change olfactory landscapes, which, as confirmed in our study, might impact pheromone communication. Copyright belongs to the original authors. Visit the link for more info
Some highlights of the show include: The company's cloud native journey, which accelerated with the acquisition of Uswitch. How the company assessed risk prior to their migration, and why they ultimately decided the task was worth the gamble. Uswitch's transformation into a profitable company resulting from their cloud native migration. The role that multidisciplinary, collaborative teams played in solving problems and moving projects forward. Paul also offers commentary on some of the tensions that resulted between different teams. Key influencing factors that caused the company to adopt containerization and Kubernetes. Paul goes into detail about their migration to Kubernetes, and the problems that it addressed. Paul's thoughts on management and prioritization as CTO. He also explains his favorite engineering tool, which may come as a surprise. Links: RVU Website: https://www.rvu.co.uk/ Uswitch Website: https://www.uswitch.com/ Twitter: https://twitter.com/pingles GitHub: https://github.com/pingles Transcript: Announcer: Welcome to The Business of Cloud Native podcast, where we explore how end users talk and think about the transition to Kubernetes and cloud-native architectures.Emily: Welcome to The Business of Cloud Native. I'm your host, Emily Omier, and today I am chatting with Paul Ingles. Paul, thank you so much for joining me.Paul: Thank you for having me.Emily: Could you just introduce yourself: where do you work? What do you do? And include, sort of, some specifics. We all have a job title, but it doesn't always reflect what our actual day-to-day is.Paul: I am the CTO at a company called RVU in London. We run a couple of reasonably big-ish price comparison, aggregator type sites. So, we help consumers figure out and compare prices on broadband products, mobile phones, energy—so in the UK, energy is something which is provided through a bunch of different private companies, so you've got a fair amount of choice on kind of that thing. So, we tried to make it easier and simpler for people to make better decisions on the household choices that they have. I've been there for about 10 years, so I've had a few different roles. So, as CTO now, I sit on the exec team and try to help inform the business and technology strategy. But I've come through a bunch of teams. So, I've worked on some of the early energy price comparison stuff, some data infrastructure work a while ago, and then some underlying DevOps type automation and Kubernetes work a couple of years ago.Emily: So, when you get in to work in the morning, what types of things are usually on your plate?Paul: So, I keep a journal. I use bullet journalling quite extensively. So, I try to track everything that I've got to keep on top of. Generally, what I would try to do each day is catch up with anybody that I specifically need to follow up with. So, at the start of the week, I make a list of every day, and then I also keep a separate column for just general priorities. So, things that are particularly important for the week, themes of work going on, like, technology changes, or things that we're trying to launch, et cetera. And then I will prioritize speaking to people based on those things. So, I'll try and make sure that I'm focusing on the most important thing. I do a weekly meeting with the team. So, we have a few directors that look after different aspects of the business, and so we do a weekly meeting to just run through everything that's going on and sharing the problems. 
We use the three P's model: so, sharing progress, problems, and plans. And we use that to try and steer what we do. And we also look at some other team health metrics. Yeah, it's interesting actually. I think when I switched from working in one of the teams to being in the CTO role, things changed quite substantially. That list of things that I had to care about increased hugely, to the point where it far exceeded how much time I had to spend on anything. So, nowadays, I find that it's much more likely for some things to drop off. And so it's unfortunate, and you can't please everybody, so you just have to say, “I'm really sorry, but this thing is not high on the list of priorities, so I can't spend any time on it this week, but if it's still a problem in a couple of weeks' time, then we'll come back to it.” But yeah, it can vary quite a lot.Emily: Hmm, interesting. I might ask you more questions about that later. For now, let's sort of dive into the cloud-native journey. What made RVU decide that containerization was a good idea and that Kubernetes was a good idea? What were the motivations and who was pushing for it?Paul: That's a really good question. So, I got involved about 10 years ago. So, I worked for a search marketing startup in London called Forward Internet Group, and they acquired Uswitch in 2010. And prior to working at Forward, I'd worked as a consultant at ThoughtWorks in London, so I spent a lot of time working in banks on continuous delivery and things like that. And so when Uswitch came along, there were a few issues around the software release process. Although there was a ton of automation, it was still quite slow to actually get releases out. We were only doing a release every fortnight. And we also had a few issues with the scalability of data. So, it was a monolithic Windows Microsoft stack. So, there were SQL Server databases, and .NET app servers, and things like that. And our traffic can be quite spiky, so when companies are in the news, or there are policy changes and things like that, we would suddenly get an increase in traffic, and the Microsoft solution would just generally kind of fall apart as soon as we hit some kind of threshold. So, I got involved, partly to try and improve some of the automation and release practices, because at the search start-up, we were releasing experiments every couple of hours, even. And so we wanted to try and take a bit of that ethos over to Uswitch, and also to try and solve some of the data scalability and system scalability problems. And when we got started doing that, a lot of it was—so that was in the early heyday of AWS, so this was about 2008, that I was at the search startup. And we were used to using EC2 to try and spin up Hadoop clusters and a few other bits and pieces that we were playing around with. And when we acquired Uswitch, we felt like it was quickest for us to just create a different environment, stick it under the load balancer so end users wouldn't realize that some requests were being served off of the AWS infrastructure instead, and then just gradually go from there. We found that that was just the fastest way to move. So, I think it was interesting, and it was both a deliberate move, but it was also, I think, the degree to which we followed through on it—I don't think we'd really anticipated quite how quickly we would shift everything. 
And so when Forward made the acquisition, I joined in the summer of 2010, and myself and a colleague wrote a little two-pager on: here are the problems we see, here are the things that we think we can help with, the ways the technology approach that we'd applied at Forward would carry across, and what benefits we thought it would bring. Fortunately, because Forward was a privately held business—we were relatively small but profitable—the owner of that business was quite risk-affine. He was quite keen on playing blackjack and other stuff. So, he was pretty happy with talking about probabilities of success. And so we just said, we think there's a future in it if we can get the wheels turning a bit better. And he was up for it. He backed us and we just took it from there. And so we replaced everything from self-hosted physical infrastructure running on top of .NET to all AWS hosted, running a mix of Ruby, and Clojure, and other bits and pieces, in about two years. And that's just continued from there. So, the move to Kubernetes was a relatively recent one; that was only within the last—I say ‘recent'; it was about two years ago that we started moving things in earnest. And then you asked what was the rationale for switching to Kubernetes—Emily: Let me first ask you, when you were talking with the owner, what were the odds that you gave him for success?Paul: [laughs]. That's a good question. I actually don't know. I think we always knew that there was a big impact to be had. I don't think we knew the scale of the upside. So, I don't think we—I mean, at the time, Uswitch was just about breaking even, so we didn't realize that there was an opportunity to radically change that. I think we underestimated how long it would take to do. So, I think we'd originally thought that we could replace most of the stuff that needed replacing within six months. We had an early prototype out within two or three weeks, because we'd always placed a big emphasis on releasing early, experimenting, iterative delivery, A/B testing, that kind of thing. So, I think it was almost like that middle term that was the harder piece. And there was definitely a point where… I don't know, I think it was this classic situation of pulling on a ball of string, where what we wanted to do was to focus on improving the end-user experience, because our original belief was that, aside from the scalability issues, the existing site just didn't solve the problem sufficiently well, that it needed an overhaul to simplify the journeys, and simplify the process, and improve the experience for people. We were focusing on that and we didn't want to get drawn into replacing a lot of the back office and integration type systems, partly because there was a lot of complexity there, but also because you then have to engage with QA environments, and test environments, and sign-offs with the various people that we integrate with. But it was, as I said, this kind of tugging on a ball of string, where every improvement that we made in the end-user experience—so we would increase conversion rate by 10 percent, but through doing that, we would introduce downstream errors in the ways that those systems would integrate—meant we gradually just ended up having to pull in slightly more and more pieces to make it work. I don't think we ever gave odds of success. I think we underestimated how long that middle piece would take. 
I don't think we really anticipated the degree of upside that we would get as a consequence, through nothing other than just making releases quicker, being able to test and move faster, and focusing on end-user experience was definitely the right thing to focus on.Emily: Do you think, though, that everybody perceived it as a risk? I'm just asking because you mentioned the blackjack, was this a risk that could fail?Paul: Well, I think the interesting thing about it was that we knew it was the right thing to do. So, again, I think our experience as consultants at ThoughtWorks was on applying continuous delivery, what we would today call DevOps, applying those practices to software delivery. And so we'd worked on systems where there weren't continuous integration servers and where people weren't releasing every day, and then we'd worked in environments where we were releasing every couple of hours, and we were very quickly able to hone in on what worked and discard things that didn't. And so I think because we'd been able to demonstrate that success within the search business, I think that carried a great deal of trust. And so when it came to talking about things we could potentially do, we were totally convinced that there were things that we could improve. I think it was a combination of: there was a ton of potential, we knew that there was a new confluence of technologies and approaches that could be successful if we were able to just start over, and then I think also probably a healthy degree of naive overconfidence in what we could do, that we would just throw ourselves into it. So, it's hard work, but yeah, it was ultimately highly successful. So, it's something I'm exceedingly proud of today.Emily: You said something really interesting, which is that Uswitch was barely profitable. And if I understand correctly, that changed for the better. Can you talk about how this is related?Paul: Yeah, sure. I think the interesting thing about it was that we knew that there was something we could do better, but we weren't sure what it was. And so the focus was always on being able to release as frequently as we possibly could to try and understand what that was, as well as trying to just simplify and pay back some of the technical debt. Well, so, trying to overcome some of the artificial constraints that existed because of the technology choices that people had made—perfectly decent decisions back in the day, but platforms like AWS offered better alternatives now. So, we just focused on being able to deliver iteratively, and just keep focusing on continual improvement, releasing, understanding what the problems were, and then getting rid of those little niggly things. The manager I had at Forward was this super—I don't know, he just had the perfect ethos, and he was driven—so we were a team that were focused on doing daily experiments. And so we would rely on data on our spend and data on our revenue. And that would come in on a daily cycle. So, a lot of the rhythm of the team was driven off of that cycle. And so as we could run experiments and measure their profitability, we could then inform what we would do on the day. 
And so, we had a handful of long-running technology things that we were doing, and then we would also have other tactical things that he would have ideas on; he would have some hypothesis of, well, “Maybe this is the reason that this is happening, let's come up with a test that we can use to try and figure out whether that's true.” We would build something quickly to throw it together to help us either disprove it or support it, and we would put it live, see what happened, and then move on to the next thing. And so I think a lot of the—what we wanted to do is to instill a bit of that environment in Uswitch. And so a lot of it was being able to release quickly, making sure that people had good data in front of them. I mean, even tools like Google Analytics were something which we were quite au fait with using but didn't have broad adoption at the time. And so we were using that to look at site behavior and what was going on and reason about what was happening. So, we just tried to make sure that people were directly using that, rather than just making changes on a longer cycle without data at all.Emily: And can you describe how you were working with the business side, and how you were communicating, what the sort of working relationship was like? Were there any misunderstandings on either side?Paul: Yeah, it's a good question. So, when I started at Uswitch, the organizational structure was, I guess, relatively classical. So, you had a pooled engineering team. So, it was a monolithic system, deployed onto physical infrastructure. So, there was an engineering team, there was an operations team, and then there were a handful of people that were business specific in the different markets that we operated in. So, there were a couple of people that focused on, like, the credit card market; a couple of people that focused on energy, for example. And I used to call it the stand-up swarm: so, in the morning, we would sit on our desks and you would see almost the entire office move between the different card walls that were based around the office. Although there was a high degree of interaction between the business stakeholders, the engineers, designers, and other people, it always felt slightly weird that you would have almost all of the company interested in almost everything that was going on, and so I think the intuition we had was that a lot of the ways that we would think about structuring software, around being loosely coupled but highly cohesive—those same principles should or could apply to the organization itself. And so what we tried to do is to make sure that we had multidisciplinary teams that had the people in them to do the work. So, for the early days of the energy work, there were only a couple of us in it. So, we had a couple of engineers, and we had a lady called Emma, who was the product owner. She used to work in the production operations team, so she used to be focused on data entry from the products that different energy providers would send us, but she had the strongest insight into the domain problem, what problem consumers were trying to overcome, and what ways that we could react to it. And so, when we got involved, she had a couple of ideas that she'd been trying to get traction on, that she'd been unable to. And so what we—we had a, I don't know, probably a, I think a half-day session in an office. So, we took over the boardroom at the office and just said, “Look, we could really do with a separate space away from everybody to be able to focus on it. 
And we just want to prove something out for a couple of weeks. And we want to make sure that we've got space for people to focus.” And so we had a half-day in there, we had a conversation about, “Okay, well, what's the problem? What's the technical complexity of going after any of these things?” And there were a few nuances, too. Like, if you choose option A, then we have to get all of the historical information around it, as well as the current products and market. Whereas if we choose option B, then we can simplify it down, and we don't need to do all of that work, and we can try and experiment with something sooner. So, we wanted it to be as collaborative as possible because we knew that the way that we would be successful is by trying to execute on ideas faster than we'd been able to before. And at the same time, we also wanted to make sure that there was a feeling of momentum and that we would—I think there was probably a healthy degree of slight overconfidence, but we were also very keen to be able to show off what we could do. And so we genuinely wanted to try and improve the environment for people so that we could focus on solving problems quicker, trying out more experiments, being less hung up on whether it was absolutely the right thing to do, and instead just focus on testing it. So, were there tensions? I think there were definitely tensions; they weren't so much on the technical side; we were very lucky that most of the engineers that already worked there were quite keen on doing something different, and so we would have conversations with them and just say, “Look, we'll try everything we can to try and remove as many of the constraints that exist today.” I think a lot of the disagreement or tension was whether or not it was the right problem to be going after. So, again, the search business that we worked in was doing a decent amount of money for the number of people that were there, and we knew there was a problem we could fix, but we didn't know how much runway it would have. And so there was a lot of tension on whether we should be pulling people into focusing on extending the search business, or whether we needed to focus on fixing Uswitch. So, there was a fair amount of back and forth about whether or not we needed to move people from one part of the business to another and that kind of thing.Emily: Let's talk a little bit about Kubernetes, and how Uswitch decided to use Kubernetes, what problem it solved, and who was behind the decision, who was really making the push.Paul: Yeah, interesting. So, I think containers were something that we'd been experimenting with for a little while. So, as I think a lot of the culture was, we were quite risk-affine. So, we were quite keen to be trying out new technologies, and we'd been using modern languages and platforms like Clojure since the early days of them being available. We'd been playing around with containers for a while, and I think we knew there was something in it, but we weren't quite sure what it was. So, I think, although we were playing around with it quite early, I think we were quite slow to choose one platform or another. I think in the end, we—in the intervening period, I guess, between when we went from the more classical way of running Puppet across a bunch of EC2 instances that run a version of your application, the next step after that was switching over to using ECS. So, Amazon's container service. 
And I guess the thing that prompted a bit more curiosity into Kubernetes was that—I forgot the projects I was working on, but I was working on a team for a little while, and then I switched to go do something else. And I needed to put a new service up, and rather than just doing the thing that I knew, I thought, “Well, I'll go talk to the other teams.” I'll talk to some other people around the company, and find out what's the way that I ought to be doing this today, and there was a lot of work around standardizing the way that you would stand up an ECS cluster. But I think even then, it always felt like we were sharing things in the wrong way. So, if you were working on a team, you had to understand a great deal of Amazon to be able to make progress. And so, back when I got started at Uswitch, when I talk about doing the work about the energy migration, AWS at the time really only offered EC2, load balancers, firewalling, and then eventually relational databases, and so back then the amounts of complexity to stand up something was relatively small. And then come to a couple of years ago. You have to appreciate and understand routing tables, VPCs, the security rules that would permit traffic to flow between those, it was one of those—it was just relatively non-trivial to do something that was so core to what we needed to be able to do. And I think the thing that prompted Kubernetes was that, on the Kubernetes project side, we'd seen a gradual growth and evolution of the concepts, and abstractions, and APIs that it offered. And so there was a differentiation between ECS or—I actually forget what CoreOS's equivalent was. I think maybe it was just called CoreOS. But there are a few alternative offerings for running containerized, clustered services, and Kubernetes seems to take a slightly different approach that it was more focused on end-user abstractions. So, you had a notion of making a deployment: that would contain replicas of a container, and you would run multiple instances of your application, and then that would become a service, and you could then expose that via Ingress. So, there was a language that you could use to talk about your application and your system that was available to you in the environment that you're actually using. Whereas AWS, I think, would take the view that, “Well, we've already got these building blocks, so what we want our users to do is assemble the building blocks that already exist.” So, you still have to understand load balancers, you still have to understand security groups, you have to understand a great deal more at a slightly lower level of abstraction. And I think the thing that seemed exciting, or that seems—the potential about Kubernetes was that if we chose something that offered better concepts, then you could reasonably have a team that would run some kind of underlying platform, and then have teams build upon that platform without having to understand a great deal about what was going on inside. They could focus more on the applications and the systems that they were hoping to build. And that would be slightly harder on the alternative. So, I think at the time, again, it was one of those fortunate things where I was just coming to the end of another project and was in the fortunate position where I was just looking around at the various different things that we were doing as a business, and what opportunity there was to do something that would help push things on. 
And Kubernetes was one of those things which a couple of us had been talking about, and thinking, “Oh, maybe now is the time to give it a go. There's enough stability and maturity in it; we're starting to hit the problems that it's designed to address. Maybe there's a bit more appetite to do something different.” So, I think we just gave it a go. Built a proof of concept, showed that it could run the most complex system that we had, and I think also did a couple of early experiments on the ways in which Kubernetes had support for horizontal scaling and other things which were slightly harder to put into practice in AWS. And so we did all that, and I think gradually it just kind of grew out from there; we just took the proof of concepts to other teams that were building products and services. We found a team that were struggling to keep their systems running because they were a tiny team. They only had, like, two or three engineers in. They had some stability problems over a weekend because the server ran out of hard disk space, and we just said, “Right. Well, look, if you use this, we'll take on that problem. You can just focus on the application.” It kind of just grew and grew from there.Emily: Was there anything that was a lot harder than you expected? So, I'm looking for surprises as you're adopting Kubernetes.Paul: Oh, surprises. I think there was a non-trivial amount that we had to learn about running it. And again, I think at the point at which we'd picked it up, it was, kind of, early days for automation, so there was—I think maybe Google had just launched Google Kubernetes Engine on Google Cloud. Amazon certainly hadn't even announced that hosted Kubernetes would be an option. There was an early project within Kubernetes, called kops, that you could use to create a cluster, but even then it didn't fit our network topology because it wouldn't work with the VPC networking that we needed and expected within our production infrastructure. So, there was a lot of that kind of work in the early days; to try and make something work, you had to understand in quite a level of detail what each component of Kubernetes was doing. As we were gradually rolling it out, I think the things that were most surprising were that, for a lot of people, it solved a lot of problems that meant they could move on, and I think people were actually slightly surprised by that. Which, [laughs], it sounds like quite a weird turn of phrase, but I think people were positively surprised at the amount of stuff that they didn't have to do for solving a fair few problems that they had. There were a couple of teams doing things at a slightly larger scale for which we had to spend a bit more time improving the performance of our setup. So, in particular, there was a team that had a reasonably strong requirement on the latency overheads of Ingress. So, they wanted their application to respond within, I don't know, I think it was maybe 200 milliseconds or something. And we, through setting up the monitoring and other bits and pieces that we had, realized that Ingress was doing all right, but there was a fair amount of additional latency added at the tail that was a consequence of a couple of bugs or other things that existed in the infrastructure. So, there were definitely a lot of little niggly things that came up as we were going, but we were always confident that we could overcome it. And, as I said, I think that a lot of teams saw benefits very early on. 
And I think the other teams that were perhaps a little bit more skeptical because they got their own infrastructure already, they knew how to operate it, it was highly tested, they'd already run capacity and load tests on it, they were convinced that it was the most efficient thing that they could possibly run. I think even over the long run, I think they realized that there was more work that they needed to do than they should be focusing on, and so they were quite happy to ultimately switch over to the shared platform and infrastructure that the cloud infrastructure team run.Emily: As we wrap up, there's actually a question I want to go back to, which is how you were talking about the shifting priorities now that you've become CTO. Do you have any sort of examples of, like, what are the top three things that you will always care about, that you will always have the energy to think about? And then I'm curious to have some examples of things that you can't deal with, you can't think about. The things that tend to drop off.Paul: The top three things that I always think about. So, I think, actually, what's interesting about being CTO, that I perhaps wasn't expecting is that you're ever so slightly removed from the work, that you can't rely on the same signals or information to be able to make a decision on things, and so when I give the Kubernetes story, it's one of those, like, because I'd moved from one system to another, and I was starting a new project, I experienced some pain. It's like, “Right. Okay, I've got to go do something to fix this. I've had enough.” And I think the thing that I'm always paying attention to now, is trying to understand where that pain is next, and trying to make sure that I've got a mechanism for being able to appreciate that. So, I think a lot of the things I try to spend time on are things to help me keep track of what's going on, and then help me make decisions off the back of it. So, I think the things that I always spend time on are generally things trying to optimize some process or invest in automation. So, a good example at the moment is, we're talking about starting to do canary deployments. So, starting to automate the actual rollout of some new release, and being able to automate a comparison against the existing service, looking at latency, or some kind of transactional metrics to understand whether it's performing as well or different than something historical. So, I think the things that I tend to spend time on are process-oriented or are things to try and help us go quicker. One of the books that I read that changed my opinion of management was Andy Grove's, High Output Management. And I forget who recommended it to me, but somebody recommended it to me, and it completely altered my opinion of what value a manager can add. So, one of the lenses I try to apply to anything is of everything that's going on, what's the handful of things that are going to have the most impact or leverage across the organization, and try and spend my time on those. I think where it gets tricky is that you have to go broad and deep. So, as much as there are broad things that have a high consequence on the organization as a whole, you also need an appreciation of what's going on in the detail, and I think that's always tricky to manage. I'm sorry, I forgot what was the second part of your question.Emily: The second part was, do you have any examples of the things that you tend to not care about? 
That presumably someone is asking you to care about, and you don't?Paul: [laughs]. Yeah, it's a good question. I don't think it's that I don't care about it. I think it's that there are some questions that come my way that I know that I can defer, or they're things which are easy to hand off. So, I think the… that is a good question. I think one of the things that are always tricky to prioritize are things which feel high consequence but are potentially also very close to bikeshedding. And I think that is something which is fair—I'd be interested to hear what other people said. So, a good example is, like, choice of tooling. And so when I was working on a team, or on a problem, we would focus on choosing the right tool for the job, and we would bias towards experimenting with tools early, and figuring out what worked, and I think now you have to view the same thing through a different lens. So, there's a degree to which you also incur an organizational cost as a consequence of having high variability in the programming languages that you choose to use. And so I don't think it's that I don't care about it, but it's interesting that, over the time I've been doing this role, I've gradually learned to let go of things that I would previously have thoroughly enjoyed getting involved in. And so you have to step back and say, “Well, actually I'm not the right person to be making a decision about which technology this team should be using. I should be trusting the team to make that decision.” And you have to kind of—I think that over the time I've been doing the role, you kind of learn which are the decisions that are high consequence that you should be involved in and which are the ones that you have to step back from. And you just have to say, look, I've got two hours of unblocked time this week where I can focus on something, so of the things on my priority list—the things that I've written in my journal that I want to get done this month—which of those things am I going to focus on, and which of the other things can I leave other people to get on with, and trust that things will work out all right?Emily: The second part was, do you have any examples of the things that you tend to not care about? 
And so, one of the things I try to do—it's been harder doing it now that I'm working at home—is to spend just 15 minutes at the end of the week thinking of, what are the things that I'm really proud of? What are some good achievements that I should feel really good about going into next week? And so I think a lot of the activities that stem from bullet journaling have been really helpful. Yeah, it feels like a bit of a cop-out because it's not specifically technology related, but bullet journaling is something which has made a big difference to me.Emily: Not at all. That's totally fair. I think you are the first person who's had a completely non-technological answer, but I think I've had someone answer Slack, something along those lines.Paul: Yeah, I think what's interesting is there are loads of those tools that we use all the time. Like Google Docs is something I can't live without. So, I think there's a ton of things that I use day-to-day that are hard to let go of, but I think the things that have made the most impact are those that help me deal with a stressful job and give me the ability to manage myself a little bit. Yeah, it's been one of the most interesting things I've done.Emily: And where could listeners connect with you or follow you?Paul: Cool. So, I am @pingles on Twitter. My DMs are open, so if anybody wants to talk on that, I'm happy to. I'm also on GitHub under pingles, as well. So, @pingles, generally in most places will get you to me.Emily: Well, thank you so much for joining me.Paul: Thank you for talking. It's been good fun.Announcer: Thank you for listening to The Business of Cloud Native podcast. Keep up with the latest on the podcast at thebusinessofcloudnative.com and subscribe on iTunes, Spotify, Google Podcasts, or wherever fine podcasts are distributed. We'll see you next time.
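Earlier in the conversation, Paul contrasts Kubernetes' end-user abstractions (a Deployment running replicas of a container, exposed as a Service, then routed to via Ingress) with assembling AWS building blocks directly. Here is a minimal sketch of that chain using the official Kubernetes Python client; the app name, image, and ports are hypothetical, and the Ingress step is omitted for brevity.

```python
# Sketch of the abstractions Paul describes: a Deployment running three
# replicas of a container, exposed inside the cluster as a Service (an
# Ingress would then route external traffic to the Service). Names, image,
# and ports are hypothetical. Requires the `kubernetes` package.
from kubernetes import client, config

config.load_kube_config()  # assumes a working kubeconfig

labels = {"app": "energy-web"}

deployment = client.V1Deployment(
    api_version="apps/v1",
    kind="Deployment",
    metadata=client.V1ObjectMeta(name="energy-web"),
    spec=client.V1DeploymentSpec(
        replicas=3,
        selector=client.V1LabelSelector(match_labels=labels),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels=labels),
            spec=client.V1PodSpec(containers=[
                client.V1Container(
                    name="web",
                    image="example/energy-web:1.0",
                    ports=[client.V1ContainerPort(container_port=8080)],
                )
            ]),
        ),
    ),
)

service = client.V1Service(
    api_version="v1",
    kind="Service",
    metadata=client.V1ObjectMeta(name="energy-web"),
    spec=client.V1ServiceSpec(
        selector=labels,
        ports=[client.V1ServicePort(port=80, target_port=8080)],
    ),
)

client.AppsV1Api().create_namespaced_deployment(namespace="default", body=deployment)
client.CoreV1Api().create_namespaced_service(namespace="default", body=service)
```

Note that nothing here mentions load balancers, security groups, or route tables; that is the division of labor Paul describes, where a platform team supplies those pieces underneath.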
This is episode #3 of the Podcast de AWS en Español. In this episode, we talk about the different ways of using AWS. We cover topics for cloud beginners and then more advanced topics such as virtualization, containers, and serverless. 00:00 - Episode introduction 01:15 - AWS CodeArtifact 04:23 - Introduction to today's topic 05:11 - Amazon Lightsail 08:45 - AWS Elastic Beanstalk 13:30 - AWS EC2 - Virtualization on AWS 20:30 - Containers on AWS 24:00 - AWS Fargate 25:00 - Managed vs. unmanaged services 29:25 - VPCs - logical networks on AWS 34:25 - AWS Lambda - Functions in the cloud 40:00 - How to choose what to use? 43:45 - Tools and guides for getting started with AWS 45:00 - Infrastructure as code 48:00 - Certifications 48:30 - In the next episode...
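Since the episode walks from virtualization all the way down to functions (34:25), a minimal example of what a "function in the cloud" looks like may help; this handler is a hypothetical illustration, not code from the show.

```python
# A minimal AWS Lambda handler in Python, illustrating the
# "functions in the cloud" model discussed at 34:25.
# The event shape here is a hypothetical example.
def handler(event, context):
    # Lambda passes the triggering event as a dict; we read an optional
    # "name" field and return an HTTP-style response.
    name = event.get("name", "mundo")
    return {"statusCode": 200, "body": f"Hola, {name}!"}
```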
Getting the Cumulus Linux lab up and running in GNS3: VPCs with Virtual PC, bonding, Spanning Tree, and VLANs. Explained in audio, video, and text. The post Cumulus Linux – Primeros pasos con la maqueta was published first on Eduardo Collado.
As more customers adopt Amazon VPC architectures, the features and flexibility of the service are encountering the obstacles of evolving design requirements. In this session, we follow the evolution of a single regional VPC to a multi-VPC, multi-region design with diverse connectivity into on-premises systems and infrastructure. Along the way, we investigate creative customer solutions for scaling and securing outbound VPC traffic, securing private access to Amazon Simple Storage Service (Amazon S3), managing multi-tenant VPCs, integrating existing customer networks through AWS Direct Connect, and building a full VPC mesh network across global regions.
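The "securing private access to Amazon S3" pattern mentioned here typically means a gateway VPC endpoint, so S3 traffic stays on AWS's network instead of traversing an internet gateway. A hedged boto3 sketch, where all IDs and the region are illustrative placeholders:

```python
# Hedged boto3 sketch: add a gateway VPC endpoint so instances reach S3
# privately via their route table. IDs and region are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

ec2.create_vpc_endpoint(
    VpcEndpointType="Gateway",
    VpcId="vpc-0123456789abcdef0",             # placeholder
    ServiceName="com.amazonaws.us-east-1.s3",  # S3 in the VPC's region
    RouteTableIds=["rtb-0123456789abcdef0"],   # placeholder
)
```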
In this advanced session, we review common architectural patterns for designing networks with many VPCs. Segmentation, security, scalability, cross-region connectivity, and flexibility become more important as you scale on AWS. We review designs that include AWS Transit Gateway, AWS Direct Connect, VPN, AWS PrivateLink, VPC peering, and more.
In this session, we walk through the fundamentals of Amazon VPC. First, we cover build-out and design fundamentals for VPCs, including picking your IP space, subnetting, routing, security, NAT, and much more. We then transition to different approaches and use cases for connecting your VPC to your physical data center with VPN or AWS Direct Connect. This mid-level architecture discussion is aimed at architects, network administrators, and technology decision-makers interested in understanding the building blocks that AWS makes available with Amazon VPC. Learn how you can connect VPCs with your offices and current data center footprint.
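As a concrete companion to the "picking your IP space" step this session covers, here is a small standard-library illustration: carving a VPC /16 into per-AZ /20 subnets. The region and addresses are examples only.

```python
# Carve a VPC /16 into /20 subnets, one per availability zone, using
# Python's ipaddress module. Numbers are purely illustrative.
import ipaddress

vpc_cidr = ipaddress.ip_network("10.0.0.0/16")
subnets = list(vpc_cidr.subnets(new_prefix=20))  # 16 possible /20s

for az, subnet in zip(["a", "b", "c"], subnets):
    # AWS reserves 5 addresses in every subnet (network, VPC router, DNS,
    # future use, broadcast), hence the subtraction.
    usable = subnet.num_addresses - 5
    print(f"us-east-1{az}: {subnet} ({usable} usable addresses)")
```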
Everything is moving to the cloud, which makes it increasingly important to secure your cloud infrastructure and minimize the threat of potential attackers. With a virtual private cloud (VPC)—your own private network in the cloud that you can launch your own instances into—this can be done with VPC Peering, connecting VPCs together to create a path between them to keep your data safe and accessible to you alone. Although typically performed in a single cloud provider, it is possible to do in more than one—think of it as your cloud router. Daniel LaMotte (Site Reliability Engineer, Confluent) walks through the details of cloud networking and VPC peering: what it is, what it does, and how to launch a VPC in the cloud, plus the difference between AWS PrivateLink and AWS Transit Gateway, CIDR, and its accessibility across cloud providers. EPISODE LINKS: VPC Peering in Confluent Cloud; Join the Confluent Community Slack; Fully managed Apache Kafka as a service! Try free.
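One practical constraint behind VPC peering is that the two VPCs' CIDR blocks must not overlap. A quick pre-flight check with Python's standard library (the ranges are examples only):

```python
# VPC peering requires non-overlapping CIDR blocks, so check before
# requesting a peering connection. Ranges below are examples.
import ipaddress

def can_peer(cidr_a: str, cidr_b: str) -> bool:
    """Return True if the two CIDR blocks are disjoint and thus peerable."""
    return not ipaddress.ip_network(cidr_a).overlaps(ipaddress.ip_network(cidr_b))

print(can_peer("10.0.0.0/16", "10.0.0.0/16"))  # False: identical ranges
print(can_peer("10.0.0.0/16", "10.1.0.0/16"))  # True: disjoint ranges
```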
VPCs. Vnets. DirectConnect. Kubernetes. Calico. Public clouds. Hybrid clouds. Networking is no small feat when it comes to the cloud. How does an organization keep their cloud networks from turning into a flying spaghetti monster? Day Two Cloud tackles this critical question with guest Andrew Wertkin, Chief Strategy Officer at BlueCat Networks. We discuss design tips, the critical role of DNS, monitoring and troubleshooting options, and more. The post Day Two Cloud 020: Design Tips For Cloud Networking Success appeared first on Packet Pushers.
David Bombal talks to Jeremy Grossmann (creator of GNS3) about the future of GNS3. Here we discuss Dynamips and VPCS and their future in GNS3. Will they be removed from GNS3? Are they recommended? What do they actually do? What should be used instead of them? Does Dynamips support switching? In future videos we will discuss additional options in GNS3 such as Cisco VIRL and IOU. Menu: 0:12 - Devices in GNS3. It can be confusing. What is Dynamips? 0:57 - Does GNS3 support switching? 1:17 - Are they real IOS images? 1:47 - Issue 1: Where do I get Cisco images? Cisco restrictions. 2:07 - Issue 2: Only older versions of Cisco IOS are supported on a lot of platforms 2:11 - Is it stable? Issue 3: More memory- and processor-intensive 2:25 - What is the Idle PC value? 4:23 - Advantage 1: Supports serial interfaces 4:50 - Dynamips is a dying product 5:00 - You can run Dynamips locally 5:40 - What does Jeremy recommend we use? 5:50 - Switching in Dynamips? 7:18 - Will Dynamips be removed from GNS3? 7:48 - What is VPCS? 8:28 - What is the advantage of VPCS? 8:55 - Should we be using VPCS? 9:58 - Will VPCS be removed from GNS3? David's details: YouTube: https://www.youtube.com/davidbombal Twitter: https://www.twitter.com/davidbombal Instagram: https://www.instagram.com/davidbombal LinkedIn: https://www.linkedin.com/in/davidbombal Facebook: https://www.facebook.com/davidbombal.co Website: http://www.davidbombal.com #gns3 #dynamips #virl
Simon and Nicki share a bumper-crop of interesting, useful and cool new services and features for AWS customers! Chapter Timings 00:01:17 Storage 00:03:15 Compute 00:07:13 Network 00:10:27 Databases 00:16:04 Migration 00:17:43 Developer Tools 00:22:47 Analytics 00:27:07 IoT 00:28:14 End User Computing 00:29:25 Machine Learning 00:30:49 Application Integration 00:34:18 Management and Governance 00:41:42 Customer Engagement 00:42:47 Media 00:44:03 Security 00:46:26 Gaming 00:47:54 AWS Marketplace 00:49:07 Robotics Shownotes Topic || Storage Optimize Cost with Amazon EFS Infrequent Access Lifecycle Management | https://aws.amazon.com/about-aws/whats-new/2019/07/optimize-cost-amazon-efs-infrequent-access-lifecycle-management/ Amazon FSx for Windows File Server Now Enables You to Use File Systems Directly With Your Organization’s Self-Managed Active Directory | https://aws.amazon.com/about-aws/whats-new/2019/06/amazon-fsx-for-windows-file-server-now-enables-you-to-use-file-systems-directly-with-your-organizations-self-managed-active-directory/ Amazon FSx for Windows File Server now enables you to use a single AWS Managed AD with file systems across VPCs or accounts | https://aws.amazon.com/about-aws/whats-new/2019/06/amazon-fsx-for-windows-file-server-now-enables-you-to-use-a-single-aws-managed-ad-with-file-systems-across-vpcs-or-accounts/ AWS Storage Gateway now supports Amazon VPC endpoints with AWS PrivateLink | https://aws.amazon.com/about-aws/whats-new/2019/06/aws-storage-gateway-now-supports-amazon-vpc-endpoints-aws-privatelink/ File Gateway adds encryption & signing options for SMB clients – Amazon Web Services | https://aws.amazon.com/about-aws/whats-new/2019/06/file-gateway-adds-options-to-enforce-encryption-and-signing-for-smb-shares/ New AWS Public Datasets Available from Facebook, Yale, Allen Institute for Brain Science, NOAA, and others | https://aws.amazon.com/about-aws/whats-new/2019/07/new-aws-public-datasets-available-from-facebook-yale-allen/ Topic || Compute Introducing Amazon EC2 Instance Connect | https://aws.amazon.com/about-aws/whats-new/2019/06/introducing-amazon-ec2-instance-connect/ Introducing New Instances Sizes for Amazon EC2 M5 and R5 Instances | https://aws.amazon.com/about-aws/whats-new/2019/06/introducing-new-instances-sizes-for-amazon-ec2-m5-and-r5-instances/ Introducing New Instance Sizes for Amazon EC2 C5 Instances | https://aws.amazon.com/about-aws/whats-new/2019/06/introducing-new-instance-sizes-for-amazon-ec2-c5-instances/ Amazon ECS now supports additional resource-level permissions and tag-based access controls | https://aws.amazon.com/about-aws/whats-new/2019/06/amazon-ecs-now-supports-resource-level-permissions-and-tag-based-access-controls/ Amazon ECS now offers improved capabilities for local testing | https://aws.amazon.com/about-aws/whats-new/2019/07/amazon-ecs-now-offers-improved-capabilities-for-local-testing/ AWS Container Services launches AWS For Fluent Bit | https://aws.amazon.com/about-aws/whats-new/2019/07/aws-container-services-launches-aws-for-fluent-bit/ Amazon EKS now supports Kubernetes version 1.13, ECR PrivateLink, and Kubernetes Pod Security Policies | https://aws.amazon.com/about-aws/whats-new/2019/06/amazon-eks-now-supports-kubernetes113-ecr-privatelink-kubernetes-pod-security/ AWS VPC CNI Version 1.5.0 Now Default for Amazon EKS Clusters | https://aws.amazon.com/about-aws/whats-new/2019/07/aws-vpc-cni-version-150-now-default-for-amazon-eks-clusters/ Announcing Enhanced Lambda@Edge Monitoring within the Amazon CloudFront Console | 
https://aws.amazon.com/about-aws/whats-new/2019/06/announcing-enhanced-lambda-edge-monitoring-amazon-cloudfront-console/ AWS Lambda Console shows recent invocations using CloudWatch Logs Insights | https://aws.amazon.com/about-aws/whats-new/2019/06/aws-lambda-console-recent-invocations-using-cloudwatch-logs-insights/ AWS Thinkbox Deadline with Resource Tracker | https://aws.amazon.com/about-aws/whats-new/2019/06/thinkbox-deadline-resource-tracker/ Topic || Network Network Load Balancer Now Supports UDP Protocol | https://aws.amazon.com/about-aws/whats-new/2019/06/network-load-balancer-now-supports-udp-protocol/ Announcing Amazon VPC Traffic Mirroring for Amazon EC2 Instances | https://aws.amazon.com/about-aws/whats-new/2019/06/announcing-amazon-vpc-traffic-mirroring-for-amazon-ec2-instances/ AWS ParallelCluster now supports Elastic Fabric Adapter (EFA) | https://aws.amazon.com/about-aws/whats-new/2019/06/aws-parallelcluster-supports-elastic-fabric-adapter/ AWS Direct Connect launches first location in Italy | https://aws.amazon.com/about-aws/whats-new/2019/06/aws_direct_connect_locations_in_italy/ Amazon CloudFront announces seven new Edge locations in North America, Europe, and Australia | https://aws.amazon.com/about-aws/whats-new/2019/06/cloudfront-seven-edge-locations-june2019/ Now Add Endpoint Policies to Interface Endpoints for AWS Services | https://aws.amazon.com/about-aws/whats-new/2019/06/now-add-endpoint-policies-to-interface-endpoints-for-aws-services/ Topic || Databases Amazon Aurora with PostgreSQL Compatibility Supports Serverless | https://aws.amazon.com/about-aws/whats-new/2019/07/amazon-aurora-with-postgresql-compatibility-supports-serverless/ Amazon RDS now supports Storage Auto Scaling | https://aws.amazon.com/about-aws/whats-new/2019/06/rds-storage-auto-scaling/ Amazon RDS Introduces Compatibility Checks for Upgrades from MySQL 5.7 to MySQL 8.0 | https://aws.amazon.com/about-aws/whats-new/2019/07/amazon_rds_introduces_compatibility_checks/ Amazon RDS for PostgreSQL Supports New Minor Versions 11.4, 10.9, 9.6.14, 9.5.18, and 9.4.23 | https://aws.amazon.com/about-aws/whats-new/2019/07/amazon-rds-postgresql-supports-minor-version-114/ Amazon Aurora with PostgreSQL Compatibility Supports Cluster Cache Management | https://aws.amazon.com/about-aws/whats-new/2019/06/amazon-aurora-with-postgresql-compatibility-supports-cluster-cache-management/ Amazon Aurora with PostgreSQL Compatibility Supports Data Import from Amazon S3 | https://aws.amazon.com/about-aws/whats-new/2019/06/amazon-aurora-with-postgresql-compatibility-supports-data-import-from-amazon-s3/ Amazon Aurora Supports Cloning Across AWS Accounts | https://aws.amazon.com/about-aws/whats-new/2019/07/amazon_aurora_supportscloningacrossawsaccounts-/ Amazon RDS for Oracle now supports z1d instance types | https://aws.amazon.com/about-aws/whats-new/2019/06/amazon-rds-for-oracle-now-supports-z1d-instance-types/ Amazon RDS for Oracle Supports Oracle Application Express (APEX) Version 19.1 | https://aws.amazon.com/about-aws/whats-new/2019/06/amazon-rds-oracle-supports-oracle-application-express-version-191/ Amazon ElastiCache launches reader endpoints for Redis | https://aws.amazon.com/about-aws/whats-new/2019/06/amazon-elasticache-launches-reader-endpoint-for-redis/ Amazon DocumentDB (with MongoDB compatibility) Now Supports Stopping and Starting Clusters | https://aws.amazon.com/about-aws/whats-new/2019/07/amazon-documentdb-supports-stopping-starting-cluters/ Amazon DocumentDB (with MongoDB compatibility) Now Provides 
Cluster Deletion Protection | https://aws.amazon.com/about-aws/whats-new/2019/07/amazon-documentdb-provides-cluster-deletion-protection/ You can now publish Amazon Neptune Audit Logs to Cloudwatch | https://aws.amazon.com/about-aws/whats-new/2019/06/you-can-now-publish-amazon-neptune-audit-logs-to-cloudwatch/ Amazon DynamoDB now supports deleting a global secondary index before it finishes building | https://aws.amazon.com/about-aws/whats-new/2019/07/amazon-dynamodb-now-supports-deleting-a-global-secondary-index-before-it-finishes-building/ Amazon DynamoDB now supports up to 25 unique items and 4 MB of data per transactional request | https://aws.amazon.com/about-aws/whats-new/2019/06/amazon-dynamodb-now-supports-up-to-25-unique-items-and-4-mb-of-data-per-transactional-request/ Topic || Migration CloudEndure Migration is now available at no charge | https://aws.amazon.com/about-aws/whats-new/2019/06/cloudendure-migration-available-at-no-charge/ New AWS ISV Workload Migration Program | https://aws.amazon.com/about-aws/whats-new/2019/06/isv-workload-migration/ AWS Migration Hub Adds Support for Service-Linked Roles | https://aws.amazon.com/about-aws/whats-new/2019/06/aws_migration_hub_adds_support_for_service_linked_roles/ Topic || Developer Tools The AWS Toolkit for Visual Studio Code is Now Generally Available | https://aws.amazon.com/about-aws/whats-new/2019/07/announcing-aws-toolkit-for-visual-studio-code/ The AWS Cloud Development Kit (AWS CDK) is Now Generally Available | https://aws.amazon.com/about-aws/whats-new/2019/07/the-aws-cloud-development-kit-aws-cdk-is-now-generally-available1/ AWS CodeCommit Supports Two Additional Merge Strategies and Merge Conflict Resolution | https://aws.amazon.com/about-aws/whats-new/2019/06/aws-codecommit-supports-2-additional-merge-strategies-and-merge-conflict-resolution/ AWS CodeCommit Now Supports Resource Tagging | https://aws.amazon.com/about-aws/whats-new/2019/07/aws-codecommit-now-supports-resource-tagging/ AWS CodeBuild adds Support for Polyglot Builds | https://aws.amazon.com/about-aws/whats-new/2019/07/aws-codebuild-adds-support-for-polyglot-builds/ AWS Amplify Console Updates Build image with SAM CLI and Custom Container Support | https://aws.amazon.com/about-aws/whats-new/2019/07/aws-amplify-console-updates-build-image-sam-cli-and-custom-container-support/ AWS Amplify Console announces Manual Deploys for Static Web Hosting | https://aws.amazon.com/about-aws/whats-new/2019/07/aws-amplify-console-announces-manual-deploys-for-static-web-hosting/ Amplify Framework now Supports Adding AWS Lambda Triggers for events in Auth and Storage categories | https://aws.amazon.com/about-aws/whats-new/2019/07/amplify-framework-now-supports-adding-aws-lambda-triggers-for-events-auth-storage-categories/ AWS Amplify Console now supports AWS CloudFormation | https://aws.amazon.com/about-aws/whats-new/2019/06/aws-amplify-console-supports-aws-cloudformation/ AWS CloudFormation updates for Amazon EC2, Amazon ECS, Amazon EFS, Amazon S3 and more | https://aws.amazon.com/about-aws/whats-new/2019/06/aws-cloudformation-updates-amazon-ec2-ecs-efs-s3-and-more/ Topic || Analytics Amazon QuickSight launches multi-sheet dashboards, new visual types and more | https://aws.amazon.com/about-aws/whats-new/2019/06/amazon-quickSight-launches-multi-sheet-dashboards-new-visual-types-and-more/ Amazon QuickSight now supports fine-grained access control over Amazon S3 and Amazon Athena! 
| https://aws.amazon.com/about-aws/whats-new/2019/06/amazon-quickSight-now-supports-fine-grained-access-control-over-amazon-S3-and-amazon-athena/ Announcing EMR Release 5.24.0: With performance improvements in Spark, new versions of Flink, Presto, and Hue, and enhanced CloudFormation support for EMR Instance Fleets | https://aws.amazon.com/about-aws/whats-new/2019/06/announcing-emr-release-5240-with-performance-improvements-in-spark-new-versions-of-flink-presto-Hue-and-cloudformation-support-for-launching-clusters-in-multiple-subnets-through-emr-instance-fleets/ AWS Glue now provides workflows to orchestrate your ETL workloads | https://aws.amazon.com/about-aws/whats-new/2019/06/aws-glue-now-provides-workflows-to-orchestrate-etl-workloads/ Amazon Elasticsearch Service increases data protection with automated hourly snapshots at no extra charge | https://aws.amazon.com/about-aws/whats-new/2019/07/amazon-elasticsearch-service-increases-data-protection-with-automated-hourly-snapshots-at-no-extra-charge/ Amazon MSK is Now Integrated with AWS CloudFormation and Terraform | https://aws.amazon.com/about-aws/whats-new/2019/06/amazon_msk_is_now_integrated_with_aws_cloudformation_and_terraform/ Kinesis Video Streams adds support for Dynamic Adaptive Streaming over HTTP (DASH) and H.265 video | https://aws.amazon.com/about-aws/whats-new/2019/07/kinesis-video-streams-adds-support-for-dynamic-adaptive-streaming-over-http-dash-and-h-2-6-5-video/ Announcing the availability of Amazon Kinesis Video Producer SDK in C | https://aws.amazon.com/about-aws/whats-new/2019/07/announcing-availability-of-amazon-kinesis-video-producer-sdk-in-c/ Topic || IoT AWS IoT Expands Globally | https://aws.amazon.com/about-aws/whats-new/2019/07/aws-iot-expands-globally/ Bluetooth Low Energy Support and New MQTT Library Now Generally Available in Amazon FreeRTOS 201906.00 Major | https://aws.amazon.com/about-aws/whats-new/2019/06/bluetooth-low-energy-support-amazon-freertos-now-available/ AWS IoT Greengrass 1.9.2 With Support for OpenWrt and AWS IoT Device Tester is Now Available | https://aws.amazon.com/about-aws/whats-new/2019/06/aws-iot-greengrass-support-openwrt-aws-iot-device-tester-available/ Topic || End User Computing Amazon Chime Achieves HIPAA Eligibility | https://aws.amazon.com/about-aws/whats-new/2019/06/chime_hipaa_eligibility/ Amazon WorkSpaces now supports copying Images across AWS Regions | https://aws.amazon.com/about-aws/whats-new/2019/06/amazon_workspaces_now_supports_copying_images_across_aws_regions/ Amazon AppStream 2.0 adds support for Windows Server 2016 and Windows Server 2019 | https://aws.amazon.com/about-aws/whats-new/2019/06/amazon-appstream-20-adds-support-for-windows-server-2016-and-windows-server-2019/ AWS Client VPN now includes support for AWS CloudFormation | https://aws.amazon.com/about-aws/whats-new/2019/06/aws-client-vpn-includes-support-for-aws-cloudformation/ Topic || Machine Learning Amazon Comprehend Medical is now Available in Sydney, London, and Canada | https://aws.amazon.com/about-aws/whats-new/2019/06/comprehend-medical-available-in-asia-pacific-eu-canada/ Amazon Personalize Now Generally Available | https://aws.amazon.com/about-aws/whats-new/2019/06/amazon-personalize-now-generally-available/ New in AWS Deep Learning Containers: Support for Amazon SageMaker and MXNet 1.4.1 with CUDA 10.0 | https://aws.amazon.com/about-aws/whats-new/2019/06/new-in-aws-deep-learning-containers-support-for-amazon-sagemaker-libraries-and-mxnet-1-4-1-with-cuda-10-0/ Topic || Application Integration 
Introducing Amazon EventBridge | https://aws.amazon.com/about-aws/whats-new/2019/07/introducing-amazon-eventbridge/ AWS App Mesh Service Discovery with AWS Cloud Map generally available. | https://aws.amazon.com/about-aws/whats-new/2019/06/aws-app-mesh-service-discovery-with-aws-cloud-map-generally-available/ Amazon API Gateway Now Supports Tag-Based Access Control and Tags on WebSocket APIs | https://aws.amazon.com/about-aws/whats-new/2019/07/amazon-api-gateway-supports-tag-based-access-control-tags-on-websocket/ Amazon API Gateway Adds Configurable Transport Layer Security Version for Custom Domains | https://aws.amazon.com/about-aws/whats-new/2019/06/amazon-api-gateway-adds-configurable-transport-layer-security-version-custom-domains/ Topic || Management and Governance Introducing AWS Systems Manager OpsCenter to enable faster issue resolution | https://aws.amazon.com/about-aws/whats-new/2019/06/introducing-aws-systems-manager-opscenter-to-enable-faster-issue-resolution/ Introducing Service Quotas: View and manage your quotas for AWS services from one central location | https://aws.amazon.com/about-aws/whats-new/2019/06/introducing-service-quotas-view-and-manage-quotas-for-aws-services-from-one-location/ Introducing AWS Budgets Reports | https://aws.amazon.com/about-aws/whats-new/2019/07/introducing-aws-budgets-reports/ Introducing Amazon CloudWatch Anomaly Detection – Now in Preview | https://aws.amazon.com/about-aws/whats-new/2019/07/introducing-amazon-cloudwatch-anomaly-detection-now-in-preview/ Amazon CloudWatch Launches Dynamic Labels on Dashboards | https://aws.amazon.com/about-aws/whats-new/2019/06/amazon-cloudwatch-launches-dynamic-labels-on-dashboards/ Amazon CloudWatch Adds Visibility for your .NET and SQL Server Application Health | https://aws.amazon.com/about-aws/whats-new/2019/06/amazon-cloudwatch-adds-visibility-for-your-net-sql-server-application-health/ Amazon CloudWatch Events Now Supports Amazon CloudWatch Logs as a Target and Tagging of CloudWatch Events Rules | https://aws.amazon.com/about-aws/whats-new/2019/06/amazon-cloudwatch-events-now-supports-amazon-cloudwatch-logs-target-tagging-cloudwatch-events-rules/ Introducing Amazon CloudWatch Container Insights for Amazon ECS and AWS Fargate - Now in Preview | https://aws.amazon.com/about-aws/whats-new/2019/07/introducing-container-insights-for-ecs-and-aws-fargate-in-preview/ AWS Config now enables you to provision AWS Config rules across all AWS accounts in your organization | https://aws.amazon.com/about-aws/whats-new/2019/07/aws-config-now-enables-you-to-provision-config-rules-across-all-aws-accounts-in-your-organization/ Session Manager launches Run As to start interactive sessions with your own operating system user account | https://aws.amazon.com/about-aws/whats-new/2019/07/session-manager-launches-run-as-to-start-interactive-sessions-with-your-own-operating-system-user-account/ Session Manager launches tunneling support for SSH and SCP | https://aws.amazon.com/about-aws/whats-new/2019/07/session-manager-launches-tunneling-support-for-ssh-and-scp/ Use IAM access advisor with AWS Organizations to set permission guardrails confidently | https://aws.amazon.com/about-aws/whats-new/2019/06/now-use-iam-access-advisor-with-aws-organizations-to-set-permission-guardrails-confidently/ AWS Resource Groups is Now SOC Compliant | https://aws.amazon.com/about-aws/whats-new/2019/07/aws-resource-groups-is-now-soc-compliant/ Topic || Customer Engagement Introducing AI Powered Speech Analytics for Amazon Connect | 
https://aws.amazon.com/about-aws/whats-new/2019/06/introducing-ai-powered-speech-analytics-for-amazon-connect/ Amazon Connect Launches Contact Flow Versioning | https://aws.amazon.com/about-aws/whats-new/2019/06/amazon-connect-launches-contact-flow-versioning/ Topic || Media AWS Elemental MediaConnect Now Supports SPEKE for Conditional Access | https://aws.amazon.com/about-aws/whats-new/2019/06/aws-elemental-mediaconnect-now-supports-speke-for-conditional-access/ AWS Elemental MediaLive Now Supports AWS CloudFormation | https://aws.amazon.com/about-aws/whats-new/2019/07/aws-elemental-medialive-now-supports-aws-cloudformation/ AWS Elemental MediaConvert Now Ingests Files from HTTPS Sources | https://aws.amazon.com/about-aws/whats-new/2019/07/aws-elemental-mediaconvert-now-ingests-files-from-https-sources/ Topic || Security AWS Certificate Manager Private Certificate Authority now supports root CA hierarchies | https://aws.amazon.com/about-aws/whats-new/2019/06/aws-certificate-manager-private-certificate-authority-now-supports-root-CA-heirarchies/ AWS Control Tower is now generally available | https://aws.amazon.com/about-aws/whats-new/2019/06/aws-control-tower-is-now-generally-available/ AWS Security Hub is now generally available | https://aws.amazon.com/about-aws/whats-new/2019/06/aws-security-hub-now-generally-available/ AWS Single Sign-On now makes it easy to access more business applications including Asana and Jamf | https://aws.amazon.com/about-aws/whats-new/2019/06/aws-single-sign-on-access-business-applications-including-asana-and-jamf/ Topic || Gaming Large Match Support for Amazon GameLift Now Available | https://aws.amazon.com/about-aws/whats-new/2019/07/large-match-support-for-amazon-gameLift-now-available/ New Dynamic Vegetation System in Lumberyard Beta 1.19 – Available Now | https://aws.amazon.com/about-aws/whats-new/2019/06/lumberyard-beta-119-available-now/ Topic || AWS Marketplace AWS Marketplace now integrates with your procurement systems | https://aws.amazon.com/about-aws/whats-new/2019/06/aws-marketplace-now-integrates-with-your-procurement-systems/ Topic || Robotics AWS RoboMaker announces support for Robot Operating System (ROS) Melodic | https://aws.amazon.com/about-aws/whats-new/2019/07/aws-robomaker-support-robot-operating-system-melodic/
Podcast Outline – Season 1 – Episode 13 – Podio Deep Dive 2a: "Building SaaS that businesses can rely on"
Discussion Outline:
Introduction – Talk about Topic: "Cloud computing concepts for business software design"
PaaS, SaaS, IaaS… what does "as a service" even mean, and when should it matter?
Where are the services hosted? How are they secured? How reliable are they?
API, SDK, Service (REST?)
VMs, VPCs, Containers, Docker, Kubernetes, Lambda (runtimes)
Public internet concepts important for cloud security: Domains, DNS; Hosting; Certificates
Up Next Episode: Continuation into Part 2.
Audience Engagement: Solving Podio Gaps – Listener Forum
Outro: SUBSCRIBE and thank you.
Support the show (http://www.brickbridgeconsulting.com/podcast)
AWS Transit Gateways, an evolution of Transit VPCs, centralize VPN connectivity to multiple VPCs, allowing for greater scale and simpler connectivity and management. Today's Heavy Networking drills into this topic with guest Nick Matthews, an AWS solutions architect. We also examine Global Accelerator and TLS termination on Network Load Balancer. The post Heavy Networking 433: An Insider’s Guide To AWS Transit Gateways appeared first on Packet Pushers.
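TLS termination on Network Load Balancer, one of the features the episode examines, comes down to a few API calls. Here is a minimal boto3 sketch; the load balancer name, subnet, VPC, and certificate ARN are placeholder values, not anything from the episode:

```python
import boto3

elbv2 = boto3.client("elbv2")

# Create an internet-facing Network Load Balancer (subnet ID is a placeholder).
nlb = elbv2.create_load_balancer(
    Name="demo-nlb",
    Type="network",
    Scheme="internet-facing",
    Subnets=["subnet-0123456789abcdef0"],
)["LoadBalancers"][0]

# TCP target group for the backends (VPC ID is a placeholder).
tg = elbv2.create_target_group(
    Name="demo-targets",
    Protocol="TCP",
    Port=443,
    VpcId="vpc-0123456789abcdef0",
)["TargetGroups"][0]

# TLS listener: the NLB terminates TLS with an ACM certificate and
# forwards the decrypted traffic to the TCP target group.
elbv2.create_listener(
    LoadBalancerArn=nlb["LoadBalancerArn"],
    Protocol="TLS",
    Port=443,
    Certificates=[{"CertificateArn": "arn:aws:acm:us-east-1:123456789012:certificate/example"}],
    DefaultActions=[{"Type": "forward", "TargetGroupArn": tg["TargetGroupArn"]}],
)
```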
In this session, we walk through the fundamentals of Amazon VPC. First, we cover build-out and design fundamentals for VPCs, including picking your IP space, subnetting, routing, security, NAT, and much more. We then transition to different approaches and use cases for optionally connecting your VPC to your physical data center with VPN or AWS Direct Connect. This mid-level architecture discussion is aimed at architects, network administrators, and technology decision makers interested in understanding the building blocks that AWS makes available with Amazon VPC. Learn how you can connect VPCs with your offices and current data center footprint.
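For readers who want to see those building blocks concretely, here is a minimal boto3 sketch of the build-out steps the session covers: pick an IP space, carve out a subnet, and wire up routing. The CIDR ranges and availability zone are arbitrary examples:

```python
import boto3

ec2 = boto3.client("ec2")

# Pick your IP space: the VPC-wide CIDR block.
vpc = ec2.create_vpc(CidrBlock="10.0.0.0/16")["Vpc"]

# Subnetting: one /24 in a single availability zone.
subnet = ec2.create_subnet(
    VpcId=vpc["VpcId"], CidrBlock="10.0.0.0/24", AvailabilityZone="us-east-1a"
)["Subnet"]

# Routing: an internet gateway plus a default route makes the subnet public.
igw = ec2.create_internet_gateway()["InternetGateway"]
ec2.attach_internet_gateway(InternetGatewayId=igw["InternetGatewayId"], VpcId=vpc["VpcId"])

rt = ec2.create_route_table(VpcId=vpc["VpcId"])["RouteTable"]
ec2.create_route(
    RouteTableId=rt["RouteTableId"],
    DestinationCidrBlock="0.0.0.0/0",
    GatewayId=igw["InternetGatewayId"],
)
ec2.associate_route_table(RouteTableId=rt["RouteTableId"], SubnetId=subnet["SubnetId"])
```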
Complex applications generally require a way to provision resources at scale to enable an organization to onboard customers in a frictionless way while remaining operationally efficient. In this session, we describe how you can architect a control plane built on AWS that is responsible for provisioning and maintaining infrastructure and application resources for multiple customers across a number of AWS accounts, VPCs, and AWS services.
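One common realization of such a control plane is a per-tenant infrastructure template driven by a single orchestration call. A minimal sketch, assuming a hypothetical tenant ID and template URL (neither is from the session):

```python
import boto3

def provision_tenant(tenant_id: str) -> str:
    """Create one CloudFormation stack per tenant; the template at the
    (hypothetical) URL would define that tenant's VPC and app resources."""
    cfn = boto3.client("cloudformation")
    resp = cfn.create_stack(
        StackName=f"tenant-{tenant_id}",
        TemplateURL="https://example-bucket.s3.amazonaws.com/tenant-stack.yaml",
        Parameters=[{"ParameterKey": "TenantId", "ParameterValue": tenant_id}],
        Capabilities=["CAPABILITY_NAMED_IAM"],  # template may create IAM roles
    )
    return resp["StackId"]

print(provision_tenant("acme-001"))
```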
In this session, we review how technology and consulting partners can utilize AWS PrivateLink, a networking service that allows a service behind a load balancer to be made privately available in other VPCs as well as in on-premises networks. You can use PrivateLink to help scale a SaaS service, simplify microservices and the network connectivity of managed service providers, and create a more secure environment for partner products inside customer VPCs. We focus on the design and service architecture requirements as well as the business considerations for implementing PrivateLink for your product or service. We also hear from APN Partner Snowflake and its customer, ARC, about how they deployed PrivateLink.
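On the provider side, publishing a service over PrivateLink amounts to registering a Network Load Balancer as an endpoint service and allow-listing consumer accounts. A minimal boto3 sketch with placeholder ARNs (this is a generic illustration, not Snowflake's actual setup):

```python
import boto3

ec2 = boto3.client("ec2")

# Expose a service that sits behind an existing NLB (placeholder ARN).
svc = ec2.create_vpc_endpoint_service_configuration(
    AcceptanceRequired=True,  # the provider approves each consumer endpoint
    NetworkLoadBalancerArns=[
        "arn:aws:elasticloadbalancing:us-east-1:123456789012:loadbalancer/net/demo-nlb/abc123"
    ],
)["ServiceConfiguration"]

# Allow a specific customer account to create endpoints to this service.
ec2.modify_vpc_endpoint_service_permissions(
    ServiceId=svc["ServiceId"],
    AddAllowedPrincipals=["arn:aws:iam::210987654321:root"],
)

print("Share this service name with consumers:", svc["ServiceName"])
```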
As more customers adopt Amazon VPC architectures, the features and flexibility of the service are encountering the obstacles of evolving design requirements. In this session, we follow the evolution of a single regional VPC to a multi-VPC, multi-region design with diverse connectivity into on-premises systems and infrastructure. Along the way, we investigate creative customer solutions for scaling and securing outbound VPC traffic, securing private access to Amazon S3, managing multi-tenant VPCs, integrating existing customer networks through AWS Direct Connect, and building a full VPC mesh network across global regions. Please join us for a speaker meet-and-greet following this session at the Speaker Lounge (ARIA East, Level 1, Willow Lounge). The meet-and-greet starts 15 minutes after the session and runs for half an hour.
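"Securing private access to Amazon S3," mentioned above, typically means a gateway VPC endpoint with a restrictive policy. A minimal sketch; the bucket name, VPC ID, and route table ID are placeholders:

```python
import json
import boto3

ec2 = boto3.client("ec2")

# Restrict the endpoint to a single (hypothetical) bucket.
policy = {
    "Statement": [{
        "Effect": "Allow",
        "Principal": "*",
        "Action": "s3:*",
        "Resource": [
            "arn:aws:s3:::example-private-bucket",
            "arn:aws:s3:::example-private-bucket/*",
        ],
    }]
}

# A gateway endpoint inserts routes for S3's prefix list into the given
# route tables, so S3 traffic never leaves the AWS network.
ec2.create_vpc_endpoint(
    VpcEndpointType="Gateway",
    VpcId="vpc-0123456789abcdef0",
    ServiceName="com.amazonaws.us-east-1.s3",
    RouteTableIds=["rtb-0123456789abcdef0"],
    PolicyDocument=json.dumps(policy),
)
```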
As customers put more workloads into AWS, the number of Virtual Private Clouds (VPCs) a customer needs to manage grows. Scaling out an AWS environment can create challenges in manageability, workload segmentation, and security. SD-WAN solutions offered by AWS Partners can enable organizations to scale up the number of VPCs as needed while segmenting and isolating workloads for easier management, application quality monitoring, and security. In this session, we walk through a customer example of how an SD-WAN implementation simplified the management of a multi-VPC footprint while also improving application performance to WAN-connected branch offices. Complete Title: AWS re:Invent 2018: [REPEAT 2] Use SD-WAN to Manage Your AWS Environment & Branch Office Connectivity (NET311-R2)
To deliver your applications to millions of users, you need to scale your network across thousands of VPCs. AWS Transit Gateway helps scale your workloads and vastly simplifies how you connect your AWS networks. AWS Transit Gateway also makes it easier to connect your on-premises networks to those VPCs. Using secure operational controls, you can implement and maintain centralized policies to connect Amazon VPCs with each other and with your on-premises networks. This session will help you get started quickly and gain insight into the capabilities that AWS Transit Gateway introduces.
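Getting started really is only a couple of API calls per VPC. A minimal boto3 sketch with placeholder IDs:

```python
import boto3

ec2 = boto3.client("ec2")

# One transit gateway acts as the regional hub.
tgw = ec2.create_transit_gateway(
    Description="regional hub",
    Options={"AmazonSideAsn": 64512, "DefaultRouteTableAssociation": "enable"},
)["TransitGateway"]

# Attach each spoke VPC (IDs are placeholders); repeat per VPC.
# In practice, wait until the transit gateway's state is "available"
# before creating attachments.
ec2.create_transit_gateway_vpc_attachment(
    TransitGatewayId=tgw["TransitGatewayId"],
    VpcId="vpc-0123456789abcdef0",
    SubnetIds=["subnet-0123456789abcdef0"],
)
```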
[NEW LAUNCH!] AWS Transit Gateway and Transit VPCs, Reference Architectures for Many VPCs (NET402) In this session, we review the new AWS Transit Gateway and other new networking features. We compare AWS Transit Gateway with Transit VPCs and discuss how to architect your accounts and VPCs. This session will be helpful if the developers have been let loose and you are planning lots of VPCs or accounts. How should you connect them? What limits do you need to be aware of? And how does routing work with many VPCs? We dive into the details of recent launches and how to work with concepts like Transit VPCs, account strategies, scaling services, using firewalls, and Direct Connect gateways to solve the problems of many VPCs.
As Moody's AWS presence continues to grow, automation becomes a critical tool that can facilitate the rapid onboarding of new applications, VPCs, and acquisitions while ensuring they are secured appropriately. Moody's has chosen Terraform as its tool of choice to define and deploy application and security infrastructure for a range of different use cases on AWS. In this session, we dive deep into a range of automation use cases, including AWS infrastructure creation, deployment of a shared-service environment, onboarding of new and existing lines of business, and integration with threat intelligence services such as Amazon GuardDuty. This session is brought to you by AWS Partner Palo Alto Networks.
Making decisions today for tomorrow's technology: from DNS to AWS Direct Connect, ELBs to ENIs, VPCs to VPNs, the Cloud Network Engineering team at Netflix is the resident subject-matter expert for a myriad of AWS resources. Learn how a cross-functional team automates and manages an infrastructure that serves over 125 million customers while evaluating new features that enable us to continue to grow through our next 100 million customers and beyond.
Simon shares a great list of new capabilities for customers! Chapters: 00:00- 00:08 Opening 00:09 - 10:50 Compute 10:51 - 25:50 Database and Storage 25:51 - 28:25 Network 28:26 - 35:01 Development 35:09 - 39:03 AI/ML 39:04 - 45:04 System Management and Operations 45:05 - 46:18 Identity 46:19 - 48:05 Video Streaming 48:06 - 49:14 Public Datasets 49:15 - 49:54 AWS Marketplace 49:55 - 51:03 YubiKey Support for MFA 51:04 - 51:18 Closing Shownotes: Amazon EC2 F1 Instance Expands to More Regions, Adds New Features, and Improves Development Tools | https://aws.amazon.com/about-aws/whats-new/2018/10/amazon-ec2-f1-instance-expands-to-more-regions-adds-new-features-and-improves-development-tools/ Amazon EC2 F1 instances now Available in an Additional Size | https://aws.amazon.com/about-aws/whats-new/2018/09/amazon-ec2-f1-instances-now-available-in-an-additional-size/ Amazon EC2 R5 and R5D instances now Available in 8 Additional AWS Regions | https://aws.amazon.com/about-aws/whats-new/2018/09/amazon-ec2-r5-and-r5d-instances-now-available-in-8-additional-aws-regions/ Introducing Amazon EC2 High Memory Instances with up to 12 TB of memory, Purpose-built to Run Large In-memory Databases, like SAP HANA | https://aws.amazon.com/about-aws/whats-new/2018/09/introducing-amazon-ec2-high-memory-instances-purpose-built-to-run-large-in-memory-databases/ Introducing a New Size for Amazon EC2 G3 Graphics Accelerated Instances | https://aws.amazon.com/about-aws/whats-new/2018/10/introducing-a-new-size-for-amazon-ec2-g3-graphics-accelerated-instances/ Amazon EC2 Spot Console Now Supports Scheduled Scaling for Application Auto Scaling | https://aws.amazon.com/about-aws/whats-new/2018/09/amazon-ec2-spot-console-now-supports-scheduled-scaling-for-application-auto-scaling/ Amazon Linux 2 Now Supports 32-bit Applications and Libraries | https://aws.amazon.com/about-aws/whats-new/2018/09/amazon-linux-2-now-supports-32-bit-applications-and-libraries/ AWS Server Migration Service Adds Support for Migrating Larger Data Volumes | https://aws.amazon.com/about-aws/whats-new/2018/09/aws-server-migration-service-adds-support-for-migrating-larger-data-volumes/ AWS Migration Hub Saves Time Migrating with Application Migration Status Automation | https://aws.amazon.com/about-aws/whats-new/2018/10/aws_migration_hub_saves_time_migrating_with_application_migration_status_automation/ Plan Your Migration with AWS Application Discovery Service Data Exploration | https://aws.amazon.com/about-aws/whats-new/2018/09/plan-your-migration-with-aws-application-discovery-service-data-exploration/ AWS Lambda enables functions that can run up to 15 minutes | https://aws.amazon.com/about-aws/whats-new/2018/10/aws-lambda-supports-functions-that-can-run-up-to-15-minutes/ AWS Lambda announces service level agreement | https://aws.amazon.com/about-aws/whats-new/2018/10/aws-lambda-introduces-service-level-agreement/ AWS Lambda Console Now Enables You to Manage and Monitor Serverless Applications | https://aws.amazon.com/about-aws/whats-new/2018/08/aws-lambda-console-enables-managing-and-monitoring/ Amazon EKS Enables Support for Kubernetes Dynamic Admission Controllers | https://aws.amazon.com/about-aws/whats-new/2018/10/amazon-eks-enables-support-for-kubernetes-dynamic-admission-cont/ Amazon EKS Simplifies Cluster Setup with update-kubeconfig CLI Command | https://aws.amazon.com/about-aws/whats-new/2018/09/amazon-eks-simplifies-cluster-setup-with-update-kubeconfig-cli-command/ Amazon Aurora Parallel Query is Generally Available | 
https://aws.amazon.com/about-aws/whats-new/2018/09/amazon-aurora-parallel-query-is-generally-available/ Amazon Aurora Now Supports Stopping and Starting of Database Clusters | https://aws.amazon.com/about-aws/whats-new/2018/09/amazon-aurora-stop-and-start/ Amazon Aurora Databases Support up to Five Cross-Region Read Replicas | https://aws.amazon.com/about-aws/whats-new/2018/09/amazon-aurora-databases-support-up-to-five-cross-region-read-replicas/ Amazon RDS Now Provides Database Deletion Protection | https://aws.amazon.com/about-aws/whats-new/2018/09/amazon-rds-now-provides-database-deletion-protection/ Announcing Managed Databases for Amazon Lightsail | https://aws.amazon.com/about-aws/whats-new/2018/10/announcing-managed-databases-for-amazon-lightsail/ Amazon RDS for MySQL and MariaDB now Support M5 Instance Types | https://aws.amazon.com/about-aws/whats-new/2018/09/amazon-rds-for-mysql-and-mariadb-support-m5-instance-types/ Amazon RDS for Oracle Now Supports Database Storage Size up to 32TiB | https://aws.amazon.com/about-aws/whats-new/2018/10/amazon-rds-for-oracle-now-supports-32tib/ Specify Parameter Groups when Restoring Amazon RDS Backups | https://aws.amazon.com/about-aws/whats-new/2018/10/specify-parameter-groups-when-restoring-amazon-rds-backups/ Amazon ElastiCache for Redis adds read replica scaling for Redis Cluster | https://aws.amazon.com/about-aws/whats-new/2018/09/amazon-elasticache-for-redis-adds-read-replica-scaling-for-redis-cluster/ Amazon Elasticsearch Service now supports encrypted communication between Elasticsearch nodes | https://aws.amazon.com/about-aws/whats-new/2018/09/amazon_elasticsearch_service_now_supports_encrypted_communication_between_elasticsearch_nodes/ Amazon Athena adds support for Creating Tables using the results of a Select query (CTAS) | https://aws.amazon.com/about-aws/whats-new/2018/10/athena_ctas_support/ Amazon Redshift announces Query Editor to run queries directly from the AWS Management Console | https://aws.amazon.com/about-aws/whats-new/2018/10/amazon_redshift_announces_query_editor_to_run_queries_directly_from_the_aws_console/ Support for TensorFlow and S3 select with Spark on Amazon EMR release 5.17.0 | https://aws.amazon.com/about-aws/whats-new/2018/09/support-for-tensorflow-s3-select-with-spark-on-amazon-emr-release-517/ AWS Database Migration Service Makes It Easier to Migrate Cassandra Databases to Amazon DynamoDB | https://aws.amazon.com/about-aws/whats-new/2018/09/aws-dms-aws-sct-now-support-the-migration-of-apache-cassandra-databases/ The Data Lake Solution Now Integrates with Microsoft Active Directory | https://aws.amazon.com/about-aws/whats-new/2018/09/the-data-lake-solution-now-integrates-with-microsoft-active-directory/ Amazon S3 Announces Selective Cross-Region Replication Based on Object Tags | https://aws.amazon.com/about-aws/whats-new/2018/09/amazon-s3-announces-selective-crr-based-on-object-tags/ AWS Storage Gateway Is Now Available as a Hardware Appliance | https://aws.amazon.com/about-aws/whats-new/2018/09/aws-storage-gateway-is-now-available-as-a-hardware-appliance/ AWS PrivateLink now supports access over AWS VPN | https://aws.amazon.com/about-aws/whats-new/2018/09/aws-privatelink-now-supports-access-over-aws-vpn/ AWS PrivateLink now supports access over Inter-Region VPC Peering | https://aws.amazon.com/about-aws/whats-new/2018/10/aws-privatelink-now-supports-access-over-inter-region-vpc-peering/ Network Load Balancer now supports AWS VPN | 
https://aws.amazon.com/about-aws/whats-new/2018/09/network-load-balancer-now-supports-aws-vpn/ Network Load Balancer now supports Inter-Region VPC Peering | https://aws.amazon.com/about-aws/whats-new/2018/10/network-load-balancer-now-supports-inter-region-vpc-peering/ AWS Direct Connect now Supports Jumbo Frames for Amazon Virtual Private Cloud Traffic | https://aws.amazon.com/about-aws/whats-new/2018/10/aws-direct-connect-now-supports-jumbo-frames-for-amazon-virtual-private-cloud-traffic/ Amazon CloudFront announces two new Edge locations, including its second location in Fujairah, United Arab Emirates | https://aws.amazon.com/about-aws/whats-new/2018/10/cloudfront-fujairah/ AWS CodeBuild Now Supports Building Bitbucket Pull Requests | https://aws.amazon.com/about-aws/whats-new/2018/10/aws-codebuild-now-supports-building-bitbucket-pull-requests/ AWS CodeCommit Supports New File and Folder Actions via the CLI and SDKs | https://aws.amazon.com/about-aws/whats-new/2018/09/aws-codecommit-supports-new-file-and-folder-actions-via-the-cli-and-sdks/ AWS Cloud9 Now Supports TypeScript | https://aws.amazon.com/about-aws/whats-new/2018/10/aws-cloud9-now-supports-typescript/ AWS CloudFormation coverage updates for Amazon API Gateway, Amazon ECS, Amazon Aurora Serverless, Amazon ElastiCache, and more | https://aws.amazon.com/about-aws/whats-new/2018/09/aws-cloudformation-coverage-updates-for-amazon-api-gateway--amaz/ AWS Elastic Beanstalk adds support for T3 instance and Go 1.11 | https://aws.amazon.com/about-aws/whats-new/2018/09/aws-elastic-beanstalk-adds-support-for-t3-instance-and-go-1-11/ AWS Elastic Beanstalk Console Supports Network Load Balancer | https://aws.amazon.com/about-aws/whats-new/2018/10/aws_elastic_beanstalk_console_supports_network_load_balancer/ AWS Amplify Announces Vue.js Support for Building Cloud-powered Web Applications | https://aws.amazon.com/about-aws/whats-new/2018/09/aws-amplify-announces-vuejs-support-for-building-cloud-powered-web-applications/ AWS Amplify Adds Support for Securely Embedding Amazon Sumerian AR/VR Scenes in Web Applications | https://aws.amazon.com/about-aws/whats-new/2018/09/AWS-Amplify-adds-support-for-securely-embedding-Amazon-Sumerian/ Amazon API Gateway adds support for multi-value parameters | https://aws.amazon.com/about-aws/whats-new/2018/10/amazon-api-gateway-adds-support-for-multi-parameters/ Amazon API Gateway adds support for OpenAPI 3.0 API specification | https://aws.amazon.com/about-aws/whats-new/2018/09/amazon-api-gateway-adds-support-for-openapi-3-api-specification/ AWS AppSync Launches a Guided API Builder for Mobile and Web Apps | https://aws.amazon.com/about-aws/whats-new/2018/09/AWS-AppSync-launches-a-guided-API-builder-for-apps/ Amazon Polly Adds Mandarin Chinese Language Support | https://aws.amazon.com/about-aws/whats-new/2018/09/amazon-polly-adds-mandarin-chinese-language-support/ Amazon Comprehend Extends Natural Language Processing for Additional Languages and Region | https://aws.amazon.com/about-aws/whats-new/2018/10/amazon_comprehend_extends_natural_language_processing_for_additional_languages_and_region/ Amazon Transcribe Supports Deletion of Completed Transcription Jobs | https://aws.amazon.com/about-aws/whats-new/2018/10/amazon_transcribe_supports_deletion_of_completed_transcription_jobs/ Amazon Rekognition improves the accuracy of image moderation | https://aws.amazon.com/about-aws/whats-new/2018/10/amazon-rekognition-improves-the-accuracy-of-image-moderation/ Save time and money by filtering faces during indexing with 
Amazon Rekognition | https://aws.amazon.com/about-aws/whats-new/2018/09/save-time-and-money-by-filtering-faces-during-indexing-with-amazon-rekognition/ Amazon SageMaker Now Supports Tagging for Hyperparameter Tuning Jobs | https://aws.amazon.com/about-aws/whats-new/2018/09/amazon-sagemaker-now-supports-tagging-for-hyperparameter-tuning-/ Amazon SageMaker Now Supports an Improved Pipe Mode Implementation | https://aws.amazon.com/about-aws/whats-new/2018/10/amazon-sagemaker-now-supports-an-improved-pipe-mode-implementati/ Amazon SageMaker Announces Enhancements to its Built-In Image Classification Algorithm | https://aws.amazon.com/about-aws/whats-new/2018/10/amazon-sagemaker-announces-enhancements-to-its-built-in-image-cl/ AWS Glue now supports connecting Amazon SageMaker notebooks to development endpoints | https://aws.amazon.com/about-aws/whats-new/2018/10/aws-glue-now-supports-connecting-amazon-sagemaker-notebooks-to-development-endpoints/ AWS Glue now supports resource-based policies and resource-level permissions for the AWS Glue Data Catalog | https://aws.amazon.com/about-aws/whats-new/2018/10/aws-glue-now-supports-resource-based-policies-and-resource-level-permissions-and-for-the-AWS-Glue-Data-Catalog/ Resource Groups Tagging API Supports Additional AWS Services | https://aws.amazon.com/about-aws/whats-new/2018/10/resource-groups-tagging-api-supports-additional-aws-services/ Changes to Tags on AWS Resources Now Generate Amazon CloudWatch Events | https://aws.amazon.com/about-aws/whats-new/2018/09/changes-to-tags-on-aws-resources-now-generate-amazon-cloudwatch-events/ AWS Systems Manager Announces Enhanced Compliance Dashboard | https://aws.amazon.com/about-aws/whats-new/2018/10/aws-systems-manager-announces-enhanced-compliance-dashboard/ Conditional Branching Now Supported in AWS Systems Manager Automation | https://aws.amazon.com/about-aws/whats-new/2018/09/Conditional_Branching_Now_Supported_in_AWS_Systems_Manager_Automation/ AWS Systems Manager Launches Custom Approvals for Patching | https://aws.amazon.com/about-aws/whats-new/2018/10/AWS_Systems_Manager_Launches_Custom_Approvals_for_Patching/ Amazon CloudWatch adds Ability to Build Custom Dashboards Outside the AWS Console | https://aws.amazon.com/about-aws/whats-new/2018/09/amazon-cloudwatch-adds-ability-to-build-custom-dashboards-outside-the-aws-console/ Amazon CloudWatch Agent adds Custom Metrics Support | https://aws.amazon.com/about-aws/whats-new/2018/09/amazon-cloudwatch-agent-adds-custom-metrics-support/ Amazon CloudWatch Launches Client-side Metric Data Aggregations | https://aws.amazon.com/about-aws/whats-new/2018/10/amazon-cloudWatch-launches-client-side-metric-data-aggregations/ AWS IoT Device Management Now Provides In Progress Timeouts and Step Timeouts for Jobs | https://aws.amazon.com/about-aws/whats-new/2018/10/aws-iot-device-management-now-provides-in-progress-timeouts-and-step-timeouts-for-jobs/ Amazon GuardDuty Provides Customization of Notification Frequency to Amazon CloudWatch Events | https://aws.amazon.com/about-aws/whats-new/2018/10/amazon-guardduty-provides-customization-of-notification-frequency-to-amazon-cloudwatch-events/ AWS Managed Microsoft AD Now Offers Additional Configurations to Connect to Your Existing Microsoft AD | https://aws.amazon.com/about-aws/whats-new/2018/10/aws-managed-microsoft-ad-now-offers-additional-configurations-to-connect-to-our-existing-microsoft-ad/ Easily Deploy Directory-Aware Workloads in Multiple AWS Accounts and VPCs by Sharing a Single AWS Managed Microsoft AD | 
https://aws.amazon.com/about-aws/whats-new/2018/09/aws-directory-service-share-directory-across-accounts-and-vpcs/ AWS Single Sign-on Now Enables You to Customize the User Experience to Business Applications | https://aws.amazon.com/about-aws/whats-new/2018/10/aws-single-sign-on-now-enables-you-to-customize-the-user-experience-to-business-applications/ Live Streaming on AWS Now Features AWS Elemental MediaLive and MediaPackage | https://aws.amazon.com/about-aws/whats-new/2018/09/live-streaming-on-aws-now-features-aws-elemental-medialive-and-mediapackage/ AWS Elemental MediaStore Increases Object Size Limit to 25 Megabytes | https://aws.amazon.com/about-aws/whats-new/2018/10/aws-elemental-mediastore-increase-object-size-limit-to-25-megabytes/ Amazon Kinesis Video Streams now supports adding and retrieving Metadata at Fragment-Level | https://aws.amazon.com/about-aws/whats-new/2018/10/kinesis-video-streams-fragment-level-metadata-support/ AWS Public Datasets Now Available from the German Meteorological Office, Broad Institute, Chan Zuckerberg Biohub, fast.ai, and Others | https://aws.amazon.com/about-aws/whats-new/2018/10/public-datasets/ Customize Your Payment Frequency and More with AWS Marketplace Flexible Payment Scheduler | https://aws.amazon.com/about-aws/whats-new/2018/10/customize-your-payment-frequency-and-more-with-awsmarketplace-flexible-payment-scheduler/ Sign in to your AWS Management Console with YubiKey Security Key for Multi-factor Authentication (MFA) | https://aws.amazon.com/about-aws/whats-new/2018/09/aws_sign_in_support_for_yubikey_security_key_as_mfa/
Did you know that AWS PrivateLink provides private connectivity between VPCs, AWS services, and on-premises applications, securely on the Amazon network? Learn how in this episode! https://aws.amazon.com/privatelink/
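On the consumer side, connecting privately to a service is a single call to create an interface endpoint. A minimal boto3 sketch, shown here against AWS KMS; the service name and resource IDs are illustrative placeholders:

```python
import boto3

ec2 = boto3.client("ec2")

# An interface endpoint places an ENI for the service in your subnet;
# PrivateDnsEnabled makes the service's usual DNS name resolve to it.
ec2.create_vpc_endpoint(
    VpcEndpointType="Interface",
    VpcId="vpc-0123456789abcdef0",
    ServiceName="com.amazonaws.us-east-1.kms",
    SubnetIds=["subnet-0123456789abcdef0"],
    SecurityGroupIds=["sg-0123456789abcdef0"],
    PrivateDnsEnabled=True,
)
```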
As more customers adopt Amazon VPC architectures, the features and flexibility of the service are squaring off against evolving design requirements. This session follows this evolution of a single regional VPC into a multi-VPC, multi-region design with diverse connectivity into on-premises systems and infrastructure. Along the way, we investigate creative customer solutions for scaling and securing outbound VPC traffic, securing private access to Amazon S3, managing multi-tenant VPCs, integrating existing customer networks through AWS Direct Connect, and building a full VPC mesh network across global regions.
As customers put more workloads into AWS, the number of VPCs a customer needs to manage also grows. VPCs can exist across geographically disparate AWS Regions, or run in separate AWS accounts connecting to a common VPC that serves as a global network transit center. This session shows you how to implement a networking construct that AWS calls a transit VPC using the Cisco Cloud Services Router. This network topology simplifies network management and minimizes the number of connections that you need to set up and manage. Even better, it is implemented virtually and doesn't require physical network gear or a physical presence in a colocation transit hub. Come hear why customers are procuring this service through AWS Marketplace and how customers are using the transit VPC for private networking, shared connectivity, and cross-account AWS usage.
In this session, we walk through the fundamentals of Amazon VPC. First, we cover build-out and design fundamentals for VPCs, including picking your IP space, subnetting, routing, security, NAT, and much more. We then transition into different approaches and use cases for optionally connecting your VPC to your physical data center with VPN or AWS Direct Connect. This mid-level architecture discussion is aimed at architects, network administrators, and technology decision-makers interested in understanding the building blocks that AWS makes available with Amazon VPC. Learn how you can connect VPCs with your offices and current data center footprint.
Many customers are hesitant to adopt SaaS solutions due to concerns about the safety of network connectivity traversing the internet. It is also difficult to manage firewall rules, NAT gateways, or VPN connections. AWS PrivateLink provides a solution that lets our customers' applications, whether in a VPC or in their own data center, connect to SaaS solutions in a highly scalable and highly available manner, while keeping all network traffic within the AWS network.
PrivateLink provides private connectivity between VPCs, AWS services, and on-premises applications. Built with the same underlying technology that powers NAT Gateway, Network Load Balancer, and AWS service endpoints, PrivateLink is now available for use with your own applications. In this session, we do a deep dive into the underlying network technology used by PrivateLink and explore how PrivateLink can be deployed to improve your network topologies and application architectures. We also look at how PrivateLink improves microservice architectures, allowing services to be vended between AWS accounts and over AWS Direct Connect connections.
This session focuses on best practices for connectivity between many virtual private clouds (VPCs), including the Transit VPC. We review how the Transit VPC works and use cases for centralization, network security, and connectivity. We include best practices for multiple accounts, multiple regions, and designing for scale. We also review some of the variants and extensions to the Transit VPC, including how to customize your own.
Simon takes a walk through LOTS of the updates that have been happening for AWS Customers. Shownotes New – Per-Second Billing for EC2 Instances and EBS Volumes - AWS Blog | https://aws.amazon.com/blogs/aws/new-per-second-billing-for-ec2-instances-and-ebs-volumes/ Amazon Virtual Private Cloud (VPC) now allows customers to expand their existing VPCs | https://aws.amazon.com/about-aws/whats-new/2017/08/amazon-virtual-private-cloud-vpc-now-allows-customers-to-expand-their-existing-vpcs/ New – Descriptions for Security Group Rules - AWS Blog | https://aws.amazon.com/blogs/aws/new-descriptions-for-security-group-rules/ New – Stop & Resume Workloads on EC2 Spot Instances - AWS Blog | https://aws.amazon.com/blogs/aws/new-stop-resume-workloads-on-ec2-spot-instances/ Amazon VPC NAT Gateways now support Amazon CloudWatch Monitoring and Resource Tagging | https://aws.amazon.com/about-aws/whats-new/2017/09/amazon-vpc-nat-gateways-now-support-amazon-cloudwatch-monitoring-and-resource-tagging/ AWS VPN Update – Custom PSK, Inside Tunnel IP, and SDK update | https://aws.amazon.com/about-aws/whats-new/2017/10/aws-vpn-update-custom-psk-inside-tunnel-ip-and-sdk-update/ Elasticsearch 5.5 now available on Amazon Elasticsearch Service | https://aws.amazon.com/about-aws/whats-new/2017/09/elasticsearch-5_5-now-available-on-amazon-elasticsearch-service/ Amazon Route 53 Traffic Flow Announces Support For Geoproximity Routing With Traffic Biasing | https://aws.amazon.com/about-aws/whats-new/2017/09/amazon-route-53-traffic-flow-announces-support-for-geoproximity-routing-with-traffic-biasing/ New Network Load Balancer – Effortless Scaling to Millions of Requests per Second - AWS Blog | https://aws.amazon.com/blogs/aws/new-network-load-balancer-effortless-scaling-to-millions-of-requests-per-second/ Now Available – EC2 Instances with 4 TB of Memory - AWS Blog | https://aws.amazon.com/blogs/aws/now-available-ec2-instances-with-4-tb-of-memory/ Use OpenCL Development Environment with Amazon EC2 F1 FPGA Instances to accelerate your C/C++ applications, also F1 instances are now available in US West (Oregon) and EU (Ireland) Regions | https://aws.amazon.com/about-aws/whats-new/2017/09/use-opencl-development-environment-with-amazon-ec2-f1-fpga-instances-to-accelerate-your-c-c-plus-plus-applications-also-f1-instances-are-now-available-in-us-west-oregon-and-eu-ireland-regions/ Announcing: React Native Starter Project with One-Click AWS Deployment and Serverless Infrastructure - AWS Mobile Blog | https://aws.amazon.com/blogs/mobile/announcing-react-native-starter-project-with-one-click-aws-deployment-and-serverless-infrastructure/ Announcing enhancements to the Amazon Lex test console | https://aws.amazon.com/about-aws/whats-new/2017/09/announcing-enhancements-to-the-amazon-lex-test-console/ Announcing support for synonyms and slot value validation on Amazon Lex | https://aws.amazon.com/about-aws/whats-new/2017/08/announcing-support-for-synonyms-and-slot-value-validation-on-amazon-lex/ Now Specify Request Level Attributes with Amazon Lex | https://aws.amazon.com/about-aws/whats-new/2017/09/now-specify-request-level-attributes-with-amazon-lex/ New Amazon Lex Built-in Slot Types for Phone numbers, Speed, and Weight, Available in Preview | https://aws.amazon.com/about-aws/whats-new/2017/09/new-amazon-lex-built-in-slot-types-for-phone-numbers-speed-and-weight-available-in-preview/ Export your Amazon Lex chatbot to the Alexa Skills Kit | 
https://aws.amazon.com/about-aws/whats-new/2017/09/export-your-amazon-lex-chatbot-to-the-alexa-skills-kit/ Apple Core ML and Keras Support Now Available for Apache MXNet - AWS AI Blog | https://aws.amazon.com/blogs/ai/apple-core-ml-and-keras-support-now-available-for-apache-mxnet/ AWS CodePipeline now provides notifications on pipeline, stage, and action status changes | https://aws.amazon.com/about-aws/whats-new/2017/09/aws-codepipeline-now-provides-notifications-on-pipeline-stage-and-action-status-changes/ Amazon Pinpoint Introduces Two-Way Text Messaging | https://aws.amazon.com/about-aws/whats-new/2017/09/amazon-pinpoint-introduces-two-way-text-messaging/ Amazon Cognito Integrates with Amazon Pinpoint to Add Analytics for User Pools and Enrich Pinpoint Campaigns | https://aws.amazon.com/about-aws/whats-new/2017/09/amazon-cognito-integrates-with-amazon-pinpoint-to-add-analytics-for-user-pools-and-enrich-pinpoint-campaigns/ Amazon Redshift now supports late-binding views referencing Amazon Redshift and Redshift Spectrum external tables | https://aws.amazon.com/about-aws/whats-new/2017/09/amazon-redshift-now-supports-late-binding-views-referencing-amazon-redshift-and-redshift-spectrum-external-tables/ Custom Artifacts on AWS Device Farm - AWS Mobile Blog | https://aws.amazon.com/blogs/mobile/custom-artifacts-on-aws-device-farm/ Amazon Aurora Can Migrate Encrypted Databases from Amazon RDS for MySQL | https://aws.amazon.com/about-aws/whats-new/2017/09/amazon-aurora-can-migrate-encrypted-databases-from-amazon-rds-for-mysql/ Amazon EC2 Systems Manager Adds Raspbian OS and Raspberry Pi Support | https://aws.amazon.com/about-aws/whats-new/2017/09/amazon-ec2-systems-manager-adds-raspbian-os-and-raspberry-pi-support/ AWS Greengrass is available in Asia Pacific (Tokyo) Region. | https://aws.amazon.com/about-aws/whats-new/2017/09/aws-greengrass-is-available-in-asia-pacific-tokyo-region/ AWS CloudTrail Enables Option to Add All Amazon S3 Buckets to Data Events | https://aws.amazon.com/about-aws/whats-new/2017/09/aws-cloudtrail-enables-option-to-add-all-amazon-s3-buckets-to-data-events/ Amazon Kinesis Analytics improves application performance for high volume data streams | https://aws.amazon.com/about-aws/whats-new/2017/09/amazon-kinesis-analytics-improves-application-performance-for-high-volume-data-streams/ New Kinesis Analytics stream processing functions for time series analytics, real time sessionization, and more | https://aws.amazon.com/about-aws/whats-new/2017/09/new-kinesis-analytics-stream-processing-functions-for-time-series-analytics-real-time-sessionization-and-more/ AWS CloudFormation provides Stack Termination Protection | https://aws.amazon.com/about-aws/whats-new/2017/09/aws-cloudformation-provides-stack-termination-protection/
Customers from all over the world streamed forty-two billion hours of Netflix content last year. Various Netflix batch jobs and an increasing number of service applications use containers for their processing. In this session, Netflix presents a deep dive on the motivations and the technology powering container deployment on top of Amazon Web Services. The session covers our approach to resource management and scheduling with the open-source Fenzo library, along with details of how we integrate Docker and Netflix container scheduling running on AWS. We cover the approach we have taken to deliver AWS platform features to containers such as IAM roles, VPCs, security groups, metadata proxies, and user data. We want to take advantage of native AWS container resource management using Amazon ECS to reduce operational responsibilities. We are delivering these integrations in collaboration with the Amazon ECS engineering team. The session also shares some of the results so far, and lessons learned throughout our implementation and operations.
Ines Envid, a Product Manager for Cloud Networking, joins the podcast today to tell us how mind-blowing Google's network is and how you can make the best of it! Let Francesc and Mark ask all the questions about VPCs, Load Balancers, and Routers you always wanted to know the answer to. About Ines Ines is a product manager in Cloud Networking. She has dedicated her career to product and development roles for carrier and enterprise networking infrastructure and applications, from access and edge to backbone cores. Ines is currently leading the Google Cloud networking VPC topology and policy product areas. Cool things of the week New undersea cable expands capacity for Google APAC customers and users blog Managing containerized ASP.NET Core apps with Kubernetes blog We're Hiring join us! Interview Google Cloud Networking docs Google Cloud Security docs Google Security Whitepaper research Using Networks and Firewalls docs Google Cloud Load Balancer docs VPCs: Virtual Private Clouds aka Cloud Virtual Networks docs IP addresses, ranges, and subnetworks docs Google Cloud VPN docs Jupiter Rising: A Decade of Clos Topologies and Centralized Control in Google's Datacenter Network research Want to learn more? Using Google's cloud networking products: a guide to all the guides Question of the week How do I use the Proxy protocol with a network load balancer? How can I know the IP address of the original sender in a TCP connection over Load Balancers? SSL proxy for Google Cloud Load Balancing docs Where will we be? You can find Mark at Connect.Tech in Atlanta from October 20th to the 22nd, and the week after that at GAMEACON in Atlantic City. Francesc is working on more episodes of justforfunc before he goes to Brazil next month for GopherCon Brasil and GCPNext Brazil.
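The question of the week (recovering the original client address behind a proxying load balancer) is exactly what the PROXY protocol solves: the balancer prepends a one-line text header to the TCP stream before the application data. A minimal Python parser for the version 1 header; the format comes from the public PROXY protocol spec, not from the episode:

```python
def parse_proxy_v1(header: bytes) -> dict:
    """Parse a PROXY protocol v1 header such as
    b'PROXY TCP4 198.51.100.22 203.0.113.7 35646 443\r\n'."""
    fields = header.decode("ascii").rstrip("\r\n").split(" ")
    if fields[0] != "PROXY":
        raise ValueError("not a PROXY protocol v1 header")
    proto, src_ip, dst_ip, src_port, dst_port = fields[1:6]
    return {
        "protocol": proto,                  # TCP4 or TCP6
        "client": (src_ip, int(src_port)),  # original sender, pre-proxy
        "backend": (dst_ip, int(dst_port)),
    }

print(parse_proxy_v1(b"PROXY TCP4 198.51.100.22 203.0.113.7 35646 443\r\n"))
```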
VPCs are required, in many cases, to connect with the disparate E911 networks. What do you need to understand about these services? And most importantly, what is the cost? While a lot of similarity exists among the services, this can also create a confusing environment.
Faculty of Veterinary Medicine - Digital Dissertations of the LMU - Part 04/07
Up to now, the occult phase of cardiomyopathy in Doberman Pinschers, with a cumulative prevalence of 63 % in Europe (WESS et al., 2009b), can only be diagnosed by specialized and cost-intensive methods, such as the 24-h ECG (Holter) and the echocardiogram. Electrocardiographic and echocardiographic changes usually appear at an age at which the dogs have already been used for breeding. Considering the genetic health of the Doberman Pinscher breed, it is of great importance to diagnose the genetically determined cardiomyopathy before it can be passed on to the offspring. The aim of this study, "Analysis of NTpro-BNP in Dilated Cardiomyopathy of the Doberman Pinscher", was to evaluate NTpro-BNP in the Doberman Pinscher population, in healthy animals as well as in various stages of cardiomyopathy, to define reference values and to validate the use of NTpro-BNP as a diagnostic tool. Therefore, 250 healthy dogs and 108 dogs with cardiomyopathy were studied between 2004 and 2008, for a total of 480 examinations. NTpro-BNP measurements were performed in plasma samples using the ELISA VETSIGN™ Canine CardioSCREEN NTpro-BNP, Guildhay Ltd., UK. NTpro-BNP concentrations increased in correlation with the severity of disease. There was a statistically significant difference between the healthy control group (mean 315.7 pmol/l) and the occult and decompensated groups, as well as the whole dog population with cardiomyopathy. To differentiate healthy dogs from dogs with cardiomyopathy, a cut-off value of 413.8 pmol/l was established that provided an area under the receiver operating characteristic curve of 0.820, with a sensitivity and specificity of 72.4 % and 80.2 %, respectively. There was no significant influence of age, weight or sex on the NTpro-BNP concentration. Sensitivity reached 90.0 % for predicting abnormal echocardiographic changes. Interestingly, there was a statistically significant increase of NTpro-BNP in the occult group with exclusively VPCs (ventricular premature complexes), even before echocardiographic changes were present. The "still normal" group, consisting of examinations of dogs that appeared to be healthy, without any measurable signs at that time, but which later developed cardiomyopathy, is highly interesting. Although the mean NTpro-BNP level (499.7 pmol/l) was higher than the mean NTpro-BNP concentration in the occult group with exclusively VPCs, no statistically significant difference from the healthy control group was found using Bonferroni's correction. The small number of only 17 dogs in this group might be a reason for the lack of statistical significance. One limitation of this study is the fact that not all dogs, especially those in the healthy control group, could be followed up to the end of their lives. Therefore, it is possible that "still healthy" dogs with genetically determined cardiomyopathy are included in the healthy control group. As a consequence, the reference value could be falsely high. Long-term studies will be needed to validate the reference value in a group of definitively genetically healthy animals. The tendency toward higher NTpro-BNP values in the still normal group should also be studied in a larger population in order to evaluate the potential of NTpro-BNP as an early marker of cardiomyopathy in Doberman Pinschers. NTpro-BNP cannot replace the Holter ECG and echocardiographic measurements in the diagnosis of occult cardiomyopathy at present.
But if these time-consuming, cost-intensive methods, which require a high degree of specialization, are not available, NTpro-BNP levels could indicate the necessity of referring the patient to a cardiologist. Because of the high sensitivity of 90 % for predicting abnormal echocardiographic changes, it might be possible that the measurement of NTpro-BNP eventually could replace the echocardiographic examination, but not the Holter ECG.
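For readers less familiar with the reported test statistics, the standard definitions behind the sensitivity and specificity figures quoted above are as follows (textbook formulas, not taken from the dissertation itself):

```latex
\text{Sensitivity} = \frac{TP}{TP + FN}, \qquad
\text{Specificity} = \frac{TN}{TN + FP}
```

Here TP and FN count dogs with cardiomyopathy whose NTpro-BNP lies above and below the 413.8 pmol/l cut-off, respectively, while TN and FP count healthy dogs below and above it; at that cut-off the study reports 72.4 % and 80.2 %.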