A panel discussion with AI industry leaders revealing how enterprises are scaling AI today, with predictions on coming breakthroughs for AI and the impact on Fortune 500 companies and beyond.

Topics Include:
- Three technical leaders discuss production challenges: security, interoperability, and scaling agentic systems
- Panelists represent Enkrypt (security), Anyscale (infrastructure), and CrewAI (agent orchestration platforms)
- The industry is moving from flashy demos to dependable agents with real business outcomes
- Breakthrough examples include 70-page IRS form processing and multimodal workflow automation
- Multimodal data integration is becoming crucial: incorporating video, audio, and screenshots into decisions
- Less than 10% of future applications are expected to be text-only
- Companies are shifting from experimenting with individual models to deploying agent networks
- Governance frameworks are needed as enterprises scale to hundreds of agents
- Growing software stack complexity requires specialized infrastructure between applications and GPUs
- Security teams need centralized visibility across fragmented agent deployments
- Existing industry regulations apply to AI services; no special AI laws needed
- Interoperability standards debate: MCP is gaining adoption while A2A seems like a premature solution
- MCP shows higher API reliability than OpenAI tool calling in implementations
- Multimodal systems are more vulnerable to attacks, but the value proposition is too high to ignore
- A Fortune 500 company automated its price operations approval process using data from 630 brands
- 87% of enterprise customers deploy agents in private VPCs or on-premises infrastructure
- Specialized AI systems are needed to oversee other agents at machine speed
- Cost optimization through model specialization rather than always using the most powerful model
- Future learning may happen through context/prompting rather than traditional weight fine-tuning
- Predictions include AI meeting moderators and agents working autonomously for hours

Participants:
- Robert Nishihara - Co-founder, Anyscale
- João Moura - CEO, CrewAI
- Sahil Agarwal - Co-Founder & CEO, Enkrypt AI
- Jillian D'Arcy - Sr. ISV Sales Leader, Amazon Web Services

Further Links:
- Anyscale - Website | LinkedIn | AWS Marketplace
- CrewAI - Website | LinkedIn | AWS Marketplace
- Enkrypt AI - Website | LinkedIn | AWS Marketplace

See how Amazon Web Services gives you the freedom to migrate, innovate, and scale your software company at https://aws.amazon.com/isv/
What happens when three major cloud providers each reimagine network design from scratch? You get three completely different approaches to solving the same fundamental problem.

The foundation of cloud networking begins with the virtual containers that hold your resources: AWS's Virtual Private Clouds (VPCs), Azure's Virtual Networks (VNets), and Google Cloud's VPCs (yes, the same name, very different implementation). While they all serve the same basic purpose—providing logical isolation for your workloads—their design philosophies reveal profound differences in how each provider expects you to architect your solutions.

AWS took the explicit control approach. When you create subnets within an AWS VPC, you must assign each to a specific Availability Zone. This creates a vertical architecture pattern where you're deliberately placing resources in specific physical locations and designing resilience across those boundaries. Network engineers often find this intuitive because it matches traditional fault domain thinking. However, this design means you must account for cross-AZ data transfer costs and explicit resiliency patterns.

Azure flipped the script with their horizontal approach. By default, subnets span across all AZs in a region, with Microsoft's automation handling the resilience for you. This "let us handle the complexity" philosophy makes initial deployment simpler but provides less granular control. Meanwhile, Google Cloud went global, allowing a single VPC to span regions worldwide—an approach that simplifies global connectivity but introduces new challenges for security segmentation.

These architectural differences aren't merely academic—they fundamentally change how you design for resilience, manage costs, and implement security. The cloud introduced "toll booth" pricing for data movement, where crossing availability zones or regions incurs charges that didn't exist in traditional data centers. Understanding these nuances is crucial whether you're migrating existing networks or designing new ones.

Want to dive deeper into cloud networking concepts? Let us know what topics you'd like us to cover next as we explore how traditional networking skills translate to the cloud world.

Purchase Chris and Tim's new book on AWS Cloud Networking: https://www.amazon.com/Certified-Advanced-Networking-Certification-certification/dp/1835080839/
Check out the Fortnightly Cloud Networking News: https://docs.google.com/document/d/1fkBWCGwXDUX9OfZ9_MvSVup8tJJzJeqrauaE6VPT2b0/
Visit our website and subscribe: https://www.cables2clouds.com/
Follow us on BlueSky: https://bsky.app/profile/cables2clouds.com
Follow us on YouTube: https://www.youtube.com/@cables2clouds/
Follow us on TikTok: https://www.tiktok.com/@cables2clouds
Merch Store: https://store.cables2clouds.com/
Join the Discord Study group: https://artofneteng.com/iaatj
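A minimal sketch of the explicit-AZ model described above, assuming boto3; the region, CIDRs, and zone names are illustrative. In AWS, every subnet lives in exactly one Availability Zone, so spreading across zones is a decision you make subnet by subnet:

```python
# Sketch only: AWS's vertical architecture pins each subnet to a single
# Availability Zone, so resilience is designed across explicit subnets.
# Region, CIDRs, and AZ names are assumptions for illustration.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

vpc_id = ec2.create_vpc(CidrBlock="10.0.0.0/16")["Vpc"]["VpcId"]

# One subnet per AZ; workload placement and cross-AZ data transfer costs
# follow from these explicit choices.
for i, az in enumerate(["us-east-1a", "us-east-1b", "us-east-1c"]):
    ec2.create_subnet(
        VpcId=vpc_id,
        CidrBlock=f"10.0.{i}.0/24",
        AvailabilityZone=az,
    )
```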
The story of cloud networking rarely gets told from the perspective of those building it inside unicorn startups, but that's exactly what this episode delivers. Richard Olson, cloud networking expert at Canva, takes us behind the scenes of building network infrastructure for one of the world's fastest-growing SaaS platforms.

Richard's fascinating career journey began with literally throwing rocks with phone lines into trees during his military service, progressing through network operations centers and pre-sales engineering before landing at AWS and eventually Canva. His unique perspective bridges traditional networking expertise with cloud-native development approaches.

Unlike enterprises migrating from legacy environments, Canva started entirely in the cloud with minimal networking considerations. Richard explains how this trajectory created different challenges - starting with overlapping 10.0.0.0/16 addresses across development environments and evolving to hundreds of VPCs requiring sophisticated connectivity solutions. By mid-2022, these networking challenges had grown complex enough to warrant forming a dedicated cloud networking team, which Richard helped establish.

The conversation takes a deep technical turn exploring Kubernetes networking challenges that even experienced network engineers might not anticipate. Richard explains why "Kubernetes eats IP addresses for breakfast" in cloud environments, detailing the complex interaction between VPC CIDR allocations, prefix delegations, and worker node configurations that can quickly exhaust even large IP spaces. This pressure is finally creating compelling business cases for IPv6 adoption after decades of slow uptake.

Whether you're managing cloud infrastructure today or planning your organization's network strategy for tomorrow, this episode offers invaluable insights into the evolution and challenges of cloud networking at unicorn scale. Listen now to understand why companies are increasingly forming dedicated cloud networking teams and the unique skill sets they require.

Connect with Richard: https://www.linkedin.com/in/richard-olson-au
Check out the Fortnightly Cloud Networking News: https://docs.google.com/document/d/1fkBWCGwXDUX9OfZ9_MvSVup8tJJzJeqrauaE6VPT2b0/
Visit our website and subscribe: https://www.cables2clouds.com/
Follow us on Twitter: https://twitter.com/cables2clouds
Follow us on YouTube: https://www.youtube.com/@cables2clouds/
Follow us on TikTok: https://www.tiktok.com/@cables2clouds
Merch Store: https://store.cables2clouds.com/
Join the Discord Study group: https://artofneteng.com/iaatj
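A rough illustration of the "Kubernetes eats IP addresses for breakfast" point above. The figures (a /16 VPC, /28 prefix delegation, an assumed number of prefixes per node) are illustrative assumptions, not Canva's actual configuration:

```python
# Back-of-the-envelope math for pod IP consumption with VPC CNI-style
# prefix delegation. All figures are assumptions for illustration.
vpc_addresses = 2 ** (32 - 16)   # a /16 VPC: 65,536 addresses
ips_per_prefix = 2 ** (32 - 28)  # each delegated /28 prefix holds 16 IPs
prefixes_per_node = 16           # assumed ENI prefix slots per worker node
nodes = 200

reserved = nodes * prefixes_per_node * ips_per_prefix
print(f"{reserved:,} of {vpc_addresses:,} addresses reserved "
      f"({reserved / vpc_addresses:.0%}) by {nodes} nodes")
# 200 nodes reserve 51,200 addresses -- roughly 78% of the /16 -- before a
# single extra pod is scheduled, which is how "large" IP spaces run out.
```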
In this episode, Meg Ashby, a senior cloud security engineer, shares how her team tackled AWS's centralized VPC interface endpoints, a design often seen as an anti-pattern. She explains how they turned this unconventional approach into a cost-efficient and scalable solution, all while maintaining granular controls and network visibility. She shares why centralized VPC endpoints are considered an AWS anti-pattern, how to implement granular IAM controls in a centralized model, and the challenges of monitoring and detecting VPC endpoint traffic.

Guest Socials: Meg's Linkedin
Podcast Twitter - @CloudSecPod

If you want to watch videos of this LIVE STREAMED episode and past episodes - Check out our other Cloud Security Social Channels:
- Cloud Security Podcast - Youtube
- Cloud Security Newsletter
- Cloud Security BootCamp

Questions asked:
(00:00) Introduction
(02:48) A bit about Meg Ashby
(03:44) What are VPC interface endpoints?
(05:26) Egress and Ingress for Private Networks
(08:21) Reasons for using VPC endpoints
(14:22) Limitations when using centralised endpoint VPCs
(19:01) Marrying VPC endpoint and IAM policy
(21:34) VPC endpoint specific conditions
(27:52) Is this solution for everyone?
(38:16) Does VPC endpoint have logging?
(41:24) Improvements for the next phase

Thank you to our episode sponsor Wiz. Cloud Security Podcast listeners can also get a free cloud security health scan by going to wiz.io/csp
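A hedged sketch of the "granular IAM controls in a centralized model" idea, assuming boto3; the endpoint ID and organization ID are hypothetical. An endpoint policy keeps a shared interface endpoint from becoming an anonymous allow-all path:

```python
# Sketch: attach a restrictive policy to a centralized interface endpoint so
# only identities from your own AWS Organization can use it. IDs are fake.
import json
import boto3

ec2 = boto3.client("ec2")

policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": "*",
        "Action": "*",
        "Resource": "*",
        # Require callers to belong to our AWS Organization.
        "Condition": {"StringEquals": {"aws:PrincipalOrgID": "o-exampleorgid"}},
    }],
}

ec2.modify_vpc_endpoint(
    VpcEndpointId="vpce-0123456789abcdef0",  # hypothetical endpoint ID
    PolicyDocument=json.dumps(policy),
)
```

On the IAM side, resource policies can pin access to a specific endpoint with a condition such as aws:sourceVpce, which is roughly how endpoint policies and IAM policies get "married" in a centralized design.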
In this episode, David Lynam provides an overview of AWS Transit Gateway, which aims to simplify complex network connectivity between VPCs, VPNs, and on-premises networks. We discuss the limitations of using VPC peering and the benefits Transit Gateway provides through its hub-and-spoke model. The main components of Transit Gateway are explained, including attachments, route tables, associations, and route propagation. We go through some example use cases like sharing Transit Gateways across accounts, network isolation for compliance, routing traffic through security services, and bandwidth/scaling capabilities.

In this episode, we mentioned the following resources:
- How Amazon VPC Transit Gateways work

Do you have any AWS questions you would like us to address? Leave a comment here or connect with us on X/Twitter:
- https://twitter.com/eoins
- https://twitter.com/loige
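Since the episode walks through attachments, route tables, associations, and route propagation, here is a minimal sketch of those pieces, assuming boto3 and placeholder IDs (in practice you would wait for the gateway and attachment to become available between steps):

```python
# Sketch of the Transit Gateway building blocks: gateway, VPC attachment,
# route table, association, and propagation. All IDs are placeholders.
import boto3

ec2 = boto3.client("ec2")

tgw = ec2.create_transit_gateway(Description="hub router")["TransitGateway"]

attachment = ec2.create_transit_gateway_vpc_attachment(
    TransitGatewayId=tgw["TransitGatewayId"],
    VpcId="vpc-0123456789abcdef0",
    SubnetIds=["subnet-0123456789abcdef0"],  # typically one subnet per AZ
)["TransitGatewayVpcAttachment"]

rtb = ec2.create_transit_gateway_route_table(
    TransitGatewayId=tgw["TransitGatewayId"],
)["TransitGatewayRouteTable"]

# Association: which route table this attachment's traffic is looked up in.
ec2.associate_transit_gateway_route_table(
    TransitGatewayRouteTableId=rtb["TransitGatewayRouteTableId"],
    TransitGatewayAttachmentId=attachment["TransitGatewayAttachmentId"],
)

# Propagation: whose routes get written into that route table.
ec2.enable_transit_gateway_route_table_propagation(
    TransitGatewayRouteTableId=rtb["TransitGatewayRouteTableId"],
    TransitGatewayAttachmentId=attachment["TransitGatewayAttachmentId"],
)
```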
Main Themes:
- The rise of multicloud 2.0: Organizations are moving beyond a single primary cloud and embracing a true multicloud strategy to leverage best-of-breed services from different providers.
- Kubernetes networking and security challenges: Multicloud Kubernetes deployments face issues with IP address exhaustion, overlapping IPs, egress security, and high-bandwidth secure inter-cluster connectivity.
- Aviatrix solutions for multicloud Kubernetes: Aviatrix offers a controller-based, intent-based networking and security platform that addresses these challenges with dynamic segmentation, secure egress, and hybrid connectivity.

Key Ideas and Facts:

Multicloud 2.0:
- Shifting landscape: The cloud landscape has evolved significantly in the 18 years since AWS launched. Organizations now have access to hyperscalers, regional clouds, and specialized clouds.
- True multicloud strategy: Organizations are adopting a true multicloud strategy to leverage the unique strengths of different cloud providers and enable developers to build better applications and services.
- Cloud 2.0: Many organizations are calling this shift "Cloud 2.0," driven by the need for distributed data, models, and applications, especially with the rise of GenAI and AI/ML applications.

Kubernetes Networking and Security Challenges:
- IP address exhaustion: Kubernetes is "IP hungry," leading to IP address exhaustion and challenges with overlapping IPs, especially in large deployments with thousands of VPCs.
- Egress security: Millions of VPCs have weak or non-existent egress security, posing a significant risk to sensitive data.
- Inter-cluster connectivity: Establishing high-bandwidth, secure connectivity between Kubernetes clusters across different clouds and on-premises environments is complex and challenging.

Aviatrix Solutions:
- Controller-based, intent-based networking: Aviatrix provides a centralized multicloud controller and uses intent-based policies to dynamically segment and secure traffic across Kubernetes clusters, regardless of the underlying IP addresses.
- Secure egress: Aviatrix replaces traditional NAT gateways with secure Aviatrix gateways, offering embedded NAT, visibility, and granular egress security policies based on Kubernetes resources.
- Dynamic scaling: Aviatrix automatically discovers and incorporates new Kubernetes resources into security policies as clusters scale up or down, eliminating manual configuration and ensuring consistent security.
- Hybrid connectivity: Aviatrix facilitates secure connectivity between cloud Kubernetes clusters and on-premises environments, including edge locations, enabling hybrid deployments for AI/ML and other workloads.

Customer Success:
- Large-scale deployments: Aviatrix has customers with thousands of island VPCs and overlapping IP spaces, successfully using its platform to manage their multicloud Kubernetes environments.
- Operational efficiency: Aviatrix simplifies operations with its controller-based approach, dynamic policy updates, and world-class SRE team handling upgrades and troubleshooting.

Key Quotes:
- Anirban Sengupta (Aviatrix): "Today every organization should embrace multicloud. That's the best way to get ahead with their competitors and help their developers."
- Anirban Sengupta (Aviatrix): "Networking and security should be top of mind... without connectivity and without security, you really can't have a multicloud strategy."
- Anirban Sengupta (Aviatrix): "Kubernetes is very IP hungry. There is exhaustion, IP address exhaustion is the key."
Call to Action: Organizations looking to embrace a true multicloud strategy and overcome the networking and security challenges of Kubernetes should consider Aviatrix's controller-based platform. Contact Aviatrix for a demo and learn how their solutions can help you achieve secure and efficient multicloud Kubernetes deployments.
In this episode, Simon Elisha and Brett Looney dive deep into the AWS Transit Gateway, a cloud-scale router that connects VPCs and other networking resources in AWS. They explain how Transit Gateway works, its advantages over VPC peering, and its scalability and resilience. They also touch on concepts like attachments, route tables, and VRF (virtual routing and forwarding). The conversation highlights the benefits of Transit Gateway for both experienced network administrators and newcomers to networking in the cloud. https://aws.amazon.com/transit-gateway
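Because Transit Gateway route tables behave like VRFs, isolation falls out of which table an attachment is associated with and where its routes propagate. A hedged sketch, assuming boto3 and hypothetical IDs, of keeping prod and dev apart while both reach shared services:

```python
# Sketch of VRF-style isolation: prod and dev each consult their own TGW
# route table and propagate routes only into the shared-services table.
# Assumes default route table association/propagation are disabled; all
# IDs are placeholders.
import boto3

ec2 = boto3.client("ec2")

def place_in_domain(attachment_id, own_rtb, advertise_into):
    # The attachment's traffic is routed using its own domain's table...
    ec2.associate_transit_gateway_route_table(
        TransitGatewayRouteTableId=own_rtb,
        TransitGatewayAttachmentId=attachment_id,
    )
    # ...and its routes appear only in the tables listed here.
    for rtb in advertise_into:
        ec2.enable_transit_gateway_route_table_propagation(
            TransitGatewayRouteTableId=rtb,
            TransitGatewayAttachmentId=attachment_id,
        )

# Prod and dev advertise only to shared services, and shared services back
# to both, so the two environments never learn routes to each other.
place_in_domain("tgw-attach-prod", "tgw-rtb-prod", ["tgw-rtb-shared"])
place_in_domain("tgw-attach-dev", "tgw-rtb-dev", ["tgw-rtb-shared"])
place_in_domain("tgw-attach-shared", "tgw-rtb-shared", ["tgw-rtb-prod", "tgw-rtb-dev"])
```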
AWS Morning Brief for the week of Monday, August 12th with Mike Julian.

Links:
- Introducing AWS End User Messaging
- Amazon EFS now supports up to 30 GiB/s (a 50% increase) of read throughput
- Amazon RDS for Db2 supports loading data from Amazon S3
- AWS announces private IPv6 addressing for VPCs and subnets
- Announcing delegated administrator for Cost Optimization Hub
- OpenSearch optimized instance (OR1) is game changing for indexing performance and cost
This episode discusses solutions for securely accessing private VPC resources for debugging and troubleshooting. We cover traditional approaches like bastion hosts and VPNs, and newer solutions using containers and AWS services like Fargate, ECS, and SSM. We explain how to set up a Fargate task with a container image containing the necessary tools, enable ECS integration with SSM, and use SSM to start remote shells and port-forwarding tunnels into the container. This provides on-demand access without exposing resources on the public internet. We share a Python script to simplify the process and suggest ideas for improvements, like auto-scaling the container down when idle. Overall, this lightweight containerized approach can provide easier access for debugging compared to managing EC2 instances.
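In the spirit of the Python script mentioned above, here is a hedged sketch of opening an SSM port-forwarding tunnel through an ECS Exec-enabled Fargate task. The target string format for ECS tasks, the IDs, and the hostnames are assumptions worth verifying against current AWS docs, and the Session Manager plugin must be installed locally for the final command to work:

```python
# Sketch: port-forward to a private resource through a Fargate "debug" task
# managed by SSM (ECS Exec must be enabled on the task). IDs/hosts are fake.
import json
import subprocess

import boto3

ecs = boto3.client("ecs")

cluster = "debug-cluster"
task_arn = "arn:aws:ecs:eu-west-1:111111111111:task/debug-cluster/0123456789abcdef0"

task = ecs.describe_tasks(cluster=cluster, tasks=[task_arn])["tasks"][0]
runtime_id = task["containers"][0]["runtimeId"]
task_id = task_arn.split("/")[-1]

# SSM addresses an ECS Exec container as "ecs:<cluster>_<task-id>_<runtime-id>"
# (assumed format; check the docs for your SDK/CLI version).
target = f"ecs:{cluster}_{task_id}_{runtime_id}"

subprocess.run([
    "aws", "ssm", "start-session",
    "--target", target,
    "--document-name", "AWS-StartPortForwardingSessionToRemoteHost",
    "--parameters", json.dumps({
        "host": ["db.internal.example"],   # hypothetical private resource
        "portNumber": ["5432"],
        "localPortNumber": ["5432"],
    }),
], check=True)
```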
Welcome to episode 248 of the CloudPod Podcast – where the forecast is always cloudy! It's the return of our Cloud Journey Series! Plus, today we're talking shared VPCs and why you should avoid them, Amazon's new data centers (we think they forgot about the sustainability pledge), new threats to and from AI, and a quick preview of Next '24 programs – plus much more!

Titles we almost went with this week:
- The Cloud Pod Isn't a Basic Bitch
- New AWS Data Solutions Framework – or – How You Accidentally Spent $100k's
- A PSA on Shared VPCs in AWS
- Amazon Doesn't Even Pay Attention to Climate When it's on a Building
- Vector Search I Hardly Know Her
- Google Migs are Less Fun than Russian Migs
- AI Can Now Attack Us; Who Didn't See That Coming
- Who is Surprised That AWS is Using More Power Than the Rest of the State of Oregon
- Spend all the Dinero in Spain

A big thanks to this week's sponsor: We're sponsorless this week! Interested in sponsoring us and having access to a specialized and targeted market? We'd love to talk to you. Send us an email or hit us up on our Slack Channel.

AI is Going Great (or how ML Makes all Its Money)

01:24 Disrupting malicious uses of AI by state-affiliated threat actors

In this week's chapter of AI nightmares, ChatGPT tells us how they are blocking the usage of AI by state-affiliated threat actors. Awesome; things went from bad to worse in one week. Cool. Cool cool cool.

In partnership with Microsoft Threat Intelligence, they have disrupted five state-affiliated actors that sought to use their AI service in support of malicious cyber activities. These actors generally sought to use OpenAI services for querying open-source information, translating, finding coding errors, and running basic coding tasks.

- Charcoal Typhoon (China-affiliated) researched various companies and cybersecurity tools, debugged code and generated scripts, and created content likely for use in phishing campaigns.
- Salmon Typhoon (China-affiliated) translated technical papers, retrieved publicly available information on multiple intelligence agencies and regional threat actors, assisted with coding, and researched common ways processes could be hidden on a system.
- Crimson Sandstorm (Iran-affiliated) used OpenAI services for scripting support related to app and web development, generating content likely for spear-phishing campaigns, and researching common ways malware could evade detection.
- Emerald Sleet (North Korea-affiliated) identified experts and organizations focused on defense issues in the Asia-Pacific region, sought to understand publicly available vulnerabilities, and used OpenAI services for help with basic scripting tasks and drafting content that could be used in phishing campaigns.
- Forest Blizzard (Russia-affiliated) primarily performed research on open-source data into satellite communication protocols and radar imaging technology, as well as getting support with scripting tasks.

OpenAI says the capabilities of the current models are limited, but they believe it's important to stay ahead of significant and evolving threats. To continue making sure their platform is used for good, they have a multi-pronged approach:
Evelyn Osman, Principal Platform Engineer at AutoScout24, joins Corey on Screaming in the Cloud to discuss the dire need for developers to agree on a standardized tool set in order to scale their projects and innovate quickly. Corey and Evelyn pick apart the new products being launched in cloud computing and discover a large disconnect between what the industry needs and what is actually being created. Evelyn shares her thoughts on why viewing platforms as products themselves forces developers to get into the minds of their users and produces a better end result.

About Evelyn

Evelyn is a recovering improviser currently role playing as a Lead Platform Engineer at Autoscout24 in Munich, Germany. While she says she specializes in AWS architecture and integration after spending 11 years with it, in truth she spends her days convincing engineers that a product mindset will make them hate their product managers less.

Links Referenced:
LinkedIn: https://www.linkedin.com/in/evelyn-osman/

Transcript

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.

Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. My guest today is Evelyn Osman, engineering manager at AutoScout24. Evelyn, thank you for joining me.

Evelyn: Thank you very much, Corey. It's actually really fun to be on here.

Corey: I have to say one of the big reasons that I was enthused to talk to you is that you have been using AWS—to be direct—longer than I have, and that puts you in a somewhat rarefied position where AWS's customer base has absolutely exploded over the past 15 years that it's been around, but at the beginning, it was a very different type of thing. Nowadays, it seems like we've lost some of that magic from the beginning. Where do you land on that whole topic?

Evelyn: That's actually a really good point because I always like to say, you know, when I come into a room, you know, I really started doing introductions like, "Oh, you know, hey," I'm like, you know, "I'm this director, I've done this XYZ," and I always say, you know, "I'm Evelyn, engineering manager, or architect, or however," and then I say, you know, "I've been working with AWS, you know, 11, 12 years," or now I can't quite remember.

Corey: Time becomes a flat circle. The pandemic didn't help.

Evelyn: [laugh] Yeah, I just, like, look at the year, and I'm like, "Jesus. It's been that long." Yeah. And usually, like you know, you get some odd looks like, "Oh, my God, you must be a sage." And for me, I'm… you see how different services kind of, like, have just been reinventions of another one, or they just take a managed service and make another managed service around it. So, I feel that there's a lot of where it's just, you know, wrapping it up in a pretty bow, and calling it something different, it feels like.

Corey: That's what I've been low-key asking people for a while now over the past year, namely, "What is the most foundational, interesting thing that AWS has done lately, that winds up solving for this problem of whatever it is you do as a company? What is it that has foundationally made things better that AWS has put out in the last service? What was it?" And the answers I get are all depressingly far in the past, I have to say. What's yours?
Evelyn: Honestly, I think the biggest game-changer I remember experiencing was at an analyst summit in Stockholm when they announced Lambda.

Corey: That was announced before I even got into this space, as an example of how far back things were. And you're right. That was transformative. That was awesome.

Evelyn: Yeah, precisely. Because before, you know, we were always, like, trying to figure, okay, how do we, like, launch an instance, run some short code, and then clean it up. AWS is going to charge for an hour, so we need to figure out, you know, how to pack everything into one instance, run for one hour. And then they announced Lambda, and suddenly, like, holy shit, this is actually a game changer. We can actually write small functions that do specific things.

And, you know, you go from, like, microservices, like, to, like, tiny, serverless functions. So, that was huge. And then DynamoDB along with that really kind of, like, transformed the entire space for us in many ways. So, back when I was at TIBCO, there were a few innovations around that, even, like, one startup inside TIBCO that quite literally, their entire product was just Lambda functions. And one of their problems was, they wanted to sell in the Marketplace, and they couldn't figure out how to sell Lambda on the marketplace.

Corey: It's kind of wild when we see just how far it's come, but also how much they've announced that doesn't change that much, to be direct. For me, one of the big changes that I remember that really made things better for customers—though it took a couple of years—was EFS. And even that's a little bit embarrassing because all that is, "All right, we finally found a way to stuff a NetApp into us-east-1," so now NFS, just like you used to use it in the 90s and the naughts, can be done responsibly in the cloud. And that, on some level, wasn't a feature launch so much as it was a concession to the ways that companies had built things and weren't likely to change.

Evelyn: Honestly, I found the EFS launch to be a bit embarrassing because, like, you know, when you look closer at it, you realize, like, the performance isn't actually that great.

Corey: Oh, it was horrible when it launched. It would just slam to a halt because you got the IOPS scaled with how much data you stored on it. The documentation explicitly said to use dd to start loading a bunch of data onto it to increase the performance. It's like, "Look, just sandbag the thing so it does what you'd want." And all that stuff got fixed, but at the time it looked like it was clown shoes.

Evelyn: Yeah, and that reminds me of, like, EBS's, like, gp2 when we're, like you know, we're talking, like, okay, provisioned IOPS with gp2. We just kept saying, like, just give yourself a really big volume for performance. And it feels like they just kind of kept that with EFS. And it took years for them to really iterate off of that. Yeah, so, like, EFS was a huge thing, and I see us, we're still using it now today, and like, we're trying to integrate, especially for, like, data center migrations, but yeah, you always see that a lot of these were first more for, like, you know, data centers to the cloud, you know. So, first I had, like, EC2 classic. That's where I started. And I always like to tell a story that in my team, we're talking about using AWS, I was the only person fiercely against it because we did basically large data processing—sorry, I forget the right words—data analytics. There we go [laugh].
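[Editor's aside: a quick sketch of the gp2 behavior Evelyn describes, where you "give yourself a really big volume for performance." The constants follow gp2's documented baseline model as best recalled; treat them as assumptions.]

```python
# gp2 baseline IOPS scale with volume size (roughly 3 IOPS per GiB, with a
# floor of 100 and a ceiling of 16,000), so oversizing the volume was how
# you effectively bought IOPS. Figures are assumptions for illustration.
def gp2_baseline_iops(size_gib):
    return min(max(3 * size_gib, 100), 16_000)

for size in (100, 1_000, 5_334):
    print(f"{size:>6} GiB -> {gp2_baseline_iops(size):,} baseline IOPS")
# 100 GiB -> 300; 1,000 GiB -> 3,000; ~5,334 GiB hits the 16,000 cap.
```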
Corey: I remember that, too. When it first came out, it was, "This sounds dangerous and scary, and it's going to be a flash in the pan because who would ever trust their core compute infrastructure to some random third-party company, especially a bookstore?" And yeah, I think I got that one very wrong.

Evelyn: Yeah, exactly. I was just like, no way. You know, I see all these articles talking about, like, terrible disk performance, and here I am, where it's like, it's my bread and butter. I'm specialized in it, you know? I write code in my sleep and such.

[Yeah, the interesting thing is, I was like, first, it was like, I can 00:06:03] launch services, you know, to kind of replicate what you get in a data center to make it feature comparable, and then it was taking all these complex services and wrapping them up in a pretty bow as a managed service. Like, EKS, I think, was the biggest one, if we're looking at managed services. Technically Elasticsearch, but I feel like that was the redheaded stepchild for quite some time.

Corey: Yeah, there was—Elasticsearch was a weird one, and still is. It's not a pleasant service to run in any meaningful sense. Like, what people actually want as the next enhancement that would excite everyone is, I want a serverless version of this thing where I can just point it at a bunch of data, I hit an API that I don't have to manage, and get Elasticsearch results back from. They finally launched a serverless offering that's anything but. You have to still provision compute units for it, so apparently, the word serverless just means managed service over at AWS-land now. And it just, it ties into the increasing sense of disappointment I've had with almost all of their recent launches versus what I felt they could have been.

Evelyn: Yeah, the interesting thing about Elasticsearch is, a couple of years ago, they came out with OpenSearch, a competing Elasticsearch, after [unintelligible 00:07:08] kind of gave us the finger and changed the licensing. I mean, OpenSearch actually became a really great offering if you run it yourself, but if you use their managed service, you kind of lose all the benefits, in a way.

Corey: I'm curious, as well, to get your take on what I've been seeing that I think could only be described as an internal shift, where it's almost as if there's been a decree passed down that every service has to run its own P&L or whatnot, and as a result, everything that gets put out seems to be monetized in weird ways, even when I'd argue it shouldn't be. The classic example I like to use for this is AWS Config, where it charges you per evaluation, and that happens whenever a cloud resource changes. What that means is that by using the cloud dynamically—the way that they supposedly want us to do—we wind up paying a fee for that as a result. And it's not like anyone is using that service in isolation; it is definitionally being used as people are using other cloud resources, so why does it cost money? And the answer is because literally everything they put out costs money.

Evelyn: Yep, pretty simple. Oftentimes, there's, like, R&D that goes into it, but the charges seem a bit… odd. Like, S3 Storage Lens, I mean, that's, like, you know, if you're talking about services, that was actually a really nice one, a very nice holistic overview, you know, like, I could drill into a data lake and, like, look into things. But if you actually want to get anything useful, you have to pay for it.
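[Editor's aside: a back-of-the-envelope sketch of the per-evaluation pricing point above. The unit prices and volumes are assumptions for illustration, not current AWS list prices.]

```python
# Rough model of AWS Config-style billing: you pay per configuration item
# recorded and per rule evaluation, so cost scales with how dynamically you
# use the cloud. Prices and volumes below are illustrative assumptions.
config_items = 500_000        # resource changes recorded in a month
rule_evaluations = 2_000_000  # evaluations those changes trigger

monthly = config_items * 0.003 + rule_evaluations * 0.001
print(f"~${monthly:,.0f}/month just for watching resources change")
# 500k items and 2M evaluations at the assumed rates come to ~$3,500/month.
```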
Corey: Yeah. Everything seems to, for one reason or another, be stuck in this place where, "Well, if you want to use it, it's going to cost." And what that means is that it gets harder and harder to do anything that even remotely resembles being able to wind up figuring out where's the spend going, or what's it going to cost me as time goes on? Because it's not just what are the resources I'm spinning up going to cost, what are the second, third, and fourth-order effects of that? And the honest answer is, well, nobody knows. You're going to have to basically run an experiment and find out.

Evelyn: Yeah. No, true. So, what I… at AutoScout, we actually ended up doing is—because we're trying to figure out how to tackle these costs—is they—we built an in-house cost allocation solution so we could track all of that. Now, AWS has actually improved Cost Explorer quite a bit, and even, I think, Billing Conductor was one that came out [unintelligible 00:09:21], kind of like, do a custom tiered and account pricing model where you can kind of do the same thing. But even that also, there is a cost with it.

I think that was trying to compete with other, you know, vendors doing similar solutions. But it still isn't something where we see that either there's, like, arbitrarily low pricing there, or the costs itself doesn't really quite make sense. Like, AWS [unintelligible 00:09:45], as you mentioned, it's a terrific service. You know, we try to use it for compliance enforcement and other things, catching bad behavior, but then as soon as people see the price tag, we just run away from it. So, a lot of the security services themselves, actually, the costs, kind of like, goes—skyrockets tremendously when you start trying to use it across a large organization. And oftentimes, the organization isn't actually that large.

Corey: Yeah, it gets to this point where, especially in small environments, you have to spend more energy and money chasing down what the cost is than you're actually spending on the thing. There were blog posts early on that, "Oh, here's how you analyze your bill with Redshift," and that was a minimum 750 bucks a month. It's, well, I'm guessing that that's not really for my $50 a month account.

Evelyn: Yeah. No, precisely. I remember seeing that, like, entire ETL process is just, you know, analyze your invoice. Cost [unintelligible 00:10:33], you know, is fantastic, but at the end of the day, like, what you're actually looking at [laugh], is infinitesimally small compared to all the data in that report. Like, I think oftentimes, it's simply, you know, like, I just want to look at my resources and allocate them in a multidimensional way. Which actually isn't really that multidimensional, when you think about it [laugh].
Corey: Increasingly, Cost Explorer has gotten better. It's not a new service, but every iteration seems to improve it to a point now where I'm talking to folks, and they're having a hard time justifying most of the tools in the cost optimization space, just because, okay, they want a percentage of my spend on AWS to basically be a slightly better version of a thing that's already improving and works for free. That doesn't necessarily make sense. And I feel like that's what you get trapped into when you start going down the VC path in the cost optimization space. You've got to wind up having a revenue model and an offering that scales through software… and I thought, originally, I was going to be doing something like that. At this point, I'm unconvinced that anything like that is really tenable.

Evelyn: Yeah. When you're a small organization you're trying to optimize, you might not have the expertise and the knowledge to do so, so when one of these small consultancies comes along, saying, "Hey, we're going to charge you a really small percentage of your invoice," like, okay, great. That's, like, you know, like, a few $100 a month to make sure I'm fully optimized, and I'm saving, you know, far more than that. But as soon as your invoice turns into, you know, it's like $100,000, or $300,000 or more, that percentage becomes rather significant. And I've had vendors come to me and, like, talk to me and is like, "Hey, we can, you know, for a small percentage, you know, we're going to do this machine learning, you know, AI optimization for you. You know, you don't have to do anything. We guarantee buybacks of your RIs." And as soon as you look at the price tag with it, we just have to walk away. Or oftentimes we look at it, and there are truly very simple ways to do it on your own, if you just kind of put some thought into it.

Corey: While we were talking a bit before this show, you taught me something new about GameLift, which I think is a different problem that AWS has been dealing with lately. I've never paid much attention to it because it is the—as I assume from what it says on the tin, oh, it's a service for just running a whole bunch of games at scale, and I'm not generally doing that. My favorite computer game remains to be Twitter at this point, but that's okay. What is GameLift, though, because you were shining a different light on it, which makes me annoyed that Amazon Marketing has not pointed this out.

Evelyn: Yeah, so I'll preface this by saying, like, I'm not an expert on GameLift. I haven't even spun it up myself because there's quite a bit of a price. I learned this fall while chatting with an SA who works in the gaming space, and it kind of like, I went, like, "Back up a second." If you think about, like, I'm, you know, like, World of Warcraft, all you have are thousands of game clients all over the world, playing the same game, you know, on the same server, in the same instance, and you need to make sure, you know, that when I'm running, and you're running, that we know that we're going to reach the same point at the same time, or if there's one object in that room, that only one of us can get it. So, all these servers are doing is tracking state across thousands of clients.

And GameLift, when you think about your dedicated game service, it really is just multi-region distributed state management. Like, at the basic, that's really what it is. Now, there's, you know, quite a bit more happening within GameLift, but that's what I was going to explain is, like, it's just state management. And there are far more use cases for it than just for video games.

Corey: That's maddening to me because having a global session state store, for lack of a better term, is something that so many customers have built themselves repeatedly. They can build it on top of primitives like DynamoDB global tables, or alternately, you have a dedicated region where that thing has to live and everything far away takes forever to round-trip. If they've solved some of those things, why on earth would they bury it under a gaming-branded service? Like, offer that primitive to the rest of us because that's useful.
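[Editor's aside: a minimal sketch of the primitive Corey describes, session state on DynamoDB, which global tables can replicate across regions. The table name, schema, and TTL attribute are hypothetical.]

```python
# Sketch: a tiny session-state store on DynamoDB. With a global table, the
# same reads/writes work from any replica region. Names are hypothetical.
import time

import boto3

table = boto3.resource("dynamodb").Table("session-state")

def save_state(session_id, state, ttl_seconds=3600):
    table.put_item(Item={
        "session_id": session_id,
        "state": state,
        # TTL attribute so abandoned sessions expire automatically.
        "expires_at": int(time.time()) + ttl_seconds,
    })

def load_state(session_id):
    item = table.get_item(Key={"session_id": session_id}).get("Item")
    return item["state"] if item else None

save_state("player-42", {"zone": "vault", "has_key": True})
print(load_state("player-42"))
```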
Evelyn: No, absolutely. And honestly, I wouldn't be surprised if you peeled back the curtain with GameLift, you'll find a lot of—like, several other, you know, AWS services that it's just built on top of. As I kind of mentioned earlier, like, what I see now with innovation, it's like we just see other services packaged together and released as a new product.

Corey: Yeah, IoT had the same problem going on for years where there was a lot of really good stuff buried in there, like IoT Events. People were talking about using that for things like browser extensions and whatnot, but you need to be explicitly told that that's a thing that exists and is handy, but otherwise you'd never know it was there because, "Well, I'm not building anything that's IoT-related. Why would I bother?" It feels like that was one direction that they tended to go in.

And now they take existing services that are, mmm, kind of milquetoast, if I'm being honest, and then saying, "Oh, like, we have Comprehend that does, effectively, detection of themes, keywords, and whatnot, from text. We're going to wind up re-releasing that as Comprehend Medical." Same type of thing, but now focused on a particular vertical. Seems to me that instead of being a specific service for that vertical, just improve the baseline service and offer HIPAA compliance if it didn't exist already, and you're mostly there. But what do I know? I'm not a product manager trying to get promoted.

Evelyn: Yeah, that's true. Well, I was going to mention that maybe it's the HIPAA compliance, but actually, a lot of their services already have HIPAA compliance. And I've stared far too long at that compliance section on AWS's site to know this, but you know, a lot of them actually are HIPAA-compliant, they're PCI-compliant, and ISO-compliant, and you know, and everything. So, I'm actually pretty intrigued to know why they [wouldn't 00:16:04] take that advantage.

Corey: I just checked. Amazon Comprehend is itself HIPAA-compliant and is qualified and certified to hold Personal Health Information—PHI—Private Health Information, whatever the acronym stands for. Now, what's the difference, then, between that and Medical? In fact, the HIPAA section says for Comprehend Medical, "For guidance, see the previous section on Amazon Comprehend." So, there's no difference from a regulatory point of view.

Evelyn: That's fascinating. I am intrigued because I do know that, like, within AWS, you know, they have different segments, you know? There's, like, Digital Native Business, there's Enterprise, there's Startup. So, I am curious how things look over on the engineering side. I'm going to talk to somebody about this now [laugh].
Corey: Yeah, it's the—like, I almost wonder, on some level, it feels like, "Well, we wound up building this thing in the hopes that someone would use it for something. And well, if we just use different words, it checks a box in some analyst's chart somewhere." I don't know. I mean, I hate to sound that negative about it, but it's… increasingly when I talk to customers who are active in these spaces around the industry-vertical-targeted stuff aimed at their industry, they're like, "Yeah, we took a look at it. It was adorable, but we're not using it that way. We're going to use either the baseline version or we're going to work with someone who actively gets our industry." And I've heard that repeated about three or four different releases that they've put out across the board of what they've been doing. It feels like it is a misunderstanding between what the world needs and what they're able to or willing to build for us.

Evelyn: Not sure. I wouldn't be surprised, if we go far enough, it could probably be that it's just a product manager saying, like, "We have to advertise directly to the industry." And if you look at it, you know, in the backend, you know, it's an engineer, you know, kicking off a build and just changing the name from Comprehend to Comprehend Medical.

Corey: And, on some level, too, they're moving a lot more slowly than they used to. There was a time where they were, in many cases, if not the first mover, the first one to do it well. Take CodeWhisperer, their AI-powered coding assistant. That would have been a transformative thing if GitHub Copilot hadn't beaten them to every punch, come out with new features, and frankly, in head-to-head experiments that I've run, came out way better as a product than what CodeWhisperer is. And while I'd like to say that this is great, but it's too little too late. And when I talk to engineers, they're very excited about what Copilot can do, and the only people I see who are even talking about CodeWhisperer work at AWS.

Evelyn: No, that's true. And so, I think what's happening—and this is my opinion—is that first you had AWS, like, launching really innovative new services, you know, that kind of like, it's like, "Ah, it's a whole new way of running your workloads in the cloud." Instead of, you know, basically hiring a whole team, I just click a button, you have your instance, you use it, sell software, blah, blah, blah, blah. And then they went towards serverless, and then IoT, and then it started targeting large data lakes, and then eventually that kind of ran backwards towards security, after the umpteenth S3 data leak.

Corey: Oh, yeah. And especially now, like, so they had a hit in some corners with SageMaker, so now there are 40 services all starting with the word SageMaker. That's always pleasant.

Evelyn: Yeah, precisely. And what I kind of notice is… now they're actually having to run it even further back because they caught all the corporations that could pivot to the cloud, they caught all the startups who started in the cloud, and now they're going for the larger behemoths who have massive data centers, and they don't want to innovate. They just want to reduce this massive sysadmin team. And I always like to use the example of Bare Metal. When that came out in 2019, everybody—we all kind of scratched our heads. I'm like, really [laugh]?

Corey: Yeah, I could see where it makes some sense just for very specific workloads that involve things like specific capabilities of processors that don't work under emulation in some weird way, but it's also such a weird niche that I'm sure it's there for someone. My default assumption, just given the breadth of AWS's customer base, is that whenever I see something that they just announced, well, okay, it's clearly not for me; that doesn't mean it's not meeting the needs of someone who looks nothing like me. But increasingly as I start exploring the industry and these services have time to percolate in the popular imagination and I still don't see anything interesting coming out with it, it really makes you start to wonder.

Evelyn: Yeah. But then, like, I think, like, roughly a year or something, right after Bare Metal came out, they announced Outposts. So, then it was like, another way to just stay within your data center and be in the cloud.
Corey: Yeah. There's a bunch of different ways they have that, okay, here's ways you can run AWS services on-prem, but still pay us by the hour for the privilege of running things that you have living in your facility. And that doesn't seem like it's quite fair.

Evelyn: That's exactly it. So, I feel like now it's sort of in diminishing returns and sort of doing more cloud-native work compared to, you know, these huge opportunities, which is everybody who still has a data center for various reasons, or they're cloud-native, and they grow so big, that they actually start running their own data centers.

Corey: I want to call out as well before we wind up being accused of being oblivious, that we're recording this before re:Invent. So, it's entirely possible—I hope this happens—that they announce something or several some things that make this look ridiculous, and we're embarrassed to have had this conversation. And yeah, they're totally getting it now, and they have completely surprised us with stuff that's going to be transformative for almost every customer. I've been expecting and hoping for that for the last three or four re:Invents now, and I haven't gotten it.

Evelyn: Yeah, that's right. And I think there are even new service launches that actually are missing fairly obvious things in a way. Like, mine is the Managed Workflow for Amazon—it's Managed Airflow, sorry. So, we were using Data Pipeline for, you know, big ETL processing, so it was an in-house tool we kind of built at Autoscout, we do platform engineering.

And it was deprecated, so we looked at a new—what to replace it with. And so, we looked at Airflow, and we decided this is the way to go, we want to use managed because we don't want to maintain our own infrastructure. And the problem we ran into is that it doesn't have support for shared VPCs. And we actually talked to our account team, and they were confused. Because they said, like, "Well, every new service should support it natively." But it just didn't have it. And that's kind of what I found is, like, it feels like sometimes it's getting rushed out the door, and they'll actually have a new managed service or new service launched, but they're also sort of cutting some corners just to actually make sure it's packaged up and ready to go.

Corey: When I'm looking at this, and seeing how this stuff gets packaged, and how it's built out, I start to understand a pattern that I've been relatively down on across the board. I'm curious to get your take because you work at a fairly sizable company as an engineering manager, running teams of people who do this sort of thing. Where do you land on the idea of companies building internal platforms to wrap around the offerings that the cloud service providers that they use make available to them?
Evelyn: So, my opinion is that you need to build out some form of standardized tool set in order to actually be able to innovate quickly. Now, this sounds counterintuitive because everyone is like, "Oh, you know, if I want to innovate, I should be able to do this experiment, and try out everything, and use what works, and just release it." And that greatness [unintelligible 00:23:14] mentality, you know, it's like five talented engineers working to build something. But when you have, instead of five engineers, five teams of five engineers each, every single team does something totally different. You know, one uses Scala, another one TypeScript, another one, you know, .NET, and then there could have been a [last 00:23:30] one, you know, comes in, you know, saying they're still using Ruby.

And then next thing you know, you know, you have, like, incredibly diverse platforms for services. And if you want to do any sort of, like, hiring or cross-training, it becomes incredibly difficult. And actually, as the organization grows, you want to hire talent, and so you're going to have to hire, you know, a developer for this team, you're going to have to hire, you know, a Ruby developer for this one, a Scala guy here, a Node.js guy over there.

And so, this is where we say, "Okay, let's agree. We're going to be a Scala shop. Great. All right, are we running serverless? Are we running containerized?" And you agree on those things. So, that's already, like, the formation of it. And oftentimes, you start with DevOps. You'll say, like, "I'm a DevOps team," you know, or doing a DevOps culture, if you do it properly, but you always hit this scaling issue where you start growing, and then how do you maintain that common tool set? And that's where we start looking at, you know, having a platform… approach, but I'm going to say it's Platform-as-a-Product. That's the key.

Corey: Yeah, that's a good way of framing it because originally, the entire world needed that. That's what RightScale was when EC2 first came out. It was a reimagining of the EC2 console that was actually usable. And in time, AWS improved that to the point where RightScale didn't really have a place anymore in a way that it had previously, and that became a business challenge for them. But you have, what is it now, 200, 300 services that AWS has put out, and okay, great. Most companies are really only actively working with a handful of those. How do you make those available in a reasonable way to your teams, in ways that aren't distracting, dangerous, et cetera? I don't know the answer on that one.

Evelyn: Yeah. No, that's true. So, full disclosure. At AutoScout, we do platform engineering. So, I'm part of, like, the platform engineering group, and we built a platform for our product teams. It's kind of like, you need to decide to [follow 00:25:24] those answers, you know? Like, are we going to be fully containerized? Okay, then, great, we're going to use Fargate. All right, how do we do it so that developers don't actually—don't need to think that they're running Fargate workloads?

And that's, like, you know, where it's really important to have those standardized abstractions that developers actually enjoy using. And I'd even say that, before you start saying, "Ah, we're going to do platform," you say, "We should probably think about developer experience." Because you can do a developer experience without a platform. You can do that, you know, in a DevOps approach, you know? It's basically build tools that make it easy for developers to write code. That's the first step for anything. It's just, like, you have people writing the code; make sure that they can do the things easily, and then look at how to operate it.
Corey: That sure would be nice. There's a lack of focus on usability, especially when it comes to a number of developer tools that we see out there in the wild, in that they're clearly built by people who understand the problem space super well, but they're designing these things to be used by people who just want to make the website work. They don't have the insight, the knowledge, the approach, any of it, nor should they necessarily be expected to.

Evelyn: No, that's true. And what I see is, a lot of the times, it's a couple really talented engineers who are just getting shit done, and they get shit done however they can. So, it's basically like, if they're just trying to run the website, they're just going to write the code to get things out there and call it a day. And then somebody else comes along, has a heart attack when they see what's been done, and they're kind of stuck with it because there are no guardrails or paved path or however you want to call it.

Corey: I really hope—truly—that this is going to be something that we look back on and laugh about when this episode airs, that, "Oh, yeah, we just got it so wrong. Look at all the amazing stuff that came out of re:Invent." Are you going to be there this year?

Evelyn: I am going to be there this year.

Corey: My condolences. I keep hoping people get to escape.

Evelyn: This is actually my first one in, I think, five years. So, I mean, the last time I was there was when everybody was going crazy over pins. And I still have a bag of them [laugh].

Corey: Yeah, that did seem like a hot-second collectable moment, didn't it?

Evelyn: Yeah. And then at the—I think, what, the very last day, as everybody's heading to re:Play, you could just go into the registration area, and they just had, like, bags of them lying around to take. So, all the competing, you know, to get the requirements for a pin was kind of moot [laugh].

Corey: Don't you hate it at some point where it's like, you feel like I'm going to finally get this crowning achievement, and it's like, or just show up at the buffet at the end and grab one of everything, and wow, that would have saved me a lot of pain and trouble.

Evelyn: Yeah.

Corey: Ugh, scavenger hunts are hard, as I'm about to learn to my own detriment.

Evelyn: Yeah. No, true. Yeah. But I am really hoping that re:Invent proves me wrong. Embarrassingly wrong, and then all my colleagues can proceed to mock me for this ridiculous podcast that I made with you. But I am a fierce skeptic. Optimistic nihilist, but still a nihilist, so we'll see how re:Invent turns out.

Corey: So, I am curious, given your experience at more large companies than I tend to be embedded with for any period of time, how have you found that these large organizations tend to pick up new technologies? What does the adoption process look like? And honestly, if you feel like throwing some shade, how do they tend to get it wrong?
Or, like, I think the best example is DevOps, you know, where you say, “Ah, we're going to adopt DevOps, we're going to have a DevOps team, or have a DevOps engineer.”Corey: Step one: we're going to rebadge everyone with existing job titles to have the new fancy job titles that reflect it. It turns out that's not necessarily sufficient in and of itself.Evelyn: Not really. The Spotify model. People say, like, “Oh, we're going to do the Spotify model. We're going to do squads, tribes, you know, and everything. It's going to be awesome, it's going to be great, you know, and nice, cross-functional.”The reason I say it fails on us every single time is because somebody wants to be in control of the process, and if the process is meant to encourage collaboration and innovation, that person actually becomes a chokepoint for it. And it could be somebody that says, like, “Ah, I need to be involved in every single team, and listen in to know what's happening, just so I'm aware of it.” What ends up happening is that everybody defers to them. So, there is no collaboration, there is no innovation. DevOps, you say, like, “Hey, we're going to have a team to do everything, so your developers don't need to worry about it.” What ends up happening is you're still an ops team, you still have your silos.And that's always the challenge: you actually have to say, “Okay, what are the cultural values around this process?” You know, what is SRE? What is DevOps, you know? Is it seen as a process, is it a series of principles, a platform, maybe, you know? We have to say, like—that's why I say Platform-as-a-Product, because you need to have that product mindset, that culture of product thinking, to really build a platform that works, because it's all about the user journey.It's not about building a common set of tools. It's the user journey of how a person interacts with their code to get it into a production environment. And so, you need to understand how that person sits down at their desk, starts the laptop up, logs in, opens the IDE, what they're actually trying to get done. And once you understand that, then you know your requirements, and you build something to fill those needs so that they are happy to use it, as opposed to saying, “This is our platform, and you're going to use it.” And they're probably going to say, “No.” And the next thing you know, they're just doing their own thing on the side.Corey: Yeah, the rise of Shadow IT has never gone away. On some level, it's the natural expression, I think an immune reaction, that companies tend to have when process gets in the way. Great, we have an outcome that we need to drive towards; we don't have a choice. Cloud empowered a lot of that and also has given tools to help rein it in, and as with everything, the arms race continues.Evelyn: Yeah. And so, what I'm going to do now is continue to, kind of, toot the platform horn. So, Gregor Hohpe, he's a [solutions architect 00:31:56]—I always f- up his name. I'm so sorry, Gregor. He has a great book, and even a talk, called The Magic of Platforms, and if somebody is actually curious about understanding why platforms are nice, they should really watch that talk.If you see him at re:Invent, or a summit or somewhere giving a talk, go listen to that, and just pick his brain.
Because for me, I really strongly agree with his approach, because that's really how, as he says, you boost innovation: by actually building a platform that really works.Corey: Yeah, it's a hard problem, but it's also one of those things where you're trying to focus on—at least ideally—an outcome or a better situation than you currently find yourselves in. It's hard to turn down things that might very well get you there sooner, faster, but it's like trying to effectively cargo-cult the leadership principles from your last employer into your new one. It just doesn't work. I mean, you see startups from Amazonians who try that, and it just goes horribly, because without the cultural understanding and the supporting structures, it doesn't work.Evelyn: Exactly. So, I've worked with, like, organizations of 4,000-plus people, I've worked for, like, small startups, consulted, and this is why I say almost every single transformation fails the first time, because somebody needs to be in control and track things and basically be really, really certain that people are doing it right. And as soon as it blows up in their face, that's when they realize they should actually take a step back. And so, even for building out a platform, you know, doing Platform-as-a-Product, I always reiterate that you have to really be willing to just invest upfront, and not get very much back. Because you have to figure out the whole user journey, and what you're actually building, before you actually build it.Corey: I really want to thank you for taking the time to speak with me today. If people want to learn more, where's the best place for them to find you?Evelyn: So, I used to be on Twitter, but I've actually gotten off there after it kind of turned a bit toxic and crazy.Corey: Feels like that was years ago, but that's beside the point.Evelyn: Yeah, precisely. So, I would even just say, because this feels like a corporate show, find me on LinkedIn of all places, because I will be sharing whatever I find on there, you know? So, just look me up by my name, Evelyn Osman, and give me a follow, and I'll probably be screaming into the cloud like you are.Corey: And we will, of course, put links to that in the show notes. Thank you so much for taking the time to speak with me. I appreciate it.Evelyn: Thank you, Corey.Corey: Evelyn Osman, engineering manager at AutoScout24. I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice, and I will read it once I finish building an internal platform to normalize all of those platforms together into one.Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business, and we get to the point. Visit duckbillgroup.com to get started.
AWS Morning Brief for the week of November 20, 2023 with Corey Quinn.
Links:
re:Quinnvent Wednesday night drinkup at Atomic Liquors
Nature Walk
Amazon CloudWatch Logs announces regular expression filter pattern support for Live Tail
Amazon EBS announces Snapshot Lock to protect snapshots from inadvertent or malicious deletions
Amazon MSK Serverless now supports all programming languages
Amazon Time Sync Service now supports microsecond-accurate time
AWS CloudTrail Lake announces new pricing option optimized for flexible retention
AWS Cost Explorer now provides more historical and granular data
AWS announces IPv6 tiered VPCs and subnets
AWS Lambda console now features a single pane view of metrics, logs, and traces
Announcing Research and Engineering Studio on AWS
Announcing PartyRock, an Amazon Bedrock Playground
Amazon Bedrock now provides access to Meta's Llama 2 Chat 13B model
Happy anniversary, Amazon CloudFront: 15 years of evolution and internet advancements
New – Multi-account search in AWS Resource Explorer
Introducing instance maintenance policy for Amazon EC2 Auto Scaling
The serverless attendee's guide to AWS re:Invent 2023
Amazon EKS and Kubernetes sessions at AWS re:Invent 2023
Optimize AZ traffic costs using Amazon EKS, Karpenter, and Istio
Editorial:
Join us for a week of AWS Amplify launches
Today's Day Two Cloud kicks off an occasional series on cloud essentials. For the first episode we discuss the Virtual Private Cloud (VPC). A VPC is a fundamental construct of a public cloud. It's essentially your slice of the shared cloud infrastructure, and you can launch and run other elements within a VPC to support your workload. Ned Bellavance walks through key VPC components including regions and AZs, networking and IP addressing, paid add-ons, data egress and associated charges, monitoring and troubleshooting, and basic security controls. The post Day Two Cloud 209: Cloud Essentials – Virtual Private Clouds (VPCs) appeared first on Packet Pushers.
In this episode, Woody dives into the world of cloud security using open source systems with our special guest, Susan Hinrichs. Susan Hinrichs, Chief Scientist at Aviatrix, is a multifaceted professional with a strong background in the open source networking and security space. As a designer and implementer, she has contributed significantly to the development of the distributed cloud firewall. Susan's expertise extends well beyond traditional networking, encompassing diverse areas such as cloud routing, application security, policy-based traffic engineering, and distributed systems. Throughout this insightful conversation, Susan discusses the advantages of open source platforms, Aviatrix's contributions to the open source community, and the open source DNA of the Aviatrix Distributed Cloud Firewall. Susan and Woody also explore possible directions for the Distributed Cloud Firewall and the role that AI and ML could play in network security. Learn more about Altitude and host Woody: https://aviatrix.com/altitude/ Susan's LinkedIn: https://www.linkedin.com/in/shinrich/
Timestamps:
[00:02:11] Group responsible for traffic termination and scrubbing. Used open source software and contributed back.
[00:06:55] Extended Berkeley Packet Filter (eBPF) enables efficient traffic analysis in kernel space, particularly for dropping network traffic at low levels with minimal effort. It provides a more cost-effective alternative to iptables for implementing firewall policies.
[00:10:07] Approach: not everyone is root. All processes aren't root. Need to elevate. A complicated product made simple.
[00:14:27] OpenStack's limitations revealed as enterprise-scale businesses require dedicated specialists, making it costly. Distributed cloud firewall innovates multicloud security. Scaling security in the cloud is challenging due to layer 3 and up-the-stack complexities.
[00:16:38] Distributed firewall challenges and solutions summarized.
[00:21:53] Smart groups are created with tags on VMs, subnets, and VPCs. These groups are used to create rules for traffic routing. With the Aviatrix fabric, gateways are protected, and traffic routes are understood. The controller analyzes gateways and enforces rules accordingly. Rules are pushed or pulled to the gateways.
[00:26:15] Security group orchestration across different cloud platforms has limitations due to varying models and rule limits. Difficulties arise when translating intermixed allows and denies into only allows, potentially causing networks to split and requiring more rules. Despite extensive work, there are cases where policy expression is not possible. Other tools, like VMware and Cisco, offer similar orchestration capabilities, but the physical enforcement points may still restrict the unified view presented to customers.
[00:30:30] Moving towards intrusion protection, analytics, and service mesh for enhanced security.
[00:34:05] The impact of AI and machine learning on security systems.
[00:35:16] AI helps with alarm fatigue and data correlation.
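For readers who want to see the eBPF idea from [00:06:55] in the smallest possible form, here is a hedged sketch using the BCC toolkit: an XDP program that drops inbound TCP port 80 traffic in kernel space before the network stack ever touches it. The interface name and port are placeholders, this is illustrative rather than anything from Aviatrix's product, and it needs root plus BCC installed.

```python
#!/usr/bin/env python3
# Minimal XDP drop filter via BCC (https://github.com/iovisor/bcc).
from bcc import BPF

prog = r"""
#include <uapi/linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/in.h>
#include <linux/ip.h>
#include <linux/tcp.h>

int drop_tcp_80(struct xdp_md *ctx) {
    void *data = (void *)(long)ctx->data;
    void *data_end = (void *)(long)ctx->data_end;

    struct ethhdr *eth = data;
    if ((void *)(eth + 1) > data_end) return XDP_PASS;
    if (eth->h_proto != htons(ETH_P_IP)) return XDP_PASS;

    struct iphdr *ip = (void *)(eth + 1);
    if ((void *)(ip + 1) > data_end) return XDP_PASS;
    if (ip->protocol != IPPROTO_TCP) return XDP_PASS;

    struct tcphdr *tcp = (void *)ip + ip->ihl * 4;
    if ((void *)(tcp + 1) > data_end) return XDP_PASS;

    // Drop TCP traffic destined for port 80 before the kernel stack sees it.
    if (tcp->dest == htons(80)) return XDP_DROP;
    return XDP_PASS;
}
"""

b = BPF(text=prog)
fn = b.load_func("drop_tcp_80", BPF.XDP)
b.attach_xdp("eth0", fn, 0)  # "eth0" is an assumption; requires root
print("Dropping TCP/80 on eth0; Ctrl-C to detach")
try:
    b.trace_print()  # blocks; this program emits no trace output
except KeyboardInterrupt:
    b.remove_xdp("eth0", 0)
```

The point Susan makes holds even in this toy: the verdict is rendered per-packet at the driver level, which is why it can be so much cheaper than bouncing everything through an iptables chain.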
Jake Gold, Infrastructure Engineer at Bluesky, joins Corey on Screaming in the Cloud to discuss his experience helping to build Bluesky and why he's so excited about it. Jake and Corey discuss the major differences when building a truly open-source social media platform, and Jake highlights his focus on reliability. Jake explains why he feels downtime can actually be a huge benefit to reliability engineers, and why how he views abstractions based on the size of the team he's working on. Corey and Jake also discuss whether cloud is truly living up to its original promise of lowered costs. About JakeJake Gold leads infrastructure at Bluesky, where the team is developing and deploying the decentralized social media protocol, ATP. Jake has previously managed infrastructure at companies such as Docker and Flipboard, and most recently, he was the founding leader of the Robot Reliability Team at Nuro, an autonomous delivery vehicle company.Links Referenced: Bluesky: https://blueskyweb.xyz/ Bluesky waitlist signup: https://bsky.app TranscriptAnnouncer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. In case folks have missed this, I spent an inordinate amount of time on Twitter over the last decade or so, to the point where my wife, my business partner, and a couple of friends all went in over the holidays and got me a leather-bound set of books titled The Collected Works of Corey Quinn. It turns out that I have over a million words of shitpost on Twitter. If you've also been living in a cave for the last year, you'll notice that Twitter has basically been bought and driven into the ground by the world's saddest manchild, so there's been a bit of a diaspora as far as people trying to figure out where community lives.Jake Gold is an infrastructure engineer at Bluesky—which I will continue to be mispronouncing as Blue-ski because that's the kind of person I am—which is, as best I can tell, one of the leading contenders, if not the leading contender to replace what Twitter was for me. Jake, welcome to the show.Jake: Thanks a lot, Corey. Glad to be here.Corey: So, there's a lot of different angles we can take on this. We can talk about the policy side of it, we can talk about social networks and things we learn watching people in large groups with quasi-anonymity, we can talk about all kinds of different nonsense. But I don't want to do that because I am an old-school Linux systems administrator. And I believe you came from the exact same path, given that as we were making sure that I had, you know, the right person on the show, you came into work at a company after I'd left previously. So, not only are you good at the whole Linux server thing; you also have seen exactly how good I am not at the Linux server thing.Jake: Well, I don't remember there being any problems at TrueCar, where you worked before me. But yeah, my background is doing Linux systems administration, which turned into, sort of, Linux programming. And these days, we call it, you know, site reliability engineering. But yeah, I discovered Linux in the late-90s, as a teenager and, you know, installing Slackware on 50 floppy disks and things like that. 
And I just fell in love with the magic of, like, being able to run a web server, you know? I got a hosting account at, you know, my local ISP, and I was like, how do they do that, right?And then I figured out how to do it. I ran Apache, and it's, like, still one of my core memories: getting, you know, httpd running and being able to access it over the internet and telling my friends on IRC. And so, I've done a whole bunch of things since then, but that's still, like, the part that I love the most.Corey: The thing that continually surprises me is that just when I think I'm out, and we've moved into a fully modern world where, oh, all I do is write code anymore (which I didn't realize I was doing until I realized that if you call YAML code, you can get away with anything), I get dragged back in. It's the falling back to fundamentals in these weird moments of yes, yes, immutable everything, Infrastructure as Code, but when the server is misbehaving and you want to log in and get your hands dirty, the skill set rears its head yet again. At least that's what I've been noticing as I've gone down a number of interesting IoT-based projects lately. Is that something you experience or have you evolved fully and not looked back?Jake: Yeah. No, what I try to do is, on my personal projects, I'll use all the latest cool, flashy things, any abstraction you want, I'll try out everything, and then what I do at work, I kind of have, like, a one or two year, sort of, lagging adoption of technologies, like, when I've actually shaken them out in my own stuff, then I use them at work. But yeah, I think one of my favorite quotes is, like, “Programmers first learn the power of abstraction, then they learn the cost of abstraction, and then they're ready to program.” And that's how I view infrastructure, very similar thing where, you know, certain abstractions like container orchestration, or you know, things like that can be super powerful if you need them, but like, you know, that's generally very large companies with lots of teams and things like that. And if you're not that, it pays dividends to not use overly complicated, overly abstracted things. And so, that tends to be [where 00:04:22] I land most of the time.Corey: I'm sure someone's going to consider this to be heresy, but if I'm tasked with getting a web application up and running in short order, I'm putting it on an old-school traditional three-tier architecture where you have a database server, a web server or two, and maybe a job server that lives between them. Because is it the hotness? No. Is it going to be resume bait? Not really.But you know, it's deterministic as far as where things live. When something breaks, I know where to find it. And you can miss me with the, “Well, that's not webscale,” response because yeah, by the time I go from getting something up overnight to “this has to serve the entire internet,” there's probably a number of architectural iterations I'm going to be able to go through. The question is, what am I most comfortable with and what can I get things up and running with that's tried and tested?I'm also remarkably conservative on things like databases and file systems because mistakes at that level are absolutely going to show.
Now, I don't know how much you're able to talk about the Blue-ski infrastructure without getting yelled at by various folks, but how modern versus… reliable—I guess that's probably a fair axis to put it on: modernity versus reliability—where on that spectrum, does the official Blue-ski infrastructure land these days?Jake: Yeah. So, I mean, we're in a fortunate position of being an open-source company working on an open protocol, and so we feel very comfortable talking about basically everything. Yeah, and I've talked about this a bit on the app, but the basic idea we have right now is we're using AWS, we have auto-scaling groups, and those auto-scaling groups are just EC2 instances running Docker CE—the Community Edition—for the runtime and for containers. And then we have a load balancer in front and a Postgres multi-AZ instance in the back on RDS, and it is really, really simple.And, like, when I talk about the difference between, like, a reliability engineer and a normal software engineer is, software engineers tend to be very feature-focused, you know, they're adding capabilities to a system. And the goal and the mission of a reliability team is to focus on reliability, right? Like, that's the primary thing that we're worried about. So, what I find to be the best resume builder is that I can say with a lot of certainty that if you talk to any teams that I've worked on, they will say that the infrastructure I ran was very reliable, it was very secure, and it ended up being very scalable because you know, the way we solve the, sort of, integration thing is you just version your infrastructure, right? And I think this works really well.You just say, “Hey, this was the way we did it now and we're going to call that V1. And now we're going to work on V2. And what should V2 be?” And maybe that does need something more complicated. Maybe you need to bring in Kubernetes, you maybe need to bring in a super-cool reverse proxy that has all sorts of capabilities that your current one doesn't.Yeah, but by versioning it, you just—it takes away a lot of the, sort of, interpersonal issues that can happen where, like, “Hey, we're replacing Jake's infrastructure with Bob's infrastructure or whatever.” I just say it's V1, it's V2, it's V3, and then I find that solves a huge number of the problems with that sort of dynamic. But yeah, at Bluesky, like, you know, the big thing that we are focused on is federation is scaling for us because the idea is not for us to run the entire global infrastructure for AT Proto, which is the protocol that Bluesky is based on. The idea is that it's this big open thing like the web, right? Like, you know, Netscape popularized the web, but they didn't run every web server, they didn't run every search engine, right, they didn't run all the payment stuff. They just did all of the core stuff, you know, they created SSL, right, which became TLS, and they did all the things that were necessary to make the whole system large, federated, and scalable. But they didn't run it all. And that's exactly the same goal we have.Corey: The obvious counterexample is, no, but then you take basically their spiritual successor, which is Google, and they build the security, they build—they run a lot of the servers, they have the search engine, they have the payments infrastructure, and then they turn a lot of it off for fun and… I would say profit, except it's the exact opposite of that. But I digress. 
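As a reference point, here is a hedged boto3 sketch of the shape Jake describes: a fixed-size auto-scaling group of Docker hosts behind a load balancer, with the infrastructure version carried in the name so a v2 can be stood up next to v1. The API calls are real boto3; every name, subnet ID, and ARN is invented.

```python
# Hedged sketch of a versioned, fixed-size ASG of Docker hosts.
# All identifiers below are placeholders, not Bluesky's actual setup.
import boto3

asg = boto3.client("autoscaling")

asg.create_auto_scaling_group(
    AutoScalingGroupName="pds-v1",  # version the stack, not the team
    LaunchTemplate={
        # Launch template whose AMI/user data installs Docker CE (assumed)
        "LaunchTemplateName": "pds-docker-host",
        "Version": "$Latest",
    },
    MinSize=3,
    MaxSize=3,  # fixed size: no dynamic scaling, just refreshes
    VPCZoneIdentifier="subnet-aaa,subnet-bbb,subnet-ccc",
    TargetGroupARNs=["arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/pds-v1/abc123"],
)

# The immutable-server pattern: replace every instance with fresh ones
# in one call, cycling hosts behind the load balancer.
asg.start_instance_refresh(AutoScalingGroupName="pds-v1")
```

The versioning trick Jake mentions falls out naturally: "pds-v2" is just the same calls with a new name and a new launch template, and cutover becomes a load balancer change rather than an argument about whose infrastructure wins.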
I do have a question for you that I love to throw at people whenever they start talking about how their infrastructure involves auto-scaling. And I found this during the pandemic: a lot of people believed in their heart-of-hearts that they were auto-scaling, but people lie, mostly to themselves. And you would look at their daily or hourly infrastructure spend after their user traffic dropped off a cliff, and their spend was so flat you could basically set a table on top of it and eat off of it. If you pull up Cost Explorer and look through your environment, how large are the peaks and valleys over the course of a given day or week cycle?Jake: Yeah, no, that's a really good point. I think my basic approach right now is that we're so small, we don't really need to optimize very much for cost, you know? We have this sort of base level of traffic and it's not worth a huge amount of engineering time to do a lot of dynamic scaling and things like that. The main benefit we get from auto-scaling groups is really just doing the refresh to replace all of them, right? So, we're also doing the immutable server concept, right, which was popularized by Netflix.And so, that's what we're really getting from auto-scaling groups. We're not even doing dynamic scaling, right? So, it's not keyed to some metric, you know, the number of instances that we have at the app server layer. But the cool thing is, you can do that when you're ready for it, right? The big issue is, you know, okay, you're scaling up your app instances, but is your database scaling up, right, because there's not a lot of use in having a whole bunch of app servers if the database is overloaded? And that tends to be the bottleneck for, kind of, any complicated kind of application like ours. So, right now, the bill is very flat; you could eat off it, if it wasn't for the CDN traffic and the load balancer traffic and things like that, which are relatively minor.Corey: I just want to stop for a second and marvel at just how educated that answer was. It's, I talk to a lot of folks who are early-stage who come and ask me about their AWS bills and what sort of things should they concern themselves with, and my answer tends to surprise them, which is, “You almost certainly should not unless things are bizarre and ridiculous. You are not going to build your way to your next milestone by cutting costs or optimizing your infrastructure.” The one thing that I would make sure to do is plan for a future of success, which means having account segregation where it makes sense, having tags in place so that when, “Huh, this thing's gotten really expensive. What's driving all of that?” can be answered without a six-week research project attached to it.But those are baseline AWS Hygiene 101. As for how do I optimize my bill further, usually the right answer is: go build. Don't worry about the small stuff. What's always disturbing is people have that perspective and they're spending $300 million a year. But it turns out that not caring about your AWS bill was, in fact, a zero interest rate phenomenon.Jake: Yeah. So, we do all of those basic things. I think I went a little further than many people would where every single one of our—so we have different projects, right? So, we have the big graph server, which is sort of like the indexer for the whole network, and we have the PDS, which is the Personal Data Server, which is, kind of, where all of people's actual social data goes, your likes and your posts and things like that.
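Circling back to Corey's auto-scaling test, here is a hedged sketch of the check he describes: pull daily unblended cost from Cost Explorer and see whether the line has peaks and valleys or is flat enough to eat off of. The boto3 service and response shapes are real; the dates are placeholders.

```python
# Hedged sketch: daily spend from Cost Explorer (example dates).
import boto3

ce = boto3.client("ce")

resp = ce.get_cost_and_usage(
    TimePeriod={"Start": "2023-05-01", "End": "2023-05-08"},  # End is exclusive
    Granularity="DAILY",
    Metrics=["UnblendedCost"],
)

for day in resp["ResultsByTime"]:
    amount = float(day["Total"]["UnblendedCost"]["Amount"])
    print(day["TimePeriod"]["Start"], f"${amount:,.2f}")

# If these numbers barely move while traffic doubles and halves,
# you are not actually auto-scaling in any meaningful sense.
```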
And then we have a dev, staging, sandbox, prod environment for each one of those, right? And there are more services besides. But the way we have it is those are all in completely separated VPCs with no peering whatsoever between them. They are all on distinct IP addresses, IP ranges, so that we could do VPC peering very easily across all of them.Corey: Ah, that's someone who's done data center work before with overlapping IP address ranges and swore, never again.Jake: Exactly. That is where I have been burned. I have cleaned up my mess and other people's messes. And there's nothing less fun than renumbering a large complicated network. But yeah, once we have all these separate VPCs, it's very easy for us to say, hey, we're going to take this whole stack from here and move it over to a different region, a different provider, you know?And the other thing we're doing is, we're completely cloud agnostic, right? I really like AWS, I think they are the… the market leader for a reason: they're very reliable. But we're building this large federated network, so we're going to need to place infrastructure in places where AWS doesn't exist, for example, right? So, we need the ability to take an environment and replicate it in wherever. And of course, they have very good coverage, but there are places they don't exist. And that's all made much easier by the fact that we've had a very strong separation of concerns.Corey: I always found it fun that when you had these decentralized projects that were invariably NFT or cryptocurrency-driven over the past, eh, five or six years or so, and then AWS would take a us-east-1 outage in a variety of different and exciting ways, and all these projects would go down hard. It's, okay, you talk a lot about decentralization while effectively having hard dependencies on one company in one data center. And it becomes a harder problem in the fullness of time. There is the counterargument, in that when us-east-1 is having problems, most of the internet isn't working, so does your offering need to be up and running at all costs? There are some people for whom that answer is very much, yes. People will die if what we're running is not up and running. Usually, a social network is not on that list.Jake: Yeah. One of the things that is surprising, I think, often when I talk about this as a reliability engineer, is that I think people sometimes over-index on downtime, you know? They just, they think it's a much bigger deal than it is. You know, I've worked on systems where there was credit card processing where you're losing a million dollars a minute or something. And like, in that case, okay, it matters a lot because you can put a real dollar figure on it, but it's amazing how a few of the bumps in the road we've already had with Bluesky have turned into, sort of, fun events, right?Like, we had a bug in our invite code system where people were getting too many invite codes and it sort of caused a problem, but it was a super fun event. We all think back on it fondly, right? And so, outages are not fun, but they're not life and death, generally. And if you look at the traffic, usually what happens is after an outage traffic tends to go up. And a lot of the people that joined, they're just, they're talking about the fun outage that they missed because they weren't even on the network, right?So, it's like, I also like to remind people that eBay for many years used to have, like, an outage Wednesday, right?
Whereas they could put a huge dollar figure on how much money they lost every Wednesday and yet eBay did quite well, right? Like, it's amazing what you can do if you relax the constraints of downtime a little bit. You can do maintenance things that would be impossible otherwise, which makes the whole thing work better the rest of the time, for example.Corey: I mean, it's 2023 and the Social Security Administration's website still has business hours. They take a nightly four to six-hour maintenance window. It's like, the last person out of the office turns off the server or something. I imagine some horrifying mainframe job that needs to wind up sweeping after itself are running some compute jobs. But yeah, for a lot of these use cases, that downtime is absolutely acceptable.I am curious as to… as you just said, you're building this out with an idea that it runs everywhere. So, you're on AWS right now because yeah, they are the market leader for a reason. If I'm building something from scratch, I'd be hard-pressed not to pick AWS for a variety of reasons. If I didn't have cloud expertise, I think I'd be more strongly inclined toward Google, but that's neither here nor there. But the problem is these large cloud providers have certain economic factors that they all treat similarly since they're competing with each other, and that causes me to believe things that aren't necessarily true.One of those is that egress bandwidth to the internet is very expensive. I've worked in data centers. I know how 95th percentile commit bandwidth billing works. It is not overwhelmingly expensive, but you can be forgiven for believing that it is looking at cloud environments. Today, Blue-ski does not support animated GIFs—however you want to mispronounce that word—they don't support embedded videos, and my immediate thought is, “Oh yeah, those things would be super expensive to wind up sharing.”I don't know that that's true. I don't get the sense that those are major cost drivers. I think it's more a matter of complexity than the rest. But how are you making sure that the large cloud provider economic models don't inherently shape your view of what to build versus what not to build?Jake: Yeah, no, I kind of knew where you're going as soon as you mentioned that because anyone who's worked in data centers knows that the bandwidth pricing is out of control. And I think one of the cool things that Cloudflare did is they stopped charging for egress bandwidth in certain scenarios, which is kind of amazing. And I think it's—the other thing that a lot of people don't realize is that, you know, these network connections tend to be fully symmetric, right? So, if it's a gigabit down, it's also a gigabit up at the same time, right? There's two gigabits that can be transferred per second.And then the other thing that I find a little bit frustrating on the public cloud is that they don't really pass on the compute performance improvements that have happened over the last few years, right? Like computers are really fast, right? So, if you look at a provider like Hetzner, they're giving you these monster machines for $128 a month or something, right? And then you go and try to buy that same thing on the public, the big cloud providers, and the equivalent is ten times that, right? 
And then if you add in the bandwidth, it's another multiple, depending on how much you're transferring.Corey: You can get Mac Minis on EC2 now, and you do the math out and the Mac Mini hardware is paid for in the first two or three months of spinning that thing up. And yes, there's value in AWS's engineering and being able to map IAM and EBS to it. In some use cases, yeah, it's well worth having, but not in every case. And the economics get very hard to justify for an awful lot of work cases.Jake: Yeah, I mean, to your point, though, about, like, limiting product features and things like that, like, one of the goals I have with doing infrastructure at Bluesky is to not let the infrastructure be a limiter on our product decisions. And a lot of that means that we'll put servers on Hetzner, we'll colo servers for things like that. I find that there's a really good hybrid cloud thing where you use AWS or GCP or Azure, and you use them for your most critical things, you're relatively low bandwidth things and the things that need to be the most flexible in terms of region and things like that—and security—and then for these, sort of, bulk services, pushing a lot of video content, right, or pushing a lot of images, those things, you put in a colo somewhere and you have these sort of CDN-like servers. And that kind of gives you the best of both worlds. And so, you know, that's the approach that we'll most likely take at Bluesky.Corey: I want to emphasize something you said a minute ago about CloudFlare, where when they first announced R2, their object store alternative, when it first came out, I did an analysis on this to explain to people just why this was as big as it was. Let's say you have a one-gigabyte file and it blows up and a million people download it over the course of a month. AWS will come to you with a completely straight face, give you a bill for $65,000 and expect you to pay it. The exact same pattern with R2 in front of it, at the end of the month, you will be faced with a bill for 13 cents rounded up, and you will be expected to pay it, and something like 9 to 12 cents of that initially would have just been the storage cost on S3 and the single egress fee for it. The rest is there is no egress cost tied to it.Now, is Cloudflare going to let you send petabytes to the internet and not charge you on a bandwidth basis? Probably not. But they're also going to reach out with an upsell and they're going to have a conversation with you. “Would you like to transition to our enterprise plan?” Which is a hell of a lot better than, “I got Slashdotted”—or whatever the modern version of that is—“And here's a surprise bill that's going to cost as much as a Tesla.”Jake: Yeah, I mean, I think one of the things that the cloud providers should hopefully eventually do—I hope Cloudflare pushes them in this direction—is to start—the original vision of AWS when I first started using it in 2006 or whenever launched, was—and they said this—they said they're going to lower your bill every so often, you know, as Moore's law makes their bill lower. And that kind of happened a little bit here and there, but it hasn't happened to the same degree that you know, I think all of us hoped it would. And I would love to see a cloud provider—and you know, Hetzner does this to some degree, but I'd love to see these really big cloud providers that are so great in so many ways, just pass on the savings of technology to the customer so we'll use more stuff there. 
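Corey's R2 example is worth doing as back-of-the-envelope math. The sketch below just restates his own numbers; actual tiered egress rates vary over time and by destination, so treat the blended rate as the assumption it is.

```python
# Back-of-the-envelope check on Corey's numbers, hedged: tiered egress
# rates have changed over time, but the shape of the math holds.
downloads = 1_000_000            # one 1 GB object, fetched a million times
gb_out = downloads * 1           # roughly 1 PB of egress in a month

blended_rate = 0.065             # $/GB; roughly what a $65,000 bill implies
print(f"S3-style egress: ${gb_out * blended_rate:,.0f}")  # $65,000

# With R2 in front, the egress line item is $0; what remains is storage
# and per-request charges, which is how you land on "13 cents rounded up."
```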
I think it's a very enlightened viewpoint to just say, “Hey, we're going to lower the costs, increase the efficiency, and then pass it on to customers, and then they will use more of our services as a result.” And I think Cloudflare is kind of leading the way in there, which I love.Corey: I do need to add something there—because otherwise we're going to get letters and I don't think we want that—where AWS reps will, of course, reach out and say that they have cut prices over a hundred times. And they're going to ignore the fact that a lot of these were, a service you don't use in a region you couldn't find on a map if your life depended on it is now going to be 10% less. Great. But let's look at the general case, where from C3 to C4—if you get the same size instance—it cut the price by a lot. C4 to C5, somewhat. C5 to C6 effectively is no change. And now, from C6 to C7, it is 6% more expensive like for like.And they're making noises about how price performance is still better, but there are an awful lot of us who say things like, “I need ten of these servers to live over there.” That workload gets more expensive when you start treating it that way. And maybe the price performance is there, maybe it's not, but it is clear that “the bill always goes down” is not true.Jake: Yeah, and I think for certain kinds of organizations, it's totally fine the way that they do it. They do a pretty good job on price and performance. But for sort of more technical companies especially, you can see the gaps there that Hetzner is filling and that colocation is still filling. And I personally, you know, if I didn't need to do those things, I wouldn't do them, right? But the fact that you need to do them, I think, says kind of everything.Corey: Tired of wrestling with Apache Kafka's complexity and cost? Feel like you're stuck in a Kafka novel, but with more latency spikes and less existential dread by at least 10%? You're not alone.What if there was a way to 10x your streaming data performance without having to rob a bank? Enter Redpanda. It's not just another Kafka wannabe. Redpanda powers mission-critical workloads without making your AWS bill look like a phone number.And with full Kafka API compatibility, migration is smoother than a fresh jar of peanut butter. Imagine cutting as much as 50% off your AWS bills. With Redpanda, it's not a pipedream, it's reality.Visit go.redpanda.com/duckbill today. Redpanda: Because your data infrastructure shouldn't give you Kafkaesque nightmares.Corey: There are so many weird AWS billing stories that all distill down to you not knowing this one piece of trivia about how AWS works, either as a system, as a billing construct, or as something else. And there's a reason this has become my career of tracing these things down. And sometimes I'll talk to prospective clients, and they'll say, “Well, what if you don't discover any misconfigurations like that in our account?” It's, “Well, you would be the first company I've ever seen where that [laugh] was not true.” So honestly, I want to do a case study if we ever do. And I've never had to write that case study, just because it's the tax on not having the forcing function of building in data centers. There's always this idea that in a data center, you're going to run out of power, space, capacity, at some point and it's going to force a reckoning. The cloud has what distills down to infinite capacity; they can add it faster than you can fill it. So, at some point it's always just keep adding more things to it.
There's never a let's clean out all of the cruft story. And it just accumulates and the bill continues to go up and to the right.Jake: Yeah, I mean, one of the things that they've done so well is handle the provisioning part, right, which is kind of what you're getting at there. One of the hardest things in the old days, before we all used AWS and GCP, is you'd have to sort of requisition hardware and there'd be this whole process with legal and financing and there'd be this big lag between the time you need a bunch more servers in your data center and when you actually have them, right, and that's not even counting the time it takes to rack them and get them, you know, on network. The fact that basically every developer now just gets an unlimited credit card they can just, you know, use, is hugely empowering, and it's for the benefit of the companies they work for almost all the time. But it is an uncapped credit card. I know they actually support controls and things like that, but in general, the way we treated it—Corey: Not as much as you would think, as it turns out. But yeah, it's—yeah, and that's a problem. Because again, if I want to spin up $65,000 an hour worth of compute right now, the fact that I can do that is massive. The fact that I could do that accidentally when I don't intend to is also massive.Jake: Yeah, it's very easy to think you're going to spend a certain amount and then oh, traffic's a lot higher, or, oh, I didn't realize when you enable that thing, it charges you an extra fee or something like that. So, it's very opaque. It's very complicated. All of these things are, you know, the result of just building more and more stuff on top of more and more stuff to support more and more use cases. Which is great, but then it does create this very sort of opaque billing problem, which I think, you know, you're helping companies solve. And I totally get why they need your help.Corey: What's interesting to me about distributed social networks is that I've been using Mastodon for a little bit and I've started to see some of the challenges around a lot of these things, just from an infrastructure and architecture perspective. Tim Bray, former Distinguished Engineer at AWS, posted a blog post yesterday, and okay, well, if Tim wants to put something up there that he thinks people should read, I generally advise people to read it. And I clicked it and got a, “Server over resource limits.” It's like, wow, you're very popular. You wound up getting effectively Slashdotted.And he said, “No, no. Whenever I post a link to Mastodon, two thousand instances all hit it at the same time.” And it's, “Oh, yeah. The hug of death. That becomes a challenge.” Not to mention the fact that, depending upon architecture and preferences that you make, running a Mastodon instance can be extraordinarily expensive in terms of storage, just because it'll, by default, attempt to cache everything that it encounters for a period of time. And that gets very heavy very quickly. Does the AT Protocol—AT Protocol? I don't know how you pronounce it officially these days—take into account the challenges of running infrastructure designed for folks who have corporate budgets behind them? Or is that really a future problem for us to worry about when the time comes?Jake: No, yeah, that's a core thing that we talked about a lot in the recent, sort of, architecture discussions.
I'm going to go back quite a ways, but there were some changes made about six months ago in our thinking, and one of the big things that we wanted to get right was the ability for people to host their own PDS, which is equivalent to, like, posting a WordPress or something. It's where you post your content, it's where you post your likes, and all that kind of thing. We call it your repository or your repo. But that we wanted to make it so that people could self-host that on a, you know, four or five $6-a-month droplet on DigitalOcean or wherever and that not be a problem, not go down when they got a lot of traffic.And so, the architecture of AT Proto in general, but the Bluesky app on AT Proto is such that you really don't need a lot of resources. The data is all signed with your cryptographic keys—like, not something you have to worry about as a non-technical user—but all the data is authenticated. That's what—it's Authenticated Transfer Protocol. And because of that, it doesn't matter where you get the data, right? So, we have this idea of this big indexer that's looking at the entire network called the BGS, the Big Graph Server and you can go to the BGS and get the data that came from somebody's PDS and it's just as good as if you got it directly from the PDS. And that makes it highly cacheable, highly conducive to CDNs and things like that. So no, we intend to solve that problem entirely.Corey: I'm looking forward to seeing how that plays out because the idea of self-hosting always kind of appealed to me when I was younger, which is why when I met my wife, I had a two-bedroom apartment—because I lived in Los Angeles, not San Francisco, and could afford such a thing—and the guest bedroom was always, you know, 10 to 15 degrees warmer than the rest of the apartment because I had a bunch of quote-unquote, “Servers” there, meaning deprecated desktops that my employer had no use for and said, “It's either going to e-waste or your place if you want some.” And, okay, why not? I'll build my own cluster at home. And increasingly over time, I found that it got harder and harder to do things that I liked and that made sense. I used to have a partial rack in downtown LA where I ran my own mail server, among other things.And when I switched to Google for email solutions, I suddenly found that I was spending five bucks a month at the time, instead of the rack rental, and I was spending two hours less a week just fighting spam in a variety of different ways because that is where my technical background lives. Being able to not have to think about problems like that, and just do the fun part was great. But I worry about the centralization that that implies. I was opposed to it at the idea because I didn't want to give Google access to all of my mail. And then I checked and something like 43% of the people I was emailing were at Gmail-hosted addresses, so they already had my email anyway. What was I really doing by not engaging with them? I worry that self-hosting is going to become passe, so I love projects that do it in sane and simple ways that don't require massive amounts of startup capital to get started with.Jake: Yeah, the account portability feature of AT Proto is super, super core. You can backup all of your data to your phone—the [AT 00:28:36] doesn't do this yet, but it most likely will in the future—you can backup all of your data to your phone and then you can synchronize it all to another server. 
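A toy illustration of the property Jake is describing, emphatically not the real AT Proto record format: because records are signed, any server (a PDS, the BGS, a CDN) can hand you a copy, and you verify it against the author's public key rather than trusting the host. This uses the Python cryptography package; the record shape is invented.

```python
# Toy sketch of location-independent, authenticated data (not AT Proto's format).
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

author_key = Ed25519PrivateKey.generate()
record = b'{"text": "hello from my PDS", "createdAt": "2023-05-30T00:00:00Z"}'
signature = author_key.sign(record)

# ...record + signature can travel through a PDS, an indexer, a CDN, anywhere...

public_key = author_key.public_key()
try:
    public_key.verify(signature, record)  # raises if the bytes were tampered with
    print("verified: it doesn't matter which server handed this to us")
except InvalidSignature:
    print("rejected: someone altered the record in transit")
```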
So, if for whatever reason, you're on a PDS instance and it disappears—which is a common problem in the Mastodon world—it's not really a problem. You just sync all that data to a new PDS and you're back where you were. You didn't lose any followers, you didn't lose any posts, you didn't lose any likes.And we're also making sure that this works for non-technical people. So, you know, you don't have to host your own PDS, right? That's something that technical people can self-host if they want to, non-technical people can just get a host from anywhere and it doesn't really matter where your host is. But we are absolutely trying to avoid the fate of SMTP and, you know, other protocols. The web itself, right, is sort of… it's hard to launch a search engine because the—first of all, the bar is billions of dollars a year in investment, and a lot of websites will only let us crawl them at a higher rate if you're actually coming from a Google IP, right? They're doing reverse DNS lookups, and things like that to verify that you are Google.And the problem with that is now there's sort of this centralization with a search engine that can't be fixed. With AT Proto, it's much easier to scrape all of the PDSes, right? So, if you want to crawl all the PDSes out on the AT Proto network, they're designed to be crawled from day one. It's all structured data, we're working on, sort of, how you handle rate limits and things like that still, but the idea is it's very easy to create an index of the entire network, which makes it very easy to create feed generators, search engines, or any other kind of sort of big world networking thing out there. And then without making the PDSes have to be very high power, right? So, they can do low power and still scrapeable, still crawlable.Corey: Yeah, the idea of having portability is super important. Question I've got—you know, while I'm talking to you, it's, we'll turn this into technical support hour as well because why not—I tend to always historically put my Twitter handle on conference slides. When I had the first template made, I used it as soon as it came in and there was an extra n in the @quinnypig username at the bottom. And of course, someone asked about that during Q&A.So, the answer I gave was, of course, n+1 redundancy. But great. If I were to have one domain there today and change it tomorrow, is there a redirect option in place where someone could go and find that on Blue-ski, and oh, they'll get redirected to where I am now. Or is it just one of those 404, sucks to be you moments? Because I can see validity to both.Jake: Yeah, so the way we handle it right now is if you have a, something.bsky.social name and you switch it to your own domain or something like that, we don't yet forward it from the old.bsky.social name. But that is totally feasible. It's totally possible. Like, the way that those are stored in your what's called your [DID record 00:31:16] or [DID document 00:31:17] is that there's, like, a list that currently only has one item in general, but it's a list of all of your different names, right? So, you could have different domain names, different subdomain names, and they would all point back to the same user. And so yeah, so basically, the idea is that you have these aliases and they will forward to the new one, whatever the current canonical one is.Corey: Excellent. That is something that concerns me because it feels like it's one of those one-way doors, in the same way that picking an email address was a one-way door. 
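For the mechanics behind Jake's answer: Bluesky verifies a custom domain handle by resolving it to a DID, one published method being a DNS TXT record at _atproto.<domain> (an HTTPS well-known fallback also exists). Here is a hedged sketch with the dnspython package; the domain is an example.

```python
# Hedged sketch: resolve a domain handle to a DID via the _atproto TXT record.
import dns.resolver

def resolve_handle(domain: str) -> str | None:
    answers = dns.resolver.resolve(f"_atproto.{domain}", "TXT")
    for rdata in answers:
        txt = b"".join(rdata.strings).decode()
        if txt.startswith("did="):
            return txt.removeprefix("did=")  # e.g. "did:plc:..."
    return None

print(resolve_handle("example.com"))

# Because the DID, not the domain, is the canonical identity, pointing a
# new domain at the same DID is what keeps handle changes from being a
# one-way door.
```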
I know people who still pay money to their ancient crappy ISP because they have a few mails that come in once in a while that are super-important. I was fortunate enough to have jumped on the bandwagon early enough that my vanity domain is 22 years old this year. And my email address still works, which, great, every once in a while, I still get stuff to, like, variants of my name I haven't used since 2005. And it's usually spam, but every once in a blue moon, it's something important, like, “Hey, I don't know if you remember me. We went to college together many years ago.” It's ho-ly crap, the world is smaller than we think.Jake: Yeah. I mean, I love that we're using domains. I think one of the greatest decisions we made is that you own your own domain. You're not really stuck in our namespace, right? Like, one of the things with traditional social networks is you're sort of, their domain.com/yourname, right?And with the way AT Proto and Bluesky work is, you can go and get a domain name from any registrar, there's hundreds of them—you know, we like Namecheap, you can go there and you can grab a domain and you can point it to your account. And if you ever don't like anything, you can change your domain, you can change, you know, which PDS you're on, it's all completely controlled by you. And there's nearly no way we as a company can do anything to change that. Like, that's all sort of locked into the way that the protocol works, which creates this really great incentive where, you know, if we want to provide you services or somebody else wants to provide you services, they just have to compete on doing a really good job; you're not locked in. And that's, like, one of my favorite features of the network.Corey: I just want to point something out because you mentioned, oh, we like Namecheap. I am too, for weird half-drunk domain registrations on a lark. Like, “Why am I poor?” It's like, $3,000 a month of my budget goes to domain purchases, great. But I did a quick whois on the official Bluesky domain and it's hosted at Route 53, which is Amazon's, of course, premier database offering.But I'm a big fan of using an enterprise registrar for enterprise-y things. Wasabi, if I recall correctly, wound up having their primary domain registered through GoDaddy, and the public domain that their bucket equivalent would serve data out of got shut down for 12 hours because some bad actor put something there that shouldn't have been. And GoDaddy is not an enterprise registrar, despite what they might think—for God's sake, the word ‘daddy' is in their name. Do you really think that's enterprise? Good luck.So, the fact that you have a responsible company handling these central singular points of failure speaks very well to just your own implementation of these things. Because that's the sort of thing that everyone figures out the second time.Jake: Yeah, yeah. I think there's a big difference between corporate domain registration, and corporate DNS and, like, your personal handle on social networking. I think a lot of the consumer, sort of, domain registrars are great for consumers. And I think if you—yeah, you're running a big corporate domain, you want to make sure it's, you know, it's transfer locked and, you know, there's two-factor authentication and you're doing all those kinds of things right, because that is a single point of failure; you can lose a lot by having your domain taken. So, I completely agree with you on there.Corey: Oh, absolutely.
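A hedged sketch of the registrar-hygiene check Jake is talking about: look for a transfer lock in a domain's WHOIS status. WHOIS libraries vary a lot; this assumes the python-whois package and its whois.whois() call, so treat the API shape as an assumption.

```python
# Hedged sketch: is this domain transfer-locked? (python-whois assumed)
import whois

record = whois.whois("bsky.app")
statuses = record.status if isinstance(record.status, list) else [record.status]
locked = any("TransferProhibited" in (s or "") for s in statuses)
print("transfer locked" if locked else "NOT transfer locked: fix that")
```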
I am curious about this to see if it's still the case or not because I haven't checked this in over a year—and they did fix it. Okay. As of at least when we're recording this, which is the end of May 2023, Amazon's Authoritative Name Servers are no longer half at Oracle. Good for them. They now have a bunch of Amazon-specific name servers on them instead of, you know, their competitor that they clearly despise. Good work, good work.I really want to thank you for taking the time to speak with me about how you're viewing these things and honestly giving me a chance to go ambling down memory lane. If people want to learn more about what you're up to, where's the best place for them to find you?Jake: Yeah, so I'm on Bluesky. It's invite only. I apologize for that right now. But if you check out bsky.app, you can see how to sign up for the waitlist, and we are trying to get people on as quickly as possible.Corey: And I will, of course, be talking to you there and will put links to that in the show notes. Thank you so much for taking the time to speak with me. I really appreciate it.Jake: Thanks a lot, Corey. It was great.Corey: Jake Gold, infrastructure engineer at Bluesky, slash Blue-ski. I'm Cloud Economist Corey Quinn and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice, along with an angry comment that will no doubt result in a surprise $60,000 bill after you posted.Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.
Avi Freedman, CEO at Kentik, joins Corey on Screaming in the Cloud to discuss the fun of solving for observability. Corey and Avi discuss how great simplicity can be deceiving, and Avi points out that with great simplicity comes great complexity. Avi discusses examples of this that he sees in Kentik customer environments, as well as the differences he sees in cloud environments from traditional data center environments. Avi also reveals his predictions for the future and how enterprise M&A will affect the way companies view data centers and VPCs. About AviAvi Freedman is the co-founder and CEO of network observability company Kentik. He has decades of experience as a networking technologist and executive. As a network pioneer in 1992, Freedman started Philadelphia's first ISP, known as netaxs. He went on to run network operations at Akamai for over a decade as VP of network infrastructure and then as chief network scientist. He also ran the network at AboveNet and was the CTO of ServerCentral.Links Referenced: Kentik: https://kentik.com Email: avi@kentik.com Twitter: https://twitter.com/avifreedman LinkedIn: https://www.linkedin.com/in/avifreedman TranscriptAnnouncer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.Corey: Most Companies find out way too late that they've been breached. Thinkst Canary changes this. Deploy Canaries and Canarytokens in minutes and then forget about them. Attackers tip their hand by touching 'em giving you the one alert, when it matters. With 0 admin overhead and almost no false-positives, Canaries are deployed (and loved) on all 7 continents. Check out what people are saying at canary.love today!Corey: Welcome to Screaming in the Cloud, I'm Corey Quinn. This promoted guest episode is brought to us by our friends at Kentik. And into my social grist mill, they have thrown Avi Freedman, their CEO. Avi, thank you for joining me.Avi: Thank you for having me, Corey. I've been a big fan for some time, I have never actually fallen off my seat laughing, but I've come close a couple times on some of your threads.Corey: You must have a great chair.Avi: I should probably upgrade it [laugh].Corey: [laugh]. I have been looking forward to this conversation for a while because you are one of those rare creatures who comes from a similar world to what I did where we were grumpy and old before our time because we worked on physical infrastructure in data centers, we basically wrangled servers into doing the things that we wanted them to do when hardware reliability was an aspiration rather than a reality. And we also moved on from that, in many ways. We are not blind to the modern order of how computers work. But you still run a lot of what you do in data centers, but many of your customers are in cloud. You speak both languages very fluently because of the unifying thread between all of this, which is, of course, the network. How did you wind up in, I guess we'll call it network hell.Avi: [laugh]. 
I mean, network hell was truly… in the '90s, when the internet was—I mean, the internet is sort of like the human body: the more you study it, the more amazing it is that it ever worked in the first place, not that it breaks sometimes—it was the bugs, and trying to put together the technology back then, you know, with what we had. Life is pretty good nowadays, other than the [laugh] immense complexity that has been unleashed on us by everyone taking the same technology and then writing it in their own software and giving it their own marketing names. And thus, you have multi-cloud networking. So, got into it because it's a problem that needs to be solved, right? There's no ESP that connects the applications together; the network still needs to make it work. And now people own some of it, and then more of it, they don't own, but they're still responsible for it. So, it's a fun problem to solve.Corey: The timing of this episode is apt because I've used Kentik myself for a few things over the years. And to be fair, using it for any of my personal networking problems is a bit like noticing, “Oh, I have a loose thread here on my shirt. Pass me the chainsaw.” It's, my environment is tiny and it's over-scoped. But I just earlier this week wound up having to analyze a day's worth of Flow Logs from one of my clients, and to do this, I had to spin up an EC2 instance with 128 gigs of RAM and then load the Flow Logs for that day into RAM, and then—not kidding—I ran into the OOM Killer because I ran out of RAM on this thing.Avi: [laugh].Corey: It is, like, yeah, that's right. The network is chatty, the logs are immense, and it's easy to forget. Because the reason I was doing this was just to figure out what are the things that are talking to each other in this environment to drive up some aspects of data transfer costs. But that is an esoteric use case for this; it's not why most people tend to think about network observability. So, I'm going to ask you the blunt question up front here because it might be a really short episode. Do we have to care about networking in the least now that cloud is the default in most locations? It is just an API call away, isn't it?Avi: With great simplicity comes great complexity. So, to the people running infrastructure, to developers or architects, turning it all on, it looks like just API calls. But did you set the policies right? Can the things talk to each other? Are they talking in patterns that are causing you wild data transfer costs?All these things ultimately come back to some team that actually has to make it go. And it can be pretty hard to figure that out, right, because it's not just the VPC Flow Logs. It's, what's the policy? It's, what are they talking to that maybe isn't in that cloud, that's maybe in another cloud? So, how do you bring it all together? Like, you could have—and maybe you should have—used Athena, right? You can put VPC Flow Logs in S3 buckets and use Athena and run SQL queries if all you want is your top talkers.Corey: Oh, I did. That's how I started, but Athena is, uh… it has some challenges. Let's just put it that way and leave it there. DuckDB is what I was using and I'm much happier with it for a variety of excellent reasons.Avi: Okay. Well, I'll tease you another time about, you know—I lost this battle at Kentik. We actually don't use swap, but I'm a big fan of having swap and monitoring it so the OOM Killer only does what you want or doesn't fire at all. 
But that's a separate religious debate.Corey: There's a counterargument of running an in-memory data store. And then oh, we're going to use it as swap though, so it's like, hang on, this just feels like running a normal database with extra steps.Avi: Computers allow you to do amazing things and only occasionally slap you nowadays with it. It's pretty amazing. But back to the question. APIs make it easy to turn on, but not so easy to run. The observability that you get within a given cloud is typically very limited.Google actually has the best. They show some topology and other things. I mean, a lot of what we do involves scraping API calls in the cloud to figure out what does this all mean, then convolving it with the VPC Flow Logs and making it look like a network, and what are the gateways, and what are the rules being applied and what can't talk to itself? If you just look at VPC Flow Logs like it's Syslog, good luck trying to figure out what VPCs are talking to each other. It's exactly the problem that you were describing.So, the ease of turning it on is exactly inversely proportional to the ease of running it. And, you know, as a vendor, we think it's an awesome [laugh] problem, but we feel for our customers. And you know, occasionally it's a pain to get the IAM roles set up to scrape things and help them, but that's you know, that's just part of the job.Corey: It's fascinating to me, just looking from an AWS perspective, just how much work clearly has to be done to translate their Byzantine and very strange networking environment and concepts into things that customers see. Because in many cases, the virtual machines that we run on top of EC2, let alone anything higher level, are being lied to the entire time about what the actual topology of the environment is. It was most notable, for me at least, at re:Invent 2022, the most recent one, where they announced they have a TCP replacement, scalable reliable datagram, or SRD. It's a new protocol entirely. It's, “Oh, wow, can we use it?” “No.” “Okay.” Like, I get that it's a lot of work, I get you're excited about it. Are you going to talk to us about how it actually works? “Oh, absolutely not.” So… okay, good for you, I guess.Avi: Doesn't Amazon have to write a press release before they build anything, and doesn't the press release have to say, like, why people give a shit, why people care?Corey: Yep. And their story on this was oh, it enables us to be a lot faster at letting EBS volumes talk to some of our beefier instances.Avi: [laugh].Corey: And that's all well and good, don't get me wrong, but it's also, “Yay, it's more reliable,” is a difficult message to send. I mean, it's hard enough when—and it's necessary because you've got to tacitly admit that reliability and performance haven't been all they could be. But when it's no longer an issue for most folks, now you're making them wonder, like, wait, how bad was it? It's just a strange message.Avi: Yeah. One of my projects for this weekend is, I actually got a gaming PC and I'm going to try compression offload to the CUDA cores because right now, we do compress and decompress with Intel cores. And like, if I'm successful there and we can get 30% faster subqueries—which doesn't really matter, you know, on the kind of massive queries we run—and 20% more use out of the computers that we actually run, I'm probably not going to do a press release about it. But good to see the pattern.But you know, what you said is pretty interesting. 
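For the curious, the top-talkers exercise Corey and Avi are batting around fits in a few lines of DuckDB, which aggregates out-of-core instead of demanding the whole day fit in RAM. A minimal sketch, assuming the day's VPC Flow Logs have already been exported to Parquet with the default field names (the path and export format here are assumptions, not anything from the episode):

```python
# Minimal "top talkers" query over a day of VPC Flow Logs with DuckDB.
# Assumes default flow log fields (srcaddr, dstaddr, bytes) exported to Parquet.
import duckdb

con = duckdb.connect()  # in-process; streams and spills rather than loading every row into RAM
top_talkers = con.execute(
    """
    SELECT srcaddr, dstaddr, SUM(bytes) AS total_bytes
    FROM read_parquet('flow-logs/2023-05-*.parquet')
    GROUP BY srcaddr, dstaddr
    ORDER BY total_bytes DESC
    LIMIT 20
    """
).fetchall()

for src, dst, total in top_talkers:
    print(f"{src} -> {dst}: {total / 1e9:.2f} GB")
```

Because the aggregation streams over the files, the 128-gigs-of-RAM-and-OOM-Killer dance Corey describes isn't needed for a query shaped like this.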
As people like Kentik, we have to put together, well, on Azure, you can have VPCs that cross regions, right? And in other places, you can't. And in Google, you have performance metrics that come out and you can get it very frequently, and in Amazon and Azure, you can't. Like, taking these kinds of telemetry that are all the same stuff underneath, but packaged up differently in different quanta and different things, and making it all look the same is actually pretty fun and interesting.And it's pretty—you know, if you give some cloud engineers who focus on the infrastructure layer enough beers or alcohol or just room to talk, you can hear some funny stories. And it all made sense to somebody in the first place, but unpacking it and actually running it as a common infrastructure can be quite fun.Corey: One of the things that I have found notable about your perspective is that, in particular, you're running all of the network ingest, to my understanding, in your data center environment. Because we talked about this when you were kind enough to invite me to your company all-hands offsite—presumably I assume when people do that, it's so they can beat me up in the alley, but that only happened twice. I was very pleasantly surprised.Avi: [And you 00:09:23] made fun of us only three times, so you know, you beat us—Corey: Exactly.Avi: —but it was all enjoyed.Corey: But always with love. Now, what I found fascinating was you and I sat down for a while and you talked about your data center architecture. And you asked me—since I don't have anything to sell you—is there an economical way that I could see running your environment on top of AWS? And the answer was sure, if by economical you mean an absolute minimum of six times what you're currently paying a year, sure you can get there. But it just does not make sense for any realistic approach to doing this.And the reason I bring this up is that you're in a data center not because of religious beliefs, “Oh, well, this is good enough for my grandpappy, so it's good enough for me.” It's because it solves the problem you have in a way that the cloud providers clearly cannot. But you also are not anti-cloud. So many folks who are all-in on data centers seem to be doing it out of pure self-interest where, well, if everyone goes all-in on cloud, then we have nothing left to sell them. I've used AWS VPC Flow Logs. They have nothing that could even remotely be termed network observability. Your future is assured as long as people understand what it is that you're providing them and the value that adds. So yeah, people keep going in a cloud direction, you're happy as houses.Avi: We'll use the best tools for building our infrastructure that we can, right? We use cloud. In fact, we're just buying some reserved instances, which always, you know, I give it the hairy eyeball, but you know, we're probably always going to have our CI/CD bursty stuff in the cloud. We have performance testing regions on all the major clouds so that we can tell people what performance is to and from cloud. Like, that we have to use cloud for.And if there's an always-on model, which starts making sense in the cloud, then I try not to be the first to use anything, but [laugh] we'll be one of the first to use it. 
But every year, we talk to, you know, the major clouds because we're customers of all of them, for, as I said, our testing infrastructure if nothing else, and you know, some of them for some other parts—you know, for example, proxying VPC Flow Logs, we run infrastructure on Kubernetes in all—in the three biggest to proxy VPC Flow Logs, you know, and so that's part of our bill. But if something's always on, you know, one of our storage servers, it's a $15,000 machine that, you know, realistically runs five years, but even if you assume it runs three years, we get financing for it, costs a couple hundred dollars a month to host, and that's inclusive of our ops team that runs, sort of, everything; you just do the math. That same machine would be, you know, even not including data transfer, maybe $3,500 a month on cloud. The economics just don't quite make sense.For burst, for things like CI/CD, test, seasonality, I think it's great. And if we have patterns like that, you know, we're the first to use it. So, it's just a question of using what's best. And a lot of our customers are in that realm, too. I would say some of them are a little over-rotated, you know, they've had big mandates to go one way or the other and don't have the right, you know, sort of nuanced view, but I think over time, that's going to fix itself. And yeah, as you were saying, like, the more people use cloud, the better we do, so it's just really a question of what's best for us and our infrastructure at any given time.Corey: I think that that is something that is not fully appreciated or well understood is that I work with cloud technologies because for what I do, it makes an awful lot of sense. But I've been lately doing a significant build-out in my home network on the perspective of yeah, this makes sense for what I do. And I now have an increased number of workloads that I'm running here and I got to say, it feels a little strange, on some level, not to be paying AWS on something metered by the second whenever I'm running a job here. That always feels a little on the weird side. But I'm not suggesting I filled my house with servers either.Avi: [unintelligible 00:13:18] going to report you to the House on Cloudian Activities Committee [laugh] for—Corey: [laugh].Avi: To straighten you out about your infrastructure use and beliefs. I do have to ask you, and I do have some foreknowledge of this, where is the controller for your network running? Is it running in your house or—Corey: Oh, the WiFi controller lives in Ohio with all the other unpleasant things. I mean, even data transfer between Ohio and Virginia—if you're on AWS—is half-price because data wants to get out of Ohio just as much as the people do. And that's fine, but it can also fail out of band. I can chill that thing for a while and I'm not able to provision new equipment, I can't spin up new SSIDs, but—Avi: Right. It's the same as Tailscale, which is, like, sufficiently indistinguishable from magic, but it's nice there's Headscale in case something happened to them. But yeah, you know, you just can't set up new stuff; you're SSHing in the old way while it's down. So.Corey: And worst case, it goes away irretrievably, I can spin a new one up, I can pair everything locally, do it by repointing DNS locally, and life will go on. It's one of those areas where, like, I would not have this in Ohio if latency was a concern, if it was routing every packet out halfway across the country before it hit the general internet. 
That would be a challenge for me. But that's not what I'm doing.Avi: Yeah, yeah. No, that makes sense. And I think also—Corey: And I certainly pay AWS by the second for that thing. That's—I have a three-year savings plan for that thing, and if nothing else, it was useful for me just to figure out what the living hell was going on with the savings plan purchase project one year. That was just—it was a challenge to get that straightened out in some ways. Turns out that the high watermark of the console is a hundred-and-thirty-some-odd million dollars you can add to cart and click the buy button. Have fun.Avi: My goodness. Okay, well.Corey: The API goes up to $26.2 billion. Try that in a free tier account, preferably someone else's.Avi: I would love to have such problems. Right now, that is not one of them. We don't spend that much on infrastructure.Corey: Oh, that is more than Amazon's—AWS's at least—quarterly revenue. So, if you wind up doing a $26.2 billion purchase, it's like—it's that old saw. You owe Amazon a million dollars, you have a problem. If you owe Amazon $26 billion, Amazon has a problem. Yeah, that's when Andy Jassy calls you 20 minutes after you make that purchase, and at least to me, he yells at me with a, “Listen here, asshole,” and it sort of devolves from there.Avi: Well, I do live in Seattle, so you know, they send the posse out, I'm pretty sure.Corey: [laugh] I will be keynoting DevOpsDays Seattle on August 1st with a talk that might very well resonate with your perspective, “The Modern DevOps: A Million Ways to Die in Production.”Avi: That is very cool. I mean, ultimately, I think that's what cloud comes back to. When cloud was being formed, it's just other people's computers, storage, and network. I don't know if you'd argue that there's a politics, control plane, or a—Corey: Oh, I would say, “Cloud? There's no cloud; just someone else's cost center.”Avi: Exactly. And so, how do you configure it? And back to the question of, should everything be on-prem or does cloud abstract it all, it's all the same stuff that we've been doing for decades and decades, just with other people's software and names, which you help decode. And then it's the question we've always had: what's the best thing to do? Do you like Wellfleet or Proteon? Now, do you like Azure [laugh] or Google or Amazon or somebody else, or running your own?Corey: It's almost this generation's equivalent of vi versus Emacs.Avi: Yes. I guess there could be a crowd equivalent. I use vi, but only because I'm a Lisp addict and I don't want to get stuck refining Eliza macros and connecting to the ChatGPT in Emacs. So, you know. Someone just did an Emacs as PID 1. So basically, no init, just, you know, the kernel boots into Emacs, and then someone of course had to do a vi as PID 1. And I have to admit, Emacs would be a lot more useful as a PID 1, even though I use vi.Corey: I would say that—I mean, if you wind up writing in Emacs and writing Lisp in it, then I've got to say, every third thing you say becomes a parenthetical.Avi: Exactly. Ha.Corey: But I want to say that there's also a definite moving of data going on there at a scale that, for those of us working mostly in home labs and whatnot, can be hard to imagine. 
And I see that just in terms of the volume of Flow Logs, which, to be clear, are smaller than the data transfer they are representing in almost every case.Avi: Almost every.Corey: You see so much of the telemetry that comes out of there and what customers are seeing and what their problems are, in different ways. It's not just Flow Logs; you ingest a whole bunch of different telemetry through a variety of modern, ancient, and everything-in-between protocols to support, you know, the horror that is network equipment interoperability. And just, I can't—I feel like I can't do a terrific job of doing justice to describing just how comprehensive Kentik is, once you get it set up as a product. What is on the wire has always been for me the arbiter of truth because computers will lie to you, but it's very tricky to get them to lie and get the network story to cover for it.Avi: Right. I mean, ultimately, that's one of the sources of truth. There's routing, there's performance testing, there's a whole lot of different things, and as you were saying, in any one of these slices of your—let's just pick the network—there's many different things that all mean the same, but look different, that you need to put together. You could—the nerd term would be, you know, normalizing. You need to take all this stuff and normalize it.But traffic, we agree, that's where we started with. We call it the what is and the what if. What's actually happening on the infrastructure—that's the ancient stuff like IPFIX and NetFlow and sFlow. Some people would argue that, you know, the IETF would say, “Oh, we're still innovating and it's still current,” but you know, it's certainly on-prem only. The major cloud vendors would say, “Oh, well, you can run the router—cloud routers—or you could run cloud versions of the big routers,” but we don't really see that as a super common pattern today.But what's really the difference between NetFlow and the VPC Flow Log? Well, some VPC Flow Logs have permit deny because they're really firewall logs, but ultimately, it's something went from here to there. There might not be a TCP flag, but there might be something else in cloud. And, you know, maybe there's RUM data, which is also another kind of traffic. And ultimately, all together, we try to take that and then the business metadata to say, whether it's NetBox in the old world or Kubernetes in the new world, or some other [unintelligible 00:19:49], what application is this? What user is this?So, you can ask questions about why am I blowing up between these cloud regions? What applications are doing it, right? VPC Flow Logs by themselves don't know that, so you need to add that kind of metadata in. And then there's performance testing, which is sort of the what if. Something we do, Thousand Eyes does, some other people do.It's not the actual source of truth, but for example, if you're having a performance problem getting between, you know, us-east and Azure in the east, well, there's three other ways you can get there. If your actual traffic isn't getting there that way, then how do you know which one to use? Well, let's fire up some tests. There's all the metrics on what all of the devices are reporting, just like you get metrics from your machines and from your applications, and then there's stuff even up at the routing layer, which God help you, hopefully you don't need to actually get in and debug, but sometimes you do. 
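Avi's normalization point is concrete enough to sketch: NetFlow records and VPC Flow Logs are the same "something went from here to there" fact wearing different field names, and the first job is mapping both into one schema before any enrichment happens. A minimal sketch follows; the raw field names track the common NetFlow v9 and default VPC Flow Log layouts, but treat them as assumptions to verify against your own exporter:

```python
# Sketch: normalize two flavors of flow telemetry into one record shape.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Flow:
    src: str
    dst: str
    bytes: int
    action: Optional[str]  # VPC Flow Logs carry ACCEPT/REJECT; NetFlow has no equivalent

def from_netflow(rec: dict) -> Flow:
    # NetFlow v9 information element names
    return Flow(rec["IPV4_SRC_ADDR"], rec["IPV4_DST_ADDR"], int(rec["IN_BYTES"]), None)

def from_vpc_flow_log(rec: dict) -> Flow:
    # Default-format VPC Flow Log fields
    return Flow(rec["srcaddr"], rec["dstaddr"], int(rec["bytes"]), rec["action"])
```

The business metadata Avi mentions—NetBox, Kubernetes, ServiceNow—then gets joined onto the normalized record once, rather than onto every source format separately.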
And sometimes, you know, your neighbor tells the mailman that that mail is for me and not for you and they believe them, and then you have a big problem when your bills don't get paid.The same thing happens in the cloud, the same thing happens on the internet [unintelligible 00:20:52] at the routing. So, the goal is, take all the different sources of it, make it the same within each type, and then pull it all together so you can look at a single place, you can look at a map, you can look at everything, whether it's the cloud, whether it's your own data centers, your own WAN, into the internet and in between, in a coherent way that understands your application. So, it's a small task that we've bit off, but you know, we have fun solving it.Corey: Do you find that when you look at customer environments, that they are, and I don't mean to be disparaging here, truly I don't, but if you were to ask me to design something today, I would probably not even be using VPCs if I'm doing this completely greenfield. I would be a lot more cloud-first, et cetera, et cetera. Whereas in many cases, that is not the right path, especially if, you know, customers have the temerity to not have been founded within the last 18 months and, in some cases, to predate AWS entirely. Do you find that the majority of what they're doing looks like they're treating the cloud like data centers or do you find that they are leveraging cloud in ways that surprise you and would not be possible in traditional data centers? Because I can't shake the feeling that the network as a source of truth for figuring out what's really going on is very hard to beat.Avi: Yes, for the most part, to both your assertion at the end and sort of the question. So, in terms of the question, for the most part, people think of VPCs as… you know, they could just as well be VLANs and [unintelligible 00:22:21], right? I've got policies, and I have these things that are talking to each other, and everything else is not local. And I've got—you know, it's not a perfect mapping to physical interfaces in VLANs but it's the equivalent of that.And that is sort of how people think about it. In the data center, you'd call it micro-segmentation; in the cloud, you call it clouding, but you know, just applying all the same policies and saying this stuff can talk to each other and not. Which is always sort of interesting, if you don't actually know what is talking [laugh] to each other to apply those policies. Which is a lot of what, you know, Kentik gets brought in for first. I think where we see the cloud-native thinking, which is overlaid on top of that—you could call it overlay, I guess—which is service mesh.Now, putting aside the question of what's going to be a service mesh, what's going to be a network mesh, where there's something like [unintelligible 00:23:13] sit, the idea that there's a way that you look at traffic above the packets at, you know, layers three through seven, that can do things like load balancing, do things like telemetry, do things like policy enforcement, that is a layer that we see very commonly that a lot of the old-school folks have—you know, they want their F5s and they want their F5 scripts. And they're like, “Why can't I have this in the cloud?”—which I guess you could buy it from F5 if you really want—but that's pretty common. 
Now, not everything's a sidecar anymore and there's still debates about what's going on there, but that's pretty common, even where the underlying cloud just looks like it could just be a data center.And that seems to be state of the art, I would say, for our traditional enterprise customers, for sure. Our web company customers, and you know, service providers use cloud more for their OTT and some other things. As we work with them, they're a little bit more likely to be on-prem, you know, historically. But remember, in the enterprise, there's still a lot of M&A going on, I think that's even going to pick up in the next couple of years, and a lot of what they're doing is lift-and-shift of [laugh] actual data centers. And my theory is, it's got to be easier to just make it look like VPCs than completely redo it.Corey: I'd say that there's reasons that things are the way that they are. Like, ignoring that this is the better approach from a technical perspective entirely because that's often not the only answer, it's that we have assurances we made as part of audit compliance regimes—our SOC 2—of how we handle certain things and what those controls are. And yeah, it's not hard for even a junior employee, most of the time, to design a reasonable architecture on a whiteboard. The problem is, how do you take something pre-existing and get it to a state that closely resembles that while not turning it off for a long time?Avi: Right. And I think we're starting to see some things that probably shouldn't exist, like, people trying to do VXLAN as overlays into and between VPCs because that was how their data c—you know, they're more modern on the data center side and trying to do that. But generally, I think people have an understanding they need to be designing architecture for greenfield things that aren't too far bleeding edge, unless it's like a pure developer shop, and also can map to the least common denominator kinds of infrastructure that people have. Now, sometimes that may be serverless, which means, you know, more CDN use and abstracted layers in front, but for, you know, running your own components, we see a lot of differences but also a lot of commonality. It's differences at the micro but commonality at the macro. And I don't know what you see in your practice. So.Corey: I will say that what I see in practice is that there's a dichotomy where you have born-in-the-cloud companies where 80% of their spend is on a single workload and you can do a whole bunch of deep optimizations. And then you see the conglomerate approach where it's giant spend, but it's all very diffuse across 1500 different applications. And different philosophies, different processes, different cultures give rise to a lot of these things. I will say that if I had a magic wand, I would—and again, the fact that you sponsor and promote this episode is deeply appreciated. Thank you—Avi: You're welcome.Corey: —but it does not mean that you get to compromise my authenticity and objectivity. You can't buy my opinion, just my attention. But I will say this, that I would love it if my customers used Kentik because it is one of the best things I've ever seen to describe what is talking to what at that scale and in volume without going super deep into the weeds. Now, obviously, I'm not allowed to start rolling out random things into customer environments. That's how I get sued to death. But, ugh, I wish it was there.Avi: You probably shouldn't set up IAM roles without asking them, yes. 
That wouldn't be bad.Corey: There's a reason that the only writable stuff that I have access to is generating reports in Cost Explorer.Avi: [laugh]. Okay.Corey: Everything else is read-only. All we do is have conversations with folks. It sets context for those conversations. I used to think that we'd be doing this as a software offering. I no longer believe that actually solves the in-depth problems that people have.Avi: Well, I appreciate the praise. I even take some of the backhanded praise slash critique at the beginning because we think a lot about, you know, we did design for these complex and often hybrid infrastructures, and it's true, we didn't design it for the two- or four-router, you know, infrastructure. If we had bootstrapped longer, if we'd done some other things, we might have done it that way. We don't want to be exclusionary. It's just sort of how we focus.But in the kind of customers that you have, these are things that we're thinking about: what can we do to make it easier to onboard, because people have these massive challenges seeing the traffic and understanding it, and the cost and security and the performance, but to do more with the VPC Flow Logs, we need to get some of those metrics. We think about, should we make an open-source thing? I don't know how much you've seen the concern that people have universally across cloud providers that they turn on something like Kentik, and they're going to hit their API rate limiter. Which is like, really, you can't build a cache for that at the scale that these guys run at, the large cloud providers? I don't really understand that. But it is what it is.We spent a lot of time thinking about that because of security policy, and getting the kind of metrics that we need. You know, if we open-source some of that, would it make it easier, plug it into people's observability infrastructure? We'd like to get that onboarding time down, even for those more complex infrastructures. But you know, the payoff is there, you know? It only takes a day of elapsed time and one hour or so. It's just you got to get a lot of approvals to get the kind of telemetry that you need to make sense of this in some environments.Corey: Oh, yes. And that's part of the problem, too, is like, you could talk about one of those big environments where you have 1500 apps all talking to each other. You can't make sense of any of it without talking to people and having context and occasionally getting a little bit of [unintelligible 00:29:07] just what these things are named. But at that point, you're just speculating wildly. And, you know, it's an engineering trap, where I'm just going to guess rather than asking someone who knows the answer because I don't want to look foolish. It's… you just spent three weeks chasing your own tail. Who's the foolish one?Avi: We're not in a competitive business to yours—Corey: [laugh].Avi: But I do often ask when we're starting off, “So, can you point us at the source of truth that describes what all your applications are?” And usually, they're, like, “[laugh]. No.” But you know, at the same time, to make sense of this stuff, you also need that metadata, and that's something that we designed to be able to take.Now, Kubernetes has some of that. You may have some of it in ServiceNow, which a lot of people use. You may have it in your own text file, CSV somewhere. It may be in NetBox, which we've seen people actually use for the cloud, more on the web company and service provider side, but even some traditional enterprises are starting to use it. 
So, a lot of what we have to do as a vendor is put all that together because yeah, when you're running across multiple environments and thousands of applications, ultimately scrying at IP addresses and VPC IDs is not going to be sufficient.So, the good news is, almost everybody has those sources and we just try to drag it out of them and pull it back together. And for now, we refuse to actually try to get into that business because it's not a—seems sort of like, you know, SAP, where you're going to be sending consultants forever, and not as interesting as the problems we're trying to solve.Corey: I really want to thank you, not just for supporting the show of course, but also for coming here to suffer my slings and arrows. If people want to learn more, where's the best place for them to find you? And please don't respond with an IP address.Avi: 127.0.0.1. You're welcome at my home at any time.Corey: There's no place like localhost.Avi: There's no place like localhost. Indeed. So, the company is kentik.com, K-E-N-T-I-K. I am avi@kentik.com. I am @avifreedman on Twitter and LinkedIn and some other things. And happy to chat with nerds, infrastructure nerds, cloud nerds, network nerds, software nerds, debate, maybe not vi versus Emacs, but should you use swap space or not, and what should your cloud architecture look like?Corey: And we will, of course, put links to that in the [show notes 00:31:20].Avi: Thank you.Corey: Thank you so much for being so generous with your time. I really appreciate it.Avi: Thank you for having this forum. And I will let you know when I am down in San Francisco with some time.Corey: I would be offended if you didn't take the time to at least say hello. Avi Freedman, CEO at Kentik. I'm Cloud Economist Corey Quinn, and this has been a promoted guest episode of Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice, along with an angry comment saying how everything, start to finish, is somehow because of the network.Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.
Jeremy Snyder, Founder of FireTail, joins Corey on Screaming in the Cloud to discuss his career journey and what led him to start FireTail. Jeremy reveals what's changed in cloud since he was an AE at AWS, and walks through how the need for customization in cloud security has led to a boom in the number of security companies out there. Corey and Jeremy also discuss the costs of cloud security, and Jeremy points out some of his observations in the world of cloud security pricing and packaging. About JeremyJeremy is the founder and CEO of FireTail.io, an end-to-end API security startup. Prior to FireTail, Jeremy worked in M&A at Rapid7, a global cyber leader, where he worked on the acquisitions of 3 companies during the pandemic. Jeremy previously led sales at DivvyCloud, one of the earliest cloud security posture management companies, and also led AWS sales in Southeast Asia. Jeremy started his career with 13 years in cyber and IT operations. Jeremy has an MBA from Mason, a BA in computational linguistics from UNC, and has completed additional studies in Finland at Aalto University. Jeremy speaks 5 languages and has lived in 5 countries. Once, Jeremy went 5 days without seeing another human, but saw plenty of reindeer.Links Referenced: FireTail: https://firetail.io Email: jeremy@firetail.io TranscriptAnnouncer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. My guest today is Jeremy Snyder, who's the founder at FireTail. Jeremy, thank you for joining me today. I appreciate you taking the time from your day to suffer my slings and arrows.Jeremy: My pleasure, Corey. I'm really happy to be here.Corey: So, we'll get to a point where we talk about what you're up to these days, but first, I want to dive into the jobs of yesteryear because over a decade ago, you did a stint at AWS doing sales. And not to besmirch your hard work, but it feels like at the time, that must have been a very easy job. Because back then it really felt across the board like the sales motion was basically responding to, “Well, why should we do business with you?” And the response is, “Oh, you misunderstand. You have 87 different accounts scattered throughout your organization. I'm just here to give you visibility, governance, and possibly some discounting over that.” It feels like times have changed in a lot of ways since then. Is that accurate?Jeremy: Well, yeah, but I will correct a couple of things in there. In my days—Corey: Oh, please.Jeremy: —almost nobody had more than one account. I was in the one-account, no-VPCs, you-only-separate-your-workloads-by-tagging days of AWS. So, our job was actually a lot harder at the time because people couldn't wrap their heads around the lack of subnetting, the lack of workload segregation. All of that was really, like, brand new to people, and so you were trying to tell them, like, “Hey, you're going to be launching something on an EC2 instance that's in the same subnet as everybody else's EC2 instance.” And people were really worried about lateral traffic and sniffing and what their neighbors or other customers on AWS could see. 
And by the way, I mean, this was the customers who even believed it was real. You know, a lot of the conversations we went into with people was, “Oh, so Amazon bought too many servers and you're trying to sell us excess capacity.”Corey: That legend refuses to die.Jeremy: And, you know, it is a legend. That is not at all the genesis of AWS. And you know, the genesis is pretty well publicized at this point; you can just google, “how did AWS get started?” You can find accurate stuff around that.Corey: I did it a few years ago with multiple Amazon execs and published it, and they said definitively that that story was not true. And you can say a lot about AWS folks, and I assure you, I do, but I also do not catch them lying to my face, ever. And as soon as that changes, well, now we're going to have a different series of [laugh] conversations that are a lot more pointed. But they've earned some trust there.Jeremy: Yeah, I would agree. And I mean, look, I saw it internally; the way that Amazon built stuff was at such a breakneck pace that the challenge they had—which was, you know, the published version of events for why AWS got created—was that developers needed a place to test code. And that was something that they could not get until they got EC2, or could not get in a reasonably enough timeframe for it to be, you know, real-time valid or relevant for what was going on with the company. So, you know, that really is the genesis of things, and you know, the early services, SQS, S3, EC2, they all really came out of that journey. But yeah, in our days at AWS, there was a lot of ease, in the sense that lots of customers had pent-up frustrations with their data center providers or their colo providers and lots of customers would experience bursts and they would have capacity constraints and they would need a lot of the features that AWS offered, but we had to overcome a lot of technical misunderstandings and trust issues and, you know, oh, hey, Amazon just wants to sniff our data and they want to see what we're up to, and explain to them how encryption works and why they have their own keys and all these things. You know, we had to go through a lot of that. So, it wasn't super easy, but there was some element of it where, you know, just demand actually did make some aspects easy.Corey: What have you seen change between, well, I guess, ten years ago and now? And let's be clear, you don't work in AWS sales, but you also are not oblivious to what the market is doing.Jeremy: For sure. For sure. I left AWS in 2011 and I've stayed in the cloud ecosystem pretty much ever since. I did spend some time working for a systems integrator where all we did was migrate customers to AWS. And then I spent about five, six years working on cloud security primarily focused on AWS, a lot of GCP, a little bit of Azure.So yeah, I mean, I certainly stay up to date with what's going on in the state of cloud. I mean, look, cloud has evolved from this kind of, you know, developer-centric, very easy-to-launch type of platform into a fully-fledged enterprise IT platform, and all of the management structures and all of the kind of bells and whistles that you would want, that you probably wanted from your old VMware networks but never really got, they're all there now. It is a very different ballgame in terms of what the platform actually enables you to do, but fundamentally, a lot of the core building block constructs and the primitives are still kind of driving the heart of it. 
It's just a lot of nicer packaging.What I think is really interesting is actually how customers' usage of cloud platforms has changed over time. And I always think of it and kind of like the, going back to my days, what did I see from my customers? And it was kind of like the month zero, “I just don't believe you.” Like, “This thing can't be real, I don't trust it, et cetera.” Month one is, I'm going to assign some developer to work on some very low-priority, low-risk workload. In my days, that was SharePoint, by the way. Like, nine times out of ten, the first workload that customers stood up was a SharePoint instance that they had to share across multiple locations.Corey: That thing falls over all the time anyway. May as well put it in the cloud where it can do so without taking too much else down with it. Was that the thinking or?Jeremy: Well, and the other thing about it at the time, Corey, was that, like, so many customers worked in this, like, remote-first world, right? And so, SharePoint was inevitably hosted at somebody's office. And so, the workers at that office were so privileged over the workers everywhere else. The performance gap between consuming SharePoint in one location versus another was like, night and day. So, you know, employees in headquarters were like, “Yeah, SharePoint's great.” Employees in branch offices were like, “This thing is terrible,” you know? “It's so slow. I hate it, I hate it, I hate it.”And so, Cloud actually became, like, this neutral location to move SharePoint to that kind of had an equal performance for every office. And so, that was, I think, one of the reasons and it was also, you know, it had capacity problems, and customers were right at that point, uploading tons of static documents to it, like Word documents, Office attachments, et cetera, and so they were starting to have some of these, like, real disk sprawl problems with SharePoint. So, that was kind of the month one problem. And only after they get through kind of month two, three, and four, and they go through, “I don't understand my bill,” and, “Help me understand security implications,” then they think about, like, “Hey, should we go back and look at how we're running that SharePoint stuff and maybe do it more efficiently and, like, move those static Office documents onto S3?” And so on, and so on.And that's kind of one of the big things that I've changed that I would say is very different from, like, 2011 to now, is there's enough sophistication around understanding that, like, you don't just translate what you're doing in your office or in your data center to what you're doing on cloud. Or if you do, you're not getting the most out of your investment.Corey: I'm curious to get your take on how you have seen cloud adoption patterns differ, specifically tied to geo. I mean, I tend to see it from a world where there's a bifurcation of between born-in-the-cloud SaaS-type companies where one workload is 80% of their bill or whatnot, and of the big enterprises where the largest single component is 3%. So, it's a very different slice there. But I'm curious what you would see from a sales perspective, looking across a lot of different geographic boundaries because we're all, on some level, biased based upon where we tend to spend our time doing business. I'm in San Francisco, which is its very own strange universe that has a certain perspective about itself that is occasionally accurate, but not usually. But it's a big world out there.Jeremy: It is. 
One thing that I would say it's interesting. I spent my AWS days based in Singapore, living in Singapore at the time, and I was working with customers across Southeast Asia. And to your point, Corey, one of the most interesting things was this little bit of a leapfrog effect. Data centers in Asia-Pac, especially in places like the Philippines, were just terrible.You know, the Philippines had, like, the second highest electricity rates in Asia at the time, only behind Japan, even though the GDP per capita gap between those two countries is really large. And yet you're paying, like, these super-high electricity rates. Secondarily, data centers in the Philippines were prone to flooding. And so, a lot of companies in the Philippines never went the data center route. You know, they just hosted servers in their offices, you know, they had a bunch of desktop machines in a cubicle, that kind of situation because, like, data centers themselves were cost prohibitive.So, you saw this effect a little bit like cell phones in a lot of the developing world. Landline infrastructure was too expensive or never got done for whatever reason, and people went straight to cell phones. So actually, what I saw in a lot of emerging markets in Asia was, screw the data center; we're going to go straight to cloud. So, I saw a lot of Asia-Pac get a little bit ahead of places like Europe where you had, for instance, a lot of long-term data center contracts and you had customers really locked in. And we saw this over the next, let's say between, like, say, 2014 and 2018 when I was working with a systems integrator, and then started working on cloud security.We saw that US customers and Asia-Pac customers didn't have these obligations; European customers, a lot of them were still working off their lease, and still, you know, I'm locked into let's just say Equinix Frankfurt for another five years before I can think about cloud migration. So, that's definitely one aspect that I observed. Second thing I think is, like, the earlier you started, the earlier you reached the point where you realize that actually there is value in a lot of managed services and there actually is value in getting away from the kind of server mindset around EC2.Corey: It feels like there's a lot of, I want to call it legacy thinking, in some ways, except that's unfair because legacy remains a condescending engineering term for something that makes money. The problem that you have is that you get bound by choices you didn't necessarily realize you were making, and then something becomes revenue-bearing. And now there's a different way to do it, or you learn more about the platform, or the platform itself evolves, and, “Oh, I'm going to rewrite everything to take advantage of this,” isn't happening. So, it winds up feeling like, yeah, we're treating the cloud like a data center. And sometimes that's right; sometimes that's a problem, but ultimately, it still becomes a significant challenge. I mean, there's no way around it. And I don't know what the right answer is, I don't know what the fix is going to be, but it always feels like I'm doing something wrong somewhere.Jeremy: I think a lot of customers go through that same set of feelings and they realize that they have the active runway problem, where you know, how do you do maintenance on an active runway? You kind of can't because you've got flights going in and out. 
And I think you're seeing this in your part of the world at SFO with a lot of the work that got done in, like, 2018, 2019 where they kind of had to close down a runway and had, like, near misses because they consolidated all flights onto the one active runway, right? It is a challenge. And I actually think that some of the evolution that I've seen our customers go through over the last, like, two, three years, is starting to get away from that challenge.So, to your point, when you have revenue-bearing workloads that you can't really modify and things are pretty tightly coupled, it is very hard to make change. But when you start to have it where things are broken down into more microservices, it makes it a lot easier to cycle out Service A for Service B, or let's say more accurately, Service A1 with Service A2 where you can kind of just, like, plug and play different APIs, and maybe, you know, repoint services at the new stuff as they come online. But getting to that point is definitely a painful process. It does require architectural changes and often those architectural changes aren't at the infrastructure level; they're actually inside the application or they're between things like applications and third-party dependencies where the customers may not have full control over the dependencies, and that does become a real challenge for people to break down and start to attack. You've heard of the Strangler Methodology?Corey: Oh, yes. Both in terms of the Boston Strangler, as well—Jeremy: [laugh]. Right.Corey: As the Strangler design pattern.Jeremy: Yeah, yeah. But I think, like, getting to that is challenging until, like, once you understand that you want to do that, it makes a lot of sense. But getting to the starting point for that journey can be really challenging for a lot of customers because it involves stakeholders that are often not involved on infrastructure conversations, and organizational dysfunction can really creep in there, where you have teams that don't necessarily play nice together, not for any particular reason, but just because historically they haven't had to. So, that's something that I've seen and definitely takes a little bit of cultural work to overcome.Corey: When you take a look across the board of cloud adoption, it's interesting to have seen the patterns that wound up unfolding. Your career path, though, seem to have gotten away from the selling cloud and into some strange directions leading up to what you're doing now, where you founded Firetail. What do you folks do?Jeremy: We do API security. And it really is kind of the culmination of, like, the last several years and what we saw. I mean, to your point, we saw customers going through kind of Phase One, Two, Three of cloud adoption. Phase One, the, you know, for lack of a better phrase, lift-and-shift and Phase Two, the kind of first step on the path towards quote-unquote, “Enlightenment,” where they start to see that, like, actually, we can get better operational efficiency if we, you know, move our databases off of EC2 and on to RDS and we move our static content onto S3.And then Phase Three, where they realize actually EC2 kind of sucks, and it's a lot of management overhead, it's a lot of attack surface, I hate having to bake AMIs. What I really want to do is just drop some code on a platform and run my application. And that might be serverless. That might be containerized, et cetera. 
But one path or the other, where we pretty much always see customers ending up is with an API sitting on a network.And that API is doing two things. It front-ends a data set and it front-ends a set of functionality, in most cases. And so, what that really means is that the thing that sits on the network that does represent the attack surface, both in terms of accessing data or in terms of, let's say, like, abusing an application, is an API. And that's what led us to where I am today, what led me and my co-founder Riley to, you know, start the company and try to make it easier for customers to build more secure APIs. So yeah, that's kind of the change that I've observed over the last few years that really, as you said, led to what I'm doing now.Corey: There is a lot of, I guess, challenge in the entire space, even when we bound that to API security; as soon as you go down the security path it starts seeming like there's a massive problem, just in terms of proliferation of companies that each do different things, that each focus on different parts of the story. It feels like everything winds up spitting out huge amounts of security-focused, or at least security-adjacent telemetry. Everything has findings on top of that, and at least in the AWS universe, “Oh, we have a service that spits out a lot of that stuff. We're going to launch another service on top of it that, of course, costs more money, that then winds up organizing it for you. And then another service on top of that that does the same thing yet again.” And it feels like we're building a tower of these things that are just… shouldn't it just be a feature in the original underlying thing that turns down the noise? “Well, yes, but then we couldn't sell you three more things around it.”
And so, to that end, you might be all about Security Hub and Config doing basic checks across all your accounts and all your active regions, and I might be much more about—let's say I'm quote-unquote, “digital-native, cloud-native,” blah, blah, blah—detection and response on top of events.

And so, I only care about log aggregation and, let's say, GuardDuty or Athena analysis on top of that, because I feel like I've got all of my security configurations in Infrastructure as Code. So, there's not a right and wrong answer, and I do think that's part of why there are a gazillion security services out there.

Corey: On some level, I've been of the opinion for a while now that the cloud providers themselves should not necessarily be selling security services directly because, on some level, that becomes an inherent conflict of interest. Why make the underlying platform more secure or easier to use from a security standpoint when you can now turn that into a revenue source? I used to make comments that Microsoft Defender was a classic example of getting this right because they didn't charge for it, and a bunch of antivirus companies screamed and whined about it. And then of course, Microsoft's like, “Oh, Corey's saying nice things about us. We can't have that.” And they started charging for it. So okay, that more or less completely subverts my entire point. But it still feels squicky.

Jeremy: I mean, I kind of doubt that's why they started charging for it. But—

Corey: Oh, I refuse to accept that I'm not that influential. There we are.

Jeremy: [laugh]. Fair enough.

Corey: Yeah, I just can't get away from the idea that it feels squicky when the company providing the infrastructure now makes doing the secure thing on top of it into an investment decision.

Jeremy: Yeah.

Corey: “Do you want the crappy, insecure version of what we build or do you want the top-of-the-line secure version?” That shouldn't be a choice people have to make. Because people don't care about security until right after they really should have cared about security.

Jeremy: Yeah. Look, and I think the changes to S3 configuration, for instance, kind of bear out your point. Like, it shouldn't be the case that you have to go through a lot of extra steps to not make your S3 data public; it should always be the case that, like, you have to go through a lot of steps if you want to expose your data. And then you have explicitly made a set of choices on your own to make some data public, right? So, I kind of agree with the underlying logic. I think the counterargument, if there is one to be made, is that it's not up to them to define what is and is not right for your organization.

Because again, going back to my example, what is secure for you may not be secure for me, because we might have very different modes of operation, we might have very different modes of building our infrastructure, deploying our infrastructure, et cetera. And I think every cloud provider would tell you, “Hey, we're just here to enable customers.” Now, do I think that they could be doing more? Do I think that they could have more secure defaults? You know, in general, yes, of course they could. And really, like, the fundamentals of what I worry about are people building insecure applications, not so much people deploying infrastructure with bad configurations.

Corey: It's funny, we talk about this now.
Earlier today, I was lamenting some of the detritus from some of my earlier builds, where I've been running some of these things in my old legacy single account for a while now. And the build service is dramatically overscoped, just because trying to get the security permissions right was an exercise in frustration at the time. It was, “Nope, that's not it. Nope, blocked again.”

So, I finally said to hell with it, overscoped it massively, and then added a, “Todo: fix this later,” which of course never happened. And if there's ever a breach on something like that, I know that I'll have AWS wagging its finger at me and talking about the shared responsibility model, but it's really kind of a disaster of their own making, because there's not a great way to say easily and explicitly—or honestly, by default, the way Google Cloud does—okay, by default, everything in this project can talk to everything in this project, but the outside world can't talk to any of it, which I think is where a lot of people start off. And the security purists love to say, “That's terrible. That won't work at a bank.” You're right, it won't, but a bank has a dedicated security apparatus internally. They can address those things, whereas your individual student learner does not. And that's how you wind up with open S3 bucket monstrosities left and right.

Jeremy: I think a lot of security fundamentalists would say that what you just described about that Google project structure defeats zero trust, and, you know, that on its own is actually a bad thing. I might counterargue and say that, like, hey, you can have a GCP project as a zero trust, like, first principle, you know? That can be the building block of zero trust for your organization, and then it's up to you to explicitly create these trust relationships to other projects, and so on. But the thing in what you said that really kind of does resonate with me, in particular as an area that AWS—and really in this case, just AWS—should have done better or should do better, is IAM permissions. Because every developer in the world that I know has had that exact experience that you described, which is, they get to a point where they're like, “Okay, this thing isn't working. It's probably something with IAM.”

And then they try one thing, two things, and usually on the third or fourth try, they end up with a star permission, and maybe a comment in that IAM policy or maybe a Jira ticket that, you know, gets filed into the backlog of, “Review those permissions at some point in the future,” which pretty much never happens. [a sketch of walking one of those star permissions back to something scoped appears after this episode's transcript] So, IAM in particular, I think, is one where, like, Amazon should do better, or should at least make it, like, easy for us to kind of graphically build an IAM policy that is scoped to the least permissions required, et cetera. That one, I'll a hundred percent agree with your comments and your statement.

Corey: As you take a look across the largest, I guess, environments you see, as well as some of the folks who are just getting started in this space, it feels like, on some level, it's two different universes. Do you see points of commonality? Do you see that there is an opportunity to get the individual learner who's just starting on their cloud journey to do things that make sense without breaking the bank, practices they can then basically have instilled in them as they start scaling up and enter corporate environments where security budgets are different orders of magnitude?
Because it seems to me that my options for everything that I've looked at start at tens of thousands of dollars a year, or are a bunch of crappy things I find on GitHub somewhere. And it feels like there should be something between those two.

Jeremy: In terms of training, or in terms of, like, tooling to build—

Corey: In terms of security software across the board, which I know—

Jeremy: Yeah.

Corey: —is sort of a vague term. Like, I first discovered this when trying to find something to make sense of CloudTrail logs. It was a bunch of sketchy things off GitHub or a bunch of very expensive products. Same thing with VPC flow logs, same thing with trying to parse other security alerting and aggregate things in a sensible way. Like, very often it's, oh, there are a few very damning log lines surrounded by a million lines of nonsense that no one's going to look through. It's the needle-in-a-haystack problem.

Jeremy: Yeah, well, I'm really sorry if you spent much time trying to analyze VPC flow logs because that is just an exercise in futility. First of all, the level of information that's in them is pretty useless, and the SLA on actual log delivery—A, whether it'll actually happen, and B, whether it will happen in a timely fashion—is just pretty much non-existent. So—

Corey: Oh, from a security perspective I agree wholeheartedly, but remember, I'm coming from a billing perspective, where it's—

Jeremy: Ah, fair enough.

Corey: —huh, we're taking a petabyte in and moving 300 petabytes between availability zones. It's great. It's a fun game called find whatever is chatty because, on some level, it's like, run two of whatever that is—or three—rather than having it replicate. What is the deal here? And just try to identify, especially in the godforsaken hellscape that is Kubernetes, what is that thing that's talking? And sometimes flow logs are the only real tool you've got, other than oral freaking tradition.

Jeremy: But God forbid you forgot to tag your [ENI 00:24:53] so that the flow log can actually be attributed to, you know, what workload is responsible for it behind the scenes. And so yeah, I mean, I think that's a—boy, that's a case study in, like, a miserable job that I don't think anybody would really want to have in this day and age.

Corey: The timing of this is apt. I sent out my newsletter for the week a couple hours before this recording, and in the bottom section, I asked anyone who's got an interesting solution for solving what's talking to what with VPC flow logs, please let me know, because I found this original thing that AWS put up as part of their workshops and a lab to figure this out, but other than that, it's more or less guess-and-check. [a rough sketch of one low-tech starting point appears after this episode's transcript] What is the hotness? It's been a while since I explored the landscape. And now we see if the audience is helpful or disappoints me. It's all on you, folks.

Jeremy: Isn't the hotness to segregate every microservice into an account and run it through a load balancer so that it's much more properly tagged and it's also consumable on an account-by-account basis for better attribution?

Corey: And then everything you see winds up incurring a direct fee when passing through that load balancer, instead of the same thing within the same subnet being able to talk to one another for free.

Jeremy: Yeah, yeah.

Corey: So, at scale—so yes, for visibility, you're absolutely right.
From a “I would like to spend less money giving it directly to Amazon” standpoint, not so much.

Jeremy: [unintelligible 00:26:08] spend more money for the joy of attribution of workload?

Corey: Not to mention as well that coming into an environment that exists and is scaled out—which is sort of a prerequisite for me going in on a consulting project—and saying, “Oh, you should rebuild everything using serverless and microservice principles,” is a great way to get thrown out of the engagement in the first 20 minutes. Because yes, in theory, anyone can design something great, that works, that solves a problem on a whiteboard, but most of us don't get to throw the old thing away and build fresh. And when we do—great, I'm greenfielding something—there are always constraints and challenges down the road that you don't see coming. So, you finally wind up building the most extensible thing in the universe that can handle all these things, and your business dies before you get to MVP because that takes time, energy, and effort. There are many more companies that have died due to failure to find product-market fit than have died because, “Oh hey, your software architecture was terrible.” If you hit the market correctly, there is budget to fix these things down the road, whereas your code could be pristine and your company's still dead.

Jeremy: Yeah. I don't really have a solution for you on that one, Corey [laugh].

Corey: [laugh].

Jeremy: I will come back to your one question—

Corey: I was hoping you did.

Jeremy: Yeah, sorry. I will come back to the question about, you know, how should people kind of get started in thinking about assessing security. And you know, to your point, look, I mean, I think Config is a low-ish cost, but should it cost anything? Probably not, at least for, like, basic CIS Foundations Benchmark checks. I mean, like, if the best practice that Amazon tells everybody is, “Turn on these 40-ish checks at last count,” you know, maybe those 40-ish checks should just be free and included and on in everybody's account, for any account that you tag as production, right?

Like, I will wholeheartedly agree with that sentiment, and it would be a trivial thing for Amazon to do, with one kind of caveat—and this is something that I think a lot of people don't necessarily understand—collecting all the required data for security is actually really expensive. Security is an extremely data-intensive thing in this day and age. And I have a former coworker who used to hate the expression that security is data science, but there is some truth in it at this point, though the magic around it is not actually that big, because there's not a lot of, let's say, heuristic analysis or magic that goes into the queries, et cetera. A lot of security is very rule-based. It's a lot of, you know, just binary checks: is this bit set to zero or one?

And some of those things are, like, relatively simple, but what ends up inevitably happening is that customers want more out of it. They don't just want to know, is my security good or bad? They want to know things like, is it good or bad now relative to last week? Has it gotten better or worse over time?
And so, then you start accumulating lots of data, and time-series data, and that becomes really expensive.

And secondarily, the thing that's really starting to happen more and more in the security world is correlation of multiple layers of data: infrastructure with applications, infrastructure with operating system, infrastructure with OS and app vulnerabilities, infrastructure plus vulnerabilities plus Kubernetes configurations plus the API sitting at the edge of that. Because realistically, for so many organizations that are built out at scale, the truth of the matter is, just on their operating system vulnerabilities, they're going to have tens of thousands, if not millions, of individual items to deal with, and no human can realistically prioritize those without some context around it. And that is where the data management becomes really expensive.

Corey: I hear you. Particularly the complaints about AWS Config, which many things like Control Tower set up for you. And on some level, it is a tax on using the cloud as the cloud should be used, because it charges for evaluation of changes to your environment. So, if you're spinning things up all the time and then turning them down when they're not in use, that incurs a bunch of Config charges, whereas if you treat it like a big dumb version of your data center where you just spin [unintelligible 00:30:13] things forever, your Config charge is nice and low. When you start seeing it entering the top ten of your spend on services, something is very wrong somewhere.

Jeremy: Yeah. I would actually say, like, a good compromise in my mind would be that it should be included with something like Business Support. If you pay for support with AWS, why not include Config, or some level of Config, for all the accounts that are in scope for your production support? That would seem like a very reasonable compromise.

Corey: A lot of folks have it enabled but don't see any direct value from it either, so it's one of those things where not knowing how to turn it off becomes a tax on what you're doing, in some cases. And SCPs—often put in place by Control Tower—don't allow you to do that. So, you're training people who are learning this in their test environments to avoid it, but you want them to be using it at scale in an enterprise environment. So, I agree with you, there has to be a better way to deliver that value to customers. Because, yeah, when this thing is now, you know, 3 or 4% of your cloud bill, it's not adding that much value, folks.

Jeremy: Yeah, one thing I will say just on that point—and, like, it's a super small semantic nitpick that I have—I hate when people talk about security as a tax, because I think it tends to kind of engender the wrong types of relationships to security. Because if you think about taxes, two things about them: I mean, one is that they're kind of prescribed for you, and so in some sense, this kind of Control Tower implementation is similar because, like, you know, it's hard for you to turn off, et cetera, but on the other hand, like, you don't get to choose how that tax money is spent. And really, like, you get to set your security budget as an organization. Maybe this Control Tower Config scenario is a slight outlier on that side, but you know, there are ways to turn it off, et cetera.

The other thing, though, is that, like, people tend to relate to tax as, like, this thing that they really, really hate.
It comes once a year, you should really do everything you can to minimize it and to, like, not spend any time on it or on getting it right. And in fact, like, there are a lot of people who kind of like to cheat on taxes, right? And so, like, you don't really want people to have that kind of mindset of, like, pay as little as possible, spend as little time as possible, and yes, let's cheat on it. Like, that's not how I hope people are addressing security in their cloud environments.

Corey: I agree wholeheartedly, but if you have a service like Config, for example—that's what we're talking about—and it isn't adding value to you, and you just don't know what it does, how it works, then it [unintelligible 00:32:37]—or more or less how to turn it off—then it does effectively become a tax, regardless of how people want to view the principle of taxation. It's a—yeah, security should not be a tax. I agree with you wholeheartedly. The problem is, it is—

Jeremy: It should be an enabler.

Corey: —unclea—yeah, the relationship between Config and security in many cases is fairly attenuated in a lot of people's minds.

Jeremy: Yeah. I mean, I think if you don't have, kind of, ideas in mind for how you want to consume it, or how you want to use it—let's say as an assessment against your own environment—then it's particularly vexing. So, if you don't know, like, “Hey, I'm going to use Config. I'm going to use Config for this set of rules. This is how I'm going to consume that data and how I'm going to then, like, pass the results on to people to make change in the organization,” then it's particularly useless.

Corey: Yeah. I really want to thank you for taking the time to speak with me. If people want to learn more, where's the best place for them to find you?

Jeremy: Easy, breezy. We are just firetail.io. That's ‘fire' like the, you know, flaming substance, and ‘tail' like the tail of an animal, not like a story. But yeah, just firetail.io.

And if you come now, we've actually got, like, a white paper that we just put out around API security, kind of analyzing ten years of API-based data breaches and trying to understand what actually went wrong in most of those cases. And you're more than welcome to grab that off of our website. And if you have any questions, just reach out to me. I'm just jeremy@firetail.io.

Corey: And we'll put links to all of that in the [show notes 00:34:03]. Thank you so much for your time. I appreciate it.

Jeremy: My pleasure, Corey. Thanks so much for having me.

Corey: Jeremy Snyder, founder and CEO at Firetail. I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice, along with an angry comment pointing out that listening to my nonsense is a tax on you going about your day.

Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.
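A note on the IAM tangent in the episode above: Jeremy's point about star permissions is easier to see with a concrete before-and-after. Below is a minimal sketch using boto3; the role name, policy names, bucket ARN, and action lists are all hypothetical stand-ins for illustration, not anything from the episode.

```python
# A sketch of tightening an overscoped build-role policy, assuming a
# hypothetical role "build-service" that only needs to read one bucket
# and write build logs. All names and ARNs here are made up.
import json
import boto3

iam = boto3.client("iam")

# The "to hell with it" version — shown only for contrast.
overscoped = {
    "Version": "2012-10-17",
    "Statement": [{"Effect": "Allow", "Action": "*", "Resource": "*"}],
}

# The walked-back version: only the calls the build actually makes,
# only against the resources it actually touches.
scoped = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [
                "arn:aws:s3:::example-build-artifacts",
                "arn:aws:s3:::example-build-artifacts/*",
            ],
        },
        {
            "Effect": "Allow",
            "Action": ["logs:CreateLogStream", "logs:PutLogEvents"],
            "Resource": "arn:aws:logs:*:*:log-group:/build/*",
        },
    ],
}

# Attach the scoped policy inline on the role, replacing the star policy.
iam.put_role_policy(
    RoleName="build-service",
    PolicyName="build-service-scoped",
    PolicyDocument=json.dumps(scoped),
)
```

The tedious part, as the episode notes, is discovering which calls a workload actually makes; tools like CloudTrail and IAM Access Analyzer can inform that list, but the enumeration itself is the work.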
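And for Corey's open question about finding chatty traffic with VPC flow logs, one low-tech starting point is simply ranking source/destination pairs by bytes. The sketch below assumes flow-log records in the default version 2 space-separated format, already downloaded to a local file (the filename is made up); mapping the resulting IPs back to subnets, AZs, and workloads is the tedious attribution work the episode alludes to.

```python
# A rough sketch: rank "who talks to whom" by total bytes from VPC flow
# logs in the default v2 space-separated format. Assumes records have
# already been pulled down to a local file; the path is hypothetical.
from collections import Counter

# Default v2 field order: version account-id interface-id srcaddr dstaddr
# srcport dstport protocol packets bytes start end action log-status
SRC_FIELD, DST_FIELD, BYTES_FIELD = 3, 4, 9

talkers = Counter()
with open("flowlogs.txt") as fh:
    for line in fh:
        fields = line.split()
        if len(fields) < 14 or fields[0] == "version":
            continue  # skip header rows and short/garbled lines
        if fields[BYTES_FIELD] == "-":
            continue  # NODATA/SKIPDATA records carry no byte count
        pair = (fields[SRC_FIELD], fields[DST_FIELD])
        talkers[pair] += int(fields[BYTES_FIELD])

for (src, dst), total in talkers.most_common(10):
    print(f"{src} -> {dst}: {total:,} bytes")
```

This only tells you the top talkers by IP pair; deciding which of those pairs crosses an AZ boundary still requires joining against your subnet-to-AZ mapping.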
AWS networking provides a broad set of services that enable you to connect and secure your applications and data. With AWS networking, you can easily connect your on-premises networks to AWS, create virtual private clouds (VPCs), and use a variety of networking services to route traffic, control access, and monitor your network.

* Scalability: AWS networking is designed to scale with your needs. You can easily add or remove resources as your traffic demands change.
* Reliability: AWS networking is built on a highly reliable global infrastructure. Your applications remain available even if there is a problem with one of your network components.
* Security: AWS networking provides a variety of security features to protect your applications and data. You can control who has access to your network and what they can do.
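As a rough illustration of the "create VPCs" piece above, here is a minimal sketch using boto3, the AWS SDK for Python. The region, CIDR ranges, and Availability Zone are arbitrary example values, not recommendations.

```python
# Minimal sketch: create a VPC with one subnet and internet access.
# Assumes AWS credentials are configured; region and CIDRs are examples.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Create the VPC itself.
vpc = ec2.create_vpc(CidrBlock="10.0.0.0/16")
vpc_id = vpc["Vpc"]["VpcId"]

# Carve out a subnet in a specific Availability Zone.
subnet = ec2.create_subnet(
    VpcId=vpc_id,
    CidrBlock="10.0.1.0/24",
    AvailabilityZone="us-east-1a",
)

# Attach an internet gateway so the VPC can reach the outside world.
igw = ec2.create_internet_gateway()
ec2.attach_internet_gateway(
    InternetGatewayId=igw["InternetGateway"]["InternetGatewayId"],
    VpcId=vpc_id,
)
```

A real deployment would also need route tables, security groups, and likely a NAT gateway; this shows only the skeleton.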
In this episode of AWS Bites, we explore the future of Virtual Private Clouds (VPCs) in the context of the zero-trust security trend. We'll dive into the pros and cons of using VPCs, including their usefulness when dealing with sensitive data or when you need fine-grained control over your network environment. But let's be real: sometimes VPCs can be a bit of a headache. We'll discuss why you might want to avoid them, including the added complexity they can bring to your network environment. Fear not, we'll also provide a summary of when to use and when not to use VPCs, as well as alternatives to using VPCs, such as services that don't require them. So, are you ready to talk VPCs?!
On this episode of The Cloud Pod, the team discusses the upcoming 2023 in-person Google Cloud conference, the accessibility of AWS CloudTrail Lake for non-AWS activity events, the new updates from Azure Chaos Studio, and the comparison between Oracle Cloud's services and other cloud providers. They also highlight the application and importance of VPCs in a Cloud Center of Excellence (CCOE). A big thanks to this week's sponsor, Foghorn Consulting, which provides full-stack cloud solutions with a focus on strategy, planning, and execution for enterprises seeking to take advantage of the transformative capabilities of AWS, Google Cloud, and Azure. This week's highlights
About Chris
Chris Farris has been in the IT field since 1994, primarily focused on Linux, networking, and security. For the last 8 years, he has focused on public cloud and public-cloud security. He has built and evolved multiple cloud security programs for major media companies, focusing on enabling the broader security team's objectives of secure design, incident response, and vulnerability management. He has developed cloud security standards and baselines to provide risk-based guidance to development and operations teams. As a practitioner, he's architected and implemented multiple serverless and traditional cloud applications focused on deployment, security, operations, and financial modeling.

Chris now does cloud security research for Turbot and evangelizes for the open-source tool Steampipe. He is one of the organizers of the fwd:cloudsec conference (https://fwdcloudsec.org) and has given multiple presentations at AWS conferences and BSides events.

When not building things with AWS's building blocks, he enjoys building Legos with his kid and figuring out what interesting part of the globe to travel to next. He opines on security and technology on Twitter and his website https://www.chrisfarris.com

Links Referenced:
Turbot: https://turbot.com/
fwd:cloudsec: https://fwdcloudsec.org/
Steampipe: https://steampipe.io/
Steampipe blog: https://steampipe.io/blog

Transcript
Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.

Corey: Tailscale SSH is a new, and arguably better, way to SSH. Once you've enabled Tailscale SSH on your server and user devices, Tailscale takes care of the rest. So you don't need to manage, rotate, or distribute new SSH keys every time someone on your team leaves. Pretty cool, right? Tailscale gives each device in your network a node key to connect to your VPN, and uses that same key for SSH authorization and encryption. So basically you're SSHing the same way that you're already managing your network.

So what's the benefit? Well, built-in key rotation, the ability to manage permissions as code, connectivity between any two devices, and reduced latency. You can even ask users to re-authenticate SSH connections for that extra bit of security to keep the compliance folks happy. Try Tailscale now - it's free forever for personal use.

Corey: This episode is sponsored by our friends at Logicworks. Getting to the cloud is challenging enough for many places, especially maintaining security, resiliency, cost control, agility, etc, etc, etc. Things break, configurations drift, technology advances, and organizations, frankly, need to evolve. How can you get to the cloud faster and ensure you have the right team in place to maintain success over time? Day 2 matters. Work with a partner who gets it - Logicworks combines the cloud expertise and platform automation to customize solutions to meet your unique requirements. Get started by chatting with a cloud specialist today at snark.cloud/logicworks. That's snark.cloud/logicworks

Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. My guest today is someone that I have been meaning to invite slash drag onto this show for a number of years.
We first met at re:Inforce the first year that they had such a thing, Amazon's security conference for cloud, as is Amazon's tradition, named after an email subject line. Chris Farris is a cloud security nerd at Turbot. He's also one of the organizers of fwd:cloudsec, another security conference named after an email subject line, with a lot more self-awareness than any of Amazon's stuff. Chris, thank you for joining me.

Chris: Oh, thank you for dragging me on. You can let go of my hair now.

Corey: Wonderful, wonderful. That's why we're all having the thinning hair going on. People just use it to drag us to and fro, it seems. So, you've been doing something that I'm only going to describe as weird lately because your background—not that dissimilar from mine—is as a practitioner. You've been heavily involved in the security space for a while, and lately, I keep seeing an awful lot of things with your name on them getting sucked up by the giant app surveillance apparatus deployed to the internet, looking for basically any mention of AWS, that I wind up using to write my newsletter and feed the content grist mill every year. What are you doing and how'd you get there?

Chris: So, what am I doing right now is, I'm in marketing. It's kind of a, you know, “Oops, I'm sorry I did that.”

Corey: Oh, the running gag is, you work in DevRel; that means, “Oh, you're in marketing, but they're scared to tell you that.” You're self-aware.

Chris: Yeah.

Corey: Good for you.

Chris: I'm willing to admit that I'm in marketing now. And I've been a cloud practitioner since probably 2014, cloud security since about 2017. And then just decided, the problem that we have in the cloud security community is a lot of us are just kind of sitting in a corner in our companies and solving problems for our companies, but we're not solving the problems at scale. So, I wanted a job that would allow me to reach a broader audience and help a broader audience. Where I see cloud security—you know, or cloud in general—falling down is Amazon makes it really hard for you to do your side of shared responsibility, and so we need to be out there helping customers understand what they need to be doing. So, I am now at a company called Turbot and we're really trying to promote cloud security.

Corey: One of the first promoted guest episodes of this show was David Boeke, your CTO, and one of the things that I regret is that I've sort of lost track of Turbot over the past few years because, yeah, one or two things might have been going on during that timeline as I look back at having kids in the middle of a pandemic and the deadly plague o'er land. And suddenly, every conversation takes place over Zoom, which is like, “Oh, good, it's like a happy hour only instead, now it's just like a conference call for work.” It's like, ‘Conference Calls: The Drinking Game' is never the great direction to go in. But it seems the world is recovering. We're going to be able to spend some time together at re:Invent by all accounts that I'm actively looking forward to.

As of this recording, you're relatively new to Turbot, and I figured out that you were going there because, once again, content hits my filters. You wrote a fascinating blog post that hits on an interest of mine that I don't usually talk about much because it's off-putting to some folk, and these days, I don't want to get yelled at more than I have to, about the experience of traveling, I believe it was to an all-hands on the other side of the world.

Chris: Yep.
So, my first day on the job at Turbot, I was landing in Kuala Lumpur, Malaysia, having left the United States 24 hours before—or was it 48? It's hard to tell when you go to the other side of the planet and the time zones have also shifted—and having left my prior company the day before that. But yeah, so Turbot traditionally has an annual event where we all get together in person. We're a completely remote company, but once a year, we all get together in person at our Integrate event.

And so, that was my first day on the job. And then, you know, it was basically two weeks of reasonably intense hackathons, building out a lot of stuff that hopefully will show up open-source shortly. And then yeah, meeting all of my coworkers. And that was nice.

Corey: You've always had a focus, through all the time that I've known you and all the public content that you've put out there that has come across my desk, that seems to center around security. It's sort of an area that I give a nod to more often than I would like, on some level, but that tends to be your bread and butter. Your focus seems to be almost overwhelmingly on what I would call AWS security. Is that fair to say, or is that a mischaracterization of how you view it slash what you actually do? Because, again, we have these parasocial relationships with voices on the internet. And it's like, “Oh, yeah, I know all about that person.” Yeah, you've met them once, and all you know other than that is what they put on Twitter.

Chris: You follow me on Twitter. Yeah, I would argue that yes, a lot of what I do is AWS-related security because in the past, a lot of what I've been responsible for is cloud security in AWS. But I've always worked for companies that were multi-cloud; it's just that 90% of everything was Amazon, and so therefore 90% of my time, 90% of my problems, 90% of my risk was all in AWS. I've been trying to break out of that. I've been trying to understand the other clouds.

One of the nice aspects of this role and working on Steampipe is I am now experimenting with other clouds. The whole goal here is to be able to scale our ability as an industry and as security practitioners to support multiple clouds. Because whether we want to or not, we've got it. And so, even though 90% of my spend, 90% of my resources, 90% of my applications may be in AWS, that 10% that I'm ignoring is probably more than 10% of my risk, and we really do need to understand and support major clouds equally.

Corey: One post you had recently that I find myself in wholehearted agreement with is on the adoption of Tailscale in the enterprise. I use it for all of my personal nonsense and it is transformative. I like the idea of what that portends for multi-cloud—or poly-cloud, or whatever the hell we're calling it this week—sorts of architectures. Historically, one of the biggest problems in getting two clouds to speak to one another and managing them in an intelligent way is that the security models are different, the user identity stuff is different as well, and the network stuff has always been nightmarish. Well, with Tailscale, you don't have to worry about that in the same way at all. You can, more or less, ignore it, turn on host-based firewalls for everything, and just allow Tailscale. And suddenly, okay, I don't really have to think about this in the same way.

Chris: Yeah. And you get the micro-segmentation out of it, too, which is really nice.
I will agree that I had not looked at Tailscale until I was asked to look at Tailscale, and then it was just like, “Oh, I am completely redoing my home network on that.” But looking at it, it's going to scare some old-school network engineers, it's going to impact their livelihoods, and that is going to make them very defensive. And so, what I wanted to do in that post was kind of address, as a practitioner, if I was looking at this with an enterprise lens, what are the concerns you would have in deploying Tailscale in your environment? [a small sketch of the micro-segmentation policy idea appears after this episode's transcript]

A lot of those were, you know, around user management. I think the big one that is—it's a new thing in enterprise security, but kind of this host profiling, which is, hey, before I let your laptop on the network, I'm going to go make sure that you have antivirus and some kind of EDR, XDR, blah-DR agent, so that, you know, we have a reasonable assurance that you're not going to just go and drop [unintelligible 00:09:01] on the network and, next thing you know, we're Maersk. For Tailscale, that's going to be the biggest thing they are going to have to figure out: how do they work with some of these enterprise concerns and things along those lines. But I think it's an excellent technology, it was super easy to set up, and the ability to fine-tune and microsegment is great.

Corey: Wildly so. They occasionally sponsor my nonsense. I have no earthly idea whether this episode is one of them because we have an editorial firewall—they're not paying me to say any of this stuff, like, “And this is brought to you by whatever.” Yeah, that's the sponsored ad part. This is just, I'm in love with the product.

One of the most annoying things about it to me is that I haven't found a reason to give them money yet because the free tier for my personal stuff is very comfortably sized, and I don't have a traditional enterprise network or anything like that people would benefit from over here. One area in cloud security where I think I have potentially been misunderstood—so I want to take at least this opportunity to clear the air on it a little bit—has been that, by all accounts, I've spent the last, mmm, few months or so just absolutely beating the crap out of Azure. Before I wind up adding a little nuance and context to that, I'd love to get your take on what, by all accounts, has been a pretty disastrous year-and-a-half for Azure security.

Chris: I think it's been a disastrous year-and-a-half for Azure security. Um—[laugh].

Corey: [laugh]. That was something of a leading question, wasn't it?

Chris: Yeah, no, I mean, it is. And if you think back, though, Microsoft's repeatedly had this ebb and flow of security disasters. You know, Code Red back in whatever the 2000s, NT 4.0 patching back in the '90s. So, I think we're just hitting one of those peaks again, or hopefully, we're hitting the peak and not [laugh] just starting the uptick. A lot of what Azure has built is stuff that they already had—commercial off-the-shelf software—they wrapped multi-tenancy around it, gave it a new SKU under the Azure name, and called it cloud. So, am I super-surprised that somebody figured out how to leverage a Jupyter notebook to find the back-end credentials to drop the firewall tables to go find the next guy over's Cosmos DB? No, I'm not.
I am not pretending for even a slight second that I'm a better security engineer than the very capable, very competent people who work there. This stuff is incredibly hard. And I'm not—

Chris: And very well-funded people.

Corey: Oh, absolutely, yeah. They make more than I do, presumably. But it's one of those areas where I'm not sitting here trying to dunk on them, their work, their efforts, et cetera, and I don't do a good enough job of clarifying that. My problem is the complete radio silence coming out of Microsoft on this. If AWS had a series of issues like this, I'm hard-pressed to imagine a scenario where they would not have much more transparent communications; they might very well trot out a number of their execs to go on a tour to wind up talking about these things and what they're doing systemically to change it.

Because six of these in, it's like, okay, this is now a cultural problem. It's not one rando engineer wandering around the company screwing things up on a rotational basis. It's, what are you going to do? It's unlikely that firing Steven is going to be your fix for these things. So, that is part of it.

And then most recently, they wound up having a blog post on the MSRC—the Microsoft Security Resource Center, I believe that acronym is? The [mrsth], whatever; it sounds like a virus you pick up in a hospital—but the problem that I have with it is that they spent most of that being overly defensive and dunking on SOCRadar, the vulnerability researcher who found this and reported it to them. And they had all kinds of quibbles with how it was done, what they did with it, et cetera, et cetera. It's, “Excuse me, you're the ones that left customer data sitting out there in the Azure equivalent of an S3 bucket, and you're calling other people out for basically doing your job for you? Excuse me?”

Chris: But it wasn't sensitive customer data. It was only the contract information, so therefore it was okay.

Corey: Yeah, if I put my contract information out there and try and claim it's not sensitive information, my clients will laugh and laugh as they sue me into the Stone Age.

Chris: Yeah, well, clearly you don't have the same level of clickthrough terms that Microsoft is able to negotiate because, you know, [laugh].

Corey: It's awful as well; it doesn't even work because, “Oh, it's okay, I lost some of your data, but that's okay because it wasn't particularly sensitive.” Isn't that kind of up to you?

Chris: Yes. And if, A, I'm actually, you know, a big AWS shop, and then I'm looking at Azure and I've got my negotiations in there, and Amazon gets wind that I'm negotiating with Azure, that's not going to go well for me and my business. So no, this kind of material is incredibly sensitive. And that was an incredibly tone-deaf response on their part. But you know, to some extent, it was more of a response than we've seen from some of the other Azure multi-tenancy breakdowns.

Corey: Yeah, at least they actually said something. I mean, there is that. It's just—it's wild to me. And again, I say this as an Azure customer myself. Their computer vision API is basically just this side of magic, as best I can tell, and none of the other providers have anything like it.

That's what I want. But, you know, it almost feels like that service is under NDA because no one talks about it when they're using this service. I did a whole blog post singing its praises and no one from that team reached out to me to say, “Hey, glad you liked it.” Not that they owe me anything, but at the same time it's incredible.
Why am I getting shut out? It's like, does this company just have an entire policy of not saying anything ever to anyone at any time? It seems like it.

Chris: So, a long time ago, I came to this realization that even if you just look at the terminology of the three providers, Amazon has accounts. Why does Amazon have Amazon—or AWS—accounts? Because they're a retail company, and that's what you signed up with to buy your underwear. Google has projects because they were, I guess, a developer-first thing, and that was how they thought about it: “Oh, you're going to go build something. Here's your project.”

What does Microsoft have? Microsoft Azure Subscriptions. Because they are still about the corporate enterprise IT model, where it's really about how much we're charging you, not really about what you're getting. So, given that you're not a big enterprise IT customer—you don't, I presume, do lots and lots of golfing at expensive golf resorts—you're probably not fitting their demographic.

Corey: You're absolutely not. And that's wild to me. And yet, here we are.

Chris: Now, what's scary is they are doing so many interesting things with artificial intelligence… that if… their multi-tenancy boundaries are as bad as we're starting to see, then what else is out there? And more and more, we as carbon-based life forms are relying on Microsoft and other cloud providers to build AI; that's kind of a scary thing. Go watch Satya's keynote at Microsoft Ignite and he's showing you all sorts of ways that AI is going to start replacing the gig economy. You know, it's not just Tesla and self-driving cars at this point. DALL-E is going to replace the independent graphic designer.

They've got things coming out in their office suite that are going to replace the mom-and-pop marketing shops that are generating menus and doing marketing plans for your local restaurants or whatever. There's a whole slew of things where they're really trying to replace people.

Corey: That is a wild thing to me. And part of the problem I have in covering AWS is that I have to differentiate in a bunch of different ways between AWS and its Amazon corporate parent. And they have that problem, too, internally. Part of the challenge they have, in many cases, is that perks you give to employees have to scale to one-and-a-half million people, many of them in fulfillment center warehouse things. And that is a different type of problem than at a company like, for example, Google, where most of the employees tend to be in office-job-style environments.

That's a weird thing, and I don't know how to even start conceptualizing things operating at that scale. Everything that they do is definitionally a very hard problem when you have to make it scale to that point. What all of the hyperscale cloud providers do is, from where I sit, complete freaking magic. The fact that it works as well as it does is nothing short of a modern-day miracle.

Chris: Yeah, and it is more than just throwing hardware at the problem, which was my on-prem solution to most of the things. “Oh, hey. We need higher availability? Okay, we're going to buy two of everything.” We called it the Noah's Ark model, and we had an A side and a B side.

And, “Oh, you know what? Just in case, we're going to buy some extra capacity and put it in a different city so that, you know, we can just fail over from our primary city to our secondary city.” That doesn't work at the cloud provider scale.
And really, we haven't seen a major cloud outage—I mean, like, a bad one—in quite a while.

Corey: This episode is sponsored in part by Honeycomb. When production is running slow, it's hard to know where problems originate. Is it your application code, users, or the underlying systems? I've got five bucks on DNS, personally. Why scroll through endless dashboards while dealing with alert floods, going from tool to tool to tool that you employ, guessing at which puzzle pieces matter? Context switching and tool sprawl are slowly killing both your team and your business. You should care more about one of those than the other; which one is up to you. Drop the separate pillars and enter a world of getting one unified understanding of the one thing driving your business: production. With Honeycomb, you guess less and know more. Try it for free at honeycomb.io/screaminginthecloud. Observability: it's more than just hipster monitoring.

Corey: The outages are always fascinating, just from the way that they are reported in the mainstream media. And again, this is hard, I get it. I am not here to crap on journalists. They, for some ungodly, unknowable reason, have decided not to spend their entire career focusing on the nuances of one very specific, very deep industry. I don't know why.

But as [laugh] a result, they wind up getting a lot of their baseline facts wrong about these things. And that's fair. I'm not here to necessarily act as an Amazon spokesperson when these things happen. They have an awful lot of very well-paid people who can do that. But it is interesting just watching the blowback and the reaction whenever there's an outage; the conversation is never, “Does Amazon or Azure or Google suck?” It's, “Does cloud suck as a whole?”

That's part of the reason I care so much about Azure getting their act together. If it were just torpedoing Microsoft's reputation, then, well, that's sad, but okay. But it extends far beyond that, to a point where it's almost where the enterprise groundhog sees the shadow of a data breach and then we get six more years of data center build-outs instead of moving things to a cloud. I spent too many years working in data centers, and I have the scars from the cage nuts and crimping patch cables frantically in the middle of the night to prove it. I am thrilled at the fact that I don't believe I will ever again have to frantically drive across town in the middle of the night to replace a hard drive before the rest of the array degrades. Cloud has solved those problems beautifully. I don't want to go back to the Dark Ages.

Chris: Yeah, and I think that there's a general potential that we could start seeing this big push towards going back on-prem for effectively sovereign-data reasons, whether it's a country that has said, “You cannot store your data about our citizens outside of our borders”—and either they're doing that because they do not trust the US Silicon Valley privacy or whatever, or because, if it's within our borders, then our secret police agents can come knocking on the door at two in the morning to go find out what some dissident's viewing habits might have been. I see sovereign cloud as this thing that may be a step back from this ubiquitous thing that we have right now in Amazon, Azure, and Google. And so, as we start getting to the point in the history books where we start seeing maps with lots of flags, I think we're going to start seeing a bifurcation of cloud as just a whole thing. We see it already right now.
The AWS China partition is not owned by Amazon, it is not run by Amazon, it is not controlled by Amazon. It is controlled by the communist government of China. And nobody is doing business in Russia right now, but if they had not done what they had done earlier this year, we might very well have seen somebody spinning up a cloud provider that is completely controlled by the Russian government.

Corey: Well, yes or no, but I want to challenge that assessment for a second, because I've had conversations with a number of folks about this where people say, “Okay, great. Like, is the alt-right, for example, going to have better options now that there might be a cloud provider spinning up there?” Or, “Well, okay, what about a new cloud provider to challenge the dominance of the big three?” And there are all these edge cases, either geopolitically or politically based upon—or folks wanting to wind up approaching it from a particular angle—but if we were hired to build out an MVP of a hyperscale cloud provider, like, the budget for that MVP would look like $100 billion at this point, to get started and just get up to a point of critical mass before you could actually see if this thing has legs. And we'd probably burn through almost all of that before doing a single dime in revenue.

Chris: Right. And then you're doing that in small markets. Outside of the China partition, these are not massively large markets. I think Oracle is going down an interesting path with its idea of Dedicated Cloud and Oracle Alloy [unintelligible 00:22:52].

Corey: I like a lot of what Oracle's doing, and if younger me heard me say that, I don't know how hard I'd hit myself, but here we are. Their free tier for Oracle Cloud is amazing, their data transfer prices are great, and their entire approach of, “We'll build an entire feature-complete region in your facility and charge you what, from what I can tell, is a very reasonable amount of money,” works. And it is feature-complete, not, “Well, here are the three services that we're going to put in here and everything else is, well… it's just sort of a toehold there so you can start migrating it into our big cloud.” No. They're doing it right from that perspective.

The biggest problem they've got is the word Oracle at the front end and their, I would say, borderline addiction to big-E enterprise markets. I think the future of cloud looks a lot more like cloud-native companies being founded because those big enterprises are starting to describe themselves in similar terminology. And as we've seen in the developer ecosystem, as go startups, so do big companies a few years later. Walk around any big company that's undergoing a digital transformation, you'll see a lot more Macs on desktops, for example. You'll see CI/CD processes in place as opposed to, “Well, oh, you want something new? It's going to be eight weeks to get a server rack downstairs, and accounting is going to have 18 pages of forms for you to fill out.” No, it's “click the button,” or—

Chris: Don't forget the six months of just getting the financial CapEx approvals.

Corey: Exactly.

Chris: You have to go through the finance thing before you even get to start talking to techies about when you get your server. I think Oracle is in an interesting place, though, because it is embracing the fact that it is number four, and so therefore, it's like, we are going to work with AWS, we are going to work with Azure; our database can run in AWS or it can run in our cloud; we can interconnect directly, natively, seamlessly with Azure.
If I were building a consumer-based thing and I was moving into one of these markets where one of these governments was demanding something like a sovereign cloud, Oracle is a great place to go: okay, all of our front-end consumer whatever is all going to sit in AWS because that's what we do for all other countries. For this one country, we're just going to go and build this thing in Oracle, and we're going to leverage Oracle Alloy or whatever, and now suddenly, okay, their data is in their country and it's subject to their laws, but I don't have to re-architect to go into one of these, you know, little countries with tin-horn dictators.

Corey: It's the way to do multi-cloud right, from my perspective. I'll use a component service in a different cloud; I'm under no illusions, though, that in doing that I'm increasing my resiliency. I'm not removing single points of failure; I'm adding them. And I make that trade-off on a case-by-case basis, knowingly. But there is a case for some workloads—probably not yours if you're listening to this; assume not, but when you have more context, maybe so—where, okay, we need to be across multiple providers for a variety of strategic or contextual reasons for this workload.

That does not mean everything you build needs to be able to do that. It means you're going to make trade-offs for that workload, and understanding the boundaries of where that starts and where that stops is going to be important. That is not the worst idea in the world for a given appropriate workload, that you can optimize stuff into a container and then can run, more or less, anywhere that can take a container. But that is also not the majority of most people's workloads.

Chris: Yeah. And I think what that comes back to, from the security practitioner standpoint, is you have to support not just your primary cloud, your favorite cloud, the one you know; you have to support any cloud. Whether that's—you know, hey, congratulations, your developers want to use Tailscale because it bypasses a ton of complexity in getting these remote island VPCs from this recent acquisition integrated into your network—or because you're going into a new market and you have to support Oracle Cloud in Saudi Arabia, you as a practitioner have to kind of support any cloud.

And so, one of the reasons that I joined, and am working on, and am so excited about Steampipe is it kind of does give you that. It is a uniform interface to not just AWS, Azure, and Google, but all sorts of clouds, whether it's GitHub or Oracle or Tailscale. So, that's kind of the message I have for security practitioners at this point: I tried, I fought, I screamed and yelled and ranted on Twitter against, you know, doing multi-cloud, but at the end of the day, we were still multi-cloud.

Corey: What I see as these things evolve is that, yeah, as practitioners, we're increasingly having to work across multiple providers, but not to a stupendous depth—that's the intimidating thing that scares the hell out of people. I still remember my first time with the AWS console, being so overwhelmed with the number of services, and there were 12. Now, there are hundreds, and I still feel that same sense of being overwhelmed, but I also have the context now to realize that over half of all customer spend globally is on EC2. That's one service.
Yes, you need, like, five more to get it to work, but okay.

And once you go through learning that to get started—and there are a lot of moving parts around it—it's like, “Oh, God, I have to do this for every service?” No. Take Route 53—my favorite database, but most people use it as a DNS service—you can go start to finish on basically everything that service does that a human being is going to use in less than four hours, and then you're more or less ready to go. Everything is not the hairy beast that is EC2. And most of those services are not for you, whoever you are, whatever you do; most AWS services are not for you. Full stop.

Chris: Yes and no. I mean, as a security practitioner, you need to know what your developers are doing, and I've worked in large organizations with lots of things, and I would joke that, oh, yeah, I'm sure we're using every service but the IoT ones, and then I go and look at our bill, and I was like, “Oh, why are we dropping that much on IoT?” Oh, because they wanted to use the managed MQTT service.

Corey: Ah, I start with the bill because the bill is the source of truth.

Chris: Yes, they wanted to use the managed MQTT service. Okay, great. So, we're now in IoT. But how many of those things have resource policies, how many of those things can be made public, and how many of those things is your CSPM actually checking for and telling you that, hey, a developer has gone out somewhere and made this SageMaker notebook public, or this MQTT topic public? [a small query sketch along these lines appears after this episode's transcript] And so, that's where, you know, you need to have that level of depth, and then you've got to have that level of depth in each cloud. To some extent, if the cloud is just the core basic VMs, object storage, maybe some networking, and a managed relational database, it's super simple to understand what all you need to do to build a baseline to secure that. As soon as you start adding in all of the fancy services that AWS has… I re—

Corey: Yeah, migrating your Step Functions workflow to another cloud is going to be a living goddamn nightmare. Migrating something that you stuffed into a container and run on EC2 or Fargate is probably going to be a lot simpler. But there are always nuances.

Chris: Yep. But the security profile of a Step Function is significantly different. So, you know, there's not much you can do wrong there, yet.

Corey: You say that now, but wait for their next security breach, and then we start calling them Stumble Functions instead.

Chris: Yeah. I say that. And the next thing you know, we're going to have something like Lambda [unintelligible 00:30:31] show up, and I'm just going to be able to put my Step Function on the internet, unauthenticated. Because, you know, that's what Amazon does: they innovate, but they don't necessarily warn security practitioners ahead of their innovation that, hey, we're about to release this thing. You might want to prepare for it and adjust your baselines, or talk to your developers, or here's a service control policy that you can drop in place to, you know, like, suppress it for a little bit. No, it's like, “Hey, these things are there,” and by the time you see the tweets or read the documentation, you've got some developer who's put it in production somewhere. And then it becomes a lot more difficult for you as a security practitioner to put the brakes on it.

Corey: I really want to thank you for spending so much time talking to me. If people want to learn more and follow your exploits—as they should—where can they find you?

Chris: They can find me at steampipe.io/blog.
That is where all of my latest rants, raves, research, and how-tos show up.Corey: And we will, of course, put a link to that in the [show notes 00:31:37]. Thank you so much for being so generous with your time. I appreciate it.Chris: Perfect, thank you. You have a good one.Corey: Chris Farris, cloud security nerd at Turbot. I'm Cloud Economist Corey Quinn and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice along with an angry insulting comment, and be sure to mention exactly which Azure communications team you work on.Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.Announcer: This has been a HumblePod production. Stay humble.
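A quick illustration of the "uniform interface" Chris describes: Steampipe exposes cloud APIs as Postgres tables, so a security team can ask the same SQL question of AWS, Azure, GitHub, or Tailscale. The sketch below queries Steampipe's local Postgres endpoint from Python. It assumes "steampipe service start" is running with the AWS plugin installed; the port, database name, table, and column are taken from Steampipe's documentation and may differ in your install.

```python
# Minimal sketch: ask Steampipe (via its Postgres endpoint) for S3 buckets
# whose bucket policy makes them public -- the "developer made this thing
# public" check discussed above. Assumes the AWS plugin is installed and
# `steampipe service start` is running locally.
import psycopg2

conn = psycopg2.connect(
    host="localhost",
    port=9193,               # Steampipe's default service port
    dbname="steampipe",
    user="steampipe",
    password="REPLACE_ME",   # printed by `steampipe service start`
)

with conn.cursor() as cur:
    cur.execute(
        "select name, region from aws_s3_bucket "
        "where bucket_policy_is_public"
    )
    for name, region in cur.fetchall():
        print(f"PUBLIC bucket: {name} ({region})")

conn.close()
```

The same session could join in tables from other plugins (Azure, GitHub, Tailscale, and so on), which is the point of the "support any cloud" posture Chris argues for.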
About Harry
Harry has worked at Sysdig for over 6 years, helping organizations mature their journey to cloud native. He's witnessed the evolution of bare metal, VMs, and finally Kubernetes establishing itself as the de facto standard for container orchestration. He is part of the product team building Sysdig's troubleshooting and cost offering, helping customers increase their confidence operating and managing Kubernetes.Previously, Harry ran, and later sold, a cloud hosting provider where he was working hands-on with systems administration. He studied information security and lives in the UK.
Links Referenced:
Sysdig: https://sysdig.com/
Transcript
Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.Corey: This episode is brought to us by our friends at Pinecone. They believe that all anyone really wants is to be understood, and that includes your users. AI models combined with the Pinecone vector database let your applications understand and act on what your users want… without making them spell it out. Make your search application find results by meaning instead of just keywords, your personalization system make picks based on relevance instead of just tags, and your security applications match threats by resemblance instead of just regular expressions. Pinecone provides the cloud infrastructure that makes this easy, fast, and scalable. Thanks to my friends at Pinecone for sponsoring this episode. Visit Pinecone.io to understand more.Corey: This episode is brought to you in part by our friends at Veeam. Do you care about backups? Of course you don't. Nobody cares about backups. Stop lying to yourselves! You care about restores, usually right after you didn't care enough about backups. If you're tired of the vulnerabilities, costs, and slow recoveries when using snapshots to restore your data, assuming you even have them at all living in AWS-land, there is an alternative for you. Check out Veeam, that's V-E-E-A-M for secure, zero-fuss AWS backup that won't leave you high and dry when it's time to restore. Stop taking chances with your data. Talk to Veeam. My thanks to them for sponsoring this ridiculous podcast.Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. This promoted episode has been brought to us by our friends at Sysdig, and they have sent one of their principal product managers to suffer my slings and arrows. Please welcome Harry Perks.Harry: Hey, Corey, thanks for hosting me. Good to meet you.Corey: An absolute pleasure, and thanks for basically being willing to suffer all of the various nonsense about to be thrown your direction. Let's start with origin stories; I find that those tend to wind up resonating the most. Back when I first noticed Sysdig coming into the market, because it was just launching at that point, it seemed like it was a… we'll call it an innovative approach to observability, though I don't recall that we used the term observability back then. It more or less took a look at whatever an application was doing almost at a system call level, traced what was going on as those requests worked on an individual system, and then provided those in a variety of different forms to reason about.
Is that directionally correct as far as the origin story goes, or am I misremembering an evening event I went to what feels like half a lifetime ago?Harry: I'd say the latter, but just because it's a funnier answer. But that's correct. So, Sysdig was created by Loris Degioanni, one of the founders of Wireshark. And when containers and Kubernetes were being incepted, you know, it kind of created this problem where you kind of lacked visibility into what's going on inside these opaque boxes, right? These black boxes which are containers.So, we started using system calls as a source of truth for… I don't want to say observability, but observability, and using those system calls to essentially see what's going on inside containers from the outside. And leveraging system calls, we were able to pull up metrics, such as the golden signals of applications running in containers, and network traffic. So, it's a very simple way to instrument applications. And that was really how monitoring started. And then Sysdig kind of morphed into a security product.Corey: What was it that drove that transformation? Because generally speaking, a product that's in a particular space, aimed at a particular niche, pivoting into something that feels as orthogonal as security doesn't tend to be something that you see all that often. What did you folks see that wound up driving that change?Harry: The same challenges that were being presented by containers and microservices for monitoring were the same challenges for security. So, for runtime security, it was very difficult for our customers to be able to understand what the heck is going on inside the container. Is a crypto miner being spun up? Is there malicious activity going on? So, it made logical sense to use that same data source - system calls - to understand the monitoring and the security posture of applications.Corey: One of the big challenges out there is that security tends to be one of those pervasive things—I would argue that observability does too—where once you have a position of being able to see what is going on inside of an environment, you're able to reason about it. And this goes double for inside of containers, which from a cloud provider perspective, at least seems to be, “Oh, yeah, just give us the containers, we don't care what's going on inside, so we're never going to ask, notice, or care.” And being able to bridge that lack of visibility between the outside of container land and the inside of container land has been a perennial problem. There are security implications, there are cost implications, there are observability challenges to be sure, and of course, reliability concerns that flow directly from that, which is, I think, how most people, at least historically, contextualize observability. It's a fancy word to describe is the site about to fall over and crash into the sea. At least in my experience. Is that your definition of observability, or have I basically been hijacked by a number of vendors who have decided to relabel what they'd been doing for 15 years as observability?Harry: [laugh]. I think observability is one of those things that is down to interpretation depending on what is the most recent vendor you've been speaking with. But to me, observability is: am I happy? Am I sad? Are my applications happy? Are they sad?Am I able to complete business-critical transactions that keep me online, and keep me afloat? So, it's really as simple as that.
There are different ways to implement observability, but it's really, you know, you can't improve the performance, and you can't improve the security posture, of things you can't see, right? So, how do I make sure I can see everything? And what I do with that data is really what observability means to me.Corey: The entire observability space across the board is really one of those areas that is defined, on some level, by outliers within it. It's easy to wind up saying that any given observability tool will—oh, it alerts you when your application breaks. The problem is that the interesting stuff is often found in the margins, in the outlier products that wind up emerging from it. What is the specific area of that space where Sysdig tends to shine the most?Harry: Yeah, so you're right. The outliers typically cause problems, and often you don't know what you don't know. And I think if you look at Kubernetes specifically, there is a whole bunch of new problems and challenges and things that you need to be looking at that didn't exist five to ten years ago, right? There are new things that can break. You know, you've got a pod that's stuck in a CrashLoopBackOff.And hey, I'm a developer who's running my application on Kubernetes. I've got this pod in a CrashLoopBackOff. I don't know what that means. And then suddenly I'm being expected to alert on these problems. Well, how can I alert on things that I didn't even know were a problem?So, one of the things that Sysdig is doing on the observability side is we're looking at all of this data and we're actually presenting opinionated views that help customers make sense of that data. Almost like, you know, I could present this data and give it to my grandma, and she would say, “Oh, yeah, okay. You've got these pods in CrashLoopBackOff, you've got these pods that are being CPU throttled. Hey, you know, I didn't know I had to worry about CPU limits, or, you know, memory limits, and now I'm suffering, kind of, OOM kills.” So, I think one of the things that's quite unique about Sysdig on the monitoring side that a lot of customers are getting value from is kind of demystifying some of those challenges and making a lot of that data actionable.Corey: At the time of this recording, I've not yet bothered to run Kubernetes in anger, by which I, of course, mean production. My production environment is of course called ‘Anger' similarly to the way that my staging environment is called ‘Theory' because things work in theory, but not in production. That is going to be changing in the first quarter of next year, give or take. The challenge with that, though, is that so much has changed—we'll say—since the evolution of Kubernetes into something that is mainstream production in most shops. I stopped working in production environments before that switch really happened, so I'm still at a relatively amateurish level of understanding around a lot of these things.I'm still thinking about old-school problems, like, “Okay, how big do I make each one of the nodes in my Kubernetes cluster?” Yeah, if I get big systems, it's likelier that there will be economies of scale that start factoring in fewer nodes to manage, but it does increase the blast radius if one of those nodes gets affected by something that takes it offline for a while. I'm still at the very early stages of trying to wrap my head around the nuances of running these things in a production environment. Cost is, of course, a separate argument.
My clients run it everywhere and I can reason about it surprisingly well for something that does not lend itself to easy understanding by any sense of the word, where you almost have to intuit its existence just by looking at the AWS bill.Harry: No, I like your observations. And I think the last part there around costs is something that I'm seeing a lot in the industry and in our customers is, okay, suddenly, you know, I've got a great monitoring posture, or observability posture, whatever that really means. I've got a great security posture. As customers are maturing in their journey to Kubernetes, suddenly there are a bunch of questions that are being asked from the top—and we've kind of seen this internally—such as, “Hey, what is the ROI of each customer?”Or, “What is the ROI of a specific product line or feature that we deliver to our customers?”And we couldn't answer those questions. And we couldn't answer those questions because we're running a bunch of applications and software on Kubernetes, and when we received our billing reports from the multiple different cloud providers we use—Azure, AWS, and GCP—we just received a big fat bill that was compute, and we were unable to kind of break that down by the different teams and business units. Which is a real problem, and one of the problems that we really wanted to start solving, both for internal uses, but also for our customers as well.Corey: Yeah, when you have a customer coming in, the easy part of the equation is well, how much revenue are we getting from a customer? Well, that's easy enough to just wind up polling your finance group and, “Yeah, how much have they paid us this year?” “Great. Good to know.” Then it gets really confusing over on the cost side because it gets into a unit economic model that I think most shops don't have a particularly advanced understanding of.If we have another hundred customers sign up this month, what will it cost us to service them? And what are the variables that change those numbers? It really gets into a fascinating model where people, more or less, do some gut checks and some rounding, but there are a bunch of areas where people get extraordinarily confused, start to finish. Kubernetes is very much one of them because from a cloud provider's perspective, it's just a single-tenant app that is really gnarly in terms of its behavior, it does a bunch of different things, and from the bill alone, it's hard to tell that you're even running Kubernetes unless you ask.Harry: Yeah, absolutely. And there was a survey from the CNCF recently that said 68% of folks are seeing increased Kubernetes costs—of course—and 69% of respondents said that they have no cost monitoring in place or just cost estimates, which is simply not good enough, right? People want to break down that line item to those individual business units and teams, which is a huge gap that cloud providers aren't filling today.Corey: Where do you see most of the cost issue breaking down? I mean, there's some of the stuff that we are never allowed to talk about when it comes to cost, which is the realistic assessment that the people who work on technology cost more than the technology itself. There's a certain—how do we put this—unflattering perspective that a lot of people are deploying Kubernetes into environments because they want to bolster their own resume, not because it's the actual right answer to anything that they have going on. So, that's a little hit or miss, on some level.
I don't know that I necessarily buy into that, but you take a look at the compute and storage, you look at the data transfer side, which it seems that almost everyone mostly tends to ignore, despite the fact that Kubernetes itself has no zone affinity, so it has no idea whether its internal communication is free or expensive, and it just adds up to a giant question mark.Then you look at Kubernetes architecture diagrams, or God forbid the CNCF landscape diagram, and realize, oh, my God, they have more of these things than they do Pokémon, and people give up any hope of understanding it other than just saying, “It's complicated,” and accepting that that's just the way that it is. I'm a little less fatalistic, but I also think it's a heck of a challenge.Harry: Absolutely. I mean, the economics of cloud, right? Why is ingress free, but egress is not free? Why is it so difficult to [laugh] understand that intra-AZ traffic is billed completely separately from public traffic, for example? And I think network costs is one thing that is extremely challenging for customers.One, they don't even have that visibility into what is the network traffic: what is internal traffic, what is public traffic. But then there's also a whole bunch of other challenges that are causing Kubernetes costs to rise, right? You've got folks that struggle with setting the right requests for Kubernetes, which ultimately blows up the scale of a Kubernetes cluster. You've got the complexity of AWS, for example, the economics of instance types, you know? I don't know whether I need to be running ten m5.xlarge instances versus four Graviton instances.And this ability to, kind of, size a cluster correctly as well as size a workload correctly is very, very difficult, and customers are not able to establish that baseline today. And obviously, you can't optimize what you can't see, right, so I think a lot of customers struggle with that visibility, but then the complexity means that it's incredibly difficult to optimize those costs.Corey: You folks are starting to dip your toes in the Kubernetes costing space. What approach are you taking?Harry: Sysdig builds products for Kubernetes first. So, if you look at what we're doing in the monitoring space, we really kind of pioneered what customers want to get out of Kubernetes observability, and then we were doing similar things for security. So, making sure our security product is, [I want to say,] Kubernetes-native. And what we're doing on the cost side of things is, of course, there are a lot of cost products out there that will give you the ability to slice and dice by AWS service, for example, but they don't give you that Kubernetes context to then break those costs down by teams and business units. So at Sysdig, we've already been collecting usage information, resource usage information–requests, the container CPU, the memory usage–and a lot of customers have been using that data today for right-sizing, but one of the things they said was, “Hey, I need to quantify this. I need to put a big fat dollar sign in front of some of these numbers we're seeing so I can go to these teams and management and actually prompt them to right-size.”So, it's quite simple. We're essentially augmenting that resource usage information with cost data from cloud providers.
So, instead of customers saying, “Hey, I'm wasting one terabyte of memory,” they can say, “Hey, I'm wasting 500 bucks on memory each month.” So, it's very much Kubernetes-specific, using a lot of Kubernetes context and metadata.Corey: This episode is sponsored in part by our friends at Uptycs, because they believe that many of you are looking to bolster your security posture with CNAPP and XDR solutions. They offer both cloud and endpoint security in a single UI and data model. Listeners can get Uptycs for up to 1,000 assets through the end of 2023 (that is next year) for $1. But this offer is only available for a limited time on UptycsSecretMenu.com. That's U-P-T-Y-C-S Secret Menu dot com.Corey: Part of the whole problem that I see across the space is that the way to solve some of these problems internally, when you start trying to divide costs between different teams, has been: well, we're just going to give each one their own cluster, or their own environment. That does definitely solve the problem of shared services. The counterpoint is it solves them by making every team individually incur them. That doesn't necessarily seem like the best approach in every scenario. One thing I have learned, though, is that, for some customers, that is the right approach. Sounds odd, but that's the world we live in where context absolutely matters a lot. I'm very reluctant these days to say at a glance, “Oh, you're doing it wrong.” You eat a whole lot of crow when you do that, it turns out.Harry: I see this a lot. And I see customers giving their own business units their own AWS account, which I kind of feel is a step backwards, right? I don't think you're properly harnessing the power of Kubernetes and creating this, kind of, shared tenancy model when you're giving a team their own AWS account. I think it's important we break down those silos. You know, there's so much operational overhead with maintaining these different accounts, but there must be a better way to address some of these challenges.Corey: It's one of those areas where “it depends” becomes the appropriate answer to almost anything. I'm a fan of having almost every workload have its own AWS account within the same shared AWS organization, then with shared VPCs, which tend to work out. But that does add some complexity to observing how things interact there. One of the guidances that I've given people is assume in the future that in any architecture diagram you ever put up there, that there will be an AWS account boundary between any two resources because someone's going to be doing it somewhere. And that seems to be something that AWS themselves are just slowly starting to awaken to as well. It's getting easier and easier every week to wind up working with multiple accounts in a more complicated structure.Harry: Absolutely. And I think when you start to adopt a multi-cloud strategy, suddenly, you've got so many more increased dimensions. I'm running an application in AWS, Azure, and GCP, and now suddenly, I've got all of these subaccounts. That is an operational overhead that I don't think jibes very well, considering there is such a shortage of folks that are real experts—I want to say experts—in operating these environments.
And that's really, you know, I think one of the challenges that isn't being spoken about enough today.Corey: It feels like so much of the time, Kubernetes is winding up being an expression of the same thing that getting into microservices was, which is, “Well, we have a people problem, we're going to solve it with this approach.” Great, but then you wind up with people adopting it where they don't have the context that applied when the stuff was originally built and designed. Like with monorepos. Yeah, it was a problem when you had 5000 developers all trying to work on the same thing and stomping on each other, so breaking that apart made sense. But the counterpoint of where you wind up with companies with 20 developers and 200 microservices starts to be a little… okay, has this pendulum swung too far?Harry: Yeah, absolutely. And I think that when you've got so many people being thrown at a problem, there's lots of kinds of changes being made, there's new deployments, and I think things can spiral out of control pretty quickly, especially when it comes to costs. “Hey, I'm a developer and I've just made this change. And how do I understand, you know, what is the financial impact of this change?” “Has this blown up my network costs because suddenly, I'm not traversing the right network path?” Or, suddenly, I'm consuming so much more CPU, and actually, there is a physical compute cost of this. There's a lot of cooks in the kitchen and I think that is causing a lot of challenges for organizations.Corey: You've been working in product for a while, and one of my favorite parts of being in a position where you are so close to the core of what it is your company does is that you find it's almost impossible to not continue learning things, just based upon how customers take what you built and the problems that they experienced, both that they bring you in to solve, and of course, the new and exciting problems that you wind up causing for them—or, to be more charitable, surfacing ones that they didn't realize already existed. What have you learned lately from your customers that you didn't see coming?Harry: One of the biggest problems that I've been seeing is—I speak to a lot of customers, and I've maybe spoken to 40 or 50 customers over the last, you know, few months, about a variety of topics, whether it's observability in general or, you know, on the financial side, Kubernetes costs–and what I hear time and time again, regardless of the vertical or the size of the organization, is that the platform teams, the people closest to Kubernetes, know their stuff. They get it. But a lot of their internal customers, so the internal business units and teams, they, of course, don't have the same kind of clarity and understanding, and these are the people that are getting the most frustrated. I've been shipping software for 20 years and now I'm modernizing applications, I'm starting to use Kubernetes, I've got so many new different things to learn about that I'm simply drowning in problems, in cloud-native problems.And I think we forget about that, right? Too often, we kind of spend time throwing fancy technology at the people, such as the, you know, the DevOps engineers, the platform teams, but a lot of internal customers are struggling to leverage that technology to actually solve their own problems.
They can't make sense of this data and they can't make the right changes based off of that data.Corey: I would say that is a very common affliction of Kubernetes, where so often it winds up handling things that are now abstracted away to the point where we don't need to worry about them. That's true right up until the point where they break, and now you have to go diving into the magic. That's one of the reasons that I was such a fan of Sysdig when it first came out: the idea that it was getting into what I viewed at the time as operating system fundamentals and actually seeing what was going on, abstracted away from the vagaries of the code and a lot more into what system calls it's making. Great, okay, now I'm starting to see a lot of calls that it shouldn't necessarily be making, or it's thrashing in a particular way. And it's almost impossible to get to that level of insight—historically—through traditional observability tools, but being able to take a look at what's going on from a more fundamentals point of view was extraordinarily helpful.I'm optimistic if you can get to a point where you're able to do that with Kubernetes, given its enraging ecosystem, for lack of a better term. Whenever you wind up rolling out Kubernetes, you've also got to pick some service delivery stuff, some observability tooling, some log routers, and so on and so forth. It feels like by the time you're running anything in production, you've made so many choices along the way that the odds that anyone else has made the same choices you have are vanishingly small, so you're running your own bespoke unicorn somewhere.Harry: Absolutely. Flip a coin. And that's probably one [laugh] of the solutions that you're going to throw at a problem, right? And you keep flipping that coin and then suddenly, you're going to reach a combination that nobody else has done before. And you're right, the knowledge that you have gained from, I don't know, Corey Quinn Enterprises is probably not going to ring true at Harry Perks Enterprise Limited, right?There is a whole different set of problems and technology and people that, you know, of course, you can bring some of that knowledge along—there are some common denominators—but every organization is ultimately using technology in different ways. Which is problematic, right, to the people that are actually pioneering some of these cloud-native applications.Corey: Given my professional interest, I am curious about what it is you're doing as you start moving a little bit away from the security and observability sides and into cost observability. How are you approaching that? What are the mistakes that you see people making, and how are you meeting them where they are?Harry: The biggest challenge that I am seeing is with sizing workloads and sizing clusters. And I see this time and time again. Our product shines a light on the capacity utilization of compute. And what it really boils down to is two things. Platform teams are not using the correct instance types, or the correct combination of instance types, to run the workloads for their teams, their application teams, but also application developers are not setting things like requests correctly.Which makes sense. Again, I flip a coin and maybe that's the request I'm going to set. I used to size a VM with one gig of memory, so now I'm going to size my pod with one gig of memory. But it doesn't really work like that.
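As a minimal sketch of the sizing approach Harry goes on to describe (derive the request from a high percentile of observed usage instead of a guess), something like the following captures the idea. The sample numbers, the percentile, and the 10% headroom are illustrative assumptions, not Sysdig's actual algorithm.

```python
# Toy example: derive a CPU request from observed usage by taking a high
# percentile of the samples and adding a little headroom.
def suggest_request_millicores(samples, percentile=0.95, headroom=1.10):
    ordered = sorted(samples)
    # nearest-rank percentile: the sample at or just above the cutoff
    idx = min(len(ordered) - 1, int(percentile * len(ordered)))
    return int(ordered[idx] * headroom)

# A day of per-minute CPU usage samples (millicores) for one container.
usage = [120, 135, 150, 180, 210, 540, 160, 140, 130, 125]
print(suggest_request_millicores(usage))  # request sized to ~p95 plus 10%
```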
And of course, the request is essentially my slice of the pizza that's been carved out.And even if I don't eat that entire slice of pizza, it's for me; nobody else can use it. So, what we're trying to do is really help customers with that challenge. So, if I'm a developer, I would be looking at the historical usage of our workloads. Maybe it's the maximum usage or, you know, the p99 or the p95, and then setting my workload request to that. You keep doing that over the course of the different teams' applications you have and suddenly, you start to establish this baseline of what is the compute actually needed to run all of these applications.And that helps me answer the question, what should I size my cluster to? And that's really important because until you've established that baseline, you can't start to do things like cluster reshaping, to pick a different combination of instance types to power your cluster.Corey: On some level, a lack of diversity in instance types is a bit of a red flag, just because it generally means that someone said, “Oh, yeah, we're going to start with this default instance size and then we'll adjust as time goes on,” and spoilers: just like anything else labeled ‘TODO' in your codebase, it never gets done. So, you find yourself pretty quickly in a scenario where some workloads are struggling to get the resources they need inside of whatever that default instance size is, and on the other hand, you wind up with some things that are more or less running a cron job once a day and sitting there completely idle but running the whole time, regardless. And optimization and right-sizing in a lot of these scenarios is a little bit tricky. I've been something of a, I'll say, pessimist when it comes to the idea of right-sizing EC2 instances, just because so many historical workloads are challenging to get recertified on newer instance families and the rest, whereas when we're running on Kubernetes already, presumably everything's built in such a way that it can stop existing in a stateless way and the service still continues to work. If not, it feels like there are some necessary Kubernetes prerequisites that may not have circulated fully internally yet.Harry: Right. And to make this even more complicated, you've got applications that may be more memory intensive or CPU intensive, so understanding the ratio of CPU to memory requirements for their applications, depending on how they've been architected, makes this more challenging, right? I mean, pods are jumping around and that makes it incredibly difficult to track these movements and actually pick the instances that are going to be most appropriate for my workloads and for my clusters.Corey: I really want to thank you for being so generous with your time. If people want to learn more, where's the best place for them to find you?Harry: sysdig.com is where you can learn more about what Sysdig is doing as a company and our platform in general.Corey: And we will, of course, put a link to that in the show notes. Thank you so much for your time. I appreciate it.Harry: Thank you, Corey. Hope to speak to you again soon.Corey: Harry Perks, principal product manager at Sysdig. I'm Cloud Economist Corey Quinn and this is Screaming in the Cloud.
If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice along with an angry, insulting comment that we will lose track of because we don't know where it was automatically provisioned.Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.Announcer: This has been a HumblePod production. Stay humble.
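One way to picture the cost-allocation problem this episode keeps circling: once you know how much CPU each team's namespace requests, splitting the cluster's compute bill becomes simple arithmetic. The sketch below is deliberately naive; the dollar figure and per-namespace numbers are invented, and real tooling (Sysdig's included) would weigh actual usage, memory, storage, and network as well.

```python
# Naive cost allocation: split a cluster's monthly compute bill across
# namespaces in proportion to the CPU each one requests.
cluster_monthly_cost = 3000.00  # total EC2 spend for the cluster, from the bill

requested_cpu = {   # cores requested per team's namespace (made up)
    "checkout": 24,
    "search": 12,
    "batch-jobs": 4,
}

total = sum(requested_cpu.values())
for namespace, cores in requested_cpu.items():
    share = cores / total
    print(f"{namespace:>12}: {share:6.1%} -> ${cluster_monthly_cost * share:,.2f}/month")
```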
On today's Network Break we discuss new AWS previews for secure remote access and for connecting applications and services across VPCs. We also discuss a serious outage at Hive Social, Open RAN 5G coming to fighter jets, a promise from Broadcom not to raise prices if the VMware acquisition goes through, and more IT news. The post Network Break 410: AWS Previews Secure Remote Access; Broadcom Promises Not To Raise VMware Prices appeared first on Packet Pushers.
About Sam
Sam Nicholls: Veeam's Director of Public Cloud Product Marketing, with 10+ years of sales, alliance management and product marketing experience in IT. Sam has evolved from his on-premises storage days and is now laser-focused on spreading the word about cloud-native backup and recovery, packing in thousands of viewers on his webinars, blogs and webpages.
Links Referenced:
Veeam AWS Backup: https://www.veeam.com/aws-backup.html
Veeam: https://veeam.com
Transcript
Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.Corey: This episode is sponsored in part by our friends at Chronosphere. Tired of observability costs going up every year without getting additional value? Or being locked in to a vendor due to proprietary data collection, querying and visualization? Modern-day containerized environments require a new kind of observability technology that accounts for the massive increase in scale and attendant cost of data. With Chronosphere, choose where and how your data is routed and stored, query it easily, and get better context and control. 100% open source compatibility means that no matter what your setup is, they can help. Learn how Chronosphere provides complete and real-time insight into ECS, EKS, and your microservices, wherever they may be, at snark.cloud/chronosphere. That's snark.cloud/chronosphere.Corey: This episode is brought to us by our friends at Pinecone. They believe that all anyone really wants is to be understood, and that includes your users. AI models combined with the Pinecone vector database let your applications understand and act on what your users want… without making them spell it out. Make your search application find results by meaning instead of just keywords, your personalization system make picks based on relevance instead of just tags, and your security applications match threats by resemblance instead of just regular expressions. Pinecone provides the cloud infrastructure that makes this easy, fast, and scalable. Thanks to my friends at Pinecone for sponsoring this episode. Visit Pinecone.io to understand more.Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. This promoted guest episode is brought to us by and sponsored by our friends over at Veeam. And as a part of that, they have thrown one of their own to the proverbial lion. My guest today is Sam Nicholls, Director of Public Cloud over at Veeam. Sam, thank you for joining me.Sam: Hey. Thanks for having me, Corey, and thanks for everyone joining and listening in. I do know that I've been thrown into the lion's den, and I am [laugh] hopefully well-prepared to answer anything and everything that Corey throws my way. Fingers crossed. [laugh].Corey: I don't think there's too much room for criticizing here, to be direct. I mean, Veeam is a company that is solidly and thoroughly built around a problem that absolutely no one cares about. I mean, what could possibly be wrong with that? You do backups, which no one ever cares about. Restores, on the other hand, people care very much about restores. And that's when they learn, “Oh, I really should have cared about backups at any point prior to 20 minutes ago.”Sam: Yeah, it's a great point. It's kind of like taxes and insurance.
It's almost like, you know, something that you have to do that you don't necessarily want to do, but when push comes to shove, and something's burning down, a file has been deleted, someone's made their way into your account and, you know, is making a right mess in there, that's when you really, kind of, care about what you mentioned, which is the recovery piece, the speed of recovery, the reliability of recovery.Corey: It's been over a decade, and I'm still sore about losing my email archives from 2006 to 2009. There's no way to get it back. I ran my own mail server; it was an iPhone setting that said, “Oh, yeah, automatically delete everything in your trash folder—or archive folder—after 30 days.” It was just a weird default setting back in that era. I didn't realize it was doing that. Yeah, painful stuff.And we learned the hard way in some of these cases. Not that I really have much need for email from that era of my life, but every once in a while it still bugs me. Which speaks to the point that the people who are the most fanatical about backing things up are the people who have been burned by not having a backup. And I'm fortunate in that it wasn't someone else's data with which I had been entrusted that really cemented that lesson for me.Sam: Yeah, yeah. It's a good point. I can remember a few years ago, my wife migrated a very aging, polycarbonate white Mac to one of the shiny new aluminum ones and thought everything was good—Corey: As the white polycarbonate Mac becomes yellow, then yeah, all right, you know, it's time to replace it. Yeah. So yeah, so she wiped the drive, and what happened?Sam: That was her moment where she learned the value and importance of backup, and she backs everything up now. I fortunately have never gone through it. But I'm employed by a backup vendor and that's why I care about it. But it's incredibly important to have, of course.Corey: Oh, yes. My spouse has many wonderful qualities, but one that drives me slightly nuts is she's something of a digital packrat, where her hard drives on her laptop will periodically fill up. And I used to take the approach of, oh, you can be more efficient and do the rest. And I realized no, telling other people they're doing it wrong is generally poor practice, whereas just buying bigger drives is way easier. Let's go ahead and do that. It's a small price to pay for domestic tranquility.And there's a lesson in that. We can map that almost perfectly to the corporate world that you folks tend to operate in. You're not doing home backup, last time I checked; you are doing public cloud backup. Actually, I should ask that. Where do you folks start and where do you stop?Sam: Yeah, no, it's a great question. You know, we started over 15 years ago when virtualization, specifically VMware vSphere, was really the up-and-coming thing, and, you know, a lot of folks were there trying to utilize agents to protect their vSphere instances, just like they were doing with physical Windows and Linux boxes. And, you know, it kind of got the job done, but was it the best way of doing it? No. And that's kind of why Veeam was pioneered; it was this agentless backup, image-based backup for vSphere.And, of course, you know, in the last 15 years, we've seen lots of transitions, of course, we're here at Screaming in the Cloud, with you, Corey, so AWS, as well as a number of other public cloud vendors we can help protect, as well as a number of SaaS applications like Microsoft 365, and metadata and data within Salesforce.
So, Veeam's really kind of come a long way from just virtual machines to really taking a global look at the entirety of modern environments, and how can we best protect each and every single one of those without trying to take a square peg and fit it in a round hole?Corey: It's a good question and a common one. We wind up with an awful lot of folks who are confused by the proliferation of data. And I'm one of them, let's be very clear here. It comes down to a problem where backups are a multifaceted, deep problem, and I don't think that people necessarily think of it that way. But I take a look at all of the different, even AWS services that I use for my various nonsense, and which ones can be used to store data?Well, all of them. Some of them, you have to hold it in a particularly wrong sort of way, but they all store data. And in various contexts, a lot of that data becomes very important. So, what service am I using, in which account am I using it, and in what region, and you wind up with data sprawl, where it's a tremendous amount of data that you can generally only track down by looking at your bills at the end of the month. Okay, so what am I being charged, and for what service?That seems like a good place to start, but where is it getting backed up? How do you think about that? So, some people, I think, tend to ignore the problem, which we're seeing less and less, but other folks tend to go to the opposite extreme and we're just going to back up absolutely everything, and we're going to keep that data for the rest of our natural lives. It feels to me that there's probably an answer that is more appropriate somewhere nestled between those two extremes.Sam: Yeah, snapshot sprawl is a real thing, and it gets very, very expensive very, very quickly. You know, your snapshots of EC2 instances are stored on those attached EBS volumes. Five cents per gig per month doesn't sound like a lot, but when you're dealing with thousands of snapshots for thousands of machines, it gets out of hand very, very quickly. And you don't know when to delete them. Like you say, folks are just retaining them forever and dealing with this unfortunate bill shock.So, you know, where to start is automating the lifecycle of a snapshot, right, from its creation—how often do we want to be creating them—to the retention—how long do we want to keep these for—and where do we want to keep them, because there are other storage services outside of just EBS volumes. And then, of course, the ultimate: deletion. And that's important even from a compliance perspective as well, right? You've got to retain data for a specific number of years, I think healthcare is like seven years, but then you've—Corey: And then not a day more.Sam: Yeah, and then not a day more because that puts you out of compliance, too. So, policy-based automation is your friend and we see a number of folks building these policies out: gold, silver, bronze tiers based on criticality of data and compliance, and really just kind of letting the machine do the rest. And you can focus on not babysitting backup.Corey: What was it that led to the rise of snapshots? Because back in my very early days, there was no such thing. We wound up using a bunch of servers stuffed in a rack somewhere and virtualization was not really in play, so we had file systems on physical disks. And how do you back that up?
Well, you have an agent of some sort that basically looks at all the files and, according to some ruleset that it has, copies them off somewhere else.It was slow, it was fraught, it had a whole bunch of logic that was pushed out to the very edge, and forget about restoring that data in a timely fashion or even validating that a lot of those backups worked, other than via checksum. And God help you if you had data that was constantly in a state of flux, where anything changing during the backup run would leave your backups in an inconsistent state. That, on some level, seems to have largely been solved by snapshots. But what's your take on it? You're a lot closer to this part of the world than I am.Sam: Yeah, snapshots, I think folks have turned to snapshots for the speed, the lack of impact that they have on production performance, and again, just the ease of accessibility. We have access to all different kinds of snapshots for EC2, RDS, EFS throughout the entirety of our AWS environment. So, I think the snapshots are kind of like the default go-to for folks. They can help deliver those very, very quick RPOs, especially in, for example, databases, like you were saying, that change very, very quickly, where we all of a sudden are stranded with a crash-consistent backup or snapshot versus an application-consistent snapshot. And then they're also very, very quick to recover from.So, snapshots are very, very appealing, but they absolutely do have their limitations. And I think, you know, it's not a one or the other; it's that they've got to go hand-in-hand with something else. And typically, that is an image-based backup that is stored in a separate location to the snapshot because that snapshot is not independent of the disk that it is protecting.Corey: One of the challenges with snapshots is most of them are created in a copy-on-write sense. It takes basically an instant frozen point in time. Back once upon a time, when we ran MySQL databases on top of the NetApp Filer—which works surprisingly well—we would have a script that would automatically quiesce the database so that it would be in a consistent state, snapshot the filer, and then un-quiesce it, which took less than a second, start to finish. And that was awesome, but then you had this snapshot type of thing. It wasn't super portable, it needed to reference a previous snapshot in some cases, and AWS takes the same approach, where the first snapshot captures every block, then subsequent snapshots wind up only taking up as much space as has changed since the last snapshot. So, large quantities of data that generally don't get written to a whole lot have remarkably small subsequent snapshot sizes.But that's not at all obvious from the outside looking at these things. They're not the most portable thing in the world. But it's definitely the direction that the industry has trended in. So, rather than having a cron job fire off an AWS API call to take snapshots of my volumes as sort of the baseline approach that we all started with, what is the value proposition that you folks bring? And please don't say it's, “Well, cron jobs are hard and we have a friendlier interface for that.”Sam: [laugh].
I think it's really starting to look at the proliferation of those snapshots, understanding what they're good at, and what they are good for within your environment—as previously mentioned, low RPOs, low RTOs, how quickly can I take a backup, how frequently can I take a backup, and more importantly, how quickly can I restore—but then looking at their limitations. So, I mentioned that they were not independent of that disk, so that certainly does introduce a single point of failure as well as being not so secure. We've kind of touched on the cost component of that as well. So, what Veeam can come in and do is then take an image-based backup of those snapshots, right—so you've got your initial snapshot and then your incremental ones—we'll take the backup from that snapshot, and then we'll start to store that elsewhere.And that is likely going to be in a different account. We can look at the Well-Architected Framework, AWS deeming accounts as a security boundary, so having that cross-account function is critically important so you don't have that single point of failure. Locking down with IAM roles is also incredibly important so we haven't just got a big wide open door between the two. But that data is then stored in a separate account—potentially in a separate region, maybe in the same region—Amazon S3 storage. And S3 has the wonderful benefit of being still relatively performant, so we can have quick recoveries, but it is much, much cheaper. You're dealing with 2.3 cents per gig per month, instead of—Corey: To start, and it goes down from there with sizeable volumes.Sam: Absolutely, yeah. You can go down to S3 Glacier, where you're looking at, I forget how many points and zeros and nines it is, but it's fractions of a cent per gig per month, but it's going to take you a couple of days to recover that da—Corey: Even infrequent access cuts that in half.Sam: Oh yeah.Corey: And let's be clear, these are snapshot backups; you probably should not be accessing them on a consistent, sustained basis.Sam: Well, exactly. And this is where it's kind of almost like having your cake and eating it as well. Compliance or regulatory mandates or corporate mandates are saying you must keep this data for this length of time. Keeping that—you know, let's just say it's three years' worth of snapshots in an EBS volume is going to be incredibly expensive. What's the likelihood of you needing to recover something from two years—actually, even two months ago? It's very, very small.So, the performance part of S3 is, you don't need to take it as much into consideration. Can you recover? Yes. Is it going to take a little bit longer? Absolutely. But it's going to help you meet those retention requirements while keeping your backup bill low, avoiding that bill shock, right, spending tens and tens of thousands every single month on snapshots. This is what I mean by kind of having your cake and eating it.Corey: I somewhat recently have had a client where EBS snapshots are one of the driving costs behind their bill. It is one of their largest single line items. And I want to be very clear here because if you're one of those people who are listening to this and thinking, “Well, hang on. Wait, they're telling stories about us, even though they're not naming us by name?” Yeah, there were three of you in the last quarter.So, at that point, it becomes clear it is less about something that one individual company has done and more about an overall driving trend.
I am personalizing it a little bit by referring to it as one company when there were three of you. This is a narrative device, not me breaking confidentiality. Disclaimer over. Now, when you talk to people about, “So, tell me why you've got 80 times more snapshots than you do EBS volumes?” The answer is, “Well, we wanted to back things up and we needed to get hourly backups to a point, then daily backups, then monthly, and so on and so forth. And when this was set up, there wasn't a great way to do this natively and we don't always necessarily know what we need versus what we don't. And the cost of us backing this up, well, you can see it on the bill. The cost of us deleting too much and needing it as soon as we do? Well, that cost is almost incalculable. So, this is the safe way to go.” And they're not wrong in anything that they're saying. But the world has definitely evolved since then.Sam: Yeah, yeah. It's a really great point. Again, it just folds back into my whole having your cake and eating it conversation. Yes, you need to retain data; it gives you that kind of nice, warm, cozy feeling, it's a nice blanket on a winter's day knowing that, irrespective of what happens to that data, you're going to have something to recover from. But the question is does that need to be living on an EBS volume as a snapshot? Why can't it be living on much, much more cost-effective storage that's going to give you the warm and fuzzies, but is going to make your finance team much, much happier [laugh].Corey: One of the inherent challenges I think people have is that snapshots by themselves are almost worthless, in that I have an EBS snapshot, it is sitting there now, it's costing me an undetermined amount of money because it's not exactly clear on a per-snapshot basis exactly how large it is, and okay, great. Well, I'm looking for a file that was not modified since X date, as it was at this time. Well, great, you're going to have to take that snapshot, restore it to a volume and then go exploring by hand. Oh, it was the wrong one. Great. Try it again, with a different one.And after, like, the fifth or sixth in a row, you start doing a binary search approach on this thing. But it's expensive, it's time-consuming, it takes forever, and it's not a fun user experience at all. Part of the problem is it seems that historically, backup systems have no context or no contextual awareness whatsoever around what is actually contained within that backup.
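To make that "go exploring by hand" pain concrete, and as a preview of the ten-step process Sam enumerates next, here is roughly what just the first few steps look like in boto3. The snapshot and instance IDs are placeholders, and the error handling and teardown a real attempt would need are deliberately omitted.

```python
# Manually restoring one file from an EBS snapshot starts something like
# this -- and this is only steps one through three.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# 1. Restore the snapshot to a brand-new volume in the right AZ.
volume = ec2.create_volume(
    SnapshotId="snap-0123456789abcdef0",   # placeholder snapshot ID
    AvailabilityZone="us-east-1a",
)

# 2. Wait for the volume, then attach it to a scratch instance.
ec2.get_waiter("volume_available").wait(VolumeIds=[volume["VolumeId"]])
ec2.attach_volume(
    VolumeId=volume["VolumeId"],
    InstanceId="i-0123456789abcdef0",      # placeholder scratch instance
    Device="/dev/sdf",
)

# 3. Now SSH in, mount /dev/sdf, and hunt for the file. Wrong snapshot?
#    Detach, delete, and start over with the next candidate.
```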
So, backup tooling from backup vendors that have been doing this for many, many years, knew about this problem long, long ago, and really seek to not only automate the entirety of that process but make the whole e-discovery, the search, the location of those files, much, much easier. I don't necessarily want to do a vendor pitch, but I will say with Veeam, we have explorer-like functionality, whereby it's just a simple web browser. Once that machine is all spun up again, automatic process, you can just search for your individual file, folder, locate it, you can download it locally, you can inject it back into the instance where it was through Amazon Kinesis or AWS Kinesis—I forget the right terminology for it; some of its AWS, some of its Amazon.But by-the-by, the whole recovery process, especially from a file or folder level, is much more pain-free, but also much faster. And that's ultimately what people care about how reliable is my backup? How quickly can I get stuff online? Because the time that I'm down is costing me an indescribable amount of time or money.Corey: This episode is sponsored in part by our friends at Redis, the company behind the incredibly popular open source database. If you're tired of managing open source Redis on your own, or if you are looking to go beyond just caching and unlocking your data's full potential, these folks have you covered. Redis Enterprise is the go-to managed Redis service that allows you to reimagine how your geo-distributed applications process, deliver, and store data. To learn more from the experts in Redis how to be real-time, right now, from anywhere, visit redis.com/duckbill. That's R - E - D - I - S dot com slash duckbill.Corey: Right, the idea of RPO versus RTO: recovery point objective and recovery time objective. With an RPO, it's great, disaster strikes right now, how long is acceptable to it have been since the last time we backed up data to a restorable point? Sometimes it's measured in minutes, sometimes it's measured in fractions of a second. It really depends on what we're talking about. Payments databases, that needs to be—the RPO is basically an asymptotically approaches zero.The RTO is okay, how long is acceptable before we have that data restored and are back up and running? And that is almost always a longer time, but not always. And there's a different series of trade-offs that go into that. But both of those also presuppose that you've already dealt with the existential question of is it possible for us to recover this data. And that's where I know that you are obviously—you have a position on this that is informed by where you work, but I don't, and I will call this out as what I see in the industry: AWS backup is compelling to me except for one fatal flaw that it has, and that is it starts and stops with AWS.I am not a proponent of multi-cloud. Lord knows I've gotten flack for that position a bunch of times, but the one area where it makes absolute sense to me is backups. Have your data in a rehydrate-the-business level state backed up somewhere that is not your primary cloud provider because you're otherwise single point of failure-ing through a company, through the payment instrument you have on file with that company, in the blast radius of someone who can successfully impersonate you to that vendor. There has to be a gap of some sort for the truly business-critical data. Yes, egress to other providers is expensive, but you know what also is expensive? Irrevocably losing the data that powers your business. 
Is it likely? No, but I would much rather do it than have to justify why I'm not doing it.Sam: Yeah. Wasn't likely that I was going to win that 2 billion or 2.1 billion on the Powerball, but [laugh] I still play [laugh]. But I understand your standpoint on multi-cloud and I read your newsletters and understand where you're coming from, but I think the reality is that we do live in at least a hybrid cloud world, if not multi-cloud. The number of organizations that are sole-sourced on a single cloud and nothing else is a relatively small, single-digit percentage. It's around 80-some percent that are hybrid, and the remainder of them are your favorite: multi-cloud.But again, having something that is one hundred percent sole-sourced on a single platform or a single vendor does expose you to a certain degree of risk. So, having the ability to do cross-platform backups, recoveries, migrations, for whatever reason, right, because it might not just be a disaster like you'd mentioned, it might also just be… I don't know, the company has been taken over and all of a sudden, the preference is now towards another cloud provider and I want you to refactor and re-architect everything for this other cloud provider. If all that data is locked into one platform, that's going to make your job very, very difficult. So, we mentioned at the beginning of the call, Veeam is capable of protecting a vast number of heterogeneous workloads on different platforms, in different environments, on-premises, in multiple different clouds, but the other key piece is that we always use the same backup file format. And why that's key is because it enables portability.If I have backups of EC2 instances that are stored in S3, I could copy those onto on-premises disk, I could copy those into Azure, I could do the same with my Azure VMs and store those on S3, or again, on-premises disk, and any other endless combination that goes with that. And it's really kind of centered around, like, control and ownership of your data. We are not prescriptive by any means. Like, you do what is best for your organization. We just want to provide you with the toolset that enables you to do that without steering you one direction or the other with fee structures, disparate feature sets, whatever it might be.Corey: One of the big challenges that I keep seeing across the board is just a lack of awareness of what the data that matters is, where you see people backing up endless fleets of web server instances that are auto-scaled into existence and then removed, but you can create those things at will; why do you care about the actual data that's on these things? It winds up almost at the library management problem, on some level. And in that scenario, snapshots are almost certainly the wrong answer. One thing that I saw previously that really changed my way of thinking about this was back many years ago when I was working at a startup that had just started using GitHub and they were paying for a third-party service that wound up backing up Git repos. Today, that makes a lot more sense because you have a bunch of other stuff on GitHub that goes well beyond the stuff contained within Git, but at the time, it was silly. It was, why do that? Every Git clone is a full copy of the entire repository history. Just grab it off some developer's laptop somewhere.It's like, “Really?
You want to bet the company, slash your job, slash everyone else's job on that being feasible and doable, or do you want to spend the 39 bucks a month or whatever it was to wind up getting that out the door now so we don't have to think about it, and they validate that it works?” And that was really a shift in my way of thinking because, yeah, backing up things can get expensive when you have multiple copies of the data living in different places, but what's really expensive is not having a company anymore.Sam: Yeah, yeah, absolutely. We can tie it back to my insurance dynamic earlier where, you know, it's something that you know that you have to have, but you don't necessarily want to pay for it. Well, you know, just like with insurance, there's multiple different ways to go about recovering your data and it's only in crunch time that you really care about what it is that you've been paying for, right, when it comes to backup?Could you get your backup through a git clone? Absolutely. Could you get your data back—how long is that going to take you? How painful is that going to be? What's going to be the impact to the business while you're trying to figure that out versus, like you say, the 39 bucks a month, a year, or whatever it might be to have something purpose-built for that, that is going to make the recovery process as quick and painless as possible and just get things back up online.Corey: I am not a big fan of the fear, uncertainty, and doubt approach, but I do practice what I preach here in that yeah, there is a real fear of data loss. It's not, “People are coming to get you, so you absolutely have to buy whatever it is I'm selling,” but it is something you absolutely have to think about. My core consulting proposition is that I optimize the AWS bill. And sometimes that means spending more. Okay, that one S3 bucket is extremely important to you and you say you can't sustain the loss of it ever, so One Zone is not an option. Where is it being backed up? Oh, it's not? Yeah, I suggest you spend more money and back that thing up if it's as irreplaceable as you say. It's about doing the right thing.Sam: Yeah, yeah, it's interesting, and it's going to be hard for you to prove the value of doing that when you are driving their bill up while you're trying to bring it down. But again, you have to look at something that's not itemized on that bill, which is going to be the impact of downtime. I'm not going to pretend to try and recall the exact figures because it also varies depending on your business, your industry, the size, but the impact of downtime is massive financially. Tens of thousands of dollars per hour for small organizations, millions and millions of dollars per hour for much larger organizations. The backup component of that is relatively small in comparison, so it's worth having something that is purpose-built that is going to protect your data and help mitigate that impact of downtime.Because that's ultimately what you're trying to protect against. It is the recovery piece that you're buying that is the most important piece. And like you, I would say, at least be cognizant of it and evaluate your options and what you can live with and what you can live without.Corey: That's the big burning question that I think a lot of people do not have a good answer to. And when you don't have an answer, you either back up everything or nothing. And I'm not a big fan of doing either of those things blindly.Sam: Yeah, absolutely.
And I think this is why we see varying different backup options as well, you know? You're not going to try and apply the same data protection policies to each and every single workload within your environment because they've all got different types of workload criticality. And like you say, some of them might not even need to be backed up at all, just because they don't have data that needs to be protected. So, you need something that is going to be able to be flexible enough to apply across the entirety of your environment, protect it with the right policy, in terms of how frequently you protect it, where you store it, and how often, or when, you're eventually going to delete it, and apply that on a workload-by-workload basis. And this is where the joy of things like tags comes into play as well.Corey: One last thing I want to bring up is that I'm a big fan of watching for companies saying the quiet part out loud. And one area in which they do this—because they're forced to by brevity—is in the title tag of their website. I pull up veeam.com and I hover over the tab in my browser, and it says, “Veeam Software: Modern Data Protection.”And I want to call that out because you're not framing it as explicitly backup. So, the last topic I want to get into is the idea of security. Because I think it is not fully appreciated on a lived-experience basis—although people will of course agree to this when they're having ivory tower whiteboard discussions—that every place your data lives is a potential site for a security breach to happen. So, you want to have your data living in a bunch of places, ideally, for backup and resiliency purposes. But you also want it to be completely unworkable or illegible to anyone who is not authorized to have access to it.How do you balance those trade-offs yourself given that what you're fundamentally saying is, “Trust us with your Holy of Holies when it comes to things that power your entire business?” I mean, I can barely get some companies to agree to show me their AWS bill, let alone this is the data that contains all of this stuff to destroy our company.Sam: Yeah. Yeah, it's a great question. Before I explicitly answer that piece, I will just say that modern data protection does absolutely have a security component to it, and I think that backup absolutely needs to be a—I'm going to say this in air quotes—a “first-class citizen” of any security strategy. I think when people think about security, their mind goes to the preventative, like how do we keep these bad people out?This is going to be a bit of the FUD that you love, but ultimately, the bad guys on the outside have an infinite number of attempts to get into your environment and only have to be right once to get in and start wreaking havoc. You, on the other hand, as the good guy with your cape and whatnot, have got to be right each and every single one of those times. And we as humans are fallible, right? None of us are perfect, and it's incredibly difficult to defend against these ever-evolving, more complex attacks. So, backup: if someone does get in, having a clean, verifiable, recoverable backup is really going to be the only thing that is going to save your organization, should that actually happen.And what's key to a secure backup? I would say separation, isolation of backup data from the production data, I would say utilizing things like immutability, so in AWS, we've got Amazon S3 Object Lock, so it's that write once, read many state for whatever retention period that you put on it.
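As a concrete illustration of the immutability point (a minimal sketch of the underlying AWS primitive, not of how Veeam implements it): writing a backup object into a bucket in compliance mode with a retention date. The bucket and object names are hypothetical, and the bucket must have been created with Object Lock enabled.

```python
# Sketch: land a backup artifact in a WORM state with S3 Object Lock.
# Assumes the bucket was created with Object Lock enabled; names are made up.
from datetime import datetime, timedelta, timezone
import boto3

s3 = boto3.client("s3")

retain_until = datetime.now(timezone.utc) + timedelta(days=30)

with open("app-db.vbk", "rb") as backup_file:      # hypothetical backup file
    s3.put_object(
        Bucket="example-backup-vault",             # hypothetical bucket
        Key="backups/2022-11-01/app-db.vbk",
        Body=backup_file,
        ObjectLockMode="COMPLIANCE",               # retention cannot be shortened
        ObjectLockRetainUntilDate=retain_until,
        ServerSideEncryption="aws:kms",            # encrypted at rest via KMS
    )
```

In compliance mode, no principal, including the account root user, can shorten the retention window or delete that object version before the date passes, which is exactly the property that defeats the "encrypt or delete the backups too" move discussed below.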
So, the data that the attackers are seeking to encrypt, whether it's in production or in their backup, they cannot encrypt it. And then the other piece that I think is coming more and more into play, and is almost table stakes, is encryption, right? And we can utilize things like AWS KMS for that encryption.But that's there to help defend against the exfiltration attempts. Because these bad guys are realizing, “Hey, people aren't paying me my ransom because they're just recovering from a clean backup, so now I'm going to take that backup data, I'm going to leak the personally identifiable information, trade secrets, or whatever on the internet, and that's going to put them in breach of compliance and give them a hefty fine that way unless they pay me my ransom.” So, encryption, so they can't read that data. Not only can they not change it, but they also can't read it, which is equally important. So, I would say those are the three big things for me on what's needed for backup to make sure it is clean and recoverable.Corey: I think that is one of those areas where people need to put additional levels of thought in. I think that if you have access to the production environment and have full administrative rights throughout it, you should definitionally not—at least with that account and ideally not you at all personally—have access to alter the backups. Full stop. I would say, on some level, there should not be the ability to alter backups for some particular workloads, the idea being that if you get hit with a ransomware infection, it's pretty bad, let's be clear, but if you can get all of your data back, it's more of an annoyance than it is, again, the existential business crisis that becomes something that redefines you as a company, if you still are a company.Sam: Yeah. Yeah, I mean, we can turn to a number of organizations. Code Spaces always springs to mind for me, I love Code Spaces. It was kind of one of those precursors to—Corey: It's amazing.Sam: Yeah, but they were running on AWS and they had everything, production and backups, all stored in one account. The attackers got into the account. “We're going to delete your data if you don't pay us this ransom.” They were like, “Well, we're not paying you the ransom. We got backups.” Well, they deleted those, too. And, you know, unfortunately, Code Spaces isn't around anymore. But it really kind of goes to show just the importance of at least logically separating your data across different accounts and not having that god-like access to absolutely everything.Corey: Yeah, when you talked about Code Spaces, I was in [unintelligible 00:32:29] talking about GitHub Codespaces specifically, where they have their developer workstations in the cloud. They're still very much around, at least last time I saw, unless you know something I don't.Sam: Precursor to that. I can send you the link—Corey: Oh oh—Sam: You can share it with the listeners.Corey: Oh, yes, please do. I'd love to see that.Sam: Yeah. Yeah, absolutely.Corey: And it's been a long and strange time in this industry. Speaking of links for the show notes, I appreciate your spending so much time with me. Where can people go to learn more?Sam: Yeah, absolutely. I think veeam.com is kind of the first place that people gravitate towards. Me personally, I'm kind of like a hands-on learning kind of guy, so we always make free product available.And then you can find that on the AWS Marketplace. Simply search ‘Veeam' through there. A number of free products; we don't put time limits on them, we don't put feature limitations.
You can back up ten instances, including your VPCs, which we actually didn't talk about today, but I do think is important. But I won't waste any more time on that.Corey: Oh, configuration of these things is critically important. If you don't know how everything was structured and built out, you're basically trying to re-architect from first principles based upon archaeology.Sam: Yeah [laugh], that's a real pain. So, we can help protect those VPCs and we actually don't put any limitations on the number of VPCs that you can protect; it's always free. So, if you're going to use it for anything, use it for that. But hands-on, the Marketplace; and if you want more documentation, want to learn more, or want to speak to someone, veeam.com is the place to go.Corey: And we will, of course, include that in the show notes. Thank you so much for taking so much time to speak with me today. It's appreciated.Sam: Thank you, Corey, and thanks to all the listeners tuning in today.Corey: Sam Nicholls, Director of Public Cloud at Veeam. I'm Cloud Economist Corey Quinn and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice, along with an angry insulting comment that takes you two hours to type out, but then you lose it because you forgot to back it up.Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.Announcer: This has been a HumblePod production. Stay humble.
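On the VPC-configuration point from the episode above: a minimal sketch of capturing that structure with boto3, so a rebuild is not pure archaeology. This is a bare-bones inventory export for illustration, not Veeam's VPC backup feature; the region and output file name are arbitrary choices.

```python
# Sketch: dump the basic shape of a region's VPCs to JSON as a crude
# configuration backup.
import json
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

inventory = {
    "vpcs": ec2.describe_vpcs()["Vpcs"],
    "subnets": ec2.describe_subnets()["Subnets"],
    "route_tables": ec2.describe_route_tables()["RouteTables"],
    "security_groups": ec2.describe_security_groups()["SecurityGroups"],
}

# default=str stringifies the datetime objects that appear in API responses.
with open("vpc-inventory.json", "w") as f:
    json.dump(inventory, f, indent=2, default=str)
```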
An airhacks.fm conversation with Mark Sailes (@MarkSailes3) about: CRaC API, C1 and C2 compilers, GraalVM and Random, CRaC and Stateful EJB beans, Lambda SnapStart and snapshotting the Firecracker VM, the CRaC Resource interface and listener methods, priming the critical path, Quarkus with MicroProfile AWS on Lambda CDK template, Plain Java AWS Lambda with CDK template, SDK calls in the beforeCheckpoint hook, SnapStart state never leaves the region, SnapStart state is cached in caches within Availability Zones, SnapStart is available within VPCs, only versioned AWS Lambdas can be optimized, Provisioned Concurrency and SnapStart, The Other Feature of AWS Lambda Provisioned Concurrency — Saving Money, A serverless journey: AWS Lambda under the hood, provisioned concurrency and EC2 reserved instances, AWS Lambda function starts at bare metal, Mark Sailes on twitter: @MarkSailes3
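On the "priming the critical path" idea in that episode: the conversation is about Java and CRaC, where the SDK calls live in a beforeCheckpoint hook, but the shape is easy to show in Python. A minimal sketch, assuming a Lambda handler module and a hypothetical DynamoDB table name; the point is that work done at initialization time is captured in the snapshot, so restored invocations skip it.

```python
# Sketch of priming: do the expensive setup once, at initialization, so it
# is captured in the snapshot instead of being repeated on every cold start.
import boto3

# Module scope runs during initialization, before the snapshot is taken.
dynamodb = boto3.client("dynamodb")

# Exercising the critical path here forces module imports and request
# serialization to happen pre-snapshot. The table name is hypothetical.
dynamodb.describe_table(TableName="example-table")

def handler(event, context):
    # Restored invocations begin here with the primed state already loaded.
    return dynamodb.get_item(
        TableName="example-table",
        Key={"pk": {"S": event["id"]}},
    )
```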
About ClintonClinton Herget is Field CTO at Snyk, the leader in Developer Security. He focuses on helping Snyk's strategic customers on their journey to DevSecOps maturity. A seasoned technologist, Clinton spent his 20-year career prior to Snyk as a web software developer, DevOps consultant, cloud solutions architect, and engineering director. Clinton is passionate about empowering software engineers to do their best work in the chaotic cloud-native world, and is a frequent conference speaker, developer advocate, and technical thought leader.Links Referenced: Snyk: https://snyk.io/ duckbillgroup.com: https://duckbillgroup.com TranscriptAnnouncer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.Corey: This episode is brought to us in part by our friends at Pinecone. They believe that all anyone really wants is to be understood, and that includes your users. AI models combined with the Pinecone vector database let your applications understand and act on what your users want… without making them spell it out.Make your search application find results by meaning instead of just keywords, your personalization system make picks based on relevance instead of just tags, and your security applications match threats by resemblance instead of just regular expressions. Pinecone provides the cloud infrastructure that makes this easy, fast, and scalable. Thanks to my friends at Pinecone for sponsoring this episode. Visit Pinecone.io to understand more.Corey: This episode is brought to you in part by our friends at Veeam. Do you care about backups? Of course you don't. Nobody cares about backups. Stop lying to yourselves! You care about restores, usually right after you didn't care enough about backups. If you're tired of the vulnerabilities, costs, and slow recoveries when using snapshots to restore your data, assuming you even have them at all living in AWS-land, there is an alternative for you. Check out Veeam, that's V-E-E-A-M, for secure, zero-fuss AWS backup that won't leave you high and dry when it's time to restore. Stop taking chances with your data. Talk to Veeam. My thanks to them for sponsoring this ridiculous podcast.Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. One of the fun things about establishing traditions is that the first time you do it, you don't really know that that's what's happening. Almost exactly a year ago, I sat down for a previous promoted guest episode much like this one, with Clinton Herget at Snyk—or Synic; however you want to pronounce that. He is apparently a scarecrow of some sort because when last we spoke, he was a principal solutions engineer, but like any good scarecrow, he was outstanding in his field, and now, as a result, is a Field CTO. Clinton, thanks for coming back, and let me start by congratulating you on the promotion. Or consoling you, depending upon how good or bad it is.Clinton: You know, Corey, a little bit of column A, a little bit of column B. But very glad to be here again, and frankly, I think it's because you insist on mispronouncing Snyk as Synic, and so you get me again.Corey: Yeah, you could add a couple of new letters to it and just call the company [Synack 00:01:27]. Now, it's a hard pivot to a networking company.
So, there's always options.Clinton: I acknowledge what you did there, Corey.Corey: I like that quite a bit. I wasn't sure you'd get it.Clinton: I'm a nerd going way, way back, so we'll have to go pretty deep in the stack for you to stump me on some of this stuff.Corey: As we did with the, “I wasn't sure you'd get it.” See, that one sailed right past you. And I win. Chalk another one up for me in the networking pun wars. Great, we'll loop back to that later.Clinton: I don't even know where I am right now.Corey: [laugh]. So, let's go back to a question that one would think I'd already established a year ago, but I have the attention span of basically a goldfish, let's not kid ourselves. So, as I'm visiting the Snyk website, I find that it says different words than it did a year ago, which is generally a sign that is positive; when nothing's been updated including the copyright date, things are going really well or really badly. One wonders. But no, now you're talking about Snyk Cloud, you're talking about several other offerings as well, and my understanding of what it is you folks do no longer appears to be completely accurate. So, let me be direct. What the hell do you folks do over there?Clinton: It's a really great question. Glad you asked me on a year later to answer it. I would say at a very high level, what we do hasn't changed. However, I think the industry has certainly come a long way in the past couple of years, and our job is to adapt to that. Snyk—again, pronounced like a pair of sneakers sneaking around—is a developer security platform. So, we focus on enabling the people who build applications—which, as of today, means modern applications built in the cloud—to have better visibility, and ultimately a better chance of mitigating the risk that goes into those applications, when it matters most, which is actually in their workflow.Now, you're exactly right. Things have certainly expanded in that remit because the job of a software engineer is very different this year, I think, than it even was last year, and that's continually evolving over time. As a developer now, I'm doing a lot more than I was doing a few years ago. And one of the things I'm doing is building infrastructure in the cloud, I'm writing YAML files, I'm writing CloudFormation templates to deploy things out to AWS. And what happens in the cloud has a lot to do with the risk to my organization associated with those applications that I'm building.So, I'd love to talk a little bit more about why we decided to make that move, but I don't think that represents a watering down of what we're trying to do at Snyk. I think it recognizes that developer security vision fundamentally can't exist without some understanding of what's happening in the cloud.Corey: One of the things that always scares me—and sets the spidey sense tingling—is when I see a company that has a product, and I'm familiar—ish—with what they do. And then they take their product name and slap the word cloud at the end, which almost always codes to, “Okay, so we took the thing that we sold in boxes in data centers, and now we're making a shitty hosted version available because it turns out you rubes will absolutely pay a subscription for it.” Yeah, I don't get the sense that that is at all what you're doing.
In fact, I don't believe that you're offering a hosted managed service at the moment, are you?Clinton: No, the cloud part fundamentally refers to a new product, an offering that looks at the security risks potentially being introduced into cloud infrastructure by the engineers who are now writing infrastructure as code. We previously had an infrastructure-as-code security product, and that served alongside our static analysis tool, which is Snyk Code, our open-source tool, and our container scanner, recognizing that the kinds of vulnerabilities you can potentially introduce in writing cloud infrastructure are not only bad for the organization on their own—I mean, nobody wants to create an S3 bucket that's wide open to the world—but also, those misconfigurations can increase the blast radius of other kinds of vulnerabilities in the stack. So, I think what it does is it recognizes that, as you and I think your listeners well know, Corey, there's no such thing as the cloud, right? The cloud is just a bunch of fancy software designed to abstract away from the fact that you're running stuff on somebody else's computer, right?Corey: Unfortunately, in this case, the fact that you're calling it Snyk Cloud does not mean that you're doing what so many other companies in that same space do; that would have led to a really short interview because I have no faith that it's the right path forward, especially for you folks, where it's, “Oh, you want to be secure? You've got to host your stuff on our stuff instead. That's why we called it cloud.” That's the direction that I've seen a lot of folks try and pivot in, and I always find it disastrous. It's, “Yeah, well, at Snyk, if we run your code or your shitty applications here in our environment, it's going to be safer than if you run it yourself on something untested like AWS.” And yeah, those stories hold absolutely no water. And may I just say, I'm gratified that's not what you're doing?Clinton: Absolutely not. No, I would say we have no interest in running anyone's applications. We do want to scan them, though, right? We do want to give the developers insight into the potential misconfigurations, the risks, the vulnerabilities that you're introducing. What sets Snyk apart, I think, from others in that application security testing space is we focus on the experience of the developer, rather than just being another tool that runs and generates a bunch of PDFs and then throws them back to say, “Here's everything you did wrong.”We want to say to developers, “Here's what you could do better. Here's how that default in a CloudFormation template that leads to your bucket being, you know, wide open on the internet could be changed. Here's the remediation that you could introduce.” And if we do that at the right moment, which is inside that developer workflow, inside the IDE, on their local machine, before that gets deployed, there's a much greater chance that remediation is going to be implemented, and it's going to happen much more cheaply, right? Because you no longer have to do the round trip all the way out to the cloud and back.So, the cloud part of it fundamentally means completing that story, recognizing that once things do get deployed, there's a lot of valuable context that's happening out there that a developer can really take advantage of. They can say, “Wait a minute.
Not only do I have a Log4Shell vulnerability, right, in one of my open-source dependencies, but that artifact, that application, is actually getting deployed to a VPC that has ingress from the internet,” right? So, not only do I have remote code execution in my application, but it's being put in an enclave that actually allows it to be exploited. You can only know that if you're actually looking at what's really happening in the cloud, right?So, not only does Snyk Cloud allow us to provide an additional layer of security by looking at what's misconfigured in that cloud environment and help your developers make remediations by saying, “Here's the actual IaC file that caused that infrastructure to come into existence,” but we can also say, here's how that affects the risk of other kinds of vulnerabilities at different layers in the stack, right? Because it's all software; it's all connected. Very rarely does a vulnerability translate one-to-one into risk, right? They're compound because modern software is compound. And I think what developers lack is the tooling that fits into their workflow that understands what it means to be a software engineer and actually helps them make better choices rather than punishing them after the fact for guessing and making bad ones.Corey: That sounds awesome at a very high level. It is very aligned with how executives and decision-makers think about a lot of these things. Let's get down to brass tacks for a second. Assume that I am the type of developer that I am in real life, by which I mean shitty. What am I going to wind up attempting to do that Snyk will flag and, in other words, protect me from myself and warn me that I'm about to commit a dumb?Clinton: First of all, I would say, look, there's no such thing as a non-shitty developer, right? And I built software for 20 years and I decided that's really hard. What's a lot easier is talking about building software for a living. So, that's what I do now. But fundamentally, the reason I'm at Snyk is I want to help people who are in the kinds of jobs that I had for a very long time, which is to say, you have a tremendous amount of anxiety because you recognize that the success of the organization rests on your shoulders, and you're making hundreds, if not thousands, of decisions every day without the right context to understand fully how the results of that decision are going to affect the organization that you work for.So, I think every developer in the world has to deal with this constant cognitive dissonance of saying, “I don't know that this is right, but I have to do it anyway because I need to clear that ticket because that release needs to get into production.” And it becomes really easy to short-sightedly do things like pull an open-source dependency without checking whether it has any CVEs associated with it because that's the version that's easiest to implement with your code that already exists. So, that's one piece. Snyk Open Source is designed to traverse that entire tree of dependencies in open-source all the way down, all the hundreds and thousands of packages that you're pulling in, to say not only, here's a vulnerability that you should really know is going to end up in your application when it's built, but also here's what you can do about it, right?
Here's the upgrade you can make, here's the minimum viable change that actually gets you out of this problem, and to do so when it's in the right context, which is, you know, as you're making that decision for the first time, right, inside your developer environment.That also applies to things like container vulnerabilities, right? I have even less visibility into what's happening inside a container than I do inside my application. Because I know, say, I'm using an Ubuntu or a Red Hat base image. I have no idea what Linux packages are on it, let alone what vulnerabilities are associated with them, right? So, being able to detect that I've got a version of OpenSSL 3.0 with a potentially serious vulnerability associated with it, before I've actually deployed that container out into the cloud, very much helps me as a developer.Because I'm limiting the rework or the refactoring I would have to do by otherwise assuming I'm making a safe choice or guessing at it, and then only finding out after I've written a bunch more code that relies on that decision that I have to go back and change it, and then rewrite all of the things that I wrote on top of it, right? So, it's identifying the layer in the stack where that risk could be introduced, and then also seeing how it's affected by all of those other layers because modern software is inherently complex. And that complexity is what drives both the risk associated with it, and also things like efficiency, which I know your audience is, for good reason, very concerned about.Corey: I'm going to challenge you on an aspect of this because on the tin, the way you describe it, it sounds like, “Oh, I already have something that does that. It's the GitHub Dependabot story where it winds up sending me a litany of complaints every week.” And we are talking, if I did nothing other than read this email that day, that would be a tremendously efficient processing of that entire thing because so much of it is stuff that is ancient and archived, and specific aspects of the vulnerabilities are just not relevant. And you talk about the OpenSSL 3.0 issues that just recently came out.I have no doubt that somewhere in the most recent email I've gotten from that thing, it's buried two-thirds of the way down, like all the complaints: the dishwasher isn't loaded, you forgot to take the trash out, that baby needs a change, the kitchen is on fire, and the vacuuming, and the r—wait, wait. What was that thing about the kitchen? Seems like one of those things is not like the others. And it just gets lost in the noise. Now, I will admit to putting my thumb a little bit on the scale here because I've used Snyk before myself and I know that you don't do that. How do you avoid that trap?Clinton: Great question. And I think, really, the key to the story here is that developers need to be able to prioritize, and in order to prioritize effectively, you need to understand the context of what happens to that application after it gets deployed. And so, this is a key part of why getting the data out of the cloud and bringing it back into the code is so important. So, for example, take an OpenSSL vulnerability. Do you have it on a container image you're using, right? So, that's question number one.Question two is, is there actually a way that code can be accessed from the outside? Is it included or is it called? Is the method activated by some other package that you have running on that container? Is that container image actually used in a production deployment?
Or does it just go sit in a registry and no one ever touches it?What are the conditions required to make that vulnerability exploitable? You look at something like Spring4Shell, for example: yes, you need a certain version of spring-beans in a JAR file somewhere, but you also need to be running a certain version of Tomcat, and you need to be packaging those JARs inside a WAR in a certain way.Corey: Exactly. I have a whole bunch of Lambda functions that provide the pipeline system that I use to build my newsletter every week, and I get screaming concerns about issues in, for example, a version of the markdown parser that I've subverted. Yeah, sure. I get that, on some level, if I were just giving it random untrusted input from the internet and random ad hoc users, but I'm not. It's just me when I write things for that particular Lambda function.And I'm not going to be actively attempting to subvert the thing that I built myself and no one else should have access to. And looking through the details of some of these things, it doesn't even apply to the way that I'm calling the libraries, so it's just noise, for lack of a better term. It is not something that basically ever needs to be adjusted or fixed.Clinton: Exactly. And I think cutting through that noise is so key to creating developer trust in any kind of tool that scans an asset and provides you with what, in theory, is a list of actionable steps, right? I need to be able to understand what is the thing, first of all. There's a lot of tools that do that, right, and we tend to mock them by saying things like, “Oh, it's just another PDF generator. It's just another thousand pages that you're never going to read.”So, getting the information in the right place is a big part of it, but so is filtering out all of the noise by saying, we looked at not just one layer of the stack, but multiple layers, right? We know that you're using this open-source dependency and we also know that the method that contains the vulnerability is actively called by your application in your first-party code because we ran our static analysis tool against that. Furthermore, because we looked at your cloud context—we connected to your AWS API; we're big partners with AWS and very proud of that relationship—we can tell that there's inbound internet access available to that service, right? So, you start to build a compound case that maybe this is something that should be prioritized, right? Because there's a way into the asset from the outside world, there's a way into the vulnerable functions through the labyrinthine, you know, spaghetti of my code to get there, and the conditions required to exploit it actually exist in the wild.But you can't just run a single tool; you can't just run Dependabot to get that prioritization. You actually have to look at the entire holistic application context, which includes not just your dependencies, but what's happening in the container, what's happening in your first-party, proprietary code, what's happening in your IaC, and I think most importantly for modern applications, what's actually happening in the cloud once it gets deployed, right? And that's sort of the holy grail of completing that loop to bring the right context back from the cloud into code, to understand what change needs to be made, and where, and most importantly why. Because it's a priority that actually translates into organizational risk to get a developer to pay attention, right?
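To make the single-layer version of that lookup concrete, here is a minimal sketch that asks the public OSV database (used here as a generic stand-in for illustration, not Snyk's own vulnerability data) whether one pinned package version has known advisories. Reachability analysis and cloud context, the extra layers described above, are exactly what a one-shot query like this cannot see.

```python
# Sketch: query the public OSV database for advisories on one pinned
# dependency. This is the naive single-layer check; it knows nothing about
# whether the vulnerable function is reachable or how the workload is exposed.
import json
import urllib.request

query = {
    "package": {"name": "requests", "ecosystem": "PyPI"},
    "version": "2.19.0",
}

req = urllib.request.Request(
    "https://api.osv.dev/v1/query",
    data=json.dumps(query).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    vulns = json.load(resp).get("vulns", [])

for vuln in vulns:
    print(vuln["id"], "-", vuln.get("summary", "no summary provided"))
```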
I mean, that, I think, is the key to any security concern: how do you get engineering mindshare and trust that this is actually what you should be paying attention to, and not a bunch of rework that doesn't actually make your software more secure?Corey: One of the challenges that I see across the board is that—well, let's back up a bit here. I have in previous episodes talked in some depth about my position that when it comes to the security of various cloud providers, Google is number one, and AWS is number two. Azure is a distant third because it's busy figuring out which Crayons taste the best; I don't know. But the reason is not because of any inherent attribute of their security models, but rather that Google massively simplifies an awful lot of what happens. It automatically assumes that resources in the same project should be able to talk to one another, so I don't have to painstakingly configure that.In AWS-land, all of this must be done explicitly; no one has time for that, so we over-scope permissions massively and never go back and rein them in. It's a configuration vulnerability more than an underlying inherent weakness of the platform. Because complexity is the enemy of security in many respects. If you can't fit it all in your head to reason about it, how can you understand the security ramifications of it? AWS offers a tremendous number of security services. Many of them, when taken in some totality of their pricing, cost more than any breach they could be expected to prevent. Adding more stuff that adds more complexity in the form of Snyk sounds like it's the exact opposite of what I would want to do. Change my mind.Clinton: I would love to. I would say, fundamentally, I think you and I—and by ‘I,' I mean Snyk and, you know, Corey Quinn Enterprises Limited—I think we fundamentally have the same enemy here, right, which is the cyclomatic complexity of software, right, which is how many different pathways the bits have to travel down to reach the same endpoint, right, the same goal. The more pathways there are, the more risk is introduced into your software, and the more inefficiency is introduced, right? And then I know you'd love to talk about how many different ways there are to run a container on AWS, right? It's either 30 or 400 or eleventy-million.I think you're exactly right that that complexity is great for, first of all, selling cloud resources, but also, I think, for innovating, right, for building new kinds of technology on top of that platform. The cost that comes along with that is a lack of visibility. And I think we are just now, as we approach the end of 2022 here, coming to recognize that fundamentally, the complexity of modern software is beyond the ability of a single engineer to understand. And that is really important from a security perspective, from a cost control perspective, especially because software now creates its own infrastructure, right? You can't just now secure the artifact and secure the perimeter that it gets deployed into and say, “I've done my job.
Nobody can breach the perimeter and there's no vulnerabilities in the thing because we scanned it, and that thing is immutable forever because it's pets, not cattle.”Where I think the complexity story comes in is to recognize, like, “Hey, I'm deploying this based on a quickstart or CloudFormation template that is making certain assumptions that make my job easier,” right, in a very similar way that choosing an open-source dependency makes my job easier as a developer because I don't have to write all of that code myself. But what it does mean is I lack the visibility into, well, hold on. How many different pathways are there for getting things done inside this dependency? How many other dependencies are brought on board? In the same way that when I create an EKS cluster, for example, from a CloudFormation template, what is it creating in the background? How many VPCs are involved? What are the subnets, right? How are they connected to each other? Where are the potential ingress points?So, I think fundamentally, getting visibility into that complexity is step number one, but understanding those pathways and how they could potentially translate into risk is critically important. But that prioritization has to involve looking at the software holistically and not just individual layers, right? I think we lose when we say, “We ran a static analysis tool and an open-source dependency scanner and a container scanner and a cloud config checker, and they all came up green, therefore the software doesn't have any risks,” right? That ignores the fundamental complexity in that all of these layers are connected together. And from an adversary's perspective, if my job is to go in and exploit software that's hosted in the cloud, I absolutely do not see the application model that way.I see it as inherently complex, and that's a good thing for me because it means I can rely on the fact that those engineers had tremendous anxiety, were making a lot of guesses, and were crossing their fingers and hoping something would work and not be exploitable by me, right? So, the only way I think we get around that is to recognize that our engineers are critical stakeholders in that security process, and you fundamentally lack that visibility if you don't do your scanning until after the fact, if you take that traditional audit-based approach that assumes a very waterfall, legacy approach to building software, instead of recognizing that, hey, we're all on this infinite-loop racetrack now. We're deploying every three-and-a-half seconds, everything's automated, it's all built at scale, but the ability to do that inherently implies all of this additional complexity that will ultimately, you know, end up haunting me, right, if I don't do anything about it to make my engineers stakeholders in, you know, what actually gets deployed and what risks it brings on board.Corey: This episode is sponsored in part by our friends at Uptycs. Attackers don't think in silos, so why would you have siloed solutions protecting cloud, containers, and laptops distinctly? Meet Uptycs - the first unified solution that prioritizes risk across your modern attack surface—all from a single platform, UI, and data model. Stop by booth 3352 at AWS re:Invent in Las Vegas to see for yourself and visit uptycs.com. That's U-P-T-Y-C-S.com. My thanks to them for sponsoring my ridiculous nonsense.Corey: When I wind up hearing you talk about this—I'm going to divert us a little bit because you're dancing around something that it took me a long time to learn.
When I first started fixing AWS bills for a living, I thought that it would be mostly math, by which I mean arithmetic. That's the great secret of cloud economics. It's addition, subtraction, and occasionally multiplication and division. No, it turns out it's much more psychology than it is math. You're talking in many aspects about, I guess, what I'd call the psychology of a modern cloud engineer and how they think about these things. It's not a technology problem. It's a people problem, isn't it?Clinton: Oh, absolutely. I think it's the people who create the technology. And I think the longer you persist in what we would call the legacy viewpoint, right, not recognizing what the cloud is—which is fundamentally just software all the way down, right? It is abstraction layers that allow you to ignore the fact that you're running stuff on somebody else's computer—once you recognize that, you realize, oh, if it's all software, then the problems that it introduces are software problems that need software solutions, which means that it must involve activity by the people who write software, right? So, now that you're in that developer world, it unlocks, I think, a lot of potential to say, well, why don't developers tend to trust the security tools they've been provided with, right?I think a lot of it comes down to the question you asked earlier in terms of the noise, the lack of understanding of how those pieces are connected together, or the lack of context, or not even, frankly, caring about looking beyond the single-point solution of the problem that solution was designed to solve. But more importantly than that, not recognizing what it's like to build modern software, right, all of the decisions that have to be made on a daily basis with very limited information, right? I might not even understand where that container image I'm building is going in the universe, let alone what's being built on top of it and how much critical customer data is being touched by the database that that container now has the credentials to access, right? So, I think in order to change anything, we have to back way up and say, problems in the cloud are software problems and we have to treat them that way.Because if we don't, if we continue to represent the cloud as some evolution of the old environment where you just have this perimeter that's pre-existing infrastructure that you're deploying things onto, and there's a guy with a neckbeard in the basement who is unplugging cables from a switch and plugging them back in and that's how networking problems are solved, I think you miss the idea that all of these abstraction layers introduce the very complexity that needs to be solved back in the build space. But that requires visibility into what actually happens when it gets deployed. The way I tend to think of it is, there's this firewall in place. Everybody wants to say, you know, we're doing DevOps or we're doing DevSecOps, right? And that's a lie a hundred percent of the time, right? No one is actually, I think, adhering completely to those principles.Corey: That's why one of the core tenets of ClickOps is lying about doing anything in the console.Clinton: Absolutely, right? And that's why shadow IT becomes more and more prevalent the deeper you get into modern development, not less and less prevalent, because it's fundamentally hard to recognize the entirety of the potential implications, right, of a decision that you're making.
So, it's a lot easier to just go in the console and say, “Okay, I'm going to deploy one EC2 instance to do this. I'm going to get it right at some point.” And that's why every application that's ever been produced by human hands has a comment in it that says something like, “I don't know why this works, but it does. Please don't change it.”And then three years later, because that developer has moved on to another job, someone else comes along and looks at that comment and says, “That should really work. I'm going to change it.” And they do, and everything fails, and they have to go back and fix it the original way and then add another comment saying, “Hey, this person above me, they were right. Please don't change this line.” I think every engineer listening right now knows exactly where that weak spot is in the applications that they've written, and they're terrified of that.And I think any tool that's designed to help developers fundamentally has to get into the mindset, get into the psychology of what that is like—of not fundamentally being able to understand what those applications are doing all of the time, but having to write code against them anyway, right? And that's what leads to, I think, the fear that you're going to get woken up because your pager is going to go off at 3 a.m. because the building is literally on fire and it's because of code that you wrote. We have to solve that problem, and it has to be those people whose psychology we get into to understand: how are you working, and how can we make your life better, right? And I really do think it comes with the noise reduction, the understanding of complexity, and really just being humble and saying, like, “We get that this job is really hard and that the only way it gets better is to begin admitting that to each other.”Corey: I really wish that there were a better way to articulate a lot of these things. This is the reason that I started doing a security newsletter; it's because cost and security are deeply aligned in a few ways. One of them is that you care about them a lot right after you failed to care about them sufficiently, but the other is that you've got to build guardrails in such a way that doing the right thing is easier than doing it the wrong way, or you're never going to gain any traction.Clinton: I think that's absolutely right. And you used the key term there, which is guardrails. And I think that's where, in their heart of hearts, every security professional wants to be, right? They want to be defining policy, they want to be understanding the risk posture of the organization and nudging it in a better direction, right? They want to be talking up to the board, to the executive team, and creating confidence in that risk posture, rather than talking down or off to the side—depending on how that org chart looks—to the engineers and saying, “Fix this, fix that, and then fix this other thing.” A, B, and C, right?I think the problem is that everyone in a security role at an organization of any size at this point is doing 90% of the latter and only about 10% of the former, right? They're acting as gatekeepers, not as guardrails. They're not defining policy, they're spending all of their time creating Jira tickets and all of their time tracking down who owns the piece of code that got deployed to this pod on EKS that's throwing all these errors on my console, and how can I get the person to make a decision to actually take an action that stops these notifications from happening, right?
So, all they're doing is throwing footballs down the field without knowing if there's a receiver there, right, and I think that takes away from the job that our security analysts really should be doing, which is creating those guardrails, which is having confidence that the policy they set is readily understood by the developers making decisions, and that's happening in an automated way without them having to create friction by bothering people all the time. I don't think security people want to be [laugh] hated by the development teams that they work with, but they are. And the reason they are is, I think, fundamentally, that we lack the tooling, we lack—Corey: They are the barrier method.Clinton: Exactly. And we lack the processes to get the right intelligence in a way that's consumable by the engineers when they're doing their job, and not after the fact, which is typically when the security people have done their jobs.Corey: It's sad but true. I wish that there were a better way to address these things, and yet here we are.Clinton: If only there were a better way to address these things.Corey: [laugh].Clinton: Look, I wouldn't be here at Snyk if I didn't think there were a better way, and I wouldn't be coming on shows like yours to talk to the engineering communities, right, people who have walked the walk, right, who have built those Terraform files that contain these misconfigurations, not because they're bad people or because they're lazy, or because they don't do their jobs well, but because they lacked the visibility, they didn't have the understanding that that default is actually insecure. Because how would I know that otherwise, right? I'm building software; I don't see myself as an expert on infrastructure, right, or on Linux packages or on cyclomatic complexity or on any of these other things. I'm just trying to stay in my lane and do my job. It's not my fault that the software has become too complex for me to understand, right?But my management doesn't understand that, and so I constantly have white knuckles worrying that, you know, the next breach is going to be my fault. So, I think the way forward really has to be: how do we make our developers stakeholders in the risk being introduced by the software they write to the organization? And that means everything we've been talking about: it means prioritization; it means understanding how the different layers of the stack affect each other, especially the cloud pieces; it means an extensible platform that lets me write code against it to inject my own reasoning, right? The piece that we haven't talked about here is that risk calculation doesn't just involve technical aspects, there's also business intelligence that's involved, right? What are my critical applications, right, what actually causes me to lose significant amounts of money if those services go offline?We at Snyk can't tell that. We can't run a scanner to say these are your crown-jewel services that can't ever go down, but you can know that as an organization. So, where we're going with the platform is opening up the extensible process, creating APIs for you to be able to affect that risk triage, right, so that, as the creators of guardrails, as the security team, you are saying, “Here's how we want our developers to prioritize.
Here are all of the factors that go into that decision-making.” And then you can be confident that in their environment, back over in developer-land, when I'm looking at IntelliJ, or, you know, on my local command line, I am seeing the guardrails that my security team has set for me and I am confident that I'm fixing the right thing, and frankly, I'm grateful because I'm fixing it at the right time and I'm doing it in such a way and with a toolset that actually is helping me fix it rather than just telling me I've done something wrong, right, because everything we do at Snyk focuses on identifying the solution, not necessarily identifying the problem.It's great to know that I've got an unencrypted S3 bucket, but it's a whole lot better if you give me the line of code and tell me exactly where I have to copy and paste it so I can go on to the next thing, rather than spending an hour trying to figure out, you know, where I put that line and what I actually have to change it to, right? I often say that the most valuable currency for a developer, for a software engineer, it's not money, it's not time, it's not compute power or anything like that, it's the right context, right? I actually have to understand what are the implications of the decision that I'm making, and I need that to be in my own environment, not after the fact, because that's what creates friction within an organization: when I could have known earlier and I could have known better, but instead, I had to guess, I had to write a bunch of code that relies on the thing that was wrong, and now I have to redo it all for no good reason other than the tooling just hadn't adapted to the way modern software is built.Corey: So, one last question before we wind up calling it a day here. We are now heavily into what I will term pre:Invent, where we're starting to see a whole bunch of announcements come out of the AWS universe in preparation for what I'm calling Crappy Cloud Hanukkah this year because I'm spending eight nights in Las Vegas. What are you doing these days with AWS specifically? I know I keep seeing your name in conjunction with their announcements, so there's something going on over there.Clinton: Absolutely. No, we're extremely excited about the partnership between Snyk and AWS. Our vulnerability intelligence is utilized as one of the data sources for AWS Inspector, particularly around open-source packages. We're doing a lot of work around things like the code suite, building Snyk into CodePipeline, for example, to give developers using that code suite earlier visibility into those vulnerabilities. And really, I think the story kind of expands from there, right?So, we're moving forward with Amazon, recognizing that it is, you know, sort of the de facto. When we say cloud, very often we mean AWS. So, we're going to have a tremendous presence at re:Invent this year; I'm going to be there as well. I think we're actually going to have a bunch of handouts with your face on them, is my understanding. So, please stop by the booth; would love to talk to folks, especially because we've now released the Snyk Cloud product and really completed that story. So, anything we can do to talk about how that additional context of the cloud helps engineers because it's all software all the way down, those are absolutely conversations we want to be having.Corey: Excellent. And we will, of course, put links to all of these things in the [show notes 00:35:00] so people can simply click, and there they are.
Thank you so much for taking all this time to speak with me. I appreciate it.Clinton: All right. Thank you so much, Corey. Hope to do it again next year.Corey: Clinton Herget, Field CTO at Snyk. I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice, along with an angry comment telling me that I'm being completely unfair to Azure, along with your favorite tasting color of Crayon.Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.Announcer: This has been a HumblePod production. Stay humble.
An airhacks.fm conversation with Victor Orozco (@tuxtor) about: focus on Jakarta EE and devops, faster release cycles, Apache Cactus - the test container, daily releases and DevOps challenges, the perfect Sun servers, the deprecated Java EE deployment J2EE API, JSR 88: Java EE Application Deployment, onboarding of new developers is harder today, lean Java EE code is reusable in serverless world, Heroku and openshift started the serverless movement, blog post: How To Push Java EE 6 Applications To The Cloud In 5 Minutes, portability of Java workloads in the clouds, kubernetes vs. docker Compose, the costs of the clouds, or Kubernetes vs. serverless, Kubernetes on linode, Kubernetes is a monolith in the cloud, running private VPCs, the payara cloud and the rethinking of clustering, back to efficient monoliths, the plain Quarkus CDK lambda template, a quarkus AWS Lambda looks like an old Glassfish application, buying CPU with RAM, Java's dependencies are easy to manage, Java's serverless comeback, Victor Orozco on twitter: @tuxtor, Victor's company: nabenik
About CaseyCasey spends his days leveraging AWS to help organizations improve the speed at which they deliver software. With a background in software development, he has spent the past 20 years architecting, building, and supporting software systems for organizations ranging from startups to Fortune 500 enterprises.Links Referenced: “17 Ways to Run Containers in AWS”: https://www.lastweekinaws.com/blog/the-17-ways-to-run-containers-on-aws/ “17 More Ways to Run Containers on AWS”: https://www.lastweekinaws.com/blog/17-more-ways-to-run-containers-on-aws/ kubernetestheeasyway.com: https://kubernetestheeasyway.com snark.cloud/quinntainers: https://snark.cloud/quinntainers ECS Chargeback: https://github.com/gaggle-net/ecs-chargeback twitter.com/nektos: https://twitter.com/nektos TranscriptAnnouncer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.Corey: This episode is sponsored by our friends at Revelo. Revelo is the Spanish word of the day, and it's spelled R-E-V-E-L-O. It means “I reveal.” Now, have you tried to hire an engineer lately? I assure you it is significantly harder than it sounds. One of the things that Revelo has recognized is something I've been talking about for a while, specifically that while talent is evenly distributed, opportunity is absolutely not. They're exposing a new talent pool to, basically, those of us without a presence in Latin America via their platform. It's the largest tech talent marketplace in Latin America with over a million engineers in their network, which includes—but isn't limited to—talent in Mexico, Costa Rica, Brazil, and Argentina. Now, not only do they wind up screening all of their talent on English ability, as well as, you know, their engineering skills, but they go significantly beyond that. Some of the folks on their platform are hands down the most talented engineers that I've ever spoken to. Let's also not forget that Latin America has high time zone overlap with what we have here in the United States, so you can hire full-time remote engineers who share most of the workday as your team. It's an end-to-end talent service, so you can find and hire engineers in Central and South America without having to worry about, frankly, the colossal pain of cross-border payroll and benefits and compliance because Revelo handles all of it. If you're hiring engineers, check out revelo.io/screaming to get 20% off your first three months. That's R-E-V-E-L-O dot I-O slash screaming.Corey: Couchbase Capella Database-as-a-Service is flexible, full-featured and fully managed with built in access via key-value, SQL, and full-text search. Flexible JSON documents aligned to your applications and workloads. Build faster with blazing fast in-memory performance and automated replication and scaling while reducing cost. Capella has the best price performance of any fully managed document database. Visit couchbase.com/screaminginthecloud to try Capella today for free and be up and running in three minutes with no credit card required. Couchbase Capella: make your data sing.Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. My guest today is someone that I had the pleasure of meeting at re:Invent last year, but we'll get to that story in a minute.
Casey Lee is the CTO with a company called Gaggle, which is—as they frame it—saving lives. Now, that seems to be a relatively common position that an awful lot of different tech companies take. “We're saving lives here.” It's, “You show banner ads and some of them are attack platforms for JavaScript malware. Let's be serious here.” Casey, thank you for joining me, and what makes the statement that Gaggle saves lives not patently ridiculous?Casey: Sure. Thanks, Corey. Thanks for having me on the show. So Gaggle, we're an ed-tech company. We sell software to school districts, and school districts use our software to help protect their students while the students use the school-issued Google or Microsoft accounts.So, we're looking for signs of bullying, harassment, self-harm, and potentially suicide from K-12 students while they're using these platforms. They will take the thoughts, concerns, emotions they're struggling with and write them in their school-issued accounts. We detect that and then we notify the school districts, and they get the students the help they need before they can do any permanent damage to themselves. We protect about 6 million students throughout the US. We ingest a lot of content.Last school year, over 6 billion files, about an equal number of emails ingested. We're looking for concerning content and then we have humans review the stuff that our machine learning algorithms detect and flag. About 40 million items had to go in front of humans last year, resulted in about 20,000 what we call PSSes. These are Possible Student Situations where students are talking about harming themselves or harming others. And that resulted in what we like to track as lives saved. 1400 incidents last school year where a student was dealing with suicide ideation, they were planning to take their own lives. We detect that and get them help within minutes before they can act on that. That's what Gaggle has been doing. We're using tech, solving tech problems, and also saving lives as we do it.Corey: It's easy to lob a criticism at some of the things you're alluding to, the idea of oh, you're using machine learning on student data for young kids, yadda, yadda, yadda. Look at the outcome, look at the privacy controls you have in place, and look at the outcomes you're driving to. Now, I don't necessarily trust the number of school administrations not to become heavy-handed and overbearing with it, but let's be clear, that's not the intent. That is not what the success stories you have alluded to show. I've got to say I'm a fan, so thanks for doing what you're doing. I don't say that very often to people who work in tech companies.Casey: Cool. Thanks, Corey.Corey: But let's rewind a bit because you and I had passed like ships in the night on Twitter for a while, but last year at re:Invent something odd happened. First, my business partner procrastinated at getting his ticket—that's not the odd part; he does that a lot—but then suddenly ticket sales slammed shut and none were to be had anywhere. You reached out with a, “Hey, I have a spare ticket because someone can't go. Let me get it to you.” And I said, “Terrific. Let me pay you for the ticket and take you to dinner.”You said, “Yes on the dinner, but I'd rather you just look at my AWS bill and don't worry about the cost of the ticket.” “All right,” said I. I know a deal when I see one. We grabbed dinner at the Venetian. I said, “Bust out your laptop.” And you said, “Oh, I was kidding.” And I said, “Great. I wasn't.
Bust it out.”And you went from laughing to taking notes in about the usual time that happens when I start looking at these things. But how was your recollection of that? I always tend to romanticize some of these things. Like, “And then everyone's restaurant just turned, stopped, and clapped the entire time.” Maybe that part didn't happen.Casey: Everything was right up until the clapping part. That was a really cool experience. I appreciate you walking through that with me. Yeah, we've got lots of opportunity to save on our AWS bill here at Gaggle, and in that little bit of time that we had together, I think I walked away with no more than a dozen ideas for where to shave some costs. The most obvious one, the first thing that you keyed in on, is we had RIs coming due that weren't really well-optimized and you steered me towards savings plans. We put that in place and we're able to apply those savings plans not just to our EC2 instances but also to our serverless spend as well.So, that was a very worthwhile and cost-effective dinner for us. The thing that was most surprising though, Corey, was your approach. Your approach to how to review our bill was not what I thought at all.Corey: Well, what did you expect my approach was going to be? Because this always is of interest to me. Like, do you expect me to, like, whip a portable machine learning rig out of my backpack full of GPUs or something?Casey: I didn't know if you had, like, some secret tool you were going to hit, or if nothing else, I thought you were going to go for the Cost Explorer. I spend a lot of time in Cost Explorer, that's my go-to tool, and you wanted nothing to do with Cost Exp—I think I was actually pulling up Cost Explorer for you and you said, “I'm not interested. Take me to the bills.” So, we went right to the billing dashboard, you started opening up the invoices, and I thought to myself, “I don't remember the last time I looked at an AWS invoice.” I just, it's noise; it's not something that I pay attention to.And I learned something, that you get a real quick view of both the cost and the usage. And that's what you were keyed in on, right? And you were looking at things relative to each other. “Okay, I have no idea about Gaggle or what they do, but normally, for a company that's spending x amount of dollars in EC2, why is your data transfer cost the way it is? Is that high or low?” So, you're looking for kind of relative numbers, but it was really cool watching you slice and dice that bill through the dashboard there.Corey: There are a few things I tie together there. Part of it is that this is sort of a surprising thing that people don't think about but start with big numbers first, rather than going alphabetically because I don't really care about your $6 Alexa for Business spend. I care a bit more about the $6 million, or whatever it happens to be at EC2—I'm pulling numbers completely out of the ether, let's be clear; I don't recall what the exact magnitude of your bill is and it's not relevant to the conversation.And then you see that and it's like, “Huh. Okay, you're spending $6 million on EC2. Why are you spending 400 bucks on S3? Seems to me that those two should be a little closer aligned. What's the deal here? Oh, God, you're using eight petabytes of EBS volumes. Oh, dear.”And just, it tends to lead to interesting stuff. Break it down by region, service, and use case—or usage type, rather—is what shows up on those exploded bills, and that's where I tend to start. 
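That "big numbers first" pass is easy to approximate programmatically, for anyone who wants to try it on their own bill. A rough sketch, assuming boto3 and Cost Explorer access; the dates are placeholders:

```python
import boto3

ce = boto3.client("ce")  # Cost Explorer

# One month of spend, grouped by service.
resp = ce.get_cost_and_usage(
    TimePeriod={"Start": "2022-03-01", "End": "2022-04-01"},
    Granularity="MONTHLY",
    Metrics=["UnblendedCost"],
    GroupBy=[{"Type": "DIMENSION", "Key": "SERVICE"}],
)

groups = resp["ResultsByTime"][0]["Groups"]
# Biggest spend first: nobody cares about the $6 Alexa for Business
# line when EC2 is seven figures.
for g in sorted(
    groups,
    key=lambda g: float(g["Metrics"]["UnblendedCost"]["Amount"]),
    reverse=True,
):
    amount = float(g["Metrics"]["UnblendedCost"]["Amount"])
    print(f"{g['Keys'][0]:<45} ${amount:>12,.2f}")
```

The same call with GroupBy on REGION or USAGE_TYPE gives the other slices mentioned here; the relative comparisons between services are the interesting part, not the absolute numbers.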
It also is one of the easiest things to wind up having someone throw into a PDF and email my way if I'm not doing it in a restaurant with, you know, people clapping standing around.Casey: [laugh]. Right.Corey: I also want to highlight that you've been using AWS for a long time. You're a Container Hero; you are not bad at understanding the nuances and depths of AWS, so I take praise from you around this stuff as valuing it very highly. This stuff is not intuitive, it is deeply nuanced, and you have a business outcome you are working towards that invariably is not oriented day in day out around, “How do I get these services for less money than I'm currently paying?” But that is how I see the world and I tend to live in a very different space just based on the nature of what I do. It's sort of a case study in the advantage of specialization. But I know remarkably little about containers, which is how we wound up reconnecting about a week or so before we did this recording.Casey: Yeah. I saw your tweet; you were trying to run some workload—container workload—and I could hear the frustration on the other end of Twitter when you were shaking your fist at—Corey: I should not tweet angrily, and I did in this case. And, eh, every time I do I regret it. But it played well with the people, so that does help. I believe my exact comment was, “‘me: I've got this container. Run it, please.' ‘Google Cloud: Run. You got it, boss.' AWS has 17 ways to run containers and they all suck.”And that's painting with an overly broad brush, let's be clear, but that was at the tail end of two or three days of work trying to solve a very specific, very common, business problem, that I was just beating my head off of a wall again and again and again. And it took less than half an hour from start to finish with Google Cloud Run and I didn't have to think about it anymore. And it's one of those moments where you look at this and realize that the future is here, we just don't see it in certain ways. And you took exception to this. So please, let's dive in because 280 characters of text after half a bottle of wine is not the best context to have a nuanced discussion that leaves friendships intact the following morning.Casey: Nice. Well, I just want to make sure I understand the use case first because I was trying to read between the lines on what you needed, but let me take a guess. My guess is you got your source code in GitHub, you have a Docker file, and you want to be able to take that repo from GitHub and just have it continuously deployed somewhere in Run. And you don't want to have headaches with it; you just want to push more changes up to GitHub, Docker Build runs and updates some service somewhere. Am I right so far?Corey: Ish, but think a little further up the stack. It was in service of this show. So, this show, as people who are listening to this are probably aware by this point, periodically has sponsors, which we love: We thank them for participating in the ongoing support of this show, which empowers conversations like this. Sometimes a sponsor will come to us with, “Oh, and here's the URL we want to give people.” And it's, “First, you misspelled your company name from the common English word; there are three sublevels within the domain, and then you have a complex UTM tagging tracking co—yeah, you realize people are driving to work when they're listening to this?”So, I've built a while back a link shortener, snark.cloud because is it the shortest thing in the world?
Not really, but it's easily understandable when I say that, and people hear it for what it is. And that's been running for a long time as an S3 bucket with full of redirects, behind CloudFront. So, I wind up adding a zero-byte object with a redirect parameter on it, and it just works.Now, the challenge that I have here as a business is that I am increasingly prolific these days. So, anything that I am not directly required to be doing, I probably shouldn't necessarily be the one to do it. And care and feeding of those redirect links is a prime example of this. So, I went hunting, and the things that I was looking for were, obviously, do the redirect. Now, if you pull up GitHub, there are hundreds of solutions here.There are AWS blog posts. One that I really liked and almost got working was Eric Johnson's three-part blog post on how to do it serverlessly, with API Gateway, and DynamoDB, no Lambdas required. I really liked aspects of what that was, but it was complex, I kept smacking into weird challenges as I went, and front end is just baffling to me. Because I needed a front end app for people to be able to use here; I need to be able to secure that because it turns out that if you just have a, anyone who stumbles across the URL can redirect things to other places, well, you've just empowered a whole bunch of spam email, and you're going to find that service abused, and everyone starts blocking it, and then you have trouble. Nothing lasts the first encounter with jerks.And I was getting more and more frustrated, and then I found something by a Twitter engineer on GitHub, with a few creative search terms, who used to work at Google Cloud. And what it uses as a client is it doesn't build any kind of custom web app. Instead, as a database, it uses not S3 objects, not Route 53—the ideal database—but a Google sheet, which sounds ridiculous, but every business user here knows how to use that.Casey: Sure.Corey: And it looks for the two columns. The first one is the slug after the snark.cloud, and the second is the long URL. And it has a TTL of five seconds on cache, so make a change to that spreadsheet, five seconds later, it's live. Everyone gets it, I don't have to build anything new, I just put it somewhere around the relevant people can access it, I gave him a tutorial and a giant warning on it, and everyone gets that. And it just works well. It was, “Click here to deploy. Follow the steps.”And the documentation was a little, eh, okay, I had to undo it once and redo it again. Getting the domain registered was getting—ported over took a bit of time, and there were some weird SSL errors as the certificates were set up, but once all of that was done, it just worked. And I tested the heck out of it, and cold starts are relatively low, and the entire thing fits within the free tier. And it is reminiscent of the magic that I first saw when I started working with some of the cloud providers services, years ago. It's been a long time since I had that level of delight with something, especially after three days of frustration. It's one of the, “This is a great service. Why are people not shouting about this from the rooftops?” That was my perspective. And I put it out on Twitter and oh, Lord, did I get comments. What was your take on it?Casey: Well, so my take was, when you're evaluating a platform to use for running your applications, how fast it can get you to Hello World is not necessarily the best way to go. I just assumed you're wrong. 
I assumed of the 17 ways AWS has to run containers, Corey just doesn't understand. And so I went after it. And I said, “Okay, let me see if I can find a way that solves his use case, as I understand it, through a quick tweet.”And so I tried App Runner; I saw that App Runner does not meet your needs because you have to somehow get your Docker image pushed up to a repo. App Runner can take an image that's already been pushed up and deployed for you or it can build from source but neither of those were the way I understood your use case.Corey: Having used App Runner before via the Copilot CLI, it is the closest as best I can tell to achieving what I want. But also let's be clear that I don't believe there's a free tier; there needs to be a load balancer in front of it, so you're starting with 15 bucks a month for this thing. Which is not the end of the world. Had I known at the beginning that all of this was going to be there, I would have just signed up for a bit.ly account and called it good. But here we are.Casey: Yeah. I tried Copilot. Copilot is a great developer experience, but it also is just pulling together tons of—I mean just trying to do a Copilot service deploy, VPCs are being created and tons of IAM roles are being created, code pipelines, there's just so much going on. I was like 20 minutes into it, and I said, “Yeah, this is not fitting the bill for what Corey was looking for.” Plus, it doesn't solve your use case the way I understood it, which is you don't want to worry about builds, you just want to push code and have new Docker images get built for you.Corey: Well, honestly, let's be clear here, once it's up and running, I don't want to ever have to touch the silly thing again.Casey: Right.Corey: And that so far has been the case, after I forked the repo and made a couple of changes to it that I wanted to see. One of them was to render the entire thing case insensitive because I get that one wrong a lot, and the other is I wanted to change the permanent 301 redirect to a temporary 302 redirect because occasionally, sponsors will want to change where it goes in the fullness of time. And that is just fine, but I want to be able to support that and not have to deal with old cached data. So, getting that up and running was a bit of a challenge. But the way that it worked was following the instructions in the GitHub repo.The developer environment that spun up in Google's Cloud Shell was just spectacular. It prompted me for a few things and it told me step by step what to do. This is the sort of thing I could have given a basically non-technical user, and they would have had success with it.
I think that you just proved why there's 17 different ways to run containers on AWS, is because there's that many different types of users that have different needs and you just happen to be number 18 that hasn't gotten the right attention yet from AWS.Corey: Well, let's be clear, like, my gag about 17 ways to run containers on AWS was largely a joke, and it went around the internet three times. So, I wrote a list of them on the blog post of “17 Ways to Run Containers in AWS” and people liked it. And then a few months later, I wrote “17 More Ways to Run Containers on AWS” listing 17 additional services that all run containers.And my favorite email that I think I've ever received in feedback was from a salty AWS employee, saying that one of them didn't really count because of some esoteric reason. And it turns out that when I'm trying to make a point of you have a sarcastic number of ways to run containers, pointing out that well, one of them isn't quite valid, doesn't really shatter the argument, let's be very clear here. So, I appreciate the feedback, I always do. And it's partially snark, but there is an element of truth to it in that customers don't want to run containers, by and large. That is what they do in service of a business goal.And they want their application to run, which is in turn to serve the business goal that continues to abstract out into, “Remain a going concern via the current position the company stakes out.” In your case, it is saving lives; in my case, it is fixing horrifying AWS bills and making fun of Amazon at the same time, and in most other places, there are somewhat more prosaic answers to that. But containers are simply an implementation detail, to some extent—to my way of thinking—of getting to that point. An important one [unintelligible 00:18:20], let's be clear, I was very anti-container for a long time. I wrote a talk, “Heresy in the Church of Docker” that then was accepted at ContainerCon. It's like, “Oh, boy, I'm not going to leave here alive.”And the honest answer is many years later, that Kubernetes solves almost all the criticisms that I had, with the downside of, well, first, you have to learn Kubernetes, and that continues to be mind-bogglingly complex from where I sit. There's a reason that I've registered kubernetestheeasyway.com and repointed it to ECS, Amazon's container service that is not requiring you to cosplay as a cloud provider yourself. But even ECS has a number of challenges to it, I want to be very clear here. There are no silver bullets in this.And you're completely correct in that I have a large, complex environment, and the application is nuanced, and I'm willing to invest a few weeks in setting up the baseline underlying infrastructure on AWS with some of these services, ideally not all of them at once because that's something a lunatic would do, but getting them up and running. The other side of it, though, is that if I am trying to evaluate a cloud provider's handling of containers and how this stuff works, the reason that everyone starts with a Hello World-style example is that it delivers, ideally, the mean time to dopamine. There's a reason that Hello World doesn't have 18 different dependencies across a bunch of different databases and message queues and all the other complicated parts of running a modern application. Because you just want to see how it works out of the gate.
And if getting that baseline empty container that just returns the string ‘Hello World' is that complicated and requires that much work, my takeaway is not that this user experience is going to get better once I make the application itself more complicated.So, I find that off-putting. My approach has always been find something that I can get the easy, minimum viable thing up and running on, and then as I expand, know that you'll be there to catch me as my needs intensify and become ever more complex. But if I can't get the baseline thing up and running, I'm unlikely to be super enthused about continuing to beat my head against the wall like, “Well, I'll just make it more complex. That'll solve the problem.” Because it often does not. That's my position.Casey: Yeah, I agree that dopamine hit is valuable in getting attached to want to invest into whatever tech stack you're using. The challenge is your second part of that. Your second part is will it grow with me and scale with me and support the complex edge cases that I have? And the problem I've seen is a lot of organizations will start with something that's very easy to get started with and then quickly outgrow it, and then come up with all sorts of weird Rube Goldberg-type solutions. Because they jumped all in before seeing—I've got kind of an example of that.I'm happy to announce that there's now 18 ways to run containers on AWS. Because in your use case, in the spirit of AWS customer obsession, I hear your use case, I've created an open-source project that I want to share called Quinntainers—Corey: Oh, no.Casey: —and it solves—yes. Quinntainers is live and is ready for the world. So, now we've got 18 ways to run containers. And if you have Corey's use case of, “Hey, here's my container. Run it for me,” now we've got one command that you can run to get things going for you. I can share a link for you and you could check it out. This is a [unintelligible 00:21:38]—Corey: Oh, we're putting that in the [show notes 00:21:37], for sure. In fact, if you go to snark.cloud/quinntainers, you'll find it.Casey: You'll find it. There you go. The idea here was this: There is a real use case that you had, and I looked, and AWS does not have an out-of-the-box simple solution for you. I agree with that. And Google Cloud Run does.Well, the answer would have been from AWS, “Well, then here, we need to make that solution.” And so that's what this was, was a way to demonstrate that it is a solvable problem. AWS has all the right primitives, just that use case hadn't been covered. So, how does Quinntainers work? Real straightforward: It's a command-line—it's an NPM tool.You just run an [npx 00:22:17] quinntainer, it sets up a GitHub action role in your AWS account, it then creates a GitHub action workflow in your repo, and then uses the Quinntainer GitHub action—reusable action—that creates the image for you; every time you push to the branch, pushes it up to ECR, and then automatically pushes up that new version of the image to App Runner for you. So, now it's using App Runner under the covers, but it's providing that nice developer experience that you are getting out of Cloud Run. Look, is Quinntainer really the right way to go with running containers? No, I'm not making that point at all. But the point is it is a—Corey: It might very well be.Casey: Well, if you want to show a good Hello World experience, Quinntainer's the best because within 30 seconds, your app is now set up to continuously deliver containers into AWS for your very specific use case.
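The "GitHub action role" Quinntainers sets up is presumably a variant of the standard GitHub-OIDC-to-IAM pattern; Casey walks through the real details in a moment. A sketch of that pattern, assuming boto3, with the account ID, repo, and role name as placeholders rather than anything Quinntainers actually uses:

```python
import json
import boto3

ACCOUNT_ID = "123456789012"       # placeholder account
REPO = "example-org/example-app"  # placeholder GitHub repo

# Trust policy: only GitHub's OIDC provider may assume the role, and
# only for workflow runs from this repo's main branch.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Federated": f"arn:aws:iam::{ACCOUNT_ID}:oidc-provider/"
                             "token.actions.githubusercontent.com"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringEquals": {
                    "token.actions.githubusercontent.com:aud":
                        "sts.amazonaws.com"
                },
                "StringLike": {
                    "token.actions.githubusercontent.com:sub":
                        f"repo:{REPO}:ref:refs/heads/main"
                },
            },
        }
    ],
}

iam = boto3.client("iam")
iam.create_role(
    RoleName="github-actions-deploy",  # placeholder role name
    AssumeRolePolicyDocument=json.dumps(trust_policy),
)
# Permissions policies scoped to ECR pushes and App Runner deployments
# would then be attached to this role; no long-lived secrets involved.
```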
The problem is, it's not going to grow for you. I mean that it was something I did over the weekend just for fun; it's not something that would ever be worthy of hitching up a real production workload to. So, the point there is, you can build frameworks and tools that are very good at getting that initial dopamine hit, but then are not going to be there for you unnecessarily as you mature and get more complex.Corey: And yet, I've tilted a couple of times at the windmill of integrating GitHub actions in anything remotely resembling a programmatic way with AWS services, as far as instance roles go. Are you using permanent credentials for this as stored secrets or are you doing the [OICD 00:23:50][00:23:50] handoff?Casey: OIDC. So, what happens is the tool creates the IAM role for you with the trust policy on GitHub's OIDC provider, sets all that up for you in your account, locks it down so that just your repo and your main branch is able to push or is able to assume the role, the role is set up just to allow deployments to App Runner and ECR repository. And then that's it. At that point, it's out of your way. And you're just git push, and couple minutes later, your updates are now running an App Runner for you.Corey: This episode is sponsored in part by our friends at Vultr. Optimized cloud compute plans have landed at Vultr to deliver lightning fast processing power, courtesy of third gen AMD EPYC processors without the IO, or hardware limitations, of a traditional multi-tenant cloud server. Starting at just 28 bucks a month, users can deploy general purpose, CPU, memory, or storage optimized cloud instances in more than 20 locations across five continents. Without looking, I know that once again, Antarctica has gotten the short end of the stick. Launch your Vultr optimized compute instance in 60 seconds or less on your choice of included operating systems, or bring your own. It's time to ditch convoluted and unpredictable giant tech company billing practices, and say goodbye to noisy neighbors and egregious egress forever.Vultr delivers the power of the cloud with none of the bloat. "Screaming in the Cloud" listeners can try Vultr for free today with a $150 in credit when they visit getvultr.com/screaming. That's G E T V U L T R.com/screaming. My thanks to them for sponsoring this ridiculous podcast.Corey: Don't undersell what you've just built. This is something that—is this what I would use for a large-scale production deployment, obviously not, but it has streamlined and made incredibly accessible things that previously have been very complex for folks to get up and running. One of the most disturbing themes behind some of the feedback I got was, at one point I said, “Well, have you tried running a Docker container on Lambda?” Because now it supports containers as a packaging format. And I said no because I spent a few weeks getting Lambda up and running back when it first came out and I've basically been copying and pasting what I got working ever since the way most of us do.And response is, “Oh, that explains a lot.” With the implication being that I'm just a fool. Maybe, but let's be clear, I am never the only person in the room who doesn't know how to do something; I'm just loud about what I don't know. And the failure mode of a bad user experience is that a customer feels dumb. And that's not okay because this stuff is complicated, and when a user has a bad time, it's a bug.I learned that in 2012. From Jordan Sissel the creator of LogStash. 
He has been an inspiration to me for the last ten years. And that's something I try to live by: if a user has a bad time, something needs to get fixed. Maybe it's the tool itself, maybe it's the documentation, maybe it's the way that GitHub repo's readme is structured in a way that just makes it accessible.Because I am not a trailblazer in most things, nor do I intend to be. I'm not the world's best engineer by a landslide. Just look at my code and you'd argue the fact that I'm an engineer at all. But if it's bad and it works, how bad is it? Is sort of the other side of it.So, my problem is that there needs to be a couple of things. Ignore for a second the aspect of making it the right answer to get something out of the door. The fact that I want to take this container and just run it, and you and I both reach for App Runner as the default AWS service that does this because I've been swimming in the AWS waters a while and you're a frickin AWS Container Hero, where it is expected that you know what most of these things do. For someone who shows up on the containers webpage—which by the way lists, I believe 15 ways to run containers on mobile and 19 ways to run containers on non-mobile, which is just fascinating in its own right—and it's overwhelming, it's confusing, and it's not something that makes it abundantly clear what the golden path is. First, get it up and working, get it running, then you can add nuance and flavor and the rest, and I think that's something that's gotten overlooked in our mad rush to pretend that we're all Google engineers, circa 2012.Casey: Mmm. I think people get stressed out when they try to run containers in AWS because they think, “What is that golden path?” You said golden path. And my advice to people is there is no golden path. And the great thing about AWS is they do continue to invest in the solutions they come up with. I'm still bitter about Google Reader.Corey: As am I.Casey: Yeah. I spent so much time getting my perfect set of RSS feeds and then I had to find somewhere else to—with AWS, the different offerings that are available for running containers, those are there intentionally, it's not by accident. They're there to solve specific problems, so the trick is finding what works best for you and don't feel like one is better than the others or is going to get more attention than the others. And they each have different use cases.And I approach it this way. I've seen a couple of different people do some great flowcharts—I think Forrest did one, Vlad did one—on ways to make the decision on how to run your containers. And I break it down to three questions. I ask people first of all, where are you going to run these workloads? If someone says, “It has to be in the data center,” okay, cool, then ECS Anywhere or EKS Anywhere and we'll figure out if Kubernetes is needed.If they have specific requirements, so if they say, “No, we can run in the cloud, but we need privileged mode for containers,” or, “We need EBS volumes,” or, “We want really small container sizes,” like, less than a quarter-vCPU or less than half a gig of RAM—or if you have custom log requirements, Fargate is not going to work for you, so you're going to run on EC2. Otherwise, run it on Fargate. But that's the first question. Figure out where are you going to run your containers. That leads to the second question: What's your control plane?But those are different, sort of related but different questions. And I only see six options there.
That's App Runner for your control plane, LightSail for your control plane, Rosa if you're invested in OpenShift already, EKS either if you have Momentum and Kubernetes or you have a bunch of engineers that have a bunch of experience with Kubernetes—if you don't have either, don't choose it—or ECS. The last option Elastic Beanstalk, but let's leave that as a—if you're not currently invested in Elastic Beanstalk don't start today. But I look at those as okay, so I—first question, where am I going to run my containers? Second question, what do I want to use for my control plane? And there's different pros and cons of each of those.And then the third question, how do I want to manage them? What tools do I want to use for managing deployment? All those other tools like Copilot or App2Container or Proton, those aren't my control plane; those aren't where I run my containers; that's how I manage, deploy, and orchestrate all the different containers. So, I look at it as those three questions. But I don't know, what do you think of that, Corey?Corey: I think you're onto something. I think that is a terrific way of exploring that question. I would argue that setting up a framework like that—one or very similar—is what the AWS containers page should be, just coming from the perspective of what is the neophyte customer experience. On some level, you almost need a slide of have choose your level of experience ranging from, “What's a container?” To, “I named my kid Kubernetes because I make terrible life decisions,” and anywhere in between.Casey: Sure. Yeah, well, and I think that really dictates the control plane level. So, for example, LightSail, where does LightSail fit? To me, the value of LightSail is the simplicity. I'm looking at a monthly pricing: Seven bucks a month for a container.I don't know how [unintelligible 00:30:23] works, but I can think in terms of monthly pricing. And it's tailored towards a console user, someone just wants to click in, point to an image. That's a very specific user, there's thousands of customers that are very happy with that experience, and they use it. App Runner presents that scale to zero. That's one of the big selling points I see with App Runner. Likewise, with Google Cloud Run. I've got that scale to zero. I can't do that with ECS, or EKS, or any of the other platforms. So, if you've got something that has a ton of idle time, I'd really be looking at those. I would argue that I think I did the math, Google Cloud Run is about 30% more expensive than App Runner.Corey: Yeah, if you disregard the free tier, I think that's have it—running persistently at all times throughout the month, the drop-out cold starts would cost something like 40 some odd bucks a month or something like that. Don't quote me on it. Again and to be clear, I wound up doing this very congratulatory and complimentary tweet about them on I think it was Thursday, and then they immediately apparently took one look at this and said, “Holy shit. Corey's saying nice things about us. What do we do? What do we do?” Panic.And the next morning, they raised prices on a bunch of cloud offerings. Whew, that'll fix it. Like—Casey: [laugh].Corey: Di-, did you miss the direction you're going on here? No, that's the exact opposite of what you should be doing. But here we are. Interestingly enough, to tie our two conversation threads together, when I look at an AWS bill, unless you're using Fargate, I can't tell whether you're using Kubernetes or not because EKS is a small charge. 
And in almost every case, that's for the control plane, or Fargate under it.Everything else just manifests as EC2 spend. From the perspective of the cloud provider, if you're running a Kubernetes cluster, it is a single-tenant application that can have some very funky behaviors like cross-AZ chatter back and forth because there's no internal mechanism to say talk to the free thing, rather than the two cents a gigabyte thing. It winds up spinning up and down in a bunch of different ways, and the behavior patterns, because of how placement works, are not necessarily deterministic, depending upon workload. And that becomes something that people find odd when, “Okay, we look at our bill for a week, what can you say?”“Well, first question. Are you running Kubernetes at all?” And they're like, “Who invited these clowns?” Understand, we're not prying into your workloads for a variety of excellent legal and contractual reasons, here. We are looking at how they behave, and for specific workloads, once we have a conversation with the engineering team, yeah, we're going to dive in, but it is not at all intuitive from the outside to make any determination whether you're running containers, or whether you're running VMs that you just haven't done anything with in 20 years, or what exactly is going on. And that's just an artifact of the billing system.Casey: We ran into this challenge in Gaggle. We don't use EKS, we use ECS, but we have some shared clusters, lots of EC2 spend, hard to figure out which team is creating the services that are running that up. We actually ended up creating a tool—we open-sourced it—ECS Chargeback, and what it does is it looks at the CPU and memory reservations for each task definition, and then prorates the overall charge of the ECS cluster, and then creates metrics in Datadog to give us a breakdown of cost per ECS service. And it also measures what we like to refer to as waste, right? Because if you're reserving four gigs of memory, but your utilization never goes over two gigs, we're paying for that reservation, but you're underutilizing.So, we're able to also show which services have the highest degree of waste, not just utilization, so it helps us go after it. But this is a hard problem. I'd be curious, how do you approach these shared ECS resources and slicing and dicing those bills?Corey: Everyone has a different approach, too. There is no unifiable, correct answer. A previous show guest, Peter Hamilton, over at Remind had done something very similar, open-sourced a bunch of these things. Understanding what your spend is, is important on this, and it comes down to getting at the actual business concern because in some cases, effectively dead reckoning is enough. You take a look at the cluster that is really hard to attribute because it's a shared service. Great. It is 5% of your bill.First pass, why don't we just agree that it is a third for Service A, two-thirds for Service B, and we'll call it mostly good at that point? That can be enough in a lot of cases. With scale [laugh] you're just sort of hand-waving over many millions of dollars a year there. How about we get into some more depth? And then you start instrumenting and reporting to something, be it CloudWatch, be it Datadog, be it something else, and understanding what the use case is.In some cases, customers have broken apart shared clusters for that specific reason. I don't think that's necessarily the best approach from an engineering perspective, but again, this is not purely an engineering decision.
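To make the proration model behind ECS Chargeback concrete: a back-of-the-envelope sketch with invented numbers and service names; the real tool reads reservations from task definitions and ships the results to Datadog rather than printing them:

```python
# Prorate a shared ECS cluster's cost by each service's share of
# reserved CPU and memory, and flag "waste" where the reservation
# far exceeds observed utilization.
CLUSTER_MONTHLY_COST = 10_000.00  # invented figure

# (reserved_cpu_units, reserved_mem_mib, peak_mem_mib) per service
services = {
    "ingest-api":    (4096, 8192, 7500),
    "ml-classifier": (8192, 16384, 6000),  # heavily over-reserved
    "review-queue":  (2048, 4096, 3900),
}

# Weight CPU and memory shares equally; a real chargeback model might
# weight them by the underlying instance family's pricing instead.
total_cpu = sum(s[0] for s in services.values())
total_mem = sum(s[1] for s in services.values())

for name, (cpu, mem, peak_mem) in services.items():
    share = (cpu / total_cpu + mem / total_mem) / 2
    cost = share * CLUSTER_MONTHLY_COST
    waste = 1 - (peak_mem / mem)
    print(f"{name:<15} ${cost:>9,.2f}/mo  memory waste: {waste:.0%}")
```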
It comes down to serving the business need. And if you're taking partial credits on that cluster, for an R&D tax credit, for example, you want that position to be extraordinarily defensible, and spending a few extra dollars to ensure that it is, is the right business decision. I mean, again, we're pure advisory; we advise customers on what we would do in their position, but people often mistake that to be we're going to go for the lowest possible price—bad idea, or that we're going to wind up doing this from a purely engineering-centric point of view.It's, be aware that in almost every case, with some very notable weird exceptions, the AWS bill costs significantly less than the payroll expense that you have of people working on the AWS environment in various ways. People are more expensive, so the idea of, well, you can save a whole bunch of engineering effort by spending a bit more on your cloud, yeah, let's go ahead and do that.Casey: Yeah, good point.Corey: The real mark of someone who's senior enough is their answer to almost any question is, “It depends.” And I feel I've fallen into that trap as well. Much as I'd love to sit here and say, “Oh, it's really simple. You do X, Y, and Z.” Yeah… honestly, my answer, the simple answer, is I think that we orchestrate a cyber-bullying campaign against AWS through the AWS wishlist hashtag, we get people to harass their account managers with repeated requests for, “Hey, could you go ahead and [dip 00:36:19] that thing in—they give that a plus-one for me, whatever internal system you're using?”Just because this is a problem we're seeing more and more. Given that it's an unbounded growth problem, we're going to see it more and more for the foreseeable future. So, I wish I had a better answer for you, but yeah, ‘that stuff's super hard' is honest, but it's also not the most useful answer for most of us.Casey: I'd love feedback from anyone from you or your team on that tool that we created. I can share a link after the fact. ECS Chargeback is what we call it.Corey: Excellent. I will follow up with you separately on that. That is always worth diving into. I'm curious to see new and exciting approaches to this. Just be aware that we have an obnoxious talent sometimes for seeing these things and, “Well, what about”—and asking about some weird corner edge case that either invalidates the entire thing, or you're like, “Who on earth would ever have a problem like that?” And the answer is always, “The next customer.”Casey: Yeah.Corey: For a bounded problem space of the AWS bill. Every time I think I've seen it all, I just have to talk to one more customer.Casey: Mmm. Cool.Corey: In fact, the way that we approached your teardown in the restaurant is how we launched our first-pass approach. Because the value in something like that is different than the value of a six to eight-week-long, deep-dive engagement into every nook and cranny. And—Casey: Yeah, for sure.
It was valuable to us.Corey: Yeah, having someone come in to just spend a day with your team, diving into it up one side and down the other, it seems like a weird thing, like, “How much good could you possibly do in a day?” And the answer in some cases is—we had Honeycomb, where in a couple of days of something like this, we wound up blowing 10% off their entire operating budget for the company; it led to an increased valuation. Liz Fong-Jones has said—on multiple occasions—that the company would not be what it was without our efforts on their bill, which is just incredibly gratifying to hear. It's easy to get lost in the idea of well, it's the AWS bill. It's just making big companies spend a little bit less to another big company. And that's not exactly, you know, saving the lives of K through 12 students here.Casey: It's opening up opportunities.Corey: Yeah. It's about optimizing for the win for everyone. Because now AWS gets a lot more money from Honeycomb than they would if Honeycomb had not continued on their trajectory. It's, you can charge customers a lot right now, or you can charge them a little bit over time and grow with them in a partnership context. I've always opted for the second model rather than the first.Casey: Right on.Corey: But here we are. I want to thank you for taking so much time out of well, several days now to argue with me on Twitter, which is always appreciated, particularly when it's, you know, constructive—thanks for that—Casey: Yeah.Corey: For helping me get my business partner to re:Invent, although then he got me that horrible puzzle of 1000 pieces of the Cloud Native Computing Foundation landscape and now I don't ever want to see him again—so you know, that happens—and of course, spending the time to write Quinntainers, which is going to be at snark.cloud/quinntainers as soon as we're done with this recording. Then I'm going to kick the tires and send some pull requests.Casey: Right on. Yeah, thanks for having me. I appreciate you starting the conversation. I would just conclude with I think that yes, there are a lot of ways to run containers in AWS; don't let it stress you out. They're there with intention, they're there by design. Understand them.I would also encourage people to go a little deeper, especially if you've got a significantly large workload. You got to get your hands dirty. As a matter of fact, there's a hands-on lab that a company called Liatrio does. They call it their Night Lab; it's a free, one-day, hands-on lab where you run legacy monolithic Java applications on Kubernetes; it gives you first-hand experience on how to—it gets all the way up into observability and doing things like canary deployments. It's a great, great lab.But you got to do something like that to really get your hands dirty and understand how these things work. So, don't sweat it; there's not one right way. There's a way that will probably work best for each user, and just take the time and understand the ways to make sure you're applying the one that's going to give you the most runway for your workload.
It's because there are customer needs that are going on that, and this is another way of meeting those needs.I think there could be better guidance, but I also understand that there are a lot of nuanced perspectives here and that… hell is someone else's workflow—Casey: [laugh].Corey: —and there's always value in broadening your perspective a bit on those things. If people want to learn more about you and how you see the world, where's the best place to find you?Casey: Probably on Twitter: twitter.com/nektos, N-E-K-T-O-S.Corey: That might be the first time Twitter has been described as a best place for anything. But—Casey: [laugh].Corey: Thank you once again, for your time. It is always appreciated.Casey: Thanks, Corey.Corey: Casey Lee, CTO at Gaggle and AWS Container Hero. And apparently writing code in anger to invalidate my points, which is always appreciated. Please do more of that, folks. I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, or the YouTube comments, which is always a great place to go reading, whereas if you've hated this podcast, please leave a five-star review in the usual places and an angry comment telling me that I'm completely wrong, and then launching your own open-source tool to point out exactly what I've gotten wrong this time.Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.Announcer: This has been a HumblePod production. Stay humble.
About TimTimothy William Bray is a Canadian software developer, environmentalist, political activist, and one of the co-authors of the original XML specification. He worked for Amazon Web Services from December 2014 until May 2020, when he quit due to concerns over the termination of whistleblowers. Previously he has been employed by Google, Sun Microsystems, and Digital Equipment Corporation (DEC). Bray has also founded or co-founded several start-ups such as Antarctica Systems.Links Referenced: Textuality Services: https://www.textuality.com/ blog post: https://www.tbray.org/ongoing/When/202x/2022/01/30/Cloud-Lock-In @timbray: https://twitter.com/timbray tbray.org: https://tbray.org duckbillgroup.com: https://duckbillgroup.com TranscriptAnnouncer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.Corey: This episode is sponsored in part by our friends at Vultr. Spelled V-U-L-T-R because they're all about helping save money, including on things like, you know, vowels. So, what they do is they are a cloud provider that provides surprisingly high performance cloud compute at a price that—while sure they claim it's better than AWS pricing—and when they say that they mean it is less money. Sure, I don't dispute that, but what I find interesting is that it's predictable. They tell you in advance on a monthly basis what it's going to cost. They have a bunch of advanced networking features. They have nineteen global locations and scale things elastically. Not to be confused with openly, because apparently elastic and open can mean the same thing sometimes. They have had over a million users. Deployments take less than sixty seconds across twelve pre-selected operating systems. Or, if you're one of those nutters like me, you can bring your own ISO and install basically any operating system you want. Starting with pricing as low as $2.50 a month for Vultr cloud compute, they have plans for developers and businesses of all sizes, except maybe Amazon, who stubbornly insists on having something to scale all on their own. Try Vultr today for free by visiting: vultr.com/screaming, and you'll receive $100 in credit. That's V-U-L-T-R.com slash screaming.Corey: Couchbase Capella Database-as-a-Service is flexible, full-featured and fully managed with built in access via key-value, SQL, and full-text search. Flexible JSON documents aligned to your applications and workloads. Build faster with blazing fast in-memory performance and automated replication and scaling while reducing cost. Capella has the best price performance of any fully managed document database. Visit couchbase.com/screaminginthecloud to try Capella today for free and be up and running in three minutes with no credit card required. Couchbase Capella: make your data sing.Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. My guest today was on a year or two ago, but today, we're going in a bit of a different direction. Tim Bray is a principal at Textuality Services.Once upon a time, he was a Distinguished Engineer slash VP at AWS, but let's be clear, he isn't solely focused on one company; he also used to work at Google.
Also, there is scuttlebutt that he might have had something to do, at one point, with the creation of God's true language, XML. Tim, thank you for coming back on the show and suffering my slings and arrows.Tim: Oh, you're just fine. Glad to be here.Corey: [laugh]. So, the impetus for having this conversation is, you had a blog post somewhat recently—by which I mean, January of 2022—where you talked about lock-in and multi-cloud, two subjects near and dear to my heart, mostly because I have what I thought was a fairly countercultural opinion. You seem to have a very closely aligned perspective on this. But let's not get too far ahead of ourselves. Where did this blog post come from?Tim: Well, I advise a couple of companies, and one of them happens to be using GCP and the other happens to be using AWS, and I get involved in a lot of industry conversations, and I noticed that multi-cloud is a buzzword. If you go and type multi-cloud into Google, you get, like, a page of people saying, “We will solve your multi-cloud problems. Come to us and you will be multi-cloud.” And I was not sure what to think, so I started writing to find out what I would think. And I think it's not complicated anymore. I think the multi-cloud is a reality in most companies. I think that many mainstream, non-startup companies are really worried about cloud lock-in, and that's not entirely unreasonable. So, it's a reasonable thing to think about and it's a reasonable thing to try and find the right balance between avoiding lock-in and not slowing yourself down. And the issues were interesting. What was surprising is that I published that blog piece saying what I thought were some kind of controversial things, and I got no pushback. Which was, you know, why I started talking to you and saying, “Corey, you know, does nobody disagree with this? Do you disagree with this? Maybe we should have a talk and see if this is just the new conventional wisdom.”Corey: There's nothing worse than almost trying to pick a fight, but no one actually winds up taking you up on the opportunity. That always feels a little off. Let's break it down into two issues because I would argue that they are intertwined, but not necessarily the same thing. Let's start with multi-cloud because it turns out that there's just enough nuance to—at least where I sit on this position—that whenever I tweet about it, I wind up getting wildly misinterpreted. Do you find that as well?Tim: Not so much. It's not a subject I have really had too much to say about, but it does mean lots of different things. And so it's not totally surprising that that happens. I mean, some people think when you say multi-cloud, you mean, “Well, I'm going to take my strategic application, and I'm going to run it in parallel on AWS and GCP because that way, I'll be more resilient and other good things will happen.” And then there's another thing, which is that, “Well, you know, as my company grows, I'm naturally going to be using lots of different technologies and that might include more than one cloud.” So, there's a whole spectrum of things that multi-cloud could mean. So, I guess when we talk about it, we probably owe it to our audiences to be clear what we're talking about.
If it's a third-party dashboard, for example, “Oh, yeah, you want to be able to look at all of your cloud usage on a single pane of glass.” If it's a certain—well, I guess, a certain, not a given—cloud provider, well, they understand that if you go all-in on a cloud provider, it's probably not going to be them, so they're, of course, going to talk about multi-cloud. And if it's AWS, where they are the 8000-pound gorilla in the space, “Oh, yeah, multi-cloud's terrible. Put everything on AWS. The end.” It seems that most people who talk about this have a very self-serving motivation that they can't entirely escape. That bias does reflect itself.

Tim: That's true. When I joined AWS, which was around 2014, the PR line was a very hard line. “Well, multi-cloud—that's not something you should invest in.” And I've noticed that the conversation online has become much softer. And I think one reason for that is that going all-in on a single cloud is at least possible when you're a startup, but if you're a big company, you know, an insurance company, a tire manufacturer, that kind of thing, you're going to be multi-cloud, for the same reason that they already have COBOL on the mainframe and Java on the old Sun boxes, and Mongo running somewhere else, and five different programming languages. And that's just the way big companies are; it's a consequence of M&A, it's a consequence of research projects that succeeded, of one kind or another. I mean, lots of big companies have been trying to get rid of COBOL for decades, literally, [laugh] and not succeeding in doing that. So—

Corey: It's ‘legacy,' which is, of course, the condescending engineering term for, “It makes money.”

Tim: And works. And so I don't think it's realistic to, as a matter of principle, not be multi-cloud.

Corey: Let's define our terms a little more closely because very often, people like to pull strange gotchas out of the air. Because when I talk about this, I'm talking about—like, when I speak about it off the cuff, I'm thinking in terms of where do I run my containers? Where do I run my virtual machines? Where does my database live? But you can also move in a bunch of different directions. Where do my Git repositories live? What Office suite am I using? What am I using for my CRM? Et cetera, et cetera. Where do you draw the boundary lines, because it's very easy to talk past each other if we're not careful here?

Tim: Right. And, you know, let's grant that if you're a mainstream enterprise, you're running your Office automation on Microsoft, and they're twisting your arm to use the cloud version, so you probably are. And if you have any sense at all, you're not running your own Exchange Server, so let's assume that you're using Microsoft Azure for that. And you're running Salesforce, and that means you're on Salesforce's cloud. And a lot of other Software-as-a-Service offerings might be on AWS or Azure or GCP; they don't even tell you. So, I think probably the crucial issue that we should focus our conversation on is my own apps, my own software that is my core competence that I actually use to run the core of my business. And typically, that's the only place where a company would and should invest serious engineering resources to build software. And that's where the question comes: where should that software that I'm going to build run?
And should it run on just one cloud, or—

Corey: I found that when I gave a conference talk on this, in the before times, I had to have an ever-lengthier section about, “I'm speaking in the general sense; there are specific cases where it does make sense for you to go in a multi-cloud direction.” And when I'm talking about multi-cloud, I'm not necessarily talking about Workload A lives on Azure and Workload B lives on AWS, through mergers, or weird corporate approaches, or shadow IT that—surprise—is now revenue-bearing. Well, I guess we have to live with it. There are a lot of different divisions doing different things and you're going to see that a fair bit. And I'm not convinced that's a terrible idea as such. I'm talking about the single workload that we're going to spread across two or more clouds, intentionally.

Tim: That's probably not a good idea. I just can't see that being a good idea, simply because you get into a problem of just terminology and semantics. You know, the different providers mean different things by the word ‘region' and the word ‘instance,' and things like that. And then there's the people problem. I mean, I don't think I personally know anybody who would claim to be able to build and deploy an application on AWS and also on GCP. I'm sure some people exist, but I don't know any of them.

Corey: Well, Forrest Brazeal was deep in the AWS weeds and now he's the head of content at Google Cloud. I will credit him that he probably has learned to smack an API around over there.

Tim: But you know, you're going to have a hard time hiring a person like that.

Corey: Yeah. You can count these people almost as individuals.

Tim: And that's a big problem. And you know, in a lot of cases, it's clearly the case that our profession is talent-starved—I mean, the whole world is talent-starved at the moment, but our profession in particular—and a lot of the decisions about what you can build and what you can do are highly contingent on who you can hire. And if you can't hire a multi-cloud expert, well, you should not deploy, [laugh] you know, a multi-cloud application. Now, having said that, I just want to dot this i here and say that it can be made to kind of work. I've got this one company I advise—I wrote about it in the blog piece—that used to be on AWS and switched over to GCP. I don't even know why; this happened before I joined them. And they have a lot of applications, and then they have some integrations with third-party partners which they implemented with AWS Lambda functions. So, when they moved over to GCP, they didn't stop doing that. So, this mission-critical, latency-sensitive application of theirs runs on GCP and calls out to AWS to make calls into their partners' APIs and so on. And it works fine. Solid as a rock, reliable, low latency. And so I talked to a person in the know over on the AWS side, and they said, “Oh, yeah sure, you know, we talked to those guys. Lots of people do that. We make sure, you know, the connections are low latency and solid.” So, technically speaking, it can be done. But for a variety of business reasons—maybe the most important one being expertise and who you can hire—it's probably just not a good idea.

Corey: One of the areas where I think there's an exception case is if you are a SaaS provider. Let's pick a big easy example: Snowflake, where they are a data warehouse. They've got to run their data warehousing application in all of the major clouds because that is where their customers are.
And it turns out that if you're going to send a few petabytes into a data warehouse, you really don't want to be paying cloud egress rates to do it because, it turns out, you can just bootstrap a second company for that much money.

Tim: Well, Zoom would be another example, obviously.

Corey: Oh, yeah. Anything that's heavy on data transfer is going to be a strange one. And there's being close to customers; gaming companies are another good example of this, where a lot of the game servers themselves will be spread across a bunch of different providers, just purely based on latency metrics around what is close to certain customer clusters.

Tim: I can't disagree with that. You know, I wonder how large a segment that is, of people who are—I think you're talking about core technology companies. Now, of the potential customers of the cloud providers, how many of them are core technology companies, like the kind we're talking about, who have such a need, and how many are people who just want to run their manufacturing and product design and stuff? And for those, buying into a particular cloud is probably a perfectly sensible choice.

Corey: I've also seen regulatory stories about this. I haven't been able to track them down specifically, but there is a pervasive belief that one interpretation of UK banking regulations stipulates that you have to be able to get back up and running within 30 days on a different cloud provider entirely. And also, they have the regulatory requirement that I believe the data remain in-country. So, that's a little odd. And honestly, when it comes to best practices and how you should architect things, I'm going to take a distinct backseat to legal requirements imposed upon you by your regulator. But let's be clear here, I'm not advising people to go and tell their auditors that they're wrong on these things.

Tim: I had not heard that story, but you know, it sounds plausible. So, I wonder if that is actually in effect, which is to say, could a huge British banking company, in fact, do that? Could they, in fact, decamp from Azure and move over to GCP or AWS in 30 days? Boy.

Corey: That is what one bank I spoke to over there was insistent on. A second bank I spoke to in that same jurisdiction had never heard of such a thing, so I feel like a lot of this is subject to auditor interpretation. Again, I am not an expert in this space. I do not pretend to be—I know I'm that rarest of all breeds: a white guy with a microphone in tech who admits he doesn't know something. But here we are.

Tim: Yeah, I mean, I imagine it could be plausible if you didn't use any higher-level services, and you just, you know, rented instances and were careful about which version of Linux you ran and were just running a bunch of Java code, which actually, you know, describes the workload of a lot of financial institutions. So, it should be a matter of getting… all the right instances configured and the JVM configured and launched. I mean, there are no… architecturally terrifying barriers to doing that. Of course, to do that, it would mean you would have to avoid using any of the higher-level services that are particular to any cloud provider and basically just treat them as people you rent boxes from, which is probably not a good choice for other business reasons.

Corey: Which can also include things as seemingly low-level as load balancers, just based upon different provisioning modes, failure modes, and the rest.
You're probably going to have a more consistent experience running HAProxy or nginx yourself to do it. But Tim, I have it on good authority that this is the old way of thinking, and that Kubernetes solves all of it. And through the power of containers and powers combining and whatnot, that frees us from being beholden to any given provider and our workloads are now all free as birds.

Tim: Well, I will go as far as saying that if you are in the position of trying to be portable, probably using containers is a smart thing to do because that's a more tractable level of abstraction that does give you some insulation from, you know, which version of Linux you're running and things like that. The proposition that configuring and running Kubernetes is easier than configuring and running [laugh] JVM on Linux [laugh] is unsupported by any evidence I've seen. So, as for operating at the Kubernetes level, at the [unintelligible 00:14:42] level: you know, there's good reasons why some people want to do that, but I'm dubious of the proposition that it really makes you more portable in an essential way.

Corey: Well, you're also not the target market for Kubernetes. You have worked at multiple cloud providers, and I feel like the real advantage of Kubernetes is for people who want to pretend that they work at one, so they can act out a sort of cosplay of being their own cloud provider by running all the intricacies of Kubernetes. I'm halfway kidding, but there is an uncomfortable element of truth to that in some of the conversations I've had with some of its more, shall we say, fanatical adherents.

Tim: Well, I think you and I are neither of us huge fans of Kubernetes, but my reasons are maybe a little different. Kubernetes does some really useful things. It really, really does. It allows you to take n VMs and pack m different applications onto them in a way that takes reasonably good advantage of the processing power they have. And it allows you to have different things running in one place with different IP addresses. It sounds straightforward, but that turns out to be really helpful in a lot of ways. So, I'm actually kind of sympathetic with what Kubernetes is trying to be. My big gripe with it is that I think that good technology should make easy things easy and difficult things possible, and I think Kubernetes fails the first test there. I think the complexity that it involves is out of balance with the benefits you get. There's a lot of really, really smart people who disagree with me, so this is not a hill I'm going to die on.

Corey: This is very much one of those areas where reasonable people can disagree. I find the complexity to be overwhelming; it has to collapse. At this point, finding someone who can competently run Kubernetes in production is a bit hard to do, and they tend to be extremely expensive. You aren't going to find a team of those people at every company that wants to do things like this, and they're certainly not going to be able to find it in their budget in many cases. So, it's a challenging thing to do.

Tim: Well, that's true. And another thing is that once you step onto the Kubernetes slope, you start looking at Istio and Envoy and [fabric 00:16:48] technology. And we're talking about extreme complexity squared at that point.
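To put a concrete shape on the packing problem Tim just described, here is a toy sketch, ours and not Tim's: first-fit-decreasing bin packing of m applications onto n VMs by CPU request. It is deliberately naive; the real Kubernetes scheduler also weighs memory, affinity, taints, and much more.

    # Toy illustration (not from the episode) of the bin-packing problem
    # Kubernetes solves: place m application replicas onto n VMs by CPU
    # request, using a naive first-fit-decreasing heuristic.
    def pack(apps: dict[str, float], vm_cpu: float, n_vms: int) -> list[dict[str, float]]:
        vms: list[dict[str, float]] = [{} for _ in range(n_vms)]
        # Place the largest requests first; greedily drop each app into
        # the first VM with enough spare CPU.
        for name, cpu in sorted(apps.items(), key=lambda kv: -kv[1]):
            for vm in vms:
                if sum(vm.values()) + cpu <= vm_cpu:
                    vm[name] = cpu
                    break
            else:
                raise RuntimeError(f"no VM has room for {name}")
        return vms

    # Two 4-vCPU VMs, four apps:
    # [{'worker': 2.0, 'api': 1.5, 'cron': 0.25}, {'cache': 1.0}]
    print(pack({"api": 1.5, "worker": 2.0, "cache": 1.0, "cron": 0.25}, vm_cpu=4.0, n_vms=2))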
But you know, here's the thing: back in 2018 I think it was, in his keynote, Werner said that the big goal is that all the code you ever write should be application logic that delivers business value, which you know rep—

Corey: Didn't CGI say the same thing? Didn't—like, isn't there, like, a long history, dating back longer than I believe either of us have been alive, of, “With this, all you're going to write is business logic”? That was the Java promise. That was the Google App Engine promise. Again and again, we've had that carrot dangled in front of us, and it feels like the reality with Lambda is, the only code you will write is not necessarily business logic, it's getting the thing to speak to the other service you're trying to get it to talk to because a lot of these integrations are super finicky. At least back when I started learning how this stuff worked, they were.

Tim: People understand where the pain points are and are indeed working on them. But I think we can agree that if you believe in that as a goal—which I still do; I mean, we may not have got there, but it's still a worthwhile goal to work on—we can agree that wrangling Istio configurations is not such a thing; it's not [laugh] directly value-adding business logic. To the extent that you can do that, I think serverless provides a plausible way forward. Now, you can be all cynical about, “Well, I still have trouble making my Lambda talk to my other thing.” But you know, I've done that, and I've also done the JVM-on-bare-metal kind of thing. You know what? I'd rather do things at the Lambda level. I really rather would. Because capacity forecasting is a horribly difficult thing, we're all terrible at it, and the penalties for being wrong are really bad. If you under-specify your capacity, your customers have a lousy experience, and if you over-specify it, and you have an architecture that makes you configure for peak load, you're going to spend bucket-loads of money that you don't need to.

Corey: “But you're then putting your availability in the cloud providers' hands.” “Yeah, you already were. Now, we're just being explicit about acknowledging that.”

Tim: Yeah. Yeah, absolutely. And that's highly relevant to the current discussion because if you use the higher-level serverless offerings—if you decide, okay, I'm going to go with Lambda and Dynamo and EventBridge and that kind of thing—well, that's not portable at all. I mean, the APIs are totally idiosyncratic for AWS, and GCP's equivalent, and Azure's—what do they call it? Permanent functions or something-or-other functions. So yeah, that's part of the trade-off you have to think about. If you're going to do that, you're definitely not going to be multi-cloud in that application.

Corey: And in many cases, one of the stated goals for going multi-cloud is that you can avoid the downtime of a single provider. People love to point at the big AWS outages or, “See? They were down for half a day.” And there is a societal question of what happens when everyone is down for half a day at the same time, but in most cases, what I'm seeing is, instead of getting rid of a single point of failure, you're introducing a second one. If either one of them is down, your application's down, so you've doubled your outage surface area. On the rare occasions where you're able to map your dependencies appropriately, great. Are your third-party critical providers all doing the same? If you're an e-commerce site and Stripe processes your payments, well, they're public about being all-in on AWS.
So, if you can't process payments, does it really matter that your website stays up? It becomes an interesting question. And those are the ones that you know about, let alone the third-, fourth-order dependencies that are almost impossible to map unless everyone is as diligent as you are. It's a heavy, heavy lift.

Tim: I'm going to push back a little bit. Now, for example, this company I'm advising that's running on GCP and calling out to Lambda is in that position if either GCP or Lambda goes off the air. On the other hand, if you've got somebody like Zoom, they're probably running parallel full stacks on the different cloud providers. And if you're doing that, then you can at least plausibly claim that you're in a good place because if Dynamo has an outage—and everything relies on Dynamo—then you shift your load over to GCP or Oracle [laugh] and you're still on the air.

Corey: Yeah, but what else has to be up as well? Because Zoom loves to sign me out on my desktop whenever I log into it on my laptop, and vice versa, and I wonder if that authentication and login system is also replicated full-stack to everywhere it goes, and what the fencing on that looks like, and how the communication between all those things works. I wouldn't doubt that it's possible that they've solved for this, but I also wonder how thoroughly they've really tested all of that, too. Not because I question them; just because this stuff is super intricate as you start tracing it down into the nitty-gritty levels of the madness that consumes all these abstractions.

Tim: Well, right, that's a conventional wisdom that is really wise and true, which is that if you have software that is alleged to do something like allow you to get going on another cloud, unless you've tested it within the last three weeks, it's not going to work when you need it.

Corey: Oh, it's like a DR exercise: the next commit you make breaks it. Once you have the thing working again, it sits around as a binder, and it's a best guess. And let's be serious, a lot of these DR exercises presume that you're able to, for example, change DNS records on the fly, or be able to get a virtual machine provisioned in less than 45 minutes—because when there's an actual outage, surprise, everyone's trying to do the same things—there's a lot of stuff in there that gets really wonky at weird levels.

Tim: A related, similar exercise: people who want to be on AWS but want to be multi-region. It's actually, you know, a fairly similar kind of problem. If I need to be able to fail out of us-east-1—well, God help you, because if you need to, everybody else needs to as well—but you know, would that work?

Corey: Before you go multi-cloud, go multi-region first. Tell me how easy it is because then you have full feature parity—presumably—between everything; it should just be a walk in the park. Send me a postcard once you get that set up and I'll eat a bunch of words. And it turns out, basically, no one does.

Tim: Mm-hm.

Corey: Another area of lock-in around a lot of this stuff, and one that I think makes it very hard to go multi-cloud, is the security model and how it interfaces with various aspects. In many cases, I'm seeing people doing full-on network overlays. They don't have to worry about the different security group models and VPCs and all the rest. They can just treat everything as a node sitting on the internet, and the only thing it talks to is an overlay network.
Which is terrible, but that seems to be one of the only ways people are able to build things that span multiple providers with any degree of success.

Tim: Well, that is painful because, much as we all like to scoff and so on, given the degree of complexity you get into there, it is the case that your typical public cloud provider can do security better than you can. They just can. It's a fact of life. And if you're using a public cloud provider and not taking advantage of their security offerings and infrastructure, that's probably dumb. But if you really want to be multi-cloud, you kind of have to, as you said. In particular, this gets back to the problem of expertise, because it's hard enough to hire somebody who really understands IAM deeply and how to get that working properly; now try to find somebody who can understand that level of thing on two different cloud providers at once. Oh, gosh.

Corey: This episode is sponsored in part by LaunchDarkly. Take a look at what it takes to get your code into production. I'm going to just guess that it's awful because it's always awful. No one loves their deployment process. What if launching new features didn't require you to do a full-on code and possibly infrastructure deploy? What if you could test on a small subset of users and then roll it back immediately if results aren't what you expect? LaunchDarkly does exactly this. To learn more, visit launchdarkly.com and tell them Corey sent you, and watch for the wince.

Corey: Another point you made in your blog post was the idea of lock-in, of people being worried that going all-in on a provider was setting them up to be, I think ‘Oracle' is the term that was tossed around, where once you're dependent on a provider, what's to stop them from cranking the pricing knobs until you squeal?

Tim: Nothing. And I think that is a perfectly sane thing to worry about. Now, in the short term, based on my personal experience working with, you know, AWS leadership, I think that it's probably not a big short-term risk. AWS is clearly aware that most of the growth is still in front of them. You know, the amount of all of it that's on the cloud is still pretty small, and so the thing to worry about right now is growth. And they are really, really genuinely, sincerely focused on customer success and will bend over backwards to deal with the customers' problems as they are. And I've seen places where people have negotiated a huge multi-year enterprise agreement based on Reserved Instances or something like that, and then realize, oh, wait, we need to switch our whole technology stack, but you've got us by the RIs—and AWS will say, “No, no, it's okay. We'll tear that up and rewrite it and get you where you need to go.” So, in the short term, between now and 2025, would I worry about my cloud provider doing that? Probably not so much. But let's go a little further out. Let's say it's, you know, 2030 or something like that, and at that point, you know, Andy Jassy has decided to be a full-time sports mogul, and Satya Nadella has gone off to be a recreational sailboat owner or something like that, and private equity operators come in and take very significant stakes in the public cloud providers, and get a lot of their guys on the board, and you have a very different dynamic. And you have something that starts to feel like Oracle, where their priority isn't, you know, optimizing for growth and customer success; their priority is optimizing for a quarterly bottom line, and—

Corey: Revenue extraction becomes the goal.

Tim: That's absolutely right.
And this is not a hypothetical scenario; it's happened. Most large companies do not control the amount of money they spend per year to have desktop software that works. They pay whatever Microsoft's going to say they pay because they don't have a choice. And a lot of companies are in the same situation with their database. They don't get to set their database budget. Oracle comes in and says, “Here's what you're going to pay,” and that's what you pay. You really don't want to be in that situation with your cloud, and that's why I think it's perfectly reasonable for somebody who is doing cloud transition at a major financial or manufacturing or service provider company to have an eye to this. You know, let's not completely ignore the lock-in issue.

Corey: There is a significant scale to enterprise deals and contracts. There is almost always a contractual provision that says if you're going to raise a price with any cloud provider, there's a fixed period of time of notice you must give before it happens. I feel like the first mover there winds up getting soaked because everyone is going to panic and migrate in other directions. I mean, Google tried it with Google Maps for their API, not quite Google Cloud, but it also scared the bejesus out of a whole bunch of people who were, “Wait. Is this a harbinger of things to come?”

Tim: Well, not in the short term, I don't think. And I think, you know, Google Maps [is absurdly 00:26:36] underpriced. That's a hellishly expensive service. And it's supposed to pay for itself by, you know, advertising on maps. I don't know about that. I would see that as the exception rather than the rule. I think that it's reasonable to expect cloud prices, nominally at least, to go on decreasing for at least the short term, maybe even the medium term. But that's—can't go on forever.

Corey: It also feels to me, having looked at an awful lot of AWS environments, that if there were to be some sort of regulatory action or some really weird outage for a year that meant that AWS could not onboard a single new customer, their revenue year-over-year would continue to increase purely by organic growth because there is no forcing function that turns the thing off when you're done using it. In fact, they can migrate things around to hardware that works, they can continue billing you for the things sitting there idle. And there is no governance path on that. So, on some level, winding up doing a price increase is going to cause a massive company focus on fixing a lot of that. It feels on some level like it is drawing attention to a thing that they don't really want to draw attention to, from a purely revenue extraction story. When CentOS walked back their ten-year support timeline by two years, suddenly—with the idea that it would drive [unintelligible 00:27:56] adoption—well, suddenly, a lot of people looked at their environment, saw they had old [unintelligible 00:28:00] they weren't using. It was massively short-sighted, and it massively irritated a whole bunch of people who needed that in the short term, but by the renewal, they were going to be on to Ubuntu or something else. It feels like it's going to backfire massively, and I'd like to imagine the strategist of whoever takes the reins of these companies is going to be smarter than that. But here we are.

Tim: Here we are. And you know, it's interesting you should mention regulatory action. At the moment, there are only three credible public cloud providers.
It's not obvious that Google's really in it for the long haul, as last time I checked, they were claiming to maybe be breaking even on it. That's not a good number, you know? You'd like there to be more than that. And if it goes on like that, eventually, some politician is going to say, “Oh, maybe they should be regulated like public utilities,” because they kind of are, right? And I would think that anybody who did get into Oracle-izing would, you know, accelerate that happening. Having said that, we do live in the atmosphere of 21st-century capitalism, and growth is the God that must be worshiped at all costs. Who knows. It's a cloudy future. Hard to see.

Corey: It really is. I also want to be clear, on some level, that with Google's current position, if they weren't taking a small loss at least on these things, I would worry. Like, wait, you're trying to catch AWS and you don't have anything better to invest that money into than, “Well, time to start taking profits from it”? So, I can see both sides of that one.

Tim: Right. And as I keep saying—I've already said it once during this slot—you know, the total cloud spend in the world is probably on the order of one or two hundred billion per annum, and global IT is in multiple trillions. So, [laugh] there's a lot more space for growth. Years and years worth of it.

Corey: Yeah. The challenge, too, is that people are worried about this long-term strategic point of view. So, one thing you talked about in your blog post is the idea of using hosted open-source solutions. Like, instead of using Kinesis, you'd wind up using Kafka, or instead of using DynamoDB, you use their managed Cassandra service—or, as I think of it, Amazon Basics Cassandra—and effectively going down the path of letting them manage this thing, but you then have a theoretical exodus path. Where do you land on that?

Tim: I think that speaks to a lot of people's concerns, and I've had conversations with really smart people about that who like that idea. Now, to be realistic, it doesn't make migration easy because you've still got all the CI and CD and monitoring and management and scaling and alarms and alerts and paging and et cetera, et cetera, et cetera, wrapped around it. So, it's not as though you could just pick up your managed Kafka off AWS and drop a huge installation onto GCP easily. But at least, you know, your data plane APIs are the same, so a lot of your code would probably still run okay. So, it's a plausible path forward. And when people say, “I want to do that,” well, it does mean that you can't go all serverless. But it's not a totally insane path forward.

Corey: So, one last point in your blog post that I think a lot of people think about only after they get bitten by it is the idea of data gravity. I alluded earlier in our conversation to data egress charges, but my experience has been that where your data lives is effectively where the rest of your cloud usage tends to aggregate. How do you see it?

Tim: Well, it's a real issue, but I think it might perhaps be a little overblown. People throw the term petabytes around, and people don't realize how big a petabyte is. A petabyte is just an insanely huge amount of data, and the notion of transmitting one over the internet is terrifying. And there are lots of enterprises that have multiple petabytes around, and so they think, “Well, you know, it would take me 26 years to transmit that, so I can't.” And they might be wrong. The internet's getting faster all the time. Did you notice?
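That "26 years" figure is easy to sanity-check with back-of-the-envelope arithmetic. The numbers below are ours, not Tim's, and they assume a perfectly sustained line rate, which real networks rarely deliver:

    # How long does one petabyte take to move at a sustained line rate?
    # Idealized: ignores protocol overhead, retries, and throttling.
    PETABYTE_BITS = 8 * 10**15  # one petabyte, expressed in bits

    for label, bits_per_second in [("10 Mbps", 10**7), ("1 Gbps", 10**9), ("10 Gbps", 10**10)]:
        seconds = PETABYTE_BITS / bits_per_second
        print(f"{label}: {seconds / 86400:,.1f} days ({seconds / (86400 * 365):.1f} years)")

    # 10 Mbps: ~9,259 days (~25.4 years) -- the "it would take me 26 years" era
    # 10 Gbps: ~9.3 days -- "the internet's getting faster all the time"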
I've been able to move some—for purely personal projects—insane amounts of data, and it gets there a lot faster than you'd think. Secondly, in the case of AWS Snowmobile, we have an existence proof that you can do exabyte-ish scale data transfers in the time it takes to drive a truck across the country.

Corey: Inbound only. Snowmobiles are not—at least according to public examples—valid for exodus.

Tim: But you know, this is the kind of place where regulatory action might come into play if what the people were doing was seen to be abusive. I mean, there's an existence proof you can do this thing. But here's another point. So, suppose you have, like, 15 petabytes—that's an insane amount of data—sitting behind your corporate applications. So, are you actually using that to run the applications, or is a huge proportion of that stuff just logs and data gathered of various kinds that's being used in analytics applications and AI models and so on? Do you actually need all that data to actually run your app? And could you, in fact, just pick up the stuff you need for your app, move it to a different cloud provider from there, and leave your analytics on the first one? Not a totally insane idea.

Corey: It's not a terrible idea at all. It comes down to the idea as well of, when you're trying to run a query against a bunch of that data, do you need all the data to transit, or just the results of that query? It's a question of, can you move the compute closer to the data as opposed to the data to where the compute lives?

Tim: Well, you know, and a lot of those people who have those huge data pools have it sitting on S3, and a lot of it migrated off into Glacier, so it's not as if you could get at it in milliseconds anyhow. I just ask myself, “How much data can anybody actually use in a day? In the course of satisfying some transaction requests from a customer?” And I think it's not petabytes. It just isn't. Now, there are—okay, there are exceptions. There's the intelligence community, there's the oil drilling community, there are some communities who genuinely will use insanely huge seas of data on a routine basis, but you know, I think that's kind of a corner case, so before you shake your head and say, “Ah, they'll never move because of the data gravity,” you know… you need to prove that to me, and I might be a little bit skeptical.

Corey: And I think that is probably a very fair request. Just tell me what it is you're going to be doing here to validate the idea that is in your head, because the most interesting lies I've found customers tell aren't intentionally to me or anyone else; they're to themselves. The narrative of what they think they're doing from the early days takes root, and never mind the fact that, yeah, it turns out that now that you've scaled out, maybe development isn't 80% of your cloud bill anymore. You learn things, and your understanding of what you're doing has to evolve with the evolution of the applications.

Tim: Yep. It's a fun time to be around. I mean, it's so great; right at the moment, lock-in just isn't that big an issue. And let's be clear—I'm sure you'll agree with me on this, Corey—if you're a startup and you're trying to grow and scale and prove you've got a viable business, and show that you have exponential growth and so on, don't think about lock-in; just don't go near it. Pick a cloud provider, pick whichever cloud provider your CTO already knows how to use, and just go all-in on them, and use all their most advanced features and be serverless if you can.
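As an illustration of what "go all-in and be serverless" looks like in practice, here is a hypothetical minimal Lambda handler in Python. The event shape is an assumed API Gateway-style request, not anything from the episode; what matters is what is absent: no instance sizes, no fleet counts, no capacity forecast anywhere in the code, which is exactly the trade Tim described earlier.

    # Hypothetical minimal AWS Lambda handler: the deployable unit is just
    # business logic. Capacity is the platform's problem, not ours.
    import json

    def handler(event, context):
        # Assumed API Gateway proxy-style event carrying a JSON order body.
        order = json.loads(event["body"])
        total = sum(item["price"] * item["quantity"] for item in order["items"])
        return {
            "statusCode": 200,
            "body": json.dumps({"orderId": order["orderId"], "total": total}),
        }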
It's the only sane way forward. You're short of time, you're short of money, you need growth.

Corey: “Well, what if you need to move strategically in five years?” You should be so lucky. Great. Deal with it then. Or, “Well, what if we want to sell to retail as our primary market and they hate AWS?” Well, go all-in on a provider; probably not that one. Pick a different provider and go all in. I do not care which cloud any given company picks. Go with what's right for you, but then go all in because until you have a compelling reason to do otherwise, you're going to spend more time solving global problems locally.

Tim: That's right. And we've never actually said this, probably because it's something that both you and I know at the core of our being, but it probably needs to be said that being multi-cloud is expensive, right? Because the nouns and verbs that describe what clouds do are different in Google-land and AWS-land; they're just different. And it's hard to think about those things. And you lose the capability of using the advanced serverless stuff. There are a whole bunch of costs to being multi-cloud. Now, maybe if you're existentially afraid of lock-in, you don't care. But for, I think, most normal people, ugh, it's expensive.

Corey: Pay now or pay later, you will pay. Wouldn't you ideally like to see that dollar go as far as possible? I'm right there with you because it's not just the actual infrastructure cost that's expensive; it costs something far more dear, and that is the cognitive expense of having to think about both of these things, not just how each cloud provider works, but how each one breaks. You've done this stuff longer than I have; I don't think that either of us trust a system that we don't understand the failure cases for and how it's going to degrade. It's, “Oh, right. You built something new and awesome. Awesome. How does it fall over? What direction is it going to fall, so what side should I not stand on?” It's based on an understanding of what you're about to blow holes in.

Tim: That's right. And you know, I think particularly if you're using AWS heavily, you know that there are some things that you might as well bet your business on because, you know, if they're down, so is the rest of the world, and who cares? And other things, eh, maybe a little chancier. So, understanding failure modes, understanding your stuff, you know, the cost of sharp edges, understanding manageability issues. It's not obvious.

Corey: It's really not. Tim, I want to thank you for taking the time to go through this, frankly, excellent post with me. If people want to learn more about how you see things, and I guess how you view the world, where's the best place to find you?

Tim: I'm on Twitter, just @timbray, T-I-M-B-R-A-Y. And my blog is at tbray.org, and that's where that piece you were just talking about is, and that's kind of my online presence.

Corey: And we will, of course, put links to it in the [show notes 00:37:42]. Thanks so much for being so generous with your time. It's always a pleasure to talk to you.

Tim: Well, it's always fun to talk to somebody who has shared passions, and we clearly do.

Corey: Indeed. Tim Bray, principal at Textuality Services. I'm Cloud Economist Corey Quinn and this is Screaming in the Cloud.
If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice along with an angry comment that you then need to take to all of the other podcast platforms out there purely for redundancy, so you don't get locked into one of them.

Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started.

Announcer: This has been a HumblePod production. Stay humble.
About Aidan
Aidan is an AWS enthusiast, due in no small part to sharing initials with the cloud. He's been writing software for over 20 years and getting paid to do it for the last 10. He's still not sure what he wants to be when he grows up.

Links:
Stedi: https://www.stedi.com/
GitHub: https://github.com/aidansteele
Blog posts: https://awsteele.com/
Ipv6-ghost-ship: https://github.com/aidansteele/ipv6-ghost-ship
Twitter: https://twitter.com/__steele

Transcript
Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.

Corey: Couchbase Capella Database-as-a-Service is flexible, full-featured and fully managed with built-in access via key-value, SQL, and full-text search. Flexible JSON documents aligned to your applications and workloads. Build faster with blazing fast in-memory performance and automated replication and scaling while reducing cost. Capella has the best price performance of any fully managed document database. Visit couchbase.com/screaminginthecloud to try Capella today for free and be up and running in three minutes with no credit card required. Couchbase Capella: make your data sing.

Corey: Today's episode is brought to you in part by our friends at MinIO, the high-performance Kubernetes-native object store that's built for the multi-cloud, creating a consistent data storage layer for your public cloud instances, your private cloud instances, and even your edge instances, depending upon what the heck you're defining those as, which depends probably on where you work. Getting that unified is one of the greatest challenges facing developers and architects today. It requires S3 compatibility, enterprise-grade security and resiliency, the speed to run any workload, and the footprint to run anywhere, and that's exactly what MinIO offers. With superb read speeds in excess of 360 gigs and a 100-megabyte binary that doesn't eat all the data you've got on the system, it's exactly what you've been looking for. Check it out today at min.io/download, and see for yourself. That's min.io/download, and be sure to tell them that I sent you.

Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. I'm joined this week by someone who, honestly, feels like they're after my own heart. Aidan Steele by day is a serverless engineer at Stedi, but by night, he is an absolute treasure and a delight because not only does he write awesome third-party tooling and blog posts and whatnot around the AWS ecosystem, but he turns them into the most glorious, intricate, and technical shitposts that I think I've ever seen. Aidan, thank you for joining me.

Aidan: Hi, Corey, thanks for having me. It's an honor to be here. Hopefully, we get to talk some AWS, and maybe also talk some nonsense as well.

Corey: I would argue that in many ways, those things are one and the same. And one of the things I always appreciated about how you approach things is, you definitely seem to share that particular ethos with me. And there's been a lot of interesting content coming out from you in recent days.
The thing that really wound up showing up on my radar in a big way was back at the start of January—2022, for those listening to this in the glorious future—about using IPv6 for multi-factor auth, which is so… I don't even have the adjectives to throw at this because, one, it is ridiculous, two, it is effective, and three, it is just, who thinks like that? What is this and what did you—what monstrosity have you built?

Aidan: So, what did I end up calling it? I think it was ipv6-ghost-ship. And I think I called it that because I'd recently watched, oh, what was that series that was recently on Apple TV? Uh, the Isaac Asimov—

Corey: If it's not Paw Patrol, I have no idea what it is because I have a four-year-old who is very insistent about these things. It is not so much a TV show as it is a way of life. My life is terrible. Please put me out of my misery.

Aidan: Well, at least it's not Bluey. That's the one I usually hear about. That's Australia's greatest export. But one of the plot devices was a ship that would teleport around the place, and you could never predict where it was next. And so no one could access it. And I thought, “Oh, what about if I use the IPv6 address space?”

Corey: Oh, Foundation?

Aidan: That's the one. Foundation. That's how the name came about. The idea, honestly, it was because I saw—when was it?—sometime last year, AWS added support for those IP address prefixes. IPv4 prefixes were small; very useful and important, but IPv6, with more than 2 trillion IP addresses per instance, I thought there's got to be fun to be had there.

Corey: 281 trillion, I believe is the—

Aidan: 281 trillion.

Corey: Yeah. It is a sarcastically large space. And that also has, effectively, I would say in an InfoSec sense, killed port scanning, the idea that I'm going to scan the IP range and see what's there, just because that takes such a tremendous amount of time. Now here, in reality, you also wind up with people using compromised resources, and yeah, it turns out, I can absolutely scan trillions upon trillions of IP addresses as long as I'm using your AWS account and associated credit card in which to do it. But here in the real world, it is not an easily discoverable problem space.

Aidan: Yeah. I made it as a novelty, really. I was looking for a reason to learn more about IPv6 and subnetting because it's a term I'd heard, a thing I didn't really understand, and the way I learn things is by trying to build them, realizing I have no idea what I'm doing, googling the error messages, reluctantly looking at the documentation, and then repeating until I've built something. And yeah, and then I built it, published it, and it seemed to be pretty popular. It struck a chord. People retweeted it. It tickled your fancy. I think it spoke to something in all of us who are trying not to take our jobs too seriously, you know, who know we can have a little fun with this ludicrous tech that we get to play with.

Corey: The idea being, you take the multi-factor auth code that your thing generates, and that is the last series of octets for the IP address you wind up going towards, and that is such a large problem space that you're not going to find it in time, so whatever it is automatically connects to that particular IP address because that's the only one that's going to be listening, for a 30 to 60-second span, for the connection to be established. It is a great idea because SSH doesn't support this stuff natively. There's no good two-factor auth approach for this. And I love it.
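For the curious, the scheme Corey just summarized can be sketched in a few lines. This is our illustrative reconstruction, not code from the actual ipv6-ghost-ship project: it assumes the instance owns an entire /80 (those 281 trillion addresses) and uses the pyotp library; the names and the prefix are made up.

    # Illustrative sketch of the ipv6-ghost-ship idea (not the real project's
    # code): derive the single IPv6 address that is listening right now from
    # the current TOTP code.
    import ipaddress
    import pyotp  # third-party TOTP library: pip install pyotp

    def current_ghost_address(prefix: str, shared_secret: str) -> ipaddress.IPv6Address:
        """Map the current six-digit TOTP code into the host bits of the prefix."""
        code = int(pyotp.TOTP(shared_secret).now())  # e.g. 492871
        network = ipaddress.IPv6Network(prefix)      # the instance's /80
        # Only the address at this offset accepts connections during the
        # current 30-second TOTP window; the other ~281 trillion stay dark.
        return network.network_address + code

    # Hypothetical values; 2001:db8::/32 is the IPv6 documentation range.
    addr = current_ghost_address("2001:db8:1234:5678:9abc::/80", "JBSWY3DPEHPK3PXP")
    print(f"connect to [{addr}] within this TOTP window")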
I'd be scared to death to run this in production for something that actually matters. And we also start caring a lot more about how accurate the clocks on those instances are, all of a sudden. But, oh, I just love the concept so much because it hits on the ethos of—I think—what so much of the cloud does, where these really are fundamental building blocks that we can use to build incredible, awe-inspiring things that are globe-spanning, and also ridiculousness. And there's so much value of being able to do the same thing, sometimes at the same time.

Aidan: Yeah, it's interesting, you mentioned, like, never using it in prod, and I guess when I was building it, I thought, you know, that would be apparent. Like, “Yes, this is very neat, but surely no one's going to use it.” And I did see someone raise an issue on the GitHub project which was talking about clock skew. And I mentioned—

Corey: Here at the bank where I'm running this in production, we're—

Aidan: [laugh].

Corey: —having some trouble with the clock. Yeah, it's—

Aidan: You know, I mentioned that the underlying 2FA library did account for clock skew, 30 seconds either way, but it made me realize, I might need to put a disclaimer on the project. While the code is probably reasonably sound, I personally wouldn't run it in production, and it was more meant to be a piece of performance art or something to tickle one's fancy and then move on, not to roll it out. But I don't know, different strokes for different folks.

Corey: I have gotten a lot better about calling out my ridiculous shitpost things when I do them. And the thing that really drove that home for me was talking about using DNS TXT records to store information about what server a virtual machine lives on—or container or whatnot—thus using Route 53 as a database. And that was a great gag, and then someone did a Reddit post of, “This seems like a really good idea, so I'm going to start doing it, and I'm having these questions.” And at that point, it's like, “Okay, I've got to break character at that point.” And it's, yeah, “Hi. That's my joke. Don't do it because X, Y, and Z are your failure modes, and there are better tools for it. So yeah, there are ways you can do this with DNS, but it's not generally a great idea, and there are some risk factors to it. And okay, A, B, and C are the things you don't want to do, so let's instead do it in a halfway intelligent way, because it's only funny if everyone's laughing.” Otherwise, we fall into this trap of people taking you seriously, and they feel bad as a result when it doesn't work in production. So, calling it out as ‘this is a joke' tends to put a lot of that aside. It also keeps people from feeling left out.

Aidan: Yeah. I realized that because the next novelty project I did a few days later—not sure if you caught it—was a Rick Roll over ICMPv6 packets, where if you ran ping6 to a certain IP range, it would return the lyrics to music's greatest treasure. So, I think that was hopefully a bit more self-evident that this should never be taken seriously. Who knows, I'm sure someone will find a use for it in prod.

Corey: And I was looking through this, this is great. I love some of the stuff that you're doing because it's just fantastic. And I started digging a bit more into things you had done. And at that point, it was whoa, whoa, whoa, wait a minute.
Back in 2020, you found an example of an issue with AWS's security model where CloudTrail would just start—if asked nicely—spewing other people's credential sets and CloudTrail events and whatnot into your account. And, A, that's kind of a problem. B, it was something that didn't make that big of a splash when it came out—I don't even think I linked to it at the time—and, C, after the recent revelations around CloudFormation and Glue that the fine folks at Orca Security found, it was an example of how that wasn't a one-off, because you'd done this a year beforehand. We now have an established track record of cross-account data sharing and, potentially, exploits, and I'm looking at this and, I've got to level with you, I felt incredibly naive because I had assumed that since we hadn't heard of this stuff in any real big sense, it simply didn't happen. So, when we heard about Azure, obviously, it's because Azure is complete clown shoes and the excellent people at AWS would never make these sorts of mistakes. Except we now have evidence that they absolutely did, and didn't talk about it publicly. And I've got to level with you. I feel more than a little bit foolish, betrayed, naive for all this. What's your take on it?

Aidan: Yeah, so just to clarify, it wasn't actually in your account. It was the new AWS custom resource execution model, which was: you would upload a Lambda function that would run in an Amazon-managed account. And so that immediately set off my spidey sense because executing code in someone else's account seems fraught with peril. And so—

Corey: Yeah, you can do all kinds of horrifying things there, like, use it to run containers.

Aidan: Yeah. [laugh]. Thankfully, I didn't do anything that egregious. I stayed inside the Lambda function, but I looked—I poked around at what credentials it had, and it would use CloudWatch to reinvoke itself, and CloudTrail kept recording those invocations. And I won't go into all the details, but it ended up being that you could see credentials being recorded in CloudTrail in that account, and I could, sort of, funnel them out of there. When I found this, I was a little scared, and I don't think I'd reported an issue to AWS before, so I didn't want to go too far and do anything that could be considered malicious. So, I didn't actively seek out other people's credentials.

Corey: Yeah, as a general rule, it's best once you discover things like that to do the right thing and report it, not proceed to, you know, inadvertently commit felonies.

Aidan: Yeah. Especially because it was my first time. I felt better safe than sorry. So, I didn't see other credentials, but I had no reason to believe that I wouldn't see them if I kept looking. I reported it to Amazon. Their security team was incredibly professional, made me feel very comfortable reporting it, and let me know when, you know, they'd remediated it, which was a matter of days later. But afterwards, it left me feeling a little surprised, because I was able to publish about it, and a few people responded, you know, the sorts of people who pay close attention to the industry, but Amazon didn't publish anything as far as I was aware. And it changed the way I felt about AWS security, because like you, I sort of felt that AWS more or less had a pretty perfect track record. They would have advisories about possible [Xen 00:12:04] exploits, and so on. But they'd never published anything about potential for compromise.
And it makes me wonder how many of these things might have been reported in the past where the third-party researcher either didn't end up publishing, or they published and it just disappeared into the blogosphere, and I hadn't seen it.

Corey: They have a big earn-trust principle over there, and I think that they always focus on the trust portion of it, but I think what got overlooked is the earn. When people are giving you trust that you haven't earned, on some level, the right thing to do is to call it out and be transparent around these things. Yes, I know, Wall Street's going to be annoyed, and headlines, et cetera, et cetera, but I had always had the impression that had there been a cross-account vulnerability or a breach of some sort, they would communicate this, and they would have their executives go on a speaking tour about it to explain how defense-in-depth mitigated some of it, and/or lessons learned, and/or what else we can learn. But it turns out that wasn't what was happening at all. And I feel like they have been given trust that was unearned, and now I am not happy with it. I suddenly have a lot more of a, I guess, skeptical position toward them as a result, and I have very little tolerance left for what has previously been a staple of the AWS security discussions, which is an executive getting on stage for a while and droning on about the shared responsibility model with the very strong implication that, “Oh, yeah, we're fine. It's all on your side of the fence that things are going to break.” Yeah, turns out, that's not so true. You just know about the things on your side of the fence in a way that you don't about the things that are on theirs.

Aidan: Yeah, it's an interesting one. Like, I think about it and I think, “Well, they never made an explicit promise that they would publish these things,” so, on one hand, I say to myself, “Oh, maybe that's on me for making that assumption.” But, I don't know, I feel like the way we felt was justified. Maybe naive in hindsight, but then, you know, I guess… I'm still not sure how to feel because, like, I think about recent issues and how a couple of AWS Distinguished Engineers jumped on Twitter and, to their credit, were extremely proactive in engaging with the community. But is that enough? It might be enough, say, to set my mind at ease or your mind at ease, because we are, [laugh] to put it mildly, highly engaged, perhaps a little too engaged, in the AWS space, but Twitter's very ephemeral. Very few of AWS's customers—

Corey: Yeah, I can't link to tweets by distinguished engineers to present to an executive leadership team as an official statement from Amazon. I just can't.

Aidan: Yeah. Yeah.

Corey: And so the lesson we can take from this is, okay, so, “Well, we never actually said this.” “So, let me get this straight. You're content to basically let people assume whatever they want until they ask you an explicit question around these things. Really? Is that the lesson you want me to take from this? Because I have a whole bunch of very explicit questions that I will be asking you going forward, if that is, in fact, your position. And you are not going to like the fact that I'm asking these questions.” Even if the answer is a hard no, people who did not have this context are going to wonder why are people asking those questions? It's a massive footgun here for them if that is the position that they intend to have. I want to be clear as well; this is also a messaging problem.
It is not, in any way, a condemnation of their excellent folks working on the security implementation themselves. This stuff is hard and those people are all-stars. I want to be very clear on this. It is purely around the messaging and positioning of the security posture.

Aidan: Yeah, yeah. That's a good clarification because, like you, my understanding is that the service teams are doing a really stellar, above-average job, industry-wide, and the AWS security response teams, I have absolute faith in them. It is a matter of messaging. And I guess what particularly brings it to front-of-mind is, it was earlier this month, or maybe it was last month, I received an email from a company called Sourcegraph. They do code search. I'm not even a customer of theirs yet, you know? I'm on a free trial, and I got an email that—I'm paraphrasing here—was something to the effect of, we discovered that it was possible for your code to appear in other customers' code search results. It was discovered by one of our own engineers. We found that the circumstances hadn't cropped up, but we wanted to tell you that it was possible. It didn't happen, and we're working on making sure it won't happen again. And I think about how radically different that is, where they didn't have a third-party researcher forcing their hand; they could have very easily swept it under the rug, but they were so proactive that, honestly, that's probably what's going to tip me over the edge into becoming a customer. I mean, other than them having a great product. But yeah, it's a big contrast. It's how I like to see other companies work, especially Amazon.

Corey: This episode is sponsored in part by our friends at Sysdig. Sysdig is the solution for securing DevOps. They have a blog post that went up recently about how an insecure AWS Lambda function could be used as a pivot point to get access into your environment. They've also gone deep in-depth with a bunch of other approaches to how DevOps and security are inextricably linked. To learn more, visit sysdig.com and tell them I sent you. That's S-Y-S-D-I-G dot com. My thanks to them for their continued support of this ridiculous nonsense.

Corey: The two companies that I can think of that have had security problems have been CircleCI and Travis CI. Circle had an incredibly transparent early-on blog post, they engaged with customers on the forums, and they did super well. Travis basically denied, stonewalled for ages, and now the only people who use Travis are there because they haven't found a good way to get off of it yet. It is effectively DOA. And I don't think those two things are unrelated.

Aidan: Yeah. No, that's a great point. Because you know, I've been in this industry long enough. You have to know that humans write code and humans make mistakes—I know I've made more than my fair share—and I'm not going to write off a company for making a mistake. It's entirely in their response. And yeah, you're right. That's why Circle is still a trustworthy business that should earn people's business, and why Travis is one I recommend everyone move away from.

Corey: Yeah, I like Orca Security as a company and as a product, but at the moment, I am not their customer. I am AWS's customer. So, why the hell am I hearing it from Orca and not AWS when this happens?

Aidan: Yeah, yeah. It's… not great. On one hand, I'm glad I'm not in charge of finding a solution to this because I don't have the skills or the expertise to manage that communication.
Because, like I think you've said in the past, there are a lot of different audiences that they have to communicate with. They have to communicate with the stock market, they have to communicate with execs, they have to communicate with developers, and each of those audiences demands a different level of detail, a different focus. And it's tricky. And how do you manage that? But, I don't know, I feel like you have an obligation to do so when people place that level of trust in you. Corey: It's just a matter of doing right by your customers, on some level. Aidan: Yeah. Corey: How long have you been working in AWS environments? Clearly, this is not like, "Well, it's year two," because if so I'm going to feel remarkably behind. Aidan: [laugh]. So, I've been writing code in some capacity or another for 20 years. It took about five years to get anyone to pay me to do so. But yeah, I guess the start of my professional career—and by 'professional,' I use it in the strictest terms, meaning getting paid money; not that I [laugh] am necessarily a professional—coincided with the launch of AWS. So, I hadn't experienced the before times of data centers, never had to think about Direct Connect, but it means I have been using AWS since sometime in 2008. I was just looking at my bill earlier, and I saw that my first bill was for $70. I was using a c1.xlarge, which was 80 cents an hour, and it had eight CPU cores. And to put that in context at the time—Corey: Eight vCPUs, technically I believe. Aidan: And it basically is—Corey: —or were they using the [eCPU 00:20:31] model back then? Aidan: Yeah, no, that was vCPUs. But to me, that was extraordinary. You know, I was somewhere just after high school. The Netflix Prize was around. If you're not sure what that was: Netflix had this open competition where they said anyone who could improve upon their movie recommendation algorithm could win a million dollars. And obviously being a teenager, I had a massive ego and [laugh] no self-doubt, so I thought I could win this, but I just didn't have enough CPUs or RAM on my laptop. And so when EC2 launched, and I could pay 80 cents an hour, rather than signing up for a 12-month contract with a colocation company, it was just a dream come true. I was able to run my terrible algorithms, but I could run them eight times faster. Unfortunately and obviously, I didn't win because it turns out, I'm not a world-class statistician. But—Corey: Common mistake. I make that mistake myself all the time. Aidan: [laugh]. Yeah. I mean, you know, I think I was probably 19 at the time, so my ego did make me think I was one, but it turned out not to be so. But I think what really blew my mind was that me, a nobody, could create an account with Amazon and get access to these incredibly powerful machines for less than a dollar. And so I was hooked. Since then, I've worked at companies that are AWS customers. I've worked at places that have zero EC2 servers, worked at places that have had thousands, and places in between. And it's got to a point, actually, where, I guess, my career is so entwined with AWS that one, my initials are actually AWS, but also—and this might sound ridiculous, and it's probably just a sign of my privilege—that I wouldn't consider working somewhere that used another cloud. Not—Corey: No, I think that's absolutely the right approach. Aidan: Yeah. Corey: I had a Twitter thread on this somewhat recently, and I'm going to turn it into a blog post because I got some pushback.
If I were coming into the industry right now, my first choice would be Google Cloud because its developer experience is excellent. But I'm not coming to this without any experience. I have spent a decade or so learning not just how it works, but also how it breaks, understanding the failure modes and what they're going to look like, and what it's good at and what it's not. That's the valuable stuff for running things in a serious way. Aidan: Yeah. It's an interesting one. And I mean, for better or worse, AWS is big. I'm sure you will know much better than I do the exact numbers, but if a junior developer came to me and said, "Which cloud should I learn, or should I learn all of them?" I mean, you're right, Google Cloud does have a better developer experience, especially for new developers, but when I think about the sheer number of jobs that are available for developers, I feel like I would be doing them a disservice by not suggesting AWS, at least in Australia. It seems they've got such a huge footprint that you'll always be able to find a job working as an AWS-familiar engineer. It seems like that would be less the case with Google Cloud or Azure. Corey: Again, I am not sitting here suggesting that anyone should, "Oh, clouds are insecure. We're going to run our own stuff in our own data centers." That is ridiculous in this era. They are still going to do a better job of security than any of us will individually, let's be clear here. And it empowers and unlocks an awful lot of stuff. But their privileged position as the hyperscale providers that are the default choice for building things, I think, comes with a significant level of responsibility that I am displeased to discover they've been abdicating. And I don't love that. Aidan: Yeah, it's an interesting one, right, because, like you're saying, they have access and expertise that people doing it themselves will never match. So, you know, I'm never going to hesitate to recommend people use AWS on account of security because your company's security posture will almost always be better for using AWS and following their guidelines, and so on. But yeah, like you say, with great power comes significant responsibility to earn trust and retain that trust by admitting and publicizing when mistakes are made. Corey: One last topic I want to get into with you is one that you and I have talked about very briefly elsewhere, that I feel like you and I are both relatively up-to-date on AWS intricacies. I think that we are both better than the average bear working with the platform. But I know that I feel this way, and I suspect you do too, that VPCs have gotten confusing as hell. Is that just me? Am I a secret moron that no one ever bothered to tell about this, and I should update my own self-awareness? Aidan: [laugh]. Yeah, it's… I mean, that's been the story of my career with AWS. When I started, VPCs didn't exist. It was EC2 Classic—well, I guess at the time, it was just EC2—and it was simple. You launched an instance and you had an IP address. And then along came VPCs, and I think at the time, I thought something to the effect of "This seems like needless complexity. I'm not going to bother learning this. It will never be relevant." In the end that wasn't true. I worked in much larger deployments where VPCs made fantastic sense and made a lot of things possible, but I still didn't go into the weeds. Since then, AWS has announced that EC2 Classic will be retired; the end of an era.
I'm not personally still running anything in EC2 Classic, and I think they've done an incredible job of maintaining support for this long, but VPC complexity has certainly been growing year-on-year since then. I recently was using the AWS console—like we all do and no one ever admits to—to edit a VPC subnet route table. And I clicked the drop-down box for a target, and I was overwhelmed by the number of options. There were NAT gateways, internet gateways, carrier gateways, I think there was a thing called a wavelength gateway, ENI, and… I [laugh] I think I was surprised because I just scrolled through the list, and I thought, "Wow, that is a lot of different options. Why is that?" Especially because it's not so relevant to me. But I realized a big part of what AWS has been doing lately is trying to make themselves available to people who haven't used the cloud yet. And they have these complicated networking needs, and it seems like they're trying to—reasonably successfully—make anything possible. But with that comes, you know, additional complexity. Corey: I appreciate that the capacity is there, but there has to be an abstraction model for getting rid of some of this complexity because otherwise, the failure mode is you wind up with this amazingly capable thing that can build marvels, but you also need to basically have a PhD in some of these things to wind up tying it all together. And if you bring someone else in to do it, then you have no idea how to run the thing. You're effectively a golden retriever trying to fly a space shuttle. Aidan: Yeah. It's interesting, like, clearly, they must be acutely aware of this because they have default VPCs, and for many use cases, that's all people should need. But as soon as you want, say, a private subnet, then you need to either modify that default VPC or create a new one, and it's sort of going from 0 to 100 complexity extremely quickly because, you know, you need to create route tables and everyone's favorite NAT gateways, and it feels like the on-ramp needs to be not so steep. I'm not sure what the solution is; I hope they find one. Corey: As do I. I really want to thank you for taking the time to speak with me about so many of these things. If people want to learn more about what you're up to, where's the best place to find you? Aidan: Twitter's the best place. On Twitter, my username is @__Steele, which is S-T-E-E-L-E. That's where I'll at least speculate on the latest releases or link to some of the silly things I put on GitHub. Sometimes they're not-so-silly things. But yeah, that's where I can be found. And I'd love to chat to anyone about AWS. It's something I can geek out about all day, every day. Corey: And we will certainly include links to that in the [show notes 00:29:50]. Thank you so much for taking the time to speak with me today. I really appreciate it. Aidan: Well, thank you so much for having me. It's been an absolute delight. Corey: Aidan Steele, serverless engineer at Stedi, and shitposter extraordinaire. I'm Cloud Economist Corey Quinn and this is Screaming in the Cloud.
If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice along with an immediate request to correct the record about what I'm not fully understanding about AWS's piss-weak security communications. Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit duckbillgroup.com to get started. Announcer: This has been a HumblePod production. Stay humble.
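For anyone who wants to see the "0 to 100" jump Corey and Aidan describe, here is a minimal boto3 sketch of what adding a single private subnet to an existing VPC actually entails. The VPC and public-subnet IDs are hypothetical placeholders, and error handling is omitted; this is an illustration of the moving parts, not a production recipe.

    import boto3

    ec2 = boto3.client("ec2")
    VPC_ID = "vpc-0123456789abcdef0"           # hypothetical existing VPC
    PUBLIC_SUBNET_ID = "subnet-0aaa1111bbb222"  # hypothetical public subnet for the NAT gateway

    # 1. The private subnet itself.
    private = ec2.create_subnet(VpcId=VPC_ID, CidrBlock="10.0.2.0/24")["Subnet"]

    # 2. A NAT gateway needs an Elastic IP, and it must live in a *public* subnet.
    eip = ec2.allocate_address(Domain="vpc")
    nat = ec2.create_nat_gateway(SubnetId=PUBLIC_SUBNET_ID,
                                 AllocationId=eip["AllocationId"])["NatGateway"]
    ec2.get_waiter("nat_gateway_available").wait(NatGatewayIds=[nat["NatGatewayId"]])

    # 3. A route table that sends internet-bound traffic through the NAT gateway.
    rtb = ec2.create_route_table(VpcId=VPC_ID)["RouteTable"]
    ec2.create_route(RouteTableId=rtb["RouteTableId"],
                     DestinationCidrBlock="0.0.0.0/0",
                     NatGatewayId=nat["NatGatewayId"])

    # 4. Associate the route table with the private subnet.
    ec2.associate_route_table(RouteTableId=rtb["RouteTableId"],
                              SubnetId=private["SubnetId"])

Four resources plus a route table association just to get outbound-only networking, which is exactly the on-ramp steepness being complained about.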
In this episode Mike and Ken talk about the magic of software-defined things and how skill crossover is becoming a thing of the future. Maybe history is repeating itself. Whether it's endpoint detection and response, physical security, disaster recovery, networks, or a firewall, it seems like everything has a software-defined equivalent. Developers and Application Security engineers are being called on more and more to know things they didn't have to know even five years ago. The team digs into this topic by looking at it through two lenses: what skills engineers need, and how software brings its own set of pros and cons to cloud and modern infrastructure.
Links: S3 Bucket Negligence Award: http://saharareporters.com/2022/01/10/exclusive-hacker-breaks-nimc-server-steals-over-three-million-national-identity-numbers Anyone in a VPC, any VPC, anywhere: https://Twitter.com/santosh_ankr/status/1481387630973493251 A disgruntled developer corrupts their own NPM libs ‘colors' and ‘faker', breaking thousands of apps: https://www.bleepingcomputer.com/news/security/dev-corrupts-npm-libs-colors-and-faker-breaking-thousands-of-apps/ “Top ten security best practices for securing backups in AWS”: https://aws.amazon.com/blogs/security/top-10-security-best-practices-for-securing-backups-in-aws/ Glue: https://aws.amazon.com/security/security-bulletins/AWS-2022-002/ CloudFormation: https://aws.amazon.com/security/security-bulletins/AWS-2022-001/ S3-credentials: https://simonwillison.net/2022/Jan/18/weeknotes/ Transcript Corey: This is the AWS Morning Brief: Security Edition. AWS is fond of saying security is job zero. That means it's nobody in particular's job, which means it falls to the rest of us. Just the news you need to know, none of the fluff. Corey: This episode is sponsored in part by my friends at Thinkst Canary. Most companies find out way too late that they've been breached. Thinkst Canary changes this and I love how they do it. Deploy canaries and canary tokens in minutes, and then forget about them. What's great is that attackers then tip their hand by touching them, giving you one alert, when it matters. I use it myself and I only remember this when I get the weekly update with a, “We're still here, so you're aware,” from them. It's glorious. There is zero admin overhead to this, there are effectively no false positives unless I do something foolish. Canaries are deployed and loved on all seven continents. You can check out what people are saying at canary.love. And, their Kube config canary token is new and completely free as well. You can do an awful lot without paying them a dime, which is one of the things I love about them. It is useful stuff and not a, “Oh, I wish I had money.” It is spectacular. Take a look. That's canary.love because it's genuinely rare to find a security product that people talk about in terms of love. It really is a neat thing to see. Canary.love. Thank you to Thinkst Canary for their support of my ridiculous, ridiculous nonsense. Corey: So, yesterday's episode put the boots to AWS, not so much for the issues that Orca Security uncovered, but rather for its poor communication around the topic. Now that that's done, let's look at the more mundane news from last week's cloud world. Every day is a new page around here, full of opportunity and possibility in equal measure. This week's S3 Bucket Negligence Award goes to the Nigerian government for exposing millions of their citizens to a third party who most assuredly did not follow coordinated disclosure guidelines. Whoops. There's an interesting tweet (the situation it describes is still unfolding at the time of this writing), but it looks like making an API Gateway ‘Private' doesn't mean, “To your VPCs,” but rather, “To anyone in a VPC, any VPC, anywhere.” This is evocative of the way that, “Any Authenticated AWS User,” for S3 buckets caused massive permissions issues industry-wide. And a periodic and growing concern is the software supply chain—which is a fancy way of saying, “We're all built on giant dependency chains”—what happens when, say, a disgruntled developer corrupts their own NPM libs ‘colors' and ‘faker', breaking thousands of apps across the industry, including some of the AWS SDKs?
How do we manage that risk? How do we keep developers gruntled? Corey: Are you building cloud applications with a distributed team? Check out Teleport, an open-source identity-aware access proxy for cloud resources. Teleport provides secure access for anything running somewhere behind NAT: SSH servers, Kubernetes clusters, internal web apps, and databases. Teleport gives engineers superpowers. Get access to everything via single sign-on with multi-factor, list and see all of the SSH servers, Kubernetes clusters, or databases available to you in one place, and get instant access to them using tools you already have. Teleport ensures best security practices like role-based access, preventing data exfiltration, providing visibility, and ensuring compliance. And best of all, Teleport is open-source and a pleasure to use. Download Teleport at goteleport.com. That's goteleport.com. AWS had a couple of interesting things. The first is “Top ten security best practices for securing backups in AWS”. People really don't consider the security implications of their backups anywhere near seriously enough. It's not ‘live,' but it's still got—by definition—a full set of your data just waiting to be harvested by nefarious types. Be careful with that. And of course, AWS had two security bulletins, one about its Glue issues, one about its CloudFormation issues. The former allowed cross-account access to other tenants. In theory. In practice, AWS did the responsible thing and kept every access event logged, going back for the full five years of the service's life. That's remarkably impressive. And lastly, I found an interesting tool called S3-credentials last week, and what it does is help generate tightly-scoped IAM policies; they were previously limited to a single S3 bucket, and can now be limited to a single prefix within that bucket. You can also make those credential sets incredibly short-lived. More things like this, please. I just tend to over-scope things way too much. And that's what happened Last Week in AWS: Security. Please feel free to reach out and tell me exactly what my problem is. Corey: Thank you for listening to the AWS Morning Brief: Security Edition with the latest in AWS security that actually matters. Please follow AWS Morning Brief on Apple Podcasts, Spotify, Overcast—or wherever the hell it is you find the dulcet tones of my voice—and be sure to sign up for the Last Week in AWS newsletter at lastweekinaws.com. Announcer: This has been a HumblePod production. Stay humble.
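To make the prefix-scoping idea concrete, here is a hedged sketch of the kind of policy a tool like S3-credentials generates, handed to STS for short-lived credentials. The bucket and prefix names are hypothetical, the action list is an assumption rather than what the real tool emits, and GetFederationToken must be called with long-term IAM user credentials.

    import json
    import boto3

    BUCKET = "example-bucket"   # hypothetical
    PREFIX = "reports/2022/"    # hypothetical

    # Allow listing and object read/write for one prefix of one bucket only.
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{BUCKET}",
                "Condition": {"StringLike": {"s3:prefix": [f"{PREFIX}*"]}},
            },
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"arn:aws:s3:::{BUCKET}/{PREFIX}*",
            },
        ],
    }

    # Mint credentials that expire in 15 minutes, scoped down to that policy.
    sts = boto3.client("sts")
    creds = sts.get_federation_token(Name="prefix-scoped",
                                     Policy=json.dumps(policy),
                                     DurationSeconds=900)["Credentials"]
    print(creds["AccessKeyId"], "expires", creds["Expiration"])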
Transcript Jesse: Welcome to Meanwhile in Security where I, your host Jesse Trucks, guide you to better security in the cloud. Announcer: If your mean time to WTF for a security alert is more than a minute, it's time to look at Lacework. Lacework will help you get your security act together for everything from compliance service configurations to container app relationships, all without the need for PhDs in AWS to write the rules. If you're building a secure business on AWS with compliance requirements, you don't really have time to choose between antivirus or firewall companies to help you secure your stack. That's why Lacework is built from the ground up for the cloud: low effort, high visibility, and detection. To learn more, visit lacework.com. That's lacework.com. Jesse: Don't be stupid. Focus on your real risks, not hacker movie risks. It is easy to get caught up in the hype of advanced persistent threats and the latest in obscure attack methodologies, to the point where you spend all of your energy and time hunting for these in your systems. This stuff is right out of the latest bad hacking movie. It's a colossal waste of time for most of us. Spend your time on learning and monitoring things based on your real risk, not your overblown sense of self-importance that the latest international crime ring of nation-state-backed hackers wants to breach your defenses. News flash: APTs probably don't care about you. If you make it fairly easy to get your data and use your resources, of course you'll get popped. That's like leaving your wallet on a bench in the park; of course someone will take it. Raise the barrier to entry for obtaining your resources and you reduce opportunistic crime, just like locking your car at night protects from casual pilfering through your things. Meanwhile, in the news. Amazon Sidewalk Mesh Network Raises Security, Privacy Concerns. Tangential to cloud security, these types of networks worry me for privacy and physical security concerns more than cybersecurity for the device and users. As this article says, privacy and security are separate issues. Conflating the two can compromise one or the other or both. Don't confuse privacy and security as being one and the same. This Week in Database Leaks: Cognyte, CVS, Wegmans. I routinely hammer on securing your cloud storage and other ways to minimize self-exposure of sensitive data for a reason. You should be scared of the implications of these exposures in terms of business risk, reputation loss, and regulatory violations and fines. In other words, don't be stupid. Data is Wealth: Data Security is Wealth Protection. Ignore the shilling of services as usual and take in the message: protecting your data is your prime directive. Ask yourself every morning, “How will I protect my data today?” Doing anything else is doing it wrong. Google Workspace Adds Client-Side Encryption. This means you can store encrypted data in your Google accounts without Google having access to the contents of your data. This is a big deal. Take advantage of this if you use Google for document creation and storage. Corey: This episode is sponsored by ExtraHop. ExtraHop provides threat detection and response for the Enterprise (not the starship). On-prem security doesn't translate well to cloud or multi-cloud environments, and that's not even counting IoT. ExtraHop automatically discovers everything inside the perimeter, including your cloud workloads and IoT devices, detects these threats up to 35 percent faster, and helps you act immediately.
Ask for a free trial of detection and response for AWS today at extrahop.com/trial. Jesse: Cybersecurity Tips for Business Travelers: Best Practices for 2021. I plan to avoid a return to routine business travel, but if you want to, or don't have a choice but to get back on the road, do it safely. If you don't want US Customs and Border Protection agents searching your devices, wipe your phone before reaching customs. You can set your device to wipe on too many failed passcode entries, then back up your phone right before boarding or departing the plane, and wipe it on the way to customs by tapping one number over and over as you walk off the plane. 2021 Verizon Data Breach Incident Report insights. The annual Verizon data breach incident report—known as DBIR—has incredible and useful insights for all tech workers, not just security practitioners. Once again, humans are the weak link. I know spending more time educating your people than hunting for APTs is boring sauce, but you'll be better off. One in Five Manufacturing Firms Targeted by Cyberattacks. If you create real-world goods, you are a prize target. Don't be fooled into thinking you're safer because it's harder to steal things in meatspace than in cyberspace. Confidential Computing: The Future of Cloud Computing Security. Using hardware-level security is still possible in the cloud. Most of us don't need to encrypt everything on a system or everything running in memory, but some of us do need to be that paranoid. However, don't do this unless you really, truly have a business case for it; if you do, check out services like AWS CloudHSM for encryption of in-use memory and data. Many Mobile Apps Intentionally Using Insecure Connections for Sending Data. Don't use insecure transport in your apps. Encrypt your data in transit. Eventually, consumers will have ways to disable all apps that don't use basic security measures like proper authentication without stored credentials or that use unencrypted channels. Don't be stupid. Are you sensing a theme this week? The Art and Strategy of Becoming More Cyber Resilient. Resiliency in IT architectures and applications is becoming the only way to survive the modern distributed world, especially in cybersecurity. You need to change your whole paradigm to be risk- and recovery-based, not just the old-school defender attitude of building lots of walls. Cyber is the New Cold War & AI is the Arms Race. The whole AI marketing trope gets old. Ugh. But the message is accurate. There is too much data even in small systems to manage detection and protection without advanced math hunting for anomalous things that go bump in the night. We are in an arms race and we are at war. If nothing else, I like this article because it says what many of us in security always say: “It isn't if you get popped; it's when you get popped.” The Future of Machine Learning and Cybersecurity. A reality check on using advanced math for security monitoring and analysis is important. Use it, but don't rely on it too much. Like with all things in life, find balance between known attack analysis and mathematically finding potential attack indicators. And now for the tip of the week. Use a virtual private cloud, or VPC, for any systems or services not requiring direct public interaction. All three of the biggest public cloud providers have these available. Both AWS and GCP use the term VPC, but Azure calls it an Azure Virtual Network, or VNet.
This is as simple as setting up a private network for your compute and storage systems and adding a second network for public access, for your outside interactions with users and external services. They're easy to implement, and you quickly get significant improvements in security and risk-profile reduction using VPCs. This is the cloud version of keeping your things hidden behind a firewall on-prem. And that's it for the week. Securely yours, Jesse Trucks. Jesse: Thanks for listening. Please subscribe and rate us on Apple and Google Podcasts, Spotify, or wherever you listen to podcasts. Announcer: This has been a HumblePod production. Stay humble.
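As a rough companion to Jesse's tip, here's a minimal boto3 sketch of the two-network layout he describes: one VPC, a public subnet for outside interactions, and a private subnet for compute and storage. The CIDR ranges are arbitrary examples, and the sketch skips tags, security groups, and cleanup.

    import boto3

    ec2 = boto3.client("ec2")

    # One VPC with a public and a private subnet.
    vpc = ec2.create_vpc(CidrBlock="10.0.0.0/16")["Vpc"]
    public = ec2.create_subnet(VpcId=vpc["VpcId"], CidrBlock="10.0.1.0/24")["Subnet"]
    private = ec2.create_subnet(VpcId=vpc["VpcId"], CidrBlock="10.0.2.0/24")["Subnet"]

    # Only the public subnet gets a route to an internet gateway.
    igw = ec2.create_internet_gateway()["InternetGateway"]
    ec2.attach_internet_gateway(InternetGatewayId=igw["InternetGatewayId"],
                                VpcId=vpc["VpcId"])
    rtb = ec2.create_route_table(VpcId=vpc["VpcId"])["RouteTable"]
    ec2.create_route(RouteTableId=rtb["RouteTableId"],
                     DestinationCidrBlock="0.0.0.0/0",
                     GatewayId=igw["InternetGatewayId"])
    ec2.associate_route_table(RouteTableId=rtb["RouteTableId"],
                              SubnetId=public["SubnetId"])

    # The private subnet keeps only the implicit local route, so nothing on the
    # internet can reach the compute and storage systems placed inside it.
    print("public:", public["SubnetId"], "private:", private["SubnetId"])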
Michael shares his learnings about IPv6 on AWS. Enabling IPv6 is highly recommended for public endpoints like CloudFront and ALB. On top of that, Michael explains how to enable IPv6 for your VPCs.
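For reference, here's a hedged boto3 sketch of what enabling IPv6 on a VPC tends to involve: request an Amazon-provided block, carve a /64 for a subnet, and give private subnets an egress-only internet gateway instead of a NAT gateway. The IDs are placeholders and the /64 arithmetic is deliberately naive; Michael's own walkthrough may differ in the details.

    import boto3

    ec2 = boto3.client("ec2")
    VPC_ID = "vpc-0123456789abcdef0"    # hypothetical
    SUBNET_ID = "subnet-0aaa1111bbb222"  # hypothetical
    RTB_ID = "rtb-0ccc3333ddd44455"      # hypothetical private route table

    # 1. Ask AWS for an Amazon-provided /56 IPv6 block on the VPC.
    ec2.associate_vpc_cidr_block(VpcId=VPC_ID, AmazonProvidedIpv6CidrBlock=True)

    # 2. Once the association completes (in practice you may need to poll),
    #    carve a /64 for the subnet and auto-assign IPv6 to new instances.
    vpc = ec2.describe_vpcs(VpcIds=[VPC_ID])["Vpcs"][0]
    vpc_block = vpc["Ipv6CidrBlockAssociationSet"][0]["Ipv6CidrBlock"]  # e.g. "2600:1f18:1234:5600::/56"
    subnet_block = vpc_block.replace("00::/56", "01::/64")  # naive carve, illustration only
    ec2.associate_subnet_cidr_block(SubnetId=SUBNET_ID, Ipv6CidrBlock=subnet_block)
    ec2.modify_subnet_attribute(SubnetId=SUBNET_ID,
                                AssignIpv6AddressOnCreation={"Value": True})

    # 3. Private subnets route outbound IPv6 via an egress-only internet gateway.
    eigw = ec2.create_egress_only_internet_gateway(VpcId=VPC_ID)["EgressOnlyInternetGateway"]
    ec2.create_route(RouteTableId=RTB_ID,
                     DestinationIpv6CidrBlock="::/0",
                     EgressOnlyInternetGatewayId=eigw["EgressOnlyInternetGatewayId"])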
Some highlights of the show include: The company's cloud native journey, which accelerated with the acquisition of Uswitch. How the company assessed risk prior to their migration, and why they ultimately decided the task was worth the gamble. Uswitch's transformation into a profitable company resulting from their cloud native migration. The role that multidisciplinary, collaborative teams played in solving problems and moving projects forward. Paul also offers commentary on some of the tensions that resulted between different teams. Key influencing factors that caused the company to adopt containerization and Kubernetes. Paul goes into detail about their migration to Kubernetes, and the problems that it addressed. Paul's thoughts on management and prioritization as CTO. He also explains his favorite engineering tool, which may come as a surprise. Links: RVU Website: https://www.rvu.co.uk/ Uswitch Website: https://www.uswitch.com/ Twitter: https://twitter.com/pingles GitHub: https://github.com/pingles Transcript Announcer: Welcome to The Business of Cloud Native podcast, where we explore how end users talk and think about the transition to Kubernetes and cloud-native architectures. Emily: Welcome to The Business of Cloud Native. I'm your host, Emily Omier, and today I am chatting with Paul Ingles. Paul, thank you so much for joining me. Paul: Thank you for having me. Emily: Could you just introduce yourself: where do you work? What do you do? And include, sort of, some specifics. We all have a job title, but it doesn't always reflect what our actual day-to-day is. Paul: I am the CTO at a company called RVU in London. We run a couple of reasonably big-ish price comparison, aggregator-type sites. So, we help consumers figure out and compare prices on broadband products, mobile phones, energy—so in the UK, energy is something which is provided through a bunch of different private companies, so you've got a fair amount of choice on kind of that thing. So, we try to make it easier and simpler for people to make better decisions on the household choices that they have. I've been there for about 10 years, so I've had a few different roles. So, as CTO now, I sit on the exec team and try to help inform the business and technology strategy. But I've come through a bunch of teams. So, I've worked on some of the early energy price comparison stuff, some data infrastructure work a while ago, and then some underlying DevOps-type automation and Kubernetes work a couple of years ago. Emily: So, when you get in to work in the morning, what types of things are usually on your plate? Paul: So, I keep a journal. I use bullet journaling quite extensively. So, I try to track everything that I've got to keep on top of. Generally, what I would try to do each day is catch up with anybody that I specifically need to follow up with. So, at the start of the week, I make a list of every day, and then I also keep a separate column for just general priorities. So, things that are particularly important for the week, themes of work going on, like, technology changes, or things that we're trying to launch, et cetera. And then I will prioritize speaking to people based on those things. So, I'll try and make sure that I'm focusing on the most important thing. I do a weekly meeting with the team. So, we have a few directors that look after different aspects of the business, and so we do a weekly meeting to just run through everything that's going on and share the problems.
We use the three P's model: so, sharing progress, problems, and plans. And we use that to try and steer what we do. And we also look at some other team health metrics. Yeah, it's interesting actually. I think when I switched from working in one of the teams to being in the CTO role, things changed quite substantially. The list of things that I had to care about increased hugely, to the point where it far exceeded how much time I had to spend on anything. So, nowadays, I find that I'm much more likely to let some things drop off. And so it's unfortunate, and you can't please everybody, so you just have to say, “I'm really sorry, but this thing is not high on the list of priorities, so I can't spend any time on it this week, but if it's still a problem in a couple of weeks' time, then we'll come back to it.” But yeah, it can vary quite a lot. Emily: Hmm, interesting. I might ask you more questions about that later. For now, let's sort of dive into the cloud-native journey. What made RVU decide that containerization was a good idea and that Kubernetes was a good idea? What were the motivations and who was pushing for it? Paul: That's a really good question. So, I got involved about 10 years ago. So, I worked for a search marketing startup in London called Forward Internet Group, and they acquired Uswitch in 2010. And prior to working at Forward, I'd worked as a consultant at ThoughtWorks in London, so I spent a lot of time working in banks on continuous delivery and things like that. And so when Uswitch came along, there were a few issues around the software release process. Although there was a ton of automation, it was still quite slow to actually get releases out. We were only doing a release every fortnight. And we also had a few issues with the scalability of data. So, it was a monolithic Windows Microsoft stack. So, there were SQL Server databases, and .NET app servers, and things like that. And our traffic can be quite spiky, so when companies are in the news, or there's policy changes and things like that, we would suddenly get an increase in traffic, and the Microsoft solution would just generally kind of fall apart as soon as we hit some kind of threshold. So, I got involved, partly to try and improve some of the automation and release practices because at the search startup, we were releasing experiments every couple of hours, even. And so we wanted to try and take a bit of that ethos over to Uswitch, and also to try and solve some of the data scalability and system scalability problems. And when we got started doing that, a lot of it was—so that was in the early heyday of AWS, so this was about 2008, when I was at the search startup. And we were used to using EC2 to try and spin up Hadoop clusters and a few other bits and pieces that we were playing around with. And when we acquired Uswitch, we felt like it was quickest for us to just create a different environment, stick it under the load balancer so end users wouldn't realize that some requests were being served off of the AWS infrastructure instead, and then just gradually go from there. We found that that was just the fastest way to move. So, I think it was interesting: it was a deliberate move, but as for the degree to which we followed through on it, I don't think we'd really anticipated quite how quickly we would shift everything.
And so when Forward made the acquisition, I joined in the summer of 2010, and a colleague and I wrote a little two-pager on: here are the problems we see, here are the things that we think we can help with, the ways the technology approach that we'd applied at Forward would carry across, and what benefits we thought it would bring. Fortunately, Forward was a privately held business—we were relatively small but profitable—and the owner of that business was quite risk-affine. He was quite keen on playing blackjack and other stuff. So, he was pretty happy with talking about probabilities of success. And so we just said, we think there's a future in it if we can get the wheels turning a bit better. And he was up for it. He backed us and we just took it from there. And so we replaced everything from self-hosted physical infrastructure running on top of .NET to all AWS hosted, running a mix of Ruby, and Clojure, and other bits and pieces, in about two years. And that's just continued from there. So, the move to Kubernetes was a relatively recent one; that was only within the last—I say ‘recent,' it was about two years ago that we started moving things in earnest. And then you asked what was the rationale for switching to Kubernetes—Emily: Let me first ask you, when you were talking with the owner, what were the odds that you gave him for success? Paul: [laughs]. That's a good question. I actually don't know. I think we always knew that there was a big impact to be had. I don't think we knew the scale of the upside. So, I don't think we—I mean, at the time, Uswitch was just about breaking even, so we didn't realize that there was an opportunity to radically change that. I think we underestimated how long it would take to do. So, I think we'd originally thought that we could replace maybe most of the stuff that we needed replaced within six months. We had an early prototype out within two or three weeks because we'd always placed a big emphasis on releasing early, experimenting, iterative delivery, A/B testing, that kind of thing. So, I think it was almost like that middle term that was the harder piece. And there was definitely a point where… I don't know, I think it was this classic situation of pulling on a ball of string, where what we wanted to do was to focus on improving the end-user experience because our original belief was that, aside from the scalability issues, the existing site just didn't solve the problem sufficiently well, that it needed an overhaul to simplify the journeys, and simplify the process, and improve the experience for people. We were focusing on that and we didn't want to get drawn into replacing a lot of the back office and integration-type systems, partly because there was a lot of complexity there. But also because you then have to engage with QA environments, and test environments, and sign-offs with the various people that we integrate with. But it was, as I said, this kind of tugging on a ball of string where every improvement that we made in the end-user experience—so we would increase conversion rate by 10 percent, but through doing that, we would introduce downstream errors in the ways that those systems would integrate—meant we gradually just ended up having to pull in slightly more and more pieces to make it work. I don't think we ever gave odds of success. I think we underestimated how long that middle piece would take.
I don't think we really anticipated the degree of upside that we would get as a consequence of nothing other than just making releases quicker and being able to test and move faster. And focusing on end-user experience was definitely the right thing to focus on. Emily: Do you think, though, that everybody perceived it as a risk? I'm just asking because you mentioned the blackjack; was this a risk that could fail? Paul: Well, I think the interesting thing about it was that we knew it was the right thing to do. So, again, I think our experience as consultants at ThoughtWorks was on applying continuous delivery, what we would today call DevOps, applying those practices to software delivery. And so we'd worked on systems where there weren't continuous integration servers and where people weren't releasing every day, and then we'd worked in environments where we were releasing every couple of hours, and we were very quickly able to hone in on what worked and discard things that didn't. And so I think because we'd been able to demonstrate that success within the search business, that carried a great deal of trust. And so when it came to talking about things we could potentially do, we were totally convinced that there were things that we could improve. I think it was a combination of: there was a ton of potential, we knew that there was a new confluence of technologies and approaches that could be successful if we were able to just start over, and then also probably a healthy degree of, like, naive overconfidence in what we could do, such that we would just throw ourselves into it. So, it was hard work, but yeah, it was ultimately highly successful. So, it's something I'm exceedingly proud of today. Emily: You said something really interesting, which is that Uswitch was barely profitable. And if I understand correctly, that changed for the better. Can you talk about how this is related? Paul: Yeah, sure. I think the interesting thing about it was that we knew that there was something we could do better, but we weren't sure what it was. And so the focus was always on being able to release as frequently as we possibly could to try and understand what that was, as well as trying to just simplify and pay back some of the technical debt. Well, also trying to overcome some of the artificial constraints that existed because of the technology choices that people had made—perfectly decent decisions back in the day, but platforms like AWS offered better alternatives now. So, we just focused on being able to deliver iteratively, and just kept focusing on continual improvement, releasing, understanding what the problems were, and then getting rid of those little niggly things. The manager I had at Forward was this super—I don't know, he just had the perfect ethos, and he was driven—so we were a team that was focused on doing daily experiments. And so we would rely on data on our spend and data on our revenue. And that would come in on a daily cycle. So, a lot of the rhythm of the team was driven off of that cycle. And so as we could run experiments and measure their profitability, we could then inform what we would do on the day.
And so, we had a handful of long-running technology things that we were doing, and then we would also have other tactical things that he would have ideas on; he would have some hypothesis of, well, “Maybe this is the reason that this is happening, let's come up with a test that we can use to try and figure out whether that's true.” We would quickly throw something together to help us either disprove it or support it, and we would put it live, see what happened, and then move on to the next thing. And so I think a lot of what we wanted to do was to instill a bit of that environment in Uswitch. And so a lot of it was being able to release quickly, making sure that people had good data in front of them. I mean, even tools like Google Analytics were something which we were quite au fait with using but which didn't have broad adoption at the time. And so we were using that to look at site behavior and what was going on and reason about what was happening. So, we just tried to make sure that people were directly using that, rather than just making changes on a longer cycle without data at all. Emily: And can you describe how you were working with the business side, and how you were communicating, what the sort of working relationship was like? If there were any misunderstandings on either side. Paul: Yeah, it's a good question. So, when I started at Uswitch, the organizational structure was, I guess, relatively classical. So, you had a pooled engineering team. So, it was a monolithic system, deployed onto physical infrastructure. So, there was an engineering team, there was an operations team, and then there were a handful of people that were business-specific in the different markets that we operated in. So, there were a couple of people who focused on, like, the credit card market; a couple of people who focused on energy, for example. I used to call it the stand-up swarm: so, in the morning, we would sit at our desks and you would see almost the entire office move between the different card walls that were placed around the office. Although there was a high degree of interaction between the business stakeholders, the engineers, designers, and other people, it always felt slightly weird that you would have almost all of the company interested in almost everything that was going on, and so I think the intuition we had was that a lot of the ways that we would think about structuring software, around being loosely coupled but highly cohesive, those same principles should or could apply to the organization itself. And so what we tried to do was to make sure that we had multidisciplinary teams that had the people in them to do the work. So, for the early days of the energy work, there were only a couple of us in it. So, we had a couple of engineers, and we had a lady called Emma, who was the product owner. She used to work in the production operations team, so she used to be focused on data entry from the products that different energy providers would send us, but she had the strongest insight into the domain problem, what problem consumers were trying to overcome, and what ways we could react to it. And so, when we got involved, she had a couple of ideas that she'd been trying to get traction on, that she'd been unable to. And so we had a, I don't know, probably a half-day session in an office. So, we took over the boardroom at the office and just said, “Look, we could really do with a separate space away from everybody to be able to focus on it.
And we just want to prove something out for a couple of weeks. And we want to make sure that we've got space for people to focus.” And so we had a half-day in there, and we had a conversation about, “Okay, well, what's the problem? What's the technical complexity of going after any of these things?” And there were a few nuances, too. Like, if you choose option A, then we have to get all of the historical information around it, as well as the current products and market. Whereas if we choose option B, then we can simplify it down, and we don't need to do all of that work, and we can try and experiment with something sooner. So, we wanted it to be as collaborative as possible because we knew that the way that we would be successful was by trying to execute on ideas faster than we'd been able to before. And at the same time, we also wanted to make sure that there was a feeling of momentum and that we would—I think there was probably a healthy degree of slight overconfidence, but we were also very keen to be able to show off what we could do. And so we genuinely wanted to try and improve the environment for people so that we could focus on solving problems quicker, trying out more experiments, being less hung up on whether it was absolutely the right thing to do, and instead just focusing on testing it. So, were there tensions? I think there were definitely tensions; I don't think they were so much on the technical side. We were very lucky that most of the engineers that already worked there were quite keen on doing something different, and so we would have conversations with them and just say, “Look, we'll try everything we can to remove as many of the constraints that exist today as possible.” I think a lot of the disagreement or tension was over whether or not it was the right problem to be going after. So, again, the search business that we worked in was doing a decent amount of money for the number of people that were there, and we knew there was a problem we could fix, but we didn't know how much runway it would have. And so there was a lot of tension on whether we should be pulling people into focusing on extending the search business, or whether we needed to focus on fixing Uswitch. So, there was a fair amount of back and forth about whether or not we needed to move people from one part of the business to another and that kind of thing. Emily: Let's talk a little bit about Kubernetes, and how Uswitch decided to use Kubernetes, what problem it solved, and who was behind the decision, who was really making the push. Paul: Yeah, interesting. So, I think containers were something that we'd been experimenting with for a little while. So, as I think a lot of the culture was, we were quite risk-affine. So, we were quite keen to be trying out new technologies, and we'd been using modern languages and platforms like Clojure since the early days of them being available. We'd been playing around with containers for a while, and I think we knew there was something in it, but we weren't quite sure what it was. So, I think, although we were playing around with it quite early, we were quite slow to choose one platform or another. In the intervening period, I guess, we went from the more classical way of running Puppet across a bunch of EC2 instances that ran a version of your application; the next step after that was switching over to using ECS, Amazon's container service.
And I guess the thing that prompted a bit more curiosity about Kubernetes was that—I forget the projects I was working on, but I was working on a team for a little while, and then I switched to go do something else. And I needed to put a new service up, and rather than just doing the thing that I knew, I thought, “Well, I'll go talk to the other teams.” I'll talk to some other people around the company, and find out the way that I ought to be doing this today, and there was a lot of work around standardizing the way that you would stand up an ECS cluster. But I think even then, it always felt like we were sharing things in the wrong way. So, if you were working on a team, you had to understand a great deal of Amazon to be able to make progress. And so, back when I got started at Uswitch, when I talked about doing the work on the energy migration, AWS at the time really only offered EC2, load balancers, firewalling, and then eventually relational databases, and so back then the amount of complexity to stand up something was relatively small. Then come forward to a couple of years ago: you have to appreciate and understand routing tables, VPCs, the security rules that would permit traffic to flow between those; it was just relatively non-trivial to do something that was so core to what we needed to be able to do. And I think the thing that prompted Kubernetes was that, on the Kubernetes project side, we'd seen a gradual growth and evolution of the concepts, and abstractions, and APIs that it offered. And so there was a differentiation between ECS or—I actually forget what CoreOS's equivalent was. I think maybe it was just called CoreOS. But there were a few alternative offerings for running containerized, clustered services, and Kubernetes seemed to take a slightly different approach, in that it was more focused on end-user abstractions. So, you had a notion of making a deployment: that would contain replicas of a container, and you would run multiple instances of your application, and then that would become a service, and you could then expose that via Ingress. So, there was a language that you could use to talk about your application and your system that was available to you in the environment you were actually using. Whereas AWS, I think, would take the view that, “Well, we've already got these building blocks, so what we want our users to do is assemble the building blocks that already exist.” So, you still have to understand load balancers, you still have to understand security groups, you have to understand a great deal more at a slightly lower level of abstraction. And I think the thing that seemed exciting, the potential of Kubernetes, was that if we chose something that offered better concepts, then you could reasonably have a team that would run some kind of underlying platform, and then have teams build upon that platform without having to understand a great deal about what was going on inside. They could focus more on the applications and the systems that they were hoping to build. And that would have been slightly harder with the alternatives. So, I think at the time, again, it was one of those fortunate things where I was just coming to the end of another project and was in the fortunate position where I was just looking around at the various different things that we were doing as a business, and what opportunity there was to do something that would help push things on.
And Kubernetes was one of those things which a couple of us had been talking about, and thinking, “Oh, maybe now is the time to give it a go. There's enough stability and maturity in it; we're starting to hit the problems that it's designed to address. Maybe there's a bit more appetite to do something different.” So, I think we just gave it a go. Built a proof of concept, showed that it could run the most complex system that we had, and I think also did a couple of early experiments on the ways in which Kubernetes had support for horizontal scaling and other things which were slightly harder to put into practice in AWS. And so we did all that, and I think gradually it just kind of grew from there; we took the proof of concept to other teams that were building products and services. We found a team that was struggling to keep their systems running because they were a tiny team. They only had, like, two or three engineers in. They had some stability problems over a weekend because the server ran out of hard disk space, and we just said, “Right. Well, look, if you use this, we'll take on that problem. You can just focus on the application.” It kind of just grew and grew from there. Emily: Was there anything that was a lot harder than you expected? So, I'm looking for surprises as you were adopting Kubernetes. Paul: Oh, surprises. I think there was a non-trivial amount that we had to learn about running it. And again, I think at the point at which we'd picked it up, it was, kind of, early days for automation, so there was—I think maybe Google had just launched Google Kubernetes Engine on Google Cloud. Amazon certainly hadn't even announced that hosted Kubernetes would be an option. There was an early project within Kubernetes, called kops, that you could use to create a cluster, but even then it didn't fit our network topology because it wouldn't work with the VPC networking that we needed and expected within our production infrastructure. So, there was a lot of that kind of work in the early days: to try and make something work, you had to understand in quite a lot of detail what each component of Kubernetes was doing. As we were gradually rolling it out, I think the things that were most surprising were that, for a lot of people, it solved a lot of problems, which meant they could move on, and I think people were actually slightly surprised by that. Which, [laughs], it sounds like quite a weird turn of phrase, but I think people were positively surprised at the amount of stuff that they didn't have to do, for solving a fair few of the problems that they had. There were a couple of teams doing things at a slightly larger scale, where we had to spend a bit more time improving the performance of our setup. So, in particular, there was a team that had a reasonably strong requirement on the latency overheads of Ingress. So, they wanted their application to respond within, I don't know, I think it was maybe 200 milliseconds or something. And through setting up the monitoring and other bits and pieces that we had, we realized that Ingress was doing all right, but there was a fair amount of additional latency added at the tail that was a consequence of a couple of bugs or other things that existed in the infrastructure. So, there were definitely a lot of little niggly things that came up as we were going, but we were always confident that we could overcome them. And, as I said, I think a lot of teams saw benefits very early on.
And I think the other teams were perhaps a little bit more skeptical because they'd got their own infrastructure already: they knew how to operate it, it was highly tested, they'd already run capacity and load tests on it, and they were convinced that it was the most efficient thing that they could possibly run. Even over the long run, though, I think they realized that there was more work they needed to do than they should be focusing on, and so they were ultimately quite happy to switch over to the shared platform and infrastructure that the cloud infrastructure team runs. Emily: As we wrap up, there's actually a question I want to go back to, which is how you were talking about the shifting priorities now that you've become CTO. Do you have any sort of examples of, like, what are the top three things that you will always care about, that you will always have the energy to think about? And then I'm curious to have some examples of things that you can't deal with, you can't think about. The things that tend to drop off. Paul: The top three things that I always think about. So, I think, actually, what's interesting about being CTO, that I perhaps wasn't expecting, is that you're ever so slightly removed from the work, so you can't rely on the same signals or information to be able to make a decision on things. And so when I give the Kubernetes story, it's one of those, like, because I'd moved from one system to another, and I was starting a new project, I experienced some pain. It's like, “Right. Okay, I've got to go do something to fix this. I've had enough.” And I think the thing that I'm always paying attention to now is trying to understand where that pain is next, and trying to make sure that I've got a mechanism for being able to appreciate that. So, I think a lot of the things I try to spend time on are things to help me keep track of what's going on, and then help me make decisions off the back of it. So, I think the things that I always spend time on are generally things trying to optimize some process or invest in automation. So, a good example at the moment is, we're talking about starting to do canary deployments. So, starting to automate the actual rollout of some new release, and being able to automate a comparison against the existing service, looking at latency or some kind of transactional metrics to understand whether it's performing as well as, or differently than, something historical. So, I think the things that I tend to spend time on are process-oriented, or are things to try and help us go quicker. One of the books that I read that changed my opinion of management was Andy Grove's High Output Management. And I forget who recommended it to me, but somebody recommended it to me, and it completely altered my opinion of what value a manager can add. So, one of the lenses I try to apply to anything is: of everything that's going on, what's the handful of things that are going to have the most impact or leverage across the organization? And I try and spend my time on those. I think where it gets tricky is that you have to go broad and deep. So, as much as there are broad things that have a high consequence on the organization as a whole, you also need an appreciation of what's going on in the detail, and I think that's always tricky to manage. I'm sorry, I forgot what the second part of your question was.
That presumably someone is asking you to care about, and you don't? Paul: [laughs]. Yeah, it's a good question. I don't think it's that I don't care about it. I think it's that there are some questions that come my way that I know that I can defer, or they're things which are easy to hand off. So, I think the… that is a good question. I think the things that are always tricky to prioritize are things which feel high-consequence but are potentially also very close to bikeshedding. And I think that is something which is fair—I'd be interested to hear what other people said. So, a good example is, like, choice of tooling. And so when I was working on a team, or on a problem, we would focus on choosing the right tool for the job, and we would bias towards experimenting with tools early, and figuring out what worked, and I think now you have to view the same thing through a different lens. So, there's a degree to which you also incur an organizational cost as a consequence of having high variability in the programming languages that you choose to use. And so I don't think it's something I don't care about, but it is something which is interesting: over the time I've been doing this role, I've gradually learned to let go of things that I would previously have thoroughly enjoyed getting involved in. And so you have to step back and say, “Well, actually I'm not the right person to be making a decision about which technology this team should be using. I should be trusting the team to make that decision.” And I think that over the time I've been doing the role, you kind of learn which decisions are the high-consequence ones that you should be involved in and which are the ones that you have to step back from. And you just have to say, look, I've got two hours of unblocked time this week where I can focus on something, so of the things on my priority list—the things that I've written in my journal that I want to get done this month—which of those things am I going to focus on, and which of the other things can I leave other people to get on with, and trust that things will work out all right? Emily: That's actually a very good segue into my final question, which is the same for everyone. And that is, what is an engineering tool that you can't live without—your favorite? Paul: Oh, that's a good question. So, I don't know if this is a cop-out by not mentioning something engineering-related, but I think the tool and technique which has helped me the most, as I've taken on more and more management responsibility and tried to keep track of things, is bullet journaling. So, I think, up until, I don't know, maybe five years ago, probably, I'd focused on using either iOS apps or note tools on both my laptop and phone, and so on, and it never really stuck. And bullet journaling, through using a pen and a notepad, forced me to go a bit slower. So, it forced me to write things down, to think through what was going on, and there is something about it being physical which makes me treat it slightly differently. So, I think bullet journaling is one of the things which has really helped me deal with keeping track of what's going on, and it's given me the ability to look back over the week and figure out what frustrated me and what I can change going into next week. One of the suggestions that the person who came up with bullet journaling recommends is this idea of an end-of-week reflection.
And so, one of the things I try to do (it's been harder now that I'm working at home) is to spend just 15 minutes at the end of the week thinking: what are the things that I'm really proud of? What are some good achievements that I should feel really good about going into next week? A lot of the activities that stem from bullet journaling have been really helpful. Yeah, it feels like a bit of a cop-out because it's not specifically technology related, but bullet journaling is something which has made a big difference to me.

Emily: Not at all. That's totally fair. I think you are the first person who's had a completely non-technological answer, though I think I've had someone answer Slack, something along those lines.

Paul: Yeah, what's interesting is that there are loads of those tools that we use all the time. Google Docs is something I can't live without, so there's a ton of things I use day-to-day that are hard to let go of. But I think the things that have made the most impact are the ones that help me deal with a stressful job and manage myself a little bit, and bullet journaling has been one of the most interesting things I've done.

Emily: And where can listeners connect with you or follow you?

Paul: Cool. So, I am @pingles on Twitter. My DMs are open, so if anybody wants to talk there, I'm happy to. I'm also on GitHub under pingles as well. So, @pingles in most places will get you to me.

Emily: Well, thank you so much for joining me.

Paul: Thank you for talking. It's been good fun.

Announcer: Thank you for listening to The Business of Cloud Native podcast. Keep up with the latest on the podcast at thebusinessofcloudnative.com and subscribe on iTunes, Spotify, Google Podcasts, or wherever fine podcasts are distributed. We'll see you next time.
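For readers curious what the canary comparison Paul describes might look like in practice, here is a minimal sketch in Python. It is an illustration only, not Paul's actual tooling: the metric source, the service names, and the 10% latency tolerance are all assumptions invented for the example.

```python
# A toy illustration of automated canary analysis: compare a canary's
# p99 latency against the existing service and decide whether to promote.
# fetch_p99_latency_ms is a hypothetical stand-in for querying a real
# metrics backend (e.g. Prometheus); here it just fabricates sample data.
import random
from statistics import mean

def fetch_p99_latency_ms(service: str, samples: int = 30) -> list[float]:
    # Pretend the canary is slightly slower than the baseline.
    typical = 126.0 if "canary" in service else 120.0
    return [random.gauss(typical, 5.0) for _ in range(samples)]

def should_promote(baseline: str, canary: str, tolerance: float = 0.10) -> bool:
    """Promote only if the canary's mean p99 latency is within
    `tolerance` (here 10%) of the baseline's."""
    base = mean(fetch_p99_latency_ms(baseline))
    cand = mean(fetch_p99_latency_ms(canary))
    return cand <= base * (1 + tolerance)

if __name__ == "__main__":
    if should_promote("checkout-v1", "checkout-v1-canary"):
        print("Canary healthy: shift the remaining traffic.")
    else:
        print("Regression detected: roll back the canary.")
```

In a real pipeline the same comparison would run against live metrics and gate an automated rollout step, which is the "automate the rollout, automate the comparison" loop described above.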
VPCs. VNets. Direct Connect. Kubernetes. Calico. Public clouds. Hybrid clouds. Networking is no small feat when it comes to the cloud. How does an organization keep its cloud networks from turning into a flying spaghetti monster? Day Two Cloud tackles this critical question with guest Andrew Wertkin, Chief Strategy Officer at BlueCat Networks. We discuss design tips, the critical role of DNS, monitoring and troubleshooting options, and more.
David Bombal talks to Jeremy Grossmann (creator of GNS3) about the future of GNS3. Here we discuss Dynamips and VPCS and their future in GNS3. Will they be removed from GNS3? Are they recommended? What do they actually do? What should be used instead of them? Does Dynamips support switching? In future videos we will discuss additional options in GNS3 such as Cisco VIRL and IOU.

Menu:
0:12 - Devices in GNS3. It can be confusing. What is Dynamips?
0:57 - Does GNS3 support switching?
1:17 - Are they real IOS images?
1:47 - Issue 1: Where do I get Cisco images? Cisco restrictions.
2:07 - Issue 2: Only older versions of Cisco IOS are supported on a lot of platforms
2:11 - Issue 3: Is it stable? More memory and processor intensive
2:25 - What is an Idle PC value?
4:23 - Advantage 1: Supports serial interfaces
4:50 - Dynamips is a dying product
5:00 - You can run Dynamips locally
5:40 - What does Jeremy recommend we use?
5:50 - Switching in Dynamips?
7:18 - Will Dynamips be removed from GNS3?
7:48 - What is VPCS?
8:28 - What is the advantage of VPCS?
8:55 - Should we be using VPCS?
9:58 - Will VPCS be removed from GNS3?

David's details:
YouTube: https://www.youtube.com/davidbombal
Twitter: https://www.twitter.com/davidbombal
Instagram: https://www.instagram.com/davidbombal
LinkedIn: https://www.linkedin.com/in/davidbombal
Facebook: https://www.facebook.com/davidbombal.co
Website: http://www.davidbombal.com

#gns3 #dynamips #virl