Screaming in the Cloud


Screaming in the Cloud with Corey Quinn features conversations with domain experts in the world of Cloud Computing. Topics discussed include AWS, GCP, Azure, Oracle Cloud, and the "why" behind how businesses are coming to think about the Cloud.

Corey Quinn

    • Latest episode: May 5, 2023
    • New episodes: weekdays
    • Average duration: 36m
    • Episodes: 463

    4.9 stars from 73 ratings. Listeners who love the show mention: cloud, Corey, snark, Twitter.


    Latest episodes from Screaming in the Cloud

    Operating in the Kubernetes Cloud on Amazon EKS with Eswar Bala

    May 5, 2023 • 34:29

Eswar Bala, Director of Amazon EKS at AWS, joins Corey on Screaming in the Cloud to discuss how and why AWS built a Kubernetes solution, and what customers are looking for out of Amazon EKS. Eswar reveals the concerns he sees from customers about the cost of Kubernetes, as well as the reasons customers adopt EKS over ECS. Eswar gives his reasoning on why he feels Kubernetes is here to stay and not just hype, as well as how AWS is working to reduce the complexity of Kubernetes. Corey and Eswar also explore the competitive landscape of Amazon EKS, and the new product offering from Amazon called Karpenter.

About Eswar

Eswar Bala is a Director of Engineering at Amazon and is responsible for Engineering, Operations, and Product strategy for Amazon Elastic Kubernetes Service (EKS). Eswar leads the Amazon EKS and EKS Anywhere teams that build, operate, and contribute to the services customers and partners use to deploy and operate Kubernetes and Kubernetes applications securely and at scale. With a 20+ year career in software, spanning multimedia, networking, and container domains, he has built greenfield teams and launched new products multiple times.

Links Referenced:

    • Amazon EKS:
    • EKS documentation:
    • EKS newsletter:
    • EKS GitHub:

Transcript

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.

Corey: It's easy to **BEEP** up on AWS. Especially when you're managing your cloud environment on your own! Mission Cloud un-**BEEP**s your apps and servers. Whatever you need in AWS, we can do it. Head to for the AWS expertise you need.

Corey: Welcome to Screaming in the Cloud, I'm Corey Quinn. Today's promoted guest episode is brought to us by our friends at Amazon.
Now, Amazon is many things: they sell underpants, they sell books, they sell books about underpants, and underpants featuring pictures of books, but they also have a minor cloud computing problem. In fact, some people would call them a cloud computing company with a gift shop that's attached. Now, the problem with wanting to work at a cloud company is that their interviews are super challenging to pass. If you want to work there, but can't pass the technical interview, for a long time the way to solve that has been, “Ah, we're going to run Kubernetes so we get to LARP as if we worked at a cloud company but don't.” Eswar Bala is the Director of Engineering for Amazon EKS and is going to basically suffer my slings and arrows about one of the most complicated, and I would say overwrought, best practices that we're seeing industry-wide. Eswar, thank you for agreeing to subject yourself to this nonsense.

Eswar: Hey, Corey, thanks for having me here.

Corey: [laugh]. So, I'm a little bit unfair to Kubernetes because I wanted to make fun of it and ignore it. But then I started seeing it in every company that I deal with in one form or another. So yes, I can still sit here and shake my fist at the tide, but it's turned into, “Old Man Yells at Cloud,” which I'm thrilled to embrace, but everyone's using it. So, EKS has recently crossed, I believe, the five-year mark since it was initially launched. What is EKS other than Amazon's own flavor of Kubernetes?

Eswar: You know, the best way I can define EKS is, EKS is just Kubernetes. Not Amazon's version of Kubernetes. It's just Kubernetes that we get from the community and offer it to customers to make it easier for them to consume. So, EKS. I've been with EKS from the very beginning when we thought about offering a managed Kubernetes service in 2017. And at that point, the goal was to bring Kubernetes to enterprise customers.
So, we had many customers telling us that they wanted us to make their lives easier by offering a managed version of Kubernetes that they'd actually begun to [adopt 00:02:42] at that time period, right? So, my goal was to figure out what that service looked like and which customer base we should be targeting the service towards.

Corey: Kelsey Hightower has a fantastic learning tool out there in a GitHub repo called, “Kubernetes the Hard Way,” where he talks you through building the entire thing, start to finish. I wound up forking it and doing that on top of AWS, and you can find that at And that was fun. And I went through the process and my response at the end was, “Why on earth would anyone ever do this more than once?” And we got that sorted out, but now it's—customers aren't really running these things from scratch. It's like the Linux From Scratch project. Great learning tool; probably don't run this in production in the same way that you might otherwise because there are better ways to solve for the problems that you will have to solve yourself when you're building these things from scratch. So, as I look across the ecosystem, it feels like EKS stands in the place of the heavy, undifferentiated lifting of running the Kubernetes control plane so customers functionally don't have to. Is that an effective summation of this?

Eswar: That is precisely right. And I'm glad you mentioned, “Kubernetes the Hard Way,” I was a big fan of that when it came out. And anyone who did that tutorial, and also your tutorial, “Kubernetes the Harder Way,” would walk away thinking, “Why would I pick this technology when it's super complicated to set up?” But then you see that customers love Kubernetes and you see that reflected in the adoption, even in the 2016, 2017 timeframes. And the reason is, it made life easier for application developers in terms of offering web services that they wanted to offer to their customer base.
And because of all the features that Kubernetes brought on, application lifecycle management, service discovery, and then it evolved to support various application architectures, right, in terms of stateless services, stateful applications, and even daemon sets, right, like for running your logging and metrics agents. And these are powerful features, at the end of the day, and that's what drove Kubernetes. And because it's super hard to get going to begin with and then to operate, the day-two operator experience is super complicated.

Corey: And the day one experience is super hard and the day two experience of, “Okay, now I'm running it and something isn't working the way it used to. Where do I start,” has been just tremendously overwrought. And frankly, more than a little intimidating.

Eswar: Exactly. Right? And that exactly was our opportunity when we started in 2017. And when we started, there was a question of, okay, should we really build a service when you have an existing service like ECS in place? And by the way, like, I did work in ECS before I started working in EKS from the beginning. So, the answer then was, it was about giving what customers want. And there was space for many container orchestration systems, right; ECS was the AWS service at that point in time. And our thinking was, how do we give customers what they wanted? They wanted a Kubernetes solution. Let's go build that. But we built it in a way that we remove the undifferentiated heavy lifting of managing Kubernetes.

Corey: One of the weird things that I find is that everyone's using Kubernetes, but I don't see it in the way that I contextualize the AWS universe, which of course, is on the bill. That's right. If you don't charge for something in AWS Lambda, and preferably a fair bit, I don't tend to know it exists. Like, “What's an IAM and what might that possibly do?” Always a reassuring thing to hear from someone who's often called an expert in this space.
But you know, if it doesn't cost money, why do I pay attention to it? The control plane is what EKS charges for, unless you're running a bunch of Fargate-managed pods and containers to wind up handling those things. So, it mostly just shows up as an addendum to the actual big, meaty portions of the bill. It just looks like a bunch of EC2 instances with some really weird behavior patterns, particularly with regard to auto-scaling and crosstalk between all of those various nodes. So, it's a little bit of a murder mystery, figuring out, “So, what's going on in this environment? Do you folks use containers at all?” And the entire Kubernetes shop is looking at me like, “Are you simple?” No, it's just I tend to disregard the lies that customers say, mostly to themselves, because everyone has this idea of what's going on in their environment, but the bill speaks. It's always been a little bit of an investigation to get to the bottom of anything that involves Kubernetes at significant points of scale.

Eswar: Yeah, you're right. Like, if you look at EKS, right, like, we started with managing the control plane to begin with. And managing the control plane is a drop in the bucket when you actually look at the costs in terms of operating a Kubernetes cluster or running a Kubernetes cluster. When you look at how our customers use and where they spend most of their cost, it's about where their applications run; it's actually the Kubernetes data plane, and the amount of compute and memory that the applications end up using ends up driving 90% of the cost. And beyond that is the storage, beyond that is the networking cost, right, and then after that is the actual control plane cost. So, the problem right now is figuring out, how do we optimize the cost of what the applications run on?

Corey: On some level, it requires a little bit of understanding of what's going on under the hood.
There have been a number of cost optimization efforts that have been made in the Kubernetes space, but they tend to focus around stuff that I find relatively, well, I call it banal because it basically is. You're looking at the idea of, okay, what size instances should you be running, and how well can you fill them and make sure that all the resources per node wind up being taken advantage of? But that's also something that, I guess from my perspective, isn't really the interesting architectural point of view. Whether or not you're running a bunch of small instances or a few big ones or some combination of the two, that doesn't really move the needle on any architectural shift, whereas ingesting a petabyte a month of data and passing 50 petabytes back and forth between availability zones, that's where it starts to get really interesting as far as tracking that stuff down. But what I don't see is a whole lot of energy or effort being put into that. And I mean, industry-wide, to be clear. I'm not attempting to call out Amazon specifically on this. That's [laugh] not the direction I'm taking this in. For once. I know, I'm still me. But it seems to be just an industry-wide issue, where zone affinity for Kubernetes has been a very low priority item, even on project roadmaps on the Kubernetes project.

Eswar: Yeah, Kubernetes does provide the ability for customers to restrict their workloads within a particular [unintelligible 00:09:20], right? Like, there are constraints that you can place on your pod specs that end up driving applications towards a particular AZ if they want, right? You're right, it's still left to the customers to configure. Just because there's a configuration available doesn't mean the customers use it.
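The pod-spec constraint Eswar is describing, pinning a workload to a particular availability zone, looks roughly like the following. This is an illustrative sketch, not something from the episode: the pod name, image, and zone value are hypothetical, and the manifest is built as plain Python dicts rather than applied to a real cluster. The `topology.kubernetes.io/zone` key is the well-known Kubernetes node label for the availability zone.

```python
# Illustrative sketch of a zone-affinity constraint on a pod spec.
# Names, image, and zone are hypothetical; nothing here touches a cluster.

def zone_pinned_pod(name: str, image: str, zone: str) -> dict:
    """Build a pod manifest whose scheduling is restricted to one AZ."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [{"name": name, "image": image}],
            "affinity": {
                "nodeAffinity": {
                    # "required...": a hard constraint, so the scheduler will
                    # only place the pod on nodes labeled with this zone.
                    "requiredDuringSchedulingIgnoredDuringExecution": {
                        "nodeSelectorTerms": [{
                            "matchExpressions": [{
                                "key": "topology.kubernetes.io/zone",
                                "operator": "In",
                                "values": [zone],
                            }]
                        }]
                    }
                }
            },
        },
    }

pod = zone_pinned_pod("api", "nginx:1.25", "us-east-1a")
```

Serialized to YAML and applied, a spec like this keeps the pod's traffic within one AZ, which is exactly the kind of opt-in configuration that, as Eswar notes next, most customers never set.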
If it's not defaulted, most of the time, it's not picked up. That's where it's important for service providers—like EKS—to offer the ability to not only provide visibility, by means of reporting, using tools like [Kubecost 00:09:50] and AWS Cost Explorer, but also provide insights and recommendations on what customers can do. I agree that there's a gap today in EKS, for example, in terms of that. Like, we're slowly closing that gap and it's something that we're actively exploring. How do we provide insights across all the resources customers end up using from within a cluster? That includes not just compute and memory, but also storage and networking, right? And that's where we are actually moving towards at this point.

Corey: That's part of the weird problem I've found is that, on some level, you get to play almost data center archaeologist when you start exploring what's going on in these environments. I found one of the only reliable ways to get answers to some of this stuff has been oral tradition of, “Okay, this Kubernetes cluster just starts hurling massive data quantities at 3 a.m. every day. What's causing that?” And it leads to, “Oh, no no, have you talked to the data science team,” like, “Oh, you have a data science team. A common AWS billing mistake.” And exploring down that particular path sometimes pays dividends. But there's no holistic way to solve that globally. Today. I'm optimistic about tomorrow, though.

Eswar: Correct. And that's where we are spending our efforts right now. For example, we recently launched our partnership with Kubecost, and Kubecost is now available as an add-on from the Marketplace that you can easily install and provision on EKS clusters, for example. And that is a start.
And Kubecost is amazing in terms of features, in terms of the insight it offers, right, looking into compute, the memory, and the optimizations and insights it provides you. And we are also working with the AWS Cost and Usage Reporting team to provide a native AWS solution for the cost reporting and the insights aspect as well in EKS. And it's something that we are going to be working really closely on to solve the networking gaps in the near future.

Corey: What are you seeing as far as customer concerns go, with regard to cost and Kubernetes? I see some things, but let's be very clear here, I have a certain subset of the market that I spend an inordinate amount of time speaking to and I always worry that what I'm seeing is not holistically what's going on in the broader market. What are you seeing customers concerned about?

Eswar: Well, let's start from the fundamentals here, right? Customers really want to get to market faster with whatever services and applications they want to offer. And they want it to be cheaper to operate. And if they're adopting EKS, they want Kubernetes in the cloud to be cheaper to operate. They also want high performance, they also want scalability, and they want security and isolation. There are so many parameters that they have to deal with before they put their service on the market and continue to operate. And there's a fundamental tension here, right? Like, they want cost efficiency, but they also want to be available in the market quicker and they want performance and availability. Developers have uptime SLOs and SLAs to consider and they want the maximum possible resources. And on the other side, you've got financial leaders and the business leaders who want to look at the spending and worry about, like, okay, are we allocating our capital wisely? And are we allocating where it makes sense? And are we doing it in a manner that there's very little wastage and that's aligned with our customer use, for example?
And this is where the actual problems arise from [unintelligible 00:13:00].

Corey: I want to be very clear that for a long time, one of the most expensive parts about running Kubernetes has not been the infrastructure itself. It's been the people to run this responsibly, where it's the day two, day three experience where for an awful lot of companies like, oh, we're moving to Kubernetes because, I don't know, we read it in an in-flight magazine or something and all the cool kids are doing it, which honestly during the pandemic is why suddenly everyone started making better IT choices, because their execs were not being exposed to airport ads. I digress. The point, though, is that as customers are figuring this stuff out and playing around with it, it's not sustainable that every company that wants to run Kubernetes can afford a crack SRE team that is individually incredibly expensive and collectively staggeringly so. It seems the real cost is the complexity tied to it. And EKS has been great in that it abstracts an awful lot of the control plane complexity away. But I still can't shake the feeling that running Kubernetes is mind-bogglingly complicated. Please argue with me and tell me I'm wrong.

Eswar: No, you're right. It's still complicated. And it's a journey towards reducing the complexity. When we launched EKS, we launched only with managing the control plane to begin with. And that's where we started, but customers had the complexity of managing the worker nodes. And then we evolved to manage the Kubernetes worker nodes in terms of two products: we've got Managed Node Groups and Fargate. And then customers moved on to installing more agents in their clusters before they actually installed their business applications, things like Cluster Autoscaler, things like Metrics Server, critical components that they have come to rely on, but that don't drive their business logic directly.
They are supporting aspects of driving core business logic. And that's how we evolved into managing the add-ons to make life easier for our customers. And it's a journey where we continue to reduce the complexity to make it easier for customers to adopt Kubernetes. And once you cross that chasm—and we are still trying to cross it—once you cross it, you have the problem of, okay so, adopting Kubernetes is easy. Now, we have to operate it, right, which means that we need to provide better reporting tools, not just for costs, but also for operations. Like, how easy it is for customers to get to the application-level metrics, how easy it is for customers to troubleshoot issues, how easy it is for customers to actually upgrade to newer versions of Kubernetes. All of these challenges come out beyond day one, right? And those are initiatives that we have in flight to make it easier for customers [unintelligible 00:15:39].

Corey: So, one of the things I see when I start going deep into the Kubernetes ecosystem is, well, Kubernetes will go ahead and run the containers for me, but now I need to know what's going on in various areas around it. One of the big booms in the observability space, in many cases, has come from the fact that you now need to diagnose something in a container you can't log into that incidentally stopped existing 20 minutes before you got the alert about the issue, so you'd better hope your telemetry is up to snuff. Now, yes, that does act as a bit of a complexity burden, but on the other side of it, we don't have to worry about things like failed hard drives taking systems down anymore. That has successfully been abstracted away by Kubernetes, or you know, your cloud provider, but that's neither here nor there these days. What are you seeing as far as, effectively, the sidecar pattern, for example of, “Oh, you have too many containers and need to manage them?
Have you considered running more containers?” Sounds like something a container salesman might say.

Eswar: So, running containers demands that you have really solid observability tooling, things that let you troubleshoot—successfully—and debug without the need to log into the containers themselves. In fact, that's an anti-pattern, right? You really don't want to have the ability to SSH into a particular container, for example. And to be successful at it demands that you publish your metrics and you publish your logs. All of these are things that a developer needs to worry about today in order to adopt containers, for example. And it's on the service providers to actually make it easier for the developers not to worry about these. And all of these are available automatically when you adopt a Kubernetes service. For example, in EKS, we are working with our managed Prometheus service teams inside Amazon, right—and also CloudWatch teams—to easily enable metrics and logging for customers without having to do a lot of heavy lifting.

Corey: Let's talk a little bit about the competitive landscape here. One of my biggest competitors in optimizing AWS bills is Microsoft Excel, specifically, people are going to go ahead and run it themselves because, “Eh, hiring someone who's really good at this, that sounds expensive. We can screw it up for half the cost.” Which is great. It seems to me that one of your biggest competitors is people running their own control plane, on some level. I don't tend to accept the narrative that, “Oh, EKS is expensive,” when it winds up being, what, 35 bucks or 70 bucks or whatever it is per control plane per cluster on a monthly basis. Okay, yes, that's expensive if you're trying to stay completely within a free tier, perhaps, but if you're running anything that's even slightly revenue-generating or a for-profit company, you will spend far more than that just on people's time. I have no problems—for once—with the EKS pricing model, start to finish.
Good work on that. You've successfully nailed it. But are you seeing significant pushback from the industry of, “Nope, we're going to run our own Kubernetes management system instead because we enjoy pain, corporately speaking”?

Eswar: Actually, we are in a good spot there, right? Like, at this point, customers who choose to run Kubernetes on AWS by themselves and not adopt EKS fall into one main category, so—or two main categories: number one, they have an existing technical stack built on running Kubernetes themselves and they'd rather maintain that and not move to EKS. Or they demand certain custom configurations of the Kubernetes control plane that EKS doesn't support. And those are the only two reasons why we see customers not moving onto EKS and preferring to run their own Kubernetes on AWS clusters.

[midroll 00:19:46]

Corey: It really does seem, on some level, like there's going to be a… I don't want to say reckoning because that makes it sound vaguely ominous and that's not the direction that I intend for things to go in, but there has to be some form of collapsing of the complexity that is inherent to all of this because the entire industry has always done that. An analogy that I fall back on because I've seen this enough times to have the scars to show for it is that in the '90s, running a web server took about a week of spare time and an in-depth knowledge of GCC compiler flags. And then it evolved to, ah, I could just unzip a tarball of precompiled stuff, and then RPM or Deb became a thing. And then Yum, or something else, or I guess apt over in the Debian land, wound up wrapping around that. And then you had things like Puppet where it was just ensure installed. And now it's docker run. And today, it's a checkbox in the S3 console that proceeds to yell at you because you're making a website public. But that's neither here nor there. Things don't get harder with time.
But I've been surprised by how I haven't yet seen that sort of geometric complexity collapsing around Kubernetes to make it easier to work with. Is that coming or are we going to have to wait for the next cycle of things?

Eswar: Let me think. I actually don't have a good answer to that, Corey.

Corey: That's good, at least, because if you did, I'd worry that I was just missing something obvious. That's kind of the entire reason I ask. Like, “Oh, good. I get to talk to smart people and see what they're picking up on that I'm absolutely missing.” I was hoping you had an answer, but I guess it's cold comfort that you don't have one off the top of your head. But man, is it confusing.

Eswar: Yeah. So, there are some discussions in the community out there, right? Like, is Kubernetes the right layer to interact with? And there is some tooling that's built on top of Kubernetes, for example, Knative, that tries to provide a serverless layer on top of Kubernetes, for example. There are also attempts at abstracting Kubernetes completely and providing tooling that just completely removes any sort of Kubernetes API from the picture, maybe a specific CI/CD-based solution that takes it from the source and deploys the service without even showing you that there's Kubernetes underneath, right? All of these are evolutions that are being tested out there in the community. Time will tell whether these end up sticking. But what's clear here is the gravity around Kubernetes. All sorts of tooling gets built on top of Kubernetes, all the operators, all sorts of open-source initiatives that are built to run on Kubernetes. For example, Spark, for example, Cassandra, so many of these big, large-scale, open-source solutions are now built to run really well on Kubernetes. And that is the gravity that's pushing Kubernetes at this point.

Corey: I'm curious to get your take on one other space I would consider interestingly competitive.
Now, because I have a domain problem, if you go to, you'll wind up on the ECS marketing page. That's right, the worst competition in the world: the people who work down the hall from you. If someone's considering using ECS, Elastic Container Service, versus EKS, Elastic Kubernetes Service, what is the deciding factor when a customer's making that determination? And to be clear, I'm not convinced there's a right or wrong answer. But I am curious to get your take, given that you have a vested interest, but also presumably don't want to talk complete smack about your colleagues. But feel free to surprise me.

Eswar: Hey, I love ECS, by the way. Like I said, I started my life in AWS in ECS. So look, ECS is a hugely successful container orchestration service. I know we talk a lot about Kubernetes, I know there's a lot of discussion around Kubernetes, but I would make it a point that, like, ECS is a hugely successful service. Now, what determines which way customers go? If customers are… if the customer's tech stack is entirely on AWS, right, they use a lot of AWS services and they want an easy way to get started in the container world that has really tight integration with other AWS services without them having to configure a lot, ECS is the way, right? And customers have actually seen terrific success adopting ECS for that particular use case. Whereas EKS customers, they start with, “Okay, I want an open-source solution. I really love Kubernetes. I lo—or, I have tooling that I really like in the open-source land that really works well with Kubernetes. I'm going to go that way.” And those kinds of customers end up picking EKS.

Corey: I feel like, on some level, Kubernetes has become the default API across a wide variety of environments. AWS obviously, but on-prem and other providers.
It seems like even the traditional VPS companies out there that offer just rent-a-server in the cloud somewhere are all also offering, “Oh, and we have a Kubernetes service as well.” I wound up backing a Kickstarter project that runs a Kubernetes cluster with a shared backplane across a variety of Raspberries Pi, for example. And it seems to be almost everywhere you look. Do you think that there's some validity to that approach of, effectively, whatever it is that we're going to wind up running in the future, it's going to be done on top of Kubernetes, or do you think that that's mostly hype-driven these days?

Eswar: It's definitely not hype. We see the proof in the kind of adoption we see. It's becoming the de facto container orchestration API. And with all the open-source tooling that's continuing to build on top of Kubernetes, the CNCF tooling ecosystem that's spawned to support Kubernetes adoption, all of this is solid proof that Kubernetes is here to stay and is a really strong, powerful API for customers to adopt.

Corey: So, four years ago, I had a prediction on Twitter, and I said, “In five years, nobody will care about Kubernetes.” And it was in February, I believe, and every year, I wind up updating and incrementing a link to it, like, “Four years to go,” “Three years to go,” and I believe it expires next year. And I have to say, I didn't really expect when I made that prediction for it to outlive Twitter, but yet, here we are, which is neither here nor there. But I'm curious to get your take on this. But before I wind up just letting you savage the naive interpretation of that, my impression has been that it will not be that Kubernetes has gone away. That is ridiculous.
It is clearly in enough places that even if they decided to rip it out now, it would take them ten years, but rather that it's going to slip below the surface level of awareness. Once upon a time, there was a whole bunch of energy and drama and debate around the Linux virtual memory management subsystem. And today, there's, like, a dozen people on the planet who really have to care about that, but for the rest of us, it doesn't matter anymore. We are so far past having to care about that having any meaningful impact in our day-to-day work that it's just, it's the part of the iceberg that's below the waterline. I think that's where Kubernetes is heading. Do you agree or disagree? And what do you think about the timeline?

Eswar: I agree with you; that's a perfect analogy. It's going to go the way of Linux, right? It's here to stay; it's just going to get abstracted out if any of the abstraction efforts stick around. And that's where we're testing the waters there. There are many, many open-source initiatives there trying to abstract Kubernetes. All of these are yet to gain ground, but there are some reasonable efforts being made. And if they are successful, they just end up being a layer on top of Kubernetes. Many of the customers, many of the developers, won't have to worry about Kubernetes at that point, but a certain subset of us in the tech world will still need to deal with Kubernetes, most likely teams like mine that end up managing and operating Kubernetes clusters.

Corey: So, one last question I have for you is that if there's one thing that AWS loves, it's misspelling things. And you have an open-source offering called Karpenter, spelled with a K, that is extending that tradition. What does Karpenter do and why would someone use it?

Eswar: Thank you for that. Karpenter is one of my favorite launches in the last year.

Corey: Presumably because you were terrible at the spelling bee back when you were a kid.
But please tell me more.

Eswar: [laugh]. So, Karpenter is an open-source, flexible, and high-performance cluster auto-scaling solution. So basically, when your cluster needs more capacity to support your workloads, Karpenter automatically scales the capacity as needed. For people that know the Kubernetes space well, there's an existing component called Cluster Autoscaler that fills this space today. And it's our take on, okay, what if we could reimagine the capacity management solution available in Kubernetes? And can we do something better? Especially for cases where we expect terrific performance at scale, to enable cost efficiency and optimization use cases for our customers, and most importantly, provide a way for customers not to have to pre-plan a lot of capacity to begin with.

Corey: This is something we see a lot, in the sense of very bursty workloads where, okay, you've got a steady-state load. Cool. Buy a bunch of savings plans, get things set up the way you want them, and call it a day. But when it's bursty, there are challenges with it. Folks love using Spot, but in the event of a sudden capacity shortfall, the question is, can we spin up capacity to backfill it within the two minutes of warning that we get? And if the answer is no, then it becomes a bit of a non-starter. Customers have had to build an awful lot of things around EC2 instances that handle a lot of that logic for them in ways that are tuned specifically for their use cases. I'm encouraged to see there's a Kubernetes story around this that starts to remove some of that challenge from the customer side.

Eswar: Yeah. So, the burstiness is where complexity comes [here 00:29:42], right? Like, many customers, for steady state, they know what their capacity requirements are, they set up the capacity, they can also reason out what is the effective capacity needed for good utilization for economical reasons, and they can actually pre-plan that and set it up.
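The trade-off Eswar describes, pre-provisioning for the peak versus scaling up only during the burst, can be put in rough numbers. This is a purely illustrative back-of-the-envelope sketch; the node counts, prices, and burst durations are made up and do not come from the episode:

```python
# Illustrative only: compare paying for peak capacity all month
# versus adding nodes just for a daily burst window.
# All prices and node counts below are hypothetical.

HOURS_PER_MONTH = 730
NODE_PRICE_PER_HOUR = 0.10  # hypothetical on-demand price per node

def monthly_cost(nodes: int, hours: float) -> float:
    """Cost of running `nodes` nodes for `hours` hours in a month."""
    return nodes * hours * NODE_PRICE_PER_HOUR

# Suppose steady state needs 10 nodes, but a 2-hour daily burst needs 40.
steady = monthly_cost(10, HOURS_PER_MONTH)

# Option A: pre-plan for the peak and keep 40 nodes up all month.
pre_provisioned_peak = monthly_cost(40, HOURS_PER_MONTH)

# Option B: keep 10 nodes up and autoscale 30 extra nodes only
# during the burst (2 hours a day, ~30 days a month).
autoscaled = steady + monthly_cost(30, 2 * 30)

print(f"pre-provisioned for peak: ${pre_provisioned_peak:,.2f}/month")
print(f"autoscaled for bursts:    ${autoscaled:,.2f}/month")
```

With these made-up numbers, keeping the peak capacity running costs several times what burst-only scaling does, which is the idle-capacity waste that an autoscaler like Karpenter is aimed at eliminating.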
But once burstiness comes in, which it inevitably does at [unintelligible 00:30:05] applications, customers worry about, “Okay, am I going to get the capacity that I need in time that I need to be able to service my customers? And am I confident in it?”If I'm not confident, I'm going to actually allocate capacity beforehand, assuming that I'm going to actually get the burst that I needed. Which means, you're paying for resources that you're not using at the moment. And the burstiness might happen and then you're on the hook to actually reduce the capacity for it once the peak subsides at the end of the [day 00:30:36]. And this is a challenging situation. And this is one of the use cases that we targeted Karpenter towards.Corey: I find the idea that you're open-sourcing this fascinating for two reasons. One, it does show a willingness to engage with the community that… again, it's difficult. When you're a big company, people love to wind up taking issue with almost anything that you do. But for another, it also puts it out in the open, on some level, where, especially when you're talking about cost optimization and decisions that affect cost, it's all out in public. So, people can look at this and think, “Wait a minute, it's not—what is this line of code that means if it's toward the end of the month, crank it up because we might need to hit our numbers.” Like, there's nothing like that in there. At least I'm assuming. I'm trusting that other people have read this code because honestly, that seems like a job for people who are better at that than I am. But that does tend to breed a certain element of trust.Eswar: Right. It's one of the first things that we thought about when we said okay, so we have some ideas here to actually improve the capacity management solution for Kubernetes. Okay, should we do it out in the open? And the answer was a resounding yes, right? 
I think there's a good story here that actually enables not just AWS to offer these ideas out there, right—we want to bring it to all sorts of Kubernetes customers.And one of the first things we did is to architecturally figure out all the core business logic of Karpenter, which is, okay, how to schedule better, how quickly to scale, what are the best instance types to pick for this workload. All of that business logic was abstracted out from the actual cloud provider implementation. And the cloud provider implementation is super simple. It's just creating instances, deleting instances, and describing instances. And it's something that we baked in from the get-go so it's easier for other cloud providers to come in and add their support to it. And we as a community actually can take these ideas forward in a much faster way than just AWS doing it.Corey: I really want to thank you for taking the time to speak with me today about all these things. If people want to learn more, where's the best place for them to find you?Eswar: The best place to learn about EKS, right, as EKS evolves, is using our documentation. We have an EKS newsletter that you can go subscribe to, and you can also find us on GitHub where we share our product roadmap. So, those are great places to learn about how EKS is evolving and also to share your feedback.Corey: Which is always great to hear, as opposed to, you know, in the AWS Console, where we live, waiting for you to stumble upon us, which, yeah. No, it's good you have a lot of different places for people to engage with you. And we'll put links to that, of course, in the [show notes 00:33:17]. Thank you so much for being so generous with your time. I appreciate it.Eswar: Corey, really appreciate you having me.Corey: Eswar Bala, Director of Engineering for Amazon EKS. I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud. 
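The provider boundary Eswar describes above, just creating, deleting, and describing instances, can be sketched as a minimal interface. This is an illustrative Python sketch, not Karpenter's actual Go interface; the method names and the `FakeProvider` stand-in are invented here to show why a small boundary makes the core logic portable and testable.

```python
# Sketch of a deliberately tiny cloud-provider boundary: the core
# scheduling logic only ever talks to these three operations, so any
# cloud (or an in-memory fake, for tests) can plug in behind it.
from abc import ABC, abstractmethod

class CloudProvider(ABC):
    @abstractmethod
    def create_instance(self, instance_type: str) -> str: ...
    @abstractmethod
    def delete_instance(self, instance_id: str) -> None: ...
    @abstractmethod
    def describe_instances(self) -> list: ...

class FakeProvider(CloudProvider):
    """In-memory stand-in, handy for exercising core logic offline."""
    def __init__(self):
        self._instances = {}
        self._next_id = 0

    def create_instance(self, instance_type):
        self._next_id += 1
        iid = "i-%04d" % self._next_id
        self._instances[iid] = instance_type
        return iid

    def delete_instance(self, instance_id):
        del self._instances[instance_id]

    def describe_instances(self):
        return sorted(self._instances)

provider = FakeProvider()
provider.create_instance("m5.large")
print(provider.describe_instances())  # ['i-0001']
```

Keeping the boundary this small is what lets other clouds add support without touching the scheduling brains, which is the community story Eswar is making.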
If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice telling me why, when it comes to tracking Kubernetes costs, Microsoft Excel is in fact the superior experience.Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit to get started.

    Learning eBPF with Liz Rice

    Play Episode Listen Later May 2, 2023 33:59

    Liz Rice, Chief Open Source Officer at Isovalent, joins Corey on Screaming in the Cloud to discuss the release of her newest book, Learning eBPF, and the exciting possibilities that come with eBPF technology. Liz explains what got her so excited about eBPF technology, and what it was like to write a book while also holding a full-time job. Corey and Liz also explore the learning curve that comes with kernel programming, and Liz illustrates why it's so important to be able to explain complex technologies in simple terminology. About LizLiz Rice is Chief Open Source Officer with eBPF specialists Isovalent, creators of the Cilium cloud native networking, security and observability project. She sits on the CNCF Governing Board, and on the Board of OpenUK. She was Chair of the CNCF's Technical Oversight Committee in 2019-2022, and Co-Chair of KubeCon + CloudNativeCon in 2018. She is also the author of Container Security, and Learning eBPF, both published by O'Reilly.She has a wealth of software development, team, and product management experience from working on network protocols and distributed systems, and in digital technology sectors such as VOD, music, and VoIP. When not writing code, or talking about it, Liz loves riding bikes in places with better weather than her native London, competing in virtual races on Zwift, and making music under the pseudonym Insider Nine.Links Referenced: Isovalent: Learning eBPF: Container Security: GitHub for Learning eBPF: TranscriptAnnouncer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. 
Our returning guest today is Liz Rice, who remains the Chief Open Source Officer with Isovalent. But Liz, thank you for returning, suspiciously closely timed to when you have a book coming out. Welcome back.Liz: [laugh]. Thanks so much for having me. Yeah, I've just—I've only had the physical copy of the book in my hands for less than a week. It's called Learning eBPF. I mean, obviously, I'm very excited.Corey: It's an O'Reilly book; it has some form of honeybee on the front of it as best I can tell.Liz: Yeah, I was really pleased about that. Because eBPF has a bee as its logo, so getting a [early 00:01:17] honeybee as the O'Reilly animal on the front cover of the book was pretty pleasing, yeah.Corey: Now, this is your second O'Reilly book, is it not?Liz: It's my second full book. So, I'd previously written a book on Container Security. And I've done a few short reports for them as well. But this is the second, you know, full-on, you can buy it on Amazon kind of book, yeah.Corey: My business partner wrote Practical Monitoring for O'Reilly and that was such an experience that he got entirely out of observability as a field and ran running to AWS bills as a result. So, my question for you is, why would anyone do that more than once?Liz: [laugh]. I really like explaining things. And I had a really good reaction to the Container Security book. I think already, by the time I was writing that book, I was kind of interested in eBPF. And we should probably talk about what that is, but I'll come to that in a moment.Yeah, so I've been really interested in eBPF, for quite a while and I wanted to be able to do the same thing in terms of explaining it to people. A book gives you a lot more opportunity to go into more detail and show people examples and get them kind of hands-on than you can do in their, you know, 40-minute conference talk. So, I wanted to do that. 
I will say I have written myself a note to never do a full-size book while I have a full-time job because it's a lot [laugh].Corey: You do have a full-time job and then some. As we mentioned, you're the Chief Open Source Officer over at Isovalent, you are on the CNCF governing board, you're on the board of OpenUK, and you've done a lot of other stuff in the open-source community as well. So, I have to ask, taking all of that together, are you just allergic to things that make money? I mean, writing the book as well on top of that. I'm told you never do it for the money piece; it's always about the love of it. But it seems like, on some level, you're taking it to an almost ludicrous level.Liz: Yeah, I mean, I do get paid for my day job. So, there is that [laugh]. But so, yeah—Corey: I feel like the only way to really write a book is to wind up doing it for—well, what someone else is paying you to do, viewing it as a marketing exercise. It pays dividends, but those dividends don't, in my experience and from what I've heard from everyone, pay off in royalties on book payments.Liz: Yeah, I mean, it's certainly, you know, not a bad thing to have that income stream, but it certainly wouldn't make you—you know, I'm not going to retire tomorrow on the royalty stream unless this podcast sends loads and loads of people to buy the book [laugh].Corey: Exactly. And I'm always a fan of having such [unintelligible 00:03:58]. I will order it while we're on the call right now having this conversation because I believe in supporting the things that we want to see more of in the world. So, explain to me a little bit about what it is. Whenever you're talking about learning X in a title, I find that that's often going to be much more approachable than arcane nonsense deep-dive things.One of the O'Reilly books that changed my understanding was Linux Kernel Internals, or Understanding the Linux Kernel. 
Understanding was kind of a heavy lift at that point because it got very deep very quickly, but I absolutely came away understanding what was going on a lot more effectively, even though I was so slow I needed a tow rope on some of it. When you have a book that starts with learning, though, I imagine it assumes starting at zero with, “What's eBPF?” Is that directionally correct, or does it assume that you know a lot of things you don't?Liz: Yeah, that's absolutely right. I mean, I think eBPF is one of these technologies that is starting to be, particularly in the cloud-native world, you know, it comes up; it's quite a hot technology. What it actually is, so it's an acronym, right? EBPF. That acronym is almost meaningless now.So, it stands for extended Berkeley Packet Filter. But I feel like it does so much more than filtering that we might as well forget that altogether. And it's just become a term, a name in its own right if you like. And what it really does is it lets you run custom programs in the kernel so you can change the way that the kernel behaves, dynamically. And that is… it's a superpower. It's enabled all sorts of really cool things that we can do with that superpower.Corey: I just pre-ordered it as a paperback on Amazon and it shows me that it is now number one new release in Linux Networking and Systems Administration, so you're welcome. I'm sure it was me that put it over the top.Liz: Wonderful. Thank you very much. Yeah [laugh].Corey: Of course, of course. Writing a book is one of those things that I've always wanted to do, but never had the patience to sit there and do it or I thought I wasn't prolific enough, but over the holidays, this past year, my wife and business partner and a few friends all chipped in to have all of the tweets that I'd sent bound into a series of leather volumes. Apparently, I've tweeted over a million words. And… yeah, oh, so I have to write a book 280 characters at a time, mostly from my phone. 
I should tweet less was really the takeaway that I took from a lot of that.But that wasn't edited, that wasn't with an overall theme or a narrative flow the way that an actual book is. It just feels like a term paper on steroids. And I hated term papers. Love reading; not one to write them.Liz: I don't know whether this should make it into the podcast, but it reminded me of something that happened to my brother-in-law, who's an artist. And he put a piece of video on YouTube. And for unknowable reasons, if you mistyped YouTube and you spelt it U-T-U-B-E, the page that you would end up at from a Google search was a YouTube video and it was, in fact, my brother-in-law's video. And people weren't expecting to see this kind of art movie about matches burning. And he just had the worst comments—like, people were so mean in the comments. And he had millions of views because people were hitting this page by accident, and he ended up—Corey: And he made the cardinal sin—never read the comments. Never break that rule. As soon as you do that, it doesn't go well. I do read the comments on various podcast platforms on this show because I always tell people to insult me all they want, just make sure you leave a five-star review. 
It was quite something [laugh].Corey: On some level, it feels like O'Reilly books are a little insulated from the general population when it comes to terrible nonsense comments, just because they tend to be a little bit more expensive than the typical novel you'll see in an airport bookstore, and again, even though it is approachable, Learning eBPF isn't exactly the sort of title that gets people to think that, “Ooh, this is going to be a heck of a thriller slash page-turner with a plot.” “Well, I found the protagonist unrelatable,” is not sort of the thing you're going to wind up seeing in the comments because people thought it was going to be something different.Liz: I know. One day, I'm going to have to write a technical book that is also a murder mystery. I think that would be, you know, quite an achievement. But yeah, I mean, it's definitely aimed at people who have already come across the term, want to know more, and particularly if you're the kind of person who doesn't want to just have a hand-wavy explanation that involves boxes and diagrams, but if, like me, you kind of want to feel the code, and you want to see how things work and you want to work through examples, then that's the kind of person who might—I hope—enjoy working through the book and end up with a possible mental model of how eBPF works, even though it's essentially kernel programming.Corey: So, I keep seeing eBPF in an increasing number of areas, a bunch of observability tools, a bunch of security tools all tend to tie into it. And I've seen people do interesting things as far as cost analysis with it. 
The problem that I run into is that I'm not able to wind up deploying it universally, just because when I'm going into a client engagement, I am there in a purely advisory sense, given that I'm biasing these days for both SaaS companies and large banks; that latter category is likely going to have some problems if I say, “Oh, just take this thing and go ahead and deploy it to your entire fleet.” If they don't have a problem with that, I have a problem with their entire business security posture. So, I don't get to be particularly prescriptive as far as what to do with it.But if I were running my own environment, it is pretty clear by now that I would have explored this in some significant depth. Do you find that it tends to be something that is used primarily in microservices environments? Does it effectively require Kubernetes to become useful on day one? What is the onboarding path where people would sit back and say, “Ah, this problem I'm having, eBPF sounds like the solution.”Liz: So, when we write tools that are typically going to be some sort of infrastructure, observability, security, networking tools, if we're writing them using eBPF, we're instrumenting the kernel. And the kernel gets involved every time our application wants to do anything interesting because whenever it wants to read or write to a file, or send or receive network messages, or write something to the screen, or allocate memory, or all of these things, the kernel has to be involved. And we can use eBPF to instrument those events and do interesting things. And the kernel doesn't care whether those processes are running in containers, under Kubernetes, just running directly on the host; all of those things are visible to eBPF.So, in one sense, it doesn't matter. 
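Liz's mental model, small programs attached to kernel events that run whenever the event fires, can be caricatured in userspace. To be clear, this is a pure-Python analogy of the hook-and-run pattern, invented here for illustration: real eBPF programs are verified bytecode loaded into the kernel, not Python callbacks.

```python
# Analogy only: an event registry where attached "programs" run every
# time the "kernel" fires the event. The shape mirrors how eBPF
# programs attach to kernel events (syscalls, tracepoints, etc.).

hooks = {}  # event name -> list of attached programs

def attach(event, program):
    """Attach a program to an event; many programs can share one event."""
    hooks.setdefault(event, []).append(program)

def kernel_event(event, **ctx):
    """The 'kernel' firing an event: every attached program runs."""
    return [program(ctx) for program in hooks.get(event, [])]

# An observability 'program' watching file-open events.
attach("file_open", lambda ctx: f"pid {ctx['pid']} opened {ctx['path']}")

print(kernel_event("file_open", pid=42, path="/etc/passwd"))
# ['pid 42 opened /etc/passwd']
```

The useful part of the analogy is that the application being observed never changes: instrumentation lives at the event source, which is the point Liz makes next.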
But one of the reasons why I think we're seeing eBPF-based tools really take off in cloud-native is that you can, by applying some programming, you can link events that happened in the kernel to specific containers in specific pods in whatever namespace and, you know, get the relationship between an event and the Kubernetes objects that are involved in that event. And then that enables a whole lot of really interesting observability or security tools and it enables us to understand how network packets are flowing between different Kubernetes objects and so on. So, it's really having this vantage point in the kernel where we can see everything and we didn't have to change those applications in any way to be able to use eBPF to instrument them.Corey: When I see the stories about eBPF, it seems like it's focused primarily on networking and flow control. That's where I'm seeing it from a security standpoint, that's where I'm seeing it from cost allocation aspect. Because, frankly, out of the box, from a cloud provider's perspective, Kubernetes looks like a single-tenant application with a really weird behavioral pattern, and some of that crosstalk gets very expensive. Is there a better way than either using eBPF and/or VPC flow logs to figure out what's talking to what in the Kubernetes ecosystem, or is BPF really your first port of call?Liz: So, I'm coming from a position of perspective of working for the company that created the Cilium networking project. And one of the reasons why I think Cilium is really powerful is because it has this visibility—it's got a component called Hubble—that allows you to see exactly how packets are flowing between these different Kubernetes identities. So, in a Kubernetes environment, there's not a lot of point having network flows that talk about IP addresses and ports when what you really want to know is, what's the Kubernetes namespace, what's the application? 
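That enrichment step, joining a raw kernel event against Kubernetes metadata so you see pods and namespaces instead of low-level handles, looks roughly like the sketch below. The cgroup IDs, pod names, and the `enrich` helper are all invented for illustration; real tools resolve this mapping from the kubelet and container runtime.

```python
# Sketch: a raw kernel event carries a low-level handle (here, a fake
# cgroup ID); tooling joins it against Kubernetes metadata so reports
# speak in pods and namespaces rather than PIDs and IPs.

CGROUP_TO_POD = {
    7001: {"namespace": "payments", "pod": "api-5d8f"},
    7002: {"namespace": "frontend", "pod": "web-a1b2"},
}

def enrich(event):
    pod = CGROUP_TO_POD.get(event["cgroup_id"])
    if pod is None:
        return {**event, "source": "host"}  # not in any tracked pod
    return {**event, **pod}

raw = {"syscall": "connect", "cgroup_id": 7001, "dst": "10.0.3.4:5432"}
print(enrich(raw)["pod"])  # api-5d8f
```

It is exactly this join that makes Kubernetes-identity-based flows and forensics possible, and why IP-and-port flow logs fall short in clusters, as the next exchange argues.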
Defining things in terms of IP addresses makes no sense when they're just being refreshed and renewed every time you change pods. So yeah, Kubernetes changes the requirements on networking visibility and on firewalling as well, on network policy, and that, I think, is you don't have to use eBPF to create those tools, but eBPF is a really powerful and efficient platform for implementing those tools, as we see in Cilium.Corey: The only competitor I found to it that gives a reasonable explanation of why random things are transferring multiple petabytes between each other in the middle of the night has been oral tradition, where I'm talking to people who've been around there for a while. It's, “So, I'm seeing this weird traffic pattern at these times a day. Any idea what that might be?” And someone will usually perk up and say, “Oh, is it—” whatever job that they're doing. Great. That gives me a direction to go in.But especially in this era of layoffs and as environments exist for longer and longer, you have to turn into a bit of a data center archaeologist. That remains insufficient, on some level. And some level, I'm annoyed with trying to understand or needing to use tooling like this that is honestly this powerful and this customizable, and yes, on some level, this complex in order to get access to that information in a meaningful sense. But on the other, I'm glad that that option is at least there for a lot of workloads.Liz: Yeah. I think, you know, that speaks to the power of this new generation of tooling. And the same kind of applies to security forensics, as well, where you might have an enormous stream of events, but unless you can tie those events back to specific Kubernetes identities, which you can use eBPF-based tooling to do, then how do you—the forensics job of tying back where did that event come from, what was the container that was compromised, it becomes really, really difficult. 
And eBPF tools—like Cilium has a sub-project called Tetragon that is really good at this kind of tying events back to the Kubernetes pod, or, if we want to know, what node it was running on, what namespace, or whatever. That's really useful forensic information.Corey: Talk to me a little bit about how broadly applicable it is. Because from my understanding from our last conversation, when you were on the show a year or so ago, if memory serves, one of the powerful aspects of it was very similar to what I've seen some of Brendan Gregg's nonsense doing in his kind of various talks where you can effectively write custom programming on the fly and it'll tell you exactly what it is that you need. Is this something that can be instrumented once and then effectively used for basically anything, [OTEL 00:16:11]-style, or instead, does it need to be effectively custom configured every time you want to get a different aspect of information out of it?Liz: It can be both of those things.Corey: “It depends.” My least favorite but probably the most accurate answer to hear.Liz: [laugh]. But I think Brendan did a really great—he's done many talks talking about how powerful BPF is and built lots of specific tools, but then he's also been involved with Bpftrace, which is kind of like a language for—a high-level language for saying what it is that you want BPF to trace out for you. So, a little bit like, I don't know, awk but for events, you know? It's a scripting language. So, you can have this flexibility.And with something like Bpftrace, you don't have to get into the weeds yourself and do kernel programming, you know, in eBPF programs. But also there's gainful employment to be had for people who are interested in that eBPF kernel programming because, you know, I think there's just going to be a whole range of more tools to come, you know? 
I think we're, you know, we're seeing some really powerful tools with Cilium and Pixie and [Parker 00:17:27] and Kepler and many other tools and projects that are using eBPF. But I think there's also a whole load of more to come as people think about different ways they can apply eBPF and instrument different parts of an overall system.Corey: We're doing this over audio only, but behind me on my wall is one of my least favorite gifts ever to have been received by anyone. Mike, my business partner, got me a thousand-piece puzzle of the Kubernetes container landscape where—Liz: [laugh].Corey: This diagram is psychotic and awful and it looks like a joke, except it's not. And building that puzzle was maddening—obviously—but beyond that, it was a real primer in just how vast the entire container slash Kubernetes slash CNCF landscape really is. So, looking at this, I found that the only reaction that was appropriate was a sense of overwhelmed awe slash frustration, I guess. It's one of those areas where I spend a lot of time focusing on drinking from the AWS firehose because they have a lot of products and services because their product strategy is apparently, “Yes,” and they're updating these things in a pretty consistent cadence. Mostly. And even that feels like it's multiple full-time jobs shoved into one.There are hundreds of companies behind these things and all of them are in areas that are incredibly complex and difficult to go diving into. EBPF is incredibly powerful, I would say ridiculously so, but it's also fiendishly complex, at least shoulder-surfing behind people who know what they're doing with it has been breathtaking, on some level. How do people find themselves in a situation where doing a BPF deep dive make sense for them?Liz: Oh, that's a great question. So, first of all, I'm thinking is there an AWS Jigsaw as well, like the CNCF landscape Jigsaw? There should be. And how many pieces would it have? 
[It would be very cool 00:19:28].Corey: No, because I think the CNCF at one point hired a graphic designer and it's unclear that AWS has done such a thing because their icons for services are, to be generous here, not great. People have flashcards that they've built for “which service does this logo represent?” Haven't a clue, in almost every case, because I don't care in almost every case. But yeah, I've toyed with the idea of doing it. It's just not something that I'd ever want to have my name attached to, unfortunately. But yeah, I want someone to do it and someone else to build it.Liz: Yes. Yeah, it would need to refresh every, like, five minutes, though, as they roll out a new service.Corey: Right. Because given that it appears from the outside to be impenetrable, it's similar to learning vi in some cases, where, oh yeah, it's easy to get started with to do this trivial thing. Now, step two, draw the rest of the freaking owl. Same problem there. It feels off-putting just from a perspective of you must be at least this smart to proceed. How do you find people coming to it?
But yes, there is going to be at some point, a pretty steep learning curve that's kernel-related but you don't necessarily need to know everything in order to really have a decent understanding of what eBPF is, and how you might, for example—you might be interested to see what BPF programs are running on your existing system and learn why and what they might be doing and where they're attached and what use could that be.Corey: Falling down that, looking at the process table once upon a time was a heck of an education, one week when I didn't have a lot to do and I didn't like my job in those days, where, “Oh, what is this Avahi daemon that constantly running? MDNS forwarding? Who would need that?” And sure enough, that tickled something in the back of my mind when I wound up building out my networking box here on top of BSD, and oh, yeah, I want to make sure that I can still have discovery work from the IoT subnet over to whatever it is that my normal devices live. Ah, that's what that thing always running for. Great for that one use case. Almost never needed in other cases, but awesome. Like, you fire up a Raspberry Pi. It's, “Why are all these things running when I'm just want to have an embedded device that does exactly one thing well?” Ugh. Computers have gotten complicated.Liz: I know. It's like when you get those pop-ups on—well certainly on Mac, and you get pop-ups occasionally, let's say there's such and such a daemon wants extra permissions, and you think I'm not hitting that yes button until I understand what that daemon is. And it turns out, it's related, something completely innocuous that you've actually paid for, but just under a different name. Very annoying. So, if you have some kind of instrumentation like tracing or logging or security tooling that you want to apply to all of your containers, one of the things you can use is a sidecar container approach. And in Kubernetes, that means you inject the sidecar into every single pod. And—Corey: Yes. 
Of course, the answer to any Kubernetes problem appears to be have you tried running additional containers?Liz: Well, right. And there are challenges that can come from that. And one of the reasons why you have to do that is because if you want a tool that has visibility over that container that's inside the pod, well, your instrumentation has to also be inside the pod so that it has visibility because your pod is, by design, isolated from the host it's running on. But with eBPF, well eBPF is in the kernel and there's only one kernel, however many containers were running. So, there is no kind of isolation between the host and the containers at the kernel level.So, that means if we can instrument the kernel, we don't have to have a separate instance in every single pod. And that's really great for all sorts of resource usage, it means you don't have to worry about how you get those sidecars into those pods in the first place, you know that every pod is going to be instrumented if it's instrumented in the kernel. And then for service mesh, service mesh usually uses a sidecar as a Layer 7 Proxy injected into every pod. And that actually makes for a pretty convoluted networking path for a packet to sort of go from the application, through the proxy, out to the host, back into another pod, through another proxy, into the application.What we can do with eBPF, we still need a proxy running in userspace, but we don't need to have one in every single pod because we can connect the networking namespaces much more efficiently. So, that was essentially the basis for sidecarless service mesh, which we did in Cilium, Istio, and now we're using a similar sort of approach with Ambient Mesh. So that, again, you know, avoiding having the overhead of a sidecar in every pod. 
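The resource argument Liz makes here can be put in back-of-the-envelope terms. The per-sidecar and per-node-agent memory figures below are invented for illustration (real proxies and agents vary widely), but the shape of the comparison holds: sidecar overhead scales with pod count, kernel-level instrumentation with node count.

```python
# Back-of-the-envelope comparison: one proxy injected into every pod,
# versus one userspace agent per node with kernel-side eBPF programs
# shared by every pod on that node. All numbers are illustrative.

def sidecar_overhead_mib(pods, mib_per_sidecar):
    """Total memory cost when every pod carries its own proxy."""
    return pods * mib_per_sidecar

def kernel_agent_overhead_mib(nodes, mib_per_agent):
    """Total memory cost when instrumentation lives once per node."""
    return nodes * mib_per_agent

pods, nodes = 500, 20
print(sidecar_overhead_mib(pods, 50))         # 25000 MiB cluster-wide
print(kernel_agent_overhead_mib(nodes, 100))  # 2000 MiB cluster-wide
```

Even with generous assumptions for the per-node agent, the per-pod model costs an order of magnitude more here, before counting the extra network hops through each proxy.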
So that, you know, seems to be the way forward for service mesh as well as other types of instrumentation: avoiding sidecars.Corey: On some level, avoiding things that are Kubernetes staples seems to be a best practice in a bunch of different directions. It feels like it's an area where you start to get aligned with the idea of service meesh—yes, that's how I pluralize the term service mesh and if people have a problem with that, please, it's imperative you not send me letters about it—but this idea of discovering where things are in a variety of ways within a cluster, where things can talk to each other, when nothing is deterministically placed, it feels like it is screaming out for something like this.Liz: And when you think about it, Kubernetes does sort of already have that at the level of a service, you know? Services are discoverable through native Kubernetes. There's a bunch of other capabilities that we tend to associate with service mesh, like observability or encrypted traffic or retries, that kind of thing. But one of the things that we're doing with Cilium, in general, is to say that a lot of this is just a feature of the networking, the underlying networking capability. So, for example, we've got a next-generation mutual authentication approach, which is using SPIFFE IDs between an application pod and another application pod. So, it's like the equivalent of mTLS.But the certificates are actually being passed into the kernel and the encryption is happening at the kernel level. And it's a really neat way of saying we don't need… we don't need to have a sidecar proxy in every pod in order to terminate those TLS connections on behalf of the application. We can have the kernel do it for us and that's really cool.Corey: Yeah, at some level, I find that it still feels weird—because I'm old—to have this idea of one shared kernel running a bunch of different containers. 
I got past that just by not requiring that [unintelligible 00:27:32] workloads need to run isolated from containers running on the same physical host. I found that, for example, running some stuff, even in my home environment for IoT stuff, things that I don't particularly trust run inside of KVM on top of something as opposed to just running it as a container on a cluster. Almost certainly stupendous overkill for what I'm dealing with, but it's a good practice to be in to start thinking about this. To my understanding, this is part of what AWS's Firecracker project starts to address a bit more effectively: fast provisioning, but still being able to use different primitives as far as isolation boundaries go. But, on some level, it's nice to not have to think about this stuff, but that's dangerous.

Liz: [laugh]. Yeah, exactly. Firecracker is a really nice way of saying, "Actually, we're going to spin up a whole VM," but we don't ne—when I say 'whole VM,' we don't need all of the things that you normally get in a VM. We can get rid of a ton of things and just have the essentials for running that Lambda or container service, and it becomes a really nice lightweight solution. But yes, that will have its own kernel, so unlike, you know, running multiple kernels on the same VM where—sorry, running multiple containers on the same virtual machine where they would all be sharing one kernel, with Firecracker you'll get a kernel per instance of Firecracker.

Corey: The last question I have for you before we wind up wrapping up this episode harkens back to something you said a little bit earlier. This stuff is incredibly technically nuanced and deep.
You clearly have a thorough understanding of it, but you also have what I think many people do not realize is an orthogonal skill of being able to articulate and explain those complex concepts simply and approachably, in ways that make people understand what it is you're talking about, but also don't feel like they're being spoken to in a way that's highly condescending, which is another failure mode. I think it is not particularly well understood, particularly in the engineering community, that there are—these are different skill sets that do not necessarily align congruently. Is this something you've always known or is this something you've figured out as you've evolved your career that, oh, I have a certain flair for this?

Liz: Yeah, I definitely didn't always know it. And I started to realize it based on feedback that people have given me about talks and articles I'd written. I think I've always felt that when people use jargon or they use complicated language or they, kind of, make assumptions about how things are, it quite often speaks to them not having a full understanding of what's happening. If I want to explain something to myself, I'm going to use straightforward language to explain it to myself [laugh] so I can hold it in my head. And I think people appreciate that.

And you can get really—you know, you can get quite in-depth into something if you just start, step by step, build it up, explain everything as you go along the way. And yeah, I think people do appreciate that. And I think people, if they get lost in jargon, it doesn't help anybody. And yeah, I very much appreciate it when people say that, you know, they saw a talk or they read something I wrote and it meant that they finally grokked whatever that concept was that I was trying to explain. I will say at the weekend, I asked ChatGPT to explain DNS in the style of Liz Rice, and it started off, it was basically, "Hello there.
I'm Liz Rice and I'm here to explain DNS in very simple terms." I thought, "Okay." [laugh].

Corey: Every time I think I've understood DNS, there's another level to it.

Liz: I'm pretty sure there is a lot about DNS that I don't understand, yeah. So, you know, there's always more to learn out there.

Corey: There certainly is. I really want to thank you for taking time to speak with me today about what you're up to. Where's the best place for people to find you to learn more? And of course, to buy the book.

Liz: Yeah, so I am Liz Rice pretty much everywhere, all over the internet. There is a GitHub repo that accompanies the book; you can find that on GitHub: lizrice/learning-ebpf. So, that's a good place to find some of the example code, and it will obviously link to where you can download the book or buy it—because you can pay for it; you can also download it from Isovalent for the price of your contact details. So, there are lots of options.

Corey: Excellent. And we will, of course, put links to that in the [show notes 00:32:08]. Thank you so much for your time. It's always great to talk to you.

Liz: It's always a pleasure, so thanks very much for having me, Corey.

Corey: Liz Rice, Chief Open Source Officer at Isovalent. I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice, along with an angry comment that you have somehow discovered this episode by googling for knitting projects.

Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit to get started.

    CloudDev for Retail Companies with John Mille

    Play Episode Listen Later Apr 27, 2023 31:15

John Mille, Principal Cloud Engineer at Sainsbury's UK, joins Corey on Screaming in the Cloud to discuss how retail companies are using cloud services. John describes the lessons he's learned since joining the Sainsbury's UK team, including why it's important to share knowledge across your team if you don't want to be on call 24/7, as well as why he doesn't subscribe to the idea that every developer needs access to production. Corey and John also discuss an open-source project John created called ECS Compose-X.

About John

John is an AWS Community Builder (devtools), Open Source enthusiast, SysAdmin born in the cloud, and has worked with AWS since his very first job. He enjoys writing code and creating projects. John likes to focus on automation & architecture that delivers business value, and has been dabbling with data & the wonderful world of Kafka for the past 3 years.

Links Referenced: AWS Open-Source Roundup newsletter blog post about ECS Compose-X: ECS Compose-X: LinkedIn: Twitter:

Transcript

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.

Corey: It's easy to **BEEP** up on AWS. Especially when you're managing your cloud environment on your own! Mission Cloud un **BEEP**s your apps and servers. Whatever you need in AWS, we can do it. Head to for the AWS expertise you need.

Corey: Do you wish your developers had less permanent access to AWS? Has the complexity of Amazon's reference architecture for temporary elevated access caused you to sob uncontrollably? With Sym, you can protect your cloud infrastructure with customizable, just-in-time access workflows that can be set up in minutes.
By automating the access request lifecycle, Sym helps you reduce the scope of default access while keeping your developers moving quickly. Say goodbye to your cloud access woes with Sym. Go to to learn more. That's

Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. Today my guest is a long-time listener, first-time caller. John Mille is a Principal Cloud Engineer at Sainsbury's, which is UK-speak for 'grocery store.' John, thank you for joining me.

John: Hi, Corey. Thanks for having me.

Corey: So, I have to begin with, I guess, the big question that I used to run into people in San Francisco with all the time. They would work at Walmart Labs and they would mention in conversation that they work at Walmart, and people who weren't aware that there was a labs out here figured they were a greeter at the grocery store. Do you ever wind up with people making that sort of fundamental assumption around the fact, oh, you work at Sainsbury's as a checker or whatnot?

John: No. But it actually is one of the—if you look at one of the job descriptions from Sainsbury's, the first thing is, why would you join a retail company to do tech? And as it turns out, tech—I mean, I think retail companies, as any other companies in the world, rely on Cloud more and more and more. And I think that one of the things that is interesting today is, if you look at the landscape of retailers, I've heard many times people saying, "We don't want to go for AWS because we're giving money to the competition." And actually, I think AWS does a fantastic job overall giving you all the tools to actually beat them as your competition. And as it turns out, we've had really, really great success running a lot of our workloads on AWS for many, many years now.

Corey: On some level, if you can't come to terms with the idea of Amazon as competition, you shouldn't be using AWS, regardless of what industry you're in, because their entire company strategy is yes.
It's very hard to start to even come up with industries that they don't have some form of presence within. On some level, that's a problem. In fact, on a lot of levels, that's something of a problem.

Everyone tends to wind up viewing the world in a bunch of different ways. I like to divide companies into two groups. More or less it's, is the AWS bill one of the top three line items at the company? And if the answer's no, on some level, you know, that usually is an indicator that there's a sustainable business there that, you know, both our grandparents and our grandchildren will be able to recognize, in the fullness of time. You absolutely have a business that winds up falling into that category, whereas, "Oh yeah, I fix the AWS bill," yeah, my parents would have no idea what I do and my kids don't have much of a better one. It feels like it's a very point-in-time type of problem. At least I hope.

Technology is not the core of what grocery stores tend to do, but I also don't get the sense that what you're doing is sitting there doing the back office corporate IT style of work, either. How do you use technology in the overall context of the business?

John: Well, so we use it in a very wide variety of senses. So, you obviously have everything that has to do with online shopping, orders and all of those sorts of things, which obviously, especially with the drive of Covid and everybody being at home, has been a huge driver to improve our ability to deliver to customers. But certainly, I think that Sainsbury's sees AWS as a key partner to be able to go and say we want to deliver more value. And so, there's been a number of transformations over the years to—and one of the reasons I was hired is actually to be part of one of those transformations, where we're going to take existing infrastructure servers that literally—I usually say to people, "Oh, are we doing an upgrade this month?
Has somebody gotten their little brush to go and brush onto the hard drives to make sure that nothing is going to die?" And actually do that transformation and move over to the cloud in order to never have to really worry about whether or not they have to manage hardware and infrastructure.

Corey: It's strange in that I never got very deep into containers until I was no longer hands-on with hardware, managing things. I was more or less doing advisory work and then messing around with them. And you'd think, given my proclivities historically of being very unlucky when it comes to data, you would think that this would be great because, oh yeah, you blow away an ephemeral container? Well, that's kind of the point. We'll all laugh and it'll re-instantiate itself and life goes on.

But no. Making fun of them was more or less how I tended to approach them for the longest time, until I started to see them a little bit… well, I guess less as a culture, less as a religion, and more as an incredibly versatile packaging format, which is probably going to annoy the people I know who are the packaging [unintelligible 00:04:58] for Linux distributions. How do you tend to view them? And how did you start using them?

John: Right. So, that's a great question. So historically, I was a student at, I think, the school where some of the original creators of Docker were. And one of the things that you learn when you do development at the school is that, you know, containers [unintelligible 00:05:18] new invention. Docker, I think, came on the platform as the way to, you know, give everybody a great framework, a great API, to drive the deployment of containers in the world and bundle them and ship them around the world, on your laptop and somebody else's, and help a little bit with, you know, solving the problem of "it works on my laptop"—but not just on the laptop, properly. Maybe.

It's obviously gone viral over the years and I really enjoy containers; I quite like containers.
What I find interesting is what people are going to do with it. And I think that over the last few years, we've seen a number of technologies such as Kubernetes and others come into the scene and say—and trying to solve people's problems, but everybody seems to be doing, sort of, things in their own way. And historically, I started off using ECS, when it was terrible and you didn't have security groups per container and all of this. But over the years, you know, you learn, and AWS has improved the service quite significantly with more and more features.

And I think we are today in the place where there's this landscape, I think, where a lot of workloads are going to be extremely ephemeral and you can go [unintelligible 00:06:28], you know, wherever you want and you have a bit—if you have a platform or workflow that you need to have working in different places, maybe Kubernetes could be an easy way to have a different sort of set of features that allows you to move around in maybe an easier way. But that also comes with a set of drawbacks. Again, I look at using EKS, for example, and I see, okay, I have to manage IAM and RBAC now, whereas if I used something like ECS, for whatever the [unintelligible 00:06:56] cloud vendor of choice, I don't have to deal with any of this. So, I think it's finding the fine balance between how you do orchestration of containers now and what works for you and is it sustainable over time, more than about are you going to use containers? Because the chances are, somebody is using containers.

Corey: My experiences and workflows and constraints are radically different than those of other folks because for a lot of the things I'm building, these are accounts that—I'm the only person that has access to them. It is me. So, the idea of fine-grained permissions for users from an RBAC perspective doesn't really factor into it.
Yes, yes, in theory, I should have a lot of the systems themselves with instance roles being managed in safe and secure ways, but in many cases, the AWS account boundary is sufficient for that, depending on what it is we're talking about. But that changes when you start having a small team of people working with you and having to collaborate on these things.

And we do a little bit of that with some of our consulting stuff that isn't just the shitpost stuff I build for fun. But there's multiple levels beyond that. You are clearly in a full-blown enterprise at this point where there are a bunch of different teams working on different things, all ideally going in the same direction. And it's easy to get stuck in the weeds of having to either go through central IT for these things, which gives rise to shadow IT every time you find a corporate credit card in the wild, or it winds up being everyone can do what they want, but then there's no consensus, there's no control, there's no architectural similarity. And I'm not sure which path is worse in some respects. How do you land on it?

John: Right. So, what I've seen done in companies that works very well—and again, to the credit of my current company—is one of the things they've done really well is build a hub of people who are going to manage solely everything that has to do with accounts access, right? So, the control, IAM, Security Hub, all of those sorts of things, for you. There are things that are mandatory that you can't do without: you have permissions boundaries, that's it, you have to use those things, end of story.
But beyond that point, once you have access to your accounts, you've been given all of the access that is necessary for you to deliver applications and deploy them all the way up to production without asking permission from anybody else, apart from your delivery managers, potentially.

And I think from there, because there is the room to do all of this, one of the things that we've done within my business unit is that we've put in place a framework that enables developers—and when I say that, it really is a question of allowing them to do everything they have to do, focus on the code, and I know it's a little catchy [unintelligible 00:09:33] a phrase that you hear these days, but the developers really are the customers that we have. And all that we do is to try to make sure that they have a framework in place that allows them to do what they need and deploy the applications in a secure fashion. And the only way to do that for us was to build the tools for them that allow them to do all of that. And I honestly haven't checked a single service's IAM policies in a very long time because I know that by providing the tools to developers, they don't have this [will 00:10:05] to go and mess with the permissions because their application suddenly doesn't have the permissions. They just know that with the automation we've provided them, the application gets the access it needs and no more.

Corey: On some level, it feels like there's a story around a graduated development approach where, in a dev environment, you can do basically whatever you want with a big asterisk next to it. That's the same asterisk, by the way, next to the AWS free tier. But as you start elevating things into higher environments, you start to see gating around things like who has access to what, security reviews, et cetera, et cetera, and ideally, by the time you wind up getting into production, almost no one should have access and that access that people do have winds up being heavily gated.
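The guardrail pattern John describes here—mandatory permissions boundaries that teams can't opt out of, on top of otherwise broad account access—is usually expressed as an IAM permissions boundary policy. The sketch below is purely illustrative (the Sid names and action lists are assumptions, not Sainsbury's actual policy): it grants wide service permissions but denies anyone the ability to remove the boundary itself or mint new long-lived credentials.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowDayToDayServiceWork",
      "Effect": "Allow",
      "Action": ["ecs:*", "logs:*", "s3:*", "dynamodb:*", "cloudformation:*"],
      "Resource": "*"
    },
    {
      "Sid": "DenyBoundaryTampering",
      "Effect": "Deny",
      "Action": [
        "iam:DeleteRolePermissionsBoundary",
        "iam:PutRolePermissionsBoundary"
      ],
      "Resource": "*"
    },
    {
      "Sid": "DenyLongLivedCredentials",
      "Effect": "Deny",
      "Action": ["iam:CreateUser", "iam:CreateAccessKey"],
      "Resource": "*"
    }
  ]
}
```

A boundary like this caps the maximum permissions any role a developer creates can ever have, which is what makes it safe to hand teams near-full access to their own accounts.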
That is, of course, the vision that folks have. In practice, reality is what happens instead of what we plan on. The idea of it works in theory, but not in production is, of course, why I call my staging environment 'theory.' Does that tend to resonate as far as what you've seen in the wild?

John: Yeah. Very much so. And when I joined the company and we put together our [standard 00:11:11] pipelines for developers to be able to do everything, the rule that I would give to my team—so I manage a small team of cloud engineers—the one rule I would say is, "We have access to prod because we need to provision resources, but when we're going to build the pipelines for the developers, you have to build everything in such a way that the developers will only have read-only access to the production environment, and that is only to go and see their logs." And at least try to foster this notion that developers do not need access to production, as much as possible, because that avoids people going and doing something they shouldn't be doing in those production environments.

Now, as the pipeline progresses and applications get deployed to production, there are some operational capabilities that people need to have, and so in that case, what we do is we try to fine-tune what people need to do and grant those people access to the accounts so that they can perform the jobs and I don't have to be woken up at two in the morning. The developers are.

Corey: One thing that I think is going to be a cause of some consternation for folks—because I didn't really think about this in any meaningful sense until I started acting as a consultant, which means you're getting three years of experience for every year that you're in the wild, just by virtue of the variety of environments you encounter—on some level, there's a reasonable expectation you can have when you're at a small, scrappy startup that everyone involved knows where all the moving parts live. That tends to break down with scale.
So, the idea of a Cloud Center of Excellence has been bandied around a lot. And personally, I hate the term because it implies the 'Data Center of Mediocrity,' which is a little on the nose for some people at times. So, the idea of having a sort of centralized tiger team that has the expertise and has the ability to go on deep dives and sort of loan themselves out to different teams seems to be a compromise between nobody knows what they're doing and every person involved should have an in-depth knowledge of the following list of disciplines.

For example, most folks do not need an in-depth primer on AWS billing constructs. They need about as much information as fits on an index card. Do you find that having the centralized concentration of cloud knowledge on a particular team works out, or do you find that effectively doing a rotating embedding story is the better answer?

John: It varies a lot, I think, because it depends on the level of curiosity of the developers quite a lot. So, I have a huge developer background. People in my team are probably more coming from ex-IT environments or that sort of operations, and then it just naturally went into the cloud. And in my opinion, it is fairly rare to find somebody that is actually good at doing both AWS and coding. I am by no means really, really great at coding. I code pretty much every day but I wouldn't call myself a professional developer.

However, it does bring to my knowledge the fact that there are some good patterns and good practices that you can bring into building your applications in the cloud and some really bad ones. However, I think it's really down to making sure that the knowledge is here within the team. If there's a specialized team, those really need to be specialists. And I think the important thing then is to make sure that the developers and the people around you that are curious and want to ask questions know that you're available to them to share that knowledge.
Because at the end of the day, if I'm the only one with the knowledge, I'm going to be the one who is always going to be on call for this or doing that, and this is not a responsibility that I want. I am happy with a number of responsibilities, but not to be the only person to ever do this. I want to go on holidays from time to time.

So, at the end of the day, I suppose it really is up to what people want or expect out of their careers. I do a job that was a passion for me since I was about 14 years old. And I've always been extremely curious to understand how things work, but I do draw the line that I don't write anything else than Python these days. And if you ask me to write Java, I'll probably change jobs in the flip of a second. But that's the end of it. But I enjoy understanding how Java things work so that I can help my developers make better choices with what services in AWS to use.

Corey: On some level, it feels like there's a, I guess, lack of the same kind of socialization that startups have sort of been somewhat guided by as far as core ethos goes, where, oh, whatever I'm working on, I want to reach out to other people, and, "Hey, I'm trying to solve this problem. What is it that you have been working on that's germane to this and how can we collaborate together?" It has nothing to do, incidentally, with the idea that, oh, big company people aren't friendly or aren't dedicated or aren't good or aren't well-connected; none of that. But there are so many people internally that you're spending your time focusing on, and there's so much more internal context that doesn't necessarily map to anything outside of the company, that the idea that someone off the street who just solved a particular problem in a weird way could apply to what a larger company with, you know, regulatory burdens, starts to have in mind—it becomes a little bit further afield. Do you think that that's accurate?
Do you think that there's still a strong sense of enterprise community that I'm just potentially not seeing in various ways because I don't work at big companies?

John: It's a very fine line to walk. So, when I joined the company, I was made aware that there's a lot of Terraform and Kubernetes, which I went [unintelligible 00:16:28] all the way with CloudFormation is yes. So, that was one of the changes I knew I would have. But I came in with an open mind, and when I looked around at, okay, what are the Terraform modules—because I used Terraform in anger for an entire year of suffering—and I thought, "Okay, well, maybe people have actually got to a point where they've built great modules that I can just pick up off the shelf and reuse or customize only a tiny little bit, add maybe a couple of features and that's it, move on; it's good enough for me." But as it turns out, there is, I think, a lot of the time a case where the need for standardization goes against the need for the business to move on.

So, I think this is where you start to see silos start being built within the company and people do their own thing and the other ones do theirs. And I think it's always a really big challenge for a large company with extremely opinionated individuals to say, "All right, we're going to standardize on this way." And it definitely was one of the biggest challenges that I had when I joined the company because again, big Kubernetes and Terraform place, we're going to need to do something else.
So, then it was the case of saying, "Hey, I don't think we need Kubernetes and I definitely don't think we need Terraform for any of the things—for any of those reasons, so how about we do something a little different?"

Corey: Speaking of doing things a little bit differently, you were recently featured in an AWS Open-Source Roundup newsletter—that was, I think, where you first came across my desk—specifically around an open-source project that you built: ECS Compose-X.

So, I assume it's like, oh, it's like Docker Compose for ECS, and also the 'X' implies that it is extreme, just, like, you know, snack foods at the convenience store. What does it do and where'd it come from?

John: Right. So, you said most of it, right? It literally is a question where you take a Docker Compose file and you want to deploy your services that you worked on and all of that together, and you want to deploy it to AWS. So, ECS Compose-X is a CLI tool very much like Copilot. I think it was released about four months just before Copilot came out—so, sorry, I beat you to the ball there—but with the Docker Compose specification supported.

And again, it really came out of: I needed to find a neat way to take my services and deploy them in AWS. So, Compose-X is just a CLI tool that is going to parse your Docker Compose file and create CloudFormation templates out of it. Now, the X is not very extreme or anything like that, but it's actually coming from the [finite 00:18:59] extension fields, which is something supported in Docker Compose.
And so, you can do things like x-RDS, or x-DynamoDB, which Docker Compose on your laptop will totally ignore, but ECS Compose-X, however, will take that into account.

And what it will do is, if you need a database or a DynamoDB table, for example, in your Docker Compose file, you do [x-RDS, my database, some properties, 00:19:22]—exactly the same properties as CloudFormation, actually—and then you say, "I want this service to have access to it in read-only fashion." And what ECS Compose-X is going to do is just understand what it has to do—meaning creating IAM policies, opening security groups, all of that stuff—and make all of that available to the containers in one way or another.

Corey: It feels like it's a bit of a miss for Copilot not to do this. It feels like they wanted to go off in their own direction with the way that they viewed the world—which I get; I'm not saying there's anything inherently wrong with that. There's a reason that I point to the ECS marketing site—but there's so much stuff out there that is shipped or made available in other ways with a Docker Compose file, and the question of, okay, how do I take this and run it in Fargate or something because I don't want to run it locally for whatever reason, and the answer is, "That's the neat part. You don't."

And it just becomes such a clear miss. There have been questions about this since Copilot launched. There's a GitHub issue tracking getting support for this that was last updated in September—we are currently recording this at the end of March—it just doesn't seem to be something that's a priority. I mean, I will say the couple of times that I've used Copilot myself, it was always for greenfield experiments, never for adopting something else that already existed. And that was… it just felt like a bit of a heavy lift to me of, oh, you need to know from the beginning that this is the tool you're going to use for the thing.
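To make the extension-field mechanics John describes concrete, here is a rough Python sketch of the pattern: walk a parsed Compose file, pick out the top-level `x-` keys that `docker compose` itself would ignore, and turn a read-only access request into an IAM-style statement. This is illustrative only—the real ECS Compose-X schema and code differ, and every name below (`READ_ONLY_ACTIONS`, the `Services` shape, the Sid format) is a hypothetical simplification.

```python
# Hypothetical sketch of the "x- extension fields drive IAM" idea,
# not the actual ECS Compose-X implementation.

READ_ONLY_ACTIONS = {
    "x-dynamodb": ["dynamodb:GetItem", "dynamodb:Query", "dynamodb:BatchGetItem"],
}

def extension_fields(compose: dict) -> dict:
    """Top-level keys starting with 'x-' are extension fields per the Compose spec."""
    return {k: v for k, v in compose.items() if k.startswith("x-")}

def readonly_statements(compose: dict) -> list:
    """Build IAM-style read-only statements for services that request access."""
    statements = []
    for field, resources in extension_fields(compose).items():
        actions = READ_ONLY_ACTIONS.get(field)
        if not actions:
            continue  # unknown extension field: skip it, like docker compose does
        for name, props in resources.items():
            for service in props.get("Services", {}):
                statements.append({
                    "Sid": f"{service}ReadOnly{name}",
                    "Effect": "Allow",
                    "Action": actions,
                    "Resource": f"arn:aws:dynamodb:*:*:table/{name}",
                })
    return statements

# A parsed Compose file as a dict: the 'web' service asks for read-only
# access to a DynamoDB table named 'orders'.
compose = {
    "services": {"web": {"image": "nginx"}},
    "x-dynamodb": {"orders": {"Services": {"web": {"Access": "RO"}}}},
}

stmts = readonly_statements(compose)
```

The key property is the one John calls out: the same file still works unchanged on a laptop, because the spec requires plain `docker compose` to ignore any `x-` prefixed key.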
Docker Compose is what the ecosystem settled on a long time ago, and I really am disheartened by the fact that there's no direct ECS support for it today.

John: Yeah, and it was definitely a motivation for me because I knew that ECS CLI version 1 was going into the sunset, and there wasn't going to be anything supporting it. And so, I just wanted to have Docker Compose because it's familiar to developers, and again, if you want to have adoption and have people use your thing, it has to be easy. And when I looked at Copilot the first time around, I was extremely excited because I thought, "Yes, thank you, Amazon, for making my life easy. I don't have to maintain this project anymore and I'm going to be able to just lift and shift, move over, and be happy about it." But when the specification for Copilot was out and I could go through the documentation, I was equally disheartened because I was like, "Okay, not for me."

And something very similar happened when they announced Proton. I was extremely excited by Proton. I opened a GitHub issue on the roadmap immediately to say, "Hey, are you going to support having some of those things together or not?" And the fact that the Proton templates—I mean, again, it was, what, two, three years ago now—and I haven't looked at Proton since, so it was a very long time now.

Corey: The beta was announced in 2020 and I really haven't seen much from it since.

John: Well, and I haven't done anything [unintelligible 00:22:07] with it. And literally, one of the first things I did when the project came out. Because obviously, this is an open-source project that we use in Sainsbury's, right, because we deploy everything in [ECS 00:22:17], so why would I reinvent the wheel a third time? It's been done, I might as well leverage it.
But every time something on it came out, I was seeing it as the way out of "nobody's going to need me anymore"—which is great—and that doesn't create a huge potential dependency for the company on me: oh, well, we need this to, you know, keep working.

Now, it's open-source, it's under a license where you can fork it and do whatever you want with it, so from that point of view, nobody's going to ask me anything in the future, but from the point of view where I need to, as much as possible, use AWS-native tools, or AWS-built tools, I definitely wanted every time to move over to something different. But every time I tried and tiptoed with those alternative offerings, I just went back and said, "No, this [laugh] either is too new and not mature enough yet, or my tool is just better." Right? And one of the things I've been doing for the past three years is look at the Docker ECS plugin, all of the issues, and I see all of the feature requests that people are asking for and just do that in my project. And same with Copilot. The only thing that Copilot does that I don't do is tell people how to do CI/CD pipelines.

Corey: One thing you said a second ago just sort of, I guess, sent me spiraling for a second because I distinctly remember this particular painful part. You're right, there was an ECS CLI for a long time that has since been deprecated. But we had internal tooling built around that. When there was an issue with a particular task that failed, getting logs out of it was non-trivial, so great. Here's the magic incantation that does it.

I still haven't found a great way to do that with the AWS v2 CLI and that feels like it's a gap where, yes, I understand, old tools go away and new ones show up, but, "Hey, I [unintelligible 00:24:05] task. Can you tell me what the logs are?" "No. Well, Copilot's the new answer." "Okay. Can I use this to get logs from something that isn't Copilot?" "Oh, absolutely not." And the future is inherently terrible as a direct result.

John: Yeah.
Well, I mean, again, the [unintelligible 00:24:20]—the only thing that ECS Compose-X does is create all the templates for you so you can, you know, then just query it and know where everything has been created. And one of the things it definitely does create is all of the log groups. Because again, least-privileged permissions being something that is very dear to me, I create the log groups and just allow the services to only write in those log groups and that's it.

Now, typically this is not a thing that I thought Compose-X was going to do because that's not its purpose. It's not going to be an operational tool to troubleshoot all the things, and this is where I think that other projects are much better suited and I would rather use them as an extension or library of the project as opposed to reinvent them. So, if you're trying to find a tool for yourself to look at logs, I highly recommend something called ‘awslogs,' which is fantastic. You just say, “Hey, can you list the groups?” “Okay.” “Can you get me the logs and can I tail them on a terminal?”

And that's it. Job done. So, as much as I enjoy building new features into the project, for example, I think that there's a clear definition between what the project is for and what it's not. And what it's for is giving people CloudFormation templates they can reuse in any region and deploy their services and not necessarily deal with their operations; that's up to them. At the end of the day, it's really up to the user to know what they want to do with it. I'm not trying to force anybody into doing something specific.

Corey: I would agree. I think that there's value to “there's more than one way to do it.” The problem is, at some point, there's a tipping point where you have this proliferation of different options to the point where you end up in this analysis paralysis model where you're too busy trying to figure out what is the next clear step.
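That predictability is the practical payoff: because the template creates the log groups, locating a task's logs is mechanical rather than a scavenger hunt. With the ECS `awslogs` log driver, stream names follow the documented `stream-prefix/container-name/task-id` pattern, so a small helper can compute them (a sketch; the prefix, container name, and task ARN below are made-up examples, not anything Compose-X emits verbatim):

```python
def ecs_log_stream(prefix: str, container: str, task_arn: str) -> str:
    """Build the CloudWatch Logs stream name used by the ECS awslogs driver.

    The driver names streams "<awslogs-stream-prefix>/<container-name>/<task-id>",
    where the task ID is the final segment of the task ARN.
    """
    task_id = task_arn.rsplit("/", 1)[-1]
    return f"{prefix}/{container}/{task_id}"


if __name__ == "__main__":
    arn = "arn:aws:ecs:eu-west-1:123456789012:task/prod/0123456789abcdef0"
    # → web/app/0123456789abcdef0
    print(ecs_log_stream("web", "app", arn))
```

From there, `awslogs get <group>` or `aws logs get-log-events --log-group-name … --log-stream-name …` pulls the actual log lines for the failed task.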
And yes, that flexibility is incredibly valuable, especially when you get into, you know, large, sophisticated enterprises—ahem, ahem—but when you're just trying to kick the tires on something new, I feel like there's a certain lack of golden path where, in the event of not having an opinion on any of these things, this is what you should do just to keep things moving forward, as opposed to here are two equal options that you can check with radio buttons and it's not at all clear which does what or what the longer-term implications are. We've all gotten caught with the one-way doors we didn't realize we were passing through at the time and then had to do significant technical debt repayment efforts to wind up making it right again.

I just wish that those questions would be called out, but everything else just, it doesn't matter. If you don't like the name of the service that you're creating, you can change it later. Or if you can't, maybe you should know now, so you don't have—in my case—a DynamoDB table that is named ‘test' running in production forever.

John: Yeah. You're absolutely right. And again, I think it goes back to one of the biggest challenges that I had when I joined the company, which was when I said, “I think we should be using CloudFormation, I think we should be using ECS instead of Terraform and Kubernetes, for those reasons.” And one of the reasons was the people. Meaning, we were a very small team, only five cloud engineers at the time.

And as I joined the company, there were already three different teams using four different CI/CD tools. And they all wanted to use Kubernetes, for example, and they were all using different CI/CD—like I said just now—different CI/CD tools. And so, the real big challenge for me was how do I pitch that simplicity is what's going to allow us to deliver value for the business? Because at the end of the day, like you said many, many times before, the AWS bill is a question of architecture, right?
And there's a link and intricacy between the two things.

So, the only thing that really mattered for me and the team was to find a way, find the service that was going to allow us to do a number of things: delivering value quickly, and being supported over time. Because one of the things that I think people forget these days—well, one of the things I'm allergic to and one of the things that makes me spiral is what I call CV-driven tech choices, where people say, “Hey, I love this great thing I read about and I think that we should use that in production. What a great idea.” But really, I don't know anything about it and it's then up to somebody else to maintain it long-term.

And that goes to the other point, which is, turnover-proof is what I call it. So, making tech choices that are going to be something that people will be able to use for many, many years, where there is going to be a company behind the scenes that is going to be able to support you as you go and use the service for the many, many years to come.

Corey: I really want to thank you for taking the time to speak with me today. If people want to learn more, where's the best place for them to find you?

John: So, people can find me on LinkedIn. I'm also around on Twitter these days, although I probably have about nine followers. Well, I probably shouldn't say that [laugh], but that doesn't matter.

Corey: It's fine. We'll put a link into it—we'll put a link to that in the [show notes 00:29:02] and maybe we'll come up with number ten. You never know. Thanks again for your time. I really appreciate it.

John: Thanks so much, Corey, for having me.

Corey: John Mille, Principal Cloud Engineer at Sainsbury's. I'm Cloud Economist Corey Quinn and this is Screaming in the Cloud.
If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice along with an angry comment that you go to great pains to type out but that then fails to post because the version of the tool you used to submit it has been deprecated without a viable replacement.

Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit to get started.

    Sysdig and Solving for Strategic Challenges in Cybersecurity with Michael Isbitski

    Play Episode Listen Later Apr 25, 2023 33:39

Michael Isbitski, Director of Cybersecurity Strategy at Sysdig, joins Corey on Screaming in the Cloud to discuss the nuances of an effective cybersecurity strategy. Michael explains that many companies are caught between creating a strategy that's truly secure and one that's merely compliant and within the bounds of cost-effectiveness, and what can be done to help balance the two aims more effectively. Corey and Michael also explore what it means to hire for transferable skills in the realm of cybersecurity and tech, and Michael reveals that while there's no such thing as a silver-bullet solution for cybersecurity, Sysdig can help bridge many gaps in a company's strategy.

About Michael
Mike has researched and advised on cybersecurity for over 5 years. He's versed in cloud security, container security, Kubernetes security, API security, security testing, mobile security, application protection, and secure continuous delivery. He's guided countless organizations globally in their security initiatives and supported their business.

Prior to his research and advisory experience, Mike learned many hard lessons on the front lines of IT with over twenty years of practitioner and leadership experience focused on application security, vulnerability management, enterprise architecture, and systems engineering.

Links Referenced: Sysdig: LinkedIn:

Transcript
Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.

Corey: Tailscale SSH is a new, and arguably better, way to SSH. Once you've enabled Tailscale SSH on your server and user devices, Tailscale takes care of the rest.
So you don't need to manage, rotate, or distribute new SSH keys every time someone on your team leaves. Pretty cool, right? Tailscale gives each device in your network a node key to connect to your VPN, and uses that same key for SSH authorization and encryption. So basically you're SSHing the same way that you're already managing your network.

So what's the benefit? Well, built-in key rotation, the ability to manage permissions as code, connectivity between any two devices, and reduced latency. You can even ask users to re-authenticate SSH connections for that extra bit of security to keep the compliance folks happy. Try Tailscale now - it's free forever for personal use.

Corey: Do you wish your developers had less permanent access to AWS? Has the complexity of Amazon's reference architecture for temporary elevated access caused you to sob uncontrollably? With Sym, you can protect your cloud infrastructure with customizable, just-in-time access workflows that can be set up in minutes. By automating the access request lifecycle, Sym helps you reduce the scope of default access while keeping your developers moving quickly. Say goodbye to your cloud access woes with Sym. Go to to learn more.

Corey: Welcome to Screaming in the Cloud, I'm Corey Quinn. I periodically find myself in something of a weird spot when it comes to talking about security. I spent a lot of my time in previous lives having to care about it, but the word security was never in my job title. That's who my weekly podcast on the AWS Morning Brief and the accompanying newsletter goes out to: it's people who have to care about security but don't have it as part of their job title. They just want to know what's going on without all of the buzzwords.

This promoted guest episode is brought to us by our friends at Sysdig and my guest is Mike Isbitski, Director of Cybersecurity Strategy at Sysdig. Mike, thanks for joining me.

Michael: Thanks, Corey.
Yeah, it's great to be here.

Corey: So, you've been at Sysdig for a little bit, but your history is fascinating to me. You were at Gartner, which on the one hand would lead someone to think, “Oh okay, you talk about this stuff a lot, but might not have been particularly hands-on,” but that's not true, either. You have a strong background as a practitioner, but not directly security-focused. Is that right?

Michael: Yeah. Yeah, that is correct. I can certainly give the short version of the history lesson [laugh]. It is true, yes. As a Gartner analyst, you don't always get as hands-on, certainly talking to practitioners and leaders from all walks of life, different industries, different company sizes, and organization sizes.

But yeah, as a Gartner analyst, I was in a different division that was much more technical. So, for me personally, I did actually try to tinker a lot: set up Docker, deploy Kubernetes clusters, all that fun stuff. But yeah, prior to my life as an analyst, I was a practitioner, a security leader for close to 20 years at Verizon, so I saw quite a bit. And I actually started as an enterprise architect, building, kind of, systems and infrastructure to support all of those business needs, then I kind of transitioned over to application security towards the tail end of that career at Verizon.

Corey: And one of the things that I find that I enjoy doing is talking with folks in positions like yours, the folks who did not come to the cybersecurity side of the world from a pure strategy advisory sense, but have been hands-on with these things at varying points in our careers, just because otherwise I feel like I'm sort of coming at this from a very different world.
When I walk around the RSA show floor, I am consistently confronted by people trying to sell me the same dozen products over and over again with different words and different branding, but it seems like it's all buzzwords aimed by security people who are deep in the weeds at other security people who are deep in the weeds, and it's just presumed that everyone knows what they're talking about already. And obviously, that works. I'm not here to tell them that they're going about their business wrong, but for smaller companies, SMBs, folks who have to care about security but don't know the vernacular in the same way and don't have a sophisticated security apparatus at their companies, it feels like a dense thicket of impenetrable buzzwords.

Michael: Yes. Very, very fair assessment, [laugh] I would say. So, I'd say my life as an analyst was a lot of lengthy conversations. I guess a little bit of the secret behind analyst inquiry—I mean, a lot of times, they are hour-long conversations, sometimes multiple sets of them. But yeah, it's very true, right?

There's a lot of nuance to how you work with technology and how you build things, but then also how you secure it. It's very hard to, kind of, condense that, you know, hours of conversation and many pages of documentation down into some bite-size nuggets that marketers might run with. So, I try to kind of live in that in-between world where I can kind of explain deep technology problems and business realities, and kind of explain that in more common language to people. Sometimes it's easier said than done when you're speaking it as opposed to writing it. But yeah, that's kind of where I try to bring my skills and experience.

Corey: It's a little counterintuitive to folks coming at it from the other side, I suspect. For me, at least, the hardest part of getting into the business of cloud cost optimization the way that I do with The Duckbill Group was learning to talk.
Where I come from a background heavy on the engineering and operations side, being able to talk to business stakeholders who do not particularly care what a Kubernetes might be is critical. You have to effectively be able to speak to different constituencies, sometimes in the same conversation, without alienating the rest of them. That was the hard part for me.

Michael: Yeah, that's absolutely true and I certainly ran into that quite a bit as an enterprise architect at Verizon. You really need to work to identify, like, what is the business need. And typically, that is talking to the stakeholders: you know, what are they trying to achieve? They might not even know that, right, [laugh] because not everybody is very structured in how they think about the problem you're trying to solve. And then, what is their daily workflow?

And then you kind of arrive at the technology. I'd say a common pitfall for anybody, right, whether you're an engineer or a security practitioner, is to kind of start with the technology or the solution and then try to force that on people, right? “Here's your solution to the problem that maybe you didn't know you had.” [laugh]. It kind of should work in reverse, right? What's the actual business need? What's your workflow? And what's the appropriate technology for that, right?

Whether it's right-sizing the infrastructure or a particular type of functionality or protection, all those things, right? So, very similar kind of way of approaching the problem. It's just what you're trying to solve, but [laugh] I've definitely seen that, kind of, Kubernetes is all the rage, right, or service mesh. Like, everybody needs to start deploying Istio, and you really should be asking the question—

Corey: Oh, it's all resume-driven development.

Michael: Yep, exactly. Yeah. It's kind of the new kid on the block, right? Let's push out this cool new technology and then problems be damned, right?

Corey: I'm only half-kidding on that.
I've talked to folks who are not running those types of things and they said that it is a bit of a drag on their being able to attract talent.

Michael: Yeah, it's—you know, I mean, it's newer technologies, right, so it can be hard to find them, right—kind of unicorn status. I used to talk quite a bit in advisory calls about trying to find DevOps practitioners that were kind of full-stack. That's tricky.

Corey: I always wonder if it's possible to find them, on some level.

Michael: Yeah. And it's like, well, can you find them, and then when you do find them, can you afford them?

Corey: Oh, yeah. What I'm seeing in the other direction, though, is people who are making, you know, sensible technology choices where you can actually understand where everything lives without turning it into a murder mystery where you need to hire a private investigator to track it down. Those are the companies that are having trouble hiring because it seems that an awful lot of the talent, or at least a significant subset of it, wants to have the latest and greatest technologies on their resume for their next stop. Which, I'm not saying they're wrong for doing that, but it is a strange outcome that I wasn't quite predicting.

Michael: Yeah. No, it is very true; I definitely see it quite a bit in the tech sector. I've run into it myself, even with the amount of experience and skills I have. Yeah, companies sometimes get in a mode where they're looking for very specific skills, potentially even products or technologies, right? And that's not always the best way to go about it.

If you understand concepts, right, with technology and systems engineering, that should translate, right?
So, it's kind of learning the new syntax, or semantics, working with a framework or a platform or a piece of technology.

Corey: One of the reasons that I started the security side of what I do on the newsletter piece—and it caught some people by surprise—but the reason I did it was because I have always found that, more or less, security and cost are closely aligned, spiritually if nothing else. They're reactive problems and they don't, in the general sense, get companies one iota closer to the business outcome they're chasing, but it's something you have to do, like buying fire insurance for the building. You can spend infinite money on those things, but it doesn't advance the business. It's all on the defensive, reactive side. And you tend to care about these things a lot right after you failed to care about them sufficiently. Does that track at all from your experience?

Michael: Yeah. Yeah, absolutely. I'm just kind of flashing back to some war stories at Verizon, right? It was… I'd say very common that, once you've kind of addressed, well, these are the business problems we want to solve for, then we're off to the races, right, we're going to build this cool thing. And then you deploy it, right [laugh], and then you forgot to account for backup, right? What's your disaster recovery plan? Do you have logging in place? Are you monitoring the thing effectively? Are your access controls accounted for?

All those kind of tangential processes, but super-critical, right, when you think about, kind of, production systems. Like, they have to be in place. So, it's absolutely true, right, and definitely for just general availability you need to be thinking about these things. And yeah, they almost always translate to that security piece of it as well, right, particularly with all the regulations that organizations are impacted by today.
You really need to be thinking about, kind of, all these pieces of the puzzle, not just, hey, let's build this thing and get it on running infrastructure and we're done with our work.

Corey: A question that I've got for you—because I'm seeing a very definite pattern emerging tied to the overall macro environment now—where after a ten-year bull run, suddenly a bunch of companies are discovering, holy crap, money means something again. Where instead of being able to go out and get infinite money, more or less, to throw at an AWS bill, suddenly it's, oh, that's a big number, and we have no idea what's in it. We should care about that. So, almost overnight, we've seen people suddenly caring about their bill. How are you seeing security over the past year or so? Has there been a similar awareness around that or has that not really been tied to the overall macro cycle?

Michael: Very good question, yeah. So unfortunately, security's often an afterthought, right? Just like, kind of, those things that support availability are probably going to get a little bit better ranking because they're going to support your customers and employees, so you're going to get budget and headcount to support that. Security, usually in the pecking order, is below that, right? Which is unfortunate because [laugh] there can be severe repercussions with that, such as privacy impacts, or data breach, right, lost revenue, all kinds of things. But yeah, typically, security has been undercut, right? You're always seeking headcount, you need more budget.

So, security teams tend to look to delegate security process out, right? So, you kind of see a lot of DevOps programs, like, can we empower engineers to run some of these processes and tooling, and then security, kind of, becomes the overseer. So, we see a lot of that: can we kind of have people satisfy some of these pieces?
But then with respect to, like, security budgets, it is often security tools consolidation, because a lot of organizations tend to have a lot of things, right? So, security leaders are looking to scale that back, right, so they can work more effectively, but then also cut costs, which is definitely true these days in the current macroeconomic environment.

Corey: I'm curious as well to see what your take is on the interplay between cost and security. And what I mean by that is, I did the numbers once, and if you were to go into an AWS-native environment—ignore third-party vendors for a second—and just configure all of the AWS security services in your account the way that best practices dictate that you should, you're pretty quickly going to end up in a scenario where the cost of that outweighs that of the data breach that you're ostensibly trying to prevent. So—

Michael: Yes.

Corey: It's an infinite money pit that you can just throw everything into. So, people care about security, but they also care about cost. Plus, let's be very direct here, you can spend all the money on security and still lose. How do companies think about that now?

Michael: A lot of leaders will struggle with, are we trying to be compliant or are we trying to be secure? Because those can be very different conversations and solutions to the problem. I mean, ideally, everybody would pursue that perfect model of security, right, enable all the things, but that's not necessarily cost-effective to do. And so, most organizations and most security teams are going to prioritize their risks, right?
So, they'll start to carve out: maybe these are all our internet-facing applications, these are the business-critical ones, so we're going to allocate more security focus to them and security spend, so [maybe we will be turn up 00:13:20] more security services to protect those things and monitor them.

Then [laugh], unfortunately, you can end up with a glut of maybe internal applications or non-critical things that just don't get that TLC from security—unfortunately for security teams, but fortunately for attackers, those things become attack targets, right? So, they don't necessarily care how you've prioritized your controls or your risk. They're going to go for the low-hanging fruit. So, security teams have always struggled with that, but it's very true. Like, in a cloud environment like AWS, yeah, if you start turning everything up, be prepared for a very, very costly cloud expense bill.

Corey: Yeah, in my spare time, I'm working on a project that I was originally going to open-source, but I realized if I did it, it would cause nothing but pain and drama for everyone: enabling a whole bunch of AWS misconfiguration options, given a set of arbitrary credentials, that just effectively try to get the high score on the bill. And it turned out that my early tests were way more successful than anticipated, and instead, I'm just basically treating it as a security vulnerability reporting exercise, just because people don't think about this in quite the same way. And again, it's not that these tools are necessarily overpriced; it's not that they aren't delivering value. It's that in many cases, it is unexpectedly expensive when they bill across dimensions that people are not aware of. And it's one of those everyone's-aware-of-that-trap-the-second-time type of situations.

It's a hard problem. And I don't know that there's a great way to answer it.
I don't think that AWS is doing anything untoward here; I don't think that they're being intentionally malicious around these things, but it's very vast, very complex, and nobody sees all of it.

Michael: Very good point, yes. Kind of, cloud complexity and the ephemeral nature of cloud resources, but also the cost, right? Like, AWS isn't in the business of providing free service, right? Really, no cloud provider is. They are a business, right, so they want to make money on cloud consumption.

And it's interesting, I remember, like, the first time I started exploring Kubernetes, I did deploy clusters in cloud providers, so you can kind of tinker and see how these things work, right, and they give you some free credits, [a month of credit 00:15:30], to kind of work with this stuff. And, you know, if you spin up a [laugh] Kubernetes cluster with very bare bones, you're going to chew through that probably within a day, right? There's a lot of services in it. And that's even with defaults, which include minimal, if anything, with respect to logging. Which is a problem, right, because then you're going to miss general troubleshooting events, but also actual security events.

So, it's not necessarily something that AWS could solve for by turning everything up, right, because they would be giving away services. Although I'm starting to see some tide shifts with respect to cybersecurity. The Biden administration just released their cybersecurity strategy that talks about some of this, right? Like, should cloud providers start assuming more of the responsibility and accountability, potentially just turning up logging services? Like, why should those be an additional cost to customers, right, because that's really critical to even support basic monitoring and security monitoring so you can report incidents and breaches.

Corey: When you look across what customers are doing, you have a different problem than I do.
I go in and I say, “Oh, I fixed the horrifying AWS bill.” And then I stop talking and I wait. Because if people [unintelligible 00:16:44] to that, “Ooh, that's a problem for us,” great. We're having a conversation.

If they don't, then there's no opportunity for my consulting over in that part of the world. I don't have to sit down and explain to people why their bill is too high or why they wouldn't want it to be; they intrinsically know and understand it, or they're honestly not fit to be in business if they can't make a strategic evaluation of whether or not their bill is too high for what they're doing. Security is very different, especially given how vast it is and how unbounded the problem space is, relatively speaking. You have to first educate customers in some ways before attempting to sell them something. How do you do that without, I guess, drifting into the world of FUD, where, “Here are all the terrible things that could happen. The solution is to pay me.” Which in many cases is honest, but people have an aversion to it.

Michael: Yeah. So, that's how I fill [laugh] a lot of my days here at Sysdig. So, I do try to explain, kind of, these problems in general terms as opposed to just how Sysdig can help you solve for them. But you know, in reality, these are larger strategic challenges, right; there's not necessarily going to be one tool that's going to solve all your problems—the silver bullet—right? That's always true.
Yes, Sysdig has a platform that can address a lot of cloud security-type issues, like over-permissioning or telling you what are the actual exploitable workloads in your environment, but that's not necessarily going to help you with, you know, if you have a regulator breathing down your neck who wants to know about an incident, how do you actually relay that information to them, right?

It's really just going to help surface event data and stitch things together; now you have to carry that over to that person or figure out within your organization who's handling that. So, there is kind of this larger piece of, you know, governance, risk, and compliance, and security tooling helps inform a lot of that, but yeah, every organization kind of has to answer to [laugh] those authorities, often within their own organization, but it could also be government authorities.

Corey: Part of the challenge as well is that there's—part of it is tooling, absolutely, but an awful lot of it is a people problem, where you have these companies in the security space talking about a variety of advanced threats, of deeply sophisticated attackers that are doing incredibly arcane stuff, and then you have the CEO yelling about what they're doing on a phone call in the airport lounge and their password—which is ‘kitty' by the way—is on a Post-It note on their laptop for everyone to see. It feels like it's one of those, get the basic stuff taken care of first, before going down the path to increasingly arcane attacks. There's an awful lot of vectors for attacking an infrastructure, but so much of what we see from data breaches is simply people not securing S3 buckets, as a common example. It's one of those crawl, walk, run types of stories.
For what you do, is there a certain level of sophistication that companies need to get to before what you offer starts to bear fruit?

Michael: Very good question, right. And I'd start with… right, there's certainly an element of truth that we're lagging behind on some of the security basics, right, or good security hygiene. But it's not as simple as, like, well, you picked a bad password or you left the port exposed, you know? I think certainly security practitioners know this; I'd even put forth that a lot of engineers know it, particularly if they've been trained more recently. There's been a lot of work to promote security awareness, so we know that we should provide IDs and passwords of sufficient strength, and not expose things we shouldn't be exposing. But what tends to happen is, like, as you build modern systems, they're just extremely complex and distributed.

Not to get too far into the weeds with app designs, with microservices architecture patterns, and containerized architectures, but that is what happens, right? Because the days of building some heavyweight system in the confines of a data center in your organization—those things still do happen, but that's not typically how new systems are being architected. So, a lot of the old problems still linger, there's just many more instances of them and they're highly distributed. So, the problem becomes very amplified very quickly.

Corey: That's, I think, on some level, part of the challenge. It's worse in some ways than even the monitoring and observability space, where, “All right, we have 15 tools that we're using right now. Why should we talk to yours?” And the answer is often, “Because we want to be number 16.” It's one of those stories where it winds up just adding incremental cost. And by cost, I don't just mean money; I mean complexity on top of these things. So, you folks are, of course, sponsoring this episode, so the least I can do is ask you: where do you folks start and stop?
Sysdig: you do a lot of stuff. What's the sweet spot?

Michael: Yeah, I mean, there's a few, right, because it is a larger platform. So, I often talk in terms of full-lifecycle security, right? And a lot of organizations will split their approaches. We talk about shift left, which is really: let's focus very heavily on secure design, let's test all the code and all the artifacts prior to delivering that thing, try to knock out all quality issues, right, kind of for general IT but also security problems—which really should be tracked as quality issues—including things like vulnerabilities and misconfigs. So, Sysdig absolutely provides that capability to satisfy that shift-left approach.

And Sysdig also focuses very heavily on runtime security, or the shield-right side of the equation. And that's, you know, give me those capabilities that allow me to monitor all types of workloads, whether they're virtual machines, or containers, or serverless abstractions like Fargate, because I need to know what's going on everywhere. In the event that there is a potential security incident or breach, I need all that information so I can actually know what happened or report it to a regulatory authority.

And that's easier said than done, right? Because when you think about containerized environments, they are very ephemeral. A container might spin up and tear down within minutes, right, and if you're not thinking about your forensics and incident response processes, that data is going to be lost [unintelligible 00:23:10] [laugh]. You're kind of shooting yourself in the foot that way. So yeah, Sysdig kind of provides that platform to give you that full range of capabilities throughout the lifecycle.

Corey: I think that that is something that is not fully understood in a lot of cases.
I remember a very early Sysdig, I don't know if it was a demo or what exactly it was, I remember it was at the old Heavybit space in San Francisco, where they came out; it was, I believe, based on an open-source project and it was still taking the perspective, isn't this neat? It gives you really in-depth insight into almost a system-call level of what it is the system is doing. “Cool. So, what's the value proposition for this?”It's like, “Well, step one, be an incredibly gifted engineer when it comes to systems internals.” It's like, “Okay, I'll be back in five years. What's step two?” It's like, “We'll figure it out then.” Now, the story has gone up the stack. It originally felt a little bit like it was a solution in search of a problem. Now, I think you have found that problem, you have clearly hit product-market fit. I see you folks in the wild in many of my customer engagements. You are doing something very right. But it was neat watching, like, it's almost for me, I turned around, took my eye off the ball for a few seconds, and it went from, “We have no idea of what we're doing” to, “We know exactly what we're doing.” Nice work.Michael: Yeah. Yeah. Thanks, Corey. Yeah, and there's quite a history with Sysdig in the open-source community. So, one of our co-founders, Loris Degioanni, was one of the creators of Wireshark, which some of your listeners may be familiar with.So, Wireshark was a great network traffic inspection and observability tool. It certainly could be used by, you know, just engineers, but also security practitioners. So, I actually used it quite a bit in my days when I would do pen tests. So, a lot of that design philosophy carried over to the Sysdig open source. So, you're absolutely correct.Sysdig open source is all about gathering that sys-call data on what is happening at that low level. But it's just one piece of the puzzle, exactly as you described. 
The other big piece of open-source that Sysdig does provide is Falco, which is kind of a threat detection and response engine that can act on all of those signals to tell you, well, what is actually happening: is this potentially a malicious event? Is somebody trying to compromise the container runtime? Are they trying to launch a suspicious process? So, those pieces are there under the hood, right, and then Sysdig Secure is, kind of, the larger platform of capabilities that provide a lot of the workflow, nice visualizations, all those things you kind of need to operate at scale when you're supporting your systems and security.Corey: One thing that I do find somewhat interesting is there's always an evolution as companies wind up stumbling through the product lifecycle, where originally it starts off as we have an idea around one specific thing. And that's great. And for you folks, it feels like it was security. Then it started changing a little bit, where okay, now we're going to start doing different things. And I am very happy with the fact right now that when I look at your site, you have two offerings and not two dozen, like a number of other companies tend to. You do Sysdig Secure, which is around the security side of the world, and Sysdig Monitor, which is around the observability side of the world. How did that come to be?Michael: Yeah, it's a really good point, right, and it's kind of in the vendor space [laugh], there's also, like, chasing the acronyms. And [audio break 00:26:41] full disclosure, we are guilty of that at times, right, because sometimes practitioners and buyers seek those things. So, you have to kind of say, yeah, we checked that box for CSPM or CWPP. But yeah, it's kind of talking more generally to organizations and how they operate their businesses, like, that's more well-known constructs, right? I need to monitor this thing or I need to get some security. 
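As a rough illustration of the rule-driven detection Michael describes Falco doing, here is a toy Python sketch. It is not Falco's actual engine or rule syntax (real Falco rules are YAML conditions evaluated against kernel syscall events), and the rule names and event fields below are invented:

```python
# Toy illustration of Falco-style runtime threat detection: match a stream of
# low-level events against simple detection rules and emit alerts.

RULES = [
    {
        "name": "Terminal shell in container",
        "match": lambda e: (e["type"] == "execve"
                            and e.get("container")
                            and e["proc"] in ("bash", "sh", "zsh")),
    },
    {
        "name": "Write below /etc",
        "match": lambda e: (e["type"] == "open_write"
                            and e["path"].startswith("/etc/")
                            and e["proc"] not in ("dpkg", "rpm", "yum")),
    },
]

def detect(events):
    """Return (rule name, event) pairs for every event that matches a rule."""
    return [(r["name"], e) for e in events for r in RULES if r["match"](e)]

events = [
    {"type": "execve", "proc": "bash", "container": True},
    {"type": "open_write", "proc": "dpkg", "path": "/etc/ld.so.cache"},
    {"type": "open_write", "proc": "curl", "path": "/etc/passwd"},
]
alerts = detect(events)  # flags the shell-in-container and curl-writes-/etc events
```

The real value, as discussed above, is in the curated rule set and the platform around it, not the matching loop itself.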
So, lumping into those buckets helps that way, right, and then you turn on those capabilities you need to support your environment, right?Because you might not be going full-bore into a containerized environment, and maybe you're focusing specifically on the runtime pieces and you're going to, kind of, circle back on security testing in your build pipeline. So, you're only going to use some of those features at the moment. So, it is kind of that platform approach to addressing that problem.Corey: Oh, I would agree. I think that one of the challenges I still have around the observability space—which let's remind people, is hipster monitoring; I don't care what other people say. That's what it is—is that it is depressingly tied to a bunch of other things. To this day, the only place to get a holistic view of everything in your AWS account in every region is the bill. That somehow has become an observability tool. And that's ridiculous.On the other side of it, I have had several engagements that inadvertently went from, “We're going to help optimize your cost,” to, “Yay. We found security incidents.” I don't love a lot of these crossover episodes we wind up seeing, but it is the nature of reality where security, observability, and yes, costs all seem to tie together to some sort of unholy triumvirate. So, I guess the big question is when does Sysdig launch a cost product?Michael: Well, we do have one [laugh], specifically for—Corey: [laugh]. Oh, events once again outpace me.Michael: [laugh]. But yeah, I mean, you touched on this a few times in our discussion today, right? There's heavy intersections, right, and the telemetry you need to gather, right, or the log data you need to gather to inform monitoring use cases or security use cases, a lot of the times that telemetry is the same set of data, it's just you're using it for different purposes. 
So, we actually see this quite commonly where Sysdig customers might pursue Monitor or Secure, and then they actually find that there's a lot of value-add to look at the other pieces.And it goes both ways, right? They might start with the security use cases and then they find, well, we've over-allocated on our container environments and we're over-provisioning in Kubernetes resources, so all right, that's cool. We can actually reduce costs that could help create more funding to secure more hosts or more workloads in an environment, right? So it's, kind of, show me the things I'm doing wrong on this side of the equation, whether that's general IT security problems, and then benefit the other. And yeah, typically we find that because things are so complex, yeah, you're over-permissioning, you're over-allocating, it's just very common, right? Kubernetes, as amazing as it can be or is, it's really difficult to operate that in practice, right? Things can go off the rails very, very quickly.Corey: I really want to thank you for taking time to speak about how you see the industry and the world. If people want to learn more, where's the best place for them to find you?Michael: Yes, thanks, Corey. It's really been great to be here and talk with you about these topics. So, for me personally, you know, I try to visit LinkedIn pretty regularly. Probably not daily but, you know, at least once a week, so please, by all means, if you ever have questions, do contact me. I love talking about this stuff.But then also on Sysdig, I do author content on there. I speak regularly in all types of event formats. So yeah, you'll find me out there. I have a pretty unique last name. And yeah, that's kind of it. That's the, I'd say the main sources for me at the moment. Don't fall for the other Isbitski; that's actually my brother, who does work for AWS.Corey: [laugh]. That's okay. There's no accounting for family, sometimes.Michael: [laugh].Corey: I kid, I kid. Okay, great company. 
Great work. Thank you so much for your time. I appreciate it.Michael: Thank you, Corey.Corey: Mike Isbitski, Director of Cybersecurity Strategy at Sysdig. I'm Cloud Economist Corey Quinn and this has been a promoted guest episode brought to us by our friends at Sysdig. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice along with an angry, insulting comment from your place, which is no doubt expensive, opaque, and insecure, hitting all three points of that triumvirate.Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit to get started.

    Fixing What's Broken in Monitoring and Observability with Jean Yang

    Play Episode Listen Later Apr 20, 2023 36:13

    Jean Yang, CEO of Akita Software, joins Corey on Screaming in the Cloud to discuss how she went from academia to tech founder, and what her company is doing to improve monitoring and observability. Jean explains why Akita is different from other observability & monitoring solutions, and how it bridges the gap from what people know they should be doing and what they actually do in practice. Corey and Jean explore why the monitoring and observability space has been so broken, and why it's important for people to see monitoring as a chore and not a hobby. Jean also reveals how she took a leap from being an academic professor to founding a tech start-up. About JeanJean Yang is the founder and CEO of Akita Software, providing the fastest time-to-value for API monitoring. Jean was previously a tenure-track professor in Computer Science at Carnegie Mellon University.Links Referenced: Akita Software: Aki the dog chatbot: Twitter: TranscriptAnnouncer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. My guest today is someone whose company has… well, let's just say that it has piqued my interest. Jean Yang is the CEO of Akita Software and not only is it named after a breed of dog, which frankly, Amazon service namers could take a lot of lessons from, but it also tends to approach observability slash monitoring from a perspective of solving the problem rather than preaching a new orthodoxy. Jean, thank you for joining me.Jean: Thank you for having me. 
Very excited.Corey: In the world that we tend to operate in, there are so many different observability tools, and as best I can determine, observability is hipster monitoring. Well, if we call it monitoring, we can't charge you quite as much money for it. And whenever you go into any environment of significant scale, we pretty quickly discover that, “What monitoring tool are you using?” The answer is, “Here are the 15 that we use.” Then you talk to other monitoring and observability companies and ask them which ones of those they've replaced, and the answer becomes, “We're number 16.” Which is less compelling of a pitch than you might expect. What does Akita do? Where do you folks start and stop?Jean: We want to be—at Akita—your first stop for monitoring and we want to be all of the monitoring you need, up to a certain level. And here's the motivation. So, we've talked with hundreds, if not thousands, of software teams over the last few years and what we found is there is such a gap between best practice, what people think everybody else is doing, what people are talking about at conferences, and what's actually happening in software teams. And so, what software teams have told me over and over again is, hey, we either don't actually use very many tools at all, or we use 15 tools in name, but it's, you know, one [laugh] one person on the team set this one up, it's monitoring one of our endpoints, we don't even know which one sometimes. Who knows what the thresholds are really supposed to be. We got too many alerts one day, we turned it off.But there's very much a gap between what people are saying they're supposed to do, what people in their heads say they're going to do next quarter or the quarter after that, and what's really happening in practice. And what we saw was teams are falling more and more into monitoring debt. And so effectively, their customers are becoming their monitoring and it's getting harder to catch up. 
And so, what Akita does is we're the fastest, easiest way for teams to quickly see what endpoints you have in your system—so that's API endpoints—what's slow and what's throwing errors. And you might wonder, okay, wait, wait, wait, Jean. Monitoring is usually about, like, logs, metrics, and traces. I'm not used to hearing about API—like, what do APIs have to do with any of it?And my view is, look, we want the most simple form of what might be wrong with your system, we want a developer to be able to get started without having to change any code, make any annotations, drop in any libraries. APIs are something you can watch from the outside of a system. And when it comes to which alerts actually matter, where do you want errors to be alerts, where do you want thresholds to really matter, my view is, look, the places where your system interfaces with another system are probably where you want to start if you've really gotten nothing. And so, the Akita view is, we're going to start from the outside in on this monitoring. We're turning a lot of the views on monitoring and observability on their head and we just want to be the tool that you reach for if you've got nothing, it's the middle of the night, you have alerts on some endpoint, and you don't want to spend a few hours or weeks setting up some other tool. And we also want to be able to grow with you up until you need that power tool that many of the existing solutions out there are today.Corey: It feels like monitoring is very often one of those reactive things. I come from the infrastructure world, so you start off with, “What do you use for monitoring?” “Oh, we wait till the help desk calls us and users are reporting a problem.” Okay, that gets you somewhere. And then it becomes oh, well, what was wrong that time? The drive filled up. Okay, so we're going to build checks in that tell us when the drives are filling up. 
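That drive-filling-up check is the canonical reactive example; a minimal sketch of one, with an arbitrary 90% threshold (the numbers and paths are made up, not anyone's real alerting config):

```python
import shutil

# Minimal sketch of the classic reactive check: alert when a disk crosses a
# threshold. The 90% default here is an arbitrary example value.

def disk_alert(total, used, threshold=0.9):
    """Return True when used space is at or above `threshold` of capacity."""
    return used / total >= threshold

def check_path(path="/", threshold=0.9):
    """Apply the same predicate to a live filesystem path."""
    usage = shutil.disk_usage(path)
    return disk_alert(usage.total, usage.used, threshold)

full = disk_alert(total=100, used=95)  # 95% used
fine = disk_alert(total=100, used=50)  # 50% used
```

Each new failure mode tends to get its own check like this, which is exactly the enumerate-the-badness pattern described here.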
And as a result, if you leave that to its logical conclusion, one of the stories that I heard out of MySpace once upon a time—which dates me somewhat—is that you would have a shift, so there were three shifts working around the clock, and each one would open about 5000 tickets, give or take, for the monitoring alerts that wound up firing off throughout their infrastructure. At that point, it's almost, why bother? Because no one is going to be around to triage these things; no one is going to see any of the signal buried in all of that noise. When you talk about doing this from an API perspective, are you running synthetics against those APIs? Are you shimming them in order to see what's passing through them? What's the implementation side look like?Jean: Yeah, that's a great question. So, we're using a technology called BPF, Berkeley Packet Filter. The more trendy, buzzy term is EBPF—Corey: The EBPF. Oh yes.Jean: Yeah, Extended Berkeley Packet Filter. But here's the secret, we only use the BPF part. It's actually a little easier for users to install. The E part is, you know, fancy and often finicky. But um—Corey: SEBPF then: Shortened Extended BPF. Why not?Jean: [laugh]. Yeah. And what BPF allows us to do is passively watch traffic from the outside of a system. So, think of it as you're sending API calls across the network. We're just watching that network. We're not in the path of that traffic. So, we're not intercepting the traffic in any way, we're not creating any additional overhead for the traffic, we're not slowing it down in any way. 
We're just sitting on the side, we're watching all of it, and then we're taking that and shipping an obfuscated version off to our cloud, and then we're giving you analytics on that.Corey: One of the things that strikes me as being… I guess, a common trope is there are a bunch of observability solutions out there that offer this sort of insight into what's going on within an environment, but it's, “Step one: instrument with some SDK or some agent across everything. Do an entire deploy across your fleet.” Which yeah, people are not generally going to be in a hurry to sign up for. And further, you also said a minute ago that the idea being that someone could start using this in the middle of the night in the middle of an outage, which tells me that it's not, “Step one: get the infrastructure sparkling. Step two: do a global deploy to everything.” How do you go about doing that? What is the level of embeddedness into the environment?Jean: Yeah, that's a great question. So, the reason we chose BPF is I wanted a completely black-box solution. So, no SDKs, no code annotations. I wanted people to be able to change a config file and have our solution apply to anything that's on the system. So, you could add routes, you could do all kinds of things. I wanted there to be no additional work on the part of the developer when that happened.And so, we're not the only solution that uses BPF or EBPF. There's many other solutions that say, “Hey, just drop us in. We'll let you do anything you want.” The big difference is what happens with the traffic once it gets processed. So, what EBPF or BPF gives you is it watches everything about your system. And so, you can imagine that's a lot of different events. That's a lot of things.If you're trying to fix an incident in the middle of the night and someone just dumps on you 1000 pages of logs, like, what are you going to do with that? 
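The gap between a raw event dump and something a responder can use is aggregation, and one way to make that concrete is a toy sketch of endpoint inference: collapse concrete API paths into endpoint templates, then summarize errors and latency per endpoint. The generalization heuristic and the example calls below are invented for illustration; this is not Akita's actual algorithm:

```python
import re
from collections import defaultdict

def to_endpoint(path):
    """Replace numeric or UUID-like path segments with an {id} placeholder."""
    parts = []
    for seg in path.strip("/").split("/"):
        if seg.isdigit() or re.fullmatch(r"[0-9a-f-]{8,}", seg):
            parts.append("{id}")
        else:
            parts.append(seg)
    return "/" + "/".join(parts)

def summarize(calls):
    """Aggregate (method, path, status, latency_ms) calls per inferred endpoint."""
    stats = defaultdict(lambda: {"count": 0, "errors": 0, "max_ms": 0})
    for method, path, status, latency in calls:
        s = stats[(method, to_endpoint(path))]
        s["count"] += 1
        s["errors"] += status >= 500
        s["max_ms"] = max(s["max_ms"], latency)
    return dict(stats)

calls = [
    ("GET", "/users/123", 200, 40),
    ("GET", "/users/456", 500, 35),
    ("GET", "/users/789/orders", 200, 120),
]
stats = summarize(calls)
# Three raw calls collapse into two endpoints: GET /users/{id} (one error)
# and GET /users/{id}/orders.
```

Doing this well on real traffic, as Jean goes on to describe, is the hard part: real paths don't announce which segments are parameters.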
And so, our view is, the more interesting and important and valuable thing to do here is not make it so that you just have the ability to watch everything about your system but to make it so that developers don't have to sift through thousands of events just to figure out what went wrong. So, we've spent years building algorithms to automatically analyze these API events to figure out, first of all, what are your endpoints? Because it's one thing to turn on something like Wireshark and just say, okay, here are the thousand API calls, I saw—ten thousand—but it's another thing to say, “Hey, 500 of those were actually the same endpoint and 300 of those had errors.” That's quite a hard problem.And before us, it turns out that there was no other solution that even did that to the level of being able to compile together, “Here are all the slow calls to an endpoint,” or, “Here are all of the erroneous calls to an endpoint.” That was blood, sweat, and tears of developers in the night before. And so, that's the first major thing we do. And then metrics on top of that. So, today we have what's slow, what's throwing errors. People have asked us for other things like show me what happened after I deployed. Show me what's going on this week versus last week. But now that we have this data set, you can imagine there's all kinds of questions we can now start answering much more quickly on top of it.Corey: One thing that strikes me about your site is that when I go to, you've got a shout-out section at the top. And because I've been doing this long enough where I find that, yeah, you work at a company; you're going to say all kinds of wonderful, amazing aspirational things about it, and basically because I have deep-seated personality disorders, I will make fun of those things as my default reflexive reaction. 
But something that AWS, for example, does very well is when they announce something ridiculous on stage at re:Invent, I make fun of it, as is normal, but then they have a customer come up and say, “And here's the expensive, painful problem that they solved for us.”And that's where I shut up and start listening. Because it's a very different story to get someone else, who is presumably not being paid, to get on stage and say, “Yeah, this solved a sophisticated, painful problem.” Your shout-outs page has not just a laundry list of people saying great things about it, but there are former folks who have been on the show here, people I know and trust: Scott Johnson over at Docker, Gergely Orosz over at The Pragmatic Engineer, and other folks who have been luminaries in the space for a while. These are not the sort of people that are going to say, “Oh, sure. Why not? Oh, you're going to send me a $50 gift card in a Twitter DM? Sure I'll say nice things,” like it's one of those respond-to-a-viral-tweet spam things. These are people who have gravitas. It's clear that there's something you're building that is resonating.Jean: Yeah. And for that, they found us. Everyone that I've tried to bribe to say good things about us actually [laugh] refused.Corey: Oh, yeah. As it turns out, it's one of those things where people are more expensive than you might think. It's like, “What, you want me to sell my credibility down the road?” Doesn't work super well. But there's something like the unsolicited testimonials that come out of, this is amazing, once people start kicking the tires on it.You're currently in open beta. So, I guess my big question for you is, whenever you see a product that says, “Oh, yeah, we solve everything cloud, on-prem, on physical instances, on virtual machines, on Docker, on serverless, everything across the board. It's awesome.” I have some skepticism on that. What is your ideal application architecture that Akita works best on? 
And what sort of things are you a complete nonstarter for?Jean: Yeah, I'll start with a couple of things we work well on. So, container platforms. We work relatively well. So, that's your Fargate, that's your Azure Web Apps: you know, things running on what we call container platforms. Kubernetes is also something that a lot of our users have picked us up on and had success with. I will say our Kubernetes deploy is not as smooth as we would like. We say, you know, you can install us—Corey: Well, that is Kubernetes, yes.Jean: [laugh]. Yeah.Corey: Nothing in Kubernetes is as smooth as we would like.Jean: Yeah, so we're actually rolling out Kubernetes injection support in the next couple of weeks. So, those are the two that people have had the most success on. If you're running on bare metal or on a VM, we work, but I will say that you have to know your way around a little bit to get that to work. What we don't work on is any Platform as a Service. So, like, a Heroku, a Lambda, a Render at the moment. So those, we haven't found a way to passively listen to the network traffic in a good way right now.And we also work best for unencrypted HTTP REST traffic. So, if you have encrypted traffic, it's not a non-starter, but you need to fall into a couple of categories. You either need to be using Kubernetes, where you can run Akita as a sidecar, or you're using Nginx. And so, that's something we're still expanding support on. And we do not support GraphQL or GRPC at the moment.Corey: That's okay. Neither do I. It does seem these days that unencrypted HTTP API calls are increasingly becoming something of a relic, where folks are treating those as anti-patterns to be stamped out ruthlessly. Are you still seeing significant deployments of unencrypted APIs?Jean: Yeah. [laugh]. So, Corey—Corey: That is the reality, yes.Jean: That's a really good question, Corey, because in the beginning, we weren't sure what we wanted to focus on. 
And I'm not saying the whole deployment is unencrypted HTTP, but there is a place to install Akita to watch where it's unencrypted HTTP. And so, this is what I mean by if you have encrypted traffic, but you can install Akita as a Kubernetes sidecar, we can still watch that. But there was a big question when we started: should this be GraphQL, GRPC, or should it be REST? And I read the “State of the API Report” from Postman for, you know, five years, and I still keep up with it.And every year, it seemed that not only was REST remaining dominant, it was actually growing. So, [laugh] this was shocking to me as well because people said, well, “We have this more structured stuff now. There's GRPC, there's GraphQL.” But it seems that for the added complexity, people weren't necessarily seeing the value, and so REST continues to dominate. And I've actually even seen a decline in GraphQL since we first started doing this. So, I'm fully on board the REST wagon. And in terms of encrypted versus unencrypted, I would also like to see more encryption as well. That's why we're working on burning down the long tail of support for that.Corey: Yeah, it's one of those challenges. Whenever you're deploying something relatively new, there's this idea that it should be forward-looking and you, on some level, want to modernize your architecture and infrastructure to keep up with it. An AWS integration story I see that's like that these days is, “Oh, yeah, generate an IAM credential set and just upload those into our system.” Yeah, the modern way of doing that is role assumption: define a role and here's how to configure it so that it can do what we need to do. So, whenever you start seeing things that are, “Oh, yeah, just turn the security clock back in time a little bit,” that's always a little bit of an eyebrow raise.I can also definitely empathize with the joys of dealing with anything that even touches networking in a Lambda context. 
Building the Lambda extension for Tailscale was one of the last big dives I made into that area and I still have nightmares as a result. It does a lot of interesting things right up until you step off the golden path. And then suddenly, everything becomes yaks all the way down, in desperate need of shaving.Jean: Yeah, Lambda is something we want to handle on our roadmap, but I… believe we need a bigger team before [laugh] we are ready to tackle that.Corey: Yeah, “we're going to need a bigger boat” is very often [laugh] the story people have when they start looking at entire new architectural paradigms. So, you end up talking about working in containerized environments. Do you find that most of your deployments are living in cloud environments or in private data centers, which some people call private cloud? Where does the bulk of your user applications tend to live these days?Jean: The bulk of our user applications are in the cloud. So, we're targeting small to medium businesses to start. The reason being, we want to give our users a magical deployment experience. So, right now, a lot of our users are deploying in under 30 minutes. That's in no small part due to automations that we've built.And so, we initially made the strategic decision to focus on places where we get the most visibility. And so—where one, we get the most visibility, and two, we are ready for that level of scale. So, we found that, you know, for a large business, we've run inside some of their production environments and there are API calls that we don't yet handle well or it's just such a large number of calls, we're not doing the inference as well and our algorithms don't work as well. And so, we've made the decision to start small, build our way up, and start in places where we can just aggressively iterate because we can see everything that's going on. And so, we've stayed away, for instance, from any on-prem deployments for that reason because then we can't see everything that's going on. 
And so, smaller companies that are okay with us watching pretty much everything they're doing have been where we started. And now we're moving up into the medium-sized businesses.Corey: The challenge that I guess I'm still trying to wrap my head around is, I think that it takes someone with a particularly rosy set of glasses on to look at the current state of monitoring and observability and say that it's not profoundly broken in a whole bunch of ways. Now, where it all falls apart, Tower of Babelesque, is that there doesn't seem to be consensus on where exactly it's broken. Where do you see, I guess, this coming apart at the seams?Jean: I agree, it's broken. And so, if I tap into my background, which is I was a programming languages person in my very recent previous life, programming languages people like to say the problem and the solution all lie in abstraction. And so, computing is all about building abstractions on top of what you have now so that you don't have to deal with so many details and you get to think at a higher level; you're free of the shackles of so many low-level details. What I see is that today, monitoring and observability is a sort of abstraction nightmare. People have just taken it as gospel that you need to live at the lowest level of abstraction possible, the same way that people truly believed that assembly code was the way everybody was going to program forevermore back, you know, 50 years ago.So today, what's happening is that when people think monitoring, they think logs, not what's wrong with my system, what do I need to pay attention to? They think, “I have to log everything, I have to consume all those logs, we're just operating at the level of logs.” And that's not wrong because there haven't been any tools that have given people any help above the level of logs. Although that's not entirely correct, you know? 
There's also events and there's also traces, but I wouldn't say that's actually lifting the level of [laugh] abstraction very much either.And so, people today are thinking about monitoring and observability as this full control, like, I'm driving my, like, race car, completely manual transmission, I want to feel everything. And not everyone wants to or needs to do that to get to where they need to go. And so, my question is, how far can we lift the level of abstraction for monitoring and observability? I don't believe that other people are really asking this question because most of the other players in the space, they're asking what else can we monitor? Where else can we monitor it? How much faster can we do it? Or how much more detail can we give the people who really want the power tools?But the people entering the buyer's market with needs, they're not people—you don't have, like, you know, hordes of people who need more powerful tools. You have people who don't know the systems they're dealing with and they want easier. They want to figure out if there's anything wrong with their system so they can get off work and do other things with their lives.
Right now, I think a lot of the problem is, you're either working with monitoring because you're desperate, you're in the middle of an active incident, or you're a monitoring fanatic. And there isn't a lot in between. So, there's a tweet that someone in my network tweeted me that I really liked which is, “Monitoring should be a chore, not a hobby.” And right now, it's either a hobby or an urgent necessity [laugh].And when it gets to the point—so you know, if we think about doing dishes this way, it would be as if, like, only, like, the dish fanatics did dishes, or, like, you will just have piles of dishes, like, all over the place and raccoons and no dishes left, and then you're, like, “Ah, time to do a thing.” But there should be something in between where there's a defined set of things that people can do on a regular basis to keep up with what they're doing. It should be accessible to everyone on the team, not just a couple of people who are true fanatics. No offense to the people out there, I love you guys, you're the ones who are really helping us build our tool the most, but you know, there's got to be a world in which more people are able to do the things you do.Corey: That's part of the challenge is bringing a lot of the fire down from Mount Olympus to the rest of humanity, where at some level, Prometheus was a great name from that—Jean: Yep [laugh].Corey: Just from that perspective because you basically need to be at that level of insight. I think Kubernetes suffers from the same overall problem where it is not reasonably responsible to run a Kubernetes production cluster without some people who really know what's going on. That's rapidly changing, which is for the better, because most companies are not going to be able to afford a multimillion-dollar team of operators who know the ins and outs of these incredibly complex systems. It has to become more accessible and simpler. 
And we have nearly an entire century at this point of watching abstractions get more and more and more complex and then collapsing down in this particular field. And I think that we're overdue for that correction in a lot of the modern infrastructure, tooling, and approaches that we take.

Jean: I agree. It hasn't happened yet in monitoring and observability. It's happened in coding, it's happened in infrastructure, it's happened in APIs, but all of that has made it so that it's easier to get into monitoring debt. And it just hasn't happened yet for anything that's more reactive and more about understanding what the system is that you have.

Corey: You mentioned specifically that your background was in programming languages. That's understating it slightly. You were a tenure-track professor of computer science at Carnegie Mellon before entering industry. How tied is what you're doing now at Akita to what your area of academic specialty was?

Jean: That's a great question, and there are two answers to that. The first is: very not tied. If it were tied, I would have stayed in my very cushy, highly [laugh] competitive job that I worked for years to get, to do stuff there. And so, what we're doing now comes out of thousands of conversations with developers and a desire to build on-the-ground tools. There's some technically interesting parts to it, for sure. I think that our technical innovation is our moat, but is it at the level of publishable papers? Publishable papers are a very narrow thing; I wouldn't be able to say yes to that question.

On the other hand, everything that I was trained to do was about identifying a problem and coming up with an out-of-the-box solution for it. And especially in programming languages research, it's really about abstractions.
It's really about, you know, taking a set of patterns that you see of problems people have, coming up with the right abstractions to solve that problem, evaluating your solution, and then, you know, prototyping that out and building on top of it. And so, in that case, you know, we identified, hey, people have a huge gap when it comes to monitoring and observability. I framed it as an abstraction problem: how can we lift it up?

We saw APIs as a great level to build a new kind of solution on. And our solution, it's innovative, but it also solves the problem. And to me, that's the most important thing. Our solution didn't need to be innovative. If you're operating in an academic setting, it's really about… producing a new idea. It doesn't actually [laugh]—I like to believe that all endeavors really have one main goal, and in academia, the main goal is producing something new. And to me, building a product is about solving a problem, and our main endeavor was really to solve a real problem here.

Corey: I think that it is, in many cases, useful when we start seeing a lot of, I guess, overflow back and forth between academia and industry, in both directions. I think that it is doing academia a disservice when you start looking at it purely as pure theory, and oh yeah, they don't deal with any of the vocational stuff. Conversely, I think the idea that industry doesn't have anything to learn from academia is dramatically misunderstanding the way the world works. The idea of watching some of that ebb and flow and crossover between them is neat to see.

Jean: Yeah, I agree. I think there's a lot of academics I super respect and admire who have done great things that are useful in industry. And it's really about, I think, what you want your main goal to be at the time. Do you want to be optimizing for new ideas, or contributing, like, a full solution to a problem at the time?
But there's a lot of overlap in the skills you need.

Corey: One last topic I'd like to dive into before we call it an episode is that there's an awful lot of hype around a variety of different things. And right now in this moment, AI seems to be one of those areas that is getting an awful lot of attention. It's clear, too, there's something of value there—unlike blockchain, which has struggled to identify anything that was not fraud as a value proposition for the last decade-and-a-half—but it's clear that AI is offering value already. You have recently, as of this recording, released an AI chatbot, which, okay, great. But what piques my interest is one, it's a dog, which… germane to my interest, by all means, and two, it is marketed as, and I quote, “Exceedingly polite.”

Jean: [laugh].

Corey: Manners are important. Tell me about this pupper.

Jean: Yeah, this dog came really out of four or five days of one of our engineers experimenting with ChatGPT. So, for a little bit of background, I'll just say that I have been excited about this latest wave of AI since the beginning. So, I think at the very beginning, a lot of dev tools people were skeptical of GitHub Copilot; there was a lot of controversy around GitHub Copilot. I was very early. And I think all the Copilot people retweeted me because I was just their earlies—like, one of their earliest fans. I was like, “This is the coolest thing I've seen.”

I'd actually spent the decade before making fun of AI-based [laugh] programming. But there were two things about GitHub Copilot that made my jaw drop. And that's related to your question. So, for a little bit of background, I did my PhD in a group focused on program synthesis. So, it was really about, how can we automatically generate programs from a variety of means?
From constraints—

Corey: Like copying and pasting off a Stack Overflow, or—

Jean: Well, the—I mean, that was actually one of the projects: my group was literally applying machine learning to terabytes of other example programs to generate new programs. So, it was very similar to GitHub Copilot before GitHub Copilot. It was synthesizing API calls from analyzing terabytes of other API calls. And the thing that I had always been uncomfortable with about the machine-learning approaches in my group was, they were in the compiler loop. It was, you know, you wrote some code, the compiler did some AI, and then it spit back out some code that, you know, you just ran.

And so, that never sat well with me. I always said, “Well, I don't really see how this is going to be practical,” because people can't just run random code that you basically got off the internet. And so, what really excited me about GitHub Copilot was the fact that it was in the editor loop. I was like, “Oh, my God.”

Corey: It had the context. It was right there. You didn't have to go tabbing to something else.

Jean: Exactly.

Corey: Oh, yeah. I'm in the same boat. I think it is basically—I've seen the future unfolding before my eyes.

Jean: Yeah. It was the autocomplete thing. And to me, that was the missing piece. Because in your editor, you always read your code before you go off and—you know, like, you read your code, whoever code reviews your code reads your code. There's always at least, you know, two pairs of eyes, at least theoretically, reading your code.

So, that was one thing that was jaw-dropping to me. That was the revelation of Copilot. And then the other thing was that it was marketed not as, “We write your code for you,” but the whole Copilot marketing was that, you know, it kind of helps you with boilerplate. And to me, I had been obsessed with this idea of how can you help developers write less boilerplate for years.
And so, this AI-supported boilerplate copiloting was very exciting to me. And I saw that as very much the beginning of a new era, where, yes, there's tons of data on how we should be programming. I mean, all of Akita is based on the fact that we should be mining all the data we have about how your system and your code is operating to help you do stuff better. And so, to me, you know, Copilot is very much in that same philosophy. But our AI chatbot is, you know, just a next step along this progression. Because for us, you know, we collect all this data about your API behavior; we have been using non-AI methods to analyze this data and show it to you.

And what ChatGPT allowed us to do in less than a week was analyze this data using very powerful large-language models, in a conversational interface that gives you the opportunity to check over and follow up on the question, so that what we're spitting out as Aki the dog doesn't have to be a hundred percent correct. But to me, the fact that Aki is exceedingly polite and kind of goofy—he, you know, randomly woofs and says a lot of things about how he's a dog—is the right level of seriousness, so that it's not messaging, hey, this is the end-all, be-all. You know, the compiler loop never sat well with me because I just felt deeply uncomfortable that an AI was having that level of authority in a system, but a friendly dog that shows up and tells you some things that you can ask some additional questions to—no one's going to take him that seriously. But if he says something useful, you're going to listen. And so, I was really excited about the way this was set up. Because I mean, I believe that AI should be a collaborator, and it should be a collaborator that you never take with full authority.
And so, the chat and the politeness covered those two parts for me both.

Corey: Yeah, on some level, I can't shake the feeling that it's still very early days there for Chat-Gipity—yes, that's how I pronounce it—and its brethren as far as redefining, on some level, what's possible. I think that it's in many cases being overhyped, but it's solving an awful lot of the… the boilerplate, the stuff that is challenging. A question I have, though, is that, as a former professor, a concern that I have is when students are using this. It's less to do with the fact that they're not—they're taking shortcuts that weren't available to me and wanting to make them suffer, but rather, it's, on some level, if you use it to write your English papers, for example. Okay, great, it gets the boring essay you don't want to write out of the way, but the reason you write those things is it teaches you to form a story, to tell a narrative, to structure an argument, and I think that letting the computer do those things, on some level, has the potential to weaken us across the board. Where do you stand on it, given that you see both sides of that particular snake?

Jean: So, here's a devil's advocate sort of response to it, which is that maybe the writing [laugh] was never the important part. And it's, as you say, telling the story that was the important part. And so, what better way to distill that out than the prompt engineering piece of it? Because if you knew that you could always get someone to flesh out your story for you, then it really comes down to, you know, I want to tell a story with these five main points. And in some way, you could see this as a playing-field leveler.

You know, I think that as a—English is actually not my first language. I spent a lot of time editing my parents' writing for their work when I was a kid. And something I always felt really strongly about was not discriminating against people because they can't form sentences or they don't have the right idioms.
And I actually spent a lot of time proofreading my friends' emails when I was in grad school, for the non-native English speakers. And so, one way you could see this is: look, people who are not insiders now are on the same playing field. They just have to be clear thinkers.

Corey: That is a fascinating take. I think I'm going to have to—I'm going to have to ruminate on that one. I really want to thank you for taking the time to speak with me today about what you're up to. If people want to learn more, where's the best place for them to find you?

Jean: Well, I'm always on Twitter, still [laugh]. I'm @jeanqasaur—J-E-A-N-Q-A-S-A-U-R. And there's a chat dialog on I [laugh] personally oversee a lot of that chat, so if you ever want to find me, that is a place, you know, where all messages will get back to me somehow.

Corey: And we will, of course, put a link to that into the [show notes 00:35:01]. Thank you so much for your time. I appreciate it.

Jean: Thank you, Corey.

Corey: Jean Yang, CEO at Akita Software. I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice, along with an angry, insulting comment that you will then, of course, proceed to copy to the other 17 podcast tools that you use, just like you do your observability monitoring suite.

Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit to get started.

    Hacking Old Hardware and Developer Advocate Presentations with Darko Mesaroš

    Play Episode Listen Later Apr 18, 2023 27:46

Darko Mesaroš, Senior Developer Advocate at AWS, joins Corey on Screaming in the Cloud to discuss all the weird and wonderful things that can be done with old hardware, as well as the necessary skills for being a successful Developer Advocate. Darko walks through how he managed to deploy Kubernetes on a computer from 1986, as well as the trade-offs we've made in computer technology as hardware has progressed. Corey and Darko also explore the forgotten art of optimizing when you're developing, and how it can help to cut costs. Darko also shares what he feels is the key skill every Developer Advocate needs to have, and walks through how he has structured his presentations to ensure he is captivating and delivering value to his audience.

About Darko

Darko is a Senior Developer Advocate based in Seattle, WA. His goal is to share his passion and technological know-how with Engineers, Developers, Builders, and tech enthusiasts across the world. If it can be automated, Darko will definitely try to do so. Most of his focus is towards DevOps and Management Tools, where automation, pipelines, and efficient developer tools are the name of the game – click less and code more so you do not repeat yourself! Darko also collects a lot of old technology and tries to make it do what it should not. Like deploy AWS infrastructure through a Commodore 64.

Links Referenced: AWS: Blog post RE deploying Kubernetes on a TRS-80: AWS Twitch: Twitter: Mastodon:

Transcript

Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.

Corey: This episode is sponsored in part by our friends at Chronosphere.
When it costs more money and time to observe your environment than it does to build it, there's a problem. With Chronosphere, you can shape and transform observability data based on need, context, and utility. Learn how to only store the useful data you need to see in order to reduce costs and improve performance at That's And my thanks to them for sponsoring my ridiculous nonsense.

Corey: Do you wish your developers had less permanent access to AWS? Has the complexity of Amazon's reference architecture for temporary elevated access caused you to sob uncontrollably? With Sym, you can protect your cloud infrastructure with customizable, just-in-time access workflows that can be set up in minutes. By automating the access request lifecycle, Sym helps you reduce the scope of default access while keeping your developers moving quickly. Say goodbye to your cloud access woes with Sym. Go to to learn more. That's

Corey: Welcome to Screaming in the Cloud. I'm Cloud Economist Corey Quinn, and my guest today is almost as bizarre as I am, in a somewhat similar direction. Darko Mesaroš is a Senior Developer Advocate at AWS. And instead of following my path of inappropriately using things as databases that weren't designed to be used that way, he instead uses the latest of technology with the earliest of computers. Darko, thank you for joining me.

Darko: Thank you so much, Corey. First of all, you know, you tell me, Darko is a senior developer advocate. No, Corey. I'm a system administrator by heart. I happen to be a developer advocate these days, but I was born in the cold, cold racks of a data center. I maintained systems, I've installed packages on Linux systems, I even set up Solaris Zones a long time ago. So yeah, but I happen to yell into the camera these days, [laugh] so thank you for having me here.

Corey: No, no, it goes well. I started my career as a sysadmin.
And honestly, my opinion, if you asked me—which no one does, but I share it anyway—is that the difference between an SRE and a sysadmin is about a 40% salary bump.

Darko: Exactly.

Corey: That's about it. It is effectively the same job. The tools are different, the approach we take is different, but the fundamental mandate of ‘keep the site up' has not materially changed.

Darko: It has not. I don't know, like, what the modern SREs do, but, like, I used to also semi-maintain AC units. Like, you have to walk around with a screwdriver nonetheless, so sometimes, besides just installing the freshest packages on your Red Hat 4 system, you have to also change the filters in the AC. So, not sure if that belongs in the SRE manifesto these days.

Corey: Well, the reason that I wound up inviting you onto the show was a recent blog post you put up where you were able to deploy Kubernetes from the best computer from 1986, which is the TRS-80, or the Trash-80. For the record, the worst computer from 1986 was—and remains—IBM Cloud. But that's neither here nor there.

What does it mean to deploy Kubernetes? Because, to be direct, the way that I tend to deploy anything these days, if, you know, I'm being sensible and grown up about it, is a Git push, and then the automation takes it away from there. I get the sense, you went a little bit deep.

Darko: So, when it comes to deploying stuff from an old computer, like, you know, you kind of said the right thing here: I have the best computer from 1986. Actually, it's a portable version of the best computer from 1986; it's a TRS-80 Model 102. It's a portable, basically a little computer intended for journalists and people on the go to write stuff and send emails or whatever it was back in those days. And I deployed Kubernetes through that system.
Now, of course, I cheated a bit, because the way I did it is I just used it as a glorified terminal.

I just hooked up the RS-232, the wonderful serial connection, to a Raspberry Pi somewhere out there, and it just showed the stuff from a Raspberry Pi onto the TRS-80. So, the TRS-80 didn't actually know how to run kubectl—or ‘kube cuddle,' as they call it—it just asked somebody else to do it. But that's kind of the magic of it.

Corey: You could have done a Lambda deployment then just as easily.

Darko: Absolutely. Like, that's the magic of, like, these old hunks of junk: when you get down to it, they still do things with numbers and transmit electrical signals through some wires somewhere out there. So, if you're capable enough, if you are savvy, or if you just have a lot of time, you can take any old computer and have it do modern things, especially now. Like, and I will say, 15 years ago we could not have done anything like this, because 15 years ago, a lot of the stuff at least that I was involved with, which was Microsoft products, was click-only. I couldn't, for the love of me, deploy a bunch of stuff on an Active Directory domain by using a command line. PowerShell was not a thing back then. You could use VBScript, but sort of.

Corey: Couldn't you wind up using something that would, in effect, like Selenium or whatnot, emulate a user session and move the mouse to certain coordinates and click, and then wait some arbitrary time and click somewhere else?

Darko: Yes.

Corey: Which sounds like the absolute worst version of automation ever. That's like, “I deployed Kubernetes using a typewriter.” “Well, how the hell did you do that?” “Oh, I used the typewriter to hit the enter key. Problem solved.” But I don't think that counts.

Darko: Well, yeah, so actually, even back then—like, just thinking of, like, a 10-, 12-year step back in my career—I automated stuff on Windows systems—like Windows 2000 and Windows 2003 systems—by a tool called AutoIt.
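Coordinate-based click automation of the AutoIt sort can be sketched with a toy recorder in Python. The `click` and `wait` helpers below are stand-ins that only log what a real tool such as AutoIt (or pyautogui today) would do to the actual cursor, and the coordinates are placeholders, not real dialog positions:

```python
# Toy stand-ins for AutoIt-style primitives: they record what a real
# automation tool would do instead of driving the actual mouse.
actions = []

def click(x, y):
    """Pretend to click at screen coordinates (x, y)."""
    actions.append(("click", x, y))

def wait(seconds):
    """Pretend to pause while the next dialog appears."""
    actions.append(("wait", seconds))

# "Deploy" something the 2003 way: click Next, wait, click Finish.
# The fixed coordinates are the fragile part -- the window has to pop
# up in the same place every time, or the script clicks into nothing.
click(640, 480)   # the "Next" button
wait(2)           # hope the next dialog has appeared by now
click(700, 520)   # the "Finish" button

print(actions)
```

A real script would swap the helpers for actual mouse calls and, ideally, add retry logic for windows that are slow to appear, which is exactly the brittleness being described here.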
It would literally emulate clicks of a mouse on a specific location on the screen. So, you were just really hoping that window pops up at the same place all the time. Otherwise, your automation doesn't work. So yeah, it was kind of like that.

And so, if you look at it that way, I could take my Trash-80, I could write an AutoIt script with specific coordinates, and I could deploy Windows things. So actually, yeah, you can deploy anything these days with an old computer.

Corey: I think that we've lost something in the world of computers. If I, like, throw a computer at you these days, you're going to be pretty annoyed with me. Those things are expensive, it'll probably break, et cetera. If I throw a computer from this era at you, your family is taking bereavement leave. Like, those things where—there would be no second hit.

These things were beefy. There was a sense of solidity to them. The keyboards were phenomenal. We've been chasing that high ever since. And, yeah, they were obnoxiously heavy and the battery life was 20 seconds, but it was still something that—you felt like it was computer time. And now, all these things have faded into the background. I am not protesting the march of progress, particularly in this particular respect, but I do miss the sense of having keyboards that weren't overwhelmingly flimsy plastic.

Darko: I think it's just a fact of, like, we have computers as commodities these days. Back then, computers were workstations, computers were something you would buy to perform specific tasks. Today, a computer is anything from watching Twitch to going on Twitter, complaining about Twitter, to deploying Kubernetes, right? So, they have become such commodities… I don't want to call them single-use items, but they're more becoming single-use items as time progresses because they're just not repairable anymore. Like, if you give me a computer that's five years old, I don't know what to do with it. I probably cannot fix it if it's broken.
But if you give me a computer that's 35 years old, I bet you I can fix it no matter what happened.

Corey: And the sheer compute changes have come so fast and furious, it's easy to lose sight of them, especially with branding being more or less the same. But I saved up and took an additional loan out when I graduated high school to spend three grand on a Dell Inspiron laptop, this big beefy thing. And for fun, I checked the specs recently, and yep, that's a Raspberry Pi these days; they're $30, and it's not going to work super well to browse the web because it's underpowered. And I'm sitting here realizing, wait a minute, even with a modern computer—forget the Raspberry Pi for a second—I'm sitting here and I'm pulling up web pages or opening Slack, or God forbid, Slack and Chrome simultaneously, and the fan spins up and it sounds incredibly anemic. And it's, these things are magical supercomputers from the future. Why are they churning this hard to show me a funny picture of a cat? What's going on here?

Darko: So, my theory on this is… because we can. We can argue about this, but we currently—

Corey: Oh, I think you're right.

Darko: We have unlimited compute capacity in the world. Like, you can come up with an idea, you're probably going to find a supercomputer out there, you're probably going to find a cloud vendor out there that's going to give you all of the resources you need to perform this massive computation. So, we didn't really think about optimization as much as we used to in the past. So, that's it: we can. Chrome doesn't care. You have 32 gigs of RAM, Corey. It doesn't care that it takes 28 gigs of that because you have—

Corey: I have 128 gigs on this thing. I bought the Mac Studio and maxed it out. I gave it the hostname of us-shitpost-1 and we run with it.

Darko: [laugh]. There you go. But like, I did some fiddling around, like, recently—and again, this is just to torture myself—I did some 6502 Assembly for the Atari 2600.
6502 is a CPU that's been used in many things, including the Commodore 64, the NES, and even a whole lot of Apple IIs and whatnot. So, when you go down to the level of a computer that has 1.19 megahertz and only 128 bytes of RAM, you start to think about, okay, I can move these two numbers in memory in the following two ways: “Way number one will require four CPU cycles. Way number two will require seven CPU cycles. I'll go with way number one because it will save me three CPU cycles.”

Corey: Oh, yeah. You take a look at some of the most advanced computer engineering out there and it's for embedded devices where—

Darko: Yeah.

Corey: You need to wind up building code to run in some very tight constraints, and that breeds creativity. And I remember those days. These days, it's, well, my computer is super-overpowered, what's it matter? In fact, when I go in and I look at customers' AWS bills, very often I'll start doing some digging, and sure enough, EC2 is always the number one expense—we accept that—but we take a look at the breakdown and invariably, there's one instance family and size that is the overwhelming majority, in most cases. You often see a—I don't know—a c5.2xl or something or whatever it happens to be.

Great. Why is that? And the answer—[unintelligible 00:10:17] to make sense is, “Well, we just started with that size and it seemed to work, so we kept using it as our default.” When I'm building things, because I'm cheap, I take one of the smallest instances I possibly can—it used to be one of the Nanos, and I'm sorry, half a gig or a gig of RAM is no longer really sufficient when I'm trying to build almost anything. Thanks, JavaScript. So okay, I've gone up a little bit.

But at that point, when I need to do something that requires something beefier, well, I can provision those resources, but I don't have it as a default.
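The cost of that kind of default is easy to put rough numbers on. The hourly rates below are illustrative assumptions for a larger instance and a right-sized one, not current AWS list prices:

```python
# Rough monthly cost comparison between a habitual "default" instance
# size and a smaller right-sized one. The hourly rates are illustrative
# assumptions; check the AWS pricing page for real numbers.
HOURS_PER_MONTH = 730

default_rate = 0.34       # something c5.2xlarge-ish, $/hour (assumed)
right_sized_rate = 0.085  # roughly a quarter the size (assumed)

default_monthly = default_rate * HOURS_PER_MONTH
right_sized_monthly = right_sized_rate * HOURS_PER_MONTH
saved = default_monthly - right_sized_monthly

print(f"default:     ${default_monthly:,.2f}/month")
print(f"right-sized: ${right_sized_monthly:,.2f}/month")
print(f"saved:       ${saved:,.2f}/month per instance")
```

Multiplied across a fleet that all inherited the same default, that delta is what shows up as the one dominant line item on the bill.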
That forces me to, at least in the back of my mind, have a little bit of a sense that I should be parsimonious with what it is that I'm provisioning out there, which is apparently anathema to every data scientist I've ever met, but here we are.

Darko: I mean, that's the thing: because we're so used to just having those resources, we don't really care about optimizations. Like, I'm not advocating that you all should go and just do assembly language. You should never do that, like, unless you're building embedded systems or you're working for something—

Corey: If you need to use that level of programming, you know.

Darko: Exactly.

Corey: You already know, and nothing you are going to talk about here is going to impact what people in that position are doing. Mostly you need to know assembly language because that's a weeder class in a lot of comp-sci programs, and if you don't pass it, you don't graduate. That's the only reason to really know assembly language most of the time.

Darko: But you know, like, it's also a thing, like, as a developer, right: think about the person using your thing, right? And they may have the 128-gig us—what is it you called it? Us-shitpost-1, right—that kind of power, kind of, the latest and greatest M2 Max Ultra Apple computer that just does all of the stuff. You may have a big ol' double Xeon workstation that does a thing.

Or you just may have a Chromebook. Think about us with Chromebooks. Like, can I run your website properly? Did you really need all of those animations? Can you think about reducing the amount of animations depending on screen size? So, there's a lot of things that we need to kind of think about. Like, it goes back to the thing where ‘it works on my machine.' Oh, of course it works on your machine. You spent thousands of dollars on your machine. It's the best machine in the world.
Of course it runs smoothly.

Corey: Wait 20 minutes and they'll release a new one, and now, “Who sold me this ancient piece of crap?” Honestly, the most depressing thing is watching an Apple keynote, because I love my computer until I watch the Apple keynote, and it's like, oh, like, “Look at this amazing keyboard,” and the keyboard I had was fine. It's like, “Who sold me this rickety piece of garbage?” And then we saw how the Apple butterfly keyboard worked out for everyone and who built that rickety piece of garbage. Let's go back again. And here we are.

Darko: Exactly. So, that's kind of the thing, right? You know, like, your computer is the best. And if you develop for it, that's great, but you always have to think about the other people who use it. Hence, containers are great to fix one part of that problem, but not all of the problems. So, there's a bunch of stuff you can do.

And I think, like, for all of the developers out there, it's great what you're doing, you're building us so many tools, but always take a step back and optimize stuff. Optimize both for the end-user, by the amount of JavaScript you're going to throw at me, and also for the back-end: think about, if you had to run your web server on a Pentium III server, could you do it? And if you could, how bad would it be? And you don't have to run it on a Pentium III, but, like, try to think about what's the bottom 5% of the capacity you need. So yeah, it's just—you'll save money. That's it. You'll save money, ultimately.

Corey: So, I have to ask: what you do day to day is you're a senior developer advocate, which is, hmm, some words, yes. You spend a lot of your free time and public time talking about running ancient computers, but you also talk to customers who are looking forward, not back. How do you reconcile the two?

Darko: So, I like to mix the two. There's a whole reason why I like old computers. Like, I grew up in Serbia. Like, when I was young in the '90s, I didn't have any of these computers.
Like, I could only see, like, what was, like, a Macintosh from 1997 on TV, and I would just drool. Like, I wouldn't even come close to thinking about getting that, let alone something better.

So, I kind of missed all of that part. But now that I've started collecting all of those old computers and just everything from the '80s and '90s, I've actually realized, well, these things are not that different from something else. So, I like to always make comparisons between, like, an old system—what does it actually do?—and how it compares to a new system.

So, I love to mix and match in my presentations. I like to mix and match in my videos. You saw my blog posts on deploying stuff. So, I think it's just a fun way to kind of create a little contrast. I do think we should still be moving forward. I do think that technology is getting better and better, and it's going to help people do so many more things faster, hopefully cheaper, and hopefully better.

So, I do think that we should definitely keep on moving forward. But I always have this nostalgic feeling about, like, old things and… sometimes I don't know why, but I miss the world without the internet. Or maybe, I think, I miss the world with dial-up internet. Because back then, you would go on the internet for a purpose. You had to do a thing, you had to wait for a while, you had to make sure nobody's on the phone. And then—

Corey: God forbid you dial into a long-distance call. And you had to figure out which town and which number would be long distance versus not, at least where I grew up, and your parents would lose their freaking minds because that was an $8 phone call, which, you know, back in the '80s and early '90s was significant. And yeah, great. Now, I still think it's a great prank opportunity to teach kids that it costs more to access websites that are far away, which I guess in theory it kind of does, but not to the end-user.
I digress.Darko: I have a story about this, and I'm going to take a little sidestep. But long-distance phone calls. Like in the '80s, the World Wide Web was not yet a thing. Like, the www, the websites, all of that—just the general-purpose internet was not yet a thing. We had things called BBSes, or Bulletin Board Systems. That was the extreme version of a dial-up system.You don't dial into the internet; you dial into a website. Imagine if you have a sole intent of visiting only one website and the cost of visiting such a website would depend on where that website currently is. If the website is in Germany and you're calling from Serbia, it's going to cost you a lot of money because you're calling internationally. I had a friend back then. The best software you could get was from American BBSes, but calling America from Serbia back then would have been prohibitively expensive, like, just insanely expensive.So, what this friend used to do, he figured out that if he was connected to a BBS six hours a day, it would actually reset the counter of his phone bill. It would loop the mechanical counter through from whatever number back again to that number. So, it would take around six and some hours to complete the loop of the entire phone counting meter—whatever they used back in the '80s—to kind of charge your bill, so it effectively cost him zero money back then. So yeah, it was more expensive, kids, back then to call websites, the further away the websites were.[midroll 00:17:11]Corey: So, developer advocates do a lot of things. And I think it is unfair, but also true, that people tend to shorthand all of those things to getting on stage and giving conference talks because that at least is the visible part of it. People see that and it's viscerally understood that that takes work and a bit of courage for those who are not deep into public speaking, and those who are know it takes a lot of courage.
And whereas writing a blog post, “Well, I have a keyboard and say dumb things on the internet all the time. I don't see why that's hard.” So, there's a definite perception story there. What's your take on giving technical presentations?Darko: So, yeah. Just as you said, like, I think being a DA, even in my head, was always represented like, oh, you're just on stage, you're traveling, you're doing presentations, you're doing all those things. But it's actually quite a lot more than that, right? We do a lot more. But still, we are the developer advocates. We are the front-facing thing towards you, the wonderful developers listening to this.And we tend to be on stage, we tend to do podcasts with wonderful internet personalities, we tend to do live streams, we tend to do videos. And I think one of the key skills that a DA needs to have—a Developer Advocate needs to have—is presentations, right? You need to be able to present a technical message in the best possible way. Now, being a good technical presenter doesn't mean you're funny, doesn't mean you're entertaining; that doesn't have to be a thing. You just need to take a complex technical message and deliver it in the best way possible so that everybody who has just given you their time can get it fully.And this means—well, it means a lot of things, but it means taking this complicated topic, distilling it down so it can be digested within 30 to 45 minutes, and it also needs to be… it needs to be interesting. Like, we can talk about the most interesting topic, but if I don't make it interesting, you're just going to walk out. So, I also lead, like, a coaching class internally to teach people how to speak better, and I'm working with, like, really good speakers there, but a lot of the stuff I say applies no matter if you're a top-level speaker or if you're, like, just starting out.
And my challenge to all of you speakers out there, like, anybody who's listening to this and has a plan to deliver a video, a keynote, a live stream, or speak at a summit somewhere, is get outside of that box. Get outside of that PowerPoint box.I'm not saying PowerPoint is bad. I think PowerPoint is a wonderful tool, but I'm just saying you don't have to present in the way everybody else presents. The more memorable your presentation is, the more outside of that box it is, the more people will remember it. Again, you don't have to be funny. You don't have to be entertaining. You just have to take the thing you are really passionate about and deliver it to us in the best possible way. What that best possible way is, well, it really depends. Like a lot of things, there is no concrete answer to this thing.Corey: One of the hard parts I found is that people will see a certain technical presenter that they like and want to emulate and they'll start trying to do what they do. And that works to a point. Like, “Well, I really enjoy how that presenter doesn't read their slides.” Yeah, that's a good thing to pick up. But past a certain point, other people's material starts to fit as well as other people's shoes and you've got to find your own path.My path has always been getting people's attention first via humor, but it's certainly not the only way. In many contexts, it's not even the most effective way. It works for me in the context in which I use it, but I assure you that when I'm presenting to clients, I don't start off with slapstick comedy. Usually. There are a couple of noteworthy exceptions because clients expect that from me, in some cases.Darko: I think one of the important things is that emulating somebody is okay, as you said, to an extent, like, just trying to figure out what the good things are, but good, very objectively good things. Never try to be funny if you're not funny.
That's the thing where you can try comedy, but it's very difficult to—it's very difficult to do comedy if you're not that good at it. And I know that's very much a given, but a lot of people try to be funny when they're obviously not funny. And that's okay. You don't have to be funny.So, there are many ways to get people's attention beyond, again, just throwing a joke. What I did once on stage, I threw a bottle at the floor. I was just—I said a thing and threw a bottle at the floor. Everybody all of a sudden started paying attention to me. I don't know why. So, it can be that. It can be something—it can be a shocking statement. When I say shocking, I mean something, well, not bad, but something that's potentially controversial. Like, for example, emacs is better than vim. I don't know, maybe—Corey: “Serverless is terrible.”Darko: Serverl—yeah.Corey: Like, it doesn't matter. It depends on the audience.Darko: It depends on the audience.Corey: “The cloud is a scam.” I gave a talk once called, “The Cloud is A Scam,” and it certainly got people's attention.Darko: Absolutely. So, breaking up the normal flow. Because as a participant of a show, of a presentation, you go there and you expect, look, I'm going to sit down, Corey's going to come on stage, and Corey says, “Hi, my name is Corey Quinn. I'm the CEO of The Duckbill Group. This is what I do. And welcome to my talk about blah.”Corey: Technically, my business partner, Mike, is the CEO. I don't want to step too close to that fire, let's be clear.Darko: Oh, okay [laugh]. Okay. Then, “Today's agenda is this. And slide one, slide two, slide three.” And that's the expectation of the audience. And the audience comes in in this very autopilot way, like, “Okay, I'm just going to sit there and just nod my head as Corey speaks.”But then if Corey does a weird thing and Corey comes out in a bathtub. Just the bathtub and Corey.
And Corey starts talking about how bathtubs are amazing, it's the best place to relax. “Oh, by the way, managing costs in the cloud is so easy, I can do it from a bathtub.” Right? All of a sudden, whoa [laugh], wait a second, this is something that's interesting. And then you can go through the rest of your conversation. But you just made a little—you ticked the box in our head, like, “Oh, this is something weird. This is different. I don't know what to expect anymore,” and people start paying more attention.Corey: “So, if you're managing AWS costs from your bathtub, what kind of computer do you use?” “In my case, a toaster.”Darko: [laugh]. Yes. But ultimately, like, some of those things are very good and they just kind of—they make you, as a presenter, unpredictable, and that's a good thing. Because people will just want to sit on the edge of the seat and, like, listen to what you say because, I don't know, maybe he throws that toaster in, right? I don't know. So, it is like that.And one of the things that you'll notice, Corey, especially with people who have been presenting for a longer time, like, they've become very common at events and people know them by name and their face, is that it turns into not just presenting; somebody comes, literally, not because of the topic, but because they want to hear Corey talk about a thing. You can go there and talk about unicorns and cats, people will still come and listen to that because it's Corey Quinn. And that's where you, by getting outside of that box, getting outside of that ‘this is how we present things at company X'—this is what you get in the long run. People will know who you are, people will know what not to expect from your presentations, and they will ultimately be coming to your presentations to enjoy whatever you want to talk about.Corey: That is the dream.
I really want to thank you for taking the time to talk so much about how you view the world and the state of ancient and modern technologies and the like. If people want to learn more, where's the best place for them to find you?Darko: The best way to find me is on these days. So, you will find me live streaming twice a week there. You will find me on Twitter at @darkosubotica, which is my Twitter handle. You will find me at the same handle on Mastodon. And just search for my name, Darko Mesaroš; I'm sure I'll pop up on MySpace as well or whatever. So, I post a lot of cloud-related things. I post a lot of old computer-related things, so if you want to see me deploy Kubernetes through an Atari 2600, click that subscribe button or follow or whatever.Corey: And we will, of course, include a link to this in the show notes. Thank you so much for being so generous with your time. I appreciate it.Darko: Thank you so much, Corey, for having me.Corey: Darko Mesaroš, senior developer advocate at AWS. I'm Cloud Economist Corey Quinn and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice along with an angry and insulting comment that you compose and submit from your IBM Selectric typewriter.Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit to get started.

    Viewing Security through an Operational Lens with Jess Dodson

    Play Episode Listen Later Apr 13, 2023 32:04

Jess Dodson, Senior Cloud Solution Architect at Microsoft, joins Corey on Screaming in the Cloud to discuss all things security. Corey and Jess discuss the phenomenon of companies that only care about security when reacting to a breach, and Jess highlights how important it is to have both a reactive and a proactive approach to security. Jess also shares her thoughts on why it's valuable to get security and operations working well together, and why getting the basics right in security is still a more pressing priority than solving for level 10 security threats. Jess and Corey also reveal best practices when it comes to monitoring and revoking admin rights and much more. About JessChances are if you've run into “GirlGerms” online, you've spoken to Jess! Based in Brisbane, Jess joined Microsoft in 2019 and is now a Senior Cloud Solution Architect in Cyber Security, after working in a mixture of both government and higher education industries for over 15 years. Jess regards herself as a 'recovering systems administrator' and still wears her operations hat when looking at security - doing REAL SecOps!Outside of work, Jess is mum to a 5 year old daughter, a cat, 4 chickens and a hive of bees. In her downtime, she spends far too many hours building Lego, playing video games or doing random crafty projects.Links Referenced: Twitter: Mastodon: DevNxt: TranscriptAnnouncer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.Corey: Do you wish your developers had less permanent access to AWS? Has the complexity of Amazon's reference architecture for temporary elevated access caused you to sob uncontrollably?
With Sym, you can protect your cloud infrastructure with customizable, just-in-time access workflows that can be set up in minutes. By automating the access request lifecycle, Sym helps you reduce the scope of default access while keeping your developers moving quickly. Say goodbye to your cloud access woes with Sym. Go to to learn more. That's Welcome to Screaming in the Cloud. I'm Corey Quinn. My guest today is Jess Dodson, who's a Senior Cloud Solution Architect at Microsoft. Jess, thank you for joining me. We have been passing like ships in the night on social media for years now. It is so good to finally talk to you.Jess: Lovely to talk to you in person. Thank you for inviting me on.Corey: Well, to be clear, we're talking remotely when we record this. You are presumably Australian, and I'm still operating from a somewhat American-centric viewpoint that more or less everything in Australia is deeply poisonous.Jess: Yeah, that includes me. Yes. So, I am in Australia at the moment. I believe it is the ninth of March for you. It is the tenth of March for me [laugh].Corey: Yes, some of us are living in the future.Jess: The future. So yes, it's about seven o'clock in the morning for me, which is fabulous. I'm awake, I'm awake.Corey: So, let's talk about security. It seems to be top-of-mind and everyone's talking about it. Unfortunately, it seems that they're usually talking about it in the form of an email that starts with, “Your security is extremely important to us,” and then transitions into, “Here's how we dropped the ball on it.” I was once told by an analyst client of mine that I was the only analyst who ever told them that companies don't care about security. Like, “No one says that.
Why is that?” And my answer was, “Well, no one will say it out loud, but I ignore what people say, I pay attention to what they do, and where they spend the money, and it is clearly not a priority.”And I would argue that in some ways, that's okay, depending upon context and who you are, and what you do. And in other cases, it is the exact opposite of okay; it is an unmitigated disaster for everyone, just waiting to happen. How do you feel about security?Jess: A very loaded question.Corey: Isn't it though?Jess: [laugh]. [crosstalk 00:02:20]—Corey: “I don't care,” is probably not going to be your answer on this one, I'm just going to assume, but let's go.Jess: No. Don't—well, considering my title, I would hope not, considering the work that I do.Corey: You have words in your title, but none of them are, “Cares about,” so we're good.Jess: That's true. That's true. I do care about security and that's my job. I do agree that organizations probably don't care about security until they do care about security, and by that point, it's probably too late. And that's the issue that we're facing. It's that proactive versus reactive. Reactive is great. Reactive is wonderful. But unless you're doing the proactive and spending the money on the proactive, the reactive is just constantly going to be fighting fires.Corey: It's two classes of problems. My world of cost optimization absolutely is in this world as well. Security is buying fire insurance in case the building burns down. These are all things you should probably do, but you can spend infinite money and effort and time and all of those things and it doesn't get you one iota closer to your business goals, whereas speeding time-to-market, launching into new markets, and being able to ship faster, that is transformative that gets you to your next business milestone. 
So, companies will overwhelmingly bias towards investing in that and not the other stuff until they're caught flat-footed.Jess: And I'm sure that's wonderful, but my issue when I come in as a security person to an organization is, what if something goes wrong and it's no longer—and I hate using buzzwords, but—when we're thinking about zero trust, one of the principles around zero trust is assuming breach, which is, it's not a matter of if but when, and it's not a matter of when but it's happening right now. Because as blue teamers, which is what I think of myself as, we have to plug all of the holes, we've got to try and patch all of the defenses, we've got to be the ones who are blocking out every attempt. An attacker only has to succeed once. And so, the way that we see it is there is going to be something in your environment, so what are you doing to make sure that that is contained as much as possible? And that's proactive work and that's something I don't think you can skimp on.Corey: It is. And it's defense-in-depth. You can also turn it into business value if you're clever enough. For example, whenever a cloud provider releases a new service that I can't figure out how to configure, all I do is wind up scoping a security role to just that service and then leaking credentials online. I wait, you know, 20 minutes for someone to exploit it and then configure the thing, presumably to mine Bitcoin. Great. Then I turn off the Bitcoin stuff and I take the config that they've built and there we go. It's how we outsource intelligently on a budget. Uh, professional advice: please don't do that.Jess: [laugh]. I was going to say, that's certainly a unique way of configuring services in the cloud. I'm not sure I would recommend it for everyone, but for you, I can totally see that working [laugh].Corey: Yeah, I learned it from my buddy. He works at a bank. What of it? Yeah, it doesn't go well.Jess: [laugh].
When it comes to all of the stuff—and I think that's probably one of the big issues that we have with the cloud, and I love the fact that the title of this is Screaming in the Cloud because we're looking at all of the stuff that keeps coming out—everything is changing so quickly, how do you secure it when you don't know from one week to the next what new services are going to be included, what changes are going to be made to the services that you are already currently using? How do you keep up to date with that? And I think that is what leads to security being seen as ‘the no people,' which I hate. Security shouldn't be ‘no.' Security should be ‘yes, but.'It also leads to, hopefully, our operations teams being a little bit more on the ball when it comes to that security. Because if they're the ones who are looking at putting these new services or new productivity features in place, they should be vetting them from a security perspective as well. I say should—maybe it's not necessarily actually happening, and that's a bridge we kind of need to cross at the moment.Corey: A couple of years ago, I looked around the ecosystem, trying to find a weekly AWS-centric security newsletter, and there are some great ones now, don't get me wrong. And some of them might have existed at the time, but I didn't trip over them for one reason or another. And they all tended to fall into two buckets. Either they were security people talking to security people with a bunch of acronyms that I wasn't tracking because I don't eat, sleep, and breathe security most days, and/or they were vendor-captured and everything was, “See how terrible it is. That's why you should buy our widget.” And it almost doesn't matter who the vendor is.So, I started a Thursday issue of the newsletter that I write just for the news that is security-centric for people who don't have the word security in their job title.
So, all the DevOps people of the world, the folks who are building applications, the folks who do have to care about this, but they don't have time to filter through all of the noise that everyone's putting out, and what is the stuff I should actually be paying attention to this week? And that seems to have struck a nerve, on some level. The thing I'm continually testing and being pleasantly slash unpleasantly surprised about is that I'm rarely the only person who has a particular problem.Jess: And you're definitely not in that regard. And I think when we look at security and how security is marketed, it is very pushed towards CISOs and security analysts and security operations. Most of my life I've been, and I still regard myself as, a recovering systems administrator. I'm a sysadmin at heart, so I come from an operations background. The work that I've done in operations is what feeds the work that I do in security because I knew how those systems worked; I helped build those systems.And I don't see it purely through a security lens; I see it through an operations lens. Security is useless without some form of usability. And I constantly talk about the line that divides security and usability, and where that line gets drawn. And for each organization, it's going to be different depending on their risk, depending on their business profile, what they're actually doing, how big their teams are. And we've got to make sure that we get that line right.And that means getting your operations teams involved because they know how their stuff has to work. They know what they need to be able to make the business run. So, security can't just be constantly saying, “No, you can't do that.” We have to be working with operations teams, and in some cases, taking our lead from operations teams because at the end of the day, they know this stuff better than we do. So, it's us providing advice and them providing their expertise for us to come together to a joint agreement on how to secure the thing.
And I don't think that's being done at the moment. I call it the operations sandpit. Everyone needs to play nicely in the sandpit. No one should be fighting over toys. Everyone should be playing with each other with no arguments. We're not there yet.Corey: When I read a lot of security researchers and the stuff that they're focusing on in stupendous depth, or I walk around expo halls at security conferences, making fun of the various vendors there, it seems like so many of them are talking about these incredibly intricate, in-depth defenses against attacks that seem relatively esoteric. And then I fly home from the security conference that I'm at and I happen, in the airport lounge, to see their CEO, who's yelling into a phone, and I know that they're using the same password for work that they are for their personal stuff, and it's ‘kitty.' And I know this because the Post-It note on their laptop says it. And it feels like, yeah, you're selling and solving the problems at, like, level ten of complexity, but you need to effectively handle the problems at level two first and do the easy stuff and the basics before you start getting into the world of imagining that the Mossad is coming for you.Jess: And it's not even level two. I'd say we're almost—we're at level zero for a lot of this stuff. And you're a hundred percent right. A lot of security people and security organizations will be talking up all of these heavy-duty, “This is what we can do to protect you,” without even recognizing the elephant in the room of you're not even using MFA. You're not even looking at securing your administrative accounts.
You're not using separate administrative accounts and user accounts, so it's the same account logging in to everything, able to access the internet while still performing full administrative work, which is just terrifying.Corey: I still get constant notifications of security alerts from various vendors when there's a new TLS policy that should be applied to the load balancer that I'm using with them. And it's, you really think that people decrypting TLS-encrypted traffic somewhere between a user and my web server, for a website that has the actual term ‘shitposting' in its domain, is my number one priority on these things? It just feels like it's such a—it's this incredible cacophony of noise. And then you completely miss the next email that comes in, “Oh, by the way, you put your credentials up on GitHub. Maybe do something about that.” It just becomes this entire walking modern version of a Nessus Report with the 7000 things you need to fix, and number 3768 is, “Oh, by the way, the kitchen's on fire.” It hides this stuff. And it's awful.Jess: The prioritization of what can be done to fix some of the security holes that we see. It's out of whack, I think is the polite term to use. It's absolutely awful because we are prioritizing things that, yes, they are considered high severity in terms of what could happen, but in probability and likelihood, really, really low. Whereas things like having your password stored on a Post-It note that you have taped to your laptop that you then carry around and everyone can see it and they know who you are, maybe that's a little bit more concerning. So, it's trying to, from a security basics perspective—which I've been, I want to say ‘talking about' but it's not; it's ranting about and screaming about since about 2018.It's about making sure that we get those basics right because without the basics, all you're doing is putting Band-Aids over giant holes and giant leaks that anyone can slip through.
And it's a waste of time. So, making sure that we are doing those really small basic things that—let's be honest—we've been talking about for years. I think I've been screaming about MFA for at least ten years now. We've had this available to us as an option. Why are we not doing it? One of the statistics that I saw recently: inside our cloud provider, only 33% of accounts are protected with MFA. 33%? What's happening with the other 67? Because that's just, that's a staggering number of accounts that I would consider unsecured.Corey: On some level, you almost have to put this back on the companies themselves. Our timing is apt on this. Earlier today, GIF-hub—which is, yes, how I pronounce it—announced that starting in a couple of weeks, they're going to require MFA for every contributor on GIF-hub—or GitHub if you want to use the old-school pronunciation—and I think that's great. Everyone is used to MFA, on some level. Mandating it for accounts for which the blast radius is significant goes a long way.And yes, down the road, longer-term, passkeys might make this a lot easier or not as necessary, and/or better methods of authenticating with MFA will come along, because typing in six-digit codes is annoying and goes out-of-bounds on these things. But I don't think it's necessarily the end of the world to make the sensitive accounts just a little bit harder to access.Jess: I understand that for some folks, MFA, it's that next step, it's that next barrier, it's another thing that they then have to do. It sounds terrible finding what I call silver linings in some of the events that have been happening recently, but if anything, particularly here where I am in my very small, tiny neck of the woods in Australia, some of the recent breaches that we have seen over here have made not just operations and technical people, but end-users, consumers, more aware of the fact that it only takes one slip-up, it only takes one small thing for something bad to happen.
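[Editor's aside: the six-digit codes Corey mentions are typically TOTP values as defined in RFC 6238: an HMAC over a 30-second time counter, dynamically truncated to six decimal digits. A minimal Python sketch using only the standard library, for illustration; real systems should rely on a vetted authentication library rather than hand-rolled code.]

```python
import hashlib
import hmac
import struct


def totp(secret: bytes, unix_time: int, step: int = 30, digits: int = 6) -> str:
    """Compute an RFC 6238 TOTP code (HMAC-SHA1 variant)."""
    counter = unix_time // step                       # 30-second time window
    msg = struct.pack(">Q", counter)                  # counter as big-endian 64-bit int
    digest = hmac.new(secret, msg, hashlib.sha1).digest()
    offset = digest[-1] & 0x0F                        # dynamic truncation (RFC 4226)
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


# RFC 6238 Appendix B test vector: ASCII key "12345678901234567890" at T=59
print(totp(b"12345678901234567890", 59))  # → 287082
```

The phone app and the server each run this same computation from a shared secret, which is why the codes work offline and expire every 30 seconds.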
And anything that they can do to protect themselves is a good thing. So, we're starting to see a bit of a shift towards an acceptance for some of those trickier things like MFA.It's a pain in the bum. It's not something that I would consider user-experience-friendly. We're getting better at it, but it's still an issue when it comes to signing in and having to find your phone, which most of the time you have no idea where it is; you've got to use whatever automation you have to call your phone to find it, to be able to find your code so you can actually log in. It is a pain. But anything that we can do to be able to protect even our personal accounts is something that I consider to be really important.And it's not just for organizations. I've got to admit, I'm still working on my parents. They are of that generation where maybe it's still a little bit of a step too far. But just trying to get people that I know who aren't technical to start looking at things like this because as we move forward, it's going to become the norm and I'd much rather people become comfortable with it now than later when it is forced upon them.Corey: One of the problems that I've got is, we just went through our annual security awareness training that we are contractually obligated to provide to everyone, and all the vendors that do this—I have problems with them, in different ways, to different degrees. It's all terrible, on some level. And it's not that these vendors themselves are bad, but it's the state of security for most folks. One thing that gets hammered home in all of these trainings is, “Don't click on the wrong link because if you do that it might destroy the company.” And I can't help but think if someone in the accounting department clicks the wrong link and it destroys the company, I can't really get myself to a point where I blame the accountant for that. I don't feel that's the accountant's job.Jess: That is definitely not the accountant's job.
And there is no way that a single user or a single device should be able to take out an organization. If that is the case, something has gone [laugh] very badly wrong. And I would say the blame is definitely not on the person who clicked the link. There would be quite a range of people who would probably be hauled over the coals in regards to that one.Educating users on what to be aware of and what to look out for, what to not click on versus click on, how to spot scams—all of that is helpful and beneficial, not just for an organization, but I also see it as very useful even in their own personal lives. Because we are seeing ransomware scams targeting individuals, we're seeing some of those awful scams coming from tax departments or coming from anything that's talking about, “You've got a fine. You need to pay this. Please go off and buy Target gift cards.” Those are the kinds of things that we want people to be aware of even in their personal lives.But if an organization can be taken down by one of those emails, then there has been something that hasn't been put in place. I would say ‘something.' A lot of things that haven't been put in place. A lot of the work that we do from a security perspective is to limit what we call a blast radius, to try and reduce the impact an incident will have on an organization. And clicking on an email like that should produce an alert. It should say, “Hey, you probably shouldn't be accessing this website.”Even if they put their username and password in, it's the credentials of that account that have been compromised. Those can be reset. If necessary, the account can be rebuilt, but it certainly shouldn't be something that brings down an entire organization. And I think our messaging around that puts the burden on users when it should be on those of us who are technical to have the… not necessarily the accountability, but the responsibility for looking after that and ensuring that that is not the case.
And again, that comes back to number one: the basics, and making sure we've got the basics right. Because if you're clicking on something like that and it's going to be installing malware, you shouldn't have admin rights on your machine. That's just j—no. Bad. But number two, it means that [crosstalk 00:19:41]—
Corey: And as part of that, the software we need to do our jobs should not require admin rights to run.
Jess: Oh—
Corey: There are certain—a couple of vendors that are hearing their ears burn because I'm thinking about them very hard right now.
Jess: Oh, if I have one more piece of software tell me, “I need domain admin rights to be able to run.” “I need global admin rights.” No, you don't [laugh]. You really, really don't. You are being lazy, you're taking the piss, you're making it easier for yourself, and causing a massive security hole for your customer. Stop it.
Back to my point [laugh]. If our operations folks and our security folks can work well together, we can get around some of that burden we put on our users, working out what they need from a usability perspective while still giving them the security they require. And I think when we're looking at putting security in place, it's putting security first. It's not an afterthought. It's built into whatever we are doing, whatever processes we have, whatever systems we're bringing in, whatever solutions we're looking at using.
It's thought of at the beginning, rather than, “Ah, maybe we should talk to security about that.” That involves a little bit of a culture shift; I realize that's going to take a bit of time [laugh]. I can do technology; people are someone else's problem. That is definitely not for me to try and fix, and each organization will have their own battle when it comes to that [laugh].
Corey: There's also this idea that—you know, companies figure this out after the first time they get it hilariously wrong—that not every person should have access to everything.
I appreciate the idea of transparent culture, don't get me wrong, but if I'm not working with a particular customer, why should I have access to that customer's data, just lying around? It should be something that takes work, where you have to affirmatively say, “Yes, this is what I should be looking at right now.” One of the early things I learned back when I was going through one of the innumerable compliance regimes of my career is that it's a heck of a lot easier if you can constrain the regulated or sensitive information to as small a surface area as possible.
Here's the PCI environment where all that stuff lives, and yeah, you turn on all the obnoxious, difficult security things in that environment, but what that means is you don't have customer data in staging and development environments to worry about, and you can relax a lot of the other controls, just because you don't need that high-friction process for people to do things that are completely unrelated to the sensitivity of that data. And that still seems like a revelation for some folks.
Jess: For a lot of folks. So, the idea of role-based access control is giving people the rights they need to be able to do their job, no more and no less. And again, it's a balancing act trying to work out what that is. And when it comes to the people who are doing that job, I often find when putting role-based access control in place, it's never end-users that are the problem. Ever. It's always technical people, because they need all of the access all of the time. You don't.
So, it's working out what they actually need versus what they want [laugh]. And that's an even harder discussion to have. So, I find it's not “what access do you need,” it's “what are you trying to do,” and let's work backwards from that. And that's something that I still think a lot of organizations haven't managed to get right.
And also from an access perspective, that governance—again, we're getting into process and policy and people—ahh—but it's that access governance lifecycle as well.
Just because someone had access doesn't necessarily mean they need that access going forward. It's making sure we're reviewing those levels of access and removing people who don't need them anymore. Normally, when I'm talking about that, people are thinking of IT folks who have lots of privilege, or finance people who have lots of privilege. I'll be honest. The worst folks in any organization for access privileges are executive assistants.
Those people collect access rights like it's going out of fashion because those rights never get revoked. And they will move from person to person, from one internal organization to another, collecting all of those rights, and they'll maintain them for the entire length of their stay with that company. It's an extraordinary amount of rights.
Corey: They're like the human equivalent of the CI/CD server, because it has access into every environment, it's generally configured by hand, and it evolves naturally as something bespoke. “Infrastructure as Code for everything except the Jenkins box. We just take a disk snapshot of that and kind of hope for the best if we ever have to rebuild it somehow.” Yeah.
Jess: [laugh]. They are. And they want—like, don't get me wrong, they are wonderful folks. If something happens to them, usually the organization is in a lot of trouble. But when they leave, look at the sheer amount of access they have, the mail permissions they have to be able to send on behalf of so many folks in the organization.
And you're like, this is a lot of trust and a lot of responsibility to give to one person. So, it's making sure that the rights the folks in your organization have are what they need, and that they… are checked regularly. And I think that's the part that gets skipped a lot. It's easy to give rights; it's harder to remove them.
So, it's making sure that those access reviews are being done to say, “Hey, you're no longer the EA for that person. Maybe you don't need send-as permissions on their mailbox anymore. Maybe you don't need access into their personal files and their calendar to see exactly what's going on, and to be able to look at that personal folder that contains all of their photos and files.” So, it's making sure that [laugh] you're reducing the risk, not just on the organization, but on those people as well. Because that's a lot of responsibility to have should something go wrong.
Corey: That's part of the problem, too, I think: the surface area's gotten complex, and the sheer number of services that we all use to go about our jobs, regardless of what those jobs might happen to be, is so overwhelmingly massive. I remember going through an acquisition at one point; we had something like 40 people at the company, and we had well over 800 distinct SaaS products that we were using throughout the entire company for different things. And how many of these are important? How many are going in other directions? And if we look across the entire surface area of these companies and the things that we're using, no one knows what's going on in their environment and where.
Okay, you get access to this? I don't know, there's this ridiculous little thing I used to caption funny pictures. Okay, it's technically a SaaS product, but it's probably not critical path. Oh, here's the system we use to do payroll. Yeah, you probably want to double-check that one. And it all winds up lumped together in the big bucket of, “We have no idea what's in this thing.”
Jess: Oh, my God. I need people to understand that documentation is important [laugh] and this is exactly why.
Unless it is written down, unless someone has been able to get out of their head a decision that was made, that this is a product we are using, this is what it does, this is how critical it is to our organization; unless that is somewhere, you end up in exactly the situation you're talking about. You've got all of these products and you don't know which ones are business-critical. You don't know which ones are considered your primary goals, particularly from a BCP, that is, a Business Continuity Plan, or Disaster Recovery perspective.
They're the ones that you want to protect, and that should be written down somewhere. But a lot of people seem to think that documentation is boring, that documentation is not necessary. “It's in my head. Why would I need to write any of that down? I know it all.”
Please make sure you're getting it out of your head; I don't want to have to pilfer through your head to find the stuff. And I know that there's a couple of folks I know personally who will feel very attacked by this. Just because it's in your head, just because you know, doesn't mean everybody else knows. And I'm very much of the opinion that if someone asks you a question twice, that is a piece of information that needs to be on a piece of paper somewhere that other people can see.
And we don't do that enough. We don't pass on information enough and share information enough. There is very much an information-hoarding mentality for a lot of folks: “I need to protect this information to be able to keep my job.” And that is just not the case. Because in the case of a disaster, or in the case of a merger where you're trying to work out what is actually needed versus what is not, that information is crucial, and it can cost time and a lot of money if you don't have someone that has put it down so you can reference it in an easy and quick way.
Corey: I really want to thank you for taking the time to go through how you think about these things.
If people want to learn more, where's the best place for them to find you? And please don't say Australia.
Jess: Well, yes, Australia, but no one wants to come down here because, again, we have things that will kill you. I would say Twitter or Mastodon. You can find me as @girlgerms, if you haven't found me already. I am on as @girlgerms.
I will also be speaking at a number of conferences coming up here in Australia. Again, if you want to come to Australia, please do, but I know that recordings of these will be available at some point. Luckily, I've been invited to go and speak in New Zealand, which is very exciting. So, I'm going to be speaking at DevNxt in Christchurch on the 21st of April.
And I will also be at [unintelligible 00:29:33] between the 9th and 12th of May this year. Again, that's in the future, literally in the future. So, I will be talking about all sorts of things related to operations and security operations. My talk at [unintelligible 00:29:48] is actually going to be really interesting: it's about using your operations folks and how to get the best out of them from a security perspective, based on my history as an operations person, how I've moved into security, and how that can be a real benefit and an asset to an organization.
Corey: And we will, of course, put links to that in the [show notes 00:30:08]. Thank you so much for being so generous with your time. I appreciate it.
Jess: [No dramas 00:30:12]. Thank you very much for having me.
Corey: Jess Dodson, Senior Cloud Solution Architect at Microsoft, I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice, along with an angry comment that I'll log in as you and delete later because you use the same password for everything you log into.
It's ‘poisonouskitty.'
Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit to get started.

    Uptycs and Security Awareness with Jack Roehrig

    Play Episode Listen Later Apr 11, 2023 35:25

Jack Roehrig, Technology Evangelist at Uptycs, joins Corey on Screaming in the Cloud for a conversation about security awareness, ChatGPT, and more. Jack describes some of the recent developments at Uptycs, which leads to fascinating insights about the paradox of scaling engineering teams large and small. Jack also shares how his prior experience working with has informed his perspective on ChatGPT and its potential threat to Google. Jack and Corey also discuss the evolution of Reddit, and the nuances of developing security awareness trainings that are approachable and effective.
About Jack
Jack has been passionate about (obsessed with) information security and privacy since he was a child. Attending 2600 meetings before reaching his teenage years, and DEF CON conferences shortly after, he quickly turned an obsession into a career. He began his first professional, full-time information-security role at the world's first internet privacy company, focusing on direct-to-consumer privacy. After working the startup scene in the '90s, Jack realized that true growth required a renaissance education. He enrolled in college, completing almost six years of coursework in a two-year period, studying a variety of disciplines before focusing on obtaining his two computer science degrees. University taught humility and empathy. These were key to pursuing and achieving a career as a CSO lasting over ten years. Jack primarily focuses his efforts on mentoring his peers (as well as them mentoring him), advising young companies (especially in the information security and privacy space), and investing in businesses that he believes are both innovative and ethical.
Links Referenced: Uptycs:
Transcript
Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn.
This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.
Corey: LANs of the late '90s and early 2000s were a magical place to learn about computers, hang out with your friends, and do cool stuff like share files, run websites & game servers, and occasionally bring the whole thing down with some ill-conceived software or network configuration. That's not how things are done anymore, but what if we could have a '90s-style LAN experience along with the best parts of the 21st-century internet? (Most of which are very hard to find these days.) Tailscale thinks we can, and I'm inclined to agree. With Tailscale I can use trusted identity providers like Google, or Okta, or GitHub to authenticate users, and automatically generate & rotate keys to authenticate devices I've added to my network. I can also share access to those devices with friends and teammates, or tag devices to give my team broader access. And that's the magic of it: your data is protected by the simple yet powerful social dynamics of small groups that you trust. Try now - it's free forever for personal use. I've been using it for almost two years personally, and am moderately annoyed that they haven't attempted to charge me for what's become an absolutely-essential-to-my-workflow service.
Corey: Kentik provides Cloud and NetOps teams with complete visibility into hybrid and multi-cloud networks. Ensure an amazing customer experience, reduce cloud and network costs, and optimize performance at scale — from internet to data center to container to cloud. Learn how you can get control of complex cloud networks at, and see why companies like Zoom, Twitch, New Relic, Box, eBay, Viasat, GoDaddy, and many, many more choose Kentik as their network observability platform.
Corey: Welcome to Screaming in the Cloud.
I'm Corey Quinn. This promoted episode is brought to us by our friends at Uptycs, and they have once again subjected Jack Roehrig, Technology Evangelist, to the slings, arrows, and other various implements of misfortune that I like to hurl at people. Jack, thanks for coming back. Brave of you.
Jack: I am brave [laugh]. Thanks for having me. Honestly, it was a blast last time and I'm looking forward to having fun this time, too.
Corey: It's been a month or two, ish. Basically, the passing of time is one of those things that is challenging for me to wrap my head around in this era. What have you folks been up to? What's changed since the last time we've spoken? What's coming out of Uptycs? What's new? What's exciting? Or what's old with a new and exciting description?
Jack: Well, we've GA'ed our agentless architecture scanning system. This is one of the reasons why I joined Uptycs; it was so fascinating to me that they had kind of nailed XDR. And I love the acronyms: XDR and CNAPP is what we're going with right now. You know, and we have to use these acronyms so that people can understand what we do without me speaking for hours about it. But in short, our agentless system looks at the current resting risk state of a production environment without the need to deploy agents, you know, as we talked about last time.
And then the XDR piece, that's the thing that you get to justify the extra money on once you go to your CTO or whoever your boss is and show them all that risk that you've uncovered with our agentless piece. It's something I've done in the past with technologies that were similar, but Uptycs is continuously improving: our anomaly detection is getting better, our threat intel team is getting better. I looked at our engineering team the other day. I think we have over 300 engineers, or over 250 at least. That's a lot.
Corey: It's always wild for folks who work in small shops to imagine what that number of engineers could possibly be working on.
Then you go and look at some of the bigger shops and you talk to them, and you hear about all the different ways their stuff is built and how it all integrates together, and you come away, on some level, surprised that they're able to work with that few engineers. So, it feels like there's a different perspective on scale. And no one has it right, but it is easy, I think, in the layperson's mindset to hear that a company like Twitter, for example, before it got destroyed, had 5,000 engineers, and ask, “What are they all doing?” And, well, I can see where that question comes from, and the answer is complicated and nuanced, which means that no one is going to want to hear it if it doesn't fit into a tweet itself. But once you get into the space, you start realizing that everything is way more complicated than it looks.
Jack: It is. Yeah. You know, it's interesting that you mention that about Twitter. I used to work for a company called Interactive Corporation. And Interactive Corporation is an internet conglomerate that owns a lot of those things that are at the corners of the internet that not many people know about. And also, like, the entire online dating space. So, I mean, it was a blast working there, but at one point in my career, I got heavily involved in M&A. And I was given the nickname Jack the RIFer, RIF standing for Reduction In Force.
Corey: Oof.
Jack: So, Jack the RIFer was—yeah [laugh] I know, right?
Corey: It's like Buzzsaw Ted. Like, when you bring in the CEO with the nickname of Buzzsaw, it's like, “Hmm, I wonder who's going to hire a lot of extra people?” Not so much.
Jack: [laugh]. Right? It's like, hey, they said they were sending “Jack out to hang out with us,” you know, in whatever country we're based out of. And I'd go out there and I would drink them under the table. And I'd find out the dirty secrets, you know.
We would be buying these companies because they would need to be optimized.
But it would be amazing to me to see some of these companies that were massive and produced what I thought was so little, and then to go on to analyze everybody's job and see that they were all actually necessary.
Corey: Yeah. And the question then becomes, what if you were to redesign what that company did from scratch? Which, again, is sort of an architectural canard; the easiest thing in the world to do is to design an architecture from scratch on a whiteboard with an almost arbitrary number of constraints. The problem is that most companies grow organically, and in order to get to that idealized architecture, you've got to turn everything off and rebuild it from scratch. The problem is getting to something that's better without taking 18 months of downtime while you rebuild everything. Most companies cannot and will not sustain that.
Jack: Right. And there's another way of looking at it, too, which is something that's been kind of a thought experiment for me for a long time. One of the companies that I worked with back at IAC was Ask Jeeves. Remember Ask Jeeves?
Corey: Oh, yes. That was sort of the closest thing we had at the time to natural language search.
Jack: Right. That was the whole selling point. But I don't believe we actually did any natural language processing back then [laugh]. So, back in those days, it was just a search index. And if you wanted to redefine search right now and you wanted to build something that was truly a great search engine, what would you do differently?
If you look at the space right now with ChatGPT and with Google, there's all this talk about, well, ChatGPT is the next Google killer. And then people say, like, “Well, Google has LaMDA. What are they worried about ChatGPT for?”
And then you've got the folks at Google who are saying, “ChatGPT is going to destroy us,” and the folks at Google who are saying, “ChatGPT's got nothing on us.” So, if I had to go and do it all over from scratch for search, it wouldn't have anything to do with ChatGPT. I would go back and make a directed cyclic graph, and I would use node weight assignments based on outbound links. Which is exactly what Google was with the original PageRank algorithm, right [laugh]?
Corey: I've heard this described as almost a vector database, in various terms, depending upon how it is you're structuring this and what it looks like. It's beyond my ken personally, but I do see that there's an awful lot of hype around ChatGPT these days, and I am finding myself getting professionally—how do I put it—annoyed by most of it. I think that's probably the best way to frame it.
Jack: Isn't it annoying?
Corey: It is, because people ask, “Oh, are you worried that it's going to take over what you do?” And my answer is, “No. I'm worried it's going to make my job harder more than anything else.” Because back when I was a terrible student: great, write an essay on this thing, or write a paper on this; it needs to be five pages long.
And I would write what I thought was a decent coverage of it, and it turned out to be a page and a half. And oh, great. What I need now is a whole bunch of filler fluff that winds up taking up space and word count but doesn't actually get us anywhere—
Jack: [laugh].
Corey: —that is meaningful or useful. And it feels like that is what GPT excels at. If I worked in corporate PR for a lot of these companies, I would worry, because it takes an announcement that fits in a tweet—again, another reference to that ailing social network—and then turns it into an arbitrary number of pages.
And it's frustrating for me just because that's a lot more nonsense I have to sift through in order to get the actual, viable answer to whatever it is I'm going for.
Jack: Well, look at that viable answer. That's a really interesting point you're making. That fluff, right, when you're writing that essay? Yeah, that page and a half you got out, that's gold. That page and a half, that's the shit. That's the stuff you want, right? That's the good shit [laugh]. Excuse my French. But ChatGPT is what's going to give you that filler, right?
The GPT-3 dataset, I believe, was [laugh]—I think there's a lot of Reddit questions-and-answers that were used to train it. And the data it was trained with, I believe, ceased to be recent in 2021, right? It's already over a year old. So, if your teacher asked you to write a very contemporary essay, ChatGPT might not be able to help you out much. But I don't think that quite gets the whole thing, because you just said filler, right? You can get it to write that extra three and a half pages on top of the page and a half you actually wrote. Well, hey, teachers shouldn't be demanding that you write five pages anyway.
I once heard a friend of mine arguing about one presidential candidate, saying, “This presidential candidate speaks at a third-grade level.” And the other person said, “Well, your presidential candidate speaks at a fourth-grade level.” And I said, “I wish I could convey presidential ideas at a level that a third or fourth grader could understand.” You know? Right?
Corey: On some level, it's actually not a terrible thing, because if you can only convey a concept at an extremely advanced reading level, then how well do you understand it—it felt for a long time like that was the problem with AI itself and machine learning and the rest.
The only value I saw was when certain large companies would trot out someone who was themselves deep into the space, whose first language was obviously math, and who spoke with a heavy math accent through everything they had to say. And at the end of it, I didn't feel like I understood what they were talking about any better than I had at the start. And in time, it took things like ChatGPT to make people say, “Oh, this is awesome.” People made fun of the Hot Dog/Not a Hot Dog app, but that made it understandable and accessible to people. And I really think that step is not given nearly enough credit.
Jack: Yeah. That's a good point. And it's funny you mention that, because I started off talking about search and redefining search, and I think I used the word digraph for—you know, directed graph—that's, like, a stupid math concept; nobody understands what that is. I learned that in discrete mathematics a million years ago in college, right? I mean, I'm one of the few people that remembers it because I worked in search for so long.
Corey: Is that the same thing as a directed acyclic graph, or am I thinking of something else?
Jack: Ah, you're—that's, you know, close. A directed acyclic graph has no cycles. So, that means you'll never go around in a loop. But of course, if you're just mapping links from one website to another website, A can link to B, which can then link back to A, and that creates a cycle, right? So, an acyclic graph is something that doesn't have that cycle capability in it.
Corey: Got it. Yeah. Obviously, my higher math is somewhat limited. It turns out that cloud economics doesn't generally tend to go too far past basic arithmetic. But don't tell them. That's the secret of cloud economics.
Jack: I think that's most everything, I mean, even in search nowadays. People aren't familiar with graph theory. I'll tell you what people are familiar with. They're familiar with Google.
And they're familiar with going to Google and Googling for something, and when you Google for something, you typically want results that are recent.
And if you're going to write an essay, you typically don't care, because only the best teachers out there might not be tricked by ChatGPT—honestly, they probably would be, but the best teachers are the ones that are going to be writing the syllabi that require the recency. Almost nobody's going to be writing syllabi that require essay recency. They're going to reuse the same syllabus they've been using for ten years.
Corey: And even that is an interesting question, because if we talk about the results people want from search, you're right, I have to imagine the majority of cases absolutely care about recency. But I can think of a tremendous number of counterexamples where I have been looking for things and I do not want recent results, sometimes explicitly. Other times because, no, I'm looking for something that was talked about heavily in the 1960s and not a lot since. I don't want to basically turn up a bunch of SEO garbage that trawled it from who knows where. I want to turn up some of the stuff that was digitized and then put forward. And that can be a deceptively challenging problem in its own right.
Jack: Well, if you're looking for stuff that has been digitized, you could use or one of the web archive projects. But if you look into the web archive community, you will notice that they're very secretive about their data set. I think one of the best archive internet search indices that I know of is in Portugal. It's a Portuguese project.
I can't recall the name of it. But yeah, there's a Portuguese project that is probably, like, the axiomatic standard, or the ultimate prototype, of how internet archiving should be done. Search nowadays, though, when you say things like, “I want explicitly to get this result,” search does not want to show you explicitly what you want.
Search wants to show you whatever is going to generate them the most advertising revenue. And I remember, back in the early search engine marketing days, back in the algorithmic trading days of search engine marketing keywords, you could spend $4 on an ad for flowers, and if you typed the word flowers into Google—I mean, it was just ad city.
You typed the words rehabilitation clinic into Google: advertisements everywhere, right? And then you could type certain other things into Google and you would receive a curated list. These are obvious things that are identified as flaws in the secrecy of the PageRank algorithm, but I always thought it was interesting, because ChatGPT takes care of a lot of the stuff that you don't want to be recent, right? It provides this whole other end, the one we've been trained not to use search for, right?
So, I was reviewing a contract the other day. I had this virtual assistant, and English is not her first language. And she and I red-lined this contract for four hours. It was brutal, because I kept on having to Google—for lack of a better word—I had to Google all these different terms to try and make sense of it. Two days later, I'm playing around with ChatGPT and I start typing some very abstract commands to it, and I swear to you, it generated that same contract I was red-lining. Verbatim. I was able to get it to generate multiple [laugh] clauses in the contract.
And by changing the wording in ChatGPT to say, “Create it, you know, more plaintiff-friendly,” [laugh] that contract, all of a sudden, was red-lined in the way that I wanted it to be [laugh].
Corey: This is a fascinating example of this, because I'm married to a corporate attorney who does this for a living, and talking to her and other folks in her orbit, the problem they have with it is that it works to a point, on a limited basis, but then it veers very quickly into terms that are nonsensical: terms that would absolutely not pass muster, but sound like something a lawyer would write. And realistically, it feels like what we've built is basically the distillation of a loud, overconfident white guy in tech because—
Jack: Yes.
Corey: —they don't know exactly what they're talking about, but by God is it confident when it says it.
Jack: [laugh]. Yes. You hit the nail on the head. Ah, thank you. Thank you.
Corey: And there's an easy way to prove this: pick any topic in the world in which you are either an expert, or damn close to it, or know more than the average bear about, and ask ChatGPT to explain it to you. And then notice all the things it glosses over, or gets subtly wrong, or is outright wrong about, but it doesn't ever call that out. It just says it with the same confident air of a failing interview candidate who gets nine out of ten questions absolutely right, but bluffs on the one they don't know, and at that point, you realize you can't trust them because you never know if they're bluffing or they genuinely know the answer.
Jack: Wow, that is a great analogy. I love that. You know, I mentioned earlier that I believe a big portion of the GPT-3 training data was based on Reddit questions and answers. And now, you can't categorize Reddit into a single community, of course; that would be just as bad as the way Reddit categorizes [laugh] our community. But Reddit did have a problem a wh—I remember, there was the Ellen Pao debacle for Reddit.
And I don't know if it was so much of a debacle or if it was more of a scapegoat situation, but—Corey: I'm very much left with a sense that it's the scapegoat. But still, continue.Jack: Yeah, we're adults. We know what happened here, right? Ellen Pao was somebody who was going through some very difficult times in her career. She was hired to be a martyr. They had a community called fatpeoplehate, right?I mean, like, Reddit had become a bizarre place. I used Reddit when I was younger and it didn't have subreddits. It was mostly about programming. It was more like Hacker News. And then I remember all these people went to Hacker News, and a bunch of them stayed at Reddit and there was this weird limbo of, like, the super pretentious people over at Hacker News.And then Reddit started to just get weirder and weirder. And then you just described ChatGPT in a way that just struck me as so Reddit, you know? It's like some guy mansplaining some answer. It starts off good and then it overconfidently continues to state nonsensical things.Corey: Oh yeah, I was a moderator of the legal advice and personal finance subreddits for years, and—Jack: No way. Were you really?Corey: Oh, absolutely. Those corners were relatively reasonable. And like, “Well, wait a minute, you're not a lawyer.” “You're correct, and I'm also not a financial advisor.” However, in both of those scenarios, what people were really asking for was, “How do I be a functional adult in society?”In high school curricula in the United States, we insist that people go through four years of English literature class, but we don't ever sit down and tell them how to file their taxes or how to navigate large transactions that are going to be the sort of thing that you encounter in adulthood: buying a car, signing a lease. And it's more or less yeah, at some point, you wind up seeing someone with a circumstance that yeah, talk to a lawyer. Don't take advice on the internet for this. But other times, it's no, “You cannot sue a dog. 
You have to learn to interact with people as a grown-up. Here's how to approach that.” And that manifests as legal questions or finance questions, but it all comes down to I have been left unprepared for the world I live in by the school system. How do I wind up addressing these things? And that is what I really enjoyed.Jack: That's just prolifically, prolifically sound. I'm almost speechless. You're a hundred percent correct. I remember those two subreddits. It always amazes me when I talk to my friends about finances.I'm not a financial person. I mean, I'm an investor, right, I'm a private equity investor. And I was on a call with a young CEO that I've been advising for a while. He runs a security awareness training company, and he's like, you know, you've made 39% off of your investment in three months. And I said, “I haven't made anything off of my investment.”I bought a SAFE and, you know—it's like, this is convertible equity. And I'm sitting here thinking, like, I don't know any of this stuff. And I'm like, I talk to my buddies in the—you know, that are financial planners and I ask them about finances, and it's—that's also interesting to me because financial planning is really just about when are you going to buy a car? When are you going to buy a house? When are you going to retire? And what are the things, the securities, the companies, what should you do with your money rather than store it under your mattress?And I didn't really think about money being stored under a mattress until the first time I went to Eastern Europe where I am now. I'm in Hungary right now. And the first time I went to Eastern Europe, I think I was in Belgrade in Serbia. And my uncle at the time, he was talking about how he kept all of his money in cash in a bank account. In Serbian dinar.And the Serbian dinar had already gone through hyperinflation, like, ten years prior. Or no, it went through hyperinflation in 1996. So, it was not—it hadn't been that long [laugh]. 
And he was asking me for financial advice. And here I am, I'm like, you know, in my early 20s.And I'm like, I don't know what you should do with your money, but don't put it under your mattress. And that's the kind of data that Reddit—that ChatGPT seems to have been trained on, this GPT-3 data, it seems like a lot of [laugh] Redditors, specifically Redditors sub-2001. I haven't used Reddit very much in the last half a decade or so.Corey: Yeah, I mean, I still use it in a variety of different ways, but I got out of both of those cases, primarily due to both time constraints, as well as my circumstances changed to a point where the things I spent my time thinking about in a personal finance sense no longer applied to an awful lot of folks, because the common wisdom is aimed at folks who are generally on something that resembles a recurring salary where they can calculate in certain percentage raises, in most cases, for the rest of their life, and plan for other things. But when I started the company, a lot of the financial best practices changed significantly. And what makes sense for me to do becomes actively harmful for folks who are not in similar situations. And I just became further and further attenuated from the way that you generally want to give common case advice. So, it wasn't particularly useful at that point anymore.Jack: Very. Yeah, that's very well put. I went through a similar thing. I watched Reddit quite a bit through the Ellen Pao thing because I thought it was a very interesting lesson in business and in social engineering in general, right? And we saw this huge community, this huge community of people, and some of these people were ridiculously toxic.And you saw a lot of groupthink, you saw a lot of manipulation. There was a lot of heavy-handed moderation, there was a lot of too-late moderation. And then Ellen Pao comes in and I'm, like, who the heck is Ellen Pao? Oh, Ellen Pao is this person who has some corporate scandal going on. 
Oh, Ellen Pao is a scapegoat.And here we are, watching a community being socially engineered, right, into hating the CEO who's just going to be let go or step down anyways. And now they ha—their conversations have been used to train an intelligence, which is being used to socially engineer people [laugh] into [crosstalk 00:22:13].Corey: I mean, you just listed something else that's been top-of-mind for me lately, where it is time once again here at The Duckbill Group for us to go through our annual security awareness training. And our previous vendor has not been terrific, so I start looking to see what else is available in that space. And I see that the world basically divides into two factions when it comes to this. The first is something that is designed to check the compliance boxes at big companies. And some of the advice that those things give is actively harmful, as in, when I've used things like that in the past, I would have an addendum that I would send out to the team. “Yeah, ignore this part and this part and this part because it does not work for us.”And there are other things that start trying to surface it all the time as it becomes a constant awareness thing, which makes sense, but it also doesn't necessarily check any contractual boxes. So, isn't there something in between that makes sense? I found one company that offered a Slackbot that did this, which sounded interesting. The problem is it was the most condescendingly rude and infuriatingly slow experience that I've had. It demanded a whole bunch of permissions to the Slack workspace just to try it out, so I had to spin up a false Slack workspace for testing just to see what happens, and it was, start to finish, the sort of thing that I would not inflict upon my team. So, the hell with it, and I moved over to other stuff now. 
And I'm still looking, but it's the sort of thing where I almost feel like, this is something ChatGPT could have built and cool, give me something that sounds confident, but it's often wrong. Go.Jack: [laugh]. Yeah, Uptycs actually is—we have something called Otto M8—spelled O-T-T-O, space, M, and then the number eight—and I personally think that's the cutest name ever for a Slackbot. I don't have a picture of him to show you, but I would personally give him a bit of a makeover. He's a little nerdy for my likes. But he's got—it's one of those Slackbots.And I'm a huge compliance geek. I was a CISO for over a decade and I know exactly what you mean with that security awareness training and ticking those boxes because I was the guy who wrote the boxes that needed to be ticked because I wrote those control frameworks. And I'm not a CISO anymore because I've already subjected myself to an absolute living hell for long enough, at least for now [laugh]. So, I quit the CISO world.Corey: Oh yeah.Jack: Yeah.Corey: And so much of it also assumes certain things. Like, I've had people reach out to me trying to shill whatever it is they've built in this space. And okay, great. The problem is that they've built something that is aimed at engineers and developers. Go, here you go. And that's awesome, but we aren't really an engineering-first company.Yes, most people here have an engineering background and we build some internal tooling, but we don't need an entire curriculum on how to secure the tools that we're building as web interfaces and public-facing SaaS because that's not what we do. Not to mention, what am I supposed to do with the accountants and the sales folks and the marketing staff that wind up working on a lot of these things that need to also go through training? Do I want to sit here and teach them about SQL injection attacks? No, Jack. 
I do not want to teach them that.Jack: No, you don't.Corey: I want them to not plug random USB things into the work laptop and to use a password manager. I'm not here trying to turn them into security engineers.Jack: I used to give a presentation and I onboarded every single employee personally for security. And in the presentation, I would talk about password security. And I would have all these complex passwords up. But, like, “You know what? Let me just show you what a hacker does.”And I'd go and load up DeHashed and I'd type in my old email address. And oh, there's my password, right? And then I would—I copied the cryptographic hash from DeHashed and I'd paste that into Google. And I'd be like, “And that's how you crack passwords: you Google the cryptographic hash—the insecure cryptographic hash—and hope somebody else has already cracked it.”But yeah, it's interesting. The security awareness training is absolutely something that's supposed to be geared toward the fundamental everyman employee. It should not be something entirely technical. I worked at a company where—and I love this, by the way; this is one of the best things I've ever read on Slack—and it was not a message that I was privy to. I had to have the IT team pull the Slack logs so that I could read these direct communications. But it was from one—I think it was the controller to the Vice President of accounting, and the VP of accounting says, “How could I have done this after all of those phishing emails that Jack sent?” [laugh].Corey: Oh God, the phishing emails drive me up a wall, too. You're basically training your staff not to trust you, wasting their time, and playing gotcha. It really creates an adversarial culture. I refuse to do that stuff, too.Jack: My phishing emails are fun, all right? 
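Jack's password demo above works because fast, unsalted hashes are deterministic: the same password always produces the same digest, so a breach-search site or a plain web search for the hex string can recover it. A minimal sketch of the idea, with an illustrative password rather than anything from a real breach:

```python
import hashlib
import os

password = b"hunter2"

# An unsalted MD5 digest is identical everywhere the password is reused,
# which is exactly why pasting the hex string into a search engine works:
# someone, somewhere, has already published the mapping back to plaintext.
unsalted = hashlib.md5(password).hexdigest()

# A per-user random salt plus a slow KDF (PBKDF2 here) defeats the lookup
# trick: identical passwords now produce different stored values, and each
# guess costs hundreds of thousands of hash iterations instead of one.
salt_a, salt_b = os.urandom(16), os.urandom(16)
kdf_a = hashlib.pbkdf2_hmac("sha256", password, salt_a, 600_000)
kdf_b = hashlib.pbkdf2_hmac("sha256", password, salt_b, 600_000)
```

Run the unsalted line twice and you get the same digest both times; run the salted lines twice and you never will.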
I did one where I pretended that I installed a camera in the break room refrigerator, and I said, we've had a problem with food theft out of the Oakland refrigerator and so we've installed this webcam. Log into the sketchy website with your username and password. And I got, like, a 14% phish rate. I've used this campaign at multinational companies.I used to travel around the world and I'd grab a mic at the offices that wanted me to speak there and I'd put the mic real close to my head and I'd say, “Why did you guys click on the link to the Oakland refrigerator?” [laugh]. I said, “You're in Stockholm, for God's sake.” Like, it works. Phishing campaigns work.They just don't work if they're dumb, honestly. There's a lot of things that do work in the security awareness space. One of the biggest problems with security awareness is that people seem to think that there's some minimum amount of time an employee should have to spend on security awareness training, which is just—Corey: Right. Like, for example, here in California, we're required to spend two hours on harassment training every so often—I think it's every two years—and—Jack: Every two years. Yes.Corey: —at least for managerial staff. And it's great, but that leads to things such as, “Oh, we're not going to give you a transcript, even if you could read it more effectively. You have to listen to it and make sure it takes enough time.” And it's maddening to me just because that is how the law is written. And yes, it's important to obey the law, don't get me wrong, but at the same time, it just feels like it's an intentional time suck. 
I still held the trainings; I still made people take them seriously. And then if we have a—you know, if somebody got phished horrifically, and let's say wired $2 million to Hong Kong—you know who I'm talking about, all right; the person in question is probably not listening to this, thankfully—but [laugh] she did. And I know she didn't complete my awareness training. I know she never took any of it.She also wired $2 million to Hong Kong. Well, we never got that money back. But we sure did spend a lot of executive time trying to. I spent a lot of time on the phone, getting passed around from department to department at the FBI. Obviously, the FBI couldn't help us.It was wired from Mexico to Hong Kong. Like, the FBI doesn't have anything to do with it. You know, bless them for taking their time to humor me because I needed to humor my CEO. But, you know, I used those awareness training things as a way to enforce the Code of Conduct. The Code of Conduct required disciplinary action for people who didn't follow the security awareness training.If you had taken the 15 minutes of awareness training that I had asked people to do—I mean, I told them to do it; it was the Code of Conduct; they had to—then there would be no disciplinary action for accidentally wiring that money. But people are pretty darn diligent about not doing things like that. It's just a select few that seem to be the ones that get repeatedly—Corey: And then you have the group conversations. One person screws something up and then you wind up with the emails to everyone. And then you have the people who are basically doing the right thing thinking they're being singled out. And—ugh, management is hard, people are hard, but it feels like a lot of these things could be a lot less hard.Jack: You know, I don't think management is hard. I think management is about empathy. And management is really about just positive reinforce—you know what management is? This is going to sound real pretentious. 
Management's kind of like raising a kid, you know? You want to have a really well-adjusted kid? Every time that kid says, “Hey, Dad,” answer. [crosstalk 00:30:28]—Corey: Yeah, that's a good—that's a good approach.Jack: I mean, just be there. Be clear, consistent, let them know what to expect. People loved my security program at the places that I've implemented it because it was very clear, it was concise, it was easy to understand, and I was very approachable. If anybody had a security concern and they came to me about it, they would [laugh] not get any shame. They certainly wouldn't get ignored.I don't care if they were reporting the same email I had had reported to me 50 times that day. I would personally thank them. And, you know what I learned? I learned that from raising a kid, you know? It was interesting because it was like, the kid I was raising, when he would ask me a question, I would give him the same answer every time in the same tone. He'd be like, “Hey, Jack, can I have a piece of candy?” Like, “No, your mom says you can't have any candy today.” He'd be like, “Oh, okay.” “Can I have a piece of candy?” And I would be like, “No, your mom says you can't have any candy today.” “Can I have a piece of candy, Jack?” I said, “No. Your mom says you can't have any candy.” And I'd just be like a broken record.And pretty soon, he wouldn't ask me for a piece of candy six different times. And I realized the reason why he was asking me for a piece of candy six different times is because he would get a different response the sixth time or the third time or the second time. It was the inconsistency. Providing consistency and predictability in the workforce is key to management and it's key to keeping things safe and secure.Corey: I think there's a lot of truth to that. I really want to thank you for taking so much time out of your day to talk to me about topics ranging from GPT and ethics to parenting. 
If people want to learn more, where's the best place to find you?Jack: I'm, and I'm also My last name is spelled—heh, no, I'm kidding. It's a J-A-C-K-R-O-E-H-R-I-G dot com. So yeah, hit me up. You will get a response from me.Corey: Excellent. And I will of course include links to that in the show notes. Thank you so much for your time. I appreciate it.Jack: Likewise.Corey: This promoted guest episode has been brought to us by our friends at Uptycs, featuring Jack Roehrig, Technology Evangelist at same. I'm Cloud Economist Corey Quinn and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice along with an angry comment ghostwritten for you by ChatGPT so it has absolutely no content worth reading.Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit to get started.

    Improving the Developer Experience with Aja Hammerly

    Play Episode Listen Later Apr 6, 2023 33:59

Aja Hammerly, Developer Relations Manager at Google Cloud, joins Corey on Screaming in the Cloud to discuss her unexpected career journey at Google and what she's learned about improving the developer experience throughout her career. Aja and Corey discuss the importance of not only creating tools for developers that are intuitive and easy to adopt, but also cater to different learning styles. Aja describes why it's so important to respond with curiosity when a user does something seemingly random within a piece of software, and also reveals why she feels so strongly about the principle of least surprise when it comes to the developer experience. About AjaAja lives in Seattle where she's a Developer Relations Manager at Google. She's currently excited about developer experience, software supply chain security, and becoming a better manager and mentor. In her free time she enjoys skiing, kayaking, cooking, knitting, and spending long hours in the garden.Links Referenced: Google Cloud: Personal Website: TranscriptAnnouncer: Hello, and welcome to Screaming in the Cloud with your host, Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn and I am joined today by Aja Hammerly, who's a Developer Relations Manager over at a small company called Google Cloud. Aja, thank you for joining me.Aja: Thank you for having me. I've been looking forward to this for quite a while.Corey: You have been at Google for, well, let's call it eons because that's more or less how it feels when you're coming from my position of being great at getting yourself fired from various companies as a core skill. How long have you been there now? 
And what has your trajectory been like over that time?Aja: So, I've been there a little over eight years. And the funny part of that is that it was intended to be a two-year gig for me. I moved from being a consulting developer working on, you know, building out websites for other people, to Google being like, “Hey, you want to do this advocacy [unintelligible 00:01:19] relations thing?” And I'm like, “Sure.” And at the time, I'm like, there's no way I'm going to last more than two, three years in this. I hadn't really held a job much longer than that.Turns out, I like it. And then they're like, “Hey, do you want to manage people doing this job?” And I'm like, “That sounds like fun. Let's try that.” And it turns out, I like that just as much if not more. Because I haven't picked a major in tech yet and so managing people doing a bunch of different things is a fantastic way for me to keep my squirrel brain engaged in all the different topics and, you know, always bouncing around. So, it's been great. Cloud's been—well, when I started, Cloud was very, very, very small back in 2014 compared to what it is now. Google Cloud is way bigger.Corey: Google Cloud, if you take a look at its entire portfolio, I'm probably one of the only people in the world who looks at it and says, “Yeah, that seems like a reasonably small number of services,” just because I eat, sleep, and breathe in the firehose of AWS announcing every feature as a service. But let's be clear here, Google Cloud is fairly broad in terms of what it does and what it offers. Where do you start and where do you stop? Because increasingly, the idea of, “Oh, I am responsible for talking about everything that Google Cloud does,” is a—it's clearly a fantasy.Aja: Yeah. No, there's too much for any one person to be an expert on. I could give you a list of products, but that's not interesting, quite frankly, because I would prefer that people don't think about it as a set of products. 
Because—Corey: Why is this such a rare perspective to have? It drives me nuts whenever I go to a cloud conference, and okay, here's the database track, and here's the container track, and here's the storage track. It doesn't matter if I'm building Hello World, let alone anything more complicated. I have to think about all of it.Aja: Yeah. So, I don't know why it's a rare perspective, but at least within my—the folks that I look up to, the folks that I consider mentors internally, we tend to think more about audiences or problems. And the problem that I care the most about—I cared about this well before Google, and Google just pays me to care about it, which is awesome—I care about developers. I one hundred percent want to help people build cool stuff, ideally efficiently, as quickly as possible.I worked at a startup and as part of that experience, I learned that sometimes you just need to get something out quick. I wanted tools that would let me do that. When I worked in consulting, the whole gig was to get something out quick that folks could look at, folks could touch, then we could do feedback, we could iterate, we could come back. And so, I want to make developers successful and I want to get out of their way. And I've always liked tools like that as a developer; I don't want to have to read your 10,000-page manual in order to learn your product.So, when I come to Google Cloud, I'm focused on the products that help developers build stuff quickly. And by developers, I mean, developers. I mean, the people who are hands-on-keyboard with Python, with Go, with Java, building out features for their employer or, you know—
And I was working on some demo stuff over the weekend and I'm like, “Well, I could sit down and actually read this code or I could just run it, fix the first bug, and then run it again and fix the next bug. Yeah, let's do it that way.” Brute force is fine.Corey: I think that's called iterative development.Aja: Yeah, iterative development and/or brute force. Whatever. It works. And, you know, if people want to build cool stuff, cool. Let's help them do that. That's what I get to do every day is figure out how I can make it easier for developers to build cool stuff.Corey: The thing that keeps confusing me, for lack of a better term, is that I see a bunch of companies talking in similar directions of, “Yeah, we want to talk to developers and we want to meet them across the stack about the problems they're having.” “Great, so what's your answer?” And then they immediately devolve it into industry verticals. As if the finance company is going to have problems that the healthcare company could never fathom happening. It's, you get that you two look an awful lot alike, right, and things that work for one of you are going to map to at least 80, if not 90, percent of what the other is doing? But nope, nope, completely separate audiences; completely separate approaches. And I find myself increasingly skeptical about that model.Aja: Yes. I think—I see that too. I have sometimes behaved that way. And I think it's because… it's a combination of factors. One is people want to believe that their problems are unique and special. I've worked in edtech, I've worked on real estate stuff, I've worked in… a lot of different fields. As I said, haven't picked a major over my career.I've done a lot of different jobs, worked on a lot of different fields. I have passing knowledge of the electrical industry from an early, early job. And yeah, it's all code. At the end of the day, it's all code. 
But people like to believe that their problems are unique and special because they want to be unique and special. And cool. People can be unique and special. I am there to support that.I also think that different altitudes see the problems differently. So, if you're someone fairly high up and you're at a healthcare company versus a finance company, you're going to be thinking about things like your different regulatory requirements, your different security requirements. And some of those are going to be imposed on you by law, some of those are going to be imposed on you by your company policies, ethics, I would hope. But if you're the actual developer, it's, “I need to store some stuff in a database.” Like, down at the lower level where you're actually writing code, getting your hands on keyboard, getting dirty, the problems all start looking roughly the same after a while.And so, you need different people to tell those stories to the different audiences because the higher level folks thinking about regulatory requirements or thinking about efficiencies, they're going to just have a different perspective than the folks I like to go chat with who are the ones banging out features.
Why would you do such a thing?” Like, that's an indication that I might be somewhat misaligned somewhere.The other failure mode, of course, is when you start Googling around—because that's what we do when we're trying to figure out how to do something with a given service—and the only people ever talking about that service are the company who has built that thing. That's also not a great sign. There, at least for my purposes, needs to be a critical mass of community around a particular product where I can at least be somewhat reassured that I'm not going to be out twisting in the wind as the only customer of this thing.Aja: Yeah. No, a hundred percent agree as someone who's, in past lives, evaluated, you know, which APIs, which products, which companies we're going to work with. Having really great developer docs, having really great materials was always important. And I don't tend to read docs, so when I say materials, I like stuff that's interactive because I'm just going to dive in and fix the issues later. That's just how my brain works.But you know, people are different. Everyone has different learning preferences. But if there is a community, that means that you have lots of ways to get help. And you know, super concretely, I'm not a Kubernetes expert, I did some talks on it back in 2015 when it was brand new and shiny, I can use it and understand it, but I'm not an expert. I have other people on my team who have had the time to go deep.When I need help with Kubernetes, even though I work at Google, I've actually gone to my local community. I go to my local DevOps Slack, or I go to my local Kubernetes groups and stuff to get help. And I like that because it gives me all those different perspectives. I also know that if I'm asking you a question that no one understands—and I've had that experience [laugh] numerous times—either I'm doing something wrong, or the actual thing that I found more often is I'm explaining it in words that people don't understand. 
And that's always a challenge—figuring out the right words to go search for, the right phrasing of something so that everyone else understands the terms you're using. And that's a huge challenge, especially for folks that don't speak English as their primary language or their first language. Because we have lots of different ways to say the same thing, especially when it comes to code.Corey: I've noticed that. There are almost too many ways to do certain things, and they're all terrible in different ways, let's be clear. But that does mean that whenever I'm trying to find something that's 90% done on GitHub or whatnot, I will often find something that fits pretty well, and it's, “Well, I guess I'm learning TypeScript today because that's”—Aja: Yep.Corey: —“what it's written in,” versus building it in the language I have a bit more familiarity with, like, you know, crappy bash.Aja: Yep. Nope, I think that's a problem that anyone who's been developing on a deadline or, you know, spending a lot of time doing proof-of-concept stuff is super familiar with. And I think sometimes folks who haven't worked in those environments, at least not recently, forget that that's our reality. Like, “Cool. Okay, I guess today I'm learning Elastic,” was definitely a day I had when I was consulting. Or, “Cool. Okay. Swift is new”—because that's how long ago that was—“I guess we're all learning Swift this afternoon.”Corey: I've been banging on for a fair bit now about the value of developer experience from my point of view. But given that you work with developers all the time, I'm betting you have a more evolved position on it than I do, which distills down to, the better the developer experience, the happier I am, which is generally not something you can measure super effectively. Where do you stand on the topic?
I feel very strongly that tools should fit the workflows that developers have; developers shouldn't have to alter themselves to fit their tools. I also think there's kind of a misunderstanding of the nature of developer experience that I've seen, especially from some of… a lot of different companies. Developer experience is not actually tools. Developer experience is the experience you as a developer have while using those tools.

So, APIs: I like things that don't have surprises; I like things to get out of my way. I know that we joke about there being 9000 ways to run containers, or you know, five different ways to do this other thing, but you know, if that means it's faster to get something done, and you know, most of those are equally bad or equally good, I'm okay with it because it gets out of my way, and lets me ship the thing I want to ship. I enjoy coding; I think computers are rad, but what I enjoy more is having something finished that I can show to other people, that I can use, that I can make better. So, one of the things I feel super strongly about with developer experience is the principle of least surprise. I was a Rubyist for years and years and years.

Haven't written a lot of Ruby the last two, three years—management will do that to you—but… I loved that once you understood some of the underlying concepts behind Ruby, stuff generally worked the way you expected. I know a lot of people find the very nature of Ruby surprising, but for those of us who learned it, stuff generally worked the way I expected. So, I like it when APIs do that. I like it when it's super easy to guess. Consistent naming schemes, you know?

If you're going to call the way to list a set of services ‘list' in one place, don't call it ‘directory' in another. Keep things consistent. I like it when APIs—and the cloud companies all… I've had many thoughts about all of the big cloud companies in this—when the APIs that they provide fit the language.
If you're making me write TypeScript like C because your APIs are really just designed by C programmers and you've loosely skinned them, that's not a great experience, that's not going to make me happy, it's going to slow me down. And quite frankly, my tools should speed me up, not slow me down. And that's kind of the underlying theme behind all of my feelings about developer experience: I don't want to be angry when I'm writing code unless I'm angry at myself because I can't stop writing bugs.

Corey: I don't really want to bias for that one either, personally.

Aja: Yeah. And then the other one is, I don't want my tools to be something that I have to learn as a thing. I don't want there to have to be a multi-week experience of learning this particular API. Because that is interesting, potentially, but I'm not being paid to learn an API, I'm being paid to get something done. So, all of the learning of the API needs to be in service of getting something done, so it should go as quickly as possible. Stuff should just work the way I expect it to.

We're never going to do that. This I acknowledge. No company is ever going to get that right no matter where I work because it turns out, everyone's brains are different. We all have different expectations. But we can get closer. We can talk to folks, we can do UX studies. Everyone thinks about UI and UX, and design is very much focused on the visual.

And one of the things I've learned since I've had the opportunity to hang out with some really amazing UX folks at Google—because big companies have advantages like that, you have lots of people doing UX—is that they can actually help us with our command line interfaces, they can help us with how we name things in an API, they can do studies on that and turn, you know, “It feels good,” into numbers.
And that is fascinating to me, and I think something that a lot of developers who are building tools for other developers don't realize is actually up there as an option.

I spend a lot of time reading UX studies on developer experience. As a manager working at a big company, you get to have access to data like that going back years. And I've spent a lot of time reading about this because I want to understand how we turn “feels good” into something that we can develop against, that we can design against, that we can measure.

Corey: One of the things that I've always viewed as something of a… a smell, or a sign that ‘Here Be Dragons,' is when I'm looking for a tool to solve a problem and there's a vendor in the space. Great, awesome, terrific. Someone has devoted a lot of energy and effort to it. I want the problem solved; I don't necessarily need to do it for free or cheap. But then I'm looking through their website and they talk about how awesome their training programs are and here's how you can sign up for a four-day course, et cetera, et cetera. That feels to me like, in most cases, someone has bungled the interface. If I need to spend weeks on end learning how a particular tool works in order to be effective with it, on some level, you reach a point, fairly quickly for anything small, where the cure is worse than the disease.

Aja: Yep. This is an interesting thing for me because my personal feelings are very similar to yours. I don't want to take a class. Like, if I have to take a class, we have failed. I don't really want to read extensive documentation.

I want to get in, get dirty, try it, see, you know, watch the errors, come back and learn from the errors, that kind of thing. If there's code to read and it's a language I know, I will actually often go read code as opposed to reading docs, which… is weird.
The interesting thing to me is that, as I've managed folks, as I've, you know, spent time working with customers, working with folks who I think would benefit from some of Google Cloud's products, there are some folks who really, really want that formal training; they want that multi-day course before they dig in. And so, while in the past I definitely would have agreed with you—if it's the only thing, maybe—if it's one of many different ways to get started, I just keep telling myself, “Hey, that's how someone else needs to learn this.” It isn't my preference, but my preference isn't necessarily better.

It's just, this is the brain I got and the tools that came with it. And it doesn't do well for four days in a classroom learning all of the intricacies of this because I need to learn this stuff in context, otherwise it doesn't stick. Whereas I have people that work for me, I've had people who I've worked with, who are like, “No, I actually need to go read the book.” And I'm like, “Let's make sure that there's both a book.”

Corey: Everyone learns differently.

Aja: Yeah. I'm just constantly reminding myself, both as a manager and also as someone who works in developer relations, all of the above is the correct option for how are we going to teach this? How are we going to help people? We really need to offer as much as possible all of the above because we need to be able to reach everyone in a way that works for them.

Corey: It's the height of hubris to believe that everyone thinks, processes, learns, et cetera, the same way that you do. This is a weird confession for someone who hosts a podcast: I don't learn very well by listening to podcasts. I find that when I'm trying to absorb something, if I can read it, it sticks with me in a way that listening to something or even watching a video doesn't.

Aja: Yeah, and I'm actually very much the opposite. I take in most of my information and learn best through hearing things.
So, while I don't particularly like watching video, relatively often I'll actually have video running in the background—if I'm just doing, like, email or something—and I'm listening to the audio as I'm learning the concepts. I adore learning stuff from podcasts, I love audiobooks, but I don't retain stuff as well when I read it. And it's just because, you know, human beings are interesting and weird and not consistent, in all sorts of delightful and confusing ways.

Which, you know, as an engineer sometimes drives me nuts because I really wish there was one right way to do stuff that worked for everyone, but there just isn't. There are all sorts of interesting problems. And just like there are multiple valid ways to solve problems, there are multiple valid ways to learn, and we have to support all of them. And we have to support engineers with all of those styles too. People often will say, “Oh, sure. There's lots of different learning styles, but you know, most engineers are like X.” No. There is no ‘most engineers.'

Corey: Early on in my career, one of the things I noticed—in myself as well, let's be clear here; I was no saint—was that, oh, if people don't learn the way that I learned, then clearly they're deficient in some way. Of course that's not true. Everyone learns differently. And that, among other things, was part of the reason that I stopped being as dismissive as I was about certifications, for example, or signing up for a planned classroom experience. There is tremendous value to folks who learn well from that type of structured learning.

I am just barely contained chaos. And for whatever reason, that doesn't resonate with me in the same way. If anything, I'm the one that's broken. The trick is making sure that when you're trying to drive adoption, no matter what method people learn best from, you need to be there with something that approximates that.
One area that I think resonates with something you said earlier is this idea that the best way for me to learn something, at least, is to sit down and build something with it. Bar none, that's what I actually want to experience. And that is only slightly informed by the unfortunate reality that I've been through too many cycles of an exec in a keynote getting on stage and making grandiose promises that the service does not back up.

Aja: Yep. And I actually do have a bias here that I will own. I don't believe in anything until I can touch it. And by ‘touch it,' I mean, use it. And that also includes I don't believe in my own learning or the learning of others until I can see it practically applied.

And so, even when I have folks on my team who are like, “Hey, I want to go read a book, take a class,” I'm like, “Cool. What else are you going to do with that? How are you going to know that you can actually take what you learned and apply it to a novel situation?” And this has been based on mentors I had early in my career, where I'm like, “Hey, I just read this book.” And they're like, “That's awesome. Can you write anything with what you learned?”

And I'm like, “Yes. Let me do that and prove it to myself.” So, I do have a bias there. I also have a bias, having worked in the industry for 20-plus years now, that a lot of people say a lot of things that are either theoretically true or true through, you know, a happy-path lens. And I started my career as a tester and—I always joke computers run in fear of me because if there's a way to cause something to error out in a confusing and unknown way, I will find it. I will find it on accident. And when I can't find it on accident, I will find it on purpose.

So, that's the other reason I don't believe stuff until I touch it. It doesn't matter if it's at a keynote, doesn't matter if it's a blog post, I want to know that this works beyond that happy case that you just showed me.
And part of this is also that I've built some of those keynote demos, and I know that they're explicitly designed so that we can fit in the timeframe allowed and avoid any possible dragons that might be lurking in the background. So, I always go get dirty with things, new products, new features. It's one of the things I actually love about my job: I get to try stuff before anyone else does.

And I'm like, “Hey, so, um… I did this thing. You probably didn't expect anyone to do this thing, but I did this thing. Can we talk about whether this thing that I did is actually a valid use case? Because it made sense to me, but you know, I might have been thinking about this backwards, upside down, in purple, so let's back the truck up and have a discussion.”

Corey: Yeah, I get to moonlight occasionally as something that looks vaguely like an analyst at a variety of different companies. And as a part of that, I'm often kicking the tires on something that they're going to be releasing soon. And a very common failure mode is that, for obvious reasons, no one has ever really looked at this thing from the perspective of: I've never seen this before or heard of this before; let me approach this as someone who's learning about it for the first time. The documentation is always treated as an afterthought at those stages, where it's, “Oh yeah, just spin it up and do it. And you do the thing that we all know about, right?” “Well, okay, assume I don't have that shared understanding. What's the experience?” And, “Oh.”

Yeah, if I'm not on the path of a few pre-planned test cases, then everything falls off pretty quickly. I think I share that somewhat special ability to cause chaos and destruction to all about me [laugh] when I start trying to do something in good faith on the computer.

Aja: Yeah. No, it's both a blessing and a curse.
It's really annoying when, like, I manage to brick my work laptop on the morning that I have, you know, a super important talk, and I call up, you know, internal tech support at Google and they're like, “You did what, and how?” But it's also great because I know that… I know that I get to—because I started my career in testing, working at other companies, I've always done some informal testing no matter where I've worked. Everything I find we at least know about, even if we don't have time to fix it. We at least know about it, so if someone else runs into it, we can at least help them untangle whatever crazy stuff they did.

And I'm also just not afraid of breaking computers either, which means that I'm very willing to go off happy paths. If I see a tutorial that's close, you know, I'll follow all of the steps that work, and I'll guess on the others. And that's a thing that I don't actually see a ton of folks being always willing to do because they're afraid of breaking it. And I'm like, “It's software.”

Corey: And a lot of products are designed, though, so that once you deviate from the happy path, well, now you've broken it and you get to keep all the pieces. There's little attention paid towards, okay, now you've done something else and you're bringing something back into the happy path. It feels like if you haven't been here for every step of the way, well, it's your problem now. I have work to do. Go away, kids, you're bothering me.

Aja: Yeah, I've seen that. And I've seen that in open-source frameworks, too, when people—when I talk about, you know, deviating from the happy path—and this will date me—using multiple databases with Rails was one of the ones that I ran into numerous times. It just was not designed for that in the beginning. Did not work.
There was also some easy security stuff, ages and ages ago, that you often wanted to do but was not at that point integrated into the framework, so it was clunky.

And so, anyone who would come to, like, a Ruby meetup or something like, “Hey, I want to use three databases with my Rails application,” we'd be like, “So, you can… but you may not actually want to do it that way. Can we interest you in some microservices so that you can go one-to-one?” And that wasn't always the right choice. I worked on an app for years that had multiple databases in Rails; one was a data warehouse, one was our production database. And it was clunky.

And eventually, you know, the Rails community got better. Eventually, people do improve, but people are weird. They do weird things. And I don't think people truly understand that. One of my jobs at various points was in education tech, working on an application for kindergarteners.

And I don't have kids, but I think kindergarteners are just [unintelligible 00:24:44]. And until you see five-year-olds use software, I don't think people get a true appreciation for how random human beings can actually be when given a textbox or when given a mouse. And, like, we like to think that, you know, engineers and adults are better. We're not. We just, you know, have a different set of five-year-old tools available to us.

So, we do have to at least acknowledge that people are going to go do weird stuff. And some of that weird stuff probably makes sense in the context they're living in, and so the best step is not to say, “Hey, stop doing weird stuff.” The best thing to then say is, “Okay, why did you do it that way?” Because everyone has good reasons for the decisions they make most of the time. And understanding those is important.

Corey: Yeah. It's very rare—not entirely unheard of, but at least rare—that someone shows up and says, “Okay, I'm making a bunch of choices today.
What are the worst ones I can possibly make, that I'm going to be tripping over for the next five years and leave as my eternal legacy to every engineer who ever works at this company after I do?” But it happens all the time, for better or worse.

Aja: Yeah.

Corey: Never intentional, but it always hits us.

Aja: Yeah. Well, one of the things that I learned in the last ten-ish years, and one of the things that I try to bring to all of my developer relations, all my developer education work, is: “It made sense at the time.” Now, it may have been that they made an assumption six years ago that led them down the path of chaos and sadness, and now that they're deep into this, they're going to have to back up to that decision six years ago and undo it. But based on the knowledge they had, the assumptions they were making—which may or may not have been true, but you know, were likely made in good faith—they're doing their best. And even when that's not true, I haven't found a situation where assuming that, with regard to technical decisions, is harmful.

Assume that people are relatively intelligent. They may not have the time to go learn all of your tools' intricacies and use things exactly the way that you want them to be used, because time is a limited resource, but assume that they're relatively intelligent and they're doing their best. And then try to understand why. What assumptions, what skills, what previous knowledge led them down this crazy path? And you know, then you can start having a conversation about, okay, well, what should the tools do? How should the tools work together? Just because I wouldn't make that decision doesn't mean that their version of it is necessarily bad. It may not be the most efficient way to get stuff done, but if it works, eh, okay.

Corey: So, as we wind up coming towards the end of this episode, one thing that I want to explore a little bit is, you've been with Google Cloud for eight years now.
How have you seen the organization evolve during that time? Because from my perspective, back then it was, oh, “Google has a cloud? Yeah, I guess they do.” It's a very different story now, but all of my perspective is external. How have you seen it?

Aja: Oh, that's an interesting question. And I'll caveat that appropriately with: I only see the parts I see. One of the hard parts of big companies is I don't actually dig in on some of the areas particularly deeply. I don't go deep on data analytics, I don't go deep on AI/ML. And I will also [laugh] own the fact that when I started, I'm like, “Oh, Google has a cloud? Huh. Okay, yeah, sure, I'll work on that.”

I didn't even know the list of products my first day. I knew about App Engine, and I knew that it didn't work with my preferred languages, so I had a sad. Some of the things that I've seen: I've seen a real focus on how we can help people with real, big problems. I've seen a real focus on listening to customers that I really like.

I've learned a lot of techniques that have been shared out, things like empathy sessions and friction logging—practices from the developer relations community for making sure that, as an engineering team, we're thinking about real customer problems. I've seen a lot of maturing thoughts around how we maintain products; how we make sure that we've got updates where we need them, as much as we can; how we talk about our products; how we listen to customers and take, you know, direct feature requests from them.

The other big thing is, I've just seen us grow. And that's the big thing: there's just so many more people than when I started. And I've never worked at a company this big before, and just getting my head around the number of people who are actively trying to make cloud better, and spending every day doing their best to improve the products, to add the features that are missing, to make sure that we're keeping stuff up to date where we can, it's kind of mind-boggling.
Like, when I go look at the org chart, I'm like, “Wait, there are how many people working on what?” And that in and of itself is a story because that, to me at least, shows that we care about getting it right. Google cares about getting it right.

I'm not Google, of course, but I feel like from the inside, I can say that Google cares about getting it right as much as we can. And you know, sometimes it's not a hundred percent what people want, which is why we iterate. But we've also had a couple of things that I'm particularly happy with. Cloud Run, I think, landed pretty well.

Corey: I'd say that I would agree with what you're saying. I've had nothing but positive experiences when I've been building admittedly very small-scale shitposting-style stuff on top of Google Cloud. There have been times where the biggest criticism I have is, “It's not the particular flavor of broken that I'm used to coming from AWS-land.” But that's hardly a fair criticism. I think that by and large, it is a superior platform from the perspective of developer experience.

And people don't like it when I say that, and they often like it even less when I say, “And thus, it has better security than something that does not have better user experience, because simplicity is everything in that space.” But it's true. It is foundationally and fundamentally true.

Aja: I agree with you. Obviously, it's my employer. But I do think you actually were onto something interesting with “my particular flavor of broken.” I've talked to a lot of folks who are migrating, and sometimes they struggle because there are particular magic incantations or other things that they learned to work with a different tool. It's the same thing as when you're learning a new language, a new programming language, or a new framework. You're like, “Wait, I don't have to do this thing.
But I'm really good at doing that thing.”

And so, I do think, to some degree—nothing's perfect, and it happens to be, you know, hard for some folks. And I think some folks resist the better developer experience because it isn't what they're used to. And that's okay, too. Like, if I was a developer, I wouldn't want to have to relearn everything from scratch, so I get that, and I think that that is a valid piece of feedback.

[unintelligible 00:31:22] make it familiar to folks coming from other clouds, we're working on it. There's stuff coming out of DevRel. There's other things that we do to try to make it easier. But no, I do think—and I'm very grateful I get to work with a lot of teams to do this—we want to make developers like working with Google Cloud. I want to make developers like working with Google Cloud.

Like, at the end of the day, if I had to say the most important thing for me: I want to make developers enjoy their time using Google Cloud to get other stuff done. I don't need to live in a world of people who are like, “You know, I really just want to go spend some time on Google Cloud today,” but I want it to be something that they enjoy using, or at least something that gets out of their way so they can do the stuff that they actually want to do: you know, add features, build shitposting websites, whatever it ends up being.

Corey: As someone who does an awful lot of that, thanks. It's appreciated. I really want to thank you for spending so much time talking to me. If people want to learn more, where's the best place to find you?

Aja: Oh. That's the best place to find me right now is Thagomizer is the spiky part of the end—at the end—

Corey: Of a Stegosaurus.

Aja: —of a Stegosaurus.

Corey: Yes.

Aja: It is. That is my website and it has my most recent social, et cetera, on it. That's also where I theoretically blog, although it's been about a year.
I've got, as I mentioned before the show, several blog posts three-quarters of the way done that I'm going to hopefully try to get out over the next couple of weeks… on various topics.

Corey: I have a pile of those myself that, for some reason, never quite end up happening when you hope they will.

Aja: Yeah, exactly.

Corey: And we'll, of course, put links to all of that in the [show notes 00:32:47]. Thank you so much for being so generous with explaining your point of view. I appreciate it.

Aja: Yeah. And thank you for having me. This was lovely.

Corey: Likewise. Aja Hammerly, Developer Relations Manager at Google Cloud. I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice, along with an angry comment telling me exactly which four-week course I need to sign up for to understand that comment.

Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit to get started.

    The Rise of Generative AI with Raj Bala

    Play Episode Listen Later Apr 4, 2023 32:13

Raj Bala, Founder of Perspect, joins Corey on Screaming in the Cloud to discuss all things generative AI. Perspect is a new generative AI company that is democratizing the e-commerce space by making it possible to place images of products in settings that would previously require expensive photoshoots and editing. Throughout the conversation, Raj shares insights into the legal questions surrounding the rise of generative AI and the potential ramifications of its widespread adoption. Raj and Corey also dig into the question, “Why were the big cloud providers beaten to the market by OpenAI?” Raj also shares his thoughts on why company culture has to be organic, and how he's hoping generative AI will move the needle for mom-and-pop businesses.

About Raj
Raj Bala, formerly a VP, Analyst at Gartner, led the Magic Quadrant for Cloud Infrastructure and Platform Services since its inception and led the Magic Quadrant for IaaS before that. He is deeply in tune with market dynamics both in the US and Europe, but also extending to China, Africa, and Latin America. Raj is also a software developer and is capable of building and deploying scalable services on the cloud providers which he wrote about as a Gartner analyst. As such, Raj is now building Perspect, which is a SaaS offering at the intersection of AI and e-commerce.

Raj's favorite language is Python and he is obsessed with making pizza and ice cream.

Links Referenced:
Perspect:

Transcript
Announcer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.

Corey: This episode is sponsored in part by our friends at Thinkst Canary. Most folks find out way too late that they've been breached.
Thinkst Canary changes this. Deploy Canaries and Canarytokens in minutes and then forget about them. Attackers tip their hand by touching 'em, giving you one alert, when it matters. With 0 admin overhead and almost no false positives, Canaries are deployed (and loved) on all 7 continents. Check out what people are saying at today!

Corey: This episode is sponsored in part by our friends at Chronosphere. When it costs more money and time to observe your environment than it does to build it, there's a problem. With Chronosphere, you can shape and transform observability data based on need, context, and utility. Learn how to only store the useful data you need to see in order to reduce costs and improve performance at That's And my thanks to them for sponsoring my ridiculous nonsense.

Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. Back again, after a relatively brief stretch since the last time he was on, is Raj Bala. Formerly a VP analyst at Gartner, but now, instead of talking about the past, we are talking about the future. Raj, welcome back. You're now the Founder at Perspect. What are you doing over there?

Raj: I am indeed. I'm building a SaaS service in the generative AI space at the intersection of e-commerce. So, those two things are things that I'm interested in. And so, I'm building a SaaS offering in that space.

Corey: This is the first episode in which we're having an in-depth discussion about generative AI. It's mostly been a topic that I've avoided because, until relatively recently, it's all been very visual. And it turns into sort of the next generation's crappy version of Instagram, where, “Okay. Well, Instagram's down, so can you just describe your lunch to me?” It's not compelling to describe a generated image on an audio-based podcast.
But with the advent of things like ChatGPT, where suddenly it's muscling into my realm, which is the written word, it feels like there's a lot more attention and effort being paid to it in a bunch of places where it wasn't getting a lot of coverage before, including this one. So, when you talk about generative AI, are you talking in the sense of the visual, in terms of the written word, in terms of all of the above, and more? Where does your interest lie?

Raj: I think it's all of the above and more. My interest is in all of it, but my focus right now is on the image aspect of things. I've been pretty knee-deep in Stable Diffusion and all the things that it entails, and it is largely about images at this point.

Corey: So, talk to me more about how you're building something that stands at the intersection of e-commerce and generative AI. Because when I go to, I'm not staring at a web store in the traditional sense. I'm looking at something that—again, early days, I'm not judging you based upon the content of your landing page—but it does present as a bit more of a developer tool and a little bit less of a “look how pretty it is.”

Raj: Yeah. It's very much a developer-focused e-commerce offering. So, as a software developer, if you want programmatic access to all things e-commerce and generative AI that are related to e-commerce, you can do that on So, yeah. It is about taking images of products and being able to put them in AI-generated places, essentially.

Corey: Got it. So, effectively you're trying to sell, I don't know, titanium jewelry for the sake of argument. And you're talking about how now you can place it on a generated model's hand to display it, rather than having to either fake it or alternately have a whole bunch of very expensive product shoots and modeling sessions.

Raj: Exactly. Exactly.
If you want to put this piece of jewelry in front of the Eiffel Tower or the Pyramids of Giza, you can do that in a few seconds as opposed to the expensive photo shoots that were once required.

Corey: On some level, given that I spend most of my time kicking around various SaaS products, I kind of love the idea of stock photography modeling, I don't know, Datadog or whatnot. I don't know how that would even begin to pan out, but I'm totally here for it.

Raj: That's funny.

Corey: Now, the hard part that I'm seeing right now is—I mean, you used to work at Gartner for years.

Raj: I did.

Corey: You folks are the origin of the Gartner hype cycle. And given all the breathless enthusiasm, massive amounts of attention, and frankly hilarious, more than a little horrifying, missteps that we start seeing in public, it feels like we are very much in the heady early days of hype around generative AI.

Raj: No doubt about it. No doubt about it. But just thinking about what's possible and what is being accomplished even week to week in this space is just mind-blowing. I mean, this stuff didn't really exist six months ago. And now, the open-source frameworks are out there. The ecosystems are developing around it. A lot of people have compared generative AI to the iPhone. And I think it's actually maybe bigger than that. It's more internet-scale disruption, not just a single device like the iPhone.

Corey: It's one of those things that I have the sneaking suspicion is going to start showing up in a whole bunch of different places, manifesting in a whole host of different ways. I've been watching academia, largely, freak out about the idea that, “Well, kids are using it to cheat on their homework.” Okay. I understand the position that they're coming from.
But it seems like whenever a new technology is unleashed on the world, that is the immediate, instantaneous, reflexive blowback—not necessarily picking on academics, in particular—but rather, the way that we've always done something is now potentially very easy to use thanks to this advance in technology. “Oh, crap. What do we do?” And there's usually a bunch of scurrying around in futile attempts to put the genie back in the bottle, which frankly, never works. And you also see folks trying to sprint, to sort of keep up with this. And it never really pans out. Society adapts, adjusts, and evolves. And I don't think that that's an inherently terrible thing.Raj: I don't think so either. I mean, the same thing existed with the calculator, right? Do you remember early days in school, they said you can't use a calculator, right? And—Corey: Because remember you will not have a calculator in your pocket as you go through life. Well, that was a lie.Raj: But during the test—during the test that you have to take, you will not have a calculator. And when the rubber meets the road in person during that test, you're going to have to show your skills. And the same thing will happen here. We'll just have to have ground rules and ways to check and balance whether people are cheating or not and adapt, just like you said.Corey: On some level, you're only really cheating yourself past a certain point.Raj: Exactly.Corey: There's value in being able to tell a story in a cohesive, understandable way.Raj: Absolutely.Corey: Oh, the computer will do it for me. And I don't know that you can necessarily depend on that.Raj: Absolutely. Absolutely. You have to understand more than just the inputs and outputs. You have to understand the black box in between enough to show that you understand the subject.Corey: One thing that I find interesting is the position of the cloud providers in this entire—Raj: Mm-hmm.Corey: —space. 
We have Google, who has had a bunch of execs talking about how they've been working on this internally for years. Like you get how that makes you look worse instead of better, right? Like they're effectively tripping over one another on LinkedIn to talk about how they've been working on this for such a long time, and they have something just like it. Well, yeah. Okay. You got beaten to market by a company less than a decade old.Azure has partnered with OpenAI and brought a lot of this to Bing so rapidly they didn't have time to update their more of a Bing app away from the “Use Bing and earn Microsoft coins” nonsense. It's just wow. Talk about a—being caught flat-footed on this. And Amazon, of course, has said effectively nothing. The one even slightly generative AI service that they have come out with that I can think of that anyone could be forgiven for having missed is—they unleashed this one year at re:Invent's Midnight Madness where they had Dr. Matt Wood get on stage with the DeepComposer and play a little bit of a song. And it would, in turn, iterate on that. And that was the last time any of us ever really heard anything about the DeepComposer. I've got one on my shelf. And I do not hear about it mentioned even in passing other than in trivia contests.Raj: Yeah. It's pretty fascinating. Amazon with all their might, and AWS in particular—I mean, AWS has Alexa, and so they've—the thing you give to Alexa is a prompt, right? I mean, it is generative AI in a large way. You're feeding it a prompt and saying do something. And it spits out something tokenized to you. But the fact that OpenAI has upended all of these companies I think is massive. And it tells us something else about Microsoft too is that they didn't have the wherewithal integrally to really compete themselves. They had to do it with someone else, right? They couldn't muster up the effort to really build this themselves. 
They had to use OpenAI.Corey: On some level, it's a great time to be a cloud provider because all of these experiments are taking place on top of a whole bunch of very expensive, very specific compute.Raj: Sure.Corey: But that is necessary but not sufficient as we look at it across the board. Because even AWS's own machine-learning powered services, it's only relatively recently that they seemed to have gotten above the “Step one, get a PhD in this stuff. Step two, here's all the nuts and bolts you have to understand about how to build machine-learning models.” Whereas the thing that's really caused OpenAI's stuff to proliferate in the public awareness is, “Okay. You got to a webpage, and you tell it what to draw, and it draws the thing.” Or “go ahead and rename an AWS service if the naming manager had a sense of creativity and a slight bit of whimsy.” And it comes out with names that are six times better than anything AWS has ever come out with.Raj: So, funny. I saw your tweet on that actually. Yeah. If you want to do generative AI on AWS today, it is hard. Oh, my gosh. That's if you can get the capacity. That's if you can get the GPU capacity. That's if you can put together the ML ops necessary to make it all happen. It is extremely hard. Yeah, so putting stuff into a chat interface is 1,000 times easier. I mean, doing something like containers on GPUs is just massively difficult in the cloud today.Corey: It's hard to get them in many cases as well. I had customers that asked, “Okay. What special preferential treatment can we get to get access to more GPUs?” It's like can you break the laws of physics or change global supply chain because if so, great. You've got this unlocked. Otherwise, good luck.Raj: I think us-east-2 a couple weeks ago for like the entire week was out of the GPU capacity necessary the entire week.Corey: I haven't been really tracking a lot of the GPU-specific stuff. 
Do you happen to know what a lot of OpenAI's stuff is built on top of from a vendoring perspective?Raj: I mean, it's got to be Nvidia, right? Is that what you're asking me?Corey: Yeah. I'm—I don't know a lot of the—again, this is not my area.Raj: Yeah, yeah.Corey: I am not particularly deep in the differences between the various silicon manufacturers. I know that AWS has their Inferentia chipset that's named after something that sounds like what my grandfather had. You've got a bunch of AMD stuff coming out. You've have—Intel's been in this space for a while. But Nvidia has sort of been the gold standard based upon GPU stories. So, I would assume it's Nvidia.Raj: At this point, they're the only game in town. No one else matters. The frameworks simply don't support anything other than Nvidia. So, in fact, OpenAI—them and Facebook—they are kind of leading some—a bunch of the open-source right now. So, it's—Stability AI, Hugging Face, OpenAI, Facebook, and all their stuff is dependent on Nvidia. None of it—if you look through the source code, none of it really relies on Inferentia or Trainium or AMD or Intel. It's all Nvidia.Corey: As you look across the current landscape—at least—let me rephrase that. As I look across the current landscape, I am very hard-pressed to identify any company except probably OpenAI itself as doing anything other than falling all over itself having been caught—what feels like—completely flat-footed. We've seen announcements rushed. We've seen people talking breathlessly about things that are not yet actively available. When does that stop? 
When do we start to see a little bit of thought and consideration put into a lot of the things that are being rolled out, as opposed to “We're going to roll this out as an assistant now to our search engine” and then having to immediately turn it off because it becomes deeply and disturbingly problematic in how it responds to a lot of things?Raj: You mean Sam Altman saying he's got a lodge in Montana with a cache of firearms in case AI gets out of control? You mean that doesn't alarm you in any way?Corey: A little bit. Just a little bit. And like even now you're trying to do things that, to be clear, I am not trying to push the boundaries of these things. But all right. Write a limerick about Elon Musk hurling money at things that are ridiculous. Like, I am not going to make fun of individual people. It's like I get that. But there is a punching-up story around these things. Like, you also want to make sure that's it not “Write a limerick about the disgusting habit of my sixth-grade classmate.” Like, you don't want to, basically, automate the process of cyber-bullying. Let's be clear here. But finding that nuance, it's a societal thing to wrestle with, on some level. But I think that we're anywhere near having cohesive ideas around it.Raj: Yeah. I mean, this stuff is going to be used for nefarious ways. And it's beyond just cyberbullying, too. I think nation-states are going to use this stuff to—as a way to create disinformation. I mean, if we saw a huge flux of disinformation in 2020, just imagine what's going to happen in 2024 with AI-generated disinformation. That's going to be off the charts.Corey: It'll be at a point where you fundamentally have to go back to explicitly trusted sources as opposed to, “Well, I saw a photo of it or a video of it” or someone getting onstage and dancing about it. Well, all those things can be generated now for, effectively, pennies.Raj: I mean, think about evidence in a courtroom now. 
If I can generate an image of myself holding a gun to someone's head, you have to essentially dismiss all sorts of photographic evidence or video evidence soon enough in court because you can't trust the authenticity of it.Corey: It makes providence and chain-of-custody issues far more important than they were before. And it was still a big deal. Photoshop has been around for a while. And I remember thinking when I was younger, “I wonder how long it'll be until videos become the next evolution of this.” Because there was—we got to a point fairly early on in my life where you couldn't necessarily take a photograph at face value anymore because—I mean, look at some of the special effects we see in movies. Yeah, okay. Theoretically, someone could come up with an incredibly convincing fake of whatever it is that they're trying to show. But back then, it required massive render farms and significant investment to really want to screw someone over. Now, it requires drinking a couple glasses of wine, going on the internet late at night, navigating to the OpenAI webpage, and typing in the right prompt. Maybe you iterate on it a little bit, and it spits it out for basically free.Raj: That's one of the sectors, actually, that's going to adopt this stuff the soonest. It's happening now, the film and movie industry. Stability AI actually has a film director on staff. And his job is to be sort of the liaison to Hollywood. And they're going to help build AI solutions into films and so forth. So, yeah. But that's happening now.Corey: One of the more amazing things that I've seen has been the idea of generative voice where it used to be that in order to get an even halfway acceptable model of someone's voice, they had to read a script for the better part of an hour. That—and they had to make sure that they did it with certain inflection points and certain tones. Now, you can train these things on, “All right. Great. Here's this person just talking for ten minutes. 
Here you go.” And the reason I know this—maybe I shouldn't be disclosing this as publicly as I am, but the heck with it. We've had one of me on backup that we've used intermittently on those rare occasions when I'm traveling, don't have my recording setup with me, and this needs to go out in a short time period. And we've used it probably a dozen times over the course of the 400 and some odd episodes we've done. One person has ever noticed.Raj: Wow.Corey: Now, having a conversation going back and forth, start pairing some of those better models with something like ChatGPT, and basically, you're building your own best friend.Raj: Yeah. I mean, soon enough you'll be able to do video AI, completely AI-generated of your podcast perhaps.Corey: That would be kind of wild, on some level. Like now we're going to animate the whole thing.Raj: Yeah.Corey: Like I still feel like we need more action sequences. Don't know about you, but I don't have quite the flexibility I did when I was younger. I can't very well even do a pratfall without wondering if I just broke a hip.Raj: You can have an action sequence where you kick off a CloudFormation task. How about that?Corey: One area where I have found that generative text AI, at least, has been lackluster, has been right a parody of the following song around these particular dimensions. Their meter is off. Their—the cleverness is missing.Raj: Hmm.Corey: They at least understand what a parody is and they understand the lyrics of the song, but they're still a few iterative generations away. That said, I don't want to besmirch the work of people who put into these things. They are basically—Raj: Mm-hmm.Corey: —magic.Raj: For sure. Absolutely. I mean, I'm in wonderment of some of the artwork that I'm able to generate with generative AI. I mean, it is absolutely awe-inspiring. No doubt about it.Corey: So, what's gotten you excited about pairing this specifically with e-commerce? 
That seems like an odd couple to wind up smashing together. But you have had one of the best perspectives on this industry for a long time now. So, my question is not, “What's wrong with you?” But rather, “What are you seeing that I'm missing?”Raj: I think it's the biggest opportunity from an impact perspective. Generating AI avatars of yourself is pretty cool. But ultimately, I think that's a pretty small market. I think the biggest market you can go after right now is e-commerce in the generative AI space. I think that's the one that's going to move the needle for a lot of people. So, it's a big opportunity for one. I think there are interesting things you can do in it. The technical aspects are equally interesting. So, you know, there are a handful of compelling things that draw me to it.Corey: I think you're right. There's a lot of interest and a lot of energy and a lot of focus built around a lot of the neat, flashy stuff. But it's “Okay. How does this windup serving problems that people will pay money for?” Like right now to get early access to ChatGPT and not get rate-limited out, I'm paying them 20 bucks a month which, fine, whatever. I am also in a somewhat privileged position. If you're generating profile photos that same way, people are going to be very hard-pressed to wind up paying more than a couple bucks for it. That's not where the money is. But solving business problems—and I think you're onto something with the idea of generative photography of products that are for sale—that has the potential to be incredibly lucrative. It tackles what to most folks is relatively boring, if I can say that, as far as business problems go. And that's often where a lot of value is locked away.Raj: I mean, in a way, you can think of generative AI in this space as similar to what cloud providers themselves do. So, the cloud providers themselves afforded much smaller entities the ability to provision large-scale infrastructure without high fixed costs. 
And in some ways, I know the same applies to this space too. So, now mom-and-pop shop-type people will be able to generate interesting product photos without high fixed costs of photoshoots and Photoshop and so forth. And so, I think in some ways it helps to democratize some of the bigger tools that people have had access to.Corey: That's really what it comes down to is these technologies have existed in labs, at least, for a little while. But now, they're really coming out as interesting, I guess, technical demos, for lack of a better term. But the entire general public is having access to these things. There's not the requirement that we wind up investing an enormous pile of money in custom hardware and the rest. It feels evocative of the revolution that was cloud computing in its early days. Where suddenly, if I have an idea, I don't need either build it on a crappy computer under my desk or go buy a bunch of servers and wait eight weeks for them to show up in a rack somewhere. I can just start kicking the tires on it immediately. It's about democratizing access. That, I think, is the magic pill here.Raj: Exactly. And the entry point for people who want to do this as a business, so like me, it is a huge hurdle still to get this stuff running, lots of jagged edges, lots of difficulty. And I think that ultimately is going to dissuade huge segments of the population from doing it themselves. They're going to want completed services. They're going to want finish product, at least in some consumable form, for their persona.Corey: What do you think the shaking out of this is going to look like from a cultural perspective? I know that right now everyone's excited, both in terms of excited about the possibility and shrieking that the sky is falling, that is fairly normal for technical cycles. What does the next phase look like?Raj: The next phase, unfortunately, is probably going to be a lot of litigation. I think there's a lot of that on the horizon already. Right? 
Stability AI's being sued. I think the courts are going to have to decide, is this stuff above board? You know, the fact that these models have been trained on otherwise copywritten data—copywritten images and music and so forth, that amounts to billions of parameters. How does that translate—how does that affect ages of intellectual property law? I think that's a question that—it's an open question. And I don't think we know.Corey: Yeah. I wish, on some level, that we could avoid a lot of the unpleasantness. But you're right. It's going to come down to a lot of litigation, some of which clearly has a point, on some level.Raj: For sure.Corey: But it's a—but that is, frankly, a matter for the courts. I'm privileged that I don't have to sit here and worry about this in quite the same way because I am not someone who makes the majority of my income through my creative works. And I am also not on the other side of it where I've taken a bunch of other people's creative output and use that to train a bunch of models. So, I'm very curious to know how that is going to shake out as a society.Raj: Yeah.Corey: I think that regulation is also almost certainly on the horizon, on some level. I think that tech has basically burned through 25 years of goodwill at this point. And nobody trusts them to self-regulate. And based upon their track record, I don't think they should.Raj: And interestingly, I think that's actually why Google was caught so flat-footed. Google was so afraid of the ramifications of being first and the downside optics of that, that they got a little complacent. And so, they weren't sure how the market would react to saying, “Here's this company that's known for doing lots of, you know, kind of crazy things with people's data. And suddenly they come out with this AI thing that has these huge superpowers.” And how does that reflect poorly on them? 
But it ended up reflecting poorly on them anyway because they ended up being viewed as being very, very late to market. So, yeah. They got pie on their face one way or the other.Corey: For better or worse, that's going to be one of those things that haunts them. This is the classic example of the innovator's dilemma. By becoming so focused on avoiding downside risk and revenue protection, they effectively let their lunch get eaten. I don't know that there was another choice that they could've made. I'm not sitting here saying, “That's why they're foolish.” But it's painful. If I'm—I'm in the same position right now. If I decide I want to launch something new and exciting, my downside risk is fairly capped. The world is my theoretical oyster. Same with most small companies. I don't know about you, what do you right now as a founder, but over here at The Duckbill Group, at no point in the entire history of this company, going back six years now, have we ever sat down for a discussion around, “Okay. If we succeed at this, what are the antitrust implications?” It has never been on our roadmap. It's—that's very firmly in the category of great problems to have.Raj: Really confident companies will eat their own lunch. So, you in fact see AWS do this all the time.Corey: Yes.Raj: They will have no problem disrupting themselves. And they're lots of data points we can talk about to show this. They will disrupt themselves first because they're afraid of someone else doing it before them.Corey: And it makes perfect sense. Amazon has always had a—I'd call it a strange culture, but that doesn't do it enough of a service just because it feels like compared to virtually any other company on the planet, they may as well be an alien organism that has been dropped into the world here. 
And we see a fair number of times where folks have left Amazon, and they wind up being so immersed in that culture, that they go somewhere else, and “Ah, I'm just going to bring the leadership principles with me.” And it doesn't work that way. A lot of them do not pan out outside of the very specific culture that Amazon has fostered. Now, I'm not saying that they're good or that they're bad. But it is a uniquely Amazonian culture that they have going on there. And those leadership principles are a key part of it. You can transplant that in the same way to other companies.Raj: Can I tell you one of the funniest things one of these cloud providers has said to me? I'm not going to mention the cloud provider. You may be able to figure out which one anyway, though.Corey: No. I suspect I have a laundry list to go out of these various, ridiculous things I have heard from companies. Please, I am always willing to add to the list. Hit me with it.Raj: So, a cloud provider—a big cloud provider, mine you—told me that they wanted Amazon's culture so bad that they began a thing where during a meeting—before each meeting, everyone would sit quietly and read a paper that was written by someone in the room so they all got on the same page. And that is distinctly an Amazon thing, right? And this is not Amazon that is doing this. This is some other cloud provider. So, people are so desperate for that bit of weirdness that you mentioned inside of Amazon, that they're willing to replicate some of the movements and the behaviors whole cloth hoping that they can get that same level of culture. But it has to be—it has to be organic. And it has to be at the root. You can't just take an arm and stick it onto a body and expect it to work, right?Corey: My last real job before I started this place was at a small, scrappy startup for three months. And then we were bought by an enormous financial company. 
And one of their stated reasons for doing it was, “Well, we really admire a lot of your startup culture, and we want to, basically, socialize that and adopt that where we are.” Spoiler. This did not happen. It was more or less coming from a perspective, “Well, we visited your offices, and we saw that you had bikes in the entryway and dogs in the office. And well, we went back to our office, and we threw in some bikes and added some dogs, but we didn't get any different result. What's the story here?” It's—you cannot cargo cult bits and pieces of a culture. It has to be something holistic. And let's be clear, you're only ever going to be second best at being another company. They're going to be first place. We saw this a lot in the early-2000s of “We're going to be the next Yahoo.” It's—why would I care about that? We already have original Yahoo. The fortune's faded, but here we are.Raj: Yeah. Agreed.Corey: On our last recording, you mentioned that you would be building this out on top of AWS. Is that how it's panned out? You still are?Raj: For the most part. For the most part. I've dipped my toes into trying to use GPU capacity elsewhere, using things like ECS Anywhere, which is an interesting route. There's some promise there, but there's still lots of jagged edges there too. But for the most part, there's not another cloud provider that really has everything I need from GPU capacity to serverless functions at the edge, to CDNs, to SQL databases. That's actually a pretty disparate set of things. And there's not another cloud provider that has literally all of that except AWS at this point.Corey: So far, positive experience or annoying? Let's put it that way.Raj: Some of it's really, really hard. So, like doing GPUs with generative AI, with containers for instance, is still really, really hard. And the documentation is almost nonexistent. The documentation is wrong. I've actually submitted pull requests to fix AWS documentation because a bunch of it is just wrong. 
So, yeah. It's hard. Some of it's really easy. Some it's really difficult.Corey: I really want to thank you for taking time to speak about what you're up to over at Perspect. Where can people go to learn more?Raj: And we will of course put a link to that in the [show notes 00:30:02]. Thank you so much for being so generous with your time. I appreciate it.Raj: Any time, Corey.Corey: Raj Bala, Founder at Perspect. I'm Cloud Economist Corey Quinn. And this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas, if you haven't hated this podcast, please, leave a five-star review on your podcast platform of choice along with an angry, insulting comment that you got an AI generative system to write for you.Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit to get started.

    The Benefits of Mocking Clouds Locally with Waldemar Hummer

    Play Episode Listen Later Mar 30, 2023 32:24

    Waldemar Hummer, Co-Founder & CTO of LocalStack, joins Corey on Screaming in the Cloud to discuss how LocalStack changed Corey's mind on the futility of mocking clouds locally. Waldemar reveals why LocalStack appeals to both enterprise companies and digital nomads, and explains how both see improvements in their cost predictability as a result. Waldemar also discusses how LocalStack is an open-source company first and foremost, and how they're working with their community to evolve their licensing model. Corey and Waldemar chat about the rising demand for esoteric services, and Waldemar explains how accommodating that has led to an increase of adoption from the big data space. About WaldemarWaldemar is Co-Founder and CTO of LocalStack, where he and his team are building the world-leading platform for local cloud development, based on the hugely popular open source framework with 45k+ stars on Github. Prior to founding LocalStack, Waldemar has held several engineering and management roles at startups as well as large international companies, including Atlassian (Sydney), IBM (New York), and Zurich Insurance. He holds a PhD in Computer Science from TU Vienna.Links Referenced: LocalStack website: LocalStack Slack channel: LocalStack Discourse forum: LocalStack GitHub repository: TranscriptAnnouncer: Hello, and welcome to Screaming in the Cloud with your host, Chief Cloud Economist at The Duckbill Group, Corey Quinn. This weekly show features conversations with people doing interesting work in the world of cloud, thoughtful commentary on the state of the technical world, and ridiculous titles for which Corey refuses to apologize. This is Screaming in the Cloud.Corey: Welcome to Screaming in the Cloud. I'm Corey Quinn. Until a bit over a year ago or so, I had a loud and some would say fairly obnoxious opinion around the futility of mocking cloud services locally. 
This is not to be confused with mocking cloud services on the internet, which is what I do in lieu of having a real personality. And then one day I stopped espousing that opinion, or frankly, any opinion at all. And I'm glad to be able to talk at long last about why that is. My guest today is Waldemar Hummer, CTO and co-founder at LocalStack. Waldemar, it is great to talk to you.Waldemar: Hey, Corey. It's so great to be on the show. Thank you so much for having me. We're big fans of what you do at The Duckbill Group and Last Week in AWS. So really, you know, glad to be here with you today and have this conversation.Corey: It is not uncommon for me to have strong opinions that I espouse—politely to be clear; I'll make fun of companies and not people as a general rule—but sometimes I find that I've not seen the full picture and I no longer stand by an opinion I once held. And you're one of my favorite examples of this because, over the course of a 45-minute call with you and one of your business partners, I went from, “What you're doing is a hilarious misstep and will never work,” to, “Okay, and do you have room for another investor?” And in the interest of full disclosure, the answer to that was yes, and I became one of your angel investors. It's not exactly common for me to do that kind of a hard pivot. And I kind of suspect I'm not the only person who currently holds the opinion that I used to hold, so let's talk a little bit about that. At the very beginning, what is LocalStack and what does it you would say that you folks do?Waldemar: So LocalStack, in a nutshell, is a cloud emulator that runs on your local machine. It's basically like a sandbox environment where you can develop your applications locally. We have currently a range of around 60, 70 services that we provide, things like Lambda Functions, DynamoDB, SQS, like, all the major AWS services. 
And to your point, it is indeed a pretty large undertaking to actually implement the cloud and run it locally, but with the right approach, it actually turns out that it is feasible and possible, and we've demonstrated this with LocalStack. And I'm glad that we've convinced you to think of it that way as well.Corey: A couple of points that you made during that early conversation really stuck with me. The first is, “Yeah, AWS has two, no three no four-hundred different service offerings. But look at your customer base. How many of those services are customers using in any real depth? And of those services, yeah, the APIs are vast, and very much a sprawling pile of nonsense, but how many of those esoteric features are those folks actually using?” That was half of the argument that won me over.The other half was, “Imagine that you're an enormous company that's an insurance company or a bank. And this year, you're hiring 5000 brand new developers, fresh out of school. Two to 3000 of those developers will still be working here in about a year as they wind up either progressing in other directions, not winding up completing internships, or going back to school after internships, or for a variety of reasons. So, you have that many people that you need to teach how to use cloud in the context that we use cloud, combined with the question of how do you make sure that one of them doesn't make a fun mistake that winds up bankrupting the entire company with a surprise AWS bill?” And those two things combined turned me from, “What you're doing is ridiculous,” to, “Oh, my God. You're absolutely right.”And since then, I've encountered you in a number of my client environments. You were absolutely right. 
This is something that resonates deeply and profoundly with larger enterprise customers in particular, but also folks who just don't want to wind up being beholden to every time they do a deploy to anything to test something out, yay, I get to spend more money on AWS services.Waldemar: Yeah, totally. That's spot on. So, to your first point, so definitely we have a core set of services that most people are using. So, things like Lambda, DynamoDB, SQS, like, the core serverless, kind of, APIs. And then there's kind of a long tail of more exotic services that we support these days, things like, even like QLDB, the quantum ledger database, or, you know, managed streaming for Kafka.But like, certainly, like, the core 15, 20 services are the ones that are really most used by the majority of people. And then we also, you know, pro offering have some very, sort of, advanced services for different use cases. So, that's to your first point.And second point is, yeah, totally spot on. So LocalStack, like, really enables you to experiment in the sandbox. So, we both see it as an experimentation, also development environment, where you don't need to think about cloud costs. And this, I guess, will be very close to your heart in the work that you're doing, the costs are becoming really predictable as well, right? Because in the cloud, you know, work to different companies before doing LocalStack where we were using AWS resources, and you can end up in a situation where overnight, you accumulate, you know, hundreds of thousands of dollars of AWS bill because you've turned on a certain feature, or some, you know, connectivity into some VPC or networking configuration that just turns out to be costly.Also, one more thing that is worth mentioning, like, we want to encourage, like, frequent testing, and a lot of the cloud's billing and cost structure is focused around, for example, hourly billing of resources, right? 
And if you have a test that just spins up resources that run for a couple of minutes, you still end up paying for the entire hour. And with LocalStack, really, that brings down the cloud bills significantly, because you can really test frequently, the cycles become much faster, and it's also, again, more efficient, more cost-effective.

Corey: There's something useful to be said for, “Well, how do I make sure that I turn off resources when I'm done?” In cloud, it's a bit of a game of guess-and-check. And you turn off things you think are there and you wait a few days and you check the bill again, and you go and turn more things off, and the cycle repeats. Or alternately, wait for the end of the month and wonder in perpetuity why you're being billed 48 cents a month. Restarting the laptop is a lot more straightforward.

I also want to call out some of my own bias on this, where I used to be a big believer in being able to build and deploy and iterate on things locally because, well, what happens when I'm in a plane with terrible WiFi? Well, in the before times, I flew an awful lot and was writing a fair bit of, well, cloudy nonsense, and I still never found that to be a particular blocker on most of what I was doing. So, it always felt a little bit precious to me when people were talking about, well, what if I can't access the internet to wind up building and deploying these things? It's now 2023. How often does that really happen? But is that a use case that you see a lot of?

Waldemar: It's definitely a fair point. And probably, like, 95% of cloud development these days is done in a high-bandwidth environment, maybe some corporate network where you have really fast internet access. But that's only a subset, I guess, of the world out there, right? So, there might be situations where, you know, you may have bad connectivity. Also, maybe you live in a region—or maybe you're traveling, even, right?
So, there are a lot more and more people who are just, “digital nomads,” quote-unquote, right, who just like to work in remote places.

Corey: You're absolutely right. My bias is that I live in San Francisco. I have symmetric gigabit internet at home. There are not a lot of scenarios in my day-to-day life—except when I'm, you know, on the train or the bus traveling through the city, because thank you, Verizon—where I have impeded connectivity.

Waldemar: Right. Yeah, totally. And I think the other aspect of this is that developers just like to have things locally, right, because it gives them the feeling of, you know, better control over the code, like, being able to integrate into their IDEs, setting breakpoints, having these quick cycles of iteration. And again, this is something that there's more and more tooling coming up for in the cloud ecosystem, but it's still inherently a remote execution that just, you know, takes the round trip of uploading your code, deploying, and so on, and that's just basically the pain point that we're addressing with LocalStack.

Corey: One thing that did surprise me as well was discovering that there was a lot more appetite for this sort of thing in enterprise-scale environments. I mean, some of the reference customers that you have on your website include divisions of the UK Government and 3M—you know, the Post-It note people—as well as a number of other very large environments. And at first, that didn't make a whole lot of sense to me, but then it suddenly made an awful lot of sense, because it seems—and please correct me if I'm wrong—that in order to use something like this at scale, in a way where the administration of it isn't more trouble than it's worth, you need to progress past a certain point of scale.
An individual developer on their side project is likely just going to iterate against AWS itself, whereas a team of thousands of developers might not want to be doing that, because they almost certainly have their own workflows that make that process high-friction.

Waldemar: Yeah, totally. So, what we see a lot, especially in larger enterprises, is dedicated teams, like, developer experience teams, whose main job is to really set up a workflow and environment where developers can be most productive. And this can be, you know, on one side, like, setting up automated pipelines, provisioning maybe AWS sandbox and test accounts. And for some of these teams, when we introduce LocalStack, it's really a game-changer, because everything becomes much more decoupled and, like, you know, distributed. You can basically configure your CI pipeline to just, you know, spin up the container, run your tests, and tear down again afterwards. So, you know, there are fewer dependencies.

And also, one aspect to consider is the aspect of cloud approvals. A lot of companies that we work with have, you know, very stringent processes around even getting access to the cloud. Some SRE team needs to enable their IAM permissions and so on. With LocalStack, you can just get started from day one and get productive and start testing from the local machine. So, I think those are patterns that we see a lot, especially in larger enterprise environments as well, where, you know, there might be some regulatory barriers and just, you know, process-wise steps as well.

Corey: When I started playing with LocalStack myself, one of the things that I found disturbingly irritating is that there's a lot that AWS gets largely right with its AWS command-line utility. You can stuff a whole bunch of different options into the config for different profiles, and all the other tools that I use mostly wind up respecting that config.
The few that extend it add custom lines to it, but everything else is mostly well-behaved and ignores the things it doesn't understand. But there is no facility that lets you say, “For this particular profile, use this endpoint for AWS service calls instead of the normal ones in public regions.” In fact, to do that, you effectively have to pass specific endpoint URLs as arguments, and I believe the syntax on that is not globally consistent between different services.

It just feels like a living nightmare. At first, I was annoyed that you folks wound up having to ship your own command-line utility to wind up interfacing with this. Like, why don't you just add a profile? And then I tried it myself and, oh, I'm not the only person who knows how this stuff works that has ever looked at this and had that idea. No, it's because AWS is just unfortunate in that respect.

Waldemar: That is a very good point. And you're touching upon one of the major pain points that we have, frankly, with the ecosystem. So, there are some pull requests against the AWS open-source repositories for the SDKs and various other tools, where folks—not only LocalStack, but other folks in the community—have asked for introducing, for example, an AWS endpoint URL environment variable. These [proposals 00:12:32], unfortunately, were never merged. So, it would definitely make our lives a whole lot easier, but so far, we basically have to maintain these, you know, these wrapper scripts, basically, awslocal, cdklocal, which basically just, you know, point the client to local endpoints. It's a good workaround for now, but I would assume and hope that the world's going to change in the upcoming years.

Corey: I really hope so, because everything else I can think of is just bad. The idea of building a custom wrapper around the AWS command-line utility that winds up checking the profile section, and oh, if this profile is that one, call out to this tool, otherwise it just becomes a pass-through.
A wrapper like that has security implications that aren't necessarily terrific, you know, in large enterprise companies that care a lot about security. Pretending to be a binary you're not is usually the kind of thing that makes people sad when security politely kicks their door in.

Waldemar: Yeah, we actually have pretty, like, big hopes for the v3 wave of the AWS SDKs, because there is some restructuring happening with the endpoint resolution. And also, you can, in your profile, by now have, you know, special resolvers for endpoints. But still, the case of just pointing all the SDKs and the CLI to a custom endpoint is just not yet resolved. And this is, frankly, quite disappointing, actually.

Corey: While we're complaining about the CLI, I'll throw in one of my recurring issues with it. I would love for it to adopt the Linux slash Unix paradigm of having a config.d directory that you can reference from within the primary config file, where any file within that directory in the proper syntax winds up getting adopted into what becomes a giant composable config file, generated dynamically. The reason being, I can have entire lists of profiles in separate files that I could then wind up dropping in and out on a client-by-client basis. So, I don't inadvertently expose who some of my clients are, in the event that winds up being part of the way that they have named their AWS accounts.

That is one of those things I would love, but it feels like it's not a common enough use case for there to be a whole lot of traction around it. And I guess some people would make a fair point if they were to say that the AWS CLI is the most widely deployed AWS open-source project, even though all it does is give money to AWS more efficiently.

Waldemar: Yeah. Great point. I think some way to customize and, like, mingle or mangle your configurations in an easier fashion would be super useful.
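Both ideas in this exchange, a single endpoint-override variable and a composable config.d directory, are easy to prototype. This is a hypothetical, standard-library-only sketch, not anything the AWS CLI or SDKs supported at the time; the variable name and the directory layout are assumptions:

```python
import configparser
import glob
import os
import tempfile

def resolve_endpoint(service, region="us-east-1"):
    """Honor a single override variable, as the community proposals suggest.

    If AWS_ENDPOINT_URL is set, every service call is redirected there
    (e.g., a local emulator); otherwise fall back to the usual
    public-region endpoint pattern.
    """
    return os.environ.get("AWS_ENDPOINT_URL") or f"https://{service}.{region}.amazonaws.com"

def compose_config(config_dir):
    """Merge every *.conf fragment in config_dir into one profile set.

    Later files win on conflicting keys, mirroring the conf.d convention
    from Linux/Unix tooling; dropping a client's profiles in or out is
    then just adding or removing one file.
    """
    merged = configparser.ConfigParser()
    for path in sorted(glob.glob(os.path.join(config_dir, "*.conf"))):
        merged.read(path)
    return merged

# Endpoint override: one variable redirects every client locally.
os.environ["AWS_ENDPOINT_URL"] = "http://localhost:4566"
print(resolve_endpoint("sqs"))  # http://localhost:4566

# config.d: two drop-in fragments, one per client, composed on the fly.
d = tempfile.mkdtemp()
with open(os.path.join(d, "10-client-a.conf"), "w") as f:
    f.write("[profile client-a]\nregion = us-east-1\n")
with open(os.path.join(d, "20-client-b.conf"), "w") as f:
    f.write("[profile client-b]\nregion = eu-west-1\n")
print(compose_config(d).sections())  # ['profile client-a', 'profile client-b']
```

Neither sketch touches real AWS; the point is only how small these missing hooks are compared with maintaining wrapper binaries.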
I guess it might be a slippery slope to getting, you know, into something like, I don't know, Helm for EKS and, like, really, you know, having to maintain a whole templating language for these configs. But I certainly agree with you that just, you know, at least having [plug 00:15:18] points for being able to customize the behavior of the SDKs and CLIs would be extremely helpful and valuable.

Corey: This is not—unfortunately—my first outing with the idea of trying to have AWS APIs done locally. In fact, almost a decade ago now, I did a build-out at a very large company of a… well, I would say that the build-out was not itself very large—it was about 300 nodes—that were all running Eucalyptus, which, before it died on the vine, was imagined as a way of just emulating AWS APIs locally—done in Java, as I recall—and exposing local resources in ways that comported with how AWS did things. So, the idea being that you could write configuration to deploy any infrastructure you wanted in AWS, but also treat your local data center the same way. That idea unfortunately did not survive in the marketplace, which is kind of a shame, on some level. What was it that inspired you folks to wind up building this with an eye towards local development, rather than running it as a private cloud in your data center instead?

Waldemar: Yeah, very interesting. And I do also have some experience [unintelligible 00:16:29] from my past university days with Eucalyptus and OpenStack, also, you know, running some workloads in an on-prem cluster. I think the main difference is, first of all, these systems were notoriously hard to set up and maintain, right? So, lots of moving parts: you had your image server, your compute system, and then your messaging subsystems.
Lots of moving parts, and we wanted to have everything basically much more monolithic, in a single container.

And Docker really provides a great platform for us, which is: create everything in a single container, spin it up locally, make it very lightweight and easy to use. But I think in the first days of LocalStack, the idea really came from an actual use case of somebody on our team. Back then, I was working at Atlassian in the data engineering team, and we had folks in the team who were commuting to work on the train. And it was literally this use case that you mentioned before, about being able to work offline on your commute. And this is where the first lines of code were written, and then the idea evolved from there.

We put it into open-source, and then it kind of grew over the years. But it really started not as an on-prem, like, heavyweight server, but as a lightweight system that is easily portable across different systems as well.

Corey: That is a good question. Very often, when I'm using various tools that are aimed at development use cases, it is very clear that one particular operating system is invariably going to be the first-class citizen and everything else is a best effort. Ehh, it might work; it might not. Does LocalStack feel that way? And if so, what's the operating system that you want to be on?

Waldemar: I would say we definitely work best on Mac OS and Linux. It also works really well on Windows, but I think, given that some of our tooling in the ecosystem is also pretty much geared towards Unix systems, those are the platforms it will work best with. Again, on the other hand, Docker is really a platform that helps us a lot in being compatible across operating systems and also CPU architectures. We have a multi-arch build now for AMD64 and ARM64.
So, I think in that sense, we're pretty broad in terms of the compatibility spectrum.

Corey: I do not have any insight into how the experience goes on Windows, given that I haven't used that operating system in anger for, wow, 15 years now, but I will say that it's been top-flight on Mac OS, which is where I spend most of my time. Depressed that I'm using it, maybe, but for desktop experiences, it seems to work out fairly well. That said, having a focus on Windows seems like it would absolutely be a hard requirement, given that so many developer workstations in very large enterprises tend to skew very Windows-heavy. My hat is off to people who work with Linux and Linux-like systems in environments like that, where even line endings become psychotically challenging. I don't envy them their problems. And I have nothing but respect for people who can power through it. I never had the patience.

Waldemar: Yeah. Same here, and definitely, I think everybody has their favorite operating system. For me, it's also been mostly Linux and Mac in the last couple of years. But certainly, we definitely want to be broad in terms of the adoption, and working with large enterprises, we want to fit into the existing landscape and environment that people work in. And we solve this with platform abstractions like Docker, for example, as I mentioned, and also, for example, Python; some of our tooling within Python is also pretty nicely supported across platforms. But I do feel the same way as you, like, having been working with Windows for quite some time, especially for development purposes.

Corey: What have you noticed your customer usage patterns slash requests have been saying about AWS service adoption? I have to imagine that everyone cares whether you can mock S3 effectively. EC2, DynamoDB, probably. SQS, of course.
But beyond the very small baseline level of offering, what have you seen surprising demand for as, I guess, customer implementation of more esoteric services continues to climb?

Waldemar: Mm-hm. Yeah, so these days it's actually pretty [laugh] pretty insane, the level of coverage we already have for different services, including some very exotic ones, like QLDB, as I mentioned, and Kafka. We even have Managed Airflow, for example. I mean, a lot of these services are essentially mostly, like, wrappers around the API. This is essentially also what AWS is doing, right? So, they're providing an API that basically provisions some underlying resources, some infrastructure.

Some of the more interesting parts, I guess, we've seen in the data or big data ecosystem. So, things like Athena and Glue we've invested quite a lot of time in, you know, making available also in LocalStack, so you can have your maybe CSV files or JSON files in an S3 bucket and you can query them from Athena with SQL, basically, right? And especially for these big data-heavy jobs that are very heavyweight on AWS, you can iterate very quickly in LocalStack. So, this is where we're seeing a lot of adoption recently. And then also, obviously, things like, you know, Lambda and ECS, like, all the serverless and containerized applications, but I guess those are the more mainstream ones.

Corey: I imagine you probably get your fair share of requests for things like CloudFormation or CloudFront, where, this is great, but can you go ahead and add a very lengthy sleep right here, just because it returns way too fast and we don't want people to get their hopes up when they use the real thing. On some level, it feels like exact replication of the AWS customer experience isn't quite in line with what makes sense from a developer productivity point of view.

Waldemar: Yeah, that's a great point.
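The sleep-statement problem being joked about here has a standard fix that works against both a fast emulator and the real cloud: poll with a bounded timeout instead of hard-coding AWS's timing. A minimal sketch; the readiness check passed in is whatever probe your SDK or tool exposes:

```python
import time

def wait_until(check, timeout=60.0, interval=0.5):
    """Poll `check` until it returns True or `timeout` seconds elapse.

    Against a local emulator the condition is usually true on the first
    poll, so the wait costs milliseconds; against the real cloud the same
    code simply keeps polling up to the timeout. No hard-coded sleeps.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if check():
            return True
        time.sleep(interval)
    return False

# Demo with a condition that becomes true after a short delay.
ready_at = time.monotonic() + 0.2
print(wait_until(lambda: time.monotonic() >= ready_at, timeout=5.0, interval=0.05))
```

Code written this way needs no changes when the same resource takes a minute on AWS and a second locally, which is exactly the gap discussed next.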
And I'm sure that, like, a lot of code out there is probably littered with sleep statements that are just tailored to the specific timing in AWS. In fact, we recently opened an issue in the AWS Terraform provider repository to add a configuration option for the timings that Terraform is using for resource deployment. So, just as an example, an S3 bucket creation takes 60 seconds, like, about a minute, against [unintelligible 00:22:37] AWS. In LocalStack, it's a second, basically, right?

And the AWS Terraform provider has these, like, relatively slow cycles of checking whether the bucket has already been created. And we want to make that configurable to actually reduce the time it takes for local development, right? So, we have an open, sort of, feature request, and we're probably going to contribute to the Terraform repository. But definitely, I share the sentiment that a lot of the tooling ecosystem is built and tailored and optimized towards the experience against the cloud, which often is just slow and, you know, that's what it is, right?

Corey: One thing that I didn't expect, though in hindsight it is blindingly obvious, is your support for a variety of different frameworks and deployment methodologies. I've found that it's relatively straightforward to get up and running with the CDK deploying to LocalStack, for instance. And in hindsight, of course, that's obvious. When you start out down that path, though, you don't tend to think in that particular way. It's, “Well, yeah, it's just going to be a console-like experience, or I wind up doing CloudFormation or Terraform.” But the world is advancing relatively quickly, and it's nice to see that you are very comfortably keeping pace with that advancement.

Waldemar: Yeah, true.
And I guess for us, it's really that the level of abstraction is sort of increasing. So, you know, once you have a solid foundation with, you know, a CloudFormation implementation, you can leverage a lot of tools that are sitting on top of it: CDK, serverless frameworks. So, CloudFormation is almost becoming, like, the assembly language of the AWS cloud, right, and if you have very solid support for that, a lot of, sort of, tools in the ecosystem will natively be supported on LocalStack. And then, you know, you have things like Terraform, and the CDK for Terraform, you know, some of these derived versions of Terraform, which also are very straightforward, because you just need to point, you know, the target endpoint to localhost, and then the rest of the deployment loop just works out of the box, essentially.

So, I guess for us, it's really mostly being able to focus on, like, the core emulation, making sure that we have very high parity with the real services. We put a lot of time and effort into what we call parity testing and snapshot testing. We make sure that our API responses are identical and really the same as they are in AWS. And this really gives us, you know, very strong confidence that a lot of tools in the ecosystem are working out of the box against LocalStack as well.

Corey: I would also like to point out that I'm also a proud LocalStack contributor at this point, because at the start of this year, I noticed, ah, on one of the pages, the copyright year was still saying 2022 and not 2023. So, a single-character pull request? Oh, yes, I am on the board now, because that is how you ingratiate yourself with an open-source project.

Waldemar: Yeah. Eternal fame to you, and kudos for your contribution. But, [laugh] you know, in all seriousness, we do have quite an active community of contributors. We are an open-source-first project; like, we were born in open-source.
Maybe just touching upon this for a second: we use GitHub for our repository, and we use a lot of automation around, you know, pull requests and, you know, service ownership.

We also participate in things like Hacktoberfest, which we participated in last year to really encourage contributions from the community, and we also host regular meetups with folks in the community, to really make sure that there's an active ecosystem where people can contribute and make contributions like the one that you did with documentation, but also, like, actual features, testing, and, you know, contributions at different levels. So really, kudos and a shout-out to the entire community out there.

Corey: Do you feel that there's an inherent tension between being an open-source product as well as being a commercial product that is available for sale? I find that a lot of companies feel vaguely uncomfortable with the various trade-offs that they make going down that particular path, but I haven't seen anyone in the community upset with you folks, and it certainly hasn't seemed to act as a brake on your enterprise adoption, either.

Waldemar: That is a very good point. So, we're following an open-source-first model where, you know, the core of the codebase is available in the community version. And then we have pro extensions, which are commercial, and you basically, you know, sign up for a license. We are certainly having a lot of discussions on how to evolve this licensing model going forward, you know, which parts to feed back into the community version of LocalStack. And it's certainly an ongoing, evolving model as well, but certainly, so far, the support from the community has been great.

And we definitely focus on, kind of, getting a lot of the innovation that we're doing back into our open-source repo and making sure that it's, like, really not only open-source but also open-contribution, for folks to contribute their contributions.
We also integrate with other third-party libraries. We're built on the shoulders of giants, if I may say so, other open-source projects that are doing great work with emulators. To name just a few: [unintelligible 00:27:33], which is a great project that we use and depend upon. We have certain mocks and emulations, for Kinesis, for example, Kinesis mock, and a bunch of other tools that we've been leveraging over the years, which are really great community efforts out there. And it's great to see such an active community that's really making this vision possible of a truly local emulated cloud that gives the best experience to developers out there.

Corey: So, as of, well, now, when people are listening to this and the episode gets released, v2 of LocalStack is coming out. What are the big differences between LocalStack and now LocalStack 2: Electric Boogaloo, or whatever it is you're calling the release?

Waldemar: Right. So, we're super excited to release our v2 version of LocalStack. The planned release date is the end of March 2023, so hopefully, we will make that timeline. We released our first version of LocalStack in July 2022, so it's been roughly seven months since then, and we try to have a cadence of roughly six to nine months for the major releases. And what you can expect is: we've invested a lot of time and effort in the last couple of months and in the last year to really make it a very rock-solid experience, with enhancements to the current services, a lot of performance optimizations, and a lot of investment in parity testing.

So, as I mentioned before, parity is really important for us, to make sure that we have high coverage of the different services and that they behave the same way as AWS. And we're also putting out an enhanced, completely polished version of our Cloud Pods experience. So, Cloud Pods is a state management mechanism in LocalStack.
So, by default, the state in LocalStack is ephemeral: when you restart the instance, you basically have a fresh state. But with Cloud Pods, we enable our users to take a persistent snapshot of the state, save it to disk or to a server, and easily share it with team members.

And we have a very polished experience with Community Cloud Pods that makes it very easy to share state among team members and with the community. So, those are just some of the highlights of things that we're going to be putting out in the tool. And we're super excited to have it done by, you know, the end of March. So, stay tuned for the v2 release.

Corey: I am looking forward to seeing how the experience shifts and evolves. I really want to thank you for taking time out of your day to wind up basically humoring me and effectively re-covering ground that you and I covered about a year and a half ago now. If people want to learn more, where should they go?

Waldemar: Yeah. So definitely, our Slack channel is a great way to get in touch with the community, and also with the LocalStack team if you have any technical questions. You can find it on our website. We also host a Discourse forum, where you can just, you know, make feature requests and participate in the general conversation.

And we do host monthly community meetups; those are also available on our website. If you sign up, for example, for the newsletter, you will be notified when we have, you know, these webinars. They take about an hour or so, and we often have guest speakers from different companies, people who are doing, you know, local cloud development, just sharing their experiences of how the space is evolving. And we're always super happy to accept contributions from the community in these meetups as well.
And last but not least, our GitHub repository is a great way to file any issues you may have, make feature requests, and just get involved with the project itself.

Corey: And we will, of course, put links to that in the [show notes 00:31:09]. Thank you so much for taking the time to speak with me today. I appreciate it.

Waldemar: Thank you so much, Corey. It's been a pleasure. Thanks for having me.

Corey: Waldemar Hummer, CTO and co-founder at LocalStack. I'm Cloud Economist Corey Quinn, and this is Screaming in the Cloud. If you've enjoyed this podcast, please leave a five-star review on your podcast platform of choice, whereas if you've hated this podcast, please leave a five-star review on your podcast platform of choice along with an angry comment, presumably because your compensation structure requires people to spend ever-increasing amounts of money on AWS services.

Corey: If your AWS bill keeps rising and your blood pressure is doing the same, then you need The Duckbill Group. We help companies fix their AWS bill by making it smaller and less horrifying. The Duckbill Group works for you, not AWS. We tailor recommendations to your business and we get to the point. Visit to get started.